Install Apache Pig in Ubuntu

Introduction

Apache Pig is a platform, originally developed at Yahoo!, for processing huge datasets on top of Hadoop. Pig scripts are written in Pig Latin, a high-level data-flow language that Pig translates into MapReduce jobs. This makes them easier to write and understand for people who have difficulty coding MapReduce jobs in Java. In this article, we will look at how to install Apache Pig in Ubuntu.

Pre-requisites

The only prerequisites for installing Apache Pig are Java and Hadoop. If you do not have them yet, follow our guide, How to install Hadoop in Ubuntu, which covers both.
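
Before continuing, it can help to confirm that both prerequisites are reachable from the shell. The following is a minimal sketch: the helper name have_command is made up for this example, and it only checks that the java and hadoop launchers are on PATH, not that their versions suit Pig.

```shell
# Report whether a command is available on PATH.
# Prints a status line and returns 0 if found, 1 otherwise.
# (have_command is a hypothetical helper name used only in this example.)
have_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
        return 0
    fi
    echo "missing: $1"
    return 1
}

# Check the two prerequisites named above.
for tool in java hadoop; do
    have_command "$tool" || echo "install $tool before continuing"
done
```

If both are found, java -version and hadoop version print the installed versions.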

Now, let us proceed with the installation.

Install Apache Pig in Ubuntu

1. Download Apache Pig

Create a new directory and download the Apache Pig tar.gz file into it with the commands below:

mkdir pig
cd pig/
wget https://downloads.apache.org/pig/pig-0.17.0/pig-0.17.0.tar.gz

If you want to download a different version of Apache Pig, you can find it here: Apache Pig Releases.
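
It is good practice to check a downloaded archive against a published checksum before extracting it. The sketch below assumes a SHA-512 digest is available for the release you chose; verify_sha512 is a hypothetical helper name and EXPECTED_DIGEST is a placeholder, not a real value.

```shell
# Compare a file's SHA-512 digest with an expected hex string.
# Returns 0 on match, non-zero on mismatch.
# (verify_sha512 is a hypothetical helper name used only in this example.)
verify_sha512() {
    file=$1
    expected=$2
    # sha512sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha512sum "$file" | awk '{print $1}')
    # Lowercase the expected value, since published digests vary in case.
    expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
    [ "$actual" = "$expected" ]
}

# Usage (EXPECTED_DIGEST is a placeholder; copy the real value from the release page):
# verify_sha512 pig-0.17.0.tar.gz "$EXPECTED_DIGEST" && echo "checksum OK"
```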

2. Extract the Apache Pig tar file

Extract the tar.gz file that you downloaded in the first step with the command below:

tar -xvf pig-0.17.0.tar.gz
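
A quick way to confirm the extraction worked is to look for the pig launcher script inside the new directory. This is a sketch that assumes the archive unpacked into pig-0.17.0 under the current directory and that the launcher lives at bin/pig; is_pig_home is a hypothetical helper name.

```shell
# Return 0 if the given directory looks like an extracted Pig release,
# i.e. it contains an executable bin/pig launcher script.
# (is_pig_home is a hypothetical helper name used only in this example.)
is_pig_home() {
    [ -x "$1/bin/pig" ]
}

if is_pig_home "$PWD/pig-0.17.0"; then
    echo "Pig extracted to $PWD/pig-0.17.0"
else
    echo "bin/pig not found under $PWD/pig-0.17.0"
fi
```

The directory printed on success is the value to use for PIG_HOME in the next step.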

3. Set the Environment Variables

Now we need to set the environment variables for Apache Pig so that the pig command can be run from any directory. The variables go in the .bashrc file in your home directory. Execute the commands below to open the .bashrc file for editing:

cd
nano .bashrc

Add the lines below to the end of the file, adjusting the paths to match your own user name and Java installation:

#JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
#Apache Pig Environment Variables
export PIG_HOME=/home/hiberstack/pig/pig-0.17.0
export PATH=$PATH:$PIG_HOME/bin
#Hadoop configuration directory (etc/hadoop in Hadoop 2 and later, conf in Hadoop 1)
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop

Save and exit the file by pressing Ctrl+X, followed by Y and Enter. The changes are now saved, but the current shell has not read them yet. Execute the command below to load the new variables into the current shell session.

source ~/.bashrc
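
To confirm that the shell picked up the new variables, you can check that PIG_HOME is set and that its bin directory appears in PATH. This is a small sketch; path_contains is a hypothetical helper name, and the check says nothing about whether the paths themselves are correct.

```shell
# Return 0 if directory $1 is one of the colon-separated entries in PATH.
# (path_contains is a hypothetical helper name used only in this example.)
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -n "${PIG_HOME:-}" ] && path_contains "$PIG_HOME/bin"; then
    echo "PIG_HOME=$PIG_HOME and its bin directory is on PATH"
else
    echo "PIG_HOME is unset or its bin directory is not on PATH"
fi
```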

4. Pig Version

Apache Pig is now installed. Check the Pig version to verify the installation by executing the command below:

pig -version

5. Start Apache Pig

When we start Apache Pig, it opens the Grunt shell, Pig's interactive prompt. Two commonly used execution modes are described below.

(a) Local Mode: In local mode, Pig runs on a single machine and reads and writes files on the local file system only, rather than HDFS. We can start Pig in local mode with the command below:

pig -x local

Execute the quit command at the grunt> prompt to leave the shell:

quit
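
Local mode is also a convenient place to try a first script without touching HDFS. The following sketch writes a tiny input file and a word-count script in Pig Latin, then runs it in local mode if pig is on PATH. The helper name write_wordcount_demo, the file names, and the sample text are all made up for illustration.

```shell
# Write a tiny input file and a word-count Pig Latin script into directory $1.
# (write_wordcount_demo and the file names are hypothetical, for illustration only.)
write_wordcount_demo() {
    demo_dir=$1
    mkdir -p "$demo_dir" || return 1
    printf 'pig runs on hadoop\npig latin is a data flow language\n' > "$demo_dir/input.txt"
    # The quoted heredoc delimiter keeps the shell from expanding anything in the script.
    cat > "$demo_dir/wordcount.pig" <<'EOF'
-- the input path is supplied on the command line with -param
lines   = LOAD '$input' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
DUMP counts;
EOF
}

demo_dir=$(mktemp -d)
write_wordcount_demo "$demo_dir"
echo "demo files written to $demo_dir"

# Run the script in local mode only if pig is on PATH.
if command -v pig >/dev/null 2>&1; then
    (cd "$demo_dir" && pig -x local -param input=input.txt wordcount.pig)
fi
```

Passing the input path with -param keeps the script itself independent of where the data lives.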

(b) MapReduce Mode: In this mode, Pig commands run as MapReduce jobs on Hadoop, and files are read from and written to HDFS. This is the default mode of Pig. We can start Pig in MapReduce mode with either of the commands below:

pig

OR

pig -x mapreduce
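
Because MapReduce mode reads from HDFS, input files have to be copied there before a script can use them. The sketch below assumes the Hadoop daemons are already running and that the Pig script reads its input path from a parameter named input; the helper name run_on_hdfs and all paths are hypothetical.

```shell
# Copy a local file into an HDFS directory, then run a Pig script on it
# in MapReduce mode. The script is expected to read its input from '$input'.
# (run_on_hdfs is a hypothetical helper name used only in this example.)
run_on_hdfs() {
    local_file=$1
    hdfs_dir=$2
    script=$3
    # Create the target directory (no error if it already exists).
    hdfs dfs -mkdir -p "$hdfs_dir" || return 1
    # Upload the file, overwriting any previous copy.
    hdfs dfs -put -f "$local_file" "$hdfs_dir/" || return 1
    pig -x mapreduce -param input="$hdfs_dir/$(basename "$local_file")" "$script"
}

# Usage (paths are placeholders):
# run_on_hdfs input.txt "/user/$USER/pig-demo" wordcount.pig
```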
