Apache Pig is a platform, originally developed at Yahoo!, for running MapReduce jobs over huge datasets on top of Hadoop. Pig scripts are written in Pig Latin, a high-level language that is easier to write and read than hand-coded Java MapReduce programs, so Pig is a good fit for people who are not comfortable with Java. In this article, we will look at how to install Apache Pig in Ubuntu.
Apache Pig has two prerequisites: Java and Hadoop must already be installed. If they are not, follow our guide, How to install Hadoop in Ubuntu, which covers both.
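Before continuing, it can help to confirm that both prerequisites are actually on the PATH. The snippet below is a minimal sketch of such a check; `check_cmd` is a hypothetical helper name chosen for illustration and is not part of Pig or Hadoop.

```shell
# check_cmd is a hypothetical helper: it reports whether a command is on the PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: NOT found, install it before continuing"
        return 1
    fi
}

# Check both prerequisites; '|| true' keeps going so that both results are reported.
check_cmd java || true
check_cmd hadoop || true
```

If either line reports that the command is missing, install it first and open a new terminal before moving on.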
Now, let us proceed with the installation.
Install Apache Pig in Ubuntu
1. Download Apache Pig
Create a new directory and download the Apache Pig tar.gz file into it with the commands below:
mkdir pig
cd pig/
wget https://downloads.apache.org/pig/pig-0.17.0/pig-0.17.0.tar.gz
If you want to download a different version of Apache Pig, you can find it here: Apache Pig Releases.
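If you do pick another version, one way to avoid editing the URL in two places is to keep the version in a variable. This is only a sketch: `PIG_VERSION` and `PIG_URL` are illustrative variable names, and it assumes the other release follows the same directory layout as the 0.17.0 URL used in this article, which you should confirm on the releases page.

```shell
# The only value to change; 0.17.0 is the release used in this article.
PIG_VERSION="0.17.0"

# Assumption: other releases use the same URL layout as 0.17.0.
PIG_URL="https://downloads.apache.org/pig/pig-${PIG_VERSION}/pig-${PIG_VERSION}.tar.gz"

echo "Download URL: $PIG_URL"

# Uncomment the next line to actually download the archive.
# wget "$PIG_URL"
```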
2. Extract the Apache Pig tar file
Extract the tar.gz file that you downloaded in the first step with the command below:
tar -xvf pig-0.17.0.tar.gz
3. Set the Environment Variables
Next, set the environment variables for Apache Pig so that the pig command can be run from any directory. The variables go in the .bashrc file in your home directory. Execute the commands below to switch to the home directory and open the file for editing:
cd
nano .bashrc
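Add the lines below to the file, replacing /home/hiberstack with your own home directory and adjusting JAVA_HOME if your Java installation lives elsewhere: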
#JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
#Apache Pig Environment Variables
export PIG_HOME=/home/hiberstack/pig/pig-0.17.0
export PATH=$PATH:/home/hiberstack/pig/pig-0.17.0/bin
export PIG_CLASSPATH=$HADOOP_HOME/conf
Save and exit the file by pressing “Ctrl X” followed by “Y” and “Enter”. Then execute the command below to load the changes from the .bashrc file into the current shell session:
source ~/.bashrc
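As a variation on the variables in this step, the sketch below derives the paths instead of hard-coding them. It rests on two assumptions that are not from the steps above: that Pig was extracted to `$HOME/pig/pig-0.17.0`, and that `HADOOP_HOME` is already set by your Hadoop installation. `hadoop_conf_dir` is a hypothetical helper written for this example. It exists because Hadoop 2 and later keep their configuration files in `etc/hadoop`, while older 1.x releases used `conf`.

```shell
# Hypothetical helper: print the configuration directory for a given Hadoop home.
# Prefers etc/hadoop (Hadoop 2 and later) and falls back to conf (Hadoop 1.x).
hadoop_conf_dir() {
    if [ -d "$1/etc/hadoop" ]; then
        echo "$1/etc/hadoop"
    else
        echo "$1/conf"
    fi
}

# Assumption: Pig was extracted under the home directory, as in steps 1 and 2.
export PIG_HOME="$HOME/pig/pig-0.17.0"
export PATH="$PATH:$PIG_HOME/bin"

# Only set PIG_CLASSPATH when HADOOP_HOME is defined.
if [ -n "${HADOOP_HOME:-}" ]; then
    export PIG_CLASSPATH="$(hadoop_conf_dir "$HADOOP_HOME")"
fi
```

Referring to `$PIG_HOME` in `PATH` means a later upgrade only needs one line changed.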
4. Pig Version
We have now installed Apache Pig in Ubuntu. To verify the installation, check the Pig version with the command below:
pig -version
5. Start Apache Pig
Starting Apache Pig opens the Grunt shell, Pig's interactive prompt. Pig can be started in two execution modes:
(a) Local Mode: In local mode, Pig runs on a single machine and reads and writes files on the local file system rather than HDFS. Start Pig in local mode with the command below:
pig -x local
To leave the Grunt shell, execute the quit command.
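Local mode can also run a script file instead of opening the interactive shell, which makes for a quick smoke test of the installation. The sketch below is illustrative only: the file names `fruits.txt` and `fruits.pig` and the sample data are made up for this example. It writes a small input file to the local file system, writes a two-line Pig Latin script that loads and prints it, and runs the script only if `pig` is on the PATH.

```shell
# Work in a scratch directory so nothing else is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# A tiny input file on the local file system (made-up sample data).
printf 'apple\nbanana\ncherry\n' > fruits.txt

# A two-line Pig Latin script: load each line as one field, then print the relation.
cat > fruits.pig <<'EOF'
fruits = LOAD 'fruits.txt' AS (name:chararray);
DUMP fruits;
EOF

# Run the script in local mode only if pig is actually installed.
if command -v pig >/dev/null 2>&1; then
    pig -x local fruits.pig
else
    echo "pig not found on PATH; skipping the run"
fi
```

Because the script runs in local mode, `fruits.txt` is read from the current directory and no running Hadoop cluster is needed.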
(b) MapReduce Mode: In this mode, Pig reads its input from and writes its output to HDFS, and runs its commands as MapReduce jobs on the Hadoop cluster. This is the default mode, so running pig with no -x option has the same effect. Start Pig in MapReduce mode with the command below:
pig -x mapreduce
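Since MapReduce mode works against HDFS, it is worth checking that HDFS is reachable before starting Pig in that mode. The sketch below is one possible way to do that and is not part of Pig itself: `choose_pig_mode` is a hypothetical helper that suggests `mapreduce` when `hdfs dfs -ls /` succeeds and falls back to `local` otherwise.

```shell
# Hypothetical helper: suggest an execution mode based on whether HDFS answers.
choose_pig_mode() {
    if command -v hdfs >/dev/null 2>&1 && hdfs dfs -ls / >/dev/null 2>&1; then
        echo "mapreduce"
    else
        echo "local"
    fi
}

mode="$(choose_pig_mode)"
echo "Start the Grunt shell with: pig -x $mode"
```

If the suggestion is `local` on a machine where Hadoop is installed, the Hadoop daemons are probably not running yet.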