Hadoop Commands with Examples

Apache Hadoop is an open-source framework for storing and processing large amounts of data. The Java-based software is designed for distributed storage and processing of very large data sets on clusters of commodity hardware. In this article, we will look at Hadoop commands with examples.

Basic Hadoop Commands

The basic Hadoop commands are the HDFS commands. With HDFS commands, we can perform everyday tasks in HDFS such as creating a directory, creating a file, copying files and directories between the local file system and HDFS, and so on. Below are the most basic and important commands you should know if you are working (or will be working) with the Hadoop Distributed File System.


(a) ls

The ls command lists the files and directories at the given HDFS path.

hdfs dfs -ls /

Note: A single forward slash ( / ) refers to the root directory of HDFS.


(b) mkdir

The mkdir command is used to create a directory in hdfs.

hdfs dfs -mkdir /hdfs_path/directory_name

(c) touchz

The touchz command is used to create an empty file in hdfs.

hdfs dfs -touchz /hdfs_path/filename.ext

(d) put

The put command is used to copy a file/directory from the local file system to the hdfs.

hdfs dfs -put /local_path/filename.ext /hdfs_path

(e) copyFromLocal

The copyFromLocal command works like the put command, i.e. it also copies a file/directory from the local file system to hdfs.

hdfs dfs -copyFromLocal /path/filename.ext /hdfs_path

(f) get

The get command is the opposite of the put command. It copies a file/directory from hdfs to the local file system.

hdfs dfs -get /hdfs_path/filename.ext /local_path

(g) copyToLocal

The copyToLocal command works like the get command, i.e. it also copies a file/directory from hdfs to the local file system.

hdfs dfs -copyToLocal /hdfs_path/filename.ext /local_path

(h) cat

The cat command is used to display the contents of a file.

hdfs dfs -cat /hdfs_path/filename.ext

(i) cp

The cp command is used to copy a file/directory from one location to another within hdfs itself.

hdfs dfs -cp /hdfs_path/filename.ext /hdfs_target_path

(j) mv

The mv command is used to move a file/directory from one location to another within hdfs itself.

hdfs dfs -mv /hdfs_path/filename.ext /hdfs_target_path

(k) rm

The rm command is used to delete a file/directory in hdfs.

hdfs dfs -rm /hdfs_path/filename.ext

To delete a directory, add the -r flag. For example: hdfs dfs -rm -r /hdfs_path/directory_name
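Putting these commands together, a typical round trip between the local file system and HDFS might look like the following. This is an illustrative sketch: the paths /user/hiberstack and /home/ubuntu/sample.txt are placeholder examples, and the commands assume a running Hadoop cluster.

```shell
# Create a working directory in HDFS
hdfs dfs -mkdir /user/hiberstack

# Copy a local file into it and verify the upload
hdfs dfs -put /home/ubuntu/sample.txt /user/hiberstack
hdfs dfs -ls /user/hiberstack

# Inspect the file, copy it back locally, then clean up
hdfs dfs -cat /user/hiberstack/sample.txt
hdfs dfs -get /user/hiberstack/sample.txt /tmp
hdfs dfs -rm -r /user/hiberstack
```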


Hive Commands

Hive commands in Hadoop are executed to perform SQL-like operations on big data. Apache Hive is data warehouse software that provides a SQL-like query language called HiveQL. It was initially developed at Facebook. In this section, let us look at some examples of Hive queries.

(a) Hive DDL Commands

DDL stands for Data Definition Language; the DDL commands are used to define and modify databases and the structure of tables. Below are the DDL commands with their definitions and examples.

CREATE: Used to create a database or a table.
USE: Used to select the database we need to work on.
DESCRIBE: Used to display the structure of a table.
ALTER: Used to modify a database or a table.
TRUNCATE: Used to delete the contents of a table.
DROP: Used to delete a database or a table.
SHOW: Used to display all the databases, or the tables in a database.
Hive DDL Commands

1. CREATE

  1. database
    • Syntax: CREATE DATABASE <database_name>;
    • Example: create database hiberstack;
  2. table
    • Syntax: Create TABLE <table_name> ( <column_name1> <data type>, <column_name2> <data type>,...) ROW FORMAT DELIMITED FIELDS TERMINATED BY '[delimiter]';
    • Example: create table emp ( id int, name string, salary int) row format delimited fields terminated by '\t';

2. USE

  • database
    • Syntax: USE <database_name>;
    • Example: use hiberstack;

3. DESCRIBE

  • table
    • Syntax: DESCRIBE <table_name>;
    • Example: describe emp;

4. ALTER

  1. database
    • Syntax: ALTER DATABASE <database_name> SET DBPROPERTIES ('property_name'='property_value', ...);
    • Example: alter database hiberstack set dbproperties ('edited-by'='admin');
  2. table: suppose we want to change the name of our table. The syntax and command will be as below
    • Syntax: ALTER TABLE <table_name> RENAME TO <new_table_name>;
    • Example: alter table emp rename to employee;

5. TRUNCATE

  • table
    • Syntax: TRUNCATE TABLE <table_name>;
    • Example: truncate table employee;

6. DROP

  1. database
    • Syntax: DROP DATABASE [IF EXISTS] <database_name>;
    • Example: drop database if exists hiberstack;
  2. table
    • Syntax: DROP TABLE [IF EXISTS] <table_name>;
    • Example: drop table if exists emp;

7. SHOW

  1. database
    • Syntax: SHOW DATABASES;
    • Example: show databases;
  2. table
    • Syntax: SHOW TABLES;
    • Example: show tables;

(b) Hive DML Commands

DML stands for Data Manipulation Language. As the name suggests, the DML commands are used to manipulate the data in a table, such as loading data into it, displaying its contents, etc. DML commands can be executed once the table has been created in Hive using the DDL commands. The main Hive DML commands are listed below:

LOAD: Used to load data into the table from a file that is present either in the local file system or HDFS.
SELECT: Used to display the contents of the table.
INSERT: Used to insert data into the table directly from the command (not from a file as in LOAD).
UPDATE: Used to update the data in the table, such as changing the values of columns/rows.
DELETE: Used to delete data from the table.
Hive DML Commands

1. LOAD

As mentioned earlier, the LOAD command is used to add the data to the table. You can add the data from a text file, CSV file, or any other format. If your input file is present in the local file system, then the command will be as below,

  • Syntax: LOAD DATA LOCAL INPATH 'local_path/file.ext' [OVERWRITE] INTO TABLE <table_name>;
  • Example: load data local inpath '/home/ubuntu/emp.txt' overwrite into table emp;

If your input file is present in HDFS, then the command will be as below,

  • Syntax: LOAD DATA INPATH 'hdfs_path/file.ext' [OVERWRITE] INTO TABLE <table_name>;
  • Example: load data inpath '/emp.txt' overwrite into table emp;
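For either variant, the input file must match the table's field delimiter. For the emp table created earlier (fields terminated by '\t'), a tab-separated emp.txt holding the sample values used in the INSERT example below would look like this:

```
1	abc	1000
2	xyz	5000
```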

2. SELECT

It is used to show the contents of a table. If the table contains a huge number of rows but you want to display only the first few, you can add the limit clause to the command.

  • Syntax: SELECT [ * | <column> ] FROM <table_name> [LIMIT n];
  • Example:
    1. select * from emp; -- display all the records in the emp table
    2. select name from emp limit 15; -- display only the 'name' column and only the first 15 rows

3. INSERT

It is executed to insert the data in the table. The data to be inserted is provided at the time of executing the command itself.

  • Syntax: INSERT INTO <table_name> (column1, column2, …) VALUES (row1), (row2), …;
  • Example: insert into emp (id, name, salary) values (1, 'abc', 1000), (2, 'xyz', 5000);

4. UPDATE

The update command is used to update the table. We can update a particular column name in a table using a where clause. This command can be executed only on the tables that support ACID properties.

  • Syntax: UPDATE <table_name> SET <column_name>=<value> WHERE <column_name>=<value>;
  • Example: update emp set name='pqr' where salary=5000;
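Because UPDATE (and DELETE below) work only on tables that support ACID properties, such a table has to be created accordingly. As a hedged sketch, assuming a Hive installation with ACID support enabled, a transactional table is typically declared with ORC storage and the transactional table property (the table name emp_acid is illustrative):

```sql
-- Updates and deletes require a transactional table;
-- plain text-format tables will reject these commands.
CREATE TABLE emp_acid (id INT, name STRING, salary INT)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```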

5. DELETE

The delete command is executed to delete a particular row from the table. Similar to the UPDATE command, this command can also be executed only on the tables that support ACID properties.

  • Syntax: DELETE FROM <table_name> WHERE <condition>;
  • Example: delete from emp where id=1;

Pig Commands

Apache Pig is a platform for writing programs that run on Apache Hadoop. Pig Latin is the language of this platform. When Pig commands are executed, MapReduce jobs run in the background. Apache Pig was originally created at Yahoo to let researchers run MapReduce jobs on huge datasets.

Now let us see some of the basic commands in Apache Pig.

The Pig commands are executed in the Grunt shell, the native shell provided by Apache Pig. Execute the command pig in the terminal to start the Grunt shell.

Note: All the below commands are executed in the default pig mode i.e. MapReduce mode.

(a) fs

The fs command lets you run HDFS shell commands from within the Grunt shell. For example, to list the files in HDFS:

fs -ls

You can also execute the mkdir command within the grunt shell to create a directory in HDFS.

fs -mkdir temp

(b) clear

The clear command is used to clear the grunt shell screen. This will put the cursor at the top of the screen.

clear

(c) history

The history command, as the name suggests, shows the list of commands that have been executed so far.

history

(d) Reading/loading data in Pig

emp = LOAD 'hdfs://localhost:9000/pig_data/emp_data.txt' USING PigStorage(',') AS (id:int, firstname:chararray, lastname:chararray, designation:chararray);

The PigStorage() function specifies the field delimiter used in the dataset.
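Once a relation such as emp has been loaded, further Pig Latin statements can transform it before it is stored or dumped. A small illustrative sketch (the field names match the LOAD statement above; the 'manager' filter value is made up):

```pig
-- Keep only the rows whose designation is 'manager'
managers = FILTER emp BY designation == 'manager';

-- Project just the id and firstname fields
names = FOREACH managers GENERATE id, firstname;
```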


(e) Storing data

STORE emp INTO 'hdfs://localhost:9000/pig_output/';

The output file will be stored in a directory named ‘pig_output’ in the HDFS.


(f) dump

The dump command executes the preceding Pig statements and displays the result in the Grunt shell itself. It does not save the output to HDFS.

dump emp;
