Sunday, April 30, 2017

Export data to MySQL database from HDFS using Sqoop Commands



  1. First, create the table in MySQL, as the target table must already exist in the MySQL database before exporting.

CREATE TABLE employee (
  id INT NOT NULL PRIMARY KEY,
  name VARCHAR(20),
  deg VARCHAR(20),
  salary INT,
  dept VARCHAR(10)
);
2. Now create an inputEmployee.txt file in the local file system using the vi editor, e.g. in /home/cloudera/Export.
Note: there must be no space after the comma (,) separator, otherwise the file will not export successfully.

1201,sumit,manager,50000,TP
1202,manisha,preader,50000,TP
1203,kalil,php dev,30000,AC
1204,prasanth,php dev,30000,AC
1205,kranthi,admin,20000,TP
1206,satish p,grp des,20000,GR
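The input file above can be created and sanity-checked from the shell before uploading it; a minimal sketch (the `sed` cleanup that strips stray spaces after commas is an extra safeguard, not part of the original steps):

```shell
# Create the sample employee records (same data as above).
cat > inputEmployee.txt <<'EOF'
1201,sumit,manager,50000,TP
1202,manisha,preader,50000,TP
1203,kalil,php dev,30000,AC
1204,prasanth,php dev,30000,AC
1205,kranthi,admin,20000,TP
1206,satish p,grp des,20000,GR
EOF

# Remove any accidental spaces after commas -- they would break the export.
sed -i 's/, */,/g' inputEmployee.txt

# Verify every record has exactly 5 comma-separated fields.
awk -F',' 'NF != 5 { bad++ } END { print (bad ? "BAD" : "OK") }' inputEmployee.txt
```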

3. Now place this input file into the HDFS file system with the following command:

>hdfs dfs -put inputEmployee.txt /home/cloudera/exportResult/

4. Export this HDFS file into MySQL through Sqoop:

>sqoop export --connect jdbc:mysql://localhost/test --username root --password cloudera --table employee --export-dir /home/cloudera/exportResult/

Note: there is no need to give the input file name for the export; the directory path is sufficient.
All files in this directory are picked up and exported one by one.

5. Check the records in the table:

>select * from employee;

More Sqoop commands:

  1. List the databases with the following Sqoop command:

sqoop list-databases --connect jdbc:mysql://localhost/ --username root -P

2. List the tables with the following Sqoop command:

sqoop list-tables --connect jdbc:mysql://localhost/test --username root -P

3. We can also use the following command for export:
sqoop export -m 1 --connect jdbc:mysql://localhost/db1 --username sqoop --password root --table acad --export-dir /sqoop_msql/

To control how the input fields are parsed, we can use the following options:
--input-fields-terminated-by '\t' --mysql-delimiters
where '\t' denotes a tab.
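A quick shell check (no Sqoop involved) confirms that the escape is a backslash-t, not a forward slash, and expands to a real tab character:

```shell
# '\t' in printf expands to a tab; awk splits on it as a field separator.
printf 'a\tb\n' | awk -F'\t' '{ print NF }'   # prints 2 (two fields)
```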

By default, Sqoop export generates INSERT statements. If the --update-key argument is given, UPDATE statements are generated instead.
The -m argument sets the number of map tasks that read the file splits from HDFS. Each mapper opens its own connection to the MySQL server.
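As a plain-shell illustration of the difference (this only mimics the statement shapes, it is not Sqoop's actual code path), the same employee record maps to the two kinds of statements like this:

```shell
row='1201,sumit,manager,50000,TP'

# Default export behaviour: one INSERT per record.
echo "$row" | awk -F',' '{ printf "INSERT INTO employee (id,name,deg,salary,dept) VALUES (%s,\047%s\047,\047%s\047,%s,\047%s\047);\n", $1, $2, $3, $4, $5 }'

# With --update-key id: an UPDATE keyed on the id column instead.
echo "$row" | awk -F',' '{ printf "UPDATE employee SET name=\047%s\047, deg=\047%s\047, salary=%s, dept=\047%s\047 WHERE id=%s;\n", $2, $3, $4, $5, $1 }'
```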

Import data in HDFS from mysql db using Sqoop Commands



1. When --target-dir is not specified (the output goes to the user's HDFS home directory):
>sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table departments --m 1

2. To check the departments part and _SUCCESS files:
>hdfs dfs -ls -R departments

3. To view the content of the departments part file:
>hdfs dfs -cat departments/part-m-00000

=============

When --target-dir is specified:


1. >sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table departments -m 1 --target-dir /home/cloudera/sqoopQueryResult1/


2. To check the part and _SUCCESS files:
>hdfs dfs -ls -R /home/cloudera/sqoopQueryResult1/


3. To view the content of the part file:
>hdfs dfs -cat /home/cloudera/sqoopQueryResult1/part-m-00000


==============


Import records with a Sqoop command while applying a WHERE clause on the table:


>sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table categories -m 1 --where "category_name like '%Golf%'" --target-dir /home/cloudera/sqoopQueryResult3/
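The `--where` option passes the predicate straight through to MySQL; on a local CSV extract the same selection can be sketched with `grep` (the file and its rows here are made-up sample data, not the real retail_db contents):

```shell
# Hypothetical local extract: category_id,category_department_id,category_name
cat > categories.csv <<'EOF'
50,7,Trade-In
51,7,Golf Apparel
52,7,Golf Shoes
53,7,Soccer
EOF

# Equivalent of: WHERE category_name LIKE '%Golf%'
grep 'Golf' categories.csv
```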


=============
An incremental import fetches only the newly added rows in the table;
here, fetch the rows added after the last category_id value of 57:

>sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table categories -m 1 --incremental append --check-column category_id --last-value 57 --target-dir /home/cloudera/sqoopQueryResult4/
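Conceptually, `--check-column category_id --last-value 57` keeps only rows whose check column is greater than 57; with a local CSV the same cut looks like this (the sample rows are made up for illustration):

```shell
# Hypothetical extract: category_id,category_department_id,category_name
cat > categories.csv <<'EOF'
56,8,Fitness
57,8,Hockey
58,8,Lacrosse
59,8,Water Sports
EOF

# Keep rows where field 1 (category_id) > 57, mirroring --incremental append.
awk -F',' '$1 > 57' categories.csv
```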

Note:

You can also specify --class-name for the generated code and --bindir for where the compiled code and .jar file should be placed. (Without these arguments, Sqoop places the generated Java source file in your current working directory and the compiled .class file and .jar file in /tmp/sqoop-/compile.)

The class name is derived from the table name unless you specify one with the --class-name command-line argument (CLA). The --target-dir is the location in HDFS where you want the imported table to be placed.

>sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table departments -m 1 --class-name departmentsNew --target-dir /home/cloudera/sqoopQueryResult5/ --bindir /home/cloudera/sqoopQueryResult5/

This creates the departmentsNew.class and departmentsNew.jar files in the /home/cloudera/sqoopQueryResult5/ directory.