Wednesday, January 27, 2016

Setting up SQOOP 1.4 - Hadoop @ Desk (Single Node Cluster)

Hope you've set up your Hadoop Single Node Cluster @ Your Desk.

In this tutorial, we will set up SQOOP 1.4.6. (Before you start, snapshot your VM if not already done.)
For testing, we will use Sqoop to import an RDBMS table from MySQL into Hadoop Hive.

Note: MySQL installation and setting up dummy data has been discussed in the Appendix section.

Note: You need to change paths as per your environment (e.g., I'm using '/media/SYSTEM'; replace it with your own).

Steps below:

1. Start your VM (Or Host, if you've installed Hadoop directly on Host)

2. Get Sqoop 1.4.6 and move it to our dedicated partition (the same one used for Hadoop) for better management


$ su hduser
Password: 

$ cd
$ wget http://www.eu.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
 
$ sudo tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
$ sudo mkdir -p /media/SYSTEM/hadoop/sqoop
$ sudo mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha /media/SYSTEM/hadoop/sqoop/
$ sudo chown -R hduser /media/SYSTEM/hadoop/sqoop/
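
You can quickly check that the files are in place and now owned by hduser (path assumes the same layout as above):

$ ls -l /media/SYSTEM/hadoop/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha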

3. Update the .bashrc file with Sqoop-specific configuration

$ vi .bashrc

#SQOOP VARIABLES START
export SQOOP_HOME=/media/SYSTEM/hadoop/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
export PATH=$PATH:$SQOOP_HOME/bin
#SQOOP VARIABLES END

Now close the terminal and open a new one (so that the new environment variables take effect).
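
Alternatively, you can reload the file in the current session and confirm the variable is set:

$ source ~/.bashrc
$ echo $SQOOP_HOME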


4. Edit the configuration file for Sqoop

$ su hduser
$ cd $SQOOP_HOME/conf
$ cp sqoop-env-template.sh sqoop-env.sh
$ vi sqoop-env.sh

export HADOOP_COMMON_HOME=/media/SYSTEM/hadoop/hadoop-2.7.0
export HADOOP_MAPRED_HOME=/media/SYSTEM/hadoop/hadoop-2.7.0
export HIVE_HOME=/media/SYSTEM/hadoop/hive/apache-hive-1.2.1
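
As a quick sanity check, you can print the Sqoop version at this point; it should report 1.4.6 (warnings about unset HBASE_HOME, HCAT_HOME or ACCUMULO_HOME can be ignored for this setup):

$ sqoop version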

5. Set up the MySQL JDBC driver, to be used by Sqoop (for importing MySQL tables into Hive)


$ cd $SQOOP_HOME/lib
$ sudo wget http://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.38.tar.gz

$ sudo tar -zxvf mysql-connector-java-5.1.38.tar.gz
$ sudo cp mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar ./ 
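
To confirm the driver will be picked up, check that the connector JAR now sits directly in Sqoop's lib directory:

$ ls $SQOOP_HOME/lib/mysql-connector-java-5.1.38-bin.jar
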
6. Reboot


7. Start hadoop

$ su hduser
$ cd
$ start-dfs.sh
$ start-yarn.sh
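
You can confirm the daemons are up with jps; NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager should all be listed:

$ jps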

8. Copy the Sqoop-specific JARs to HDFS

This is so that every DataNode can access the libraries during Sqoop processing.
NB: Without this, Sqoop will not work properly!


$ hdfs dfs -mkdir -p $SQOOP_HOME/lib
$ hdfs dfs -copyFromLocal $SQOOP_HOME/lib/* $SQOOP_HOME/lib/
$ hdfs dfs -copyFromLocal $SQOOP_HOME/sqoop-1.4.6.jar $SQOOP_HOME/
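
A quick listing confirms the JARs landed in HDFS:

$ hdfs dfs -ls $SQOOP_HOME/lib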

9. Make sure MySQL is running and log in as the Sqoop user

$ mysql -u mysqluser -p
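
Before running the full import, you can optionally check that Sqoop can reach MySQL through the JDBC driver installed earlier (same credentials and database as used below):

$ sqoop list-tables --connect jdbc:mysql://localhost:3306/scooptest --username mysqluser --password pass1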

10. Now import a table from MySQL into Hive using Sqoop


$ sqoop import --bindir ./ --connect jdbc:mysql://localhost:3306/scooptest --username mysqluser --password pass1 --table employee --hive-import --hive-overwrite

Note: See the Appendix section regarding MySQL installation and table setup for this test.

Now open the 'hive' prompt and verify that your data has been populated into the Hive table:


$ hive
hive> select * from employee where id >= 2;
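
If the table does not show up, listing the tables will confirm whether the import created it:

hive> show tables;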


11. Stop Hadoop, shut down, and snapshot your VM


$ stop-all.sh
$ sudo shutdown -h now

Appendix:

MySQL installation and setting up a dummy table for the Sqoop test.

MySQL User: 'mysqluser' (To be used for SQOOP import)
Database: scooptest
Table: employee


$ sudo apt-get install mysql-server
$ mysql -u root -p
mysql> create database scooptest;
mysql> grant all on scooptest.* to 'mysqluser' identified by 'pass1';
mysql> use scooptest;
mysql> create table employee(id int primary key, name text);
mysql> insert into employee values (1, 'smith');
mysql> insert into employee values (2, 'john');
mysql> insert into employee values (3, 'henry');
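
To verify the data and the new user, you can log back in as 'mysqluser' and query the table:

$ mysql -u mysqluser -p scooptest
mysql> select * from employee;

Note: On newer MySQL releases, implicitly creating a user via GRANT ... IDENTIFIED BY may be rejected; in that case, run create user 'mysqluser' identified by 'pass1'; first and then repeat the grant.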
