Hope you've set up your Hadoop Single Node Cluster @ Your Desk.
In this tutorial, we will set up SQOOP 1.4.6. (Before you start, snapshot your VM if you haven't already done so.)
For testing, we will use SQOOP to import an RDBMS table from MySQL into Hadoop Hive.
Note: MySQL installation and setting up dummy data are covered in the Appendix section.
Note: You need to change paths as per your environment (e.g. in my case I'm using '/media/SYSTEM'; replace it with yours).
Steps below:
1. Start your VM (or your host machine, if you've installed Hadoop directly on the host)
2. Get Sqoop 1.4.6 and move it to our dedicated partition (the same one as Hadoop) for better management
$ su hduser
Password:
$ cd
$ wget http://www.eu.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
$ sudo tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
$ sudo mkdir -p /media/SYSTEM/hadoop/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
$ sudo mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha /media/SYSTEM/hadoop/sqoop/
$ sudo chown hduser /media/SYSTEM/hadoop/sqoop/
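Optionally, confirm the Sqoop launcher script ended up where SQOOP_HOME will point in the next step (the path below is the one used in this tutorial; substitute your own):
$ ls /media/SYSTEM/hadoop/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/sqoop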
3. Update the .bashrc file to add Sqoop-specific configuration
$ vi .bashrc

#SQOOP VARIABLES START
export SQOOP_HOME=/media/SYSTEM/hadoop/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
export PATH=$PATH:$SQOOP_HOME/bin
#SQOOP VARIABLES END
Now close the terminal and open a new one (so that the new environment variables take effect).
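Alternatively, you can reload .bashrc in the current shell and do a quick sanity check that the variable and the sqoop script are visible:
$ source ~/.bashrc
$ echo $SQOOP_HOME
$ which sqoop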
4. Edit the configuration files for SQOOP
$ su hduser
$ cd $SQOOP_HOME/conf
$ sudo cp sqoop-env-template.sh sqoop-env.sh
$ vi sqoop-env.sh

export HADOOP_COMMON_HOME=/media/SYSTEM/hadoop/hadoop-2.7.0
export HADOOP_MAPRED_HOME=/media/SYSTEM/hadoop/hadoop-2.7.0
export HIVE_HOME=/media/SYSTEM/hadoop/hive/apache-hive-1.2.1
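It's worth confirming that the paths you put in sqoop-env.sh actually exist before moving on (again, these are the paths used in this tutorial; yours may differ):
$ ls /media/SYSTEM/hadoop/hadoop-2.7.0/bin/hadoop
$ ls /media/SYSTEM/hadoop/hive/apache-hive-1.2.1/bin/hive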
5. Set up the MySQL JDBC driver, to be used by SQOOP (for importing MySQL tables to Hive)
$ cd $SQOOP_HOME/lib
$ sudo wget http://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.38.tar.gz
$ sudo tar -zxvf mysql-connector-java-5.1.38.tar.gz
$ sudo cp mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar ./
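Sqoop picks up JDBC drivers from $SQOOP_HOME/lib, so confirm the connector JAR is now sitting directly in that directory:
$ ls $SQOOP_HOME/lib/mysql-connector-java-5.1.38-bin.jar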
6. Reboot
7. Start Hadoop
$ su hduser
$ cd
$ start-dfs.sh
$ start-yarn.sh
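You can verify the daemons came up with the JDK's jps tool; on a single-node setup you should see roughly NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (the exact list depends on your configuration):
$ jps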
8. Copy Sqoop-specific JARs to HDFS
This is so that every DataNode can access the libraries during Sqoop processing.
NB: Without this, SQOOP will not work properly!
$ hdfs dfs -mkdir -p $SQOOP_HOME/lib
$ hdfs dfs -copyFromLocal $SQOOP_HOME/lib/* $SQOOP_HOME/lib/
$ hdfs dfs -copyFromLocal $SQOOP_HOME/sqoop-1.4.6.jar $SQOOP_HOME/
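To confirm the upload, list the HDFS copy of the lib directory; you should see the same JARs as in the local $SQOOP_HOME/lib (including the MySQL connector copied in step 5):
$ hdfs dfs -ls $SQOOP_HOME/lib
$ hdfs dfs -ls $SQOOP_HOME/sqoop-1.4.6.jar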
9. Start MySQL and connect to it
$ mysql -u mysqluser -p
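While you're at the MySQL prompt, it's a good moment to confirm the test database and table are in place (this assumes you've already run the setup in the Appendix):
mysql> use scooptest;
mysql> select * from employee;
mysql> exit;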
10. Now Import a Table from MySQL to Hive using SQOOP
$ sqoop import --bindir ./ --connect jdbc:mysql://localhost:3306/scooptest --username mysqluser --password pass1 --table employee --hive-import --hive-overwrite
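If the import fails, a quick way to separate connectivity or driver problems from Hive problems is to first check that Sqoop can reach MySQL at all; 'sqoop list-tables' takes the same connection arguments:
$ sqoop list-tables --connect jdbc:mysql://localhost:3306/scooptest --username mysqluser --password pass1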
Note: See Appendix section, regarding MySQL installation and table setup, for this test.
Now open the 'hive' prompt and check that your data has been populated into the Hive table:
$ hive
hive> select * from employee where id >= 2;
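Given the three rows inserted in the Appendix, the query above should return the rows for 'john' (id 2) and 'henry' (id 3). You can also look at how Sqoop created the table:
hive> show tables;
hive> describe employee;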
11. Stop Hadoop, Shutdown and Snapshot your VM
$ stop-all.sh
$ sudo shutdown now
Appendix:
MySQL installation and setting up some tables for the SQOOP Test.
MySQL User: 'mysqluser' (To be used for SQOOP import)
Database: scooptest
Table: employee
$ sudo apt-get install mysql-server
$ mysql -u root -p

mysql> create database scooptest;
mysql> grant all on scooptest.* to 'mysqluser' identified by 'pass1';
mysql> use scooptest;
mysql> create table employee(id int primary key, name text);
mysql> insert into employee values (1, 'smith');
mysql> insert into employee values (2, 'john');
mysql> insert into employee values (3, 'henry');
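To make sure the grant works, log back in as 'mysqluser' and read the table; this is the same account Sqoop uses in step 10:
$ mysql -u mysqluser -ppass1 -e "select * from scooptest.employee;"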