Hope you've set up your Hadoop Single Node Cluster @ Your Desk.
In this tutorial, we will set up SQOOP 1.4.6. (Before you start, snapshot your VM if you haven't already done so.)
For testing, we will use SQOOP to import an RDBMS table from MySQL into Hadoop Hive.
Note: MySQL installation and setting up dummy data has been discussed in the Appendix section.
Note: You need to change the paths as per your environment (e.g. I'm using '/media/SYSTEM'; replace it with yours).
Steps below:
1. Start your VM (or your host, if you've installed Hadoop directly on the host)
2. Get Sqoop 1.4.6 and move it to our dedicated partition (the same one as Hadoop) for better management
$ su hduser
Password:
$ cd
$ wget http://www.eu.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
$ sudo tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
$ sudo mkdir -p /media/SYSTEM/hadoop/sqoop
$ sudo mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha /media/SYSTEM/hadoop/sqoop/
$ sudo chown -R hduser /media/SYSTEM/hadoop/sqoop/
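A quick sanity check that the extracted directory ended up where we expect and is owned by hduser (the path below is from my setup; adjust it to yours):
$ ls -ld /media/SYSTEM/hadoop/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha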
3. Update the .bashrc file to add Sqoop-specific configuration
$ vi .bashrc
#SQOOP VARIABLES START
export SQOOP_HOME=/media/SYSTEM/hadoop/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
export PATH=$PATH:$SQOOP_HOME/bin
#SQOOP VARIABLES END
Now close the terminal and open a new one (so that the new environment variables take effect).
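Alternatively, you can reload the file in the current shell and verify the variable; a minimal check, assuming you saved the lines above to ~/.bashrc:
$ source ~/.bashrc
$ echo $SQOOP_HOME
/media/SYSTEM/hadoop/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha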
4. Editing configuration files for SQOOP
$ su hduser
$ cd $SQOOP_HOME/conf
$ sudo cp sqoop-env-template.sh sqoop-env.sh
$ vi sqoop-env.sh
export HADOOP_COMMON_HOME=/media/SYSTEM/hadoop/hadoop-2.7.0
export HADOOP_MAPRED_HOME=/media/SYSTEM/hadoop/hadoop-2.7.0
export HIVE_HOME=/media/SYSTEM/hadoop/hive/apache-hive-1.2.1
5. Setup MySQL drivers, to be used by SQOOP (for importing MySQL tables to Hive)
$ cd $SQOOP_HOME/lib
$ sudo wget http://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.38.tar.gz
$ sudo tar -zxvf mysql-connector-java-5.1.38.tar.gz
$ sudo cp mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar ./
6. Reboot
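Before starting Hadoop, it's worth confirming that the MySQL connector jar actually landed in Sqoop's lib directory:
$ ls $SQOOP_HOME/lib/mysql-connector-java-5.1.38-bin.jar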
7. Start hadoop
$ su hduser
$ cd
$ start-dfs.sh
$ start-yarn.sh
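You can confirm the daemons came up with jps; on a single-node setup you should typically see NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager listed:
$ jps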
8. Copy Sqoop-specific JARs to HDFS
So that every datanode can access the libraries for Sqoop processing.
NB: Without this, SQOOP will not work properly!
$ hdfs dfs -mkdir -p $SQOOP_HOME/lib
$ hdfs dfs -copyFromLocal $SQOOP_HOME/lib/* $SQOOP_HOME/lib/
$ hdfs dfs -copyFromLocal $SQOOP_HOME/sqoop-1.4.6.jar $SQOOP_HOME/
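To verify the libraries are now in HDFS (the HDFS path mirrors the local $SQOOP_HOME):
$ hdfs dfs -ls $SQOOP_HOME/lib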
9. Start MySQL
$ mysql -u mysqluser -p
10. Now Import a Table from MySQL to Hive using SQOOP
Note: See the Appendix section regarding MySQL installation and table setup for this test.
$ sqoop import --bindir ./ --connect jdbc:mysql://localhost:3306/scooptest --username mysqluser --password pass1 --table employee --hive-import --hive-overwrite
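If the import fails, it helps to first rule out connectivity problems; a quick check that Sqoop can reach the database with the same connection options as above:
$ sqoop list-tables --connect jdbc:mysql://localhost:3306/scooptest --username mysqluser --password pass1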
Now open the 'hive' prompt and check that your data has been populated into the Hive table:
$ hive
hive> select * from employee where id >= 2;
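Assuming the three sample rows from the Appendix were imported, the query should return something like:
2    john
3    henry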
11. Stop Hadoop, shut down and snapshot your VM
$ stop-all.sh
$ sudo shutdown now
Appendix:
MySQL installation and setting up some tables for the SQOOP Test.
MySQL User: 'mysqluser' (To be used for SQOOP import)
Database: scooptest
Table: employee
$ sudo apt-get install mysql-server
$ mysql -u root -p
mysql> create database scooptest;
mysql> grant all on scooptest.* to 'mysqluser' identified by 'pass1';
mysql> use scooptest;
mysql> create table employee(id int primary key, name text);
mysql> insert into employee values (1, 'smith');
mysql> insert into employee values (2, 'john');
mysql> insert into employee values (3, 'henry');
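A quick sanity check before running the Sqoop import, to confirm the three rows are in place:
mysql> select * from employee;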