Hopefully you've already set up your Hadoop Single Node Cluster @ Your Desk.
In this tutorial, we will set up and test Pig 0.15.0. (Before you start, snapshot your VM if you haven't already.)
Note: You need to change the paths to match your environment (e.g. in my case I'm using '/media/SYSTEM'; replace it with yours).
Steps below:
1. Start your VM (Or Host, if you've installed Hadoop directly on Host)
2. Get Pig 0.15.0 and move it to our dedicated partition (the same one used for Hadoop) for better management
$ su hduser
$ cd
$ wget http://www.eu.apache.org/dist/pig/latest/pig-0.15.0.tar.gz
$ tar -xvf pig-0.15.0.tar.gz
$ sudo mkdir -p /media/SYSTEM/hadoop/pig
$ sudo mv pig-0.15.0 /media/SYSTEM/hadoop/pig/pig-0.15.0
$ sudo chown -R hduser /media/SYSTEM/hadoop/pig
3. Update the .bashrc file with the Pig-specific configuration
$ vi .bashrc
#To avoid 'Found interface jline.Terminal, but class was expected'
#export HADOOP_USER_CLASSPATH_FIRST=false
#PIG VARIABLES START
export PIG_INSTALL=/media/SYSTEM/hadoop/pig/pig-0.15.0
export PATH=${PATH}:${PIG_INSTALL}/bin
#PIG VARIABLES END
NB: Remember to include the 'HADOOP_USER_CLASSPATH_FIRST' environment variable; otherwise, Pig will have compatibility issues with the Java libraries.
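To confirm that the variables from step 3 took effect, a small check like the one below may help. It is only a sketch: `check_pig_env` is a hypothetical helper name (not something Pig ships), and it assumes the `PIG_INSTALL` and `PATH` lines shown above.

```shell
# Hypothetical helper (not part of Pig): sanity-checks the variables from step 3.
check_pig_env() {
    # PIG_INSTALL must be set and must contain an executable bin/pig launcher
    [ -n "${PIG_INSTALL:-}" ] || { echo "PIG_INSTALL is not set"; return 1; }
    [ -x "${PIG_INSTALL}/bin/pig" ] || { echo "no executable at ${PIG_INSTALL}/bin/pig"; return 1; }
    # PATH must list ${PIG_INSTALL}/bin as one of its colon-separated entries
    case ":${PATH}:" in
        *":${PIG_INSTALL}/bin:"*) echo "Pig environment looks OK" ;;
        *) echo "${PIG_INSTALL}/bin is not on PATH"; return 1 ;;
    esac
}
```

Run `source ~/.bashrc` (or open a new terminal) and then call `check_pig_env`. If it passes, `pig -version` should print the installed release.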
4. Edit the configuration files for Pig
Add an empty '.pigbootup' file in your home directory (Pig expects this file to auto-populate its values)
By default, Pig will write its logs to the root partition. Point the log file at a separate location for better management.
$ touch ~/.pigbootup
$ mkdir /media/SYSTEM/hadoop/pig/pig-0.15.0/logs
$ vi /media/SYSTEM/hadoop/pig/pig-0.15.0/conf/pig.properties
pig.logfile=/media/SYSTEM/hadoop/pig/pig-0.15.0/logs/
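After editing pig.properties, it can be worth confirming that the `pig.logfile` line is actually active (not commented out). The snippet below is a rough sketch; `pig_logfile_setting` is a hypothetical helper name, and the only thing it assumes about the file is the usual `key=value` properties layout.

```shell
# Hypothetical helper: prints the active pig.logfile value from a properties file.
pig_logfile_setting() {
    # Match only uncommented "pig.logfile=..." lines, strip the key,
    # and keep the last match in case the key appears more than once.
    sed -n 's/^[[:space:]]*pig\.logfile[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Usage, with the path from this tutorial (replace it with yours):
#   pig_logfile_setting /media/SYSTEM/hadoop/pig/pig-0.15.0/conf/pig.properties
```

If the command prints nothing, the property is missing or still commented out.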
5. Reboot
6. Start Hadoop
$ start-all.sh
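Before moving on to Pig, you may want to check that the Hadoop daemons really came up. The JDK's `jps` tool lists running Java processes, and the sketch below filters its output. Note that `missing_daemons` is a hypothetical helper, and the daemon list is an assumption based on a typical Hadoop 2.x single-node setup; adjust it to match your installation.

```shell
# Reads `jps` output on stdin and prints every expected daemon that is absent.
# ASSUMPTION: the daemon names below match a Hadoop 2.x single-node cluster.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <MainClass>", so compare the second field exactly
        printf '%s\n' "$jps_output" \
            | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
            || echo "$daemon"
    done
}

# Usage: jps | missing_daemons
```

No output means every daemon in the list was found; otherwise the missing names are printed one per line.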
7. Test Pig (the famous `Word Count` example, in MapReduce/Hadoop mode)
$ su hduser
$ cd
$ cat > words.txt
this is a test file contains words
(press Ctrl+D to end the input)
$ hdfs dfs -copyFromLocal words.txt words.txt
$ pig
grunt> A = load './words.txt';
grunt> B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
grunt> C = group B by word;
grunt> D = foreach C generate COUNT(B), group;
grunt> dump D;
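To cross-check what `dump D` prints, the same count can be computed locally with standard Unix tools. This is just a sketch for comparison, not how Pig does it: `word_count` is a hypothetical helper, and it splits on any whitespace, which is close to (but not exactly the same as) what TOKENIZE does.

```shell
# Hypothetical helper: counts words in a local file and prints (count,word)
# tuples, mirroring the shape of Pig's dump output.
word_count() {
    # One word per line, drop empty lines, then count identical words
    tr -s '[:space:]' '\n' < "$1" \
        | sed '/^$/d' \
        | sort \
        | uniq -c \
        | awk '{ printf "(%d,%s)\n", $1, $2 }'
}

# Demo on the same sample text used in step 7, written to a temporary file
sample=$(mktemp)
printf 'this is a test file contains words\n' > "$sample"
word_count "$sample"
rm -f "$sample"
```

Every word in the sample appears once, so each tuple should carry a count of 1. The order of the tuples may differ from Pig's output.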
8. Stop Hadoop, shut down, and snapshot your VM
$ stop-all.sh
$ sudo shutdown now