Hope you've set up your Hadoop Single Node Cluster @ Your Desk.
In this tutorial, we will set up and test Pig 0.15.0. (Before you start, snapshot your VM if you haven't already done so.)
Note: You need to change the paths to match your environment (e.g. in my case I'm using '/media/SYSTEM'; replace it with your own).
Steps below:
1. Start your VM (or your host, if you've installed Hadoop directly on the host)
2. Get Pig 0.15.0 and move it to our dedicated partition (the same one used for Hadoop) for better management
$ su hduser
$ cd
$ wget http://www.eu.apache.org/dist/pig/latest/pig-0.15.0.tar.gz
$ tar -xvf pig-0.15.0.tar.gz
$ sudo mv pig-0.15.0 /media/SYSTEM/hadoop/pig/pig-0.15.0
$ sudo chown -R hduser /media/SYSTEM/hadoop/pig
3. Update the .bashrc file to add Pig-specific configuration
$ vi .bashrc
#To avoid 'Found interface jline.Terminal, but class was expected'
#export HADOOP_USER_CLASSPATH_FIRST=false
#PIG VARIABLES START
export PIG_INSTALL=/media/SYSTEM/hadoop/pig/pig-0.15.0
export PATH=${PATH}:${PIG_INSTALL}/bin
#PIG VARIABLES END
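To pick up the new variables in the current shell and confirm that the pig launcher is on the PATH, here is a quick sanity check (not part of the original steps, but harmless), assuming the lines above have been saved:
$ source ~/.bashrc
$ which pig
$ pig -version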
4. Edit configuration files for Pig
Add a '.pigbootup' file with empty content (Pig expects this file to be present and auto-runs whatever it contains at startup; see the sketch after the commands below).
By default Pig writes its log file to the root partition. Point it to a separate location for better management.
$ touch ~/.pigbootup
$ mkdir /media/SYSTEM/hadoop/pig/pig-0.15.0/logs
$ vi /media/SYSTEM/hadoop/pig/pig-0.15.0/conf/pig.properties
pig.logfile=/media/SYSTEM/hadoop/pig/pig-0.15.0/logs/
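We leave .pigbootup empty here, which is fine; for reference, the file simply holds Pig Latin statements that the Grunt shell runs automatically at startup. A minimal sketch (the statements below are only illustrative examples, not required for this setup):
$ vi ~/.pigbootup
-- statements placed here run automatically each time Grunt starts
set default_parallel 1;
set job.name 'hduser-pig-session';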
5. Reboot
6. Start Hadoop
$ start-all.sh
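Before moving on to the test, it's worth confirming that the Hadoop daemons actually came up. A quick check with the JDK's jps tool, if it is available (the exact list of daemons depends on your Hadoop version and whether YARN is used):
$ jps
# expect to see processes such as NameNode, DataNode, SecondaryNameNode,
# ResourceManager and NodeManager (names vary by Hadoop version)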
7. Testing Pig (the famous 'Word Count' example, in MapReduce/Hadoop mode)
$ su hduser
$ cd
$ cat > words.txt
this is a test file contains words
(press Ctrl+D to finish the file)
$ hdfs dfs -copyFromLocal words.txt words.txt
$ pig
grunt> A = load './words.txt';
grunt> B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
grunt> C = group B by word;
grunt> D = foreach C generate COUNT(B), group;
grunt> dump D
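If you prefer to run the same statements non-interactively, they can go into a script file that you hand to pig. The sketch below assumes illustrative names (wordcount.pig and ./wordcount_out are not part of the original steps), with a STORE added so the counts are written to HDFS rather than only dumped to the console:
$ vi wordcount.pig
-- same statements as above, with a STORE so the result lands in HDFS
A = load './words.txt';
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
C = group B by word;
D = foreach C generate COUNT(B), group;
store D into './wordcount_out';

$ pig wordcount.pig
$ hdfs dfs -cat wordcount_out/part*
Since every word in words.txt appears exactly once, the output should contain tuples of the form (1,this), (1,is), and so on (ordering may vary).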
8. Stop Hadoop, shut down and snapshot your VM
$ stop-all.sh
$ sudo shutdown now