In this article we build Oozie 4.1.0 on a Hadoop 2.7 cluster. We have not chosen the latest release, Oozie 4.2.0, because to date it has build issues with Hadoop 2.0+. We have a dedicated node (HDAppsNode-1) within the cluster for Oozie (and other apps), which is highlighted in the deployment diagram below, showing our cluster model in Azure. To get a production-grade system, we keep the Oozie metadata in a separate MySQL instance running on a separate host (HDMetaNode-1) rather than in the default Derby database.
This article assumes you have already configured Hadoop 2.0+ on your cluster. The steps we followed to create the cluster can be found here; they build a single-node cluster. We cloned the single node to multiple nodes (7 nodes, as seen below) and then updated the Hadoop configuration files to turn it into a multi-node cluster. This blog helped us do that. The updated Hadoop configuration files for the model below (multi-node cluster) are shared here for your reference.
Let's get started. We referred to the following blogs to prepare Oozie on Hadoop 2.0 (Link1, Link2, Link3). To make Oozie work, you also have to start the Job History Server along with YARN. I have configured the Job History Server on the same node as YARN (HDResNode-1) and start it together with YARN using the command (mr-jobhistory-daemon.sh start historyserver).
1. First, the Codehaus Maven repository referenced in the Oozie build file has moved to another mirror. We need to override the Maven settings to point to the new location.
Edit or create (/home/hduser/.m2/settings.xml) and add the following.
<settings>
<profiles>
<profile>
<id>OozieProfile</id>
<repositories>
<repository>
<id>Codehaus repository</id>
<name>codehaus-mule-repo</name>
<url>https://repository-master.mulesoft.org/nexus/content/groups/public/</url>
<layout>default</layout>
</repository>
</repositories>
</profile>
</profiles>
<activeProfiles>
<activeProfile>OozieProfile</activeProfile>
</activeProfiles>
</settings>
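If you would rather script this step than edit the file by hand, the following is a minimal sketch. It assumes a POSIX shell; `write_oozie_maven_settings` is a hypothetical helper name (not part of Maven or Oozie), and the XML it writes is the same content as shown above. An existing settings.xml is copied to settings.xml.bak first, since this sketch replaces the file rather than merging into it.

```shell
# Sketch: write the Maven settings from this step into a given .m2 directory.
# write_oozie_maven_settings is a hypothetical helper, not a Maven or Oozie tool.
write_oozie_maven_settings() {
    m2_dir="$1"
    settings="$m2_dir/settings.xml"
    mkdir -p "$m2_dir" || return 1
    # Keep a copy of any existing file, because the file is overwritten below.
    if [ -f "$settings" ]; then
        cp "$settings" "$settings.bak" || return 1
    fi
    cat > "$settings" <<'EOF'
<settings>
<profiles>
<profile>
<id>OozieProfile</id>
<repositories>
<repository>
<id>Codehaus repository</id>
<name>codehaus-mule-repo</name>
<url>https://repository-master.mulesoft.org/nexus/content/groups/public/</url>
<layout>default</layout>
</repository>
</repositories>
</profile>
</profiles>
<activeProfiles>
<activeProfile>OozieProfile</activeProfile>
</activeProfiles>
</settings>
EOF
}

# Example call (path taken from this article):
# write_oozie_maven_settings /home/hduser/.m2
```

If you already have other profiles in settings.xml, merge the profile by hand instead of using this sketch.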
2. Now we create a metastore for Oozie in MySQL, running on HDMetaNode-1.
sudo apt-get install mysql-server
<Log in to MySQL as the default user: root>
create database oozie;
grant all privileges on oozie.* to 'oozie'@'HDAppsNode-1' identified by 'oozie';
grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
Edit (/etc/mysql/my.cnf) so that MySQL accepts connections from hosts other than localhost
bind-address = HDMetaNode-1
2. Get Oozie 4.1.0 and Build on HDAppsNode-1.
We keep the Oozie binaries under (/media/SYSTEM/hadoop/oozie-4.1.0).
cd /media/SYSTEM/hadoop/
wget http://archive.apache.org/dist/oozie/4.1.0/oozie-4.1.0.tar.gz
tar -xvf oozie-4.1.0.tar.gz
cd oozie-4.1.0
Update pom.xml to change the default Hadoop version to 2.3.0. We are not changing it to Hadoop 2.6.0 here because 2.3.0-oozie-4.1.0.jar is the latest available jar file. Luckily, it works with higher versions in the 2.x series.
vim pom.xml
--Search for
<hadoop.version>1.1.1</hadoop.version>
--Replace it with
<hadoop.version>2.3.0</hadoop.version>
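The same change can be made without opening an editor. The sketch below is one way to do it, assuming a POSIX shell; `set_hadoop_version` is a hypothetical helper name. It refuses to touch the file if the default 1.1.1 entry is not found, so a pom.xml that has already been edited is left alone.

```shell
# Sketch: replace the default hadoop.version (1.1.1) in a pom.xml.
# set_hadoop_version is a hypothetical helper, not part of the Oozie build.
set_hadoop_version() {
    pom="$1"
    new_version="$2"
    # Stop if the expected default is missing, rather than editing blindly.
    if ! grep -q '<hadoop.version>1.1.1</hadoop.version>' "$pom"; then
        echo "default hadoop.version 1.1.1 not found in $pom" >&2
        return 1
    fi
    # Write to a temporary file first, then move it into place.
    sed "s|<hadoop.version>1.1.1</hadoop.version>|<hadoop.version>$new_version</hadoop.version>|" \
        "$pom" > "$pom.tmp" && mv "$pom.tmp" "$pom"
}

# Example call, run from the oozie-4.1.0 source directory:
# set_hadoop_version pom.xml 2.3.0
```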
Continue with the Oozie build…
sudo apt-get install maven
bin/mkdistro.sh -DskipTests -P hadoop-2 -DjavaVersion=1.7 -DtargetJavaVersion=1.7
cd ..
mv /media/SYSTEM/hadoop/oozie-4.1.0 /media/SYSTEM/hadoop/oozie-4.1.0-build
cp -R /media/SYSTEM/hadoop/oozie-4.1.0-build/distro/target/oozie-4.1.0-distro/oozie-4.1.0 /media/SYSTEM/hadoop/oozie-4.1.0
3. Prepare Oozie Libraries.
Update both ~/.profile and ~/.bashrc to include the Oozie path. Append the following:
#OOZIE VARIABLES START
export PATH=$PATH:/media/SYSTEM/hadoop/oozie-4.1.0/bin
#OOZIE VARIABLES END
Reload the environment.
source ~/.bashrc
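If you rerun this step, the export line can end up in the files more than once. A small sketch of an idempotent append follows, assuming a POSIX shell; `append_once` is a hypothetical helper name. It adds the line only when an identical line is not already present.

```shell
# Sketch: append a line to a file only if that exact line is not there yet.
# append_once is a hypothetical helper name.
append_once() {
    file="$1"
    line="$2"
    touch "$file" || return 1
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex).
    if ! grep -qxF -e "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example calls for the two files named in this step:
# append_once ~/.profile 'export PATH=$PATH:/media/SYSTEM/hadoop/oozie-4.1.0/bin'
# append_once ~/.bashrc  'export PATH=$PATH:/media/SYSTEM/hadoop/oozie-4.1.0/bin'
```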
Prepare Oozie
cd /media/SYSTEM/hadoop/oozie-4.1.0
mkdir libext
cp /media/SYSTEM/hadoop/oozie-4.1.0-build/hadooplibs/target/oozie-4.1.0-hadooplibs.tar.gz .
tar -xvf oozie-4.1.0-hadooplibs.tar.gz
cp oozie-4.1.0/hadooplibs/hadooplib-2.3.0.oozie-4.1.0/* libext/
cd libext
wget http://dev.sencha.com/deploy/ext-2.2.zip
mv openlogic-extjs-2.2-all-src-1.zip ext-2.2.zip
rm -fr /media/SYSTEM/hadoop/oozie-4.1.0/oozie-4.1.0-hadooplibs.tar.gz
4. Update Hadoop and Oozie Config files
core-site.xml (Add the below tags)
<!-- OOZIE -->
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
oozie-site.xml (add/update the tags below). Please note the MySQL and Hadoop directory settings and change them to match your environment.
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://HDMetaNode-1:3306/oozie</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>oozie</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>oozie</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/media/SYSTEM/hadoop/hadoop-2.7.0/etc/hadoop</value>
</property>
<property>
<name>oozie.service.WorkflowAppService.system.libpath</name>
<value>hdfs:///user/${user.name}/share/lib</value>
</property>
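Before moving on, it can help to confirm that none of the properties above was missed. The sketch below assumes a POSIX shell; `missing_oozie_props` is a hypothetical helper name, and the conf/oozie-site.xml location in the example call is an assumption about your install. It is a plain text search, so it does not notice properties that are commented out or that have whitespace inside the name tag.

```shell
# Sketch: print the names of the properties from this step that are
# not found in the given oozie-site.xml. Prints nothing if all are present.
# missing_oozie_props is a hypothetical helper name.
missing_oozie_props() {
    site_file="$1"
    for prop in \
        oozie.service.JPAService.jdbc.driver \
        oozie.service.JPAService.jdbc.url \
        oozie.service.JPAService.jdbc.username \
        oozie.service.JPAService.jdbc.password \
        oozie.service.HadoopAccessorService.hadoop.configurations \
        oozie.service.WorkflowAppService.system.libpath
    do
        # Fixed-string search for the <name> element of each property.
        grep -qF "<name>$prop</name>" "$site_file" || echo "$prop"
    done
}

# Example call (file location is an assumption):
# missing_oozie_props /media/SYSTEM/hadoop/oozie-4.1.0/conf/oozie-site.xml
```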
4. Add the Oozie user and set up the Oozie server
Add Oozie user
sudo adduser oozie --ingroup hadoop
sudo chown -R oozie:hadoop /media/SYSTEM/hadoop/oozie-4.1.0
sudo chmod -R a+rwx /media/SYSTEM/hadoop/oozie-4.1.0
su oozie
cd /media/SYSTEM/hadoop/oozie-4.1.0
Setup MySql connector
wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.31.tar.gz
tar -zxf mysql-connector-java-5.1.31.tar.gz
cp mysql-connector-java-5.1.31/mysql-connector-java-5.1.31-bin.jar /media/SYSTEM/hadoop/oozie-4.1.0/libext
Setup Logs
mkdir logs
sudo chmod -R a+rwx /media/SYSTEM/hadoop/oozie-4.1.0/logs
Setup and Create Meta Tables in MySql
bin/oozie-setup.sh db create -run
Setup Oozie WebApplication
sudo apt-get install zip
bin/oozie-setup.sh prepare-war
Setup Oozie Share Library in HDFS (Change the name node URL, as per your environment)
bin/oozie-setup.sh sharelib create -fs hdfs://HDNameNode-1:8020
Start Oozie and Test the status
bin/oozied.sh start
bin/oozie admin -oozie http://localhost:11000/oozie -status
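The server can take a little while to come up, so the status check may fail if it is run immediately after the start command. The sketch below retries the check until the output contains "System mode: NORMAL". It assumes a POSIX shell; `wait_until_normal` is a hypothetical helper name, and the retry count and delay in the example call are arbitrary values.

```shell
# Sketch: rerun a status command until it reports "System mode: NORMAL".
# Usage: wait_until_normal <tries> <delay-seconds> <command> [args...]
# wait_until_normal is a hypothetical helper name.
wait_until_normal() {
    tries="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Errors from the command are hidden while the server is still starting.
        if "$@" 2>/dev/null | grep -q 'System mode: NORMAL'; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example call, using the status command from this step:
# wait_until_normal 30 2 bin/oozie admin -oozie http://localhost:11000/oozie -status
```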
5. Prepare Oozie Samples and run a sample through Oozie
Our NameNode runs on HDNameNode-1:8020 and the Resource Manager (YARN) runs on HDResNode-1:8032, so we have to update the configuration of the samples as shown below. Change the host and port to match your environment.
tar -zxvf oozie-examples.tar.gz
find examples/ -name "job.properties" -exec sed -i "s/localhost:8020/HDNameNode-1:8020/g" '{}' \;
find examples/ -name "job.properties" -exec sed -i "s/localhost:8021/HDResNode-1:8032/g" '{}' \;
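To confirm the two sed commands above caught everything, you can list any job.properties lines that still mention localhost. This is a minimal sketch, assuming a POSIX shell and a grep that supports -H (GNU and BSD grep both do); `remaining_localhost_refs` is a hypothetical helper name. Empty output means no references are left.

```shell
# Sketch: print file:line for every job.properties line that still
# mentions localhost under the given directory.
# remaining_localhost_refs is a hypothetical helper name.
remaining_localhost_refs() {
    examples_dir="$1"
    # grep exits non-zero when nothing matches; that is the good case here,
    # so the exit status is deliberately ignored.
    find "$examples_dir" -name job.properties -exec grep -H 'localhost' '{}' + || true
}

# Example call, run from the directory that contains examples/:
# remaining_localhost_refs examples/
```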
Put the samples to HDFS
hdfs dfs -mkdir -p /user/oozie/examples
hdfs dfs -put examples/* /user/oozie/examples/
Run a sample by submitting a Job
oozie job -oozie http://HDAppsNode-1:11000/oozie -config examples/apps/map-reduce/job.properties -run
Check the status of the job
Open a web browser and go to "http://HDAppsNode-1:11000/oozie"; you will see the submitted job.
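The status can also be checked from the command line. When a job is submitted with -run, the oozie client prints a line of the form "job: <id>", and that id can be passed to the client's -info option. The sketch below assumes a POSIX shell; `extract_job_id` is a hypothetical helper name that pulls the id out of that output.

```shell
# Sketch: read the output of "oozie job ... -run" on stdin and print the job id.
# extract_job_id is a hypothetical helper name.
extract_job_id() {
    # Keep only lines that start with "job: " and strip that prefix.
    sed -n 's/^job: *//p'
}

# Example use with the submit command from this step:
# job_id=$(oozie job -oozie http://HDAppsNode-1:11000/oozie \
#     -config examples/apps/map-reduce/job.properties -run | extract_job_id)
# oozie job -oozie http://HDAppsNode-1:11000/oozie -info "$job_id"
```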