Sunday, February 28, 2016

Setting up Oozie 4.1.0 On Hadoop 2.7+ (Multi-Node-Cluster On Ubuntu 14.04 LXD Containers)

In this article we build Oozie 4.1.0 on a Hadoop 2.7 cluster. We have not selected the latest release, Oozie 4.2.0, because to date it has build issues with Hadoop 2.0+. We have a dedicated node (HDAppsNode-1) for Oozie (and other apps) within the cluster, which is highlighted in the deployment diagram below showing our cluster model in Azure. To have a production-grade system, we keep the Oozie metadata in a separate MySQL instance running on a separate host (HDMetaNode-1) rather than in the default Derby database. This article assumes you have already configured Hadoop 2.0+ on your cluster. The steps we followed to create the cluster, which build a single-node cluster, can be found here. We cloned the single node to multiple nodes (7 nodes, as seen below) and then updated the Hadoop configuration files to turn it into a multi-node cluster. This blog helped us do that. The updated Hadoop configuration files for the model below (multi-node cluster) are shared here for your reference.

image

Let's get started. We referenced the following blogs to prepare Oozie on Hadoop 2.0 (Link1, Link2, Link3). To make Oozie work, you also have to start the Job History Server along with YARN. I have configured the Job History Server on the same node as YARN (HDResNode-1), and it is started alongside YARN using the command (mr-jobhistory-daemon.sh start historyserver).

1. First, the Codehaus Maven repository referenced in the Oozie build file has been moved to another mirror. We need to override the Maven settings to point to the new location.

Edit or create (/home/hduser/.m2/settings.xml) and add the below.

<settings>
  <profiles>
    <profile>
      <id>OozieProfile</id>
      <repositories>
        <repository>
          <id>Codehaus repository</id>
          <name>codehaus-mule-repo</name>
          <url>https://repository-master.mulesoft.org/nexus/content/groups/public/</url>
          <layout>default</layout>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>OozieProfile</activeProfile>
  </activeProfiles>
</settings>
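
A quick way to catch a typo in the profile id is to check that the id listed under activeProfiles is also defined as a profile. The helper below is only a sketch: the function name check_maven_profile and its arguments are our own illustration, and it uses plain grep rather than a real XML parser.

```shell
# Sketch: succeed only if the given profile id is both defined and activated.
# check_maven_profile is a hypothetical helper, not part of Maven.
check_maven_profile() {
    settings_file="${1:-$HOME/.m2/settings.xml}"
    profile_id="${2:-OozieProfile}"
    grep -q "<id>${profile_id}</id>" "$settings_file" &&
        grep -q "<activeProfile>${profile_id}</activeProfile>" "$settings_file"
}
```

For example: check_maven_profile /home/hduser/.m2/settings.xml OozieProfile && echo "profile looks consistent"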

2. Now we have to create a metastore for Oozie in MySQL running on HDMetaNode-1.

sudo apt-get install mysql-server

<Log in to MySQL as the default user: root>


create database oozie;

grant all privileges on oozie.* to 'oozie'@'HDAppsNode-1' identified by 'oozie';
grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';

Edit (/etc/mysql/my.cnf) to enable MySQL to accept connections from hosts other than localhost.

bind-address  = HDMetaNode-1
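
If you need the same grant for several hosts, the statement can be generated rather than retyped. This is a minimal sketch: oozie_grant_sql is a hypothetical helper name, and it only prints the SQL so you can review it before piping it to the mysql client. MySQL also has to be restarted for the bind-address change to take effect (on Ubuntu 14.04 this is typically sudo service mysql restart).

```shell
# Print a GRANT statement in the same form as the ones above.
# Arguments: database, user, host, password (all supplied by the caller).
oozie_grant_sql() {
    printf "grant all privileges on %s.* to '%s'@'%s' identified by '%s';\n" \
        "$1" "$2" "$3" "$4"
}

# Example (review the output first, then pipe it to the client):
#   oozie_grant_sql oozie oozie HDAppsNode-1 oozie | mysql -u root -p
```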

2. Get Oozie 4.1.0 and Build on HDAppsNode-1.

We are keeping the Oozie binaries under (/media/SYSTEM/hadoop/oozie-4.1.0).

cd /media/SYSTEM/hadoop/

wget http://archive.apache.org/dist/oozie/4.1.0/oozie-4.1.0.tar.gz
tar -xvf oozie-4.1.0.tar.gz
cd oozie-4.1.0

Update the pom.xml to change the default Hadoop version to 2.3.0. We are not changing it to Hadoop version 2.6.0 here because 2.3.0-oozie-4.1.0.jar is the latest available jar file. Luckily, it works with higher versions in the 2.x series.

vim pom.xml


--Search for
<hadoop.version>1.1.1</hadoop.version>
--Replace it with
<hadoop.version>2.3.0</hadoop.version>
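
If you prefer not to edit the file by hand, the same search-and-replace can be done with sed. This is a sketch under the assumption that GNU sed is available (as on Ubuntu); set_default_hadoop_version is our own name for the helper. It rewrites only the exact 1.1.1 entry and leaves any other hadoop.version entries in the pom untouched.

```shell
# Replace the default Hadoop version 1.1.1 with 2.3.0 in a pom.xml.
# Uses GNU sed's in-place flag; the dots are escaped so only 1.1.1 matches.
set_default_hadoop_version() {
    pom_file="${1:-pom.xml}"
    sed -i 's|<hadoop\.version>1\.1\.1</hadoop\.version>|<hadoop.version>2.3.0</hadoop.version>|' "$pom_file"
}
```

Run it from the oozie-4.1.0 source directory as set_default_hadoop_version pom.xml, then confirm with grep hadoop.version pom.xml.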

Continue with the Oozie build…

sudo apt-get install maven
bin/mkdistro.sh -DskipTests -P hadoop-2 -DjavaVersion=1.7 -DtargetJavaVersion=1.7

cd ..
mv /media/SYSTEM/hadoop/oozie-4.1.0 /media/SYSTEM/hadoop/oozie-4.1.0-build
cp -R /media/SYSTEM/hadoop/oozie-4.1.0-build/distro/target/oozie-4.1.0-distro/oozie-4.1.0 /media/SYSTEM/hadoop/oozie-4.1.0
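
Before copying, it can help to confirm that the build actually produced the distro directory. The following is a sketch only; copy_oozie_distro is a hypothetical helper, and the directory layout it expects is the one used in the commands above.

```shell
# Copy the built distro to its final location, but only if it exists.
# Arguments: build directory, Oozie version, destination directory.
copy_oozie_distro() {
    build_dir="$1"
    version="$2"
    dest="$3"
    src="$build_dir/distro/target/oozie-$version-distro/oozie-$version"
    if [ ! -d "$src" ]; then
        echo "distro not found: $src (did the build finish?)" >&2
        return 1
    fi
    cp -R "$src" "$dest"
}
```

For example: copy_oozie_distro /media/SYSTEM/hadoop/oozie-4.1.0-build 4.1.0 /media/SYSTEM/hadoop/oozie-4.1.0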

3. Prepare Oozie Libraries.

Update both the ~/.profile and ~/.bashrc files to contain the Oozie path. Append the below.

#OOZIE VARIABLES START
export PATH=$PATH:/media/SYSTEM/hadoop/oozie-4.1.0/bin
#OOZIE VARIABLES END

Reload the environment.


source ~/.bashrc
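
If you rerun the setup, appending the same lines again leaves duplicate entries in the file. A small guard avoids that. This is a sketch; append_oozie_path is our own helper name, and it uses the START marker comment from above to detect an earlier run.

```shell
# Append the Oozie PATH block to a shell rc file unless it is already there.
# Arguments: rc file, Oozie bin directory.
append_oozie_path() {
    rc_file="$1"
    oozie_bin="$2"
    if grep -qF "#OOZIE VARIABLES START" "$rc_file" 2>/dev/null; then
        return 0
    fi
    {
        echo "#OOZIE VARIABLES START"
        echo "export PATH=\$PATH:$oozie_bin"
        echo "#OOZIE VARIABLES END"
    } >> "$rc_file"
}
```

For example: append_oozie_path ~/.bashrc /media/SYSTEM/hadoop/oozie-4.1.0/bin (and the same for ~/.profile).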

Prepare Oozie


cd /media/SYSTEM/hadoop/oozie-4.1.0
mkdir libext

cp /media/SYSTEM/hadoop/oozie-4.1.0-build/hadooplibs/target/oozie-4.1.0-hadooplibs.tar.gz .
 

tar -xvf oozie-4.1.0-hadooplibs.tar.gz

 
cp oozie-4.1.0/hadooplibs/hadooplib-2.3.0.oozie-4.1.0/* libext/

 
cd libext

 
wget http://dev.sencha.com/deploy/ext-2.2.zip
mv openlogic-extjs-2.2-all-src-1.zip ext-2.2.zip

 
rm -fr /media/SYSTEM/hadoop/oozie-4.1.0/oozie-4.1.0-hadooplibs.tar.gz
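
A quick sanity check of libext at this point can save a failed prepare-war later. The helper below is a sketch (check_libext is a hypothetical name); it only verifies that ext-2.2.zip and at least one jar file are present, not that they are the right versions.

```shell
# Verify that libext holds the ExtJS archive and at least one jar.
# Argument: path to the libext directory.
check_libext() {
    libext_dir="$1"
    if [ ! -f "$libext_dir/ext-2.2.zip" ]; then
        echo "missing $libext_dir/ext-2.2.zip" >&2
        return 1
    fi
    if ! ls "$libext_dir"/*.jar >/dev/null 2>&1; then
        echo "no jar files found in $libext_dir" >&2
        return 1
    fi
    return 0
}
```

For example: check_libext /media/SYSTEM/hadoop/oozie-4.1.0/libext && echo "libext looks complete"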

4. Update Hadoop and Oozie Config files

core-site.xml (Add the below tags)

  <!-- OOZIE -->
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
  </property>
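
The oozie in these property names is the OS user that runs the Oozie server. If your server runs under a different account, the property names change accordingly. As a sketch, the two properties can be generated for any user name; proxyuser_xml is a hypothetical helper, and the wildcard values simply mirror the ones above.

```shell
# Print the two Hadoop proxy-user properties for a given OS user.
# Argument: the user that runs the Oozie server.
proxyuser_xml() {
    user="$1"
    for kind in hosts groups; do
        echo "  <property>"
        echo "    <name>hadoop.proxyuser.${user}.${kind}</name>"
        echo "    <value>*</value>"
        echo "  </property>"
    done
}
```

For example, proxyuser_xml oozie prints the same two properties shown above.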

oozie-site.xml (Add/update the below tags). Please note the MySQL and Hadoop directory configurations, and change them as per your environment.

    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:mysql://HDMetaNode-1:3306/oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
        <value>*=/media/SYSTEM/hadoop/hadoop-2.7.0/etc/hadoop</value>
    </property>
    <property>
        <name>oozie.service.WorkflowAppService.system.libpath</name>
        <value>hdfs:///user/${user.name}/share/lib</value>
    </property>
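
To double-check what ended up in the file, a property value can be read back from the command line. This is a rough sketch, not a real XML parser: get_site_property is our own helper name, and it assumes the name and value tags sit on separate lines, as in the snippet above.

```shell
# Print the <value> that follows a given <name> in a Hadoop-style site file.
# Arguments: site file, property name.
get_site_property() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }' "$1"
}
```

For example: get_site_property conf/oozie-site.xml oozie.service.JPAService.jdbc.url (adjust the path to wherever your oozie-site.xml lives).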

4. Add Oozie user and setup Oozie Server

Add Oozie user

sudo adduser oozie --ingroup hadoop

sudo chown -R oozie:hadoop /media/SYSTEM/hadoop/oozie-4.1.0

sudo chmod -R a+rwx /media/SYSTEM/hadoop/oozie-4.1.0

su oozie

cd /media/SYSTEM/hadoop/oozie-4.1.0

Set up the MySQL connector

wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.31.tar.gz
tar -zxf mysql-connector-java-5.1.31.tar.gz
cp mysql-connector-java-5.1.31/mysql-connector-java-5.1.31-bin.jar /media/SYSTEM/hadoop/oozie-4.1.0/libext

Setup Logs

mkdir logs
sudo chmod -R a+rwx /media/SYSTEM/hadoop/oozie-4.1.0/logs

Set up and create the meta tables in MySQL

bin/oozie-setup.sh db create -run

Set up the Oozie web application

sudo apt-get install zip
bin/oozie-setup.sh prepare-war

Setup Oozie Share Library in HDFS (Change the name node URL, as per your environment)


bin/oozie-setup.sh sharelib create -fs hdfs://HDNameNode-1:8020

Start Oozie and Test the status

bin/oozied.sh start
bin/oozie admin -oozie http://localhost:11000/oozie -status
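
The server can take a little while to come up, so the status check may fail if it runs immediately after the start command. A simple retry loop covers that. This is a sketch: wait_for_oozie is a hypothetical helper, and the retry count and delay are arbitrary defaults. It relies on the status command printing "System mode: NORMAL" once the server is ready.

```shell
# Poll the Oozie status until it reports NORMAL or the retries run out.
# Arguments: Oozie URL, number of tries (default 10), delay in seconds (default 5).
wait_for_oozie() {
    url="$1"
    tries="${2:-10}"
    delay="${3:-5}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        if oozie admin -oozie "$url" -status 2>/dev/null | grep -q "System mode: NORMAL"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example: wait_for_oozie http://localhost:11000/oozie && echo "Oozie is up"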

 

5. Prepare Oozie Samples and run a sample through Oozie

Our Name Node is running on HDNameNode-1:8020 and the Resource Manager (YARN) on HDResNode-1:8032, so we have to update the configuration of the samples as below. Change the host and port as per your environment.


tar -zxvf oozie-examples.tar.gz
find examples/ -name "job.properties" -exec sed -i "s/localhost:8020/HDNameNode-1:8020/g" '{}' \;
find examples/ -name "job.properties" -exec sed -i "s/localhost:8021/HDResNode-1:8032/g" '{}' \;
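
The two find commands can be wrapped in one parameterised function so the hosts are easy to swap for another environment. This is a sketch assuming GNU sed; retarget_examples is our own helper name, and the host:port values must not contain a slash because it is used as the sed delimiter.

```shell
# Point every job.properties under a directory at the given Name Node and
# Resource Manager. Arguments: examples dir, namenode host:port, RM host:port.
retarget_examples() {
    examples_dir="$1"
    namenode="$2"
    resourcemanager="$3"
    find "$examples_dir" -name "job.properties" -exec sed -i \
        -e "s/localhost:8020/$namenode/g" \
        -e "s/localhost:8021/$resourcemanager/g" '{}' \;
}
```

For example: retarget_examples examples/ HDNameNode-1:8020 HDResNode-1:8032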

Put the samples to HDFS
 
hdfs dfs -mkdir /user/oozie/examples

hdfs dfs -put examples/* /user/oozie/examples/

Run a sample by submitting a Job


oozie job -oozie http://HDAppsNode-1:11000/oozie -config examples/apps/map-reduce/job.properties -run

Check the status of the job


#now open a web browser and access "http://HDAppsNode-1:11000/oozie", you will see the submitted job
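
The status can also be checked from the command line. When a job is submitted with -run, the Oozie client prints a line of the form "job: <job-id>", and that id can be passed to the -info option. The function below is a sketch that chains the two; submit_and_show is a hypothetical helper name.

```shell
# Submit a job and immediately print its info.
# Arguments: Oozie URL, path to job.properties.
submit_and_show() {
    url="$1"
    props="$2"
    # The client prints "job: <id>" on success; keep only the id.
    job_id=$(oozie job -oozie "$url" -config "$props" -run | sed -n 's/^job: //p')
    if [ -z "$job_id" ]; then
        echo "job submission failed" >&2
        return 1
    fi
    oozie job -oozie "$url" -info "$job_id"
}
```

For example: submit_and_show http://HDAppsNode-1:11000/oozie examples/apps/map-reduce/job.properties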
