[[PageOutline]]
= Running Hadoop on a DRBL Cluster =
'''Hadoop Cluster Based on DRBL'''
* The goal of this article is to build a cluster with DRBL and run Hadoop on top of it.
* Because DRBL is a diskless system rather than an ordinary cluster, a few points need special attention.
= 0. Environment =
The environment consists of thirteen machines. One is the DRBL server, which also serves as the Hadoop namenode; the other twelve are DRBL clients and datanodes:
|| Name || IP || DRBL role || Hadoop role ||
|| hadoop (gm2.nchc.org.tw) || 192.168.1.254 || DRBL server || namenode ||
|| hadoop101~112 || 192.168.1.1~12 || DRBL client || datanode ||
The DRBL server environment:
|| Debian || etch (4.0) || 64-bit server ||
DRBL is a diskless system: once the DRBL server and the required services are installed, the clients boot over the network and load a file system derived from the server's. Apart from certain per-client directories (such as /etc, /root, /home, /tmp, /var, ...), everything is identical across nodes. For example, if /etc/hosts is changed on the server, the clients pick up the change immediately (their file systems are NFS-mounted from the server).
Therefore, first complete '''1. Installation''' and '''2. Configuration''' on the DRBL server, then boot the clients and follow '''3. Operation'''.
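* A quick way to see which paths come from the server is to inspect the mount table on a booted client (a generic check, not part of the original procedure):
{{{
#!sh
# On a DRBL client: list the NFS mounts provided by the server
mount -t nfs
}}}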
= 1. Installation =
== 1.1 Installing DRBL ==
* See [http://drbl.nchc.org.tw/one4all/desktop/ the DRBL installation guide] for details.
== 1.2 Installing Java 6 ==
* Add the non-free component and the backports.org repository to /etc/apt/sources.list so that sun-java6 can be installed:
{{{
deb http://free.nchc.org.tw/debian/ etch main contrib non-free
deb-src http://free.nchc.org.tw/debian/ etch main contrib non-free
deb http://security.debian.org/ etch/updates main contrib non-free
deb-src http://security.debian.org/ etch/updates main contrib non-free
deb http://www.backports.org/debian etch-backports main non-free
deb http://free.nchc.org.tw/drbl-core drbl stable
}}}
* Install the backports archive key, then install Java 6:
{{{
$ wget http://www.backports.org/debian/archive.key
$ sudo apt-key add archive.key
$ sudo apt-get update
$ sudo apt-get install sun-java6-bin sun-java6-jdk sun-java6-jre
}}}
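* Before proceeding, it is worth confirming that the Sun JVM installed correctly (a sanity check, not in the original steps):
{{{
#!sh
# Should report a 1.6.x Sun JVM
java -version
# The directory used later as JAVA_HOME should now exist
ls -d /usr/lib/jvm/java-6-sun
}}}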
== 1.3 Installing Hadoop 0.18.3 ==
{{{
$ cd /opt
$ wget http://ftp.twaren.net/Unix/Web/apache/hadoop/core/hadoop-0.18.3/hadoop-0.18.3.tar.gz
$ tar zxvf hadoop-0.18.3.tar.gz
}}}
== 1.4 Setting Up the User ==
{{{
$ su -
$ addgroup hdfsgrp
$ adduser --ingroup hdfsgrp hdfsadm
$ chown -R hdfsadm:hdfsgrp /opt/hadoop-0.18.3
$ chmod -R 775 /opt/hadoop-0.18.3
$ cd /opt
$ ln -sf hadoop-0.18.3 hadoop
$ su - hdfsadm
}}}
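* The ownership and the symlink can be confirmed before moving on (an optional check):
{{{
#!sh
# /opt/hadoop should point at hadoop-0.18.3, owned by hdfsadm:hdfsgrp
ls -ld /opt/hadoop /opt/hadoop-0.18.3
}}}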
= 2. Configuring Hadoop =
* Append the following to the end of /etc/bash.bashrc:
{{{
#!sh
PATH=$PATH:/opt/drbl/bin:/opt/drbl/sbin
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_HOME=/opt/hadoop/
}}}
* Reload the settings:
{{{
$ source /etc/bash.bashrc
}}}
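* A quick check that the variables are now set (optional):
{{{
#!sh
echo $JAVA_HOME    # expect /usr/lib/jvm/java-6-sun
echo $HADOOP_HOME  # expect /opt/hadoop/
}}}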
* Edit /etc/hosts and append the following at the end:
{{{
192.168.1.254 gm2.nchc.org.tw
192.168.1.1 hadoop101
192.168.1.2 hadoop102
192.168.1.3 hadoop103
192.168.1.4 hadoop104
192.168.1.5 hadoop105
192.168.1.6 hadoop106
192.168.1.7 hadoop107
192.168.1.8 hadoop108
192.168.1.9 hadoop109
192.168.1.10 hadoop110
192.168.1.11 hadoop111
192.168.1.12 hadoop112
}}}
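* Because /etc/hosts is shared with the clients, every node sees these entries immediately; resolution can be verified from any of them (an optional check):
{{{
#!sh
# Each name should resolve to the IP listed above
getent hosts gm2.nchc.org.tw hadoop101
}}}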
* Edit /opt/hadoop-0.18.3/conf/hadoop-env.sh:
{{{
#!diff
--- hadoop-0.18.3/conf/hadoop-env.sh.org
+++ hadoop-0.18.3/conf/hadoop-env.sh
@@ -6,7 +6,11 @@
 # remote nodes.
 
 # The java implementation to use.  Required.
-# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
+export JAVA_HOME=/usr/lib/jvm/java-6-sun
+export HADOOP_HOME=/opt/hadoop
+export HADOOP_CONF_DIR=$HADOOP_HOME/conf
+export HADOOP_LOG_DIR=/home/hdfsadm/hdfs/logs
+export HADOOP_PID_DIR=/home/hdfsadm/hdfs/pids
 
 # Extra Java CLASSPATH elements.  Optional.
 # export HADOOP_CLASSPATH=
}}}
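* The log and pid directories referenced above live under the hdfsadm home directory. Hadoop's daemon scripts normally create missing directories on startup, but creating them up front avoids permission surprises (an optional step, not in the original procedure):
{{{
#!sh
# As hdfsadm: pre-create HADOOP_LOG_DIR and HADOOP_PID_DIR
mkdir -p ~/hdfs/logs ~/hdfs/pids
}}}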
* Edit /opt/hadoop-0.18.3/conf/hadoop-site.xml:
{{{
#!diff
--- hadoop-0.18.3/conf/hadoop-site.xml.org
+++ hadoop-0.18.3/conf/hadoop-site.xml
@@ -4,5 +4,26 @@
 <!-- Put site-specific property overrides in this file. -->
 
 <configuration>
-
+  <property>
+    <name>fs.default.name</name>
+    <value>hdfs://gm2.nchc.org.tw:9000/</value>
+    <description>
+      The name of the default file system. Either the literal string
+      "local" or a host:port for NDFS.
+    </description>
+  </property>
+  <property>
+    <name>mapred.job.tracker</name>
+    <value>hdfs://gm2.nchc.org.tw:9001</value>
+    <description>
+      The host and port that the MapReduce job tracker runs at. If
+      "local", then jobs are run in-process as a single map and
+      reduce task.
+    </description>
+  </property>
+  <property>
+    <name>hadoop.tmp.dir</name>
+    <value>/tmp/hadoop/hadoop-${user.name}</value>
+    <description>A base for other temporary directories.</description>
+  </property>
 </configuration>
}}}
* Edit /opt/hadoop/conf/slaves:
{{{
192.168.1.1
192.168.1.2
192.168.1.3
192.168.1.4
192.168.1.5
192.168.1.6
192.168.1.7
192.168.1.8
192.168.1.9
192.168.1.10
192.168.1.11
192.168.1.12
}}}
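* Instead of typing the list by hand, the same file can be generated with a short loop (an equivalent alternative, assuming the twelve-client layout above):
{{{
#!sh
# Regenerate conf/slaves with the client IPs 192.168.1.1 - 192.168.1.12
> /opt/hadoop/conf/slaves
for ((i=1; i<=12; i++)); do echo "192.168.1.$i" >> /opt/hadoop/conf/slaves; done
}}}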
= 3. Operation =
== 3.1 Booting the DRBL Clients ==
* Boot all of the clients over the network. The DRBL layout looks like this:
{{{
******************************************************
NIC NIC IP Clients
+------------------------------+
| DRBL SERVER |
| |
| +-- [eth2] 140.110.X.X +- to WAN
| |
| +-- [eth1] 192.168.1.254 +- to clients group 1 [ 12 clients, their IP
| | from 192.168.1.1 - 192.168.1.12]
+------------------------------+
******************************************************
Total clients: 12
******************************************************
}}}
== 3.2 Configuring SSH ==
* Edit /etc/ssh/ssh_config and add:
{{{
StrictHostKeyChecking no
}}}
* Then run:
{{{
$ su -
$ /opt/drbl/sbin/drbl-useradd -s hdfsadm hdfsgrp
$ su - hdfsadm
$ ssh-keygen -t rsa -b 1024 -N "" -f ~/.ssh/id_rsa
$ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
$ /etc/init.d/ssh restart
}}}
* If everything went well, you can now log in to every client without a password.
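* To confirm that every client really accepts key-based login, a loop such as the following can be used (an optional check):
{{{
#!sh
# Each client should print its hostname without asking for a password
for ((i=1; i<=12; i++)); do ssh 192.168.1.$i hostname; done
}}}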
=== 3.2.1 dsh ===
{{{
$ sudo apt-get install dsh
$ mkdir -p .dsh
$ for ((i=1;i<=12;i++)); do echo "192.168.1.$i" >> .dsh/machines.list; done
}}}
* Test dsh:
{{{
$ dsh -a hostname
$ dsh -a source /etc/bash.bashrc
}}}
== 3.3 Starting Hadoop ==
* Format the namenode and start all daemons:
{{{
$ cd /opt/hadoop
$ bin/hadoop namenode -format
$ bin/start-all.sh
}}}
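* To verify that the daemons came up, jps from the Sun JDK can be run on the server and, via dsh, on the clients (an optional check; the path assumes the JAVA_HOME set earlier):
{{{
#!sh
# On the server: expect NameNode, SecondaryNameNode and JobTracker
$JAVA_HOME/bin/jps
# On every client: expect DataNode and TaskTracker
dsh -a $JAVA_HOME/bin/jps
}}}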
== 3.4 Testing Hadoop with WordCount ==
* Run the WordCount example as a test:
{{{
$ mkdir input
$ cp *.txt input/
$ bin/hadoop dfs -put input input
$ bin/hadoop jar hadoop-*-examples.jar wordcount input output
}}}
* Sample run:
{{{
hadoop:/opt/hadoop# bin/hadoop jar hadoop-*-examples.jar wordcount input output
09/02/26 06:16:34 INFO mapred.FileInputFormat: Total input paths to process : 4
09/02/26 06:16:34 INFO mapred.FileInputFormat: Total input paths to process : 4
09/02/26 06:16:35 INFO mapred.JobClient: Running job: job_200902260615_0001
09/02/26 06:16:36 INFO mapred.JobClient: map 0% reduce 0%
09/02/26 06:16:39 INFO mapred.JobClient: map 80% reduce 0%
09/02/26 06:16:40 INFO mapred.JobClient: map 100% reduce 0%
09/02/26 06:16:50 INFO mapred.JobClient: Job complete: job_200902260615_0001
09/02/26 06:16:50 INFO mapred.JobClient: Counters: 16
09/02/26 06:16:50 INFO mapred.JobClient: File Systems
09/02/26 06:16:50 INFO mapred.JobClient: HDFS bytes read=267854
09/02/26 06:16:50 INFO mapred.JobClient: HDFS bytes written=100895
09/02/26 06:16:50 INFO mapred.JobClient: Local bytes read=133897
09/02/26 06:16:50 INFO mapred.JobClient: Local bytes written=292260
09/02/26 06:16:50 INFO mapred.JobClient: Job Counters
09/02/26 06:16:50 INFO mapred.JobClient: Launched reduce tasks=1
09/02/26 06:16:50 INFO mapred.JobClient: Rack-local map tasks=5
09/02/26 06:16:50 INFO mapred.JobClient: Launched map tasks=5
09/02/26 06:16:50 INFO mapred.JobClient: Map-Reduce Framework
09/02/26 06:16:50 INFO mapred.JobClient: Reduce input groups=8123
09/02/26 06:16:50 INFO mapred.JobClient: Combine output records=17996
09/02/26 06:16:50 INFO mapred.JobClient: Map input records=6515
09/02/26 06:16:50 INFO mapred.JobClient: Reduce output records=8123
09/02/26 06:16:50 INFO mapred.JobClient: Map output bytes=385233
09/02/26 06:16:50 INFO mapred.JobClient: Map input bytes=265370
09/02/26 06:16:50 INFO mapred.JobClient: Combine input records=44786
09/02/26 06:16:50 INFO mapred.JobClient: Map output records=34913
09/02/26 06:16:50 INFO mapred.JobClient: Reduce input records=8123
hadoop:/opt/hadoop#
}}}
* http://gm2.nchc.org.tw:50030/
* The page shows how many nodes worked on the job just submitted:
{{{
gm2 Hadoop Map/Reduce Administration

State:      RUNNING
Started:    Tue Mar 17 16:22:46 EDT 2009
Version:    0.18.3, r736250
Compiled:   Thu Jan 22 23:12:08 UTC 2009 by ndaley
Identifier: 200903171622

Cluster Summary
Maps | Reduces | Total Submissions | Nodes | Map Task Capacity | Reduce Task Capacity | Avg. Tasks/Node
0    | 0       | 1                 | 9     | 18                | 18                   | 4.00

Completed Jobs
Jobid                 | User    | Name      | Map % Complete | Map Total | Maps Completed | Reduce % Complete | Reduce Total | Reduces Completed
job_200903171622_0001 | hdfsadm | wordcount | 100.00%        | 5         | 5              | 100.00%           | 1            | 1
}}}
* http://gm2.nchc.org.tw:50075/browseDirectory.jsp?dir=%2Fuser%2Froot&namenodeInfoPort=50070
* The output files can be browsed here.
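* The result can also be read directly from HDFS on the command line (assuming the output directory name used in 3.4; with a single reduce task the result lands in part-00000):
{{{
#!sh
$ cd /opt/hadoop
$ bin/hadoop dfs -cat output/part-00000 | head
}}}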
== 3.5 Stopping Hadoop ==
{{{
$ bin/stop-all.sh
}}}
== 3.6 Rebuilding Hadoop ==
{{{
$ bin/stop-all.sh
$ dsh -a rm -rf ~/hdfs/logs/* ~/hdfs/pids/* /tmp/hadoop/*
$ bin/hadoop namenode -format
$ bin/start-all.sh
}}}
= 4. Account Management =
== 4.1 Accounts ==
* Add a Hadoop account, huser, that can read, write, and browse within its own directory on HDFS.
1. Create the huser account on the DRBL system:
{{{
$ su -
$ /opt/drbl/sbin/drbl-useradd -s huser huser
}}}
2. As the HDFS superuser (hdfsadm in this article), create the user's directory on HDFS:
{{{
$ su - hdfsadm
$ /opt/hadoop/bin/hadoop dfs -mkdir /user/huser
}}}
3. As the superuser, set the owner and permissions of that directory on HDFS:
{{{
$ /opt/hadoop/bin/hadoop dfs -chown -R huser /user/huser
$ /opt/hadoop/bin/hadoop dfs -chmod -R 775 /user/huser
}}}
4. Test: browse and write files as huser:
{{{
$ su - huser
$ cd /opt/hadoop/
$ /opt/hadoop/bin/hadoop dfs -put input /user/huser/input
$ /opt/hadoop/bin/hadoop dfs -ls /user/huser/input
}}}
== 4.2 Running Jobs from Multiple Accounts ==
* Two users, rock and waue, ran the job below at the same time without any problems:
{{{
$ bin/hadoop jar hadoop-*-examples.jar wordcount input/ output/
}}}
Web result: both runs appeared under Completed Jobs on the jobtracker page, with nothing under Failed Jobs.
= 5. References =
* [http://trac.nchc.org.tw/grid/wiki/jazz/DRBL_Hadoop Jazz: DRBL_Hadoop]
* [http://trac.nchc.org.tw/cloud/wiki/MR_manual Hadoop manual]
= 6. Troubleshooting =
* DRBL installation does not go smoothly: running {{{drblsrv -i}}} produces the following error message:
{{{
Kernel 2.6 was found, so default to use initramfs.
The requested kernel "" 2.6.18-6-amd64 kernel files are NOT found in /tftpboot/node_root/lib/modules/s and /tftpboot/node_root/boot in the server! The necessary modules in the network initrd can NOT be created!
Client will NOT remote boot correctly!
Program terminated!
Done!
}}}
> I installed Debian 4.0r6 amd64 under VMware here and did not see this problem:
> [[Image(debian_4.0r6_drbl.jpg)]]
>> Cause: the apt mirror had not finished replicating its data, so the new kernel could not be installed, which led to this error.
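>> A possible workaround following that diagnosis (a sketch only, not verified on this setup): point /etc/apt/sources.list at a complete mirror, install the kernel named in the message, then re-run drblsrv:
{{{
#!sh
# After switching to a working mirror in /etc/apt/sources.list:
apt-get update
apt-get install linux-image-2.6.18-6-amd64   # the kernel named in the error
/opt/drbl/sbin/drblsrv -i
}}}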