Version 27 (modified by waue, 16 years ago)

Running Hadoop on a DRBL Cluster

Hadoop Cluster Based on DRBL

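  • This article shows how to use DRBL to assemble a cluster and run Hadoop on it.
  • Because DRBL is a diskless system rather than a conventional cluster, a few points need special attention.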

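0. Environment

The environment has seven machines. One is the DRBL server, which also acts as the Hadoop namenode; the other nodes are DRBL clients and Hadoop datanodes, as follows: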

Name    IP             DRBL role    Hadoop role
hadoop  192.168.1.254  DRBL server  namenode
hadoop  192.168.1.2    DRBL client  datanode
hadoop  192.168.1.3    DRBL client  datanode
hadoop  192.168.1.4    DRBL client  datanode
hadoop  192.168.1.5    DRBL client  datanode
hadoop  192.168.1.6    DRBL client  datanode
hadoop  192.168.1.7    DRBL client  datanode

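The DRBL server environment is:

Debian etch (4.0) server, 64-bit

DRBL is a diskless system, so once the DRBL server's operating system and required services are installed, every client that boots over the network loads a file system based on the server's. In other words, apart from the contents of a few specific directories (such as /etc, /root, /home, /tmp, /var, ...), which differ per client, everything else is identical. For example, if you change /etc/hosts on the server, all clients pick up the change immediately and automatically, because the file system is mounted over NFS.

Therefore, first complete Section 1 (Installation) and Section 2 (Configuration) on the DRBL server, then boot the clients and follow Section 3 (Operation).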

1. Installation

1.1 Install DRBL

1.2 Install Java 6

  • Add the non-free component and the backports URL to the package source list /etc/apt/sources.list; sun-java6 cannot be installed without them
    deb http://free.nchc.org.tw/debian/ etch main contrib non-free
    deb-src http://free.nchc.org.tw/debian/ etch main contrib non-free
    deb http://security.debian.org/ etch/updates main contrib non-free
    deb-src http://security.debian.org/ etch/updates main contrib non-free
    deb http://www.backports.org/debian etch-backports main non-free
    deb http://free.nchc.org.tw/drbl-core drbl stable
    
  • Install the archive key and Java 6
    $ wget http://www.backports.org/debian/archive.key
    $ sudo apt-key add archive.key
    $ sudo apt-get update
    $ sudo apt-get install sun-java6-bin sun-java6-jdk sun-java6-jre
    

1.3 Install Hadoop 0.18.3

$ cd /opt
$ wget http://ftp.twaren.net/Unix/Web/apache/hadoop/core/hadoop-0.18.3/hadoop-0.18.3.tar.gz
$ tar zxvf hadoop-0.18.3.tar.gz

1.4 Set up the user

$ su - 
$ addgroup hdfsgrp
$ adduser --ingroup hdfsgrp hdfsadm
$ chown -R hdfsadm:hdfsgrp /opt/hadoop-0.18.3
$ chmod -R 775 /opt/hadoop-0.18.3
# Create the /opt/hadoop symlink as root, from /opt, before switching user
$ cd /opt
$ ln -sf hadoop-0.18.3 hadoop
$ su - hdfsadm

2. Configure Hadoop

  • Append the following to the end of /etc/bash.bashrc
    PATH=$PATH:/opt/drbl/bin:/opt/drbl/sbin
    export JAVA_HOME=/usr/lib/jvm/java-6-sun
    export HADOOP_HOME=/opt/hadoop/
    
  • Load the settings
    $ source /etc/bash.bashrc
    
  • Edit /etc/hosts and paste the following at the end
    192.168.1.254 gm2.nchc.org.tw
    192.168.1.1 hadoop101
    192.168.1.10 hadoop110
    192.168.1.11 hadoop111
    192.168.1.2 hadoop102
    192.168.1.3 hadoop103
    192.168.1.4 hadoop104
    192.168.1.5 hadoop105
    192.168.1.6 hadoop106
    192.168.1.7 hadoop107
    192.168.1.8 hadoop108
    192.168.1.9 hadoop109
    
  • Edit /opt/hadoop-0.18.3/conf/hadoop-env.sh (lines marked - are removed, lines marked + are added)
    • hadoop-0.18.3/conf/hadoop-env.sh

       # remote nodes.
       # The java implementation to use.  Required.
      -# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
      +export JAVA_HOME=/usr/lib/jvm/java-6-sun
      +export HADOOP_HOME=/opt/hadoop
      +export HADOOP_CONF_DIR=$HADOOP_HOME/conf
      +export HADOOP_LOG_DIR=/home/hdfsadm/hdfs/logs
      +export HADOOP_PID_DIR=/home/hdfsadm/hdfs/pids
       
       # Extra Java CLASSPATH elements.  Optional.
       # export HADOOP_CLASSPATH=
  • Edit /opt/hadoop-0.18.3/conf/hadoop-site.xml (lines marked - are removed, lines marked + are added)
    • hadoop-0.18.3/conf/hadoop-site.xml

       <!-- Put site-specific property overrides in this file. -->
       <configuration>
      -
      +  <property>
      +    <name>fs.default.name</name>
      +    <value>hdfs://gm2.nchc.org.tw:9000/</value>
      +    <description>
      +      The name of the default file system. Either the literal string
      +      "local" or a host:port for NDFS.
      +    </description>
      +  </property>
      +  <property>
      +    <name>mapred.job.tracker</name>
      +    <value>hdfs://gm2.nchc.org.tw:9001</value>
      +    <description>
      +      The host and port that the MapReduce job tracker runs at. If
      +      "local", then jobs are run in-process as a single map and
      +      reduce task.
      +    </description>
      +  </property>
      +  <property>
      +    <name>hadoop.tmp.dir</name>
      +    <value>/tmp/hadoop/hadoop-${user.name}</value>
      +    <description>A base for other temporary directories.</description>
      +  </property>
       </configuration>
  • Edit /opt/hadoop/conf/slaves
    192.168.1.1
    192.168.1.2
    192.168.1.3
    192.168.1.4
    192.168.1.5
    192.168.1.6
    192.168.1.7
    192.168.1.8
    192.168.1.9
    192.168.1.10
    192.168.1.11
    192.168.1.12
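
The host names in /etc/hosts and the addresses in conf/slaves follow the same numbering, so both lists could be produced from a single range instead of being typed by hand. The following is only a minimal sketch: the helper names gen_hosts and gen_slaves are hypothetical, and the naming rule "hadoop" plus (100 + last octet) is inferred from the entries listed earlier. The hosts list in this article stops at .11 while the slaves list runs to .12, so adjust LAST to match your own cluster.

```shell
#!/bin/sh
# Sketch: derive /etc/hosts entries and the slaves list from one IP range.
# Assumptions (not from the original article): the helper names gen_hosts
# and gen_slaves, and the rule "hadoop" + (100 + last octet) for host names.

SUBNET="192.168.1"   # private subnet served by the DRBL server
FIRST=1              # last octet of the first client
LAST=12              # last octet of the last client

# Print one "IP hostname" line per client, e.g. "192.168.1.2 hadoop102".
gen_hosts() {
    i=$FIRST
    while [ "$i" -le "$LAST" ]; do
        echo "$SUBNET.$i hadoop$((100 + i))"
        i=$((i + 1))
    done
}

# Print one IP per line, the format used in conf/slaves.
gen_slaves() {
    i=$FIRST
    while [ "$i" -le "$LAST" ]; do
        echo "$SUBNET.$i"
        i=$((i + 1))
    done
}

gen_hosts
gen_slaves
```

Redirecting each function into a scratch file and reviewing it before copying the lines into /etc/hosts or conf/slaves avoids overwriting a working configuration by accident.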
    

3. Operation

3.1 Boot the DRBL clients

  • Power on all clients; the DRBL layout is as follows
    ******************************************************
              NIC    NIC IP                    Clients
    +------------------------------+
    |         DRBL SERVER          |
    |                              |
    |    +-- [eth2] 140.110.X.X    +- to WAN
    |                              |
    |    +-- [eth1] 192.168.1.254  +- to clients group 1 [ 6 clients, their IP
    |                              |             from 192.168.1.1 - 192.168.1.12]
    +------------------------------+
    ******************************************************
    Total clients: 12
    ******************************************************
    

3.2 Configure SSH

  • Edit /etc/ssh/ssh_config
    StrictHostKeyChecking no
    
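  • Run
    $ su -
    $ /opt/drbl/sbin/drbl-useradd -s hdfsadm hdfsgrp
    $ su - hdfsadm
    $ ssh-keygen -t rsa -b 1024 -N "" -f ~/.ssh/id_rsa
    $ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
    # Return to the root shell; restarting the SSH service requires root
    $ exit
    $ /etc/init.d/ssh restart
    
  • If everything is correct, passwordless login now works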
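
Passwordless login can be confirmed on every node before Hadoop is started, since start-all.sh relies on it. The following is a rough sketch rather than part of the original procedure: the helper name check_nodes and the SSH_CMD variable are hypothetical, and SSH_CMD exists only so the loop can be tried without a live cluster. BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# Sketch: confirm that key-based login works on every host given.
# Assumptions (not from the original article): the helper name check_nodes
# and the SSH_CMD variable used to swap in another command for dry runs.

SSH_CMD="${SSH_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5}"

# Run "true" on each host passed as an argument.
# Prints OK or FAIL per host and returns non-zero if any host failed.
check_nodes() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if $SSH_CMD "$host" true </dev/null >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, running check_nodes 192.168.1.2 192.168.1.3 as hdfsadm should print an OK line for each client that accepts the key.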

3.2.1 dsh

$ sudo apt-get install dsh
$ mkdir -p ~/.dsh
$ for ((i=1;i<=12;i++)); do echo "192.168.1.$i" >> ~/.dsh/machines.list; done
  • Test and run
    $ dsh -a hostname
    $ dsh -a source /etc/bash.bashrc
    

3.3 Start Hadoop

  • Start
    $ cd /opt/hadoop
    $ bin/hadoop namenode -format
    $ bin/start-all.sh
    

3.4 Hadoop test example

  • Run WordCount as a test
    $ mkdir input
    $ cp *.txt input/
    $ bin/hadoop dfs -put input input 
    $ bin/hadoop jar hadoop-*-examples.jar wordcount input ouput
    
  • Output:
    hadoop:/opt/hadoop# bin/hadoop jar hadoop-*-examples.jar wordcount input ouput
    09/02/26 06:16:34 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/02/26 06:16:34 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/02/26 06:16:35 INFO mapred.JobClient: Running job: job_200902260615_0001
    09/02/26 06:16:36 INFO mapred.JobClient:  map 0% reduce 0%
    09/02/26 06:16:39 INFO mapred.JobClient:  map 80% reduce 0%
    09/02/26 06:16:40 INFO mapred.JobClient:  map 100% reduce 0%
    09/02/26 06:16:50 INFO mapred.JobClient: Job complete: job_200902260615_0001
    09/02/26 06:16:50 INFO mapred.JobClient: Counters: 16
    09/02/26 06:16:50 INFO mapred.JobClient:   File Systems
    09/02/26 06:16:50 INFO mapred.JobClient:     HDFS bytes read=267854
    09/02/26 06:16:50 INFO mapred.JobClient:     HDFS bytes written=100895
    09/02/26 06:16:50 INFO mapred.JobClient:     Local bytes read=133897
    09/02/26 06:16:50 INFO mapred.JobClient:     Local bytes written=292260
    09/02/26 06:16:50 INFO mapred.JobClient:   Job Counters 
    09/02/26 06:16:50 INFO mapred.JobClient:     Launched reduce tasks=1
    09/02/26 06:16:50 INFO mapred.JobClient:     Rack-local map tasks=5
    09/02/26 06:16:50 INFO mapred.JobClient:     Launched map tasks=5
    09/02/26 06:16:50 INFO mapred.JobClient:   Map-Reduce Framework
    09/02/26 06:16:50 INFO mapred.JobClient:     Reduce input groups=8123
    09/02/26 06:16:50 INFO mapred.JobClient:     Combine output records=17996
    09/02/26 06:16:50 INFO mapred.JobClient:     Map input records=6515
    09/02/26 06:16:50 INFO mapred.JobClient:     Reduce output records=8123
    09/02/26 06:16:50 INFO mapred.JobClient:     Map output bytes=385233
    09/02/26 06:16:50 INFO mapred.JobClient:     Map input bytes=265370
    09/02/26 06:16:50 INFO mapred.JobClient:     Combine input records=44786
    09/02/26 06:16:50 INFO mapred.JobClient:     Map output records=34913
    09/02/26 06:16:50 INFO mapred.JobClient:     Reduce input records=8123
    hadoop:/opt/hadoop# 
    
  • http://gm2.nchc.org.tw:50030/
    • The web page shows the number of nodes that took part in the job just run

      gm2 Hadoop Map/Reduce Administration

      State: RUNNING
      Started: Tue Mar 17 16:22:46 EDT 2009
      Version: 0.18.3, r736250
      Compiled: Thu Jan 22 23:12:08 UTC 2009 by ndaley
      Identifier: 200903171622

      Cluster Summary

      Maps  Reduces  Total Submissions  Nodes  Map Task Capacity  Reduce Task Capacity  Avg. Tasks/Node
      0     0        1                  9      18                 18                    4.00

      Running Jobs

      none

      Completed Jobs

      Jobid                  User     Name       Map % Complete  Map Total  Maps Completed  Reduce % Complete  Reduce Total  Reduces Completed
      job_200903171622_0001  hdfsadm  wordcount  100.00%         5          5               100.00%            1             1
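
To spot-check the WordCount result, a few counts can be reproduced locally with standard tools. This is a rough sketch only: the helper name local_wordcount is hypothetical, and it assumes the example job splits its input on whitespace, so the counts should agree for plain text but this has not been verified against the job above.

```shell
#!/bin/sh
# Sketch: count whitespace-separated words locally to cross-check a few
# values from the Hadoop job.
# Assumption (not from the original article): the helper name local_wordcount.

# Read text on stdin; print "word<TAB>count" lines sorted by word.
local_wordcount() {
    tr -s '[:space:]' '\n' |      # put one token on each line
        grep -v '^$' |            # drop the empty line a leading blank can leave
        LC_ALL=C sort |           # group identical tokens together
        uniq -c |                 # prefix each token with its count
        awk '{ print $2 "\t" $1 }'
}

printf 'to be or not to be\n' | local_wordcount
```

The sample line prints be 2, not 1, or 1, to 2, one pair per line. Feeding the same *.txt files through local_wordcount (for example cat input/*.txt | local_wordcount) gives values to compare with the job's output files on HDFS.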

3.5 Stop Hadoop

$ bin/stop-all.sh

3.6 Rebuild Hadoop

$ bin/stop-all.sh
$ dsh -a rm -rf ~/hdfs/logs/* ~/hdfs/pids/* /tmp/hadoop/*
$ bin/hadoop namenode -format
$ bin/start-all.sh

4. Operation

4.1 Accounts

  • Add a Hadoop account, huser, that can read, write, and browse within its own directory on HDFS
  1. Create the account huser in the DRBL system
    $ su -
    $ /opt/drbl/sbin/drbl-useradd -s huser huser 
    
  2. As the HDFS superuser (hdfsadm in this article), create the directory on HDFS
    $ su - hdfsadm
    $ /opt/hadoop/bin/hadoop dfs -mkdir /user/huser
    
  3. As the superuser, set the owner and permissions of that directory on HDFS
    $ /opt/hadoop/bin/hadoop dfs -chown -R huser /user/huser
    $ /opt/hadoop/bin/hadoop dfs -chmod -R 775 /user/huser
    
  4. Test: browse or write files as huser
    <root>$ su - huser
    <huser>$ cd /opt/hadoop/
    <huser>$ /opt/hadoop/bin/hadoop dfs -put input /user/huser/input
    <huser>$ /opt/hadoop/bin/hadoop dfs -ls /user/huser/input
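
Steps 2 and 3 are the same for every new account, so the commands can be generated from the user name. The following is a minimal sketch, not a tested tool: the helper name hdfs_home_cmds and the HADOOP_BIN variable are hypothetical, and the function only prints the commands so they can be reviewed before being run as hdfsadm.

```shell
#!/bin/sh
# Sketch: print the HDFS commands from steps 2 and 3 for a given user name.
# Assumptions (not from the original article): the helper name
# hdfs_home_cmds and the HADOOP_BIN variable. Nothing is executed here.

HADOOP_BIN="${HADOOP_BIN:-/opt/hadoop/bin/hadoop}"

hdfs_home_cmds() {
    user="${1:-}"
    if [ -z "$user" ]; then
        echo "usage: hdfs_home_cmds USER" >&2
        return 1
    fi
    echo "$HADOOP_BIN dfs -mkdir /user/$user"
    echo "$HADOOP_BIN dfs -chown -R $user /user/$user"
    echo "$HADOOP_BIN dfs -chmod -R 775 /user/$user"
}

hdfs_home_cmds huser
```

Printing rather than executing keeps the superuser step explicit: the output can be pasted into an hdfsadm shell once it looks right.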
    

4.2 Running as multiple accounts

  • Tested with two users, rock and waue, running jobs at the same time; no problems were observed
    bin/hadoop jar hadoop-*-examples.jar wordcount input/ ouput/
    

Web page result:

Completed Jobs

Jobid                  User  Name       Map % Complete  Map Total  Maps Completed  Reduce % Complete  Reduce Total  Reduces Completed
job_200903061742_0001  waue  wordcount  100.00%         1          1               100.00%            1             1
job_200903061742_0002  rock  wordcount  100.00%         1          1               100.00%            1             1
job_200903061742_0003  waue  wordcount  100.00%         1          1               100.00%            1             1

Failed Jobs

5. References

6. Troubleshooting

  • The DRBL installation did not go smoothly

drblsrv -i printed the following error message:

Kernel 2.6 was found, so default to use initramfs.
The requested kernel "" 2.6.18-6-amd64 kernel files are NOT found in  /tftpboot/node_root/lib/modules/s and /tftpboot/node_root/boot in the server! The necessary modules in the network initrd can NOT be created!
Client will NOT remote boot correctly!
Program terminated!
Done!

Cause: the apt mirror had not synced the required packages, so the new kernel could not be installed, which led to this error.
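
Before rerunning drblsrv -i, it may help to check whether the kernel files are present in the two directories named in the error message. The following is a sketch under stated assumptions, not DRBL's own check: the helper name check_kernel_files is hypothetical, and the file layout (boot/vmlinuz-VERSION and lib/modules/VERSION/) is the usual Debian one, which may differ from what drblsrv actually looks for.

```shell
#!/bin/sh
# Sketch: check whether kernel files for a given version exist under the
# directories named in the drblsrv error message.
# Assumptions (not from the original article): the helper name
# check_kernel_files and the Debian-style layout boot/vmlinuz-VERSION
# and lib/modules/VERSION/.

check_kernel_files() {
    root="${1:-/tftpboot/node_root}"   # directory tree from the error message
    kver="${2:-2.6.18-6-amd64}"        # kernel version from the error message
    status=0
    if [ ! -f "$root/boot/vmlinuz-$kver" ]; then
        echo "missing: $root/boot/vmlinuz-$kver"
        status=1
    fi
    if [ ! -d "$root/lib/modules/$kver" ]; then
        echo "missing: $root/lib/modules/$kver"
        status=1
    fi
    return "$status"
}
```

Calling check_kernel_files with no arguments uses the path and version from the message above; any "missing" line suggests the kernel package was not installed, which is consistent with the mirror problem described.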