Version 19 (modified by jazz, 16 years ago)

Running Hadoop on a DRBL Cluster

Hadoop Cluster Based on DRBL

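  • The goal of this page is to use DRBL to assemble a cluster and run Hadoop on it.
  • Because DRBL is a diskless system rather than an ordinary cluster, a few points need special attention.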

0. Environment

The environment has seven machines. One is the DRBL server, which is also the Hadoop namenode; the other nodes are DRBL clients and Hadoop datanodes, as follows:

Name    IP             DRBL role    Hadoop role
hadoop  192.168.1.254  drbl server  namenode
hadoop  192.168.1.2    drbl client  datanode
hadoop  192.168.1.3    drbl client  datanode
hadoop  192.168.1.4    drbl client  datanode
hadoop  192.168.1.5    drbl client  datanode
hadoop  192.168.1.6    drbl client  datanode
hadoop  192.168.1.7    drbl client  datanode

The DRBL server environment is as follows:

Debian etch (4.0) server, 64-bit

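DRBL is a diskless system, so once the operating system and the required services are installed on the DRBL server, every client that boots over the network loads a file system based on the server's. In other words, only the contents of certain directories (such as /etc /root /home /tmp /var ...) differ from client to client; everything else is identical. For example, if you change /etc/hosts on the server, all clients pick up the change immediately (because it is mounted over NFS).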

Therefore, first complete Section 1 (Installation) and Section 2 (Configuration) on the DRBL server, then boot the clients and follow Section 3 (Operation).
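
To see which directories a client really shares with the server, one option is to list its NFS mounts. The following is only a sketch: `nfs_mount_points` is a hypothetical helper (not part of DRBL) that reads text in the `/proc/mounts` format from standard input and prints the mount points whose file system type starts with `nfs`.

```shell
# Hypothetical helper: print the mount points whose file system type is NFS.
# Input: lines in /proc/mounts format (device mountpoint fstype options ...).
nfs_mount_points() {
    awk '$3 ~ /^nfs/ { print $2 }'
}

# On a DRBL client one might run:
#   nfs_mount_points < /proc/mounts
```

Directories that do not appear in the output are local to that client and would need to be updated on each machine separately.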

1. Installation

1.1 Install DRBL

1.2 Install Java 6

  • Add the non-free component and the backports URL to the package sources in /etc/apt/sources.list so that sun-java6 can be installed
    deb http://opensource.nchc.org.tw/debian/ etch main contrib non-free
    deb-src http://opensource.nchc.org.tw/debian/ etch main contrib non-free
    deb http://security.debian.org/ etch/updates main contrib non-free
    deb-src http://security.debian.org/ etch/updates main contrib non-free
    deb http://www.backports.org/debian etch-backports main non-free
    deb http://free.nchc.org.tw/drbl-core drbl stable
    
  • Install the archive key and Java 6
    $ wget http://www.backports.org/debian/archive.key
    $ sudo apt-key add archive.key
    $ sudo apt-get update
    $ sudo apt-get install sun-java6-bin sun-java6-jdk sun-java6-jre
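
After the packages are installed it can be useful to confirm that the JVM directory used later as JAVA_HOME really contains a `java` binary. The sketch below assumes nothing beyond a POSIX shell; `check_java_home` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if DIR/bin/java exists and is executable.
check_java_home() {  # usage: check_java_home DIR
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Example (path taken from the JAVA_HOME setting used on this page):
#   check_java_home /usr/lib/jvm/java-6-sun && echo "Java 6 found"
```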
    

1.3 Install Hadoop 0.18.3

$ cd /opt
$ wget http://ftp.twaren.net/Unix/Web/apache/hadoop/core/hadoop-0.18.3/hadoop-0.18.3.tar.gz
$ tar zxvf hadoop-0.18.3.tar.gz
$ ln -sf hadoop-0.18.3 hadoop

2. Configure Hadoop

  • Append the following to the end of /etc/bash.bashrc
    PATH=$PATH:/opt/drbl/bin:/opt/drbl/sbin
    export JAVA_HOME=/usr/lib/jvm/java-6-sun
    export HADOOP_HOME=/opt/hadoop/
    
  • Edit /etc/hosts and paste the following at the end
    192.168.1.254 gm2.nchc.org.tw
    192.168.1.1 hadoop101
    192.168.1.10 hadoop110
    192.168.1.11 hadoop111
    192.168.1.2 hadoop102
    192.168.1.3 hadoop103
    192.168.1.4 hadoop104
    192.168.1.5 hadoop105
    192.168.1.6 hadoop106
    192.168.1.7 hadoop107
    192.168.1.8 hadoop108
    192.168.1.9 hadoop109
    
  • Edit /opt/hadoop-0.18.3/conf/hadoop-env.sh (lines marked "-" are removed, lines marked "+" are added)
      # remote nodes.
      # The java implementation to use.  Required.
    - # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
    + export JAVA_HOME=/usr/lib/jvm/java-6-sun
    + export HADOOP_HOME=/opt/hadoop-0.18.3
    + export HADOOP_CONF_DIR=$HADOOP_HOME/conf
    + export HADOOP_LOG_DIR=/root/hadoop/logs
      # Extra Java CLASSPATH elements.  Optional.
      # export HADOOP_CLASSPATH=
  • Edit /opt/hadoop-0.18.3/conf/hadoop-site.xml (lines marked "+" are added)
      <!-- Put site-specific property overrides in this file. -->
      <configuration>
    +   <property>
    +     <name>fs.default.name</name>
    +     <value>hdfs://gm2.nchc.org.tw:9000/</value>
    +     <description>
    +       The name of the default file system. Either the literal string
    +       "local" or a host:port for NDFS.
    +     </description>
    +   </property>
    +   <property>
    +     <name>mapred.job.tracker</name>
    +     <value>hdfs://gm2.nchc.org.tw:9001</value>
    +     <description>
    +       The host and port that the MapReduce job tracker runs at. If
    +       "local", then jobs are run in-process as a single map and
    +       reduce task.
    +     </description>
    +   </property>
      </configuration>
  • Edit /opt/hadoop/conf/slaves
    hadoop102
    hadoop103
    hadoop104
    hadoop105
    hadoop106
    hadoop107
    hadoop
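
The /etc/hosts entries and the slaves list follow a regular pattern, so they could be generated rather than typed. The sketch below is an illustration only: `gen_hosts` and `gen_slaves` are hypothetical helpers, and the naming rule (192.168.1.N maps to hadoop1NN) is inferred from the /etc/hosts list on this page.

```shell
# Hypothetical helper: print "IP hostname" lines for clients FIRST..LAST.
# Assumed naming rule, inferred from this page: 192.168.1.N -> hadoop1NN.
gen_hosts() {   # usage: gen_hosts FIRST LAST
    i=$1
    while [ "$i" -le "$2" ]; do
        printf '192.168.1.%d hadoop1%02d\n' "$i" "$i"
        i=$((i + 1))
    done
}

# Hypothetical helper: print only the hostnames, one per line (slaves format).
gen_slaves() {  # usage: gen_slaves FIRST LAST
    gen_hosts "$1" "$2" | awk '{ print $2 }'
}

# Example: gen_hosts 2 7    prints the six client entries
# Example: gen_slaves 2 7   prints hadoop102 ... hadoop107
```

Note that the slaves file above also lists the server itself (hadoop), which these helpers do not add.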
    

3. Operation

3.1 Boot the DRBL clients

  • Boot all clients; the layout is as follows
    ******************************************************
              NIC    NIC IP                    Clients
    +------------------------------+
    |         DRBL SERVER          |
    |                              |
    |    +-- [eth2] 140.110.X.X    +- to WAN
    |                              |
    |    +-- [eth1] 192.168.1.254  +- to clients group 1 [ 6 clients, their IP
    |                              |             from 192.168.1.2 - 192.168.1.7]
    +------------------------------+
    ******************************************************
    Total clients: 6
    ******************************************************
    

3.2 Configure ssh

  • Edit /etc/ssh/ssh_config
    StrictHostKeyChecking no
    
  • Run
    $ ssh-keygen -t rsa -b 1024 -N "" -f ~/.ssh/id_rsa
    $ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
    $ /etc/init.d/ssh restart
    
  • Write an automation script, auto.shell, and run it
    #!/bin/bash
    
    for ((i=2;i<=7;i++));
    do
     scp -r ~/.ssh/ "192.168.1.$i":~/
     scp /etc/ssh/ssh_config "192.168.1.$i":/etc/ssh/ssh_config
     ssh "192.168.1.$i" /etc/init.d/ssh restart
    done
    
  • If everything is correct, you can now log in without a password
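
One way to confirm the password-free login on every client is to try a no-op command with prompting disabled. This is a sketch under the assumptions of this page (clients at 192.168.1.2 to 192.168.1.7); `client_ips` and `check_passwordless` are hypothetical helper names. `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```shell
# Hypothetical helper: print the client IPs 192.168.1.FIRST .. 192.168.1.LAST.
client_ips() {  # usage: client_ips FIRST LAST
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "192.168.1.$i"
        i=$((i + 1))
    done
}

# Hypothetical helper: report, per client, whether key-based login works.
# BatchMode=yes makes ssh fail instead of prompting for a password;
# ConnectTimeout=5 bounds the wait for clients that are not up yet.
check_passwordless() {  # usage: check_passwordless FIRST LAST
    for ip in $(client_ips "$1" "$2"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$ip" true 2>/dev/null; then
            echo "$ip ok"
        else
            echo "$ip FAILED"
        fi
    done
}

# Example: check_passwordless 2 7
```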

3.2.1 dsh

$ sudo apt-get install dsh
$ mkdir -p .dsh
$ for ((i=2;i<=7;i++)); do echo "192.168.1.$i" >> .dsh/machines.list; done

Then run

$ dsh -a scp hadoop:/etc/hosts /etc/
$ dsh -a source /etc/bash.bashrc
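
Because the loop above appends with `>>`, running it a second time duplicates every entry in machines.list. A possible variant that can be re-run safely is sketched below; `write_machines_list` is a hypothetical helper, and the IP range is the one assumed throughout this page.

```shell
# Hypothetical helper: (re)write a dsh machine list for clients FIRST..LAST.
# The file is truncated first, so re-running does not duplicate entries.
write_machines_list() {  # usage: write_machines_list FILE FIRST LAST
    : > "$1"
    i=$2
    while [ "$i" -le "$3" ]; do
        echo "192.168.1.$i" >> "$1"
        i=$((i + 1))
    done
}

# Example: mkdir -p ~/.dsh && write_machines_list ~/.dsh/machines.list 2 7
```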

3.3 Start Hadoop

  • Start
    $ cd /opt/hadoop
    $ bin/hadoop namenode -format
    $ bin/start-all.sh
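
After start-all.sh returns, the JDK tool `jps` lists the Java processes on a machine as "PID ClassName" lines, which gives a quick way to see whether the Hadoop daemons came up. The sketch below is an assumption-laden convenience, not part of Hadoop: `missing_daemons` is a hypothetical helper that reads `jps` output on standard input and prints each expected daemon name that is absent.

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {  # usage: jps | missing_daemons NAME...
    seen=$(awk '{ print $2 }')
    for name in "$@"; do
        # -x matches whole lines and -F disables regex interpretation,
        # so "NameNode" does not match "SecondaryNameNode".
        printf '%s\n' "$seen" | grep -qxF "$name" || echo "$name"
    done
}

# Example on the server:
#   jps | missing_daemons NameNode SecondaryNameNode JobTracker
# Example on a client:
#   jps | missing_daemons DataNode TaskTracker
```

Empty output means every listed daemon was found; any printed name points at a log file under HADOOP_LOG_DIR worth reading.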
    

3.4 Hadoop test example

  • Run WordCount as a test
    $ mkdir input
    $ cp *.txt input/
    $ bin/hadoop dfs -put input input 
    $ bin/hadoop jar hadoop-*-examples.jar wordcount input ouput
    
  • Console output:
    hadoop:/opt/hadoop# bin/hadoop jar hadoop-*-examples.jar wordcount input ouput
    09/02/26 06:16:34 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/02/26 06:16:34 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/02/26 06:16:35 INFO mapred.JobClient: Running job: job_200902260615_0001
    09/02/26 06:16:36 INFO mapred.JobClient:  map 0% reduce 0%
    09/02/26 06:16:39 INFO mapred.JobClient:  map 80% reduce 0%
    09/02/26 06:16:40 INFO mapred.JobClient:  map 100% reduce 0%
    09/02/26 06:16:50 INFO mapred.JobClient: Job complete: job_200902260615_0001
    09/02/26 06:16:50 INFO mapred.JobClient: Counters: 16
    09/02/26 06:16:50 INFO mapred.JobClient:   File Systems
    09/02/26 06:16:50 INFO mapred.JobClient:     HDFS bytes read=267854
    09/02/26 06:16:50 INFO mapred.JobClient:     HDFS bytes written=100895
    09/02/26 06:16:50 INFO mapred.JobClient:     Local bytes read=133897
    09/02/26 06:16:50 INFO mapred.JobClient:     Local bytes written=292260
    09/02/26 06:16:50 INFO mapred.JobClient:   Job Counters 
    09/02/26 06:16:50 INFO mapred.JobClient:     Launched reduce tasks=1
    09/02/26 06:16:50 INFO mapred.JobClient:     Rack-local map tasks=5
    09/02/26 06:16:50 INFO mapred.JobClient:     Launched map tasks=5
    09/02/26 06:16:50 INFO mapred.JobClient:   Map-Reduce Framework
    09/02/26 06:16:50 INFO mapred.JobClient:     Reduce input groups=8123
    09/02/26 06:16:50 INFO mapred.JobClient:     Combine output records=17996
    09/02/26 06:16:50 INFO mapred.JobClient:     Map input records=6515
    09/02/26 06:16:50 INFO mapred.JobClient:     Reduce output records=8123
    09/02/26 06:16:50 INFO mapred.JobClient:     Map output bytes=385233
    09/02/26 06:16:50 INFO mapred.JobClient:     Map input bytes=265370
    09/02/26 06:16:50 INFO mapred.JobClient:     Combine input records=44786
    09/02/26 06:16:50 INFO mapred.JobClient:     Map output records=34913
    09/02/26 06:16:50 INFO mapred.JobClient:     Reduce input records=8123
    hadoop:/opt/hadoop# 
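
The counters at the end of the output are the quickest sanity check; in the run above, "Reduce output records" equals "Reduce input groups" (8123), which for WordCount is the number of distinct words. If the output is saved to a file, a single counter can be pulled out with a small helper. This is only a sketch: `counter_value` is a hypothetical function, and job.log is an assumed file name.

```shell
# Hypothetical helper: extract one counter value from JobClient output on stdin.
# It looks for "NAME=" on each line and prints whatever follows the "=".
counter_value() {  # usage: counter_value "Counter name" < job.log
    awk -v name="$1" '
        {
            pos = index($0, name "=")
            if (pos > 0) { print substr($0, pos + length(name) + 1); exit }
        }'
}

# Example (assumes the job output was captured to job.log, e.g. with
# "2>&1 | tee job.log" appended to the wordcount command):
#   counter_value "Reduce output records" < job.log
```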
    

3.5 Stop Hadoop

$ bin/stop-all.sh

3.6 Rebuild Hadoop from scratch

$ bin/stop-all.sh
$ dsh -a rm -rf /root/hadoop/* /tmp/hadoop-root*
$ bin/hadoop namenode -format
$ bin/start-all.sh

4. References

5. Troubleshooting

  • The DRBL installation did not go smoothly

Running drblsrv -i produced the following error message

Kernel 2.6 was found, so default to use initramfs.
The requested kernel "" 2.6.18-6-amd64 kernel files are NOT found in  /tftpboot/node_root/lib/modules/s and /tftpboot/node_root/boot in the server! The necessary modules in the network initrd can NOT be created!
Client will NOT remote boot correctly!
Program terminated!
Done!

On my side, installing Debian 4.0r6 amd64 in VMware does not show this problem.

Cause: the apt mirror had not synchronized the data, so the new kernel could not be installed, which led to the error above.
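
Before re-running drblsrv -i, one could check whether the kernel files it complained about are now present. The sketch below is a guess at a useful check, not DRBL's own logic: `kernel_files_present` is a hypothetical helper, and the `vmlinuz-VERSION` file name is the usual Debian convention, assumed here.

```shell
# Hypothetical helper: check that kernel files for VERSION exist under ROOT.
# Assumes the Debian layout: ROOT/lib/modules/VERSION and ROOT/boot/vmlinuz-VERSION.
kernel_files_present() {  # usage: kernel_files_present ROOT VERSION
    [ -d "$1/lib/modules/$2" ] && [ -f "$1/boot/vmlinuz-$2" ]
}

# Example, using the path and kernel version from the error message:
#   kernel_files_present /tftpboot/node_root 2.6.18-6-amd64 || echo "kernel files missing"
```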