wiki:waue/Hadoop_DRBL


Running Hadoop on a DRBL Cluster

Hadoop Cluster Based on DRBL

  • This article shows how to use DRBL to build a cluster and run Hadoop on top of it.
  • Because DRBL is a diskless system rather than an ordinary cluster, a few points need special attention.

0. Environment

The environment consists of seven machines in total: one is the DRBL server, which is also the Hadoop namenode; the remaining nodes are DRBL clients and Hadoop datanodes, as listed below:

Name     IP               DRBL role     Hadoop role
hadoop   192.168.1.254    drbl server   namenode
hadoop   192.168.1.1~12   drbl client   datanode

The DRBL server environment:

Debian Etch (4.0) server, 64-bit

DRBL is a diskless system: once the DRBL server and the required services have been installed, every client that network-boots loads a file system based on the server's. In other words, apart from a few specific directories whose contents differ per client (such as /etc, /root, /home, /tmp, /var, ...), everything is identical. For example, if /etc/hosts is changed on the server, all clients pick up the change immediately, because the file system is mounted from the server over NFS.
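  • A hedged illustration of where these file systems live on the server (the /tftpboot paths below are inferred from the error message in section 6 and the usual DRBL layout; they are not stated in the original text):
    # shared client root that every client mounts over NFS
    $ ls /tftpboot/node_root/
    # per-client private copies of directories such as /etc, /var, ... (one subdirectory per client IP)
    $ ls /tftpboot/nodes/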

Therefore, after completing section 1 (installation) and section 2 (configuration) on the DRBL server, simply boot the other clients and follow section 3 (operation).

1. Installation

1.1 Install DRBL
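  • This step is left blank on the original page. Below is only a minimal sketch of a typical DRBL installation on Debian Etch, assuming the drbl-core repository listed in section 1.2 is already in /etc/apt/sources.list; see the DRBL documentation for the interactive options:
    $ su -
    $ apt-get update
    $ apt-get install drbl
    # prepare the server side (client kernel and client root file system)
    $ /opt/drbl/sbin/drblsrv -i
    # deploy the configuration to the clients (NIC, IP range, number of clients, ...)
    $ /opt/drbl/sbin/drblpush -i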

1.2 Install Java 6

  • Add the non-free component and the backports.org repository to /etc/apt/sources.list; otherwise sun-java6 cannot be installed:
    deb http://free.nchc.org.tw/debian/ etch main contrib non-free
    deb-src http://free.nchc.org.tw/debian/ etch main contrib non-free
    deb http://security.debian.org/ etch/updates main contrib non-free
    deb-src http://security.debian.org/ etch/updates main contrib non-free
    deb http://www.backports.org/debian etch-backports main non-free
    deb http://free.nchc.org.tw/drbl-core drbl stable
    
  • Install the backports archive key and Java 6:
    $ wget http://www.backports.org/debian/archive.key
    $ sudo apt-key add archive.key
    $ apt-get update
    $ apt-get install sun-java6-bin  sun-java6-jdk sun-java6-jre
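  • A quick sanity check (not in the original page) that the Sun JVM is installed and active:
    $ java -version
    # should report a 1.6.0_xx Java(TM) SE Runtime Environment; if another JVM is still
    # the default, `update-java-alternatives -s java-6-sun` switches to it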
    

1.3 Install Hadoop 0.18.3

$ cd /opt
$ wget http://ftp.twaren.net/Unix/Web/apache/hadoop/core/hadoop-0.18.3/hadoop-0.18.3.tar.gz
$ tar zxvf hadoop-0.18.3.tar.gz

1.4 Set up the Hadoop user

$ su - 
$ addgroup hdfsgrp
$ adduser --ingroup hdfsgrp hdfsadm
$ chown -R hdfsadm:hdfsgrp /opt/hadoop-0.18.3
$ chmod -R 775 /opt/hadoop-0.18.3
$ cd /opt
$ ln -sf hadoop-0.18.3 hadoop
$ su - hdfsadm

2. Configure Hadoop

  • Append the following to the end of /etc/bash.bashrc:
    PATH=$PATH:/opt/drbl/bin:/opt/drbl/sbin
    export JAVA_HOME=/usr/lib/jvm/java-6-sun
    export HADOOP_HOME=/opt/hadoop/
    
  • Load the settings:
    $ source /etc/bash.bashrc
    
  • Edit /etc/hosts and paste the following at the end:
    192.168.1.254 gm2.nchc.org.tw
    192.168.1.1 hadoop101
    192.168.1.2 hadoop102
    192.168.1.3 hadoop103
    192.168.1.4 hadoop104
    192.168.1.5 hadoop105
    192.168.1.6 hadoop106
    192.168.1.7 hadoop107
    192.168.1.8 hadoop108
    192.168.1.9 hadoop109
    192.168.1.10 hadoop110
    192.168.1.11 hadoop111
    
  • Edit /opt/hadoop-0.18.3/conf/hadoop-env.sh: replace the commented-out JAVA_HOME example with the Sun Java 6 path and add the Hadoop directory settings:
    # The java implementation to use.  Required.
    export JAVA_HOME=/usr/lib/jvm/java-6-sun
    export HADOOP_HOME=/opt/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/conf
    export HADOOP_LOG_DIR=/home/hdfsadm/hdfs/logs
    export HADOOP_PID_DIR=/home/hdfsadm/hdfs/pids
  • Edit /opt/hadoop-0.18.3/conf/hadoop-site.xml so that the <configuration> section contains the following properties:
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://gm2.nchc.org.tw:9000/</value>
        <description>
          The name of the default file system. Either the literal string
          "local" or a host:port for NDFS.
        </description>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <value>hdfs://gm2.nchc.org.tw:9001</value>
        <description>
          The host and port that the MapReduce job tracker runs at. If
          "local", then jobs are run in-process as a single map and
          reduce task.
        </description>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop/hadoop-${user.name}</value>
        <description>A base for other temporary directories.</description>
      </property>
    </configuration>
  • Edit /opt/hadoop/conf/slaves and list the datanode IPs (a loop that generates this file is sketched after the list):
    192.168.1.1
    192.168.1.2
    192.168.1.3
    192.168.1.4
    192.168.1.5
    192.168.1.6
    192.168.1.7
    192.168.1.8
    192.168.1.9
    192.168.1.10
    192.168.1.11
    192.168.1.12
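  • The same list can also be generated with a loop instead of typing it by hand (a convenience sketch, not in the original page):
    $ for ((i=1;i<=12;i++)); do echo "192.168.1.$i"; done > /opt/hadoop/conf/slaves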
    

3. Operation

3.1 Boot the DRBL clients

  • Boot all the clients over the network; the DRBL layout is as follows:
    ******************************************************
              NIC    NIC IP                    Clients
    +------------------------------+
    |         DRBL SERVER          |
    |                              |
    |    +-- [eth2] 140.110.X.X    +- to WAN
    |                              |
    |    +-- [eth1] 192.168.1.254  +- to clients group 1 [ 6 clients, their IP
    |                              |             from 192.168.1.1 - 192.168.1.12]
    +------------------------------+
    ******************************************************
    Total clients: 12
    ******************************************************
    

3.2 Configure SSH

  • Add the following to /etc/ssh/ssh_config:
    StrictHostKeyChecking no
    
  • Run:
    $ su -
    $ /opt/drbl/sbin/drbl-useradd -s hdfsadm hdfsgrp
    $ su - hdfsadm
    $ ssh-keygen -t rsa -b 1024 -N "" -f ~/.ssh/id_rsa
    $ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
    $ /etc/init.d/ssh restart
    
  • If everything is correct, you can now log in to each client without a password.
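  • A quick way to confirm it (a hedged check, not in the original page):
    $ ssh 192.168.1.1 hostname
    # should print the client's hostname without asking for a password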

3.2.1 dsh

$ sudo apt-get install dsh
$ mkdir -p .dsh
$ for ((i=1;i<=12;i++)); do echo "192.168.1.$i" >> .dsh/machines.list; done
  • Test and run:
    $ dsh -a hostname
    $ dsh -a source /etc/bash.bashrc
    

3.3 Start Hadoop

  • Start it:
    $ cd /opt/hadoop
    $ bin/hadoop namenode -format
    $ bin/start-all.sh
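  • To confirm that the datanodes have joined the cluster, the HDFS status can be queried (a hedged verification step, not in the original page):
    $ bin/hadoop dfsadmin -report
    # the number of datanodes reported should match the number of booted clients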
    

3.4 Hadoop test example

  • Run WordCount as a test (a sketch for inspecting the results follows the web-UI summary at the end of this section):
    $ mkdir input
    $ cp *.txt input/
    $ bin/hadoop dfs -put input input 
    $ bin/hadoop jar hadoop-*-examples.jar wordcount input ouput
    
  • Console output:
    hadoop:/opt/hadoop# bin/hadoop jar hadoop-*-examples.jar wordcount input ouput
    09/02/26 06:16:34 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/02/26 06:16:34 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/02/26 06:16:35 INFO mapred.JobClient: Running job: job_200902260615_0001
    09/02/26 06:16:36 INFO mapred.JobClient:  map 0% reduce 0%
    09/02/26 06:16:39 INFO mapred.JobClient:  map 80% reduce 0%
    09/02/26 06:16:40 INFO mapred.JobClient:  map 100% reduce 0%
    09/02/26 06:16:50 INFO mapred.JobClient: Job complete: job_200902260615_0001
    09/02/26 06:16:50 INFO mapred.JobClient: Counters: 16
    09/02/26 06:16:50 INFO mapred.JobClient:   File Systems
    09/02/26 06:16:50 INFO mapred.JobClient:     HDFS bytes read=267854
    09/02/26 06:16:50 INFO mapred.JobClient:     HDFS bytes written=100895
    09/02/26 06:16:50 INFO mapred.JobClient:     Local bytes read=133897
    09/02/26 06:16:50 INFO mapred.JobClient:     Local bytes written=292260
    09/02/26 06:16:50 INFO mapred.JobClient:   Job Counters 
    09/02/26 06:16:50 INFO mapred.JobClient:     Launched reduce tasks=1
    09/02/26 06:16:50 INFO mapred.JobClient:     Rack-local map tasks=5
    09/02/26 06:16:50 INFO mapred.JobClient:     Launched map tasks=5
    09/02/26 06:16:50 INFO mapred.JobClient:   Map-Reduce Framework
    09/02/26 06:16:50 INFO mapred.JobClient:     Reduce input groups=8123
    09/02/26 06:16:50 INFO mapred.JobClient:     Combine output records=17996
    09/02/26 06:16:50 INFO mapred.JobClient:     Map input records=6515
    09/02/26 06:16:50 INFO mapred.JobClient:     Reduce output records=8123
    09/02/26 06:16:50 INFO mapred.JobClient:     Map output bytes=385233
    09/02/26 06:16:50 INFO mapred.JobClient:     Map input bytes=265370
    09/02/26 06:16:50 INFO mapred.JobClient:     Combine input records=44786
    09/02/26 06:16:50 INFO mapred.JobClient:     Map output records=34913
    09/02/26 06:16:50 INFO mapred.JobClient:     Reduce input records=8123
    hadoop:/opt/hadoop# 
    
  • http://gm2.nchc.org.tw:50030/
    • The web page shows the number of nodes that just worked on the job:

      gm2 Hadoop Map/Reduce Administration

      State: RUNNING
      Started: Tue Mar 17 16:22:46 EDT 2009
      Version: 0.18.3, r736250
      Compiled: Thu Jan 22 23:12:08 UTC 2009 by ndaley
      Identifier: 200903171622

      Cluster Summary

      Maps  Reduces  Total Submissions  Nodes  Map Task Capacity  Reduce Task Capacity  Avg. Tasks/Node
      0     0        1                  9      18                 18                    4.00

      Running Jobs

      none

      Completed Jobs

      Jobid                  User     Name       Map % Complete  Map Total  Maps Completed  Reduce % Complete  Reduce Total  Reduces Completed
      job_200903171622_0001  hdfsadm  wordcount  100.00%         5          5               100.00%           1             1
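  • To inspect the WordCount results on HDFS (a hedged example, not in the original page; "ouput" is the directory name used in the job command above):
    $ bin/hadoop dfs -cat ouput/part-00000 | head
    $ bin/hadoop dfs -get ouput ./wordcount-result   # copy the whole result directory to the local disk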

3.5 Stop Hadoop

$ bin/stop-all.sh

3.6 Rebuild Hadoop (wipe and reformat)

$ bin/stop-all.sh
$ dsh -a rm -rf ~/hdfs/logs/* ~/hdfs/pids/* /tmp/hadoop/*
$ bin/hadoop namenode -format
$ bin/start-all.sh

4. User accounts and multi-user operation

4.1 Accounts

  • Add a Hadoop account, huser, that can browse and write files in its own directory on HDFS (an ownership check is sketched after step 4 below).
  1. Add the huser account on the DRBL system:
    $ su -
    $ /opt/drbl/sbin/drbl-useradd -s huser huser 
    
  2. As the HDFS superuser (hdfsadm in this article), create the user's directory on HDFS:
    $ su - hdfsadm
    $ /opt/hadoop/bin/hadoop dfs -mkdir /user/huser
    
  3. As the superuser, set the owner and permissions of that HDFS directory:
    $ /opt/hadoop/bin/hadoop dfs -chown -R huser /user/huser
    $ /opt/hadoop/bin/hadoop dfs -chmod -R 775 /user/huser
    
  4. Test: browse and write files as huser:
    <root>$ su - huser
    <huser>$ cd /opt/hadoop/
    <huser>$ /opt/hadoop/bin/hadoop dfs -put input /user/huser/input
    <huser>$ /opt/hadoop/bin/hadoop dfs -ls /user/huser/input
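  • To double-check the ownership and permissions that were just set (a hedged check, not in the original page):
    <hdfsadm>$ /opt/hadoop/bin/hadoop dfs -ls /user
    # the /user/huser entry should be owned by huser with mode rwxrwxr-x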
    

4.2 Running jobs as multiple users

  • Two users, rock and waue, ran the following command at the same time without any problem:
    bin/hadoop jar hadoop-*-examples.jar wordcount input/ ouput/
    

Web UI result:

Completed Jobs

Jobid                  User  Name       Map % Complete  Map Total  Maps Completed  Reduce % Complete  Reduce Total  Reduces Completed
job_200903061742_0001  waue  wordcount  100.00%         1          1               100.00%           1             1
job_200903061742_0002  rock  wordcount  100.00%         1          1               100.00%           1             1
job_200903061742_0003  waue  wordcount  100.00%         1          1               100.00%           1             1

Failed Jobs

5. References

6. Troubleshooting

  • DRBL does not install cleanly

drblsrv -i prints the following error message:

Kernel 2.6 was found, so default to use initramfs.
The requested kernel "" 2.6.18-6-amd64 kernel files are NOT found in  /tftpboot/node_root/lib/modules/s and /tftpboot/node_root/boot in the server! The necessary modules in the network initrd can NOT be created!
Client will NOT remote boot correctly!
Program terminated!
Done!

Cause: the apt mirror site had not replicated the needed data, so the new kernel could not be installed, which produced this error.

  • Hadoop disk space problem:
    hdfsadm@hadoop102:~$ df /tmp
    Filesystem           1K-blocks      Used Available Use% Mounted on
    tmpfs                  1018360       712   1017648   1% /tmp
    hdfsadm@hadoop102:~$ df /
    Filesystem           1K-blocks      Used Available Use% Mounted on
    192.168.1.254:/tftpboot/node_root
                          28834720  20961600   6408384  77% /
    
    

The datanode hadoop102 claims it has about 0.98 GB of space, which matches the ~1 GB tmpfs mounted on /tmp shown above; the datanode capacity is therefore the free space of the directory configured as hadoop.tmp.dir.

  • How to increase the available datanode space under the DRBL + Hadoop setup (a sketch follows this list):
    1. Mount a local hard disk at /hdfs_data on each client.
    2. Point hadoop.tmp.dir at /hdfs_data on every datanode.
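    A minimal sketch of the idea (the device name /dev/hda1 is only an assumed example; use each client's real local disk):
    # on every DRBL client, as root
    $ mkdir -p /hdfs_data
    $ mount /dev/hda1 /hdfs_data
    # then change the hadoop.tmp.dir value in conf/hadoop-site.xml from
    #   /tmp/hadoop/hadoop-${user.name}  to  /hdfs_data/hadoop-${user.name}
    # and wipe, reformat and restart the cluster as in section 3.6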