{{{ #!html
Lab 8: Advanced Hadoop Cluster Usage
}}}

[[PageOutline]]

= Situation 1: How to add a datanode and a tasktracker dynamically =

 * Sometimes the environment changes after the initial deployment. For example, the cluster originally ran Hadoop on five machines, and today your boss hands you five more. To extend the existing cluster by adding nodes on the fly, follow the steps below.

== 1.0 Notes ==

 * The new node must run the same Hadoop version and use the same configuration files as the existing cluster.
 * Whether the node connects to the correct cluster is determined by the JobTracker and Namenode settings in conf/hadoop-site.xml (in our tests it does not depend on conf/slaves or conf/masters).

== 1.1 Adding a datanode ==

 * Run the following commands on the node to be added:
{{{
$ cd $HADOOP_HOME
$ bin/hadoop-daemon.sh --config ./conf start datanode
}}}
 * Sample output:
{{{
starting datanode, logging to /tmp/hadoop/logs/hadoop-waue-datanode-Dx7200.out
}}}

== 1.2 Adding a tasktracker ==

 * Again, whether the node reaches the correct namenode is determined by conf/hadoop-site.xml; in our tests it does not depend on conf/slaves or conf/masters.
{{{
$ cd $HADOOP_HOME
$ bin/hadoop-daemon.sh --config ./conf start tasktracker
}}}
 * Sample output:
{{{
starting tasktracker, logging to /tmp/hadoop/logs/hadoop-waue-tasktracker-Dx7200.out
}}}

----

= Situation 2: How do I spread the data in HDFS evenly across the nodes? =

 * The following command analyzes the block distribution and rebalances the data across the !DataNodes:
{{{
$ bin/hadoop balancer
}}}
 * Sample output:
{{{
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
09/04/01 18:00:08 INFO net.NetworkTopology: Adding a new node: /default-rack/140.110.138.191:50010
09/04/01 18:00:08 INFO net.NetworkTopology: Adding a new node: /default-rack/140.110.141.129:50010
09/04/01 18:00:08 INFO dfs.Balancer: 0 over utilized nodes:
09/04/01 18:00:08 INFO dfs.Balancer: 0 under utilized nodes:
The cluster is balanced. Exiting...
Balancing took 186.0 milliseconds
}}}

----

= Situation 3: How to upgrade a Hadoop cluster that is already in service without losing its data =

 * Assume we are upgrading from Hadoop 0.18 to Hadoop 0.20.
 * If the conf/ directory is kept inside $HADOOP_HOME, it gets replaced along with the installation every time the version changes. The configuration files themselves, however, can be shared between old and new Hadoop versions, so we move them out first.

== step 1. Stop HDFS ==

 * Check the current status first:
{{{
$ cd /opt/hadoop/
$ bin/hadoop dfsadmin -upgradeProgress status
There are no upgrades in progress.
}}}
 * Stop HDFS.
 * Note: do not use bin/stop-all.sh to stop it.
{{{
$ bin/stop-dfs.sh
}}}

== step 2. Link the new Hadoop version ==

 * Move conf to /opt/conf, and use ln to switch between hadoop 0.18 and hadoop 0.20 through a symbolic link.
 * The following assumes you have already downloaded Hadoop 0.20 and unpacked it into a directory named hadoop-0.20.3.
{{{
$ cd /opt/
$ mv hadoop/conf ./
$ mv hadoop hadoop-0.18
$ ln -s hadoop-0.20.3 hadoop
}}}

== step 3. Set the environment variables ==

 * Since conf is no longer inside HADOOP_HOME, remember to load the settings from conf/hadoop-env.sh.
 * Set $HADOOP_CONF_DIR in hadoop-env.sh to the correct path, then source the file:
{{{
$ source /opt/conf/hadoop-env.sh
}}}

== step 4. Deploy the new Hadoop version on every node ==

 * If the cluster has more than one node, every node must run the same Hadoop version, otherwise problems will occur.

== step 5. Start HDFS ==

{{{
$ bin/start-dfs.sh -upgrade
}}}
 * The namenode web UI will show the upgrade status.
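 * Once the upgraded HDFS has been verified and you are sure you will no longer need the rollback described in Situation 4 below, the upgrade can be finalized so the namenode stops keeping the pre-upgrade state. A minimal sketch using the standard dfsadmin options (check them against your Hadoop version; after finalizing, a rollback is no longer possible):
{{{
$ cd /opt/hadoop
# same status command as in step 1; it reports whether an upgrade is still pending
$ bin/hadoop dfsadmin -upgradeProgress status
# finalize only when you are certain you will not roll back
$ bin/hadoop dfsadmin -finalizeUpgrade
}}}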
----

= Situation 4: How to downgrade a Hadoop cluster that is already in service without losing its data =

 * This is the reverse of Situation 3, so the procedure is similar. The following assumes the configuration files are already in /opt/conf, that /opt contains both the hadoop-0.18 and hadoop-0.20.3 directories, and that the cluster has only one node.

== step 1. Stop HDFS ==

{{{
$ cd /opt/hadoop
$ bin/stop-dfs.sh
}}}

== step 2. Switch back to the old Hadoop version ==

{{{
$ rm /opt/hadoop
$ ln -s hadoop-0.18 hadoop
}}}

== step 3. Roll back to the previous version ==

{{{
$ bin/start-dfs.sh -rollback
}}}

----

= Situation 5: Is my HDFS file system healthy? =

 * HDFS comes with a file system checking tool, "bin/hadoop fsck":
{{{
$ bin/hadoop fsck /
}}}
 * Sample output:
{{{
.
/user/waue/input/1.txt:  Under replicated blk_-90085106852013388_1001. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/1.txt:  Under replicated blk_-4027196261436469955_1001. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/2.txt:  Under replicated blk_-2300843106107816641_1002. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/3.txt:  Under replicated blk_-1561577350198661966_1003. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/4.txt:  Under replicated blk_1316726598778579026_1004. Target Replicas is 3 but found 2 replica(s).
Status: HEALTHY
 Total size:    143451003 B
 Total dirs:    8
 Total files:   4
 Total blocks (validated):      5 (avg. block size 28690200 B)
 Minimally replicated blocks:   5 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       5 (100.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.0
 Corrupt blocks:                0
 Missing replicas:              5 (50.0 %)
 Number of data-nodes:          2
 Number of racks:               1
The filesystem under path '/' is HEALTHY
}}}
 * Different options serve different purposes, for example:
{{{
$ bin/hadoop fsck / -files
}}}
 * Sample output:
{{{
/tmp
/tmp/hadoop
/tmp/hadoop/hadoop-waue
/tmp/hadoop/hadoop-waue/mapred
/tmp/hadoop/hadoop-waue/mapred/system
/user
/user/waue
/user/waue/input
/user/waue/input/1.txt 115045564 bytes, 2 block(s):  Under replicated blk_-90085106852013388_1001. Target Replicas is 3 but found 2 replica(s).
 Under replicated blk_-4027196261436469955_1001. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/2.txt 987864 bytes, 1 block(s):  Under replicated blk_-2300843106107816641_1002. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/3.txt 1573048 bytes, 1 block(s):  Under replicated blk_-1561577350198661966_1003. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/4.txt 25844527 bytes, 1 block(s):  Under replicated blk_1316726598778579026_1004. Target Replicas is 3 but found 2 replica(s).
Status: HEALTHY
...(same as above)
}}}

----

= Situation 6: My cluster is running too many jobs; how do I trim them down? =

== step 1. List all jobs ==

 * You can find the job IDs on the JobTracker web UI at port 50030,
 * or list all jobs from the command line:
{{{
$ bin/hadoop job -list all
5 jobs submitted
States are:
	Running : 1	Succeded : 2	Failed : 3	Prep : 4
JobId	State	StartTime	UserName
job_200904021140_0001	2	1238652150499	waue
job_200904021140_0002	3	1238657754096	waue
job_200904021140_0004	3	1238657989495	waue
job_200904021140_0005	2	1238658076347	waue
job_200904021140_0006	2	1238658644666	waue
}}}

== step 2. Get more details ==

 * Check the status of a job:
{{{
$ bin/hadoop job -status job_200904021140_0001
}}}
 * Print a job's history from its output directory:
{{{
$ bin/hadoop job -history /user/waue/stream-output1

Hadoop job: job_200904021140_0005
=====================================
Job tracker host name: gm1.nchc.org.tw
job tracker start time: Thu Apr 02 11:40:06 CST 2009
User: waue
JobName: streamjob9019.jar
JobConf: hdfs://gm1.nchc.org.tw:9000/tmp/hadoop/hadoop-waue/mapred/system/job_200904021140_0005/job.xml
Submitted At: 2-Apr-2009 15:41:16
Launched At: 2-Apr-2009 15:41:16 (0sec)
Finished At: 2-Apr-2009 15:42:04 (48sec)
Status: SUCCESS
=====================================
...(omitted)
}}}

== step 3. Kill a job ==

 * Terminate a running job, for example the one with id job_200904021140_0001:
{{{
$ bin/hadoop job -kill job_200904021140_0001
}}}

----

= Situation 7: How do I check the current Hadoop version? =

 * Print the Hadoop version in use:
{{{
$ bin/hadoop version
}}}
 * Sample output:
{{{
Hadoop 0.20.3
Subversion https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 736250
Compiled by ndaley on Thu Jan 22 23:12:08 UTC 2009
}}}

----

= Situation 8: Setting up HDFS accounts and quotas =

== step 1. Set each user's default directory, ownership and permissions ==

 * HDFS permissions distinguish owner, group and other.
 * A user's identity is taken from the client side: the user name is the output of `whoami` and the group list is the output of `bash -c groups`.
 * Related operations:
{{{
$ bin/hadoop fs -mkdir own
$ bin/hadoop fs -chmod -R 755 own
$ bin/hadoop fs -chgrp -R waue own
$ bin/hadoop fs -chown -R waue own
$ bin/hadoop fs -lsr own
}}}
 * Parameters that can be set in conf/hadoop-site.xml:
{{{
#!php
dfs.permissions = true
dfs.web.ugi = webuser,webgroup
dfs.permissions.supergroup = supergroup
dfs.upgrade.permission = 777
dfs.umask = 022
}}}
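 * In the actual conf/hadoop-site.xml these settings are written as `<property>` elements rather than "name = value" pairs. A minimal sketch of the same example values in XML form (adjust the values to your own site):
{{{
#!xml
<!-- enable HDFS permission checking -->
<property>
  <name>dfs.permissions</name>
  <value>true</value>
</property>
<!-- user and group the namenode web UI acts as -->
<property>
  <name>dfs.web.ugi</name>
  <value>webuser,webgroup</value>
</property>
<!-- members of this group are treated as HDFS superusers -->
<property>
  <name>dfs.permissions.supergroup</name>
  <value>supergroup</value>
</property>
<!-- permission assigned to existing files when upgrading from a release without permissions -->
<property>
  <name>dfs.upgrade.permission</name>
  <value>777</value>
</property>
<!-- umask applied when new files and directories are created -->
<property>
  <name>dfs.umask</name>
  <value>022</value>
</property>
}}}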
== step 2. Set quotas ==

 * A directory quota is a hard limit on the number of names in the tree rooted at that directory.
 * The quota counts names, not blocks or bytes: with a quota of 2 you can still upload a single file that spans 2 blocks, but uploading two very small files into the same directory will fail.
 * A quota of 1 forces a directory to stay empty.
 * Renaming a directory does not change its quota.
{{{
$ bin/hadoop fs -mkdir quota
$ bin/hadoop dfsadmin -setQuota 2 quota
$ bin/hadoop fs -put ../conf/hadoop-env.sh quota/
$ bin/hadoop fs -put ../conf/hadoop-site.xml quota/
put: org.apache.hadoop.dfs.QuotaExceededException: The quota of /user/waue/quota is exceeded: quota=2 count=3
}}}
 * To check a directory's quota, use "bin/hadoop fs -count -q <directory>":
{{{
$ bin/hadoop fs -count -q own
        none     inf     1     0     0 hdfs://gm1.nchc.org.tw:9000/user/waue/own
$ bin/hadoop dfsadmin -setQuota 4 own
$ bin/hadoop fs -count -q own
           4       3     1     0     0 hdfs://gm1.nchc.org.tw:9000/user/waue/own
}}}
 * Clear a previously set quota:
{{{
$ bin/hadoop dfsadmin -clrQuota quota/
}}}
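 * Besides name quotas, HDFS also supports space quotas, which limit the number of bytes (counting every replica) a directory tree may consume. A minimal sketch, assuming your Hadoop version provides the -setSpaceQuota and -clrSpaceQuota dfsadmin options:
{{{
# limit quota/ to 1073741824 bytes (1 GB) of raw storage, replicas included
$ bin/hadoop dfsadmin -setSpaceQuota 1073741824 quota
# fs -count -q also reports the space quota columns
$ bin/hadoop fs -count -q quota
# remove the space quota again
$ bin/hadoop dfsadmin -clrSpaceQuota quota
}}}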