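= Dynamically Adding a datanode and tasktracker =
* Whether the new node connects to the correct namenode depends on conf/hadoop-site.xml; in the tests so far, conf/slaves and conf/masters have no effect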
{{{
$ bin/hadoop-daemon.sh --config ./conf start datanode
starting datanode, logging to /tmp/hadoop/logs/hadoop-waue-datanode-Dx7200.out
}}}
{{{
$ bin/hadoop-daemon.sh --config ./conf start tasktracker
starting tasktracker, logging to /tmp/hadoop/logs/hadoop-waue-tasktracker-Dx7200.out
}}}
-----
= balancer =
Analyzes how data blocks are distributed and rebalances the data across the !DataNodes
{{{
$ bin/hadoop balancer
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
09/04/01 18:00:08 INFO net.NetworkTopology: Adding a new node: /default-rack/140.110.138.191:50010
09/04/01 18:00:08 INFO net.NetworkTopology: Adding a new node: /default-rack/140.110.141.129:50010
09/04/01 18:00:08 INFO dfs.Balancer: 0 over utilized nodes:
09/04/01 18:00:08 INFO dfs.Balancer: 0 under utilized nodes:
The cluster is balanced. Exiting...
Balancing took 186.0 milliseconds
}}}
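The "over utilized" and "under utilized" counts in this output can be pictured with a small sketch. It assumes the balancer compares each node's disk utilization against the cluster-wide average plus or minus a threshold (the `-threshold` option, assumed here to default to 10 percentage points). The function name and the node figures are hypothetical and were not measured on this cluster.

```python
def classify_nodes(nodes, threshold=10.0):
    """nodes maps a name to (used_bytes, capacity_bytes); threshold is in percentage points."""
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    # Cluster-wide average utilization, in percent.
    average = 100.0 * total_used / total_capacity
    over, under = [], []
    for name, (used, capacity) in nodes.items():
        utilization = 100.0 * used / capacity
        if utilization > average + threshold:
            over.append(name)
        elif utilization < average - threshold:
            under.append(name)
    return over, under

# Hypothetical numbers, only to illustrate the two cases.
print(classify_nodes({"node-a": (50, 100), "node-b": (45, 100)}))
print(classify_nodes({"node-a": (90, 100), "node-b": (10, 100)}))
```

Under this model the first cluster counts as balanced, while in the second one data would be moved from node-a to node-b.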
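== Upgrade ==
* Switching versions would also replace the conf files inside the Hadoop directory, so the current approach is to move conf to /opt/conf and let hadoop 0.16 and hadoop 0.18 each point to it through a symbolic link made with ln. Because conf is no longer inside hadoop_home, remember to source hadoop-env.sh from the new location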
{{{
$ source /opt/conf/hadoop-env.sh
}}}
* Check the status first
{{{
$ bin/hadoop dfsadmin -upgradeProgress status
There are no upgrades in progress.
}}}
* Stop HDFS
* Note: do not use bin/stop-all.sh to stop it
{{{
$ bin/stop-dfs.sh
}}}
* Deploy the new version of Hadoop
* Note: every node must run the same version, otherwise problems will occur
* Start with the upgrade option
{{{
$ bin/start-dfs.sh -upgrade
}}}
PS: bin/hadoop namenode -upgrade is introduced later; how it differs from bin/start-dfs.sh -upgrade still needs to be checked
* The namenode administration web page shows the upgrade status
-----
== Rollback ==
* Stop the cluster
{{{
$ bin/stop-dfs.sh
}}}
* Deploy the old version of Hadoop
* Roll back to the previous version
{{{
$ bin/start-dfs.sh -rollback
}}}
PS: bin/hadoop namenode -rollback is introduced later; how it differs from bin/start-dfs.sh -rollback still needs to be checked
-----
== fsck ==
* The HDFS file system checking tool
{{{
$ bin/hadoop fsck /
.
/user/waue/input/1.txt: Under replicated blk_-90085106852013388_1001. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/1.txt: Under replicated blk_-4027196261436469955_1001. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/2.txt: Under replicated blk_-2300843106107816641_1002. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/3.txt: Under replicated blk_-1561577350198661966_1003. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/4.txt: Under replicated blk_1316726598778579026_1004. Target Replicas is 3 but found 2 replica(s).
Status: HEALTHY
Total size: 143451003 B
Total dirs: 8
Total files: 4
Total blocks (validated): 5 (avg. block size 28690200 B)
Minimally replicated blocks: 5 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 5 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 5 (50.0 %)
Number of data-nodes: 2
Number of racks: 1
The filesystem under path '/' is HEALTHY
}}}
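The summary figures in this report can be cross-checked with a little arithmetic. The sketch below uses only numbers copied from the output. The formulas are inferred from those values and are an assumption, in particular that the missing-replica percentage is taken relative to the replicas actually found; they were not taken from the Hadoop source.

```python
# Figures copied from the fsck output above.
total_size = 143451003   # "Total size", in bytes
total_blocks = 5         # "Total blocks (validated)"
target_replicas = 3      # "Default replication factor"
found_per_block = 2      # every block reports "found 2 replica(s)"

found_replicas = total_blocks * found_per_block
missing_replicas = total_blocks * target_replicas - found_replicas

avg_block_size = total_size // total_blocks
avg_replication = found_replicas / total_blocks
# Assumption: the percentage is relative to the replicas that were found.
missing_pct = 100.0 * missing_replicas / found_replicas

print(avg_block_size, avg_replication, missing_replicas, missing_pct)
```

With only 2 datanodes a target of 3 replicas cannot be met, which is why every block is reported as under-replicated while the file system status is still HEALTHY.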
* Different options serve different purposes, for example
{{{
$ bin/hadoop fsck / -files
/tmp
/tmp/hadoop
/tmp/hadoop/hadoop-waue
/tmp/hadoop/hadoop-waue/mapred
/tmp/hadoop/hadoop-waue/mapred/system
/user
/user/waue
/user/waue/input
/user/waue/input/1.txt 115045564 bytes, 2 block(s): Under replicated blk_-90085106852013388_1001. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-4027196261436469955_1001. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/2.txt 987864 bytes, 1 block(s): Under replicated blk_-2300843106107816641_1002. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/3.txt 1573048 bytes, 1 block(s): Under replicated blk_-1561577350198661966_1003. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/4.txt 25844527 bytes, 1 block(s): Under replicated blk_1316726598778579026_1004. Target Replicas is 3 but found 2 replica(s).
Status: HEALTHY
....(same as above)
}}}
-----
== job ==
* Used to interact with Map Reduce jobs
* Before trying this command, make sure a MapReduce job has already been run
* The job ID can be looked up on the JobTracker:50030 web page
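The job IDs used in the subsections below appear to follow a fixed pattern. This minimal sketch assumes the middle field is the JobTracker start time (yyyyMMddHHmm) and the last field is a per-JobTracker sequence number. That reading is consistent with the "job tracker start time: Thu Apr 02 11:40:06 CST 2009" line in the -history output on this page, but it is an inference. `parse_job_id` is a hypothetical helper, not part of Hadoop.

```python
from datetime import datetime

def parse_job_id(job_id):
    """Split an ID such as job_200904021140_0001 into (start time, sequence number)."""
    prefix, started, sequence = job_id.split("_")
    if prefix != "job":
        raise ValueError("not a job ID: %r" % job_id)
    # Assumption: the middle field is the JobTracker start time, to the minute.
    return datetime.strptime(started, "%Y%m%d%H%M"), int(sequence)

print(parse_job_id("job_200904021140_0001"))
```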
=== -status ===
* Check the status of a job
{{{
$ bin/hadoop job -status job_200904021140_0001
}}}
=== -kill ===
* Kill the running job whose ID is job_200904021140_0001
{{{
$ bin/hadoop job -kill job_200904021140_0001
}}}
=== -list ===
* Print the status of all jobs
{{{
$ bin/hadoop job -list all
5 jobs submitted
States are:
Running : 1 Succeded : 2 Failed : 3 Prep : 4
JobId State StartTime UserName
job_200904021140_0001 2 1238652150499 waue
job_200904021140_0002 3 1238657754096 waue
job_200904021140_0004 3 1238657989495 waue
job_200904021140_0005 2 1238658076347 waue
job_200904021140_0006 2 1238658644666 waue
}}}
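The State column is numeric, and StartTime looks like a raw timestamp. The sketch below decodes both, using the legend printed in the "States are:" line and assuming StartTime is milliseconds since the Unix epoch. `parse_job_list` is a hypothetical helper, and the sample is a two-line excerpt of the listing above.

```python
from datetime import datetime, timedelta, timezone

# Codes from the "States are:" line (the tool itself spells it "Succeded").
STATES = {1: "Running", 2: "Succeeded", 3: "Failed", 4: "Prep"}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def parse_job_list(text):
    """Turn the table printed by `hadoop job -list all` into a list of dicts."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the summary, legend, and header lines.
        if len(fields) != 4 or not fields[0].startswith("job_"):
            continue
        job_id, state, start_ms, user = fields
        jobs.append({
            "id": job_id,
            "state": STATES.get(int(state), "Unknown"),
            # Assumption: StartTime is milliseconds since the Unix epoch (UTC).
            "start": EPOCH + timedelta(milliseconds=int(start_ms)),
            "user": user,
        })
    return jobs

sample = """\
JobId State StartTime UserName
job_200904021140_0001 2 1238652150499 waue
job_200904021140_0002 3 1238657754096 waue
"""
for job in parse_job_list(sample):
    print(job["id"], job["state"], job["start"].isoformat())
```

Read this way, job_200904021140_0005 started at 07:41:16 UTC, which is 15:41:16 in UTC+8 and agrees with the "Submitted At" line of the -history output for that job.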
=== -history ===
* Print the history of a job
{{{
$ bin/hadoop job -history /user/waue/stream-output1
Hadoop job: job_200904021140_0005
=====================================
Job tracker host name: gm1.nchc.org.tw
job tracker start time: Thu Apr 02 11:40:06 CST 2009
User: waue
JobName: streamjob9019.jar
JobConf: hdfs://gm1.nchc.org.tw:9000/tmp/hadoop/hadoop-waue/mapred/system/job_200904021140_0005/job.xml
Submitted At: 2-四月-2009 15:41:16
Launched At: 2-四月-2009 15:41:16 (0sec)
Finished At: 2-四月-2009 15:42:04 (48sec)
Status: SUCCESS
=====================================
...(omitted)
}}}
-----
== version ==
* Print the current Hadoop version
{{{
$ bin/hadoop version
Hadoop 0.18.3
Subversion https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250
Compiled by ndaley on Thu Jan 22 23:12:08 UTC 2009
}}}
-----
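= HDFS Permissions and Users =
* HDFS permissions are defined for three classes: owner, group, and other
* The user identity is taken from the user on the client (as reported by whoami), and the groups from bash -c groups
* Related operations: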
{{{
$ bin/hadoop dfs -mkdir own
$ bin/hadoop dfs -chmod -R 755 own
$ bin/hadoop dfs -chgrp -R waue own
$ bin/hadoop dfs -chown -R waue own
$ bin/hadoop dfs -lsr own
}}}
* Parameters available in conf/hadoop-site.xml:
{{{
#!php
dfs.permissions = true
dfs.web.ugi = webuser,webgroup
dfs.permissions.supergroup = supergroup
dfs.upgrade.permission = 777
dfs.umask = 022
}}}
=== -setQuota ===
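* A directory quota is a hard limit on the number of names in the directory tree rooted at that directory
* When setting a quota, the number is a count of names, not of blocks, and the directory itself counts as one name (for example, with a quota of 2, one file made of 2 blocks can be uploaded, but uploading two very small files fails)
* A quota of 1 forces the directory to stay empty
* Renaming a directory does not change its quota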
{{{
$ bin/hadoop dfs -mkdir quota
$ bin/hadoop dfsadmin -setQuota 2 quota
$ bin/hadoop dfs -put ../conf/hadoop-env.sh quota/
$ bin/hadoop dfs -put ../conf/hadoop-site.xml quota/
put: org.apache.hadoop.dfs.QuotaExceededException: The quota of /user/waue/quota is exceeded: quota=2 count=3
}}}
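The "quota=2 count=3" in this error follows from how names are counted. Below is a toy model of that bookkeeping, assuming the directory itself takes one name and each file takes one more regardless of how many blocks it has. `QuotaDir` and `QuotaExceeded` are hypothetical names for illustration, not Hadoop classes.

```python
class QuotaExceeded(Exception):
    pass

class QuotaDir:
    """Toy model of a directory with a name quota (not a Hadoop API)."""

    def __init__(self, quota):
        self.quota = quota
        self.names = 1  # the directory itself counts as one name

    def put(self, filename):
        # A file adds exactly one name, however many blocks it has.
        if self.names + 1 > self.quota:
            raise QuotaExceeded("quota=%d count=%d" % (self.quota, self.names + 1))
        self.names += 1

quota_dir = QuotaDir(2)
quota_dir.put("hadoop-env.sh")        # directory + 1 file = 2 names, allowed
try:
    quota_dir.put("hadoop-site.xml")  # would be the 3rd name
except QuotaExceeded as error:
    print("put failed:", error)
```

The same model shows why a quota of 1 keeps a directory empty: the directory alone already uses the single allowed name.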
* To check the quota of a directory: "bin/hadoop fs -count -q <directory>"
{{{
$ bin/hadoop fs -count -q own
none inf 1 0 0 hdfs://gm1.nchc.org.tw:9000/user/waue/own
$ bin/hadoop dfsadmin -setQuota 4 own
$ bin/hadoop fs -count -q own
4 3 1 0 0 hdfs://gm1.nchc.org.tw:9000/user/waue/own
}}}
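The columns of this output are not labelled. The sketch below assumes they are, in order: quota, remaining quota, directory count, file count, content size in bytes, and path. This matches how the two sample lines read on 0.18.3, but the layout should be treated as an assumption. `parse_count_q` is a hypothetical helper.

```python
def parse_count_q(line):
    """Parse one line printed by `hadoop fs -count -q` (assumed 6-column layout)."""
    quota, remaining, dirs, files, size, path = line.split(None, 5)
    return {
        "quota": None if quota == "none" else int(quota),          # "none" = no quota set
        "remaining": None if remaining == "inf" else int(remaining),
        "dirs": int(dirs),
        "files": int(files),
        "bytes": int(size),
        "path": path.strip(),
    }

before = parse_count_q("none inf 1 0 0 hdfs://gm1.nchc.org.tw:9000/user/waue/own")
after = parse_count_q("4 3 1 0 0 hdfs://gm1.nchc.org.tw:9000/user/waue/own")
print(before)
print(after)
```

In the second line the used names (quota minus remaining) equal the directory count plus the file count, consistent with the directory itself taking one name.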
=== -clrQuota ===
* Clear a previously set quota
{{{
$ bin/hadoop dfsadmin -clrQuota quota/
}}}
-----
= Using the Hadoop Streaming Library =
* Hadoop streaming is a Hadoop utility that helps users create and run a special kind of map/reduce job, in which executables or scripts act as the mapper or the reducer
* The simplest streaming map reduce job, run from the shell:
{{{
$ bin/hadoop jar hadoop-0.18.3-streaming.jar -input input -output stream-output1 -mapper /bin/cat -reducer /usr/bin/wc
}}}
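* The output is: (line count, word count, and byte count; the byte count equals the "Total size" reported by fsck for the same input)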
{{{
#!sh
2910628 24507806 143451003
}}}
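To make the three numbers concrete, here is a minimal sketch of what the job computes, assuming a single reducer: /bin/cat passes every record through unchanged and /usr/bin/wc then counts what it receives. The `wc` function below is a hypothetical stand-in for the real tool and handles ASCII whitespace only.

```python
def wc(data):
    """Return (lines, words, bytes) for a bytes object, like `wc` with no options."""
    lines = data.count(b"\n")    # wc counts newline characters
    words = len(data.split())    # runs of non-whitespace bytes
    return lines, words, len(data)

print(wc(b"hello world\nfoo\n"))
```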