wiki:0428Hadoop_Lab4

Version 1 (modified by waue, 15 years ago)

--

Upgrade

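  • Switching versions would inevitably change the configuration files in the conf directory, so the current approach is to move conf to /opt/conf and switch between hadoop 0.16 and hadoop 0.18 with ln symbolic links. Because conf is no longer inside HADOOP_HOME, remember to source conf/hadoop-env.sh: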
    $ source /opt/conf/hadoop-env.sh
    
  • First check the upgrade status:
    $ bin/hadoop dfsadmin -upgradeProgress status
    
    There are no upgrades in progress.
    
  • Stop HDFS
    • Note: do not use bin/stop-all.sh to stop it
      $ bin/stop-dfs.sh
      
  • Deploy the new version of Hadoop
    • Every node must run the same version, otherwise problems will occur
  • Start HDFS in upgrade mode:
    $ bin/start-dfs.sh -upgrade
    

P.S.: bin/hadoop namenode -upgrade is introduced later; it remains to be checked how it differs from $ bin/start-dfs.sh -upgrade.

  • The NameNode web UI will show the upgrade status.
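The notes do not spell out the exact link layout, so the following is only a sketch under an assumed layout: each release lives in its own directory (hypothetical names /opt/hadoop-0.16 and /opt/hadoop-0.18), each release's conf is a symlink to the shared /opt/conf, and a hypothetical /opt/hadoop symlink selects the active release. link_conf and use_release are illustrative helper names, not part of Hadoop.

```shell
# Sketch only: the directory layout is an assumption, not from the notes.
#   /opt/conf          shared configuration directory (from the notes)
#   /opt/hadoop-0.16   hypothetical release directory
#   /opt/hadoop-0.18   hypothetical release directory
#   /opt/hadoop        hypothetical symlink to the active release

# link_conf RELEASE_DIR SHARED_CONF
# Make RELEASE_DIR/conf a symlink to SHARED_CONF. A real conf directory is
# moved aside to RELEASE_DIR/conf.dist (assumed not to exist yet) rather
# than deleted.
link_conf() {
    if [ -d "$1/conf" ] && [ ! -L "$1/conf" ]; then
        mv "$1/conf" "$1/conf.dist"
    fi
    ln -sfn "$2" "$1/conf"
}

# use_release RELEASE_DIR ACTIVE_LINK
# (Re)point ACTIVE_LINK at RELEASE_DIR. -n replaces an existing link to a
# directory instead of creating a new link inside that directory.
use_release() {
    ln -sfn "$1" "$2"
}

# Example (run on every node, since all nodes must match):
#   link_conf /opt/hadoop-0.18 /opt/conf
#   use_release /opt/hadoop-0.18 /opt/hadoop   # before start-dfs.sh -upgrade
#   use_release /opt/hadoop-0.16 /opt/hadoop   # before start-dfs.sh -rollback
```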

Rollback

  • Stop the cluster
    $ bin/stop-dfs.sh
    
  • Deploy the old version of Hadoop
  • Roll back to the previous version
    $ bin/start-dfs.sh -rollback
    
    P.S.: bin/hadoop namenode -rollback is introduced later; it remains to be checked how it differs from $ bin/start-dfs.sh -rollback.

fsck

  • HDFS file system checking tool:
    $ bin/hadoop fsck /
    
    .
    /user/waue/input/1.txt:  Under replicated blk_-90085106852013388_1001. Target Replicas is 3 but found 2 replica(s).
    /user/waue/input/1.txt:  Under replicated blk_-4027196261436469955_1001. Target Replicas is 3 but found 2 replica(s).
    .
    /user/waue/input/2.txt:  Under replicated blk_-2300843106107816641_1002. Target Replicas is 3 but found 2 replica(s).
    .
    /user/waue/input/3.txt:  Under replicated blk_-1561577350198661966_1003. Target Replicas is 3 but found 2 replica(s).
    .
    /user/waue/input/4.txt:  Under replicated blk_1316726598778579026_1004. Target Replicas is 3 but found 2 replica(s).
    Status: HEALTHY
     Total size:	143451003 B
     Total dirs:	8
     Total files:	4
     Total blocks (validated):	5 (avg. block size 28690200 B)
     Minimally replicated blocks:	5 (100.0 %)
     Over-replicated blocks:	0 (0.0 %)
     Under-replicated blocks:	5 (100.0 %)
     Mis-replicated blocks:		0 (0.0 %)
     Default replication factor:	3
     Average block replication:	2.0
     Corrupt blocks:		0
     Missing replicas:		5 (50.0 %)
     Number of data-nodes:		2
     Number of racks:		1
    The filesystem under path '/' is HEALTHY
  • Adding options changes what fsck reports; for example, -files also lists every file and directory it checks:
    $ bin/hadoop fsck / -files
    
    /tmp <dir>
    /tmp/hadoop <dir>
    /tmp/hadoop/hadoop-waue <dir>
    /tmp/hadoop/hadoop-waue/mapred <dir>
    /tmp/hadoop/hadoop-waue/mapred/system <dir>
    /user <dir>
    /user/waue <dir>
    /user/waue/input <dir>
    /user/waue/input/1.txt 115045564 bytes, 2 block(s):  Under replicated blk_-90085106852013388_1001. Target Replicas is 3 but found 2 replica(s).
     Under replicated blk_-4027196261436469955_1001. Target Replicas is 3 but found 2 replica(s).
    /user/waue/input/2.txt 987864 bytes, 1 block(s):  Under replicated blk_-2300843106107816641_1002. Target Replicas is 3 but found 2 replica(s).
    /user/waue/input/3.txt 1573048 bytes, 1 block(s):  Under replicated blk_-1561577350198661966_1003. Target Replicas is 3 but found 2 replica(s).
    /user/waue/input/4.txt 25844527 bytes, 1 block(s):  Under replicated blk_1316726598778579026_1004. Target Replicas is 3 but found 2 replica(s).
    Status: HEALTHY
    ....(same as above)
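Every block in the report above has 2 replicas against a target of 3 because the cluster has only 2 datanodes (see Number of data-nodes), and HDFS does not place two replicas of one block on the same datanode. In the plain `fsck /` report each under-replicated block gets its own `<path>: Under replicated ...` line, so a small awk filter can condense it into a per-file count. This is a sketch that assumes that line format (it is not meant for the -files form, whose continuation lines omit the path), and under_replicated_per_file is a hypothetical helper name.

```shell
# under_replicated_per_file: read plain `bin/hadoop fsck /` output on stdin
# and print "<count> <path>" for each file that has under-replicated blocks.
under_replicated_per_file() {
    # -F':' makes $1 everything before the first colon, i.e. the HDFS path
    # in lines like "<path>:  Under replicated blk_...".
    awk -F':' '/Under replicated blk_/ { n[$1]++ }
               END { for (f in n) print n[f], f }' | sort -k2
}

# Example:
#   bin/hadoop fsck / | under_replicated_per_file
```

For the report above this yields 2 for /user/waue/input/1.txt and 1 for each of 2.txt, 3.txt and 4.txt.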
    

job

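  • Used to interact with MapReduce jobs
  • Before trying these commands, make sure you have already run at least one MapReduce job
  • A job's JobId can be looked up on the JobTracker web UI (port 50030)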

-status

  • Show the status of a job:
    $ bin/hadoop job -status job_200904021140_0001
    
    

-kill

  • Kill a running job, here the one with ID job_200904021140_0001:
    $ bin/hadoop job -kill job_200904021140_0001
    

-list

  • Print the status of all jobs:
    $ bin/hadoop job -list all
    
    5 jobs submitted
    States are:
    	Running : 1	Succeded : 2	Failed : 3	Prep : 4
    JobId	State	StartTime	UserName
    job_200904021140_0001	2	1238652150499	waue
    job_200904021140_0002	3	1238657754096	waue
    job_200904021140_0004	3	1238657989495	waue
    job_200904021140_0005	2	1238658076347	waue
    job_200904021140_0006	2	1238658644666	waue
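The State column is numeric and decoded by the legend printed above the table (1 Running, 2 Succeeded, 3 Failed, 4 Prep), while StartTime is in milliseconds since the epoch. A minimal sketch that decodes the state and picks out running jobs to pass to -kill; it assumes the column layout shown above, and job_states / running_jobs are hypothetical helper names.

```shell
# job_states: read `bin/hadoop job -list all` output on stdin and print
# "<JobId> <state name>", using the legend Hadoop prints above the table.
job_states() {
    awk 'BEGIN { split("Running Succeeded Failed Prep", name, " ") }
         $1 ~ /^job_/ {
             print $1, (($2 in name) ? name[$2] : "unknown(" $2 ")")
         }'
}

# running_jobs: print only the JobIds whose state is 1 (Running).
running_jobs() {
    awk '$1 ~ /^job_/ && $2 == 1 { print $1 }'
}

# Example (kills every running job; use with care):
#   for id in $(bin/hadoop job -list all | running_jobs); do
#       bin/hadoop job -kill "$id"
#   done
```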
    

-history

  • Print a job's history, given the job's output directory:
    $ bin/hadoop job -history /user/waue/stream-output1
    
    Hadoop job: job_200904021140_0005
    =====================================
    Job tracker host name: gm1.nchc.org.tw
    job tracker start time: Thu Apr 02 11:40:06 CST 2009
    User: waue
    JobName: streamjob9019.jar
    JobConf: hdfs://gm1.nchc.org.tw:9000/tmp/hadoop/hadoop-waue/mapred/system/job_200904021140_0005/job.xml
    Submitted At: 2-四月-2009 15:41:16
    Launched At: 2-四月-2009 15:41:16 (0sec)
    Finished At: 2-四月-2009 15:42:04 (48sec)
    Status: SUCCESS
    =====================================
    ...(omitted)
    

version

  • Print the current Hadoop version:
    $ bin/hadoop version
    
    Hadoop 0.18.3
    Subversion https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250
    Compiled by ndaley on Thu Jan 22 23:12:08 UTC 2009
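Because every node must run the same Hadoop version during an upgrade, the first line of `bin/hadoop version` can be compared across nodes. A sketch only: hadoop_release and all_same are hypothetical helper names, and the commented loop assumes passwordless ssh and a hypothetical /opt/hadoop install path on every host listed in /opt/conf/slaves.

```shell
# hadoop_release: print the release number from `bin/hadoop version`
# output, whose first line looks like "Hadoop 0.18.3".
hadoop_release() {
    awk 'NR == 1 && $1 == "Hadoop" { print $2; exit }'
}

# all_same: read one release string per line on stdin; succeed only if
# there is exactly one distinct value.
all_same() {
    [ "$(sort -u | wc -l | tr -d ' ')" -eq 1 ]
}

# Example (assumes passwordless ssh and /opt/hadoop on every slave):
#   for h in $(cat /opt/conf/slaves); do
#       ssh "$h" /opt/hadoop/bin/hadoop version | hadoop_release
#   done | all_same && echo "all slaves run the same release"
```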
    

HDFS permissions and users

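  • HDFS permissions have three classes: owner, group, and other
  • A user's identity is taken from the user on the client (as reported by whoami), and the groups from bash -c groups
  • Related operations: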
    $ bin/hadoop dfs -mkdir own
    $ bin/hadoop dfs -chmod -R 755 own
    $ bin/hadoop dfs -chgrp -R waue own
    $ bin/hadoop dfs -chown -R waue own
    $ bin/hadoop dfs -lsr own
    
  • Parameters available in conf/hadoop-site.xml:
    dfs.permissions = true 
    dfs.web.ugi = webuser,webgroup
    dfs.permissions.supergroup = supergroup
    dfs.upgrade.permission = 777
    dfs.umask = 022
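In hadoop-site.xml each of these settings is written as a `<property>` element with `<name>` and `<value>` children inside the top-level `<configuration>` element; the `name = value` list above is shorthand for that. A small sketch that prints the XML form; site_properties is a hypothetical helper name, and it does no XML escaping, which the values above do not need.

```shell
# site_properties NAME=VALUE...: print one <property> element per argument,
# ready to paste inside <configuration> in conf/hadoop-site.xml.
# Values are not XML-escaped, so avoid &, < and > in them.
site_properties() {
    for kv in "$@"; do
        # ${kv%%=*} keeps the text before the first "=", ${kv#*=} the rest.
        printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' \
            "${kv%%=*}" "${kv#*=}"
    done
}

# Example:
#   site_properties dfs.permissions=true dfs.web.ugi=webuser,webgroup
```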
    

-setQuota

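  • A directory quota is a hard limit on the number of names (files and directories) in the tree rooted at that directory
  • When setting a quota, the number is a count of names, not of blocks or bytes (e.g. uploading one file that spans 2 blocks succeeds, but uploading two very small files fails)
  • A quota of 1 forces the directory to stay empty, because the directory itself counts against its own quota
  • Renaming a directory does not change its quota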
    $ bin/hadoop dfs -mkdir quota
    $ bin/hadoop dfsadmin -setQuota 2 quota
    $ bin/hadoop dfs -put ../conf/hadoop-env.sh quota/
    $ bin/hadoop dfs -put ../conf/hadoop-site.xml quota/
    
    put: org.apache.hadoop.dfs.QuotaExceededException: The quota of /user/waue/quota is exceeded: quota=2 count=3
    
  • Check a directory's quota with "bin/hadoop fs -count -q <directory>":
    $ bin/hadoop fs -count -q own
       none  inf  1    0     0 hdfs://gm1.nchc.org.tw:9000/user/waue/own
    $ bin/hadoop dfsadmin -setQuota 4 own
    $ bin/hadoop fs -count -q own
        4     3   1    0     0 hdfs://gm1.nchc.org.tw:9000/user/waue/own
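The -count -q columns are QUOTA, REMAINING_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE and FILE_NAME, and the directory itself is counted as one name. That explains both examples: own holds 1 directory and 0 files, so a quota of 4 leaves 4 - 1 = 3; and with a quota of 2 on quota/, the directory plus hadoop-env.sh already use 2 names, so adding hadoop-site.xml would make 3, the count=3 in the error. A sketch that recomputes the remaining quota from the counts; quota_report is a hypothetical helper name and it assumes the six-column layout shown above.

```shell
# quota_report: read `bin/hadoop fs -count -q <dir>` lines on stdin and print
# the name quota, the names in use (directories + files, including the
# directory itself) and how many are left.
quota_report() {
    awk '$1 == "none" { print $6, "no quota"; next }
         { used = $3 + $4
           print $6, "quota=" $1, "used=" used, "left=" ($1 - used) }'
}

# Example:
#   bin/hadoop fs -count -q own | quota_report
```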

-clrQuota

  • Clear a previously set quota:
    $ bin/hadoop dfsadmin -clrQuota quota/
    

Using the Hadoop Streaming library

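  • Hadoop Streaming is a utility that ships with Hadoop. It lets users create and run a special kind of map/reduce job in which arbitrary executables or scripts act as the mapper and/or the reducer
  • The simplest streaming map/reduce job, run from the shell: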
    $ bin/hadoop jar hadoop-0.18.3-streaming.jar -input input -output stream-output1 -mapper /bin/cat -reducer /usr/bin/wc
    
    • The output is (line, word, and byte counts; the byte count equals the 143451003 B total reported by fsck):
      2910628 24507806 143451003
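Streaming runs the mapper over the input, sorts the map output by key, and feeds it to the reducer. For a single reducer, a plain local pipeline is a rough approximation that can be handy for checking a mapper/reducer pair on a local copy of the input (for example one fetched with bin/hadoop dfs -get) before submitting. stream_preview is a hypothetical helper, and sorting whole lines only approximates Hadoop's key-based shuffle.

```shell
# stream_preview INPUT_DIR MAPPER REDUCER
# Local stand-in for a single-reducer streaming job: concatenate the input
# files, run the mapper, sort (in place of the shuffle), run the reducer.
# MAPPER and REDUCER are left unquoted so they may carry arguments.
stream_preview() {
    cat "$1"/* | $2 | LC_ALL=C sort | $3
}

# Example, mirroring the job above on a local copy of "input":
#   bin/hadoop dfs -get input input-local
#   stream_preview input-local /bin/cat /usr/bin/wc
```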