Changes between Initial Version and Version 1 of 0428Hadoop_Lab4


Timestamp: Apr 26, 2009, 1:50:30 AM
Author: waue

== Upgrade ==
 * Since changing versions would also replace the conf files inside the Hadoop directory, the current approach is: move conf to /opt/conf, and use ln to symlink it into both hadoop 0.16 and hadoop 0.18 (a sketch follows the next block). Because conf is no longer under hadoop_home, remember to source conf/hadoop-env.sh:
{{{
$ source /opt/conf/hadoop-env.sh
}}}
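 * For reference, the shared-conf layout described above could be set up roughly as follows (the install paths are assumptions for illustration):
{{{
# hypothetical install paths; adjust to the actual locations
$ mv /opt/hadoop-0.18.3/conf /opt/conf
$ ln -s /opt/conf /opt/hadoop-0.18.3/conf
$ mv /opt/hadoop-0.16.4/conf /opt/hadoop-0.16.4/conf.orig
$ ln -s /opt/conf /opt/hadoop-0.16.4/conf
}}}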

 * First, check the upgrade status:
{{{
$ bin/hadoop dfsadmin -upgradeProgress status

There are no upgrades in progress.
}}}
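 * Besides status, the -upgradeProgress option also accepts details and force; details prints a more verbose report:
{{{
$ bin/hadoop dfsadmin -upgradeProgress details
}}}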

 * Stop HDFS
   * Note: do not use bin/stop-all.sh for this
{{{
$ bin/stop-dfs.sh
}}}

 * Deploy the new version of Hadoop
   * Note: every node must run the same version, otherwise problems will occur

 * Start HDFS with the upgrade option:
{{{
$ bin/start-dfs.sh -upgrade
}}}

 PS: bin/hadoop namenode -upgrade is introduced later on; it would be worth checking how it differs from $ bin/start-dfs.sh -upgrade

 * The namenode admin web page will show the upgrade status

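 * Once the new version has been verified, the upgrade can be finalized with dfsadmin; note that this discards the pre-upgrade state, so the rollback described below stops being possible:
{{{
$ bin/hadoop dfsadmin -finalizeUpgrade
}}}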
-----

== Rollback ==

 * Stop the cluster:
{{{
$ bin/stop-dfs.sh
}}}
 * Deploy the old version of Hadoop

 * Roll back to the previous version:
{{{
$ bin/start-dfs.sh -rollback
}}}
 PS: bin/hadoop namenode -rollback is introduced later on; it would be worth checking how it differs from $ bin/start-dfs.sh -rollback


-----
== fsck ==
 * The HDFS filesystem checking tool

{{{
$ bin/hadoop fsck /

.
/user/waue/input/1.txt:  Under replicated blk_-90085106852013388_1001. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/1.txt:  Under replicated blk_-4027196261436469955_1001. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/2.txt:  Under replicated blk_-2300843106107816641_1002. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/3.txt:  Under replicated blk_-1561577350198661966_1003. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/4.txt:  Under replicated blk_1316726598778579026_1004. Target Replicas is 3 but found 2 replica(s).
Status: HEALTHY
 Total size:    143451003 B
 Total dirs:    8
 Total files:   4
 Total blocks (validated):      5 (avg. block size 28690200 B)
 Minimally replicated blocks:   5 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       5 (100.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.0
 Corrupt blocks:                0
 Missing replicas:              5 (50.0 %)
 Number of data-nodes:          2
 Number of racks:               1
The filesystem under path '/' is HEALTHY
}}}
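 * The under-replication reported above is expected here: the default replication factor is 3 but there are only 2 datanodes. One way to clear the warnings is to lower the replication of the existing files with the standard setrep subcommand (-w waits until the change completes):
{{{
$ bin/hadoop dfs -setrep -R -w 2 /user/waue/input
}}}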

 * Different options serve different purposes, for example:
{{{
$ bin/hadoop fsck / -files

/tmp <dir>
/tmp/hadoop <dir>
/tmp/hadoop/hadoop-waue <dir>
/tmp/hadoop/hadoop-waue/mapred <dir>
/tmp/hadoop/hadoop-waue/mapred/system <dir>
/user <dir>
/user/waue <dir>
/user/waue/input <dir>
/user/waue/input/1.txt 115045564 bytes, 2 block(s):  Under replicated blk_-90085106852013388_1001. Target Replicas is 3 but found 2 replica(s).
 Under replicated blk_-4027196261436469955_1001. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/2.txt 987864 bytes, 1 block(s):  Under replicated blk_-2300843106107816641_1002. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/3.txt 1573048 bytes, 1 block(s):  Under replicated blk_-1561577350198661966_1003. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/4.txt 25844527 bytes, 1 block(s):  Under replicated blk_1316726598778579026_1004. Target Replicas is 3 but found 2 replica(s).
Status: HEALTHY
....(same as above)
}}}
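 * fsck flags can also be combined; for example, adding -blocks and -locations to -files prints each block id together with the datanodes that hold it:
{{{
$ bin/hadoop fsck /user/waue/input -files -blocks -locations
}}}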

-----

== job ==
 * Used to communicate with MapReduce job processes
 * Before testing this command, make sure a MapReduce job has already been run
 * The job id can be found on the JobTracker web page at port 50030
=== -status ===
 * Check a job's status:
{{{
$ bin/hadoop job -status job_200904021140_0001

}}}
=== -kill ===
 * Kill the running job whose id is job_200904021140_0001:
{{{
$ bin/hadoop job -kill job_200904021140_0001
}}}
=== -list ===
 * Print the status of all jobs:
{{{
$ bin/hadoop job -list all

5 jobs submitted
States are:
        Running : 1     Succeded : 2    Failed : 3      Prep : 4
JobId   State   StartTime       UserName
job_200904021140_0001   2       1238652150499   waue
job_200904021140_0002   3       1238657754096   waue
job_200904021140_0004   3       1238657989495   waue
job_200904021140_0005   2       1238658076347   waue
job_200904021140_0006   2       1238658644666   waue
}}}
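 * Without the all argument, -list only shows jobs that have not yet completed:
{{{
$ bin/hadoop job -list
}}}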

=== -history ===
 * Print a job's history:
{{{
$ bin/hadoop job -history /user/waue/stream-output1

Hadoop job: job_200904021140_0005
=====================================
Job tracker host name: gm1.nchc.org.tw
job tracker start time: Thu Apr 02 11:40:06 CST 2009
User: waue
JobName: streamjob9019.jar
JobConf: hdfs://gm1.nchc.org.tw:9000/tmp/hadoop/hadoop-waue/mapred/system/job_200904021140_0005/job.xml
Submitted At: 2-Apr-2009 15:41:16
Launched At: 2-Apr-2009 15:41:16 (0sec)
Finished At: 2-Apr-2009 15:42:04 (48sec)
Status: SUCCESS
=====================================
...(omitted)
}}}
-----

== version ==
 * Print the current Hadoop version:
{{{
$ bin/hadoop version

Hadoop 0.18.3
Subversion https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250
Compiled by ndaley on Thu Jan 22 23:12:08 UTC 2009
}}}

-----
= HDFS Permission Management =

 * HDFS permissions have three classes: owner, group, and other
 * A user's identity is determined by the user on the client (via whoami), and the groups by (bash -c groups); see the check after the next block
 * Related operations:
{{{
$ bin/hadoop dfs -mkdir own
$ bin/hadoop dfs -chmod -R 755 own
$ bin/hadoop dfs -chgrp -R waue own
$ bin/hadoop dfs -chown -R waue own
$ bin/hadoop dfs -lsr own
}}}
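 * To see which identity and groups HDFS will use for you, run the checks mentioned above on the client (the output depends on your account):
{{{
$ whoami
$ bash -c groups
}}}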
 * Parameters usable in conf/hadoop-site.xml (see the XML sketch after this block):
{{{
#!sh
dfs.permissions = true
dfs.web.ugi = webuser,webgroup
dfs.permissions.supergroup = supergroup
dfs.upgrade.permission = 777
dfs.umask = 022
}}}
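 * Inside conf/hadoop-site.xml each of these is written as an XML property; a minimal sketch for the first entry:
{{{
<property>
  <name>dfs.permissions</name>
  <value>true</value>
</property>
}}}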

=== -setQuota ===
 * A directory quota is a hard limit on the number of names in the tree rooted at that directory
 * The number given when setting a quota counts names, not size (e.g. uploading one file of 2 blocks succeeds, while uploading two very small files fails)
 * A quota of 1 forces a directory to remain empty
 * Renaming does not change a directory's quota
{{{
$ bin/hadoop dfs -mkdir quota
$ bin/hadoop dfsadmin -setQuota 2 quota
$ bin/hadoop dfs -put ../conf/hadoop-env.sh quota/
$ bin/hadoop dfs -put ../conf/hadoop-site.xml quota/

put: org.apache.hadoop.dfs.QuotaExceededException: The quota of /user/waue/quota is exceeded: quota=2 count=3
}}}

 * To check a directory's quota, use "bin/hadoop fs -count -q <dir>"; the columns are quota, remaining quota, directory count, file count, content size, and path

{{{
$ bin/hadoop fs -count -q own
   none  inf  1    0     0 hdfs://gm1.nchc.org.tw:9000/user/waue/own
$ bin/hadoop dfsadmin -setQuota 4 own
$ bin/hadoop fs -count -q own
    4     3   1    0     0 hdfs://gm1.nchc.org.tw:9000/user/waue/own
}}}

=== -clrQuota ===
 * Clear a previously set quota:
{{{
$ bin/hadoop dfsadmin -clrQuota quota/
}}}
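 * Whether the quota is gone can be verified with the same count command as before; the quota column should read none again:
{{{
$ bin/hadoop fs -count -q quota
}}}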

-----
= Using the Hadoop Streaming Library =
 * Hadoop Streaming is a Hadoop utility that helps users create and run a special class of map/reduce jobs, in which executables or script files serve as the mapper or the reducer
 * The simplest way to run a streaming map/reduce job from the shell:
{{{
$ bin/hadoop jar hadoop-0.18.3-streaming.jar -input input -output stream-output1 -mapper /bin/cat -reducer /usr/bin/wc
}}}
   * The output is (lines, words, characters):
{{{
#!sh
2910628 24507806 143451003
}}}
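 * For intuition: with /bin/cat as the mapper and /usr/bin/wc as the reducer, the job simply runs wc over all the input records, so (assuming the same files are also kept locally under input/) a local pipeline would print matching totals; note the character count equals the total size reported by fsck earlier:
{{{
$ cat input/* | wc
}}}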