== Upgrading ==
* Switching versions would inevitably replace the conf files inside the install directory, so the current approach is: move conf to /opt/conf, and point both hadoop 0.16 and hadoop 0.18 at it with an ln symlink. Since conf no longer lives inside HADOOP_HOME, remember to source conf/hadoop-env.sh:
{{{
$ source /opt/conf/hadoop-env.sh
}}}
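The symlink swap described above can be sketched as follows; all paths under /tmp/demo are illustrative stand-ins for /opt/conf and the real install trees:
{{{
#!sh
# Sketch of the shared-conf layout (/tmp/demo stands in for the real paths).
rm -rf /tmp/demo
mkdir -p /tmp/demo/opt/conf /tmp/demo/hadoop-0.18
echo 'export JAVA_HOME=/usr/lib/jvm/java' > /tmp/demo/opt/conf/hadoop-env.sh
# Replace the per-version conf directory with a symlink to the shared one.
rm -rf /tmp/demo/hadoop-0.18/conf
ln -s /tmp/demo/opt/conf /tmp/demo/hadoop-0.18/conf
# The version-local path now resolves to the shared file:
cat /tmp/demo/hadoop-0.18/conf/hadoop-env.sh
}}}
Repeating the same `ln -s` inside the 0.16 tree gives both versions the identical configuration.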

* Check the upgrade status first
{{{
$ bin/hadoop dfsadmin -upgradeProgress status

There are no upgrades in progress.
}}}

* Stop HDFS
* Note: do not use bin/stop-all.sh for this
{{{
$ bin/stop-dfs.sh
}}}

* Deploy the new version of Hadoop
* Note: every node must run the same version, or problems will occur


* Start HDFS in upgrade mode
{{{
$ bin/start-dfs.sh -upgrade
}}}

ps: bin/hadoop namenode -upgrade is introduced later on; it is worth checking how it differs from $ bin/start-dfs.sh -upgrade (start-dfs.sh appears to simply forward the -upgrade flag to the namenode daemon it starts)

* The namenode web UI shows the upgrade status


-----

== Rolling back ==

* Stop the cluster
{{{
$ bin/stop-dfs.sh
}}}
* Deploy the old version of Hadoop

* Roll back to the previous version
{{{
$ bin/start-dfs.sh -rollback
}}}
ps: bin/hadoop namenode -rollback is introduced later on; it is worth checking how it differs from $ bin/start-dfs.sh -rollback


-----
== fsck ==
* The HDFS file system checking utility

{{{
$ bin/hadoop fsck /

.
/user/waue/input/1.txt: Under replicated blk_-90085106852013388_1001. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/1.txt: Under replicated blk_-4027196261436469955_1001. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/2.txt: Under replicated blk_-2300843106107816641_1002. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/3.txt: Under replicated blk_-1561577350198661966_1003. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/4.txt: Under replicated blk_1316726598778579026_1004. Target Replicas is 3 but found 2 replica(s).
Status: HEALTHY
Total size: 143451003 B
Total dirs: 8
Total files: 4
Total blocks (validated): 5 (avg. block size 28690200 B)
Minimally replicated blocks: 5 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 5 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 5 (50.0 %)
Number of data-nodes: 2
Number of racks: 1
The filesystem under path '/' is HEALTHY
}}}
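Since fsck prints one "Under replicated" line per affected block, the report is easy to post-process with standard tools. A minimal sketch, using a few lines copied from the sample report above into a scratch file:
{{{
#!sh
# Save some fsck report lines (inlined here for illustration) ...
cat > /tmp/fsck-report.txt <<'EOF'
/user/waue/input/1.txt: Under replicated blk_-90085106852013388_1001. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/1.txt: Under replicated blk_-4027196261436469955_1001. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/2.txt: Under replicated blk_-2300843106107816641_1002. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/3.txt: Under replicated blk_-1561577350198661966_1003. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/4.txt: Under replicated blk_1316726598778579026_1004. Target Replicas is 3 but found 2 replica(s).
EOF
# ... then count the under-replicated blocks.
grep -c 'Under replicated' /tmp/fsck-report.txt    # prints 5
}}}
On a live cluster the same count could be taken directly from the pipe: bin/hadoop fsck / | grep -c 'Under replicated'.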

* Different flags serve different purposes, e.g.
{{{
$ bin/hadoop fsck / -files

/tmp <dir>
/tmp/hadoop <dir>
/tmp/hadoop/hadoop-waue <dir>
/tmp/hadoop/hadoop-waue/mapred <dir>
/tmp/hadoop/hadoop-waue/mapred/system <dir>
/user <dir>
/user/waue <dir>
/user/waue/input <dir>
/user/waue/input/1.txt 115045564 bytes, 2 block(s): Under replicated blk_-90085106852013388_1001. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-4027196261436469955_1001. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/2.txt 987864 bytes, 1 block(s): Under replicated blk_-2300843106107816641_1002. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/3.txt 1573048 bytes, 1 block(s): Under replicated blk_-1561577350198661966_1003. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/4.txt 25844527 bytes, 1 block(s): Under replicated blk_1316726598778579026_1004. Target Replicas is 3 but found 2 replica(s).
Status: HEALTHY
....(same as above)
}}}

-----

== job ==
* Used to interact with MapReduce jobs
* Before trying this command, make sure a MapReduce job has been run at least once
* Job IDs can be found on the JobTracker web UI (port 50030)
=== -status ===
* Show the status of a job
{{{
$ bin/hadoop job -status job_200904021140_0001

}}}
=== -kill ===
* Kill the running job whose id is job_200904021140_0001
{{{
$ bin/hadoop job -kill job_200904021140_0001
}}}
=== -list ===
* Print the status of all jobs
{{{
$ bin/hadoop job -list all

5 jobs submitted
States are:
Running : 1 Succeded : 2 Failed : 3 Prep : 4
JobId State StartTime UserName
job_200904021140_0001 2 1238652150499 waue
job_200904021140_0002 3 1238657754096 waue
job_200904021140_0004 3 1238657989495 waue
job_200904021140_0005 2 1238658076347 waue
job_200904021140_0006 2 1238658644666 waue
}}}
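The State column is numeric; using the legend printed by the command, it can be mapped back to names, for example with awk. The rows below are copies of two rows from the listing above (written tab-separated, as the real output is):
{{{
#!sh
# Two sample rows from the -list output, tab-separated.
printf 'job_200904021140_0001\t2\t1238652150499\twaue\n'  > /tmp/joblist.txt
printf 'job_200904021140_0002\t3\t1238657754096\twaue\n' >> /tmp/joblist.txt
# Map the numeric State column to a name, per the legend
# (1=Running, 2=Succeeded, 3=Failed, 4=Prep).
awk -F'\t' '{ s[1]="Running"; s[2]="Succeeded"; s[3]="Failed"; s[4]="Prep";
              print $1, s[$2] }' /tmp/joblist.txt
}}}
This prints "job_200904021140_0001 Succeeded" and "job_200904021140_0002 Failed".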

=== -history ===
* Print a job's history
{{{
$ bin/hadoop job -history /user/waue/stream-output1

Hadoop job: job_200904021140_0005
=====================================
Job tracker host name: gm1.nchc.org.tw
job tracker start time: Thu Apr 02 11:40:06 CST 2009
User: waue
JobName: streamjob9019.jar
JobConf: hdfs://gm1.nchc.org.tw:9000/tmp/hadoop/hadoop-waue/mapred/system/job_200904021140_0005/job.xml
Submitted At: 2-Apr-2009 15:41:16
Launched At: 2-Apr-2009 15:41:16 (0sec)
Finished At: 2-Apr-2009 15:42:04 (48sec)
Status: SUCCESS
=====================================
...(truncated)
}}}
-----

== version ==
* Print the current Hadoop version
{{{
bin/hadoop version

Hadoop 0.18.3
Subversion https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250
Compiled by ndaley on Thu Jan 22 23:12:08 UTC 2009
}}}

-----
= HDFS Permission Management =

* HDFS permissions have three classes: owner, group and other
* A user's identity is determined by the client-side account: the user name comes from whoami, and the group list from bash -c groups
* Related operations:
{{{
$ bin/hadoop dfs -mkdir own
$ bin/hadoop dfs -chmod -R 755 own
$ bin/hadoop dfs -chgrp -R waue own
$ bin/hadoop dfs -chown -R waue own
$ bin/hadoop dfs -lsr own
}}}
* Parameters usable in conf/hadoop-site.xml:
{{{
dfs.permissions = true
dfs.web.ugi = webuser,webgroup
dfs.permissions.supergroup = supergroup
dfs.upgrade.permission = 777
dfs.umask = 022
}}}
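dfs.umask works like the POSIX umask: bits set in the mask are removed from newly created files and directories. A local-shell analogue of the 022 default (the /tmp/umask-demo path is illustrative, and stat -c assumes GNU coreutils):
{{{
#!sh
# With umask 022, group/other lose the write bit on new entries:
# files default to 666 & ~022 = 644, directories to 777 & ~022 = 755.
rm -rf /tmp/umask-demo
mkdir -p /tmp/umask-demo
(
  umask 022
  touch /tmp/umask-demo/f
  mkdir /tmp/umask-demo/d
)
stat -c '%a %n' /tmp/umask-demo/f /tmp/umask-demo/d
}}}
This prints 644 for the file and 755 for the directory, matching the 755 used with -chmod above.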

=== -setQuota ===
* A directory quota is a hard limit on the number of names in the tree under that directory
* Set the quota; the number counts names, not size (e.g. one file made of 2 blocks uploads fine, but uploading two tiny files fails)
* A quota of 1 forces a directory to stay empty
* Renaming a directory does not change its quota
{{{
$ bin/hadoop dfs -mkdir quota
$ bin/hadoop dfsadmin -setQuota 2 quota
$ bin/hadoop dfs -put ../conf/hadoop-env.sh quota/
$ bin/hadoop dfs -put ../conf/hadoop-site.xml quota/

put: org.apache.hadoop.dfs.QuotaExceededException: The quota of /user/waue/quota is exceeded: quota=2 count=3
}}}

* To check a directory's quota: "bin/hadoop fs -count -q <directory>"
* The columns appear to be: quota, remaining quota, directory count, file count, content size, and path ("none inf" means no quota is set)

{{{
$ bin/hadoop fs -count -q own
none inf 1 0 0 hdfs://gm1.nchc.org.tw:9000/user/waue/own
$ bin/hadoop dfsadmin -setQuota 4 own
$ bin/hadoop fs -count -q own
4 3 1 0 0 hdfs://gm1.nchc.org.tw:9000/user/waue/own
}}}

=== -clrQuota ===
* Clear a previously set quota
{{{
$ bin/hadoop dfsadmin -clrQuota quota/
}}}

-----
= Hadoop Streaming Library Usage =
* Hadoop Streaming is a Hadoop utility that lets users create and run a special class of map/reduce jobs, in which an executable or script serves as the mapper or the reducer
* The simplest way to run a streaming map/reduce job from the shell:
{{{
$ bin/hadoop jar hadoop-0.18.3-streaming.jar -input input -output stream-output1 -mapper /bin/cat -reducer /usr/bin/wc
}}}
* The output is (lines, words, characters):
{{{
#!sh
2910628 24507806 143451003
}}}
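Conceptually, this job pipes each input split through /bin/cat and feeds the map output to /usr/bin/wc, so it can be mimicked locally; a sketch with a toy file (on a real cluster, split boundaries and the shuffle make the data flow more involved):
{{{
#!sh
# Toy input file standing in for the HDFS 'input' directory.
printf 'hello world\nfoo bar baz\n' > /tmp/stream-demo.txt
# -mapper /bin/cat -reducer /usr/bin/wc is roughly this pipeline;
# prints 2 5 24 (lines, words, characters), with wc's column padding.
cat /tmp/stream-demo.txt | /bin/cat | wc
}}}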