== Content 1: Basic HDFS Shell Commands ==

=== 1.1 Browsing Your HDFS Folder ===

{{{
~$ hadoop fs -ls
Found 1 items
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
~$ hadoop fs -lsr
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
}}}

=== 1.2 Upload Files or Folders to HDFS ===

* Upload

{{{
~$ hadoop fs -put /etc/hadoop/conf input
}}}

* Check

{{{
~$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:16 /user/hXXXX/input
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
~$ hadoop fs -ls input
Found 25 items
-rw-r--r--   2 hXXXX supergroup        321 2011-04-19 09:16 /user/hXXXX/input/README
-rw-r--r--   2 hXXXX supergroup       3936 2011-04-19 09:16 /user/hXXXX/input/capacity-scheduler.xml
-rw-r--r--   2 hXXXX supergroup        196 2011-04-19 09:16 /user/hXXXX/input/commons-logging.properties
(.... skip ....)
}}}

=== 1.3 Download HDFS Files or Folders to Local ===

* Download

{{{
~$ hadoop fs -get input fromHDFS
}}}

* Check
{{{
~$ ls -al | grep fromHDFS
drwxr-xr-x  2 hXXXX hXXXX 4096 2011-04-19 09:18 fromHDFS
~$ ls -al fromHDFS
total 160
drwxr-xr-x 2 hXXXX hXXXX 4096 2011-04-19 09:18 .
drwx--x--x 3 hXXXX hXXXX 4096 2011-04-19 09:18 ..
-rw-r--r-- 1 hXXXX hXXXX 3936 2011-04-19 09:18 capacity-scheduler.xml
-rw-r--r-- 1 hXXXX hXXXX  196 2011-04-19 09:18 commons-logging.properties
-rw-r--r-- 1 hXXXX hXXXX  535 2011-04-19 09:18 configuration.xsl
(.... skip ....)
~$ diff /etc/hadoop/conf fromHDFS/
}}}

=== 1.4 Remove Files or Folders ===

{{{
~$ hadoop fs -ls input/masters
Found 1 items
-rw-r--r--   2 hXXXX supergroup         10 2011-04-19 09:16 /user/hXXXX/input/masters
~$ hadoop fs -rm input/masters
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/input/masters
}}}

=== 1.5 Browse Files Directly ===

{{{
~$ hadoop fs -ls input/slaves
Found 1 items
-rw-r--r--   2 hXXXX supergroup         10 2011-04-19 09:16 /user/hXXXX/input/slaves
~$ hadoop fs -cat input/slaves
localhost
}}}

=== 1.6 More Commands -- Help Message ===

{{{
hXXXX@hadoop:~$ hadoop fs

Usage: java FsShell
           [-ls <path>]
           [-lsr <path>]
           [-du <path>]
           [-dus <path>]
           [-count[-q] <path>]
           [-mv <src> <dst>]
           [-cp <src> <dst>]
           [-rm <path>]
           [-rmr <path>]
           [-expunge]
           [-put <localsrc> ... <dst>]
           [-copyFromLocal <localsrc> ... <dst>]
           [-moveFromLocal <localsrc> ... <dst>]
           [-get [-ignoreCrc] [-crc] <src> <localdst>]
           [-getmerge <src> <localdst> [addnl]]
           [-cat <src>]
           [-text <src>]
           [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
           [-moveToLocal [-crc] <src> <localdst>]
           [-mkdir <path>]
           [-setrep [-R] [-w] <rep> <path/file>]
           [-touchz <path>]
           [-test -[ezd] <path>]
           [-stat [format] <path>]
           [-tail [-f] <file>]
           [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
           [-chown [-R] [OWNER][:[GROUP]] PATH...]
           [-chgrp [-R] GROUP PATH...]
           [-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
hadoop command [genericOptions] [commandOptions]
}}}

== Content 2: Using the Web GUI to Browse HDFS ==

* [http://hadoop.nchc.org.tw:50030 JobTracker Web Interface]
* [http://hadoop.nchc.org.tw:50070 NameNode Web Interface]

== Content 3: More about the HDFS Shell ==

* The general form is hadoop fs <args>; the sections below show how each <args> is used.
* By default, your working directory is /user/<$username>/.
{{{
$ hadoop fs -ls input
Found 25 items
-rw-r--r--   2 hXXXX supergroup        321 2011-04-19 09:16 /user/hXXXX/input/README
-rw-r--r--   2 hXXXX supergroup       3936 2011-04-19 09:16 /user/hXXXX/input/capacity-scheduler.xml
-rw-r--r--   2 hXXXX supergroup        196 2011-04-19 09:16 /user/hXXXX/input/commons-logging.properties
(.... skip ....)
}}}
* You can also give an __''absolute path''__ of the form '''hdfs://node:port/path''', for example:
{{{
$ hadoop fs -ls hdfs://hadoop.nchc.org.tw/user/hXXXX/input
Found 25 items
-rw-r--r--   2 hXXXX supergroup        321 2011-04-19 09:16 /user/hXXXX/input/README
-rw-r--r--   2 hXXXX supergroup       3936 2011-04-19 09:16 /user/hXXXX/input/capacity-scheduler.xml
-rw-r--r--   2 hXXXX supergroup        196 2011-04-19 09:16 /user/hXXXX/input/commons-logging.properties
(.... skip ....)
}}}
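
* A relative path such as ''input'' is resolved against your HDFS home directory. A minimal sketch of how the absolute form is assembled (the namenode host is the one used throughout this tutorial; the hXXXX fallback account name is a placeholder):

```shell
# Sketch: build the absolute HDFS URI for a relative path.
NAMENODE="hdfs://hadoop.nchc.org.tw"   # namenode host from this tutorial
: "${USER:=hXXXX}"                     # fall back to a placeholder account
REL="input"
ABS="${NAMENODE}/user/${USER}/${REL}"
echo "$ABS"
```

Both forms name the same folder, so ''hadoop fs -ls "$ABS"'' would list the same files as ''hadoop fs -ls input''.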

=== -cat ===

* Print the content of the given file to STDOUT
{{{
$ hadoop fs -cat input/hadoop-env.sh
}}}

=== -chgrp ===

* Change the '''owner group''' of the given file or folder
{{{
$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:16 /user/hXXXX/input
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
$ hadoop fs -chgrp -R ${USER} input
$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - hXXXX hXXXX               0 2011-04-19 09:21 /user/hXXXX/input
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
}}}

=== -chmod ===

* Change the '''read/write/execute permissions''' of the given file or folder
{{{
$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - hXXXX hXXXX               0 2011-04-19 09:21 /user/hXXXX/input
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
$ hadoop fs -chmod -R 777 input
$ hadoop fs -ls
Found 2 items
drwxrwxrwx   - hXXXX hXXXX               0 2011-04-19 09:21 /user/hXXXX/input
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
}}}

=== -chown ===

* Change the '''owner''' of the given file or folder
{{{
$ hadoop fs -chown -R ${USER} input
}}}
* Note: you do not have superuser permission on hadoop.nchc.org.tw, so trying to change the owner to another user produces an error like the following:
{{{
$ hadoop fs -chown -R h1000 input
chown: changing ownership of 'hdfs://hadoop.nchc.org.tw/user/hXXXX/input':org.apache.hadoop.security.AccessControlException: Non-super user cannot change owner.
}}}

=== -copyFromLocal, -put ===

* Both commands copy the given file or folder from local to HDFS
{{{
$ hadoop fs -copyFromLocal /etc/hadoop/conf dfs_input
}}}

=== -copyToLocal, -get ===

* Both commands copy the given file or folder from HDFS to local
{{{
$ hadoop fs -copyToLocal dfs_input input1
}}}

=== -cp ===

* Copy the given file or folder from an HDFS source path to an HDFS target path
{{{
$ hadoop fs -cp input input1
}}}

=== -du ===

* Display the size of each file in the given folder
{{{
$ hadoop fs -du input
Found 24 items
321         hdfs://hadoop.nchc.org.tw/user/hXXXX/input/README
3936        hdfs://hadoop.nchc.org.tw/user/hXXXX/input/capacity-scheduler.xml
196         hdfs://hadoop.nchc.org.tw/user/hXXXX/input/commons-logging.properties
( .... skip .... )
}}}

=== -dus ===

* Display the total size of the given folder or file
{{{
$ hadoop fs -dus input
hdfs://hadoop.nchc.org.tw/user/hXXXX/input      84218
}}}

=== -expunge ===

* Empty the trash
{{{
$ hadoop fs -expunge
}}}

=== -getmerge ===

* Merge all files under the HDFS source folder <src> into a single local file <localdst>
{{{
$ hadoop fs -getmerge <src> <localdst>
}}}
{{{
$ mkdir -p in1
$ echo "this is one; " >> in1/input
$ echo "this is two; " >> in1/input2
$ hadoop fs -put in1 in1
$ hadoop fs -getmerge in1 merge.txt
$ cat ./merge.txt
}}}
* You should see results like this:
{{{
this is one;
this is two;
}}}
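
* Conceptually, ''-getmerge'' just concatenates every file under <src> in order. The same merged result as above can be reproduced with purely local commands (a sketch reusing the in1 files created above; the merge_local.txt name is arbitrary and no Hadoop is needed):

```shell
# Recreate the two small files used in the getmerge example above.
mkdir -p in1
echo "this is one; " > in1/input
echo "this is two; " > in1/input2
# What -getmerge does, expressed locally: concatenate the folder's files.
cat in1/input in1/input2 > merge_local.txt
cat merge_local.txt
```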

=== -ls ===

* List files and folders
* For a file: <file name> <replication> <size> <modified date> <modified time> <permission> <user id> <group id>
* For a folder: <folder name> <modified date> <modified time> <permission> <user id> <group id>
{{{
$ hadoop fs -ls
Found 5 items
drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:32 /user/hXXXX/dfs_input
drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:34 /user/hXXXX/in1
drwxrwxrwx   - hXXXX hXXXX               0 2011-04-19 09:21 /user/hXXXX/input
drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:33 /user/hXXXX/input1
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
}}}

=== -lsr ===

* The recursive version of ls
{{{
$ hadoop fs -lsr in1
-rw-r--r--   2 hXXXX supergroup         14 2011-04-19 09:34 /user/hXXXX/in1/input
-rw-r--r--   2 hXXXX supergroup         14 2011-04-19 09:34 /user/hXXXX/in1/input2
}}}

=== -mkdir ===

* Create directories
{{{
$ hadoop fs -mkdir a b c
}}}

=== -moveFromLocal ===

* Move the given local file or folder to HDFS (the local copy is deleted)
{{{
$ hadoop fs -moveFromLocal in1 in2
}}}

=== -mv ===

* Move or rename a file or folder
{{{
$ hadoop fs -mv in2 in3
}}}

=== -rm ===

* Remove the given files (not folders)
{{{
$ hadoop fs -rm in1/input
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/in1/input
}}}
=== -rmr ===

* Remove the given files and folders recursively
{{{
$ hadoop fs -rmr a b c dfs_input in3 input input1
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/a
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/b
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/c
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/dfs_input
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/in3
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/input
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/input1
}}}

=== -setrep ===

* Set the replication factor of the given files or folder
{{{
$ hadoop fs -setrep [-R] [-w] <rep> <path/file>
}}}
{{{
$ hadoop fs -setrep -w 2 -R in1
Replication 2 set: hdfs://hadoop.nchc.org.tw/user/hXXXX/in1/input2
Waiting for hdfs://hadoop.nchc.org.tw/user/hXXXX/in1/input2 ... done
}}}

=== -stat ===

* Print the modification time of the given path
{{{
$ hadoop fs -stat in1
2011-04-19 09:34:49
}}}
=== -tail ===

* Display the last 1KB of the given file
* Usage
{{{
hadoop fs -tail [-f] <path/file> (-f keeps printing data as it is appended to the file)
}}}
{{{
$ hadoop fs -tail in1/input2
this is two;
}}}

=== -test ===

* Test files or folders:[[BR]] -e : check whether the file or folder exists[[BR]] -z : check whether the file is zero length[[BR]] -d : check whether the path is a folder
* The result is the command's exit status, read with '''echo $?''' ( 0 = true, 1 = false )
* Usage
{{{
$ hadoop fs -test -[ezd] URI
}}}

{{{
$ hadoop fs -test -e in1/input2
$ echo $?
0
$ hadoop fs -test -z in1/input3
$ echo $?
1
$ hadoop fs -test -d in1/input2
$ echo $?
1
}}}
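
* As the transcript above shows, ''hadoop fs -test'' reports its result through the exit status using the ordinary Unix convention (0 = true, non-zero = false), the same one the local test(1) command uses. The convention can be seen with purely local files (the empty_file name is arbitrary):

```shell
# Exit-status convention: 0 means true, non-zero means false.
touch empty_file                   # create an empty local file
test -e empty_file; exists=$?      # file exists?     -> 0
test -s empty_file; nonempty=$?    # file non-empty?  -> 1 (it is empty)
test -d empty_file; isdir=$?       # is a directory?  -> 1 (it is a file)
echo "exists=$exists nonempty=$nonempty isdir=$isdir"
rm -f empty_file
```

The same pattern drives shell conditionals, e.g. ''hadoop fs -test -e in1/input2 && echo present''.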

=== -text ===

* Output the given file (e.g. a compressed file or TextRecordInputStream) as plain text to STDOUT
{{{
$ hadoop fs -text <src>
}}}
{{{
$ gzip merge.txt
$ hadoop fs -put merge.txt.gz .
$ hadoop fs -text merge.txt.gz
11/04/19 09:54:16 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/04/19 09:54:16 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
this is one;
this is two;
}}}
* Note: there is no codec for zip files yet, so zip archives come out garbled:
{{{
$ gunzip merge.txt.gz
$ zip merge.zip merge.txt
$ hadoop fs -put merge.zip .
$ hadoop fs -text merge.zip
PK�N�>E73 merge.txtUT ���Mq��Mux
��+��,V���Tk�(��<�PK�N�>E73 ��merge.txtUT���Mux
��PKOY
}}}

=== -touchz ===

* Create an empty file
{{{
$ hadoop fs -touchz in1/kk
$ hadoop fs -test -z in1/kk
$ echo $?
0
}}}

----

* You can clean up the temporary folders and files from this exercise with the following commands:
{{{
~$ hadoop fs -rmr in1 merge.txt.gz merge.zip
~$ rm -rf input1/ fromHDFS/ merge.zip
}}}