Version 4 (modified by jazz, 13 years ago)
Lab 4
HDFS Fully Distributed Mode in Practice
For the exercises below, connect to hadoop.nchc.org.tw. In the examples, hXXXX stands for your user name.
Content 1: Basic HDFS Shell Commands
1.1 Browsing Your HDFS Folder
~$ hadoop fs -ls
Found 1 items
drwxr-xr-x - hXXXX supergroup 0 2010-01-24 17:23 /user/hXXXX/tmp
~$ hadoop fs -lsr
drwxr-xr-x - hXXXX supergroup 0 2010-01-24 17:23 /user/hXXXX/tmp
1.2 Upload Files or Folders to HDFS
- Upload
~$ hadoop fs -put /etc/hadoop/conf input
- Check
~$ hadoop fs -ls
Found 2 items
drwxr-xr-x - hXXXX supergroup 0 2011-04-19 09:16 /user/hXXXX/input
drwxr-xr-x - hXXXX supergroup 0 2010-01-24 17:23 /user/hXXXX/tmp
~$ hadoop fs -ls input
Found 25 items
-rw-r--r-- 2 hXXXX supergroup 321 2011-04-19 09:16 /user/hXXXX/input/README
-rw-r--r-- 2 hXXXX supergroup 3936 2011-04-19 09:16 /user/hXXXX/input/capacity-scheduler.xml
-rw-r--r-- 2 hXXXX supergroup 196 2011-04-19 09:16 /user/hXXXX/input/commons-logging.properties
(.... skip ....)
1.3 Download HDFS Files or Folders to Local
- Download
~$ hadoop fs -get input fromHDFS
- Check
~$ ls -al | grep fromHDFS
drwxr-xr-x 2 hXXXX hXXXX 4096 2011-04-19 09:18 fromHDFS
~$ ls -al fromHDFS
total 160
drwxr-xr-x 2 hXXXX hXXXX 4096 2011-04-19 09:18 .
drwx--x--x 3 hXXXX hXXXX 4096 2011-04-19 09:18 ..
-rw-r--r-- 1 hXXXX hXXXX 3936 2011-04-19 09:18 capacity-scheduler.xml
-rw-r--r-- 1 hXXXX hXXXX 196 2011-04-19 09:18 commons-logging.properties
-rw-r--r-- 1 hXXXX hXXXX 535 2011-04-19 09:18 configuration.xsl
(.... skip ....)
~$ diff /etc/hadoop/conf fromHDFS/
1.4 Remove Files or Folders
~$ hadoop fs -ls input/masters
Found 1 items
-rw-r--r-- 2 hXXXX supergroup 10 2011-04-19 09:16 /user/hXXXX/input/masters
~$ hadoop fs -rm input/masters
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/input/masters
1.5 View Files Directly
~$ hadoop fs -ls input/slaves
Found 1 items
-rw-r--r-- 2 hXXXX supergroup 10 2011-04-19 09:16 /user/hXXXX/input/slaves
~$ hadoop fs -cat input/slaves
localhost
1.6 More Commands -- Help Message
hXXXX@hadoop:~$ hadoop fs
Usage: java FsShell
    [-ls <path>]
    [-lsr <path>]
    [-du <path>]
    [-dus <path>]
    [-count[-q] <path>]
    [-mv <src> <dst>]
    [-cp <src> <dst>]
    [-rm <path>]
    [-rmr <path>]
    [-expunge]
    [-put <localsrc> ... <dst>]
    [-copyFromLocal <localsrc> ... <dst>]
    [-moveFromLocal <localsrc> ... <dst>]
    [-get [-ignoreCrc] [-crc] <src> <localdst>]
    [-getmerge <src> <localdst> [addnl]]
    [-cat <src>]
    [-text <src>]
    [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
    [-moveToLocal [-crc] <src> <localdst>]
    [-mkdir <path>]
    [-setrep [-R] [-w] <rep> <path/file>]
    [-touchz <path>]
    [-test -[ezd] <path>]
    [-stat [format] <path>]
    [-tail [-f] <file>]
    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    [-chown [-R] [OWNER][:[GROUP]] PATH...]
    [-chgrp [-R] GROUP PATH...]
    [-help [cmd]]

Generic options supported are
-conf <configuration file>    specify an application configuration file
-D <property=value>    use value for given property
-fs <local|namenode:port>    specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
hadoop command [genericOptions] [commandOptions]
Content 2: Use the Web GUI to Browse HDFS
Content 3: More about HDFS Shell
- hadoop fs <args>: the usage of each <args> is listed below, with examples.
- By default, relative paths are resolved against your home directory, /user/<$username>/.
$ hadoop fs -ls input
Found 25 items
-rw-r--r-- 2 hXXXX supergroup 321 2011-04-19 09:16 /user/hXXXX/input/README
-rw-r--r-- 2 hXXXX supergroup 3936 2011-04-19 09:16 /user/hXXXX/input/capacity-scheduler.xml
-rw-r--r-- 2 hXXXX supergroup 196 2011-04-19 09:16 /user/hXXXX/input/commons-logging.properties
(.... skip ....)
- Alternatively, give an absolute path of the form hdfs://node:port/path, for example:
$ hadoop fs -ls hdfs://hadoop.nchc.org.tw/user/hXXXX/input
Found 25 items
-rw-r--r-- 2 hXXXX supergroup 321 2011-04-19 09:16 /user/hXXXX/input/README
-rw-r--r-- 2 hXXXX supergroup 3936 2011-04-19 09:16 /user/hXXXX/input/capacity-scheduler.xml
-rw-r--r-- 2 hXXXX supergroup 196 2011-04-19 09:16 /user/hXXXX/input/commons-logging.properties
(.... skip ....)
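The two listings above name the same folder, once as a relative path and once as an absolute URI. A minimal sketch of how the relative form expands, assuming the home directory layout /user/<username>/ used on this cluster; the function name hdfs_abs_path is hypothetical and not part of Hadoop:

```shell
# Build the absolute HDFS URI that a relative path resolves to.
# Arguments: namenode host, user name, relative path.
# hdfs_abs_path is a hypothetical helper for illustration only.
hdfs_abs_path() {
    printf 'hdfs://%s/user/%s/%s\n' "$1" "$2" "$3"
}

hdfs_abs_path hadoop.nchc.org.tw hXXXX input
# prints hdfs://hadoop.nchc.org.tw/user/hXXXX/input
```

The helper only formats a string and does not contact the cluster, so it can be tried on any machine.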
-cat
- Print the content of the given file to STDOUT
$ hadoop fs -cat input/hadoop-env.sh
-chgrp
- Change the owner group of the given file or folder
$ hadoop fs -ls
Found 2 items
drwxr-xr-x - hXXXX supergroup 0 2011-04-19 09:16 /user/hXXXX/input
drwxr-xr-x - hXXXX supergroup 0 2010-01-24 17:23 /user/hXXXX/tmp
$ hadoop fs -chgrp -R ${USER} input
$ hadoop fs -ls
Found 2 items
drwxr-xr-x - hXXXX hXXXX 0 2011-04-19 09:21 /user/hXXXX/input
drwxr-xr-x - hXXXX supergroup 0 2010-01-24 17:23 /user/hXXXX/tmp
-chmod
- Change the permissions of the given file or folder
$ hadoop fs -ls
Found 2 items
drwxr-xr-x - hXXXX hXXXX 0 2011-04-19 09:21 /user/hXXXX/input
drwxr-xr-x - hXXXX supergroup 0 2010-01-24 17:23 /user/hXXXX/tmp
$ hadoop fs -chmod -R 777 input
$ hadoop fs -ls
Found 2 items
drwxrwxrwx - hXXXX hXXXX 0 2011-04-19 09:21 /user/hXXXX/input
drwxr-xr-x - hXXXX supergroup 0 2010-01-24 17:23 /user/hXXXX/tmp
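Each digit of the octal mode passed to -chmod maps onto one rwx triplet in the permission column that -ls prints (owner, group, others). The sketch below spells out that mapping; mode_to_symbolic is a hypothetical helper written for this lab, not a Hadoop command:

```shell
# Convert a three-digit octal mode such as 755 into the rwx string shown by -ls.
# mode_to_symbolic is a hypothetical helper for illustration only.
mode_to_symbolic() {
    out=""
    # Split the mode into single digits, then translate each one.
    for digit in $(printf '%s' "$1" | sed 's/./& /g'); do
        case $digit in
            0) out="${out}---" ;;
            1) out="${out}--x" ;;
            2) out="${out}-w-" ;;
            3) out="${out}-wx" ;;
            4) out="${out}r--" ;;
            5) out="${out}r-x" ;;
            6) out="${out}rw-" ;;
            7) out="${out}rwx" ;;
            *) echo "invalid octal digit: $digit" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$out"
}

mode_to_symbolic 755   # prints rwxr-xr-x
mode_to_symbolic 777   # prints rwxrwxrwx
```

Comparing the printed string with the -ls output after a -chmod is a quick way to confirm that the mode you typed is the mode that was applied.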
-chown
- Change the owner of the given file or folder
$ hadoop fs -chown -R ${USER} input
- Note: You do not have superuser permission on hadoop.nchc.org.tw, so if you try to change the owner to another user you will see an error message like the following:
$ hadoop fs -chown -R h1000 input
chown: changing ownership of 'hdfs://hadoop.nchc.org.tw/user/hXXXX/input':org.apache.hadoop.security.AccessControlException: Non-super user cannot change owner.
-copyFromLocal, -put
- Both commands copy the given file or folder from the local file system to HDFS
$ hadoop fs -copyFromLocal /etc/hadoop/conf dfs_input
-copyToLocal, -get
- Both commands copy the given file or folder from HDFS to the local file system
$ hadoop fs -copyToLocal dfs_input input1
-cp
- Copy the given file or folder from an HDFS source path to an HDFS target path
$ hadoop fs -cp input input1
-du
- Display the size of each file in the given folder
$ hadoop fs -du input
Found 24 items
321 hdfs://hadoop.nchc.org.tw/user/hXXXX/input/README
3936 hdfs://hadoop.nchc.org.tw/user/hXXXX/input/capacity-scheduler.xml
196 hdfs://hadoop.nchc.org.tw/user/hXXXX/input/commons-logging.properties
( .... skip .... )
-dus
- Display the total size of the given folder or file
$ hadoop fs -dus input
hdfs://hadoop.nchc.org.tw/user/hXXXX/input 84218
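The total reported by -dus is the sum of the per-file sizes that -du lists. As a rough sketch, the size column can be added up with awk; sum_du is a hypothetical helper, and it assumes the two-column "size path" layout shown in this lab (other Hadoop versions may print additional columns):

```shell
# Sum the size column of `hadoop fs -du` output read from standard input.
# Lines whose first field is not a number (such as "Found 24 items") are skipped.
# sum_du is a hypothetical helper for illustration only.
sum_du() {
    awk '$1 ~ /^[0-9]+$/ { total += $1 } END { print total + 0 }'
}

# Sample input: only the three lines shown in the -du listing of this lab.
printf '%s\n' \
    'Found 24 items' \
    '321 hdfs://hadoop.nchc.org.tw/user/hXXXX/input/README' \
    '3936 hdfs://hadoop.nchc.org.tw/user/hXXXX/input/capacity-scheduler.xml' \
    '196 hdfs://hadoop.nchc.org.tw/user/hXXXX/input/commons-logging.properties' |
    sum_du
# prints 4453
```

On the cluster the same function could be fed directly, for example hadoop fs -du input | sum_du, and the result compared with the -dus figure.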
-expunge
- Empty the trash
$ hadoop fs -expunge
-getmerge
- Merge all files under the HDFS source folder <src> into a single local file <localdst>
$ hadoop fs -getmerge <src> <localdst>
$ mkdir -p in1
$ echo "this is one; " >> in1/input
$ echo "this is two; " >> in1/input2
$ hadoop fs -put in1 in1
$ hadoop fs -getmerge in1 merge.txt
$ cat ./merge.txt
- You should see results like this:
this is one;
this is two;
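To see what the merged file should contain without touching the cluster, the same concatenation can be reproduced locally. This is only a local analogue using cat, not what -getmerge does internally; the temporary directory is an assumption made so the sketch does not overwrite your real in1 folder:

```shell
# Local analogue of -getmerge: concatenate every file in a folder into one file.
# A temporary working directory keeps this separate from the lab's own in1 folder.
workdir=$(mktemp -d)
mkdir -p "$workdir/in1"
echo "this is one; " >> "$workdir/in1/input"
echo "this is two; " >> "$workdir/in1/input2"

# The shell expands the glob in sorted name order, so input comes before input2.
cat "$workdir"/in1/* > "$workdir/merge.txt"
cat "$workdir/merge.txt"
```

If the file produced by hadoop fs -getmerge differs from this local result, check whether extra files were uploaded into the HDFS folder.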
-ls
- List information about files and folders
- For a file, each line of the output below shows: <permission> <replication> <user id> <group id> <size> <modified date> <modified time> <file name>
- For a folder, the fields are the same, except that the replication column shows - instead of a number.
$ hadoop fs -ls
Found 5 items
drwxr-xr-x - hXXXX supergroup 0 2011-04-19 09:32 /user/hXXXX/dfs_input
drwxr-xr-x - hXXXX supergroup 0 2011-04-19 09:34 /user/hXXXX/in1
drwxrwxrwx - hXXXX hXXXX 0 2011-04-19 09:21 /user/hXXXX/input
drwxr-xr-x - hXXXX supergroup 0 2011-04-19 09:33 /user/hXXXX/input1
drwxr-xr-x - hXXXX supergroup 0 2010-01-24 17:23 /user/hXXXX/tmp
-lsr
- Recursive version of -ls: list files and folders recursively
$ hadoop fs -lsr in1
-rw-r--r-- 2 hXXXX supergroup 14 2011-04-19 09:34 /user/hXXXX/in1/input
-rw-r--r-- 2 hXXXX supergroup 14 2011-04-19 09:34 /user/hXXXX/in1/input2
-mkdir
- Create directories
$ hadoop fs -mkdir a b c
-moveFromLocal
- Move local files or folders to HDFS (the local copy is deleted)
$ hadoop fs -moveFromLocal in1 in2
-mv
- Rename a file or folder
$ hadoop fs -mv in2 in3
-rm
- Remove the given files (not folders)
$ hadoop fs -rm in1/input
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/in1/input
-rmr
- Remove the given folders recursively, including all files inside them
$ hadoop fs -rmr a b c dfs_input in3 input input1
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/a
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/b
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/c
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/dfs_input
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/in3
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/input
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/input1
-setrep
- Set the replication factor of the given files or folders
$ hadoop fs -setrep [-R] [-w] <rep> <path/file>
$ hadoop fs -setrep -w 2 -R in1
Replication 2 set: hdfs://hadoop.nchc.org.tw/user/hXXXX/in1/input2
Waiting for hdfs://hadoop.nchc.org.tw/user/hXXXX/in1/input2 ... done
-stat
- Print the timestamp of the given file or folder
$ hadoop fs -stat in1
2011-04-19 09:34:49
-tail
- Display the last 1 KB of the given file
- Usage
hadoop fs -tail [-f] <path/file> (-f keeps printing new content as it is appended to the file)
$ hadoop fs -tail in1/input2
this is two;
-test
- Test files or folders. The result is reported through the exit status: 0 means the test is true, 1 means it is false.
-e : check whether the file or folder exists ( 0 = exists , 1 = does not exist )
-z : check whether the file is empty ( 0 = empty , 1 = not empty )
-d : check whether the given path is a folder ( 0 = folder , 1 = not a folder )
- Use echo $? to read the return value (0 or 1)
- Usage
$ hadoop fs -test -[ezd] URI
$ hadoop fs -test -e in1/input2
$ echo $?
0
$ hadoop fs -test -z in1/input3
$ echo $?
1
$ hadoop fs -test -d in1/input2
$ echo $?
1
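In the session above, -test -e on the existing file in1/input2 returns 0, which follows the usual shell convention that exit status 0 means success. This lets -test be used directly in an if statement. The wrappers below are a sketch; the function names and the HADOOP_CMD override variable are hypothetical and only there so the command can be swapped out:

```shell
# Thin wrappers around `hadoop fs -test`; exit status 0 means the check passed.
# HADOOP_CMD is a hypothetical override; it defaults to the hadoop command.
hdfs_exists() {
    ${HADOOP_CMD:-hadoop} fs -test -e "$1"
}

hdfs_is_dir() {
    ${HADOOP_CMD:-hadoop} fs -test -d "$1"
}

# Upload a local path only when the HDFS target does not exist yet.
# Arguments: local source, HDFS destination.
put_if_absent() {
    if hdfs_exists "$2"; then
        echo "skip: $2 already exists"
    else
        ${HADOOP_CMD:-hadoop} fs -put "$1" "$2"
    fi
}
```

For example, put_if_absent /etc/hadoop/conf input would upload the folder on the first call and print the skip message on later calls.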
-text
- Output a file (such as a compressed file or a TextRecordInputStream) as plain text on STDOUT
$ hadoop fs -text <src>
$ gzip merge.txt
$ hadoop fs -put merge.txt.gz .
$ hadoop fs -text merge.txt.gz
11/04/19 09:54:16 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/04/19 09:54:16 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
this is one;
this is two;
- PS: zip files are not supported yet, so the content is printed as raw bytes:
$ gunzip merge.txt.gz
$ zip merge.zip merge.txt
$ hadoop fs -put merge.zip .
$ hadoop fs -text merge.zip
PK�N�>E73 merge.txtUT ���Mq��Mux ��+��,V���Tk�(��<�PK�N�>E73 ��merge.txtUT���Mux ��PKOY
-touchz
- Create an empty file
$ hadoop fs -touchz in1/kk
$ hadoop fs -test -z in1/kk
$ echo $?
0
- You can clean up the temporary folders and files created in the exercises above with the following commands:
~$ hadoop fs -rmr in1 merge.txt.gz merge.zip
~$ rm -rf input1/ fromHDFS/ merge.zip