Version 1 (modified by jazz, 12 years ago)

Lab 4

HDFS Fully Distributed Mode in Practice
For the following exercises, connect to hadoop.nchc.org.tw. In the examples below, hXXXX stands for your user name.

Content 1: Basic HDFS Shell Commands

1.1 Browsing Your HDFS Folder

~$ hadoop fs -ls
Found 1 items
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
~$ hadoop fs -lsr
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp

1.2 Upload Files or Folders to HDFS

  • Upload
~$ hadoop fs -put /etc/hadoop/conf input
  • Check
~$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:16 /user/hXXXX/input
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
~$ hadoop fs -ls input
Found 25 items
-rw-r--r--   2 hXXXX supergroup        321 2011-04-19 09:16 /user/hXXXX/input/README
-rw-r--r--   2 hXXXX supergroup       3936 2011-04-19 09:16 /user/hXXXX/input/capacity-scheduler.xml
-rw-r--r--   2 hXXXX supergroup        196 2011-04-19 09:16 /user/hXXXX/input/commons-logging.properties
(.... skip ....)

1.3 Download HDFS Files or Folders to Local

  • Download
~$ hadoop fs -get input fromHDFS
  • Check
    ~$ ls -al | grep fromHDFS
    drwxr-xr-x    2 hXXXX hXXXX  4096 2011-04-19 09:18 fromHDFS
    ~$ ls -al fromHDFS
    total 160
    drwxr-xr-x 2 hXXXX hXXXX  4096 2011-04-19 09:18 .
    drwx--x--x 3 hXXXX hXXXX  4096 2011-04-19 09:18 ..
    -rw-r--r-- 1 hXXXX hXXXX  3936 2011-04-19 09:18 capacity-scheduler.xml
    -rw-r--r-- 1 hXXXX hXXXX   196 2011-04-19 09:18 commons-logging.properties
    -rw-r--r-- 1 hXXXX hXXXX   535 2011-04-19 09:18 configuration.xsl
    (.... skip ....)
    ~$ diff /etc/hadoop/conf fromHDFS/
    

1.4 Remove Files or Folders

~$ hadoop fs -ls input/masters
Found 1 items
-rw-r--r--   2 hXXXX supergroup         10 2011-04-19 09:16 /user/hXXXX/input/masters
~$ hadoop fs -rm input/masters
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/input/masters

1.5 View File Contents Directly

~$ hadoop fs -ls input/slaves
Found 1 items
-rw-r--r--   2 hXXXX supergroup         10 2011-04-19 09:16 /user/hXXXX/input/slaves
~$ hadoop fs -cat input/slaves
localhost

1.6 More Commands: Help Message

hXXXX@hadoop:~$ hadoop fs 

Usage: java FsShell
           [-ls <path>]
           [-lsr <path>]
           [-du <path>]
           [-dus <path>]
           [-count[-q] <path>]
           [-mv <src> <dst>]
           [-cp <src> <dst>]
           [-rm <path>]
           [-rmr <path>]
           [-expunge]
           [-put <localsrc> ... <dst>]
           [-copyFromLocal <localsrc> ... <dst>]
           [-moveFromLocal <localsrc> ... <dst>]
           [-get [-ignoreCrc] [-crc] <src> <localdst>]
           [-getmerge <src> <localdst> [addnl]]
           [-cat <src>]
           [-text <src>]
           [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
           [-moveToLocal [-crc] <src> <localdst>]
           [-mkdir <path>]
           [-setrep [-R] [-w] <rep> <path/file>]
           [-touchz <path>]
           [-test -[ezd] <path>]
           [-stat [format] <path>]
           [-tail [-f] <file>]
           [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
           [-chown [-R] [OWNER][:[GROUP]] PATH...]
           [-chgrp [-R] GROUP PATH...]
           [-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
hadoop command [genericOptions] [commandOptions]

Content 2: Use the Web GUI to Browse HDFS

Content 3: More about HDFS Shell

  • The general form is hadoop fs <args>. The usage of each <args> is listed below.
  • By default, relative paths are resolved against your home directory, /user/<$username>/.
    $ hadoop fs -ls input
    Found 25 items
    -rw-r--r--   2 hXXXX supergroup        321 2011-04-19 09:16 /user/hXXXX/input/README
    -rw-r--r--   2 hXXXX supergroup       3936 2011-04-19 09:16 /user/hXXXX/input/capacity-scheduler.xml
    -rw-r--r--   2 hXXXX supergroup        196 2011-04-19 09:16 /user/hXXXX/input/commons-logging.properties
    (.... skip ....)
    
  • Alternatively, give the full path in the form hdfs://node:port/path, for example:
    $ hadoop fs -ls hdfs://hadoop.nchc.org.tw/user/hXXXX/input
    Found 25 items
    -rw-r--r--   2 hXXXX supergroup        321 2011-04-19 09:16 /user/hXXXX/input/README
    -rw-r--r--   2 hXXXX supergroup       3936 2011-04-19 09:16 /user/hXXXX/input/capacity-scheduler.xml
    -rw-r--r--   2 hXXXX supergroup        196 2011-04-19 09:16 /user/hXXXX/input/commons-logging.properties
    (.... skip ....)
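
The relationship between the relative and the full form can be sketched as a small shell function. hdfs_full_uri is a hypothetical helper, not a Hadoop command; it only illustrates how a relative path is placed under /user/<username>/ on the name node, and it does not normalize ".." or duplicate slashes.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): expand an HDFS path the way the
# examples above describe. Relative paths are placed under /user/<user>/.
# Usage: hdfs_full_uri <namenode-host[:port]> <user> <path>
hdfs_full_uri() {
    namenode="$1"
    user="$2"
    path="$3"
    case "$path" in
        hdfs://*) printf '%s\n' "$path" ;;                       # already a full URI
        /*)       printf 'hdfs://%s%s\n' "$namenode" "$path" ;;  # absolute path
        *)        printf 'hdfs://%s/user/%s/%s\n' "$namenode" "$user" "$path" ;;
    esac
}

hdfs_full_uri hadoop.nchc.org.tw hXXXX input
```

With the arguments above, the function prints hdfs://hadoop.nchc.org.tw/user/hXXXX/input, the same URI used in the second listing.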
    

-cat

  • Print the contents of the given file to STDOUT
    $ hadoop fs -cat input/hadoop-env.sh
    

-chgrp

  • Change the group that owns the given file or folder
    $ hadoop fs -ls
    Found 2 items
    drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:16 /user/hXXXX/input
    drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
    $ hadoop fs -chgrp -R ${USER} input
    $ hadoop fs -ls
    Found 2 items
    drwxr-xr-x   - hXXXX hXXXX               0 2011-04-19 09:21 /user/hXXXX/input
    drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
    

-chmod

  • Change the permissions of the given file or folder
    $ hadoop fs -ls
    Found 2 items
    drwxr-xr-x   - hXXXX hXXXX               0 2011-04-19 09:21 /user/hXXXX/input
    drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
    $ hadoop fs -chmod -R 777 input
    $ hadoop fs -ls
    Found 2 items
    drwxrwxrwx   - hXXXX hXXXX               0 2011-04-19 09:21 /user/hXXXX/input
    drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
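
The octal mode given to -chmod maps digit by digit onto the rwx string that -ls prints. The following is a minimal sketch of that mapping; mode_to_rwx is a hypothetical helper name, and it assumes a plain three-digit octal mode with no validation.

```shell
#!/bin/sh
# Convert a three-digit octal mode such as 755 into the rwx string that
# "hadoop fs -ls" prints after the leading file-type character.
# Each digit is a bit mask: 4 = read, 2 = write, 1 = execute.
mode_to_rwx() {
    mode="$1"
    out=""
    for digit in $(printf '%s' "$mode" | sed 's/./& /g'); do
        if [ $((digit & 4)) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
        if [ $((digit & 2)) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
        if [ $((digit & 1)) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
    done
    printf '%s\n' "$out"
}

mode_to_rwx 755   # prints rwxr-xr-x
mode_to_rwx 777   # prints rwxrwxrwx
```

In other words, 755 leaves a folder at drwxr-xr-x, while 777 opens it to drwxrwxrwx.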
    

-chown

  • Change the owner of the given file or folder
    $ hadoop fs -chown -R ${USER} input
    
  • Note: you do not have superuser permission on hadoop.nchc.org.tw, so changing the owner to another user fails with an error like the following:
    $ hadoop fs -chown -R h1000 input
    chown: changing ownership of 'hdfs://hadoop.nchc.org.tw/user/hXXXX/input':org.apache.hadoop.security.AccessControlException: Non-super user cannot change owner.
    

-copyFromLocal, -put

  • Both commands copy the given local file or folder to HDFS
    $ hadoop fs -copyFromLocal /etc/hadoop/conf dfs_input
    

-copyToLocal, -get

  • Both commands copy the given file or folder from HDFS to the local file system
    $ hadoop fs -copyToLocal dfs_input input1
    

-cp

  • Copy the given file or folder from an HDFS source path to an HDFS target path
    $ hadoop fs -cp input input1
    

-du

  • Display the size of each file in the given folder
    $ hadoop fs -du input
    Found 24 items
    321         hdfs://hadoop.nchc.org.tw/user/hXXXX/input/README
    3936        hdfs://hadoop.nchc.org.tw/user/hXXXX/input/capacity-scheduler.xml
    196         hdfs://hadoop.nchc.org.tw/user/hXXXX/input/commons-logging.properties
    ( .... skip .... )
    

-dus

  • Display the total size of the given file or folder
    $ hadoop fs -dus input
    hdfs://hadoop.nchc.org.tw/user/hXXXX/input	84218
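
For a folder that contains only files, the -dus total should correspond to the sum of the per-file sizes that -du lists. The sketch below adds up the first column of -du style lines; sum_du_sizes is a hypothetical helper name, and it assumes the size is the first field, as in the -du listing of this lab. Only the three sample lines shown in that listing are fed in, so the result is not the full 84218.

```shell
#!/bin/sh
# Sum the first column (size in bytes) of "hadoop fs -du" style lines read
# from standard input. Lines whose first field is not a number, such as
# "Found 24 items", are ignored.
sum_du_sizes() {
    awk '$1 ~ /^[0-9]+$/ { total += $1 } END { print total + 0 }'
}

# Sample lines copied from the -du listing in this lab (only the three shown).
sum_du_sizes <<'EOF'
Found 24 items
321         hdfs://hadoop.nchc.org.tw/user/hXXXX/input/README
3936        hdfs://hadoop.nchc.org.tw/user/hXXXX/input/capacity-scheduler.xml
196         hdfs://hadoop.nchc.org.tw/user/hXXXX/input/commons-logging.properties
EOF
```

On the cluster, the same function could be fed directly with hadoop fs -du input | sum_du_sizes.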
    

-expunge

  • Empty the trash
    $ hadoop fs -expunge
    

-getmerge

  • Merge all files under the HDFS source folder <src> into a single local file <localdst>
    $ hadoop fs -getmerge <src> <localdst> 
    
    $ mkdir -p in1
    $ echo "this is one; " >> in1/input
    $ echo "this is two; " >> in1/input2
    $ hadoop fs -put in1 in1
    $ hadoop fs -getmerge in1 merge.txt
    $ cat ./merge.txt
    
  • You should see results like this:
    this is one; 
    this is two;
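
As a purely local analogy that runs without a cluster, the effect of -getmerge resembles concatenating every file of a folder into one output file. merge_dir is a hypothetical helper name, and the concatenation order is assumed here to follow the sorted file names.

```shell
#!/bin/sh
# Local analogy for -getmerge (no HDFS involved): concatenate every regular
# file in a directory into a single output file.
merge_dir() {
    src_dir="$1"
    out_file="$2"
    : > "$out_file"                    # start with an empty output file
    for f in "$src_dir"/*; do          # the glob expands in sorted order
        if [ -f "$f" ]; then
            cat "$f" >> "$out_file"
        fi
    done
}

# Rebuild the two sample files of the exercise in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/in1"
echo "this is one; " > "$work/in1/input"
echo "this is two; " > "$work/in1/input2"
merge_dir "$work/in1" "$work/merge.txt"
cat "$work/merge.txt"
```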
    

-ls

  • List information about files and folders
  • Each file is shown as:
    <permission> <replication> <user id> <group id> <size> <modified date> <modified time> <path>
  • Each folder is shown the same way, with - in place of the replication factor:
    <permission> - <user id> <group id> <size> <modified date> <modified time> <path>
    $ hadoop fs -ls
    Found 5 items
    drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:32 /user/hXXXX/dfs_input
    drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:34 /user/hXXXX/in1
    drwxrwxrwx   - hXXXX hXXXX               0 2011-04-19 09:21 /user/hXXXX/input
    drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:33 /user/hXXXX/input1
    drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
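
To make the columns of a listing explicit, each line can be split into labeled fields. The sketch below is a hypothetical describe_ls helper built on awk; it assumes the eight-column layout seen in the listings of this lab and paths without spaces.

```shell
#!/bin/sh
# Label the columns of "hadoop fs -ls" style lines read from standard input.
# Lines that do not have exactly eight fields ("Found N items") are skipped.
describe_ls() {
    awk 'NF == 8 {
        kind = (substr($1, 1, 1) == "d") ? "dir" : "file"
        printf "%s %s perm=%s repl=%s owner=%s group=%s size=%s mtime=%s_%s\n",
               kind, $8, substr($1, 2), $2, $3, $4, $5, $6, $7
    }'
}

describe_ls <<'EOF'
Found 2 items
drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:34 /user/hXXXX/in1
-rw-r--r--   2 hXXXX supergroup         14 2011-04-19 09:34 /user/hXXXX/in1/input
EOF
```

On the cluster, the input could come from hadoop fs -lsr in1 | describe_ls instead of the here-document.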
    

-lsr

  • Recursive version of ls
    $ hadoop fs -lsr in1
    -rw-r--r--   2 hXXXX supergroup         14 2011-04-19 09:34 /user/hXXXX/in1/input
    -rw-r--r--   2 hXXXX supergroup         14 2011-04-19 09:34 /user/hXXXX/in1/input2
    

-mkdir

  • Create directories
    $ hadoop fs -mkdir a b c
    

-moveFromLocal

  • Move local files or folders to HDFS (the local copies are deleted)
    $ hadoop fs -moveFromLocal in1 in2
    

-mv

  • Rename or move a file or folder within HDFS
    $ hadoop fs -mv in2 in3
    

-rm

  • Remove the given files (not folders)
    $ hadoop fs -rm in1/input
    Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/in1/input
    

-rmr

  • Recursively remove the given folders and all files they contain
    $ hadoop fs -rmr a b c dfs_input in3 input input1
    Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/a
    Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/b
    Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/c
    Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/dfs_input
    Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/in3
    Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/input
    Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/input1
    

-setrep

  • Set the replication factor of the given files or folders
    $ hadoop fs -setrep [-R] [-w] <rep> <path/file>
    
    $ hadoop fs -setrep -w 2 -R in1
    Replication 2 set: hdfs://hadoop.nchc.org.tw/user/hXXXX/in1/input2
    Waiting for hdfs://hadoop.nchc.org.tw/user/hXXXX/in1/input2 ... done
    

-stat

  • Print the modification time of the given path
    $ hadoop fs -stat in1
    2011-04-19 09:34:49
    

-tail

  • Display the last 1 KB of the given file
  • Usage
    hadoop fs -tail [-f] <path/file> (with -f, data appended to the file is printed as the file grows)
    
    $ hadoop fs -tail in1/input2
    this is two; 
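
The sample file is far smaller than 1 KB, so -tail prints all of it. As a local analogy that runs without a cluster, the "last 1 KB" behavior can be sketched with the standard tail utility; last_kb is a hypothetical helper name, and 1 KB is taken to mean 1024 bytes.

```shell
#!/bin/sh
# Local analogy for -tail: print the last 1024 bytes (1 KB) of a file.
last_kb() {
    tail -c 1024 "$1"
}

sample=$(mktemp)
# Write 2000 bytes of "x" so that the file is larger than 1 KB.
head -c 2000 /dev/zero | tr '\0' 'x' > "$sample"
# Count the bytes that last_kb returns for the 2000-byte file.
last_kb "$sample" | wc -c
rm -f "$sample"
```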
    

-test

  • Test files or folders. The result is returned as the exit status, with 0 meaning true, as the sample runs below show.
    -e : check whether the file or folder exists ( 0 = exists , 1 = does not exist )
    -z : check whether the file is empty ( 0 = empty , 1 = not empty )
    -d : check whether the given path is a folder ( 0 = folder , 1 = not a folder )
    • Use echo $? to read the exit status
  • 用法 Usage
    $ hadoop fs -test -[ezd] URI
    

$ hadoop fs -test -e in1/input2
$ echo $?
0
$ hadoop fs -test -z in1/input3
$ echo $?
1
$ hadoop fs -test -d in1/input2
$ echo $?
1
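
Because the answer comes back as an exit status, -test fits directly into a shell if statement. The following is a minimal sketch, assuming the 0 = true convention seen in the runs above; hdfs_check and the HADOOP_CMD override are hypothetical names introduced here so that the function can also be tried without a cluster.

```shell
#!/bin/sh
# Wrap "hadoop fs -test" so that its exit status drives an if statement.
# HADOOP_CMD is a hypothetical override; it defaults to "hadoop".
hdfs_check() {
    flag="$1"    # one of -e, -z, -d
    path="$2"
    if "${HADOOP_CMD:-hadoop}" fs -test "$flag" "$path"; then
        echo "true"
    else
        echo "false"
    fi
}
```

On the cluster, hdfs_check -e in1/input2 would then print true or false instead of requiring a separate echo $?.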

-text

  • Print a file (such as a compressed file or a TextRecordInputStream) to STDOUT as plain text
    $ hadoop fs -text <src>
    
    $ gzip merge.txt
    $ hadoop fs -put merge.txt.gz .
    $ hadoop fs -text merge.txt.gz
    11/04/19 09:54:16 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    11/04/19 09:54:16 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
    this is one; 
    this is two; 
    
  • Note: zip files are not supported yet, so the output is unreadable binary data:
    $ gunzip merge.txt.gz
    $ zip merge.zip merge.txt
    $ hadoop fs -put merge.zip .
    $ hadoop fs -text merge.zip
    (.... unreadable binary output ....)
    

-touchz

  • Create an empty file
    $ hadoop fs -touchz in1/kk
    $ hadoop fs -test -z in1/kk
    $ echo $?
    0
    

  • You can remove the temporary folders and files created by the exercises above with the following commands:
    ~$ hadoop fs -rmr in1 merge.txt.gz merge.zip
    ~$ rm -rf input1/ fromHDFS/ merge.zip