Lab 3

HDFS Local Mode in Practice

0. Start Hadoop4Win

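  • STEP 1: In the Start menu, click the following shortcuts in order.

  • STEP 2: First, click start-hadoop to start the Hadoop services (they run in a separate CMD window).
    • Note: startup is complete only once you see "Safe Mode is OFF".

  • STEP 3: Next, click NameNode Web UI to open http://localhost:50070 in a browser and confirm that the NameNode is running.
    • Note: there must be one Live Node for the setup to count as healthy.

  • STEP 4: Then click JobTracker Web UI to open http://localhost:50030 in a browser and confirm that the JobTracker is running.
    • Note: the state must be RUNNING.

  • STEP 5: Finally, click hadoop4win to open the hadoop4win Cygwin window, where you will type the commands that follow.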
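
The "Safe Mode is OFF" check from STEP 2 can also be made from the Cygwin window opened in STEP 5. The sketch below assumes that the installed Hadoop release provides `hadoop dfsadmin -safemode get` and that its output contains the words "Safe mode is OFF". The function names `is_safemode_off` and `wait_for_hdfs` are hypothetical helpers, not part of Hadoop4Win.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when a status line reports that safe mode is off.
# The match is case-insensitive because capitalization may differ between
# the start-hadoop window and the dfsadmin output.
is_safemode_off() {
    printf '%s\n' "$1" | grep -qi 'safe mode is off'
}

# Hypothetical helper: poll the NameNode until safe mode is off, or give up.
# Assumes `hadoop dfsadmin -safemode get` exists in the installed release.
# Usage: wait_for_hdfs [number of attempts, default 30]
wait_for_hdfs() {
    local tries=${1:-30}
    local status
    while [ "$tries" -gt 0 ]; do
        status=$(hadoop dfsadmin -safemode get 2>/dev/null)
        if is_safemode_off "$status"; then
            echo "HDFS is ready"
            return 0
        fi
        sleep 2
        tries=$((tries - 1))
    done
    echo "HDFS is still in safe mode" >&2
    return 1
}
```

Calling `wait_for_hdfs 10` before the exercises gives a quick yes/no answer without switching to the CMD window.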

1. HDFS Command Practice

1.1 Browse Your HDFS Directory

  • First, use the hadoop fs -ls command to browse your HDFS directory.
    Jazz@human ~
    $ hadoop fs -ls
    Found 1 items
    drwxr-xr-x   - Jazz supergroup          0 2011-09-14 12:50 /user/Jazz/tmp
    

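1.2 Upload Data to HDFS

  • Next, let's practice uploading data to HDFS. Here we use /opt/hadoop/conf as the source directory and /user/${username}/input as the destination directory.
  • Note: the Windows build of Hadoop runs inside Cygwin, but Cygwin paths are virtual and the JRE (Java Runtime Environment) only understands Windows paths. If you hit an error like the one below, wrap the path in cygpath -w to convert the Cygwin path to a Windows path.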
    Jazz@human ~
    $ hadoop fs -put /opt/hadoop/conf input
    put: File /opt/hadoop/conf does not exist.
    Jazz@human ~
    $ hadoop fs -put $(cygpath -w /opt/hadoop/conf) input
    
  • Use hadoop fs -ls to check the files you just uploaded.
    Jazz@human ~
    $ hadoop fs -ls
    Found 2 items
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 11:45 /user/Jazz/input
    drwxr-xr-x   - Jazz supergroup          0 2011-09-14 12:50 /user/Jazz/tmp
    
    Jazz@human ~
    $ hadoop fs -ls input
    Found 13 items
    -rw-r--r--   1 Jazz supergroup       3936 2011-10-21 11:45 /user/Jazz/input/capacity-scheduler.xml
    -rw-r--r--   1 Jazz supergroup        535 2011-10-21 11:45 /user/Jazz/input/configuration.xsl
    -rw-r--r--   1 Jazz supergroup        326 2011-10-21 11:45 /user/Jazz/input/core-site.xml
    -rw-r--r--   1 Jazz supergroup       2409 2011-10-21 11:45 /user/Jazz/input/hadoop-env.sh
    -rw-r--r--   1 Jazz supergroup       1245 2011-10-21 11:45 /user/Jazz/input/hadoop-metrics.properties
    -rw-r--r--   1 Jazz supergroup       4190 2011-10-21 11:45 /user/Jazz/input/hadoop-policy.xml
    -rw-r--r--   1 Jazz supergroup        196 2011-10-21 11:45 /user/Jazz/input/hdfs-site.xml
    -rw-r--r--   1 Jazz supergroup       2815 2011-10-21 11:45 /user/Jazz/input/log4j.properties
    -rw-r--r--   1 Jazz supergroup        212 2011-10-21 11:45 /user/Jazz/input/mapred-site.xml
    -rw-r--r--   1 Jazz supergroup         10 2011-10-21 11:45 /user/Jazz/input/masters
    -rw-r--r--   1 Jazz supergroup         10 2011-10-21 11:45 /user/Jazz/input/slaves
    -rw-r--r--   1 Jazz supergroup       1243 2011-10-21 11:45 /user/Jazz/input/ssl-client.xml.example
    -rw-r--r--   1 Jazz supergroup       1195 2011-10-21 11:45 /user/Jazz/input/ssl-server.xml.example
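
The cygpath conversion used in the upload above can be wrapped in a small helper so the same command works whether or not cygpath is present. This is only a sketch: `to_java_path` and `hdfs_put` are hypothetical names, and the fallback for systems without cygpath is an assumption added for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print a path in the form the JRE understands.
# Inside Cygwin, cygpath -w converts the virtual path to a Windows path;
# where cygpath is not on PATH, the path is returned unchanged.
to_java_path() {
    if command -v cygpath >/dev/null 2>&1; then
        cygpath -w "$1"
    else
        printf '%s\n' "$1"
    fi
}

# Hypothetical wrapper around the upload shown above.
# Usage: hdfs_put <local path> <HDFS destination>
hdfs_put() {
    hadoop fs -put "$(to_java_path "$1")" "$2"
}
```

With these definitions, `hdfs_put /opt/hadoop/conf input` performs the same upload as the second command in the transcript.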
    

1.3 Download Data from HDFS to a Local Directory

  • Next, let's practice downloading data from HDFS to a local directory from the command line.
    Jazz@human ~
    $ hadoop fs -get input fromHDFS
    
  • Use diff to check that the downloaded content matches what you uploaded; no output means the two directories are identical.
    Jazz@human ~
    $ diff -Naur fromHDFS/ /opt/hadoop/conf
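
The download-and-compare steps above can be combined into one check that reports the result explicitly. This is a minimal sketch: `dirs_match` and `verify_download` are hypothetical helper names, and the sketch relies only on diff exiting with status 0 when it finds no differences.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when two local directories have identical content.
# diff exits with status 0 when it finds no differences.
dirs_match() {
    diff -Naur "$1" "$2" >/dev/null
}

# Hypothetical helper: download an HDFS directory and compare it with a local one.
# Usage: verify_download <HDFS dir> <local original> <local download target>
verify_download() {
    hadoop fs -get "$1" "$3" || return 1
    if dirs_match "$3" "$2"; then
        echo "OK: $3 matches $2"
    else
        echo "MISMATCH: $3 differs from $2" >&2
        return 1
    fi
}
```

For this lab the call would be `verify_download input /opt/hadoop/conf fromHDFS`.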
    

1.4 Delete Files on HDFS

  • Use hadoop fs -rm to delete a single file on HDFS.
    Jazz@human ~
    $ hadoop fs -rm input/masters
    Deleted hdfs://localhost:9000/user/Jazz/input/masters
    
  • To delete a directory, use hadoop fs -rmr instead.
    Jazz@human ~
    $ hadoop fs -rmr tmp
    Deleted hdfs://localhost:9000/user/Jazz/tmp
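
Since -rm handles files and -rmr handles directories, a script can choose between them automatically. The sketch below uses `hadoop fs -test -d`, which returns exit status 0 when the path is a directory; `hdfs_delete` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: delete an HDFS path with -rmr if it is a directory,
# and with -rm otherwise. Relies on `hadoop fs -test -d` returning exit
# status 0 for a directory.
# Usage: hdfs_delete <HDFS path>
hdfs_delete() {
    if hadoop fs -test -d "$1"; then
        hadoop fs -rmr "$1"
    else
        hadoop fs -rm "$1"
    fi
}
```

`hdfs_delete input/masters` and `hdfs_delete tmp` would then issue the same two commands as the transcripts above.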
    

1.5 Dump the Contents of a File on HDFS

  • When you only want to look at a file on HDFS, use hadoop fs -cat to dump its contents.
    Jazz@human ~
    $ hadoop fs -cat input/slaves
    localhost
    

1.6 More HDFS Commands

  • Run hadoop fs with no arguments to list every command it supports:
    Jazz@human ~
    $ hadoop fs
    Usage: java FsShell
               [-ls <path>]
               [-lsr <path>]
               [-du <path>]
               [-dus <path>]
               [-count[-q] <path>]
               [-mv <src> <dst>]
               [-cp <src> <dst>]
               [-rm [-skipTrash] <path>]
               [-rmr [-skipTrash] <path>]
               [-expunge]
               [-put <localsrc> ... <dst>]
               [-copyFromLocal <localsrc> ... <dst>]
               [-moveFromLocal <localsrc> ... <dst>]
               [-get [-ignoreCrc] [-crc] <src> <localdst>]
               [-getmerge <src> <localdst> [addnl]]
               [-cat <src>]
               [-text <src>]
               [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
               [-moveToLocal [-crc] <src> <localdst>]
               [-mkdir <path>]
               [-setrep [-R] [-w] <rep> <path/file>]
               [-touchz <path>]
               [-test -[ezd] <path>]
               [-stat [format] <path>]
               [-tail [-f] <file>]
               [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
               [-chown [-R] [OWNER][:[GROUP]] PATH...]
               [-chgrp [-R] GROUP PATH...]
               [-help [cmd]]
    
    Generic options supported are
    -conf <configuration file>     specify an application configuration file
    -D <property=value>            use value for given property
    -fs <local|namenode:port>      specify a namenode
    -jt <local|jobtracker:port>    specify a job tracker
    -files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
    -libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
    -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
    
    The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions]
    

2. Browse HDFS Content Through the Web Interface

  • You can also use the NameNode web page to look up the files you just uploaded, along with their Block Size, File Size, Block Location, and Rack Location.



3. More HDFS Shell Usage

-ls

  • -ls operates relative to /user/${username}/ by default; in other words, a path without a leading slash is a relative path under /user/${username}.
    Jazz@human ~
    $ hadoop fs -ls input
    Found 12 items
    -rw-r--r--   1 Jazz supergroup       3936 2011-10-21 11:45 /user/Jazz/input/capacity-scheduler.xml
    -rw-r--r--   1 Jazz supergroup        535 2011-10-21 11:45 /user/Jazz/input/configuration.xsl
    -rw-r--r--   1 Jazz supergroup        326 2011-10-21 11:45 /user/Jazz/input/core-site.xml
    -rw-r--r--   1 Jazz supergroup       2409 2011-10-21 11:45 /user/Jazz/input/hadoop-env.sh
    -rw-r--r--   1 Jazz supergroup       1245 2011-10-21 11:45 /user/Jazz/input/hadoop-metrics.properties
    -rw-r--r--   1 Jazz supergroup       4190 2011-10-21 11:45 /user/Jazz/input/hadoop-policy.xml
    -rw-r--r--   1 Jazz supergroup        196 2011-10-21 11:45 /user/Jazz/input/hdfs-site.xml
    -rw-r--r--   1 Jazz supergroup       2815 2011-10-21 11:45 /user/Jazz/input/log4j.properties
    -rw-r--r--   1 Jazz supergroup        212 2011-10-21 11:45 /user/Jazz/input/mapred-site.xml
    -rw-r--r--   1 Jazz supergroup         10 2011-10-21 11:45 /user/Jazz/input/slaves
    -rw-r--r--   1 Jazz supergroup       1243 2011-10-21 11:45 /user/Jazz/input/ssl-client.xml.example
    -rw-r--r--   1 Jazz supergroup       1195 2011-10-21 11:45 /user/Jazz/input/ssl-server.xml.example
    
  • You can also give a full path in the form hdfs://node:port/path.
    Jazz@human ~
    $ hadoop fs -ls hdfs://localhost:9000/user/${USER}/input
    Found 12 items
    -rw-r--r--   1 Jazz supergroup       3936 2011-10-21 11:45 /user/Jazz/input/capacity-scheduler.xml
    -rw-r--r--   1 Jazz supergroup        535 2011-10-21 11:45 /user/Jazz/input/configuration.xsl
    -rw-r--r--   1 Jazz supergroup        326 2011-10-21 11:45 /user/Jazz/input/core-site.xml
    -rw-r--r--   1 Jazz supergroup       2409 2011-10-21 11:45 /user/Jazz/input/hadoop-env.sh
    -rw-r--r--   1 Jazz supergroup       1245 2011-10-21 11:45 /user/Jazz/input/hadoop-metrics.properties
    -rw-r--r--   1 Jazz supergroup       4190 2011-10-21 11:45 /user/Jazz/input/hadoop-policy.xml
    -rw-r--r--   1 Jazz supergroup        196 2011-10-21 11:45 /user/Jazz/input/hdfs-site.xml
    -rw-r--r--   1 Jazz supergroup       2815 2011-10-21 11:45 /user/Jazz/input/log4j.properties
    -rw-r--r--   1 Jazz supergroup        212 2011-10-21 11:45 /user/Jazz/input/mapred-site.xml
    -rw-r--r--   1 Jazz supergroup         10 2011-10-21 11:45 /user/Jazz/input/slaves
    -rw-r--r--   1 Jazz supergroup       1243 2011-10-21 11:45 /user/Jazz/input/ssl-client.xml.example
    -rw-r--r--   1 Jazz supergroup       1195 2011-10-21 11:45 /user/Jazz/input/ssl-server.xml.example
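
The relationship between the relative form and the full hdfs://node:port/path form can be written out as a small function. This is an illustrative sketch only: `hdfs_full_uri` is a hypothetical helper, the default NameNode address hdfs://localhost:9000 is taken from the transcripts in this lab, and the /user/<name> home directory is assumed.

```shell
#!/bin/bash
# Hypothetical helper: expand an HDFS path the way the examples above do.
# Assumptions: the NameNode is hdfs://localhost:9000 (as in this lab) and
# the HDFS home directory is /user/<name>.
# Usage: hdfs_full_uri <path> [user] [namenode URI]
hdfs_full_uri() {
    local path=$1
    local user=${2:-$USER}
    local namenode=${3:-hdfs://localhost:9000}
    case "$path" in
        hdfs://*) printf '%s\n' "$path" ;;                                # already a full URI
        /*)       printf '%s%s\n' "$namenode" "$path" ;;                  # absolute path
        *)        printf '%s/user/%s/%s\n' "$namenode" "$user" "$path" ;; # relative path
    esac
}
```

For example, `hadoop fs -ls "$(hdfs_full_uri input)"` lists the same directory as `hadoop fs -ls input`.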
    

-cat

  • Prints the contents of the file at the given path to standard output (STDOUT).
    Jazz@human ~
    $ hadoop fs -cat input/slaves
    localhost
    

-chgrp

  • Changes the group that a file belongs to.
    Jazz@human ~
    $ hadoop fs -ls input/slaves
    Found 1 items
    -rw-r--r--   1 Jazz supergroup         10 2011-10-21 11:45 /user/Jazz/input/slaves
    
    Jazz@human ~
    $ hadoop fs -chgrp ${USERNAME} input/slaves
    
    Jazz@human ~
    $ hadoop fs -ls input/slaves
    Found 1 items
    -rw-r--r--   1 Jazz Jazz         10 2011-10-21 11:45 /user/Jazz/input/slaves
    

-chmod

  • Changes the permissions of a file.
    Jazz@human ~
    $ hadoop fs -ls input/slaves
    Found 1 items
    -rw-r--r--   1 Jazz Jazz         10 2011-10-21 11:45 /user/Jazz/input/slaves
    
    Jazz@human ~
    $ hadoop fs -chmod 700 input/slaves
    
    Jazz@human ~
    $ hadoop fs -ls input/slaves
    Found 1 items
    -rw-------   1 Jazz Jazz         10 2011-10-21 11:45 /user/Jazz/input/slaves
    

-chown

  • Changes the owner of a file.
    Jazz@human ~
    $ hadoop fs -chown hadoop input/slaves
    
    Jazz@human ~
    $ hadoop fs -ls input/slaves
    Found 1 items
    -rw-------   1 hadoop Jazz         10 2011-10-21 11:45 /user/Jazz/input/slaves
    

-copyFromLocal, -put

  • Uploads files from the local machine to HDFS.
    Jazz@human ~
    $ hadoop fs -put fromHDFS dfs_input
    
    Jazz@human ~
    $ hadoop fs -ls
    Found 2 items
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:33 /user/Jazz/dfs_input
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:00 /user/Jazz/input
    

-copyToLocal, -get

  • Downloads files from HDFS to the local machine.
    Jazz@human ~
    $ hadoop fs -get dfs_input input1
    

-cp

  • Copies files from a source path on HDFS to a destination path on HDFS.
    Jazz@human ~
    $ hadoop fs -cp dfs_input input1
    
    Jazz@human ~
    $ hadoop fs -ls
    Found 3 items
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:33 /user/Jazz/dfs_input
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:00 /user/Jazz/input
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:34 /user/Jazz/input1
    

-du

  • Shows the size of every file in a directory.
    Jazz@human ~
    $ hadoop fs -du input
    Found 12 items
    3936        hdfs://localhost:9000/user/Jazz/input/capacity-scheduler.xml
    535         hdfs://localhost:9000/user/Jazz/input/configuration.xsl
    326         hdfs://localhost:9000/user/Jazz/input/core-site.xml
    2409        hdfs://localhost:9000/user/Jazz/input/hadoop-env.sh
    1245        hdfs://localhost:9000/user/Jazz/input/hadoop-metrics.properties
    4190        hdfs://localhost:9000/user/Jazz/input/hadoop-policy.xml
    196         hdfs://localhost:9000/user/Jazz/input/hdfs-site.xml
    2815        hdfs://localhost:9000/user/Jazz/input/log4j.properties
    212         hdfs://localhost:9000/user/Jazz/input/mapred-site.xml
    10          hdfs://localhost:9000/user/Jazz/input/slaves
    1243        hdfs://localhost:9000/user/Jazz/input/ssl-client.xml.example
    1195        hdfs://localhost:9000/user/Jazz/input/ssl-server.xml.example
    

-dus

  • Shows the total size of a directory or file.
    Jazz@human ~
    $ hadoop fs -dus input
    hdfs://localhost:9000/user/Jazz/input   18312
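
The total reported by -dus should equal the sum of the per-file sizes reported by -du. The sketch below adds up the first column of -du output with awk; `sum_du` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: add up the size column of `hadoop fs -du` output read
# from standard input. Lines whose first field is not a number, such as the
# "Found N items" header, are skipped.
sum_du() {
    awk '$1 ~ /^[0-9]+$/ { total += $1 } END { print total + 0 }'
}
```

Running `hadoop fs -du input | sum_du` on the twelve files listed in the -du example gives 18312, the same figure that -dus prints.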
    

-expunge

  • Empties the trash.
    Jazz@human ~
    $ hadoop fs -expunge
    

-getmerge

  • Concatenates all files under the source directory <src> into a single local file <localdst>.
  • Syntax: hadoop fs -getmerge <src> <localdst>
    Jazz@human ~
    $ mkdir -p in1
    
    Jazz@human ~
    $ echo "this is one; " > in1/input
    
    Jazz@human ~
    $ echo "this is two; " > in1/input2
    
    Jazz@human ~
    $ hadoop fs -put in1 in1
    
    Jazz@human ~
    $ hadoop fs -getmerge in1 merge.txt
    
    Jazz@human ~
    $ cat ./merge.txt
    this is one;
    this is two;
    

-ls

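  • Lists information about files or directories.
  • For a file, the columns are: permissions, replication count, owner, group, file size, modification date, modification time, path.
  • For a directory, the columns are the same, except that the replication count is shown as - and the size as 0.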
    Jazz@human ~
    $ hadoop fs -ls
    Found 3 items
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:33 /user/Jazz/dfs_input
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:00 /user/Jazz/input
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:34 /user/Jazz/input1
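
In the listings above, the columns appear as permissions, replication, owner, group, size, date, time, and path. A script can pick out individual columns with awk, as in this sketch. `ls_fields` is a hypothetical helper, and it assumes the eight-column layout shown here and paths without spaces.

```shell
#!/bin/bash
# Hypothetical helper: read `hadoop fs -ls` output from standard input and
# print "permissions owner group size path" for every entry.
# Assumes the eight-column layout shown above; the "Found N items" header
# has fewer fields and is skipped. Paths containing spaces would need
# extra handling.
ls_fields() {
    awk 'NF >= 8 { print $1, $3, $4, $5, $8 }'
}
```

A typical call is `hadoop fs -ls input | ls_fields`.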
    

-lsr

  • The recursive version of ls.
    Jazz@human ~
    $ hadoop fs -lsr
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:33 /user/Jazz/dfs_input
    -rw-r--r--   1 Jazz supergroup       3936 2011-10-21 12:33 /user/Jazz/dfs_input/capacity-scheduler.xml
    -rw-r--r--   1 Jazz supergroup        535 2011-10-21 12:33 /user/Jazz/dfs_input/configuration.xsl
    -rw-r--r--   1 Jazz supergroup        326 2011-10-21 12:33 /user/Jazz/dfs_input/core-site.xml
    -rw-r--r--   1 Jazz supergroup       2409 2011-10-21 12:33 /user/Jazz/dfs_input/hadoop-env.sh
    -rw-r--r--   1 Jazz supergroup       1245 2011-10-21 12:33 /user/Jazz/dfs_input/hadoop-metrics.properties
    -rw-r--r--   1 Jazz supergroup       4190 2011-10-21 12:33 /user/Jazz/dfs_input/hadoop-policy.xml
    -rw-r--r--   1 Jazz supergroup        196 2011-10-21 12:33 /user/Jazz/dfs_input/hdfs-site.xml
    -rw-r--r--   1 Jazz supergroup       2815 2011-10-21 12:33 /user/Jazz/dfs_input/log4j.properties
    -rw-r--r--   1 Jazz supergroup        212 2011-10-21 12:33 /user/Jazz/dfs_input/mapred-site.xml
    -rw-r--r--   1 Jazz supergroup         10 2011-10-21 12:33 /user/Jazz/dfs_input/masters
    -rw-r--r--   1 Jazz supergroup         10 2011-10-21 12:33 /user/Jazz/dfs_input/slaves
    -rw-r--r--   1 Jazz supergroup       1243 2011-10-21 12:33 /user/Jazz/dfs_input/ssl-client.xml.example
    -rw-r--r--   1 Jazz supergroup       1195 2011-10-21 12:33 /user/Jazz/dfs_input/ssl-server.xml.example
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:40 /user/Jazz/in1
    -rw-r--r--   1 Jazz supergroup         14 2011-10-21 12:40 /user/Jazz/in1/input
    -rw-r--r--   1 Jazz supergroup         14 2011-10-21 12:40 /user/Jazz/in1/input2
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:00 /user/Jazz/input
    -rw-r--r--   1 Jazz   supergroup       3936 2011-10-21 11:45 /user/Jazz/input/capacity-scheduler.xml
    -rw-r--r--   1 Jazz   supergroup        535 2011-10-21 11:45 /user/Jazz/input/configuration.xsl
    -rw-r--r--   1 Jazz   supergroup        326 2011-10-21 11:45 /user/Jazz/input/core-site.xml
    -rw-r--r--   1 Jazz   supergroup       2409 2011-10-21 11:45 /user/Jazz/input/hadoop-env.sh
    -rw-r--r--   1 Jazz   supergroup       1245 2011-10-21 11:45 /user/Jazz/input/hadoop-metrics.properties
    -rw-r--r--   1 Jazz   supergroup       4190 2011-10-21 11:45 /user/Jazz/input/hadoop-policy.xml
    -rw-r--r--   1 Jazz   supergroup        196 2011-10-21 11:45 /user/Jazz/input/hdfs-site.xml
    -rw-r--r--   1 Jazz   supergroup       2815 2011-10-21 11:45 /user/Jazz/input/log4j.properties
    -rw-r--r--   1 Jazz   supergroup        212 2011-10-21 11:45 /user/Jazz/input/mapred-site.xml
    -rw-------   1 hadoop Jazz               10 2011-10-21 11:45 /user/Jazz/input/slaves
    -rw-r--r--   1 Jazz   supergroup       1243 2011-10-21 11:45 /user/Jazz/input/ssl-client.xml.example
    -rw-r--r--   1 Jazz   supergroup       1195 2011-10-21 11:45 /user/Jazz/input/ssl-server.xml.example
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:34 /user/Jazz/input1
    -rw-r--r--   1 Jazz supergroup       3936 2011-10-21 12:34 /user/Jazz/input1/capacity-scheduler.xml
    -rw-r--r--   1 Jazz supergroup        535 2011-10-21 12:34 /user/Jazz/input1/configuration.xsl
    -rw-r--r--   1 Jazz supergroup        326 2011-10-21 12:34 /user/Jazz/input1/core-site.xml
    -rw-r--r--   1 Jazz supergroup       2409 2011-10-21 12:34 /user/Jazz/input1/hadoop-env.sh
    -rw-r--r--   1 Jazz supergroup       1245 2011-10-21 12:34 /user/Jazz/input1/hadoop-metrics.properties
    -rw-r--r--   1 Jazz supergroup       4190 2011-10-21 12:34 /user/Jazz/input1/hadoop-policy.xml
    -rw-r--r--   1 Jazz supergroup        196 2011-10-21 12:34 /user/Jazz/input1/hdfs-site.xml
    -rw-r--r--   1 Jazz supergroup       2815 2011-10-21 12:34 /user/Jazz/input1/log4j.properties
    -rw-r--r--   1 Jazz supergroup        212 2011-10-21 12:34 /user/Jazz/input1/mapred-site.xml
    -rw-r--r--   1 Jazz supergroup         10 2011-10-21 12:34 /user/Jazz/input1/masters
    -rw-r--r--   1 Jazz supergroup         10 2011-10-21 12:34 /user/Jazz/input1/slaves
    -rw-r--r--   1 Jazz supergroup       1243 2011-10-21 12:34 /user/Jazz/input1/ssl-client.xml.example
    -rw-r--r--   1 Jazz supergroup       1195 2011-10-21 12:34 /user/Jazz/input1/ssl-server.xml.example
    

-mkdir

  • Creates a directory.
    Jazz@human ~
    $ hadoop fs -mkdir tmp
    Jazz@human ~
    $ hadoop fs -ls
    Found 5 items
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:33 /user/Jazz/dfs_input
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:40 /user/Jazz/in1
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:00 /user/Jazz/input
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:34 /user/Jazz/input1
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:43 /user/Jazz/tmp
    

-moveFromLocal

  • Moves a local directory to HDFS (a cut-and-paste: the local copy is removed).
    Jazz@human ~
    $ hadoop fs -moveFromLocal in1 in2
    Jazz@human ~
    $ hadoop fs -ls
    Found 6 items
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:33 /user/Jazz/dfs_input
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:40 /user/Jazz/in1
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:44 /user/Jazz/in2
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:00 /user/Jazz/input
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:34 /user/Jazz/input1
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:43 /user/Jazz/tmp
    

-mv

  • Renames a file or directory.
    Jazz@human ~
    $ hadoop fs -mv in2 in3
    
    Jazz@human ~
    $ hadoop fs -ls
    Found 6 items
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:33 /user/Jazz/dfs_input
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:40 /user/Jazz/in1
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:44 /user/Jazz/in3
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:00 /user/Jazz/input
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:34 /user/Jazz/input1
    drwxr-xr-x   - Jazz supergroup          0 2011-10-21 12:43 /user/Jazz/tmp
    

-rm

  • Deletes the specified file (it cannot be a directory).
    Jazz@human ~
    $ hadoop fs -rm in1/input
    Deleted hdfs://localhost:9000/user/Jazz/in1/input
    

-rmr

  • Recursively deletes directories and everything inside them; several directories can be given at once.
    Jazz@human ~
    $ hadoop fs -rmr dfs_input in1 in3 input1
    Deleted hdfs://localhost:9000/user/Jazz/dfs_input
    Deleted hdfs://localhost:9000/user/Jazz/in1
    Deleted hdfs://localhost:9000/user/Jazz/in3
    Deleted hdfs://localhost:9000/user/Jazz/input1
    

-setrep

  • Sets the replication factor.
  • Syntax: hadoop fs -setrep [-R] [-w] <rep> <path/file>
    Jazz@human ~
    $ hadoop fs -setrep -w 1 -R input
    Replication 1 set: hdfs://localhost:9000/user/Jazz/input/capacity-scheduler.xml
    Replication 1 set: hdfs://localhost:9000/user/Jazz/input/configuration.xsl
    Replication 1 set: hdfs://localhost:9000/user/Jazz/input/core-site.xml
    Replication 1 set: hdfs://localhost:9000/user/Jazz/input/hadoop-env.sh
    Replication 1 set: hdfs://localhost:9000/user/Jazz/input/hadoop-metrics.properties
    Replication 1 set: hdfs://localhost:9000/user/Jazz/input/hadoop-policy.xml
    Replication 1 set: hdfs://localhost:9000/user/Jazz/input/hdfs-site.xml
    Replication 1 set: hdfs://localhost:9000/user/Jazz/input/log4j.properties
    Replication 1 set: hdfs://localhost:9000/user/Jazz/input/mapred-site.xml
    Replication 1 set: hdfs://localhost:9000/user/Jazz/input/slaves
    Replication 1 set: hdfs://localhost:9000/user/Jazz/input/ssl-client.xml.example
    Replication 1 set: hdfs://localhost:9000/user/Jazz/input/ssl-server.xml.example
    Waiting for hdfs://localhost:9000/user/Jazz/input/capacity-scheduler.xml ... done
    Waiting for hdfs://localhost:9000/user/Jazz/input/configuration.xsl ... done
    Waiting for hdfs://localhost:9000/user/Jazz/input/core-site.xml ... done
    Waiting for hdfs://localhost:9000/user/Jazz/input/hadoop-env.sh ... done
    Waiting for hdfs://localhost:9000/user/Jazz/input/hadoop-metrics.properties ...done
    Waiting for hdfs://localhost:9000/user/Jazz/input/hadoop-policy.xml ... done
    Waiting for hdfs://localhost:9000/user/Jazz/input/hdfs-site.xml ... done
    Waiting for hdfs://localhost:9000/user/Jazz/input/log4j.properties ... done
    Waiting for hdfs://localhost:9000/user/Jazz/input/mapred-site.xml ... done
    Waiting for hdfs://localhost:9000/user/Jazz/input/slaves ... done
    Waiting for hdfs://localhost:9000/user/Jazz/input/ssl-client.xml.example ... done
    Waiting for hdfs://localhost:9000/user/Jazz/input/ssl-server.xml.example ... done
    $ bin/hadoop fs -setrep -w 2 -R input 
    Replication 2 set: hdfs://gm1.nchc.org.tw:9000/user/hadooper/input/1.txt
    Replication 2 set: hdfs://gm1.nchc.org.tw:9000/user/hadooper/input/2.txt
    Replication 2 set: hdfs://gm1.nchc.org.tw:9000/user/hadooper/input/3.txt
    Replication 2 set: hdfs://gm1.nchc.org.tw:9000/user/hadooper/input/4.txt
    Waiting for hdfs://gm1.nchc.org.tw:9000/user/hadooper/input/1.txt ... done
    Waiting for hdfs://gm1.nchc.org.tw:9000/user/hadooper/input/2.txt ... done
    Waiting for hdfs://gm1.nchc.org.tw:9000/user/hadooper/input/3.txt ... done
    Waiting for hdfs://gm1.nchc.org.tw:9000/user/hadooper/input/4.txt ... done
    

-stat

  • Prints time information (the modification time) for a path.
    Jazz@human ~
    $ hadoop fs -stat input
    2011-10-21 04:00:44
    

-tail

  • Prints the last 1 KB of a file.
  • Usage: hadoop fs -tail [-f] <file> (with -f, content appended to the file is printed as the file grows)
    Jazz@human ~
    $ hadoop fs -tail input/log4j.properties
    g4j.RollingFileAppender
    #log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
    
    # Logfile size and and 30-day backups
    #log4j.appender.RFA.MaxFileSize=1MB
    #log4j.appender.RFA.MaxBackupIndex=30
    
    #log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
    #log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
    #log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
    
    #
    # FSNamesystem Audit logging
    # All audit events are logged at INFO level
    #
    log4j.logger.org.apache.hadoop.fs.FSNamesystem.audit=WARN
    
    # Custom Logging levels
    
    #log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
    #log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
    #log4j.logger.org.apache.hadoop.fs.FSNamesystem=DEBUG
    
    # Jets3t library
    log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR
    
    #
    # Event Counter Appender
    # Sends counts of logging messages at different severity levels to Hadoop Metrics.
    #
    log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter
    

-test

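  • Tests a path: -e checks whether it exists, -z checks whether the file is zero length, and -d checks whether it is a directory. Each test returns exit status 0 when true and 1 when false.
    • Use echo $? to see whether the exit status is 0 or 1.
  • Usage: bin/hadoop fs -test -[ezd] URI
    ########## -e tests whether the file exists: returns 0 if true, 1 if false ##########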
    
    Jazz@human ~
    $ hadoop fs -test -e input/slaves
    
    Jazz@human ~
    $ echo $?
    0
    
    Jazz@human ~
    $ hadoop fs -test -e input/masters
    
    Jazz@human ~
    $ echo $?
    1
    
    ########## -z tests whether the file size is zero: returns 0 if true, 1 if false ##########
    
    Jazz@human ~
    $ hadoop fs -test -z  input/slaves
    
    Jazz@human ~
    $ echo $?
    1
    
    Jazz@human ~
    $ hadoop fs -test -z  input/masters
    test: File does not exist: input/masters
    
    ########## -d tests whether the path is a directory: returns 0 if true, 1 if false ##########
    
    Jazz@human ~
    $ hadoop fs -test -d input/slaves
    
    Jazz@human ~
    $ echo $?
    1
    
    Jazz@human ~
    $ hadoop fs -test -d input
    
    Jazz@human ~
    $ echo $?
    0
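
Because -test reports its result through the exit status, it can be used directly in a shell `if` without inspecting `$?`. The sketch below uploads a file only when the target does not exist yet. `hdfs_exists` and `put_if_absent` are hypothetical helper names, and on Hadoop4Win the local path may still need the cygpath -w conversion described in section 1.2.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit status 0) when the HDFS path exists.
hdfs_exists() {
    hadoop fs -test -e "$1"
}

# Hypothetical helper: upload a local path only if the HDFS target is absent.
# Usage: put_if_absent <local path> <HDFS path>
put_if_absent() {
    if hdfs_exists "$2"; then
        echo "skip: $2 already exists"
    else
        hadoop fs -put "$1" "$2"
    fi
}
```

For example, `put_if_absent fromHDFS/masters input/masters` would restore the file deleted in section 1.4 and do nothing on a second run.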
    
    

-text

  • Outputs a file (such as a compressed file or a TextRecordInputStream) as plain text.
  • Syntax: hadoop fs -text <src>
    Jazz@human ~
    $ tar zcvf input.tar.gz input1
    input1/
    input1/capacity-scheduler.xml
    input1/configuration.xsl
    input1/core-site.xml
    input1/hadoop-env.sh
    input1/hadoop-metrics.properties
    input1/hadoop-policy.xml
    input1/hdfs-site.xml
    input1/log4j.properties
    input1/mapred-site.xml
    input1/masters
    input1/slaves
    input1/ssl-client.xml.example
    input1/ssl-server.xml.example
    Jazz@human ~
    $ hadoop fs -put input.tar.gz .
    Jazz@human ~
    $ hadoop fs -text input.tar.gz
    <omitted>
    
  • Note: zip is not currently supported by the available libraries, so -text prints a zip archive as raw bytes.
    Jazz@human ~
    $ zip -r input1.zip input1/
    updating: input1/ (stored 0%)
      adding: input1/capacity-scheduler.xml (deflated 71%)
      adding: input1/configuration.xsl (deflated 50%)
      adding: input1/core-site.xml (deflated 46%)
      adding: input1/hadoop-env.sh (deflated 58%)
      adding: input1/hadoop-metrics.properties (deflated 78%)
      adding: input1/hadoop-policy.xml (deflated 83%)
      adding: input1/hdfs-site.xml (deflated 35%)
      adding: input1/log4j.properties (deflated 67%)
      adding: input1/mapred-site.xml (deflated 34%)
      adding: input1/masters (stored 0%)
      adding: input1/slaves (stored 0%)
      adding: input1/ssl-client.xml.example (deflated 79%)
      adding: input1/ssl-server.xml.example (deflated 78%)
    Jazz@human ~
    $ hadoop fs -put input1.zip .
    Jazz@human ~
    $ hadoop fs -text input1.zip
    PK
    <omitted>
    

-touchz

  • Creates an empty file.
    Jazz@human ~
    $ hadoop fs -touchz empty
    
    Jazz@human ~
    $ hadoop fs -test -z empty ; echo $?
    0
    
Last modified on Oct 21, 2011, 2:01:42 PM