Lab 5

Advanced HDFS commands and behavior observation (fully distributed cluster mode)
For the exercises below, connect to hadoop.nchc.org.tw. In the examples, hXXXX stands for your username (h998 in the sample output).

Prepare a large data set

  • First, let's generate a 200 MB file.
    h998@hadoop:~$ dd if=/dev/zero of=200mb.img bs=1M count=200
    200+0 records in
    200+0 records out
    209715200 bytes (210 MB) copied, 0.239545 s, 875 MB/s
    
  • Verify the file size.
    h998@hadoop:~$ du -sh 200mb.img 
    200M	200mb.img
    
  • Upload 200mb.img to HDFS.
    h998@hadoop:~$ hadoop fs -put 200mb.img 200mb.img 
    
  • Verify that the upload succeeded.
    h998@hadoop:~$ hadoop fs -ls 200mb.img 
    Found 1 items
    -rw-r--r--   2 h998 supergroup  209715200 2013-08-12 12:06 /user/h998/200mb.img
    
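The transcripts above show three different size figures for the same file: dd prints "210 MB", du -sh prints "200M", and hadoop fs -ls prints 209715200. A minimal sketch of the arithmetic behind them, assuming dd's bs=1M means 1048576 bytes, that du -sh counts in units of 1024, and that dd's summary counts in units of 1000 (the variable names are illustrative only):

```shell
# dd was run with bs=1M count=200; 1M here is 1 MiB = 1024 * 1024 bytes.
bs=$((1024 * 1024))
count=200

# Exact size in bytes, as shown by "hadoop fs -ls".
bytes=$((bs * count))

# Size in units of 1024*1024 bytes, as "du -sh" reports it.
mib=$((bytes / 1024 / 1024))

# Size in units of 1000*1000 bytes, rounded to the nearest integer,
# as in dd's "(210 MB)" summary.
mb_rounded=$(( (bytes + 500000) / 1000000 ))

echo "bytes=$bytes  MiB=$mib  MB=$mb_rounded"
```

All three numbers describe the same file, so a byte count that matches between the local file and the hadoop fs -ls listing is the simplest check that the upload is complete.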

fsck

File system check

  • First, let's learn the basic usage of fsck. Running it without arguments prints the usage message.
    ~$ hadoop fsck 
    Usage: DFSck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
      <path>  start checking from this path
      -move move corrupted files to /lost+found
      -delete delete corrupted files
      -files  print out files being checked
      -openforwrite print out files opened for write
      -blocks print out block report
      -locations  print out locations for every block
      -racks  print out network topology for data-node locations
        By default fsck ignores files opened for write, use -openforwrite to report such files. They are usually  tagged CORRUPT or HEALTHY depending on their block allocation status
    
  • Let's start by passing no options, only an absolute path, and look at the result.
    h998@hadoop:~$ hadoop fsck /user/${USER}/200mb.img
    .Status: HEALTHY
     Total size:	209715200 B
     Total dirs:	0
     Total files:	1
     Total blocks (validated):	2 (avg. block size 104857600 B)
     Minimally replicated blocks:	2 (100.0 %)
     Over-replicated blocks:	0 (0.0 %)
     Under-replicated blocks:	0 (0.0 %)
     Mis-replicated blocks:		0 (0.0 %)
     Default replication factor:	2
     Average block replication:	2.0
     Corrupt blocks:		0
     Missing replicas:		0 (0.0 %)
     Number of data-nodes:		12
     Number of racks:		1
    
    
    The filesystem under path '/user/h998/200mb.img' is HEALTHY
    
  • Next, let's use the fsck options to find out how many blocks 200mb.img consists of and which machines each block is stored on.
    h998@hadoop:~$ hadoop fsck /user/${USER}/200mb.img -files -blocks -locations -racks
    /user/h998/200mb.img 209715200 bytes, 2 block(s):  OK
    0. blk_-6674004733773524889_19333928 len=134217728 repl=2 [/default-rack/192.168.1.4:50010, /default-rack/192.168.1.8:50010]
    1. blk_-2951307914939094717_19333928 len=75497472 repl=2 [/default-rack/192.168.1.14:50010, /default-rack/192.168.1.2:50010]
    
    Status: HEALTHY
     Total size:	209715200 B
     Total dirs:	0
     Total files:	1
     Total blocks (validated):	2 (avg. block size 104857600 B)
     Minimally replicated blocks:	2 (100.0 %)
     Over-replicated blocks:	0 (0.0 %)
     Under-replicated blocks:	0 (0.0 %)
     Mis-replicated blocks:		0 (0.0 %)
     Default replication factor:	2
     Average block replication:	2.0
     Corrupt blocks:		0
     Missing replicas:		0 (0.0 %)
     Number of data-nodes:		12
     Number of racks:		1
    
    
    The filesystem under path '/user/h998/200mb.img' is HEALTHY
    
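The numbers in the fsck report can be checked by hand. The sketch below assumes the cluster's block size is 134217728 bytes (128 MiB), which is inferred from the len= value of block 0 rather than read from the cluster configuration. It recomputes the block count, the size of the last block, and the "avg. block size" figure, then extracts the distinct DataNode addresses from the two block lines, pasted in as sample input. Variable names are illustrative only.

```shell
# Values copied from the fsck output above.
file_size=209715200        # "Total size"
block_size=134217728       # len= of block 0; assumed to be the cluster's block size

# Ceiling division: number of blocks needed to hold the file.
num_blocks=$(( (file_size + block_size - 1) / block_size ))
# The last block holds whatever is left after the full blocks.
last_block=$(( file_size - (num_blocks - 1) * block_size ))
# fsck's "avg. block size" is total size divided by block count.
avg_block=$(( file_size / num_blocks ))

echo "blocks=$num_blocks last_block=$last_block avg_block=$avg_block"

# The two block lines from the -locations output, used as sample input.
fsck_blocks='0. blk_-6674004733773524889_19333928 len=134217728 repl=2 [/default-rack/192.168.1.4:50010, /default-rack/192.168.1.8:50010]
1. blk_-2951307914939094717_19333928 len=75497472 repl=2 [/default-rack/192.168.1.14:50010, /default-rack/192.168.1.2:50010]'

# Extract every ip:port pair and de-duplicate to list the DataNodes involved.
datanodes=$(printf '%s\n' "$fsck_blocks" | grep -oE '[0-9]+(\.[0-9]+){3}:[0-9]+' | sort -u)
node_count=$(printf '%s\n' "$datanodes" | wc -l | tr -d ' ')

echo "$datanodes"
echo "distinct DataNodes: $node_count"
```

With repl=2 and two blocks, the file occupies four replicas, here spread over four different DataNodes out of the 12 reported. On the cluster, the same grep pipeline can be fed directly from the hadoop fsck ... -files -blocks -locations command instead of the pasted sample.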
Last modified on Aug 12, 2013, 12:12:29 PM