[[PageOutline]]
◢ <[wiki:YZU130807/Lab4 Lab 4]> | <[wiki:YZU130807 Back to course outline]> ▲ | <[wiki:YZU130807/Lab6 Lab 6]> ◣
= Lab 5 =
{{{
#!html
Advanced HDFS commands and behavior observation (cluster)
Advanced HDFS commands in fully distributed mode
}}}
{{{
#!text
For the following exercises, connect to hadoop.nchc.org.tw. In the examples below, hXXXX stands for your own user name.
}}}
== Prepare a large data set ==
* First, generate a 200 MB file.
{{{
h998@hadoop:~$ dd if=/dev/zero of=200mb.img bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 0.239545 s, 875 MB/s
}}}
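* The byte count reported by dd follows from bs × count. A minimal sketch of that arithmetic, assuming `1M` means 1048576 bytes (the GNU dd convention); the variable names are illustrative only.
{{{
#!sh
# bs=1M is 1048576 bytes per block for GNU dd; count=200 blocks are written.
bs=$((1024 * 1024))
count=200
total=$((bs * count))
echo "$total"
}}}
* dd prints "210 MB" for the same byte count because it reports decimal megabytes (10^6 bytes), not binary ones.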
* Verify the file size.
{{{
h998@hadoop:~$ du -sh 200mb.img
200M 200mb.img
}}}
* Upload 200mb.img to HDFS.
{{{
h998@hadoop:~$ hadoop fs -put 200mb.img 200mb.img
}}}
* Verify that the upload succeeded.
{{{
h998@hadoop:~$ hadoop fs -ls 200mb.img
Found 1 items
-rw-r--r-- 2 h998 supergroup 209715200 2013-08-12 12:06 /user/h998/200mb.img
}}}
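* One way to confirm the upload is to compare the size column of the listing with the local file size. The sketch below works on the sample line shown above instead of contacting the cluster; in a live session you would presumably feed it the output of `hadoop fs -ls 200mb.img`. The variable names are illustrative.
{{{
#!sh
# Sample listing line copied from the output above.
ls_line='-rw-r--r-- 2 h998 supergroup 209715200 2013-08-12 12:06 /user/h998/200mb.img'
local_size=209715200   # byte count reported by dd for the local file

# In this listing format the 5th whitespace-separated field is the size in bytes.
hdfs_size=$(echo "$ls_line" | awk '{print $5}')

if [ "$hdfs_size" -eq "$local_size" ]; then
  result="sizes match"
else
  result="size mismatch"
fi
echo "$result"
}}}
* The second column of the listing (2 here) is the replication factor of the file.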
== fsck: file system check ==
* First, look at the basic usage of fsck.
{{{
#!sh
~$ hadoop fsck
Usage: DFSck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
<path> start checking from this path
-move move corrupted files to /lost+found
-delete delete corrupted files
-files print out files being checked
-openforwrite print out files opened for write
-blocks print out block report
-locations print out locations for every block
-racks print out network topology for data-node locations
By default fsck ignores files opened for write, use -openforwrite to report such files. They are usually tagged CORRUPT or HEALTHY depending on their block allocation status
}}}
* Start by running fsck with no options, passing only the absolute path, and look at the result.
{{{
h998@hadoop:~$ hadoop fsck /user/${USER}/200mb.img
.Status: HEALTHY
Total size: 209715200 B
Total dirs: 0
Total files: 1
Total blocks (validated): 2 (avg. block size 104857600 B)
Minimally replicated blocks: 2 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 12
Number of racks: 1
The filesystem under path '/user/h998/200mb.img' is HEALTHY
}}}
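* The summary figures can be cross-checked by hand. A small sketch using the numbers from the report above; the raw-storage figure is an illustration derived from size times replication, not a value that fsck prints.
{{{
#!sh
# Values taken from the fsck summary above.
total_size=209715200
blocks=2
replication=2

# Average block size is simply total size divided by block count.
avg_block=$((total_size / blocks))
# Each block is stored `replication` times across the data-nodes.
raw_bytes=$((total_size * replication))

echo "avg block size: $avg_block B"
echo "raw storage across data-nodes: $raw_bytes B"
}}}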
* Next, use the fsck options to find out how many blocks 200mb.img consists of and which machines store each block.
{{{
h998@hadoop:~$ hadoop fsck /user/${USER}/200mb.img -files -blocks -locations -racks
/user/h998/200mb.img 209715200 bytes, 2 block(s): OK
0. blk_-6674004733773524889_19333928 len=134217728 repl=2 [/default-rack/192.168.1.4:50010, /default-rack/192.168.1.8:50010]
1. blk_-2951307914939094717_19333928 len=75497472 repl=2 [/default-rack/192.168.1.14:50010, /default-rack/192.168.1.2:50010]
Status: HEALTHY
Total size: 209715200 B
Total dirs: 0
Total files: 1
Total blocks (validated): 2 (avg. block size 104857600 B)
Minimally replicated blocks: 2 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 12
Number of racks: 1
The filesystem under path '/user/h998/200mb.img' is HEALTHY
}}}
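* The block report can be checked against the file size. The sketch below parses the two block lines copied from the output above, and assumes a 128 MB block size (inferred from the length of block 0, not stated by fsck). It relies on `grep -o`, which GNU and BSD grep provide.
{{{
#!sh
# Block report lines copied from the output above.
report='0. blk_-6674004733773524889_19333928 len=134217728 repl=2 [/default-rack/192.168.1.4:50010, /default-rack/192.168.1.8:50010]
1. blk_-2951307914939094717_19333928 len=75497472 repl=2 [/default-rack/192.168.1.14:50010, /default-rack/192.168.1.2:50010]'

# Sum the len= fields; the result should equal the file size.
sum=$(echo "$report" | sed -n 's/.*len=\([0-9]*\).*/\1/p' | awk '{s += $1} END {printf "%d\n", s}')
echo "total length: $sum"

# Expected split, assuming a 128 MB block size (inferred from block 0).
block_size=$((128 * 1024 * 1024))
file_size=209715200
n_blocks=$(( (file_size + block_size - 1) / block_size ))   # ceiling division
last_block=$(( file_size - (n_blocks - 1) * block_size ))
echo "blocks: $n_blocks, last block: $last_block B"

# List the distinct data-node addresses that hold a replica.
nodes=$(echo "$report" | grep -o '[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*:[0-9][0-9]*' | sort -u)
echo "$nodes"
}}}
* With two blocks and two replicas each, four replicas exist in total; in this run they sit on four different data-nodes.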