Lab 7
Hadoop Streaming with Different Programming Languages
For the exercises below, connect to hadoop.nchc.org.tw. In the commands that follow, hXXXX stands for your user name.
Using Existing Binaries
~$ hadoop fs -put /etc/hadoop/conf lab7_input
~$ hadoop jar hadoop-streaming.jar -input lab7_input -output lab7_out1 -mapper /bin/cat -reducer /usr/bin/wc
~$ hadoop fs -cat lab7_out1/part-00000
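To get a feel for what this job computes without a cluster, the streaming data flow can be approximated locally as mapper, then sort, then reducer. The following is a minimal sketch under stated assumptions: `sample.txt` is a hypothetical file created only for this illustration, and a single local pipeline stands in for what the cluster does with several mappers and a shuffle phase. The exact byte count reported on the cluster may differ slightly because of how Streaming passes keys and values to the reducer.

```shell
# Hypothetical sample input, created only for this illustration
printf 'hello hadoop\nhello streaming\n' > sample.txt

# Approximate the job locally:
#   mapper (/bin/cat) -> shuffle (sort) -> reducer (/usr/bin/wc)
result=$(cat sample.txt | /bin/cat | sort | /usr/bin/wc)

# wc prints three numbers: line count, word count, byte count
echo "$result"
```

Because `/bin/cat` passes every line through unchanged, the reducer simply counts the lines, words, and bytes of whatever input it receives.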
Using Bash Shell Scripts
~$ echo "sed -e \"s/ /\n/g\" | grep ." > streamingMapper.sh
~$ echo "uniq -c | awk '{print \$2 \"\t\" \$1}'" > streamingReducer.sh
~$ chmod a+x streamingMapper.sh
~$ chmod a+x streamingReducer.sh
~$ hadoop jar hadoop-streaming.jar -input lab7_input -output lab7_out2 -mapper streamingMapper.sh -reducer streamingReducer.sh -file streamingMapper.sh -file streamingReducer.sh
~$ hadoop fs -cat lab7_out2/part-00000
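Before submitting the job, the two scripts can be tried locally, with `sort` standing in for the shuffle phase that groups identical keys ahead of the reducer. This is a sketch that assumes GNU sed (which turns `\n` in the replacement into a newline). It recreates the scripts with quoted heredocs rather than `echo`, so the file contents do not depend on how the local shell's `echo` treats backslashes.

```shell
# Recreate the two scripts from this section (quoted heredocs keep the text verbatim)
cat > streamingMapper.sh << 'EOF'
sed -e "s/ /\n/g" | grep .
EOF
cat > streamingReducer.sh << 'EOF'
uniq -c | awk '{print $2 "\t" $1}'
EOF

# Local stand-in for the job: mapper -> sort (shuffle) -> reducer
result=$(echo "i love hadoop hadoop love u" \
  | sh streamingMapper.sh \
  | LC_ALL=C sort \
  | sh streamingReducer.sh)

# One "word<TAB>count" pair per line
echo "$result"
```

The `sort` step matters here: `uniq -c` only merges adjacent duplicate lines, so the reducer relies on its input arriving grouped by word, which is what Hadoop provides on the cluster.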
Using PHP Scripts
- Write the mapper PHP script
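~$ cat > mapper.php << EOF
#!/usr/bin/php
<?php
\$word2count = array();
// Read input from STDIN (standard input)
while ((\$line = fgets(STDIN)) !== false) {
    // Convert to lowercase and strip surrounding whitespace
    \$line = strtolower(trim(\$line));
    // Split the line into words, stored in the \$words array
    \$words = preg_split('/\W/', \$line, 0, PREG_SPLIT_NO_EMPTY);
    // Increment the count for each word
    foreach (\$words as \$word) {
        // Initialize the counter first so PHP does not warn about an undefined key
        if (!isset(\$word2count[\$word])) {
            \$word2count[\$word] = 0;
        }
        \$word2count[\$word] += 1;
    }
}
// Write the results to STDOUT (standard output)
foreach (\$word2count as \$word => \$count) {
    // Print: word, tab character, count, end of line
    echo \$word, chr(9), \$count, PHP_EOL;
}
?>
EOF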
- Write the reducer PHP script
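~$ cat > reducer.php << EOF
#!/usr/bin/php
<?php
\$word2count = array();
// Read input from STDIN
while ((\$line = fgets(STDIN)) !== false) {
    // Strip surrounding whitespace
    \$line = trim(\$line);
    // Each line has the form: word, tab, count; store them in (\$word, \$count)
    list(\$word, \$count) = explode(chr(9), \$line);
    // Convert the count from string to int
    \$count = intval(\$count);
    // Accumulate the total for this word
    if (\$count > 0) {
        // Initialize the counter first so PHP does not warn about an undefined key
        if (!isset(\$word2count[\$word])) {
            \$word2count[\$word] = 0;
        }
        \$word2count[\$word] += \$count;
    }
}
// Not strictly required, but sorting by word makes the output easier to read
ksort(\$word2count);
// Write the results to STDOUT (standard output)
foreach (\$word2count as \$word => \$count) {
    echo \$word, chr(9), \$count, PHP_EOL;
}
?>
EOF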
- Make the scripts executable
~$ chmod a+x *.php
- Test the scripts locally
~$ echo "i love hadoop, hadoop love u" | ./mapper.php | ./reducer.php
- Run the job
~$ hadoop jar hadoop-streaming.jar -mapper mapper.php -reducer reducer.php -input lab7_input -output lab7_out3 -file mapper.php -file reducer.php
- Check the results
~$ hadoop fs -cat lab7_out3/part-00000
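The output file holds one word, a tab, and a count per line, so ordinary shell tools can post-process it. The following is a minimal sketch that ranks words by count. The sample lines are made up for illustration and stand in for the real output of `hadoop fs -cat lab7_out3/part-00000`.

```shell
# Hypothetical sample standing in for: hadoop fs -cat lab7_out3/part-00000
sample=$(printf 'hadoop\t2\ni\t1\nlove\t2\nu\t1\n')

# A literal tab character to use as the field separator
tab=$(printf '\t')

# Sort by the second field numerically, largest first; ties are broken
# by the word itself so the order is deterministic. Keep the top 2 lines.
top=$(printf '%s\n' "$sample" | LC_ALL=C sort -t "$tab" -k2,2nr -k1,1 | head -n 2)
echo "$top"
```

On the cluster, the `printf` that feeds the pipeline would be replaced by the `hadoop fs -cat` command shown above.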
Last modified on Sep 6, 2012, 11:45:16 PM