[[PageOutline]]

◢ <[wiki:FDC110829/Lab4 Lab 4]> | <[wiki:FDC110829 Back to course outline]> ▲ | <[wiki:FDC110829/Lab6 Lab 6]> ◣

= Lab 5 =

{{{
#!html

Hadoop Streaming in Different Languages
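}}}

== Existing Binary ==

{{{
~$ hadoop fs -put /etc/hadoop/conf lab7_input
~$ hadoop jar hadoop-streaming.jar -input lab7_input -output lab7_out1 -mapper /bin/cat -reducer /usr/bin/wc
~$ hadoop fs -cat lab7_out1/part-00000
}}}

== Bash Shell Script ==

{{{
~$ echo "sed -e \"s/ /\n/g\" | grep ." > streamingMapper.sh
~$ echo "uniq -c | awk '{print \$2 \"\t\" \$1}'" > streamingReducer.sh
~$ chmod a+x streamingMapper.sh
~$ chmod a+x streamingReducer.sh
~$ hadoop jar hadoop-streaming.jar -input lab7_input -output lab7_out2 -mapper streamingMapper.sh -reducer streamingReducer.sh -file streamingMapper.sh -file streamingReducer.sh
~$ hadoop fs -cat lab7_out2/part-00000
}}}

== PHP Script ==

 * Write the mapper PHP program
{{{
~$ cat > mapper.php << EOF
#!/usr/bin/php
<?php
// Assumed reconstruction of the input loop: read STDIN and count each word
\$word2count = array();
while ((\$line = fgets(STDIN)) !== false) {
    // split the line on non-word characters, dropping empty strings
    \$words = preg_split('/\W+/', \$line, -1, PREG_SPLIT_NO_EMPTY);
    foreach (\$words as \$word) {
        \$word2count[\$word] = (isset(\$word2count[\$word]) ? \$word2count[\$word] : 0) + 1;
    }
}
// write the results to STDOUT (standard output)
foreach (\$word2count as \$word => \$count) {
    // print [word, tab character, count, end of line]
    echo \$word, chr(9), \$count, PHP_EOL;
}
?>
EOF
}}}
 * Write the reducer PHP program
{{{
~$ cat > reducer.php << EOF
#!/usr/bin/php
<?php
// Assumed reconstruction of the input loop: read "word<TAB>count" lines from STDIN
\$word2count = array();
while ((\$line = fgets(STDIN)) !== false) {
    // split the mapper output into word and count at the tab character
    list(\$word, \$count) = array_pad(explode(chr(9), trim(\$line)), 2, 0);
    // convert the count from string to int
    \$count = intval(\$count);
    // sum the counts
    if (\$count > 0) \$word2count[\$word] = (isset(\$word2count[\$word]) ? \$word2count[\$word] : 0) + \$count;
}
// not required, but gives a neatly ordered output
ksort(\$word2count);
// write the results to STDOUT (standard output)
foreach (\$word2count as \$word => \$count) {
    echo \$word, chr(9), \$count, PHP_EOL;
}
?>
EOF
}}}
 * Make the scripts executable
{{{
~$ chmod a+x *.php
}}}
 * Test that the scripts work locally
{{{
~$ echo "i love hadoop, hadoop love u" | ./mapper.php | ./reducer.php
}}}
 * Run the job
{{{
~$ hadoop jar hadoop-streaming.jar -mapper mapper.php -reducer reducer.php -input lab7_input -output lab7_out3 -file mapper.php -file reducer.php
}}}
 * Check the result
{{{
~$ hadoop fs -cat lab7_out3/part-00000
}}}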
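
The local pipe test used for the PHP scripts can also be applied to the Bash Shell Script pair. The sketch below is a minimal dry run under two assumptions: a plain `sort` stands in for the sort-by-key step that Hadoop Streaming performs between mapper and reducer, and `tr` replaces the lab's `sed` one-liner so that the example does not depend on GNU sed. The function names `mapper`, `reducer`, and `run_local` are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Local dry run of the shell word count; no Hadoop cluster is needed.

mapper() {
    # Emit one word per line and drop empty lines
    # (same idea as streamingMapper.sh, using tr instead of sed).
    tr -s ' ' '\n' | grep .
}

reducer() {
    # Count adjacent identical lines, then print "word<TAB>count"
    # (same pipeline as streamingReducer.sh).
    uniq -c | awk '{print $2 "\t" $1}'
}

run_local() {
    # Assumption: sort imitates the shuffle step, so identical words
    # arrive at the reducer next to each other.
    mapper | LC_ALL=C sort | reducer
}

# Prints hadoop<TAB>2, i<TAB>1, love<TAB>2, u<TAB>1, one pair per line.
printf 'i love hadoop hadoop love u\n' | run_local
```

The `sort` matters because `uniq -c` only merges adjacent duplicates. Without it, a word that appears in two separate places in the input would be reported twice with partial counts.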