Lab 4: Compiling Hadoop Programs
Exercise 0: hello: print out what our keys and values are
- Download nchc-example.jar
$ wget http://trac.nchc.org.tw/cloud/raw-attachment/wiki/Hadoop_Lab4/nchc-example.jar
- Run the custom Hadoop program
$ bin/hadoop jar nchc-example.jar
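- Output
******************************************
Welcome to the NCHC.Hadoop computing functions
Command: Hadoop jar nchc-example-*.jar <function>
Functions:
wordcount: word count computed separately for each file in the input directory
mwc: word count computed over all input files combined
grep: count of the matches containing the specified string
nchcgrep: each word in the source files together with every line on which it appears
hello: print the contents and keep a running word count
******************************************
- Run hello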
$ bin/hadoop jar nchc-example.jar hello
- Usage hint:
hello <inDir> <outDir> <m> <r>
- Example:
$ bin/hadoop jar nchc-example.jar hello lab3_input lab4_out6 1 1
Exercise 1: Word Count, basic version
- Upload the input to HDFS
$ cd /opt/hadoop
$ mkdir lab4_input
$ echo "I like NCHC Cloud Course." > lab4_input/input1
$ echo "I like nchc Cloud Course, and we enjoy this course." > lab4_input/input2
$ bin/hadoop fs -put lab4_input lab4_input
$ bin/hadoop fs -ls lab4_input
- Download WordCount.java and save it to /opt/hadoop
$ wget http://trac.nchc.org.tw/cloud/raw-attachment/wiki/jazz/Hadoop_Lab6/WordCount.java
- Build and run the program
$ mkdir MyJava
$ javac -classpath hadoop-*-core.jar -d MyJava WordCount.java
$ jar -cvf wordcount.jar -C MyJava .
$ bin/hadoop jar wordcount.jar WordCount lab4_input/ lab4_out1/
$ bin/hadoop fs -cat lab4_out1/part-00000
- Result in lab4_out1
Cloud 2
Course, 1
Course. 1
I 2
NCHC 1
and 1
course. 1
enjoy 1
like 2
nchc 1
this 1
we 1
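The lab4_out1 result keeps "Course," and "Course." as separate keys, which is what happens when lines are split on whitespace only. WordCount.java itself is not reproduced on this page, so the following is only a local sketch in plain Java with no Hadoop dependency, assuming whitespace tokenization. The class name LocalWordCount and its members are hypothetical, and a TreeMap stands in for the sorted key order of the job output.

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the job; this is not the content of WordCount.java.
final class LocalWordCount {
    // The two lines written to lab4_input/input1 and lab4_input/input2.
    static final List<String> SAMPLE = List.of(
            "I like NCHC Cloud Course.",
            "I like nchc Cloud Course, and we enjoy this course.");

    // Splits each line on whitespace and counts every token exactly as it appears.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // keeps keys sorted
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one "word<TAB>count" pair per line.
        count(SAMPLE).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Because punctuation stays attached to the tokens, this sketch yields 12 distinct keys for the two sample lines, the same set listed for lab4_out1.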
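Exercise 2: Word Count, advanced version
- Create the skip-pattern file, upload it to HDFS, and create the build directory
$ echo "\." >pattern.txt && echo "\," >>pattern.txt
$ bin/hadoop fs -put pattern.txt ./
$ mkdir MyJava2
- Download WordCount2.java and save it to /opt/hadoop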
$ wget http://trac.nchc.org.tw/cloud/raw-attachment/wiki/jazz/Hadoop_Lab6/WordCount2.java
- Build and run the program
$ javac -classpath hadoop-*-core.jar -d MyJava2 WordCount2.java
$ jar -cvf wordcount2.jar -C MyJava2 .
$ bin/hadoop jar wordcount2.jar WordCount2 lab4_input lab4_out2 -skip pattern.txt
$ bin/hadoop fs -cat lab4_out2/part-00000
- Result in lab4_out2
Cloud 2
Course 2
I 2
NCHC 1
and 1
course 1
enjoy 1
like 2
nchc 1
this 1
we 1
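In the lab4_out2 result, "Course," and "Course." have merged into a single key "Course", which is consistent with the two patterns in pattern.txt being removed from each line before it is tokenized. WordCount2.java is not shown on this page, so the sketch below only assumes that behaviour: each pattern is treated as a regular expression and every match is deleted. The class name SkipPatternWordCount and its members are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local sketch of the -skip behaviour; not the content of WordCount2.java.
final class SkipPatternWordCount {
    static final List<String> SAMPLE = List.of(
            "I like NCHC Cloud Course.",
            "I like nchc Cloud Course, and we enjoy this course.");

    // The two lines of pattern.txt, written as Java string literals for the regexes \. and \,
    static final List<String> PATTERNS = List.of("\\.", "\\,");

    // Deletes every match of every skip pattern from a line, then counts whitespace-separated tokens.
    static Map<String, Integer> count(List<String> lines, List<String> skipPatterns) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String pattern : skipPatterns) {
                line = line.replaceAll(pattern, "");
            }
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        count(SAMPLE, PATTERNS).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

With the punctuation stripped, the sketch produces 11 keys, and "Course" and "course" remain distinct because case is still preserved.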
- Run again with case sensitivity turned off
$ bin/hadoop jar wordcount2.jar WordCount2 -Dwordcount.case.sensitive=false lab4_input lab4_out3 -skip pattern.txt
$ bin/hadoop fs -cat lab4_out3/part-00000
- Result in lab4_out3
and 1
cloud 2
course 3
enjoy 1
i 2
like 2
nchc 2
this 1
we 1
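The lab4_out3 result has only lowercase keys, and "course" now counts 3, which suggests that setting wordcount.case.sensitive=false lowercases each line before the skip patterns and tokenization are applied. The sketch below assumes exactly that and nothing more about WordCount2.java; the class name CaseFoldingWordCount, its members, and the caseSensitive parameter are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local sketch of the case-insensitive run; not the content of WordCount2.java.
final class CaseFoldingWordCount {
    static final List<String> SAMPLE = List.of(
            "I like NCHC Cloud Course.",
            "I like nchc Cloud Course, and we enjoy this course.");

    // Java string literals for the regexes \. and \, stored in pattern.txt.
    static final List<String> PATTERNS = List.of("\\.", "\\,");

    // Optionally lowercases each line, removes skip-pattern matches, then counts tokens.
    static Map<String, Integer> count(List<String> lines, List<String> skipPatterns,
                                      boolean caseSensitive) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String raw : lines) {
            String line = caseSensitive ? raw : raw.toLowerCase(Locale.ROOT);
            for (String pattern : skipPatterns) {
                line = line.replaceAll(pattern, "");
            }
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // false mirrors -Dwordcount.case.sensitive=false in the command above.
        count(SAMPLE, PATTERNS, false).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Calling count with caseSensitive set to true gives the 11 keys of the previous run, while false folds them down to 9.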
Last modified on Apr 26, 2010, 12:02:53 PM