wiki:NCTU110329/Lab6

Version 2 (modified by jazz, 14 years ago)


Lab 6

Compiling Hadoop MapReduce Java Programs

Practice 1: Word Count (Basic)

  • Upload the input data to HDFS
$ mkdir lab6_input
$ echo "I like NCTU Cloud Course." > lab6_input/input1
$ echo "I like nctu Cloud Course, and we enjoy this course." > lab6_input/input2
$ hadoop fs -put lab6_input lab6_input
$ hadoop fs -ls lab6_input
Found 2 items
-rw-r--r--   2 hXXXX supergroup         26 2011-04-19 10:07 /user/hXXXX/lab6_input/input1
-rw-r--r--   2 hXXXX supergroup         52 2011-04-19 10:07 /user/hXXXX/lab6_input/input2
  • Compile WordCount.java and run it with the hadoop jar command
$ mkdir MyJava
$ ln -s /usr/lib/hadoop/hadoop-*-core.jar hadoop-core.jar
$ javac -classpath hadoop-core.jar -d MyJava WordCount.java
$ jar -cvf wordcount.jar -C MyJava .
$ hadoop jar wordcount.jar WordCount lab6_input/ lab6_out1/
$ hadoop fs -cat lab6_out1/part-00000
  • Results in lab6_out1
    You should see results like this:
    Cloud 2
    Course, 1
    Course. 1
    I 2
    NCTU  1
    and 1
    course. 1
    enjoy 1
    like  2
    nctu  1
    this  1
    we  1
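The output above follows from how a word-count job tokenizes and groups its input. Below is a minimal local sketch in plain Java (no Hadoop), assuming WordCount.java is the classic Hadoop tutorial version whose mapper splits each line on whitespace with StringTokenizer and whose reducer sums the 1s emitted for each word. The class WordCountSketch and its count method are hypothetical names used only for illustration. Because only whitespace separates tokens, "Course," and "Course." remain distinct words, and because keys are sorted by byte order, uppercase words come before lowercase ones.

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/**
 * Hypothetical, Hadoop-free sketch of the word-count logic.
 * Assumption: WordCount.java tokenizes on whitespace with StringTokenizer,
 * as the classic Hadoop tutorial version does; this is not the lab's source.
 */
public class WordCountSketch {

    // Map, shuffle and reduce in one pass: emit (token, 1) for each token,
    // group by key in a TreeMap (sorted, standing in for the framework's key sort),
    // and sum the 1s for each key.
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line); // splits on whitespace only
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The same two lines as lab6_input/input1 and lab6_input/input2.
        List<String> input = List.of(
                "I like NCTU Cloud Course.",
                "I like nctu Cloud Course, and we enjoy this course.");
        // Print in the same "key<TAB>value" shape as part-00000.
        count(input).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Running `java WordCountSketch` prints the same twelve word/count pairs as lab6_out1, in the same order. The TreeMap is only a stand-in for Hadoop's sort-and-group phase; for ASCII keys, String ordering matches the byte ordering Hadoop applies to Text keys.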
    

Practice 2: Word Count (Advanced)

$ echo "\." >pattern.txt && echo "\," >>pattern.txt
$ hadoop fs -put pattern.txt .
$ mkdir -p MyJava2
$ javac -classpath hadoop-core.jar -d MyJava2 WordCount2.java
$ jar -cvf wordcount2.jar -C MyJava2 .
$ hadoop jar wordcount2.jar WordCount2 lab6_input lab6_out2 -skip pattern.txt
$ hadoop fs -cat lab6_out2/part-00000
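The -skip option makes WordCount2 delete the patterns listed in pattern.txt before counting. The sketch below assumes WordCount2.java follows WordCount v2.0 from the Hadoop MapReduce tutorial, where each line of the pattern file is treated as a Java regular expression and removed from every input line with String.replaceAll before tokenizing. The class SkipPatternSketch and its method are hypothetical and run locally without Hadoop.

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/**
 * Hypothetical, Hadoop-free sketch of skip-pattern handling.
 * Assumption: WordCount2 strips each regex from pattern.txt with
 * String.replaceAll before tokenizing, as the tutorial's WordCount v2.0 does.
 */
public class SkipPatternSketch {

    public static Map<String, Integer> count(List<String> lines, List<String> skipPatterns) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String cleaned = line;
            for (String pattern : skipPatterns) {
                cleaned = cleaned.replaceAll(pattern, ""); // each pattern is a Java regex
            }
            StringTokenizer tokens = new StringTokenizer(cleaned); // whitespace tokenizing
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The two lines written to pattern.txt: \. and \, (escaped for Java source).
        List<String> patterns = List.of("\\.", "\\,");
        List<String> input = List.of(
                "I like NCTU Cloud Course.",
                "I like nctu Cloud Course, and we enjoy this course.");
        count(input, patterns).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

With both patterns removed, "Course." and "Course," collapse into a single key "Course" with count 2, while the lowercase "course" is still counted separately, which is why lab6_out2 has eleven lines instead of twelve.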
  • Results in lab6_out2
    You should see results like this:
    Cloud 2
    Course  2
    I 2
    NCTU  1
    and 1
    course  1
    enjoy 1
    like  2
    nctu  1
    this  1
    we  1
    
  • Now run the example again with case-insensitive counting, still skipping the patterns in pattern.txt
    $ hadoop jar wordcount2.jar WordCount2 -Dwordcount.case.sensitive=false lab6_input lab6_out3 -skip pattern.txt
    $ hadoop fs -cat lab6_out3/part-00000
    
  • Results in lab6_out3
    You should see results like this:
    and 1
    cloud 2
    course  3
    enjoy 1
    i 2
    like  2
    nctu  2
    this  1
    we  1
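Setting -Dwordcount.case.sensitive=false folds every line to lowercase before counting, so "NCTU"/"nctu" and "Course"/"course" merge. This sketch assumes, as in the tutorial's WordCount v2.0, that the line is lowercased first and the skip patterns are removed afterwards. The class CaseFoldSketch and the caseSensitive parameter are hypothetical stand-ins for the Hadoop configuration property; nothing here reads a Hadoop Configuration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/**
 * Hypothetical, Hadoop-free sketch of case-insensitive counting with skip patterns.
 * Assumption: WordCount2 lowercases the line when wordcount.case.sensitive=false,
 * then strips the skip patterns, then tokenizes on whitespace.
 */
public class CaseFoldSketch {

    public static Map<String, Integer> count(List<String> lines, List<String> skipPatterns,
                                             boolean caseSensitive) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Locale.ROOT avoids locale-specific rules (e.g. Turkish dotless i) when folding case.
            String cleaned = caseSensitive ? line : line.toLowerCase(Locale.ROOT);
            for (String pattern : skipPatterns) {
                cleaned = cleaned.replaceAll(pattern, "");
            }
            StringTokenizer tokens = new StringTokenizer(cleaned);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("\\.", "\\,");
        List<String> input = List.of(
                "I like NCTU Cloud Course.",
                "I like nctu Cloud Course, and we enjoy this course.");
        // false mirrors -Dwordcount.case.sensitive=false in the hadoop jar command.
        count(input, patterns, false).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Passing true instead of false reproduces lab6_out2, so the single flag accounts for the difference between the two result sets.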