= Homework 1 =

 * Task: Modify WordCount2.java from [wiki:NCTU110329/Lab6 Lab6] into an inverted index (reverse index) program named !ReverseIndex.java. Each output line of !ReverseIndex should have the form "keyword<TAB>file names (comma-separated)". Run it the same way as at the end of Lab 6, ignoring periods (\.) and commas (\,) and ignoring case (wordcount.case.sensitive=false).
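The map step of the task can be pictured with a plain-Java sketch that runs without Hadoop. It assumes the per-line processing of the Hadoop tutorial's WordCount v2.0 (lower-case the line when case sensitivity is off, delete each skip pattern with `replaceAll`, then split on whitespace), but emits the file name where WordCount emits a count of 1. The names `MapStepSketch` and `mapLine` and the sample line are hypothetical. In the real mapper the file name would come from the job; the tutorial's `configure()` reads the `map.input.file` property, and if your copy of WordCount2.java does the same, that value is a natural candidate.

{{{
#!java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Plain-Java sketch of the ReverseIndex map step (no Hadoop classes involved).
// Class and method names are hypothetical, chosen for illustration only.
class MapStepSketch {
    // Returns one {word, fileName} pair per token in the line: the pairs
    // a ReverseIndex mapper would emit as (Text, Text).
    static List<String[]> mapLine(String fileName, String line,
                                  boolean caseSensitive, List<String> skipPatterns) {
        // Lower-case the whole line when case sensitivity is turned off.
        String text = caseSensitive ? line : line.toLowerCase();
        // Each skip pattern is a regular expression such as "\\." or "\\,".
        for (String pattern : skipPatterns) {
            text = text.replaceAll(pattern, "");
        }
        List<String[]> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new String[] {tokenizer.nextToken(), fileName});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical input line; not taken from the lab6_input files.
        List<String> skip = Arrays.asList("\\.", "\\,");
        for (String[] pair : mapLine("input1", "I like NCTU, cloud course.", false, skip)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
}}}

For the sample line this prints five tab-separated pairs, `i`, `like`, `nctu`, `cloud` and `course`, each followed by `input1`.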
 * Reference steps:
{{{
#!sh
$ wget http://hadoop.nchc.org.tw/WordCount2.java -O ReverseIndex.java
$ vi ReverseIndex.java    # rename class WordCount2 to ReverseIndex, then make your modifications
$ mkdir -p MyJava3
$ javac -classpath hadoop-core.jar -d MyJava3 ReverseIndex.java
$ jar -cvf reverseindex.jar -C MyJava3 .
$ hadoop jar reverseindex.jar ReverseIndex -Dwordcount.case.sensitive=false lab6_input lab6_out4 -skip pattern.txt
$ hadoop fs -cat lab6_out4/part-00000
}}}
 * Expected result (the format of the path is not restricted):
{{{
and input2
cloud input1,input2
course input1,input2,input2
enjoy input2
i input1,input2
like input1,input2
nctu input1,input2
this input2
we input2
}}}
 * Due date: 11:59 AM, Monday, June 13, 2011
 * Submission: e-mail the following as attachments to jazz _AT_ nchc _DOT_ org _DOT_ tw:
   * The Java source code, compressed into a ZIP archive named ${student_ID}.zip
   * The report (DOC or PDF), named ${student_ID}
 * Hint:
   * Change the (key, value) types of the Mapper output and of the Reducer input and output from (Text, !IntWritable) to (Text, Text).
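With (Text, Text) pairs, the reduce step only has to join the file names it receives for each keyword. The following plain-Java sketch mimics the shuffle (grouping by key in sorted key order) followed by that join. It is not Hadoop code, and `ReduceStepSketch` and `reduceAll` are hypothetical names. In the job itself this logic would sit in the `reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter)` method of an old-API `Reducer<Text, Text, Text, Text>`. The job setup's `setOutputValueClass(IntWritable.class)` call would likely need to become `Text.class` as well.

{{{
#!java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of shuffle + reduce for the basic ReverseIndex output.
// Class and method names are hypothetical, chosen for illustration only.
class ReduceStepSketch {
    // Groups {word, fileName} pairs by word in sorted key order (as the
    // shuffle phase would), then joins each word's file names with commas.
    static Map<String, String> reduceAll(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            // Every occurrence is kept, so duplicate file names are not removed.
            result.put(entry.getKey(), String.join(",", entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical mapper output used only to exercise the sketch.
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"course", "input1"});
        pairs.add(new String[] {"course", "input2"});
        pairs.add(new String[] {"course", "input2"});
        pairs.add(new String[] {"and", "input2"});
        // Prints "and<TAB>input2", then "course<TAB>input1,input2,input2".
        for (Map.Entry<String, String> entry : reduceAll(pairs).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
}}}

Because every occurrence is emitted, a word that appears twice in one file is listed twice, as in the `course` line of the expected result. Hadoop does not guarantee the order of values within a key, so the file order in real output may differ.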
 * Extra credit:
   * Add the number of occurrences in each file to the result. The expected result is:
{{{
and input2(1)
cloud input1(1),input2(1)
course input1(1),input2(2)
enjoy input2(1)
i input1(1),input2(1)
like input1(1),input2(1)
nctu input1(1),input2(1)
this input2(1)
we input2(1)
}}}
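For the extra credit, the reducer can count how often each file name arrives instead of concatenating every value. This is a plain-Java sketch. It assumes the mapper still emits one (keyword, file name) pair per occurrence. `CountingReduceSketch` and `reduceWithCounts` are hypothetical names, and the `Iterator<String>` parameter stands in for the `Iterator<Text>` that an old-API reducer receives.

{{{
#!java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch of a counting reduce step for the extra-credit output.
// Class and method names are hypothetical, chosen for illustration only.
class CountingReduceSketch {
    // Turns the file names received for one keyword into "file(count)" entries,
    // keeping the order in which each file name first arrives.
    static String reduceWithCounts(Iterator<String> fileNames) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        while (fileNames.hasNext()) {
            counts.merge(fileNames.next(), 1, Integer::sum);
        }
        StringBuilder joined = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (joined.length() > 0) {
                joined.append(',');
            }
            joined.append(entry.getKey()).append('(').append(entry.getValue()).append(')');
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Iterator<String> values = Arrays.asList("input1", "input2", "input2").iterator();
        // Prints "course<TAB>input1(1),input2(2)".
        System.out.println("course\t" + reduceWithCounts(values));
    }
}
}}}

One caveat applies if the job keeps WordCount2's combiner setting (the Hadoop tutorial's WordCount v2.0 calls `conf.setCombinerClass(Reduce.class)`). A combiner that has already produced `input2(2)` would pass that string on to the reducer, which would then count it as a file name. Removing the combiner line is the simplest way to avoid this.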
 * Grading:
   * Source code for the standard task: 60%
   * Report: 20%
     * The report should include:
       * Cover page: your name and student ID
       * A screenshot of your program running on hadoop.nchc.org.tw
       * The output of your program
   * Extra credit: 20%