Homework 1
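- Problem: Starting from hadoop_labs/lab013, modify the program into an inverted index (Reverse Index). The output of ReverseIndex must have the form "keyword"\t"file names (comma-separated)".
- Reference: Run it as described in the link, ignoring periods (\.) and commas (\,) and ignoring case (case.sensitive=false).
- Reference steps: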
$ mkdir hw1_input
$ echo "I like NCU Course." > hw1_input/input1
$ echo "I like ncu Course, and we enjoy this course." > hw1_input/input2
$ hadoop fs -put hw1_input hw1_input
$ echo "\." >pattern.txt && echo "\," >> pattern.txt
$ hadoop fs -put pattern.txt .
$ hadoop jar WordCount -Dwordcount.case.sensitive=false hw1_input hw1_out -skip pattern.txt
$ hadoop fs -cat hw1_out/part-00000
- The reference result should be as follows (the format of the path is not restricted):
and input2
cloud input1,input2
course input1,input2,input2
enjoy input2
i input1,input2
like input1,input2
ncu input1,input2
this input2
we input2
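The tokenize, group, and join logic behind this result can be sketched in plain Java without any Hadoop classes. This is only an illustrative sketch, not the lab013 code: the class `ReverseIndexSketch` and its methods `map`, `reduce`, and `run` are hypothetical names, and the in-memory `TreeMap` merely stands in for Hadoop's shuffle and sort. In a real job the Mapper would have to look up the file name of its input split itself, and the order in which values reach the Reducer is not guaranteed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the map and reduce steps; no Hadoop classes are used.
class ReverseIndexSketch {

    // "Map" step: lowercase the line, strip periods and commas, and emit
    // one (keyword, fileName) pair per token.
    static List<String[]> map(String fileName, String line) {
        List<String[]> pairs = new ArrayList<>();
        String cleaned = line.toLowerCase(Locale.ROOT).replaceAll("[.,]", "");
        for (String token : cleaned.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new String[] {token, fileName});
            }
        }
        return pairs;
    }

    // "Reduce" step: join every file name collected for one keyword with commas.
    static String reduce(List<String> fileNames) {
        return String.join(",", fileNames);
    }

    // Driver: group the mapped pairs by keyword (sorted by key, as a stand-in
    // for the shuffle phase) and reduce each group to one "keyword\tfiles" line.
    static List<String> run(Map<String, String> files) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            for (String[] pair : map(file.getKey(), file.getValue())) {
                grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
            }
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            lines.add(entry.getKey() + "\t" + reduce(entry.getValue()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Same contents as hw1_input/input1 and hw1_input/input2 above.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("input1", "I like NCU Course.");
        files.put("input2", "I like ncu Course, and we enjoy this course.");
        run(files).forEach(System.out::println);
    }
}
```

Because the sketch emits one file name per occurrence, the keyword `course` ends up with `input1,input2,input2`, which is why a file name can repeat in the basic output format.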
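- Deadline: Monday, May 2, 2016, 11:59 AM
- Submission: Email the source code and the report as attachments to jazzwang@… (1) One copy of the source code, compressed and named ${student ID}.zip. (2) One report, named ${student ID}.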
- Hint:
- Change the (Key, Value) types of the Mapper output and of the Reducer input and output from the original (Text, IntWritable) to (Text, Text).
- Extra credit:
- Add the per-file occurrence count to the result, so that the reference result becomes:
and input2(1)
cloud input1(1),input2(1)
course input1(1),input2(2)
enjoy input2(1)
i input1(1),input2(1)
like input1(1),input2(1)
ncu input1(1),input2(1)
this input2(1)
we input2(1)
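One possible way to produce this format is to count duplicate file names on the reduce side. The following is a minimal plain-Java sketch under that assumption, not a prescribed solution: the class `ReverseIndexCountSketch` and the method `reduceWithCounts` are hypothetical names, and the input list stands in for the values a Reducer would receive for one keyword.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for a Reducer that appends per-file occurrence counts.
class ReverseIndexCountSketch {

    // Count how often each file name appears among the values of one keyword,
    // then render the counts as "file(count)" entries separated by commas.
    static String reduceWithCounts(List<String> fileNames) {
        // TreeMap keeps the file names sorted, so the output order is stable
        // even if the values arrive in arbitrary order.
        Map<String, Integer> counts = new TreeMap<>();
        for (String fileName : fileNames) {
            counts.merge(fileName, 1, Integer::sum);
        }
        StringBuilder result = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (result.length() > 0) {
                result.append(',');
            }
            result.append(entry.getKey())
                  .append('(').append(entry.getValue()).append(')');
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Values for the keyword "course": once in input1, twice in input2.
        List<String> values = Arrays.asList("input1", "input2", "input2");
        System.out.println("course\t" + reduceWithCounts(values));
    }
}
```

Counting in the Reducer keeps the Mapper unchanged from the basic version; counting per file in the Mapper instead would be an equally valid design.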
- Grading:
- Source code for the standard problem: 60%
- Report: 20%
- The report should include the following items:
- Cover: your name and student ID
- Execution result: the output of your program
- Extra credit: 20%