Changes between Version 18 and Version 19 of YMU110509


Timestamp:
May 30, 2011, 3:39:47 PM (13 years ago)
Author:
jazz
Comment:

--

  • YMU110509

    v18 v19  
    3737 * http://hadoop.nchc.org.tw/hadoop-doc/api/index.html - Hadoop 0.20.2 javadoc documentation
    3838 * http://forum.hadoop.tw - Taiwan Hadoop user forum
     39
     40= Homework 1 =
     41
     42 * Task: Modify WordCount2.java from [wiki:NCTU110329/Lab6 Lab6] into a reverse index program and rename it to !ReverseIndex.java. !ReverseIndex should output each result line as "keyword <TAB> file names (separated by commas)".
     43 * Run it the same way as the last step of Lab6: skip the period (\.) and comma (\,) patterns and ignore case (case.sensitive=false).
     44 * Reference steps:
     45{{{
     46#!sh
     47$ wget http://hadoop.nchc.org.tw/WordCount2.java -O ReverseIndex.java
     48$ vi ReverseIndex.java #### DO YOUR MODIFICATION - edit the corresponding code
     49$ mkdir -p MyJava3
     50$ javac -classpath hadoop-core.jar -d MyJava3 ReverseIndex.java
     51$ jar -cvf reverseindex.jar -C MyJava3 .
     52$ hadoop jar reverseindex.jar ReverseIndex -Dwordcount.case.sensitive=false lab6_input lab6_out4 -skip pattern.txt
     53$ hadoop fs -cat lab6_out4/part-00000
     54}}}
     55 * The reference result should look like the following (the format of the path is not restricted):
     56{{{
     57and     input2
     58cloud   input1,input2
     59course  input1,input2,input2
     60enjoy   input2
     61i       input1,input2
     62like    input1,input2
     63nctu    input1,input2
     64this    input2
     65we      input2
     66}}}
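
The word-to-file mapping shown in the reference result can be prototyped outside Hadoop before it is ported to the Mapper and Reducer. The following is a minimal plain-Java sketch under stated assumptions, not the Lab6 code: the class name `ReverseIndexSketch`, the methods `build` and `formatLine`, and the two sample sentences are hypothetical stand-ins for the real lab6_input files. It removes periods and commas, lowercases every token, and records one file name per occurrence, which is why a word that appears twice in one file lists that file twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class ReverseIndexSketch {

    // Builds "keyword -> file names" from a map of (file name -> file content).
    // One file name is recorded per occurrence, mirroring a mapper that emits
    // a (word, fileName) pair for every token it reads.
    static Map<String, List<String>> build(Map<String, String> files) {
        Map<String, List<String>> index = new TreeMap<>(); // sorted by keyword
        for (Map.Entry<String, String> file : files.entrySet()) {
            // Skip the "\." and "\," patterns and ignore case.
            String cleaned = file.getValue()
                    .replaceAll("[.,]", "")
                    .toLowerCase(Locale.ROOT);
            for (String word : cleaned.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // empty content yields one empty token
                }
                index.computeIfAbsent(word, k -> new ArrayList<>()).add(file.getKey());
            }
        }
        return index;
    }

    // Formats one output line as "keyword<TAB>file1,file2,...".
    static String formatLine(String word, List<String> fileNames) {
        return word + "\t" + String.join(",", fileNames);
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs, NOT the actual lab6_input files.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("input1", "I like NCTU Cloud Course.");
        files.put("input2", "I like NCTU Cloud Course, and we enjoy this course.");
        build(files).forEach((word, names) -> System.out.println(formatLine(word, names)));
    }
}
```

In a Hadoop version, the inner loop would correspond to the Mapper and the grouping done by `computeIfAbsent` would be handled by the shuffle phase, leaving the Reducer to join the file names.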
     67 * Due date: 11:59 AM, Monday, June 13, 2011
     69 * Submission: e-mail the Java source code and the report (doc or PDF) as attachments to jazz _AT_ nchc _DOT_ org _DOT_ tw
     70 * Name the attachments as follows: (1) source code: compressed as ${student ID}.zip; (2) report: named ${student ID}.
     71 * Hint:
     72  * Change the (Key,Value) types of the Mapper output and of the Reducer input and output from (Text, !IntWritable) to (Text, Text)
     74 * Extra credit:
     75  * Add the number of occurrences in each file to the result, so that the reference result looks like the following:
     76{{{
     77and     input2(1)
     78cloud   input1(1),input2(1)
     79course  input1(1),input2(2)
     80enjoy   input2(1)
     81i       input1(1),input2(1)
     82like    input1(1),input2(1)
     83nctu    input1(1),input2(1)
     84this    input2(1)
     85we      input2(1)
     86}}}
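
The per-file counts in the extra-credit result can be derived from the same list of file names that the standard task joins with commas. The following is a minimal plain-Java sketch of that step, assuming the reducer receives one file name per occurrence; the class name `ReverseIndexCountSketch` and the method `summarize` are hypothetical and not part of the Lab6 code.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class ReverseIndexCountSketch {

    // Turns the file names collected for one keyword, for example
    // ["input1", "input2", "input2"], into "input1(1),input2(2)".
    // This is the kind of work a reducer could do after it has received
    // all file names for a key.
    static String summarize(Iterable<String> fileNames) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by file name
        for (String name : fileNames) {
            counts.merge(name, 1, Integer::sum); // add 1 per occurrence
        }
        StringJoiner joined = new StringJoiner(",");
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            joined.add(entry.getKey() + "(" + entry.getValue() + ")");
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // File names as they would arrive for the keyword "course".
        String line = "course\t" + summarize(Arrays.asList("input1", "input2", "input2"));
        System.out.println(line);
    }
}
```

Counting in the reducer keeps the Mapper unchanged from the standard task, so only the final formatting differs between the two results.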
     87 * Grading:
     88  * Standard task source code: 60%
     89  * Report: 20%
     90    * The report should include the following items:
     91    * Cover: your name and student ID
     92    * Screenshot of your program running on hadoop.nchc.org.tw
     93    * The result of your program
     94  * Extra credit: 20%