Changes between Version 1 and Version 2 of Hadoop_Lab5_1


Ignore:
Timestamp:
Sep 15, 2009, 4:47:51 PM (15 years ago)
Author:
waue
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • Hadoop_Lab5_1

    v1 v2  
    247247 * 在這我們編輯一個範例程式 : WordCount
    248248
    249  == 3.1 mapper.java ==
    250  
    251  1. new
    252  
    253  || File ->  || new ->  || mapper ||
    254 [[Image(wiki:waue/2009/0617:file-new-mapper.png)]]
    255 
    256 -----------
    257 
    258  2. create
    259  
    260 [[Image(wiki:waue/2009/0617:3-1.png)]]
    261 {{{
    262 #!sh
    263 source folder-> 輸入: icas/src
    264 Package : Sample
    265 Name -> : mapper
    266 }}}
    267 ----------
    268 
    269  3. modify
    270  
    271 {{{
    272 #!java
    273 package Sample;
    274 
    275 import java.io.IOException;
    276 import java.util.StringTokenizer;
    277 
    278 import org.apache.hadoop.io.IntWritable;
    279 import org.apache.hadoop.io.LongWritable;
    280 import org.apache.hadoop.io.Text;
    281 import org.apache.hadoop.mapred.MapReduceBase;
    282 import org.apache.hadoop.mapred.Mapper;
    283 import org.apache.hadoop.mapred.OutputCollector;
    284 import org.apache.hadoop.mapred.Reporter;
    285 
    286 public class mapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    287     private final static IntWritable one = new IntWritable(1);
    288     private Text word = new Text();
    289 
    290     public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    291       String line = value.toString();
    292       StringTokenizer tokenizer = new StringTokenizer(line);
    293       while (tokenizer.hasMoreTokens()) {
    294         word.set(tokenizer.nextToken());
    295         output.collect(word, one);
    296       }
    297     }
    298   }
    299 
    300 }}}
    301 
    302 建立mapper.java後,貼入程式碼
    303 [[Image(wiki:waue/2009/0617:3-2.png)]]
    304 
    305 ------------
    306 
    307 == 3.2 reducer.java ==
    308 
    309  1. new
    310 
    311  * File -> new -> reducer
    312 [[Image(wiki:waue/2009/0617:file-new-reducer.png)]]
    313 
    314 -------
    315  2. create
    316 [[Image(wiki:waue/2009/0617:3-3.png)]]
    317 
    318 {{{
    319 #!sh
    320 source folder-> 輸入: icas/src
    321 Package : Sample
    322 Name -> : reducer
    323 }}}
    324 
    325 -----------
    326 
    327  3. modify
    328  
    329 {{{
    330 #!java
    331 package Sample;
    332 
    333 import java.io.IOException;
    334 import java.util.Iterator;
    335 
    336 import org.apache.hadoop.io.IntWritable;
    337 import org.apache.hadoop.io.Text;
    338 import org.apache.hadoop.mapred.MapReduceBase;
    339 import org.apache.hadoop.mapred.OutputCollector;
    340 import org.apache.hadoop.mapred.Reducer;
    341 import org.apache.hadoop.mapred.Reporter;
    342 
    343 public class reducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    344     public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    345       int sum = 0;
    346       while (values.hasNext()) {
    347         sum += values.next().get();
    348       }
    349       output.collect(key, new IntWritable(sum));
    350     }
    351   }
    352 }}}
    353 
    354  * File -> new -> Map/Reduce Driver
    355 [[Image(wiki:waue/2009/0617:file-new-mr-driver.png)]]
    356 ----------
    357 
    358 == 3.3 WordCount.java (main function) ==
    359 
    360  1. new
    361 
    362 建立WordCount.java,此檔用來驅動mapper 與 reducer,因此選擇 Map/Reduce Driver
    363 
    364 
    365 [[Image(wiki:waue/2009/0617:3-4.png)]]
    366 ------------
    367 
    368  2. create
    369 
    370 {{{
    371 #!sh
    372 source folder-> 輸入: icas/src
    373 Package : Sample
    374 Name -> : WordCount
    375 }}}
    376 
    377 -------
    378  3. modify
    379 
    380 {{{
    381 #!java
    382 package Sample;
    383 import org.apache.hadoop.fs.Path;
    384 import org.apache.hadoop.io.IntWritable;
    385 import org.apache.hadoop.io.Text;
    386 import org.apache.hadoop.mapred.FileInputFormat;
    387 import org.apache.hadoop.mapred.FileOutputFormat;
    388 import org.apache.hadoop.mapred.JobClient;
    389 import org.apache.hadoop.mapred.JobConf;
    390 import org.apache.hadoop.mapred.TextInputFormat;
    391 import org.apache.hadoop.mapred.TextOutputFormat;
    392 
    393 public class WordCount {
    394 
    395    public static void main(String[] args) throws Exception {
    396      JobConf conf = new JobConf(WordCount.class);
    397      conf.setJobName("wordcount");
    398 
    399      conf.setOutputKeyClass(Text.class);
    400      conf.setOutputValueClass(IntWritable.class);
    401 
    402      conf.setMapperClass(mapper.class);
    403      conf.setCombinerClass(reducer.class);
    404      conf.setReducerClass(reducer.class);
    405 
    406      conf.setInputFormat(TextInputFormat.class);
    407      conf.setOutputFormat(TextOutputFormat.class);
    408 
    409     FileInputFormat.setInputPaths(conf, new Path("/user/hadooper/input"));
    410     FileOutputFormat.setOutputPath(conf, new Path("lab5_out2"));
    411 
    412      JobClient.runJob(conf);
    413    }
    414 }
    415 }}}
    416 
    417 三個檔完成並存檔後,整個程式建立完成
    418 [[Image(wiki:waue/2009/0617:3-5.png)]]
    419 
    420 -------
    421 
    422  * 三個檔都存檔後,可以看到icas專案下的src,bin都有檔案產生,我們用指令來check
    423  
    424 {{{
    425 $ cd workspace/icas
    426 $ ls src/Sample/
    427 mapper.java  reducer.java  WordCount.java
    428 $ ls bin/Sample/
    429 mapper.class  reducer.class  WordCount.class
    430 }}}
    431249
    432250 = 四、測試範例程式 =
    433 
    434 在此提供兩種方法來run我們從eclipse 上編譯出的code。
    435 
    436 方法一是直接在eclipse上用圖形介面操作,參閱 4.1  在eclipse上操作
    437 
    438 方法二是產生jar檔後搭配自動編譯程式Makefile,參閱4.2
    439 
    440 
    441  == 4.1 法一:在eclipse上操作 ==
    442251
    443252 * 右鍵點選專案資料夾:icas -> run as -> run on Hadoop
     
    446255
    447256
    448 
     257= 五、結論 =
     258
     259 * 搭配eclipse ,我們可以更有效率的開發hadoop
     260 * hadoop 0.20 與之前的版本api以及設定都有些改變,可以看 [wiki:waue/2009/0617 hadoop 0.20 coding (eclipse )]
     261 * 有更多的時間請見 [http://trac.nchc.org.tw/cloud/wiki/Hadoop_Lab5_2 進階版]