Hadoop Program Development (Eclipse Plugin) (Advanced)

Compiling the Programs Individually

1 mapper.java

  1. new

File -> new -> mapper

  2. create

source folder -> enter: icas/src
Package : Sample
Name -> : mapper

  3. modify

package Sample;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Word-count mapper: emits (word, 1) for every token in an input line.
public class mapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    // Reused for every token so that no new Text object is allocated per word.
    private Text word = new Text();

    // With TextInputFormat, key is the byte offset of the line and value is the line itself.
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}

After creating mapper.java, paste in the code above.
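To see what the map step produces without a Hadoop cluster, the tokenizing loop in mapper.java can be sketched in plain Java. This is only an illustration: `MapSketch` and `mapLine` are hypothetical names that are not part of the lab, and a list of entries stands in for Hadoop's `OutputCollector`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java illustration of the map step; no Hadoop classes are used.
final class MapSketch {

    // Emits one (word, 1) pair per whitespace-separated token, like mapper.map().
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<Map.Entry<String, Integer>>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(new SimpleEntry<String, Integer>(tokenizer.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Each token becomes its own pair; duplicates are not merged at this stage.
        System.out.println(mapLine("hello hadoop hello"));
    }
}
```

Running it prints `[hello=1, hadoop=1, hello=1]`: the mapper does no counting of its own, it only tags each word with a 1.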


2 reducer.java

  1. new

File -> new -> reducer

  2. create

source folder -> enter: icas/src
Package : Sample
Name -> : reducer

  3. modify

package Sample;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Word-count reducer: sums all counts that arrive for one word.
public class reducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    // values iterates over every count emitted for key.
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
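Between map and reduce, the framework groups all values that share a key and hands each group to reduce. A minimal plain-Java sketch of that grouping followed by the same summing loop is shown here. `ReduceSketch`, `reduce`, and `groupAndReduce` are hypothetical names, and a `TreeMap` merely stands in for Hadoop's sort-and-group phase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of grouping by key and reducing; no Hadoop classes are used.
final class ReduceSketch {

    // Same loop as reducer.reduce(): add up every count seen for one key.
    static int reduce(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // Stand-in for the framework: collect the 1s emitted per word, then reduce each group.
    static Map<String, Integer> groupAndReduce(List<String> words) {
        Map<String, List<Integer>> groups = new TreeMap<String, List<Integer>>();
        for (String w : words) {
            List<Integer> ones = groups.get(w);
            if (ones == null) {
                ones = new ArrayList<Integer>();
                groups.put(w, ones);
            }
            ones.add(1);
        }
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue().iterator()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groupAndReduce(Arrays.asList("hello", "hadoop", "hello")));
    }
}
```

Running it prints `{hadoop=1, hello=2}`, with keys in sorted order.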


3 WordCount.java (main function)

  1. new

Create WordCount.java. This file drives the mapper and the reducer, so choose Map/Reduce Driver.

  2. create

source folder -> enter: icas/src
Package : Sample
Name -> : WordCount

  3. modify

package Sample;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

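public class WordCount {

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        // Types of the (key, value) pairs the job writes out.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(mapper.class);
        // The reducer only sums counts, so it can also serve as the combiner.
        conf.setCombinerClass(reducer.class);
        conf.setReducerClass(reducer.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // Use the paths given on the command line (as the Makefile's "run" target does);
        // otherwise fall back to the original hard-coded paths.
        String in = args.length > 0 ? args[0] : "/user/hadooper/input";
        String out = args.length > 1 ? args[1] : "lab5_out2";
        FileInputFormat.setInputPaths(conf, new Path(in));
        FileOutputFormat.setOutputPath(conf, new Path(out));

        JobClient.runJob(conf);
    }
}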
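The driver registers reducer as the combiner as well as the reducer. That is safe here because addition gives the same total whether the counts are summed all at once or pre-summed per map task first. The following plain-Java sketch illustrates the property; `CombinerSketch` and its methods are hypothetical names, and the two lists are made-up stand-ins for the counts that two map tasks might emit for one word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of why a summing reducer can double as a combiner.
final class CombinerSketch {

    static int sum(List<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    // Reduce over every raw count from both map tasks.
    static int withoutCombiner(List<Integer> taskA, List<Integer> taskB) {
        List<Integer> all = new ArrayList<Integer>(taskA);
        all.addAll(taskB);
        return sum(all);
    }

    // Pre-sum each map task's counts locally, then reduce the two partial sums.
    static int withCombiner(List<Integer> taskA, List<Integer> taskB) {
        return sum(Arrays.asList(sum(taskA), sum(taskB)));
    }

    public static void main(String[] args) {
        List<Integer> taskA = Arrays.asList(1, 1, 1); // made-up counts for one word
        List<Integer> taskB = Arrays.asList(1, 1);
        System.out.println(withoutCombiner(taskA, taskB) + " " + withCombiner(taskA, taskB));
    }
}
```

Both numbers printed are 5. A reducer whose result depends on seeing every raw value at once (an average, for example) could not be reused as a combiner this way.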

Once all three files are complete and saved, the program is finished.


  • After all three files are saved, files appear under both src and bin in the icas project. Check them from the command line:

$ cd workspace/icas
$ ls src/Sample/
mapper.java  reducer.java  WordCount.java
$ ls bin/Sample/
mapper.class  reducer.class  WordCount.class

Eclipse can generate the JAR file

File -> Export -> Java -> JAR file
-> Next ->
select the project to export -> JAR file: /home/hadooper/mytest.jar ->
Next ->
Next ->
Main class: select the class that contains main ->
Finish


  • These steps produce mytest.jar in /home/hadooper/
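Choosing a main class in the export wizard amounts to recording a Main-Class attribute in the JAR's manifest. The sketch below shows the idea with the standard java.util.jar API. `JarSketch`, `writeJar`, and `readMainClass` are hypothetical names, the JAR is written to a temporary file rather than /home/hadooper/mytest.jar, and a text entry stands in for the compiled classes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Illustration of a JAR whose manifest names a main class.
final class JarSketch {

    // Writes a JAR whose manifest names mainClass and which holds one placeholder entry.
    static void writeJar(Path jarPath, String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        // Manifest-Version must be set, or the main attributes are not written out.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        try (OutputStream os = Files.newOutputStream(jarPath);
             JarOutputStream jar = new JarOutputStream(os, manifest)) {
            jar.putNextEntry(new JarEntry("Sample/placeholder.txt"));
            jar.write("stand-in for compiled classes".getBytes("UTF-8"));
            jar.closeEntry();
        }
    }

    // Reads the Main-Class attribute back from the JAR's manifest.
    static String readMainClass(Path jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            return jar.getManifest().getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mytest-sketch", ".jar");
        writeJar(tmp, "Sample.WordCount");
        System.out.println(readMainClass(tmp));
        Files.delete(tmp);
    }
}
```

Running it prints `Sample.WordCount`, the value recorded in the manifest.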

Compiling Faster with a Makefile

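  • Programs change often, and repeating all of these steps every time is tedious. Let's see how working from the command line can be more convenient than the graphical interface.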

1 Create the Makefile

$ cd /home/hadooper/workspace/icas/
$ gedit Makefile
  • Enter the following Makefile contents (note: each command line under a target must begin with a tab character, not spaces)
    JarFile="sample-0.1.jar"
    MainFunc="Sample.WordCount"
    LocalOutDir="/tmp/output"
    HADOOP_BIN="/opt/hadoop/bin"
    
    all:jar run output clean
    
    jar:
    	jar -cvf ${JarFile} -C bin/ .
    
    run:
    	${HADOOP_BIN}/hadoop jar ${JarFile} ${MainFunc} input output
    
    clean:
    	${HADOOP_BIN}/hadoop fs -rmr output
    
    output:
    	rm -rf ${LocalOutDir}
    	${HADOOP_BIN}/hadoop fs -get output ${LocalOutDir}
    	gedit ${LocalOutDir}/part-* &
    
    help:
    	@echo "Usage:"
    	@echo " make jar     - Build Jar File."
    	@echo " make clean   - Clean up Output directory on HDFS."
    	@echo " make run     - Run your MapReduce code on Hadoop."
    	@echo " make output  - Download and show output file"
    	@echo " make help    - Show Makefile options."
    	@echo " "
    	@echo "Example:"
    	@echo " make jar; make run; make output; make clean"
    
  • Or download the Makefile directly
    $ cd /home/hadooper/workspace/icas/
    $ wget http://trac.nchc.org.tw/cloud/raw-attachment/wiki/Hadoop_Lab5/Makefile
    

2 Run

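  • To use the Makefile, change to that directory and run make [target]. If you are not sure which targets exist, run make help (with the Makefile shown above, plain make runs the all target: jar, run, output, and clean).
  • Usage summary for make
$ cd /home/hadooper/workspace/icas/
$ make help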
Usage:
 make jar     - Build Jar File.
 make clean   - Clean up Output directory on HDFS.
 make run     - Run your MapReduce code on Hadoop.
 make output  - Download and show output file
 make help    - Show Makefile options.
 
Example:
 make jar; make run; make output; make clean
  • The individual make targets are described below

make jar

  • 1. Compile and build the JAR file

$ make jar

make run

  • 2. Run our wordcount on Hadoop
$ make run
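  • make run should run to completion without errors, which shows that the program compiled in Eclipse runs correctly on the Hadoop 0.18.3 platform.
  • Back in the Eclipse window, the finished job is listed in the lower pane, and a new output folder appears in the left pane. The part file inside it is our result (part-r-00000 in the original lab; jobs written against the older org.apache.hadoop.mapred API, as here, typically name it part-00000).

  • Because complete Javadoc has been configured, detailed explanations and assistance are available.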

make output

  • 3. This target downloads the result files from HDFS to the local machine and opens them in gedit
$ make output
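The downloaded result is plain text. Assuming the default TextOutputFormat settings used by the driver, each line holds a word, a tab character, and its count. A minimal sketch for reading such lines back follows; `OutputLineSketch` and `parse` are hypothetical names, and the sample lines are made up rather than taken from a real run.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Illustration of parsing "word<TAB>count" lines from a word-count result file.
final class OutputLineSketch {

    // Splits one result line into its word and its count.
    static Map.Entry<String, Integer> parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("no tab separator in: " + line);
        }
        String word = line.substring(0, tab);
        int count = Integer.parseInt(line.substring(tab + 1).trim());
        return new SimpleEntry<String, Integer>(word, count);
    }

    public static void main(String[] args) {
        String[] sample = { "hadoop\t1", "hello\t2" }; // made-up lines for illustration
        for (String line : sample) {
            Map.Entry<String, Integer> e = parse(line);
            System.out.println(e.getKey() + " appears " + e.getValue() + " time(s)");
        }
    }
}
```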

make clean

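  • 4. This target removes the output directory on HDFS. If you want to run make run again, run make clean first; otherwise Hadoop will report that the output directory already exists and refuse to run the job.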

$ make clean
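The refuse-if-exists behaviour can be pictured with a small local-filesystem analogy. This is not the Hadoop API: `OutputDirCheckSketch` and `checkOutputDir` are hypothetical names, and a temporary local directory plays the role of the leftover output directory on HDFS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Local-filesystem analogy for the "output directory already exists" check.
final class OutputDirCheckSketch {

    // Refuses to proceed if the output directory is already there.
    static void checkOutputDir(Path outputDir) {
        if (Files.exists(outputDir)) {
            throw new IllegalStateException("Output directory " + outputDir + " already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        // Plays the role of an output directory left over from a previous run.
        Path out = Files.createTempDirectory("output-sketch");
        try {
            checkOutputDir(out);
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
        Files.delete(out);   // the local counterpart of "make clean"
        checkOutputDir(out); // passes now that the directory is gone
        System.out.println("ok to run");
    }
}
```

The first check is refused and the second passes, which mirrors the make clean then make run sequence.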

Exercise: Import the Project

Last modified on Sep 15, 2009, 4:45:41 PM