Version 1 (modified by jazz, 12 years ago)
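Lab 20
The Default Input Format
TextInputFormat
First connect to nodeN.3du.me, where N is your registration number.
- To observe the behavior of FileInputFormat, we use the "update jar" technique to apply a small modification to TextInputFormat.java.
- The official source tree contains two implementations of TextInputFormat.java: one for the new API and one for the old API.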
user@node1:~$ find /home/user/hadoop/src/ -name "TextInputFormat.java"
/home/user/hadoop/src/mapred/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.java
/home/user/hadoop/src/mapred/org/apache/hadoop/mapred/TextInputFormat.java
- Here we modify the new-API TextInputFormat (the one under org/apache/hadoop/mapreduce/lib/input/).
--- /home/user/hadoop/src/mapred/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.java 2012-10-03 13:17:16.000000000 +0800
+++ /home/user/hadoop_labs/lab011/src/TextInputFormat.java 2013-10-19 11:25:16.419320587 +0800
@@ -38,11 +38,13 @@
   public RecordReader<LongWritable, Text>
     createRecordReader(InputSplit split,
                        TaskAttemptContext context) {
+    System.err.println("TextInputFormat.createRecordReader()");
     return new LineRecordReader();
   }
 
   @Override
   protected boolean isSplitable(JobContext context, Path file) {
+    System.err.println("TextInputFormat.isSplitable(context," + file.toString() + ")");
     CompressionCodec codec =
       new CompressionCodecFactory(context.getConfiguration()).getCodec(file);
     return codec == null;
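The patched createRecordReader() still returns a LineRecordReader, which hands each Mapper one record per line: the key is the byte offset where the line starts and the value is the line text. The following is a minimal sketch of that record shape in plain Java, without any Hadoop dependency. The class LineRecordSketch and its Record type are hypothetical names used only for illustration; they are not part of Hadoop, and the real reader also handles split boundaries and other line terminators.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineRecordSketch {
    /** One (key, value) record: byte offset of the line start, and the line text. */
    public static final class Record {
        public final long offset;
        public final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }

        @Override
        public String toString() {
            return "(" + offset + ", \"" + line + "\")";
        }
    }

    /** Splits the bytes on '\n' and records the starting byte offset of each line. */
    public static List<Record> readRecords(byte[] data) {
        List<Record> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                records.add(new Record(start,
                        new String(data, start, i - start, StandardCharsets.UTF_8)));
                start = i + 1;
            }
        }
        // A final line without a trailing newline is still a record.
        if (start < data.length) {
            records.add(new Record(start,
                    new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return records;
    }

    public static void main(String[] args) {
        // Illustrative content only: two lines in a single buffer.
        byte[] data = "A B C D\nC D A B\n".getBytes(StandardCharsets.UTF_8);
        for (Record r : readRecords(data)) {
            System.out.println(r);
        }
    }
}
```

In a WordCount-style Mapper the offset key is normally ignored and only the line value is tokenized.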
unset HADOOP_CONF_DIR
cd ~/hadoop_labs/lab011
ant
cd ~/hadoop_labs/lab010
mkdir -p my_input
echo "A B C D" > my_input/input1
echo "C D A B" > my_input/input2
hadoop fs -put my_input my_input
sed -i 's#setNumReduceTasks(0)#setNumReduceTasks(1)#g' ~/hadoop_labs/lab010/src/WordCount.java
hadoop jar WordCount.jar my_input my_output
export HADOOP_CONF_DIR=~/hadoop/conf.local/
hadoop jar WordCount.jar my_input my_output
unset HADOOP_CONF_DIR
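When reading the isSplitable(...) lines printed during these runs, it may help to keep in mind roughly how the new-API FileInputFormat turns one file into input splits. The sketch below is a simplified model, not the Hadoop source: the class SplitCountSketch and its method countSplits are hypothetical names, the 64 MB block size is an assumption for the example, and details such as block locations are left out. It assumes the split size is max(minSize, min(maxSize, blockSize)) and that a trailing remainder of up to 10% of a split is folded into the last split.

```java
public class SplitCountSketch {
    // Assumed slack factor: a remainder up to 10% over one split size is not split off.
    private static final double SPLIT_SLOP = 1.1;

    /** Split size used for a file: clamp the block size between minSize and maxSize. */
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Number of splits produced for a single file in this simplified model. */
    public static int countSplits(long length, boolean splitable, long splitSize) {
        // A non-splittable file (for example, compressed) or an empty file yields one split.
        if (length == 0 || !splitable) {
            return 1;
        }
        int splits = 0;
        long remaining = length;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the last, possibly short, split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumption for the example: 64 MB blocks, no min/max overrides.
        long splitSize = computeSplitSize(64 * mb, 1L, Long.MAX_VALUE);
        System.out.println("200 MB, splittable:     " + countSplits(200 * mb, true, splitSize));
        System.out.println("200 MB, not splittable: " + countSplits(200 * mb, false, splitSize));
        System.out.println("70 MB, splittable:      " + countSplits(70 * mb, true, splitSize));
    }
}
```

Each split is normally processed by one map task, so the per-file counts add up across all files in the input directory.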
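Exercises
<Question 1> While the job is running, how many Mappers run at the same time?
<Question 2> After the job finishes, how many output files are produced?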