= 2010-08-04 =

== Hadoop : Streaming ==

 * [Example] [http://www.mail-archive.com/common-user@hadoop.apache.org/msg00422.html Using gzip as the input format]
 * Only three compression formats are currently supported; for details see [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html org.apache.hadoop.io.compress.CompressionCodec]
{{{
#!sh
# Arguments: $1 = mapper, $2 = reducer, $3 = input dir, $4 = output dir, $5 = job name
# Remove any previous output directory first; the job fails if it already exists.
hadoop dfs -rmr "$4"
hadoop jar /usr/local/share/hadoop/contrib/streaming/hadoop-*-streaming.jar \
  -mapper "$1" -reducer "$2" -input "$3"/* -output "$4" \
  -file "$1" -file "$2" \
  -jobconf mapred.job.name="$5" \
  -jobconf stream.recordreader.compression=gzip \
  -jobconf mapred.output.compress=true \
  -jobconf mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
}}}
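
Before submitting a job like the one above to the cluster, it can help to dry-run the same gzip-in, gzip-out data flow on a single machine. The following is only a sketch under assumptions: `map_words`, `reduce_counts`, and `run_local` are hypothetical names that do not come from the mailing-list example, the two functions merely stand in for whatever mapper (`$1`) and reducer (`$2`) are actually used, and a plain `sort` only approximates Hadoop's shuffle phase.

```shell
#!/bin/sh
# Hypothetical stand-in for the mapper ($1): emit "word<TAB>1" for each word on stdin.
map_words() {
  tr -s '[:space:]' '\n' | awk 'NF { print $0 "\t1" }'
}

# Hypothetical stand-in for the reducer ($2): sum the counts of each key.
# Relies on the input arriving sorted by key, as it does for a streaming reducer.
reduce_counts() {
  awk -F '\t' '
    NR > 1 && $1 != key { print key "\t" sum; sum = 0 }
    { key = $1; sum += $2 }
    END { if (NR > 0) print key "\t" sum }
  '
}

# Local approximation of the job: gunzip the input, map, sort (shuffle), reduce, gzip the output.
# $1 = gzipped input file, $2 = gzipped output file.
run_local() {
  gunzip -c "$1" | map_words | LC_ALL=C sort | reduce_counts | gzip -c > "$2"
}
```

For example, `run_local sample.gz result.gz` (both file names are placeholders) writes the word counts to `result.gz`, which can be inspected with `gunzip -c result.gz`.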