Version 4 (modified by waue, 13 years ago)

HBase Advanced Course
1. Streaming

Hadoop Streaming with commands

  • Example 1: use cat as the mapper and wc as the reducer
    $ cd /opt/hadoop
    $ bin/start-all.sh
    $ bin/hadoop fs -put conf input
    $ bin/hadoop jar ./contrib/streaming/hadoop-0.20.2-streaming.jar -input input -output output -mapper /bin/cat -reducer /usr/bin/wc
    
  • Example 2: use Bash shell scripts as the mapper and reducer
    $ echo "sed -e \"s/ /\n/g\" | grep ." > streamingMapper.sh
    $ echo "uniq -c | awk '{print \$2 \"\t\" \$1}'" > streamingReducer.sh
    $ chmod 755 streaming*.sh
    $ bin/hadoop fs -rmr input output
    $ bin/hadoop fs -put conf input
    $ bin/hadoop jar ./contrib/streaming/hadoop-0.20.2-streaming.jar -input input -output output -mapper streamingMapper.sh -reducer streamingReducer.sh -file streamingMapper.sh -file streamingReducer.sh
    
  • Result
    $ bin/hadoop dfs -cat output/part-00000
    restriction,	1
    rights	1
    sell	1
    shall	2
    so,	1
    software	3
    source	3
    subject	1
    sublicense,	1
    substantial	1
    the	10
    this	4
    to	7
    use,	1
    used	1
    whom	1
    without	2
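
The Example 2 job can be approximated locally before it is submitted to the cluster. The sketch below is a dry run under stated assumptions, not part of the original course material: it recreates the mapper and reducer as small scripts, uses an invented two-line sample.txt in place of the conf directory, and lets a plain sort stand in for the sorting that Hadoop performs between the map and reduce phases. The mapper uses tr instead of the sed expression from Example 2, purely for portability across sed implementations; the reducer pipeline is the same as in Example 2.

```shell
#!/bin/sh
# Local dry run of the Example 2 word count, without Hadoop.
# Assumption: sample.txt and the temporary directory are made up for illustration.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Mapper: put every space-separated word on its own line, drop empty lines.
# (tr is swapped in for the sed expression used in Example 2.)
cat > streamingMapper.sh <<'EOF'
#!/bin/sh
tr ' ' '\n' | grep .
EOF

# Reducer: count runs of identical lines, then print "word<TAB>count".
cat > streamingReducer.sh <<'EOF'
#!/bin/sh
uniq -c | awk '{print $2 "\t" $1}'
EOF

chmod 755 streaming*.sh

# Hypothetical input standing in for the files uploaded with "fs -put".
printf 'the cat and the hat\nthe end\n' > sample.txt

# mapper | sort | reducer: sort plays the role of Hadoop's sort by key.
# LC_ALL=C forces byte-order sorting so the result does not depend on the locale.
result=$(cat sample.txt | ./streamingMapper.sh | LC_ALL=C sort | ./streamingReducer.sh)

printf '%s\n' "$result"
```

For this sample the script prints five tab-separated lines: and 1, cat 1, end 1, hat 1, the 3. The sort step matters because uniq -c only merges adjacent identical lines; on the cluster it is Hadoop's sorting of the map output by key that places identical words next to each other before the reducer reads them.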