

Lab 9

Hadoop Streaming with Different Languages
For the exercises below, connect to hadoop.nchc.org.tw. In the commands, hXXXX stands for your user name.

Using Existing Binaries

~$ hadoop fs -put /etc/hadoop/conf lab9_input
~$ hadoop jar hadoop-streaming.jar -input lab9_input -output lab9_out1 -mapper /bin/cat -reducer /usr/bin/wc
~$ hadoop fs -cat lab9_out1/part-00000
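
The job above uses /bin/cat as the mapper and /usr/bin/wc as the reducer, so its output is a count of lines, words, and bytes. The data flow can be sketched locally without Hadoop, as a rough approximation only. The function name simulate_streaming and the sample input are hypothetical, and sort merely stands in for the shuffle phase between map and reduce:

```shell
#!/bin/bash
# Local stand-in for: -mapper /bin/cat -reducer /usr/bin/wc
# simulate_streaming is a hypothetical helper, not part of Hadoop.
simulate_streaming() {
    # cat  : mapper, passes every input line through unchanged
    # sort : approximates the shuffle/sort step between map and reduce
    # wc   : reducer, prints line, word, and byte counts
    cat | sort | wc
}

# Two sample lines fed through the simulated job
printf 'b c\na\n' | simulate_streaming
```

Counts on the cluster are computed over everything under lab9_input, so they will differ from this toy input.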

Using Bash Shell Scripts

~$ echo "sed -e \"s/ /\n/g\" | grep ." > streamingMapper.sh
~$ echo "uniq -c | awk '{print \$2 \"\t\" \$1}'" > streamingReducer.sh
~$ chmod a+x streamingMapper.sh
~$ chmod a+x streamingReducer.sh
~$ hadoop jar hadoop-streaming.jar -input lab9_input -output lab9_out2 -mapper streamingMapper.sh -reducer streamingReducer.sh -file streamingMapper.sh -file streamingReducer.sh
~$ hadoop fs -cat lab9_out2/part-00000
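
Before submitting a job like the one above, the two pipelines can be tried locally. The sketch below wraps the same commands that the echo lines write into streamingMapper.sh and streamingReducer.sh as shell functions. The function names map_words and reduce_counts are hypothetical, sort stands in for the shuffle phase, and the "\n" in the sed replacement assumes GNU sed:

```shell
#!/bin/bash
# map_words and reduce_counts are hypothetical names for this sketch.

# Same pipeline as streamingMapper.sh: one word per line, empty lines dropped.
map_words() {
    sed -e "s/ /\n/g" | grep .
}

# Same pipeline as streamingReducer.sh: input must already be sorted so that
# uniq -c sees equal words on adjacent lines; awk prints "word<TAB>count".
reduce_counts() {
    uniq -c | awk '{print $2 "\t" $1}'
}

# sort approximates the shuffle/sort step between map and reduce
printf 'i love hadoop hadoop love u\n' | map_words | sort | reduce_counts
```

Without the sort step, uniq -c would count only adjacent duplicates, which is why the reducer relies on the framework delivering its input sorted by key.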

Using PHP Scripts

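  • Create the PHP mapper script
    ~$ cat > mapper.php << EOF
    #!/usr/bin/php
    <?php
    
    \$word2count = array();
    
    // Read lines from STDIN (standard input)
    while ((\$line = fgets(STDIN)) !== false) {
       // Trim surrounding whitespace and convert to lowercase
       \$line = strtolower(trim(\$line));
       // Split the line into the words array on non-word characters
       \$words = preg_split('/\W/', \$line, 0, PREG_SPLIT_NO_EMPTY);
       // Add 1 to the count of each word
       foreach (\$words as \$word) {
           // Start from 1 the first time a word is seen (avoids an undefined-index notice)
           \$word2count[\$word] = isset(\$word2count[\$word]) ? \$word2count[\$word] + 1 : 1;
       }
    }
    
    // Write the results to STDOUT (standard output)
    foreach (\$word2count as \$word => \$count) {
       // Print: word, tab character, count, end of line
       echo \$word, chr(9), \$count, PHP_EOL;
    }
    ?>
    EOF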
    
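  • Create the PHP reducer script
    ~$ cat > reducer.php << EOF
    #!/usr/bin/php
    <?php
    
    \$word2count = array();
    
    // Read lines from STDIN (standard input)
    while ((\$line = fgets(STDIN)) !== false) {
        // Remove surrounding whitespace
        \$line = trim(\$line);
        // Each line has the form (word "tab" count); store the parts in (\$word, \$count)
        list(\$word, \$count) = explode(chr(9), \$line);
        // Convert the count from string to int
        \$count = intval(\$count);
        // Accumulate the total for this word
        if (\$count > 0) {
            // Start from 0 the first time a word is seen (avoids an undefined-index notice)
            \$prev = isset(\$word2count[\$word]) ? \$word2count[\$word] : 0;
            \$word2count[\$word] = \$prev + \$count;
        }
    }
    
    // Optional: sort by word so the output is easier to read
    ksort(\$word2count);
    
    // Write the results to STDOUT (standard output)
    foreach (\$word2count as \$word => \$count) {
        echo \$word, chr(9), \$count, PHP_EOL;
    }
    ?>
    EOF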
    
  • Make the scripts executable
    ~$ chmod a+x *.php
    
  • Test the scripts locally
    ~$ echo "i love hadoop, hadoop love u" | ./mapper.php | ./reducer.php
    
  • Run the job
    ~$ hadoop jar hadoop-streaming.jar -mapper mapper.php -reducer reducer.php -input lab9_input -output lab9_out3 -file mapper.php -file reducer.php
    
  • Check the result
    ~$ hadoop fs -cat lab9_out3/part-00000
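
The result file holds one "word<TAB>count" pair per line, so ordinary shell tools are enough to inspect it. As a minimal sketch, the hypothetical helper top_words below (not part of Hadoop or of this lab) lists the most frequent words. The sample input is made up for illustration:

```shell
#!/bin/bash
# top_words is a hypothetical helper: it reads "word<TAB>count" lines on stdin
# and prints the N lines with the highest counts (default 10).
top_words() {
    # -t sets the field separator to a tab character
    # -k2,2nr sorts by count, numerically, highest first
    # -k1,1 breaks ties alphabetically by word
    sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "${1:-10}"
}

# Made-up sample in the same format as part-00000
printf 'hadoop\t2\ni\t1\nlove\t2\nu\t1\n' | top_words 2
```

On the cluster the same helper could be fed from the job output, for example: hadoop fs -cat lab9_out3/part-00000 | top_words 20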
    
Last modified on Jul 2, 2012, 1:42:39 AM