
Lab 6

Running MapReduce in Fully Distributed Mode by Examples
For the exercises below, connect to one of hdp01.3du.me, hdp02.3du.me, hdp03.3du.me, or hdp04.3du.me. In the commands that follow, userXX stands for your username.
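
For example, you might log in from your own machine like this (userXX is a placeholder for your assigned account):

    $ ssh userXX@hdp01.3du.me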

Sample 1 : WordCount

  • As its name suggests, the WordCount example counts every word that appears in the input documents and sorts the words from a to z.
    ~$ hadoop fs -put /opt/hadoop/conf lab5_input
    ~$ hadoop fs -rmr lab5_out2        ## fail-safe step: removes the old output directory in case this example was run before
    ~$ hadoop jar hadoop-examples.jar wordcount lab5_input lab5_out2
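
  • While the job runs, you can check its status from another terminal with the job client (the command below is the Hadoop 1.x form; newer releases use mapred job instead):
    $ hadoop job -list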
    
  • Check the output the same way as before, reading the computed result of wordcount from HDFS:
    $ hadoop fs -ls lab5_out2
    $ hadoop fs -cat lab5_out2/part-r-00000 
    
  • You should see results like this:
    "".	4
    "*"	9
    "127.0.0.1"	3
    "AS	2
    "License");	2
    "_logs/history/"	1
    "alice,bob	9
    
    ( ... skip ... )
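
  • You can also copy the whole output directory back to the local file system for easier browsing (the local path below is just an example):
    $ hadoop fs -get lab5_out2 ~/lab5_out2_local
    $ less ~/lab5_out2_local/part-r-00000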
    

Sample 2: grep

  • grep extracts specific strings from documents. In the Hadoop examples, this program extracts every string that matches the given regular expression and counts the matched strings.
    $ hadoop fs -ls lab5_input
    $ hadoop jar hadoop-examples.jar grep lab5_input lab5_out3 'dfs[a-z.]+' 
    
  • The running job should print progress output like this:
    11/04/19 10:00:20 INFO mapred.FileInputFormat: Total input paths to process : 25
    11/04/19 10:00:20 INFO mapred.JobClient: Running job: job_201104120101_0645
    11/04/19 10:00:21 INFO mapred.JobClient:  map 0% reduce 0%
    ( ... skip ... )
    
  • Next, let's check the computed result of grep from HDFS:
    $ hadoop fs -ls lab5_out3
    Found 2 items
    drwx------   - hXXXX supergroup          0 2011-04-19 10:00 /user/hXXXX/lab5_out3/_logs
    -rw-r--r--   2 hXXXX supergroup       1146 2011-04-19 10:00 /user/hXXXX/lab5_out3/part-00000
    $ hadoop fs -cat lab5_out3/part-00000 
    
  • You should see results like this:
    4	dfs.permissions
    4	dfs.replication
    4	dfs.name.dir
    3	dfs.namenode.decommission.interval.
    3	dfs.namenode.decommission.nodes.per.interval
    3	dfs.
    ( ... skip ... )
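
  • To experiment further, try a different pattern; the output directory must not already exist, so lab5_out4 below is just an example name. hadoop fs -getmerge then concatenates all part files into a single local file for viewing:
    $ hadoop jar hadoop-examples.jar grep lab5_input lab5_out4 'mapred[a-z.]+'
    $ hadoop fs -getmerge lab5_out4 ~/lab5_out4.txt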
    

More Examples

Here is a list of the runnable Hadoop examples (running the jar with no program name should print this same list):

aggregatewordcount An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist An Aggregate based map/reduce program that computes the histogram of the words in the input files.
grep A map/reduce program that counts the matches of a regex in the input.
join A job that effects a join over sorted, equally partitioned datasets.
multifilewc A job that counts words from several files.
pentomino A map/reduce tile laying program to find solutions to pentomino problems.
pi A map/reduce program that estimates Pi using the Monte Carlo method.
randomtextwriter A map/reduce program that writes 10GB of random textual data per node.
randomwriter A map/reduce program that writes 10GB of random data per node.
sleep A job that sleeps at each map and reduce task.
sort A map/reduce program that sorts the data written by the random writer.
sudoku A sudoku solver.
wordcount A map/reduce program that counts the words in the input files.

You can find more details in the org.apache.hadoop.examples package.
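
For instance, the pi estimator takes the number of map tasks and the number of samples per map as arguments, and prints its estimate to the terminal instead of writing to HDFS:

    $ hadoop jar hadoop-examples.jar pi 10 100000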
