[[PageOutline]]

◢ <[wiki:NCTU110329/Lab4 Lab 4]> | <[wiki:NCTU110329 Back to Course Outline]> ▲ | <[wiki:NCTU110329/Lab6 Lab 6]> ◣

= Lab 5 =

{{{
#!html
<div style="text-align: center;"><big style="font-weight: bold;"><big>Running MapReduce by Examples</big></big></div>
}}}

= Sample 1: grep =

 * grep is a command that extracts specific text from documents. In the Hadoop examples, you can use the '''grep''' job to extract strings that match a regular expression from the input files and count the matches.

{{{
$ cd /opt/hadoop
$ bin/hadoop fs -put conf lab3_input
$ bin/hadoop fs -ls lab3_input
$ bin/hadoop jar hadoop-*-examples.jar grep lab3_input lab3_out1 'dfs[a-z.]+'
}}}
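
 If you need to re-run a job, note that Hadoop will refuse to write into an output directory that already exists. A minimal cleanup sketch, assuming the `lab3_out1` output path used above (on this 0.20-era release the recursive delete command is `fs -rmr`; newer releases use `fs -rm -r`):

{{{
$ cd /opt/hadoop
# remove the previous output directory, then run the grep example again
$ bin/hadoop fs -rmr lab3_out1
$ bin/hadoop jar hadoop-*-examples.jar grep lab3_input lab3_out1 'dfs[a-z.]+'
}}}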

 You should see output like this:

{{{
09/03/24 12:33:45 INFO mapred.FileInputFormat: Total input paths to process : 9
09/03/24 12:33:45 INFO mapred.FileInputFormat: Total input paths to process : 9
09/03/24 12:33:45 INFO mapred.JobClient: Running job: job_200903232025_0003
09/03/24 12:33:46 INFO mapred.JobClient:  map 0% reduce 0%
09/03/24 12:33:47 INFO mapred.JobClient:  map 10% reduce 0%
09/03/24 12:33:49 INFO mapred.JobClient:  map 20% reduce 0%
09/03/24 12:33:51 INFO mapred.JobClient:  map 30% reduce 0%
09/03/24 12:33:52 INFO mapred.JobClient:  map 40% reduce 0%
09/03/24 12:33:54 INFO mapred.JobClient:  map 50% reduce 0%
09/03/24 12:33:55 INFO mapred.JobClient:  map 60% reduce 0%
09/03/24 12:33:57 INFO mapred.JobClient:  map 70% reduce 0%
09/03/24 12:33:59 INFO mapred.JobClient:  map 80% reduce 0%
09/03/24 12:34:00 INFO mapred.JobClient:  map 90% reduce 0%
09/03/24 12:34:02 INFO mapred.JobClient:  map 100% reduce 0%
09/03/24 12:34:10 INFO mapred.JobClient:  map 100% reduce 10%
09/03/24 12:34:12 INFO mapred.JobClient:  map 100% reduce 13%
09/03/24 12:34:15 INFO mapred.JobClient:  map 100% reduce 20%
09/03/24 12:34:20 INFO mapred.JobClient:  map 100% reduce 23%
09/03/24 12:34:22 INFO mapred.JobClient: Job complete: job_200903232025_0003
09/03/24 12:34:22 INFO mapred.JobClient: Counters: 16
09/03/24 12:34:22 INFO mapred.JobClient:   File Systems
09/03/24 12:34:22 INFO mapred.JobClient:     HDFS bytes read=48245
09/03/24 12:34:22 INFO mapred.JobClient:     HDFS bytes written=1907
09/03/24 12:34:22 INFO mapred.JobClient:     Local bytes read=1549
09/03/24 12:34:22 INFO mapred.JobClient:     Local bytes written=3584
09/03/24 12:34:22 INFO mapred.JobClient:   Job Counters
......
}}}

 * Let's check the computed result of '''grep''' on HDFS:

{{{
$ bin/hadoop fs -ls lab3_out1
$ bin/hadoop fs -cat lab3_out1/part-00000
}}}

 You should see results like this:

{{{
3       dfs.class
3       dfs.
2       dfs.period
1       dfs.http.address
1       dfs.balance.bandwidth
1       dfs.block.size
1       dfs.blockreport.initial
1       dfs.blockreport.interval
1       dfs.client.block.write.retries
1       dfs.client.buffer.dir
1       dfs.data.dir
1       dfs.datanode.address
1       dfs.datanode.dns.interface
1       dfs.datanode.dns.nameserver
1       dfs.datanode.du.pct
1       dfs.datanode.du.reserved
1       dfs.datanode.handler.count
1       dfs.datanode.http.address
1       dfs.datanode.https.address
1       dfs.datanode.ipc.address
1       dfs.default.chunk.view.size
1       dfs.df.interval
1       dfs.file
1       dfs.heartbeat.interval
1       dfs.hosts
1       dfs.hosts.exclude
1       dfs.https.address
1       dfs.impl
1       dfs.max.objects
1       dfs.name.dir
1       dfs.namenode.decommission.interval
1       dfs.namenode.decommission.interval.
1       dfs.namenode.decommission.nodes.per.interval
1       dfs.namenode.handler.count
1       dfs.namenode.logging.level
1       dfs.permissions
1       dfs.permissions.supergroup
1       dfs.replication
1       dfs.replication.consider
1       dfs.replication.interval
1       dfs.replication.max
1       dfs.replication.min
1       dfs.replication.min.
1       dfs.safemode.extension
1       dfs.safemode.threshold.pct
1       dfs.secondary.http.address
1       dfs.servers
1       dfs.web.ugi
1       dfsmetrics.log
}}}

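 To work with these results outside HDFS, you can copy them back to the local filesystem. A small sketch, assuming the `lab3_out1` output directory from above and a writable `/tmp` (`fs -getmerge` concatenates all part files in a directory into one local file):

{{{
$ cd /opt/hadoop
# copy the whole output directory to the local filesystem
$ bin/hadoop fs -get lab3_out1 /tmp/lab3_out1
# or merge all part files into a single local file
$ bin/hadoop fs -getmerge lab3_out1 /tmp/lab3_out1.txt
$ head /tmp/lab3_out1.txt
}}}
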
= Sample 2: WordCount =

 * As its name suggests, the WordCount example counts every word that appears in the input documents and lists the words in order from a to z.

{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar wordcount lab3_input lab3_out2
}}}

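 While a job is running, you can also check its progress from the command line. A quick sketch using the job client bundled with this Hadoop release:

{{{
$ cd /opt/hadoop
# list the MapReduce jobs currently running on the cluster
$ bin/hadoop job -list
# show the status of a specific job (substitute your own job id)
$ bin/hadoop job -status job_200903232025_0003
}}}
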
 As before, let's check the computed result of '''wordcount''' on HDFS:

{{{
$ bin/hadoop fs -ls lab3_out2
$ bin/hadoop fs -cat lab3_out2/part-r-00000
}}}

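 The example output is sorted by word, not by count. If you want the most frequent words instead, one option is to post-process the result with ordinary shell tools; a small sketch, assuming the part file name shown in the listing above:

{{{
# print the ten most frequent words (word<TAB>count), sorted by count
$ bin/hadoop fs -cat lab3_out2/part-r-00000 | sort -k2 -nr | head -n 10
}}}
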
= Browsing MapReduce and HDFS via Web GUI =

 * [http://localhost:50030 JobTracker Web Interface]

 * [http://localhost:50070 NameNode Web Interface]

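 These ports are the defaults for the JobTracker and NameNode web UIs. If the pages do not load in a browser, a quick way to check that the daemons are listening is from a terminal, as in this sketch (assuming `curl` is installed):

{{{
# a 200 response means the JobTracker web UI is up
$ curl -sI http://localhost:50030/ | head -n 1
# and the same check for the NameNode web UI
$ curl -sI http://localhost:50070/ | head -n 1
}}}
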
= More Examples =

 Here is a list of the Hadoop example programs:

 || aggregatewordcount || An Aggregate-based map/reduce program that counts the words in the input files. ||
 || aggregatewordhist || An Aggregate-based map/reduce program that computes the histogram of the words in the input files. ||
 || grep || A map/reduce program that counts the matches of a regex in the input. ||
 || join || A job that effects a join over sorted, equally partitioned datasets. ||
 || multifilewc || A job that counts words from several files. ||
 || pentomino || A map/reduce tile-laying program to find solutions to pentomino problems. ||
 || pi || A map/reduce program that estimates Pi using a Monte Carlo method. ||
 || randomtextwriter || A map/reduce program that writes 10 GB of random textual data per node. ||
 || randomwriter || A map/reduce program that writes 10 GB of random data per node. ||
 || sleep || A job that sleeps at each map and reduce task. ||
 || sort || A map/reduce program that sorts the data written by the random writer. ||
 || sudoku || A sudoku solver. ||
 || wordcount || A map/reduce program that counts the words in the input files. ||

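 Any of these can be run with the same `bin/hadoop jar` pattern used above; running the jar with no arguments should print the list of valid program names. For instance, a quick sketch of the pi estimator (the two numbers are the number of map tasks and the samples per map, so the result is only a rough estimate):

{{{
$ cd /opt/hadoop
# print the list of bundled example programs
$ bin/hadoop jar hadoop-*-examples.jar
# estimate Pi with 4 map tasks and 1000 samples per map
$ bin/hadoop jar hadoop-*-examples.jar pi 4 1000
}}}
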
You can find more details in the [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/package-summary.html org.apache.hadoop.examples] package documentation.