{{{
#!html
<div style="text-align: center;"><big style="font-weight: bold;"><big><big>Lab 5: Running Basic MapReduce Operations</big></big></big></div>
}}}
[[PageOutline]]

== 1. Hadoop Example Command: grep ==

 * grep extracts text matching a given pattern from files; the Hadoop grep example finds every string in the input files that matches the given regular expression and reports a count for each matched string.
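If the `input` directory is not already on HDFS (it may have been created in an earlier lab), the Hadoop `conf` directory can be uploaded as sample input first; a minimal sketch, assuming the same `/opt/hadoop` installation used throughout this lab:

{{{
/opt/hadoop$ bin/hadoop fs -put conf input
}}}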

{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar grep input grep_output 'dfs[a-z.]+'
}}}

The console output of the job looks like this:

{{{
09/03/24 12:33:45 INFO mapred.FileInputFormat: Total input paths to process : 9
09/03/24 12:33:45 INFO mapred.FileInputFormat: Total input paths to process : 9
09/03/24 12:33:45 INFO mapred.JobClient: Running job: job_200903232025_0003
09/03/24 12:33:46 INFO mapred.JobClient:  map 0% reduce 0%
09/03/24 12:33:47 INFO mapred.JobClient:  map 10% reduce 0%
09/03/24 12:33:49 INFO mapred.JobClient:  map 20% reduce 0%
09/03/24 12:33:51 INFO mapred.JobClient:  map 30% reduce 0%
09/03/24 12:33:52 INFO mapred.JobClient:  map 40% reduce 0%
09/03/24 12:33:54 INFO mapred.JobClient:  map 50% reduce 0%
09/03/24 12:33:55 INFO mapred.JobClient:  map 60% reduce 0%
09/03/24 12:33:57 INFO mapred.JobClient:  map 70% reduce 0%
09/03/24 12:33:59 INFO mapred.JobClient:  map 80% reduce 0%
09/03/24 12:34:00 INFO mapred.JobClient:  map 90% reduce 0%
09/03/24 12:34:02 INFO mapred.JobClient:  map 100% reduce 0%
09/03/24 12:34:10 INFO mapred.JobClient:  map 100% reduce 10%
09/03/24 12:34:12 INFO mapred.JobClient:  map 100% reduce 13%
09/03/24 12:34:15 INFO mapred.JobClient:  map 100% reduce 20%
09/03/24 12:34:20 INFO mapred.JobClient:  map 100% reduce 23%
09/03/24 12:34:22 INFO mapred.JobClient: Job complete: job_200903232025_0003
09/03/24 12:34:22 INFO mapred.JobClient: Counters: 16
09/03/24 12:34:22 INFO mapred.JobClient:   File Systems
09/03/24 12:34:22 INFO mapred.JobClient:     HDFS bytes read=48245
09/03/24 12:34:22 INFO mapred.JobClient:     HDFS bytes written=1907
09/03/24 12:34:22 INFO mapred.JobClient:     Local bytes read=1549
09/03/24 12:34:22 INFO mapred.JobClient:     Local bytes written=3584
09/03/24 12:34:22 INFO mapred.JobClient:   Job Counters
......
}}}
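While a job is running, its progress can also be queried from the command line instead of watching the console log; a minimal sketch using the job client (the exact output format varies by Hadoop version):

{{{
/opt/hadoop$ bin/hadoop job -list
/opt/hadoop$ bin/hadoop job -status job_200903232025_0003
}}}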

 * Next, examine the results:

{{{
/opt/hadoop$ bin/hadoop fs -ls grep_output
/opt/hadoop$ bin/hadoop fs -cat grep_output/part-00000
}}}

The output is as follows:

{{{
3       dfs.class
3       dfs.
2       dfs.period
1       dfs.http.address
1       dfs.balance.bandwidth
1       dfs.block.size
1       dfs.blockreport.initial
1       dfs.blockreport.interval
1       dfs.client.block.write.retries
1       dfs.client.buffer.dir
1       dfs.data.dir
1       dfs.datanode.address
1       dfs.datanode.dns.interface
1       dfs.datanode.dns.nameserver
1       dfs.datanode.du.pct
1       dfs.datanode.du.reserved
1       dfs.datanode.handler.count
1       dfs.datanode.http.address
1       dfs.datanode.https.address
1       dfs.datanode.ipc.address
1       dfs.default.chunk.view.size
1       dfs.df.interval
1       dfs.file
1       dfs.heartbeat.interval
1       dfs.hosts
1       dfs.hosts.exclude
1       dfs.https.address
1       dfs.impl
1       dfs.max.objects
1       dfs.name.dir
1       dfs.namenode.decommission.interval
1       dfs.namenode.decommission.interval.
1       dfs.namenode.decommission.nodes.per.interval
1       dfs.namenode.handler.count
1       dfs.namenode.logging.level
1       dfs.permissions
1       dfs.permissions.supergroup
1       dfs.replication
1       dfs.replication.consider
1       dfs.replication.interval
1       dfs.replication.max
1       dfs.replication.min
1       dfs.replication.min.
1       dfs.safemode.extension
1       dfs.safemode.threshold.pct
1       dfs.secondary.http.address
1       dfs.servers
1       dfs.web.ugi
1       dfsmetrics.log
}}}
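Note that Hadoop refuses to start a job whose output directory already exists. To re-run the example, delete the old output first; a minimal sketch (`-rmr` is the recursive-delete option of this Hadoop generation):

{{{
/opt/hadoop$ bin/hadoop fs -rmr grep_output
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar grep input grep_output 'dfs[a-z.]+'
}}}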

== 2. Hadoop Example Command: WordCount ==

 * As its name suggests, WordCount counts the occurrences of every word in the input files and sorts the resulting list alphabetically from a to z.

{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar wordcount input wc_output
}}}

The output can be inspected the same way as for grep in Section 1, as shown below.
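A minimal sketch (the file name `part-00000` assumes the job ran with a single reduce task):

{{{
/opt/hadoop$ bin/hadoop fs -ls wc_output
/opt/hadoop$ bin/hadoop fs -cat wc_output/part-00000
}}}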

== 3. More Example Commands ==

The examples jar ships several other runnable programs:

|| aggregatewordcount || An Aggregate based map/reduce program that counts the words in the input files. ||
|| aggregatewordhist || An Aggregate based map/reduce program that computes the histogram of the words in the input files. ||
|| grep || A map/reduce program that counts the matches of a regex in the input. ||
|| join || A job that effects a join over sorted, equally partitioned datasets ||
|| multifilewc || A job that counts words from several files. ||
|| pentomino || A map/reduce tile laying program to find solutions to pentomino problems. ||
|| pi || A map/reduce program that estimates Pi using a Monte Carlo method. ||
|| randomtextwriter || A map/reduce program that writes 10GB of random textual data per node. ||
|| randomwriter || A map/reduce program that writes 10GB of random data per node. ||
|| sleep || A job that sleeps at each map and reduce task. ||
|| sort || A map/reduce program that sorts the data written by the random writer. ||
|| sudoku || A sudoku solver. ||
|| wordcount || A map/reduce program that counts the words in the input files. ||

For details, see [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/package-summary.html org.apache.hadoop.examples]; the class summary from that page is reproduced below:

|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/AggregateWordCount.html AggregateWordCount] || This is an example Aggregated Hadoop Map/Reduce application. It reads the text input files, breaks each line into words and counts them. The output is a locally sorted list of words and the count of how often they occurred. To run: bin/hadoop jar hadoop-*-examples.jar aggregatewordcount in-dir out-dir numOfReducers textinputformat ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/AggregateWordHistogram.html AggregateWordHistogram] || This is an example Aggregated Hadoop Map/Reduce application. Computes the histogram of the words in the input texts. To run: bin/hadoop jar hadoop-*-examples.jar aggregatewordhist in-dir out-dir numOfReducers textinputformat ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/ExampleDriver.html ExampleDriver] || A description of an example program based on its class and a human-readable description. ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/Grep.html Grep] ||  ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/Join.html Join] || This is the trivial map/reduce program that does absolutely nothing other than use the framework to fragment and sort the input values. To run: bin/hadoop jar build/hadoop-examples.jar join [-m maps] [-r reduces] [-inFormat input format class] [-outFormat output format class] [-outKey output key class] [-outValue output value class] [-joinOp] [in-dir]* in-dir out-dir ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/RandomTextWriter.html RandomTextWriter] || This program uses map/reduce to just run a distributed job where there is no interaction between the tasks and each task writes a large unsorted random sequence of words. To run: bin/hadoop jar hadoop-${version}-examples.jar randomtextwriter [-outFormat output format class] output ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/RandomWriter.html RandomWriter] || This program uses map/reduce to just run a distributed job where there is no interaction between the tasks and each task writes a large unsorted random binary sequence file of BytesWritable. To run: bin/hadoop jar hadoop-${version}-examples.jar randomwriter [-outFormat output format class] output ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/Sort.html Sort<K,V>] || This is the trivial map/reduce program that does absolutely nothing other than use the framework to fragment and sort the input values. To run: bin/hadoop jar build/hadoop-examples.jar sort [-m maps] [-r reduces] [-inFormat input format class] [-outFormat output format class] [-outKey output key class] [-outValue output value class] [-totalOrder pcnt num samples max splits] in-dir out-dir ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/WordCount.html WordCount] || This is an example Hadoop Map/Reduce application. ||
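All of these are launched the same way as grep and wordcount above. As a sketch, the pi estimator could be invoked as follows; the two arguments (the number of maps and the number of samples per map) are taken from its usage message, and running the jar with no program name prints the list of available examples:

{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar pi 10 100000
}}}
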
== 4. Browsing Information via the Web GUI ==

 * [http://localhost:50030 Map/Reduce Administration] (the JobTracker web interface)
 * [http://localhost:50070 NameNode] (the HDFS web interface)
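If no browser is available on the node, the same status pages can be fetched from the command line; a minimal sketch (the .jsp page names are assumptions based on the web UI of this Hadoop generation):

{{{
/opt/hadoop$ curl -s http://localhost:50030/jobtracker.jsp | head
/opt/hadoop$ curl -s http://localhost:50070/dfshealth.jsp | head
}}}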