Changes between Version 1 and Version 2 of 0428Hadoop_Lab2


Timestamp: Apr 25, 2009, 11:26:36 PM
Author: waue
  • 0428Hadoop_Lab2

== Content 2. Hadoop Computation Commands ==

=== 2.1 Hadoop Computation Command: grep ===

 * grep extracts strings that match a given pattern from text files. In the Hadoop examples, this command finds every string in the input files that matches the given regular expression and counts how often each match occurs; here 'dfs[a-z.]+' matches any string beginning with "dfs" followed by lowercase letters or dots.

{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar grep input grep_output 'dfs[a-z.]+'
}}}

The running job produces output like this:

{{{
09/03/24 12:33:45 INFO mapred.FileInputFormat: Total input paths to process : 9
09/03/24 12:33:45 INFO mapred.FileInputFormat: Total input paths to process : 9
09/03/24 12:33:45 INFO mapred.JobClient: Running job: job_200903232025_0003
09/03/24 12:33:46 INFO mapred.JobClient:  map 0% reduce 0%
09/03/24 12:33:47 INFO mapred.JobClient:  map 10% reduce 0%
09/03/24 12:33:49 INFO mapred.JobClient:  map 20% reduce 0%
09/03/24 12:33:51 INFO mapred.JobClient:  map 30% reduce 0%
09/03/24 12:33:52 INFO mapred.JobClient:  map 40% reduce 0%
09/03/24 12:33:54 INFO mapred.JobClient:  map 50% reduce 0%
09/03/24 12:33:55 INFO mapred.JobClient:  map 60% reduce 0%
09/03/24 12:33:57 INFO mapred.JobClient:  map 70% reduce 0%
09/03/24 12:33:59 INFO mapred.JobClient:  map 80% reduce 0%
09/03/24 12:34:00 INFO mapred.JobClient:  map 90% reduce 0%
09/03/24 12:34:02 INFO mapred.JobClient:  map 100% reduce 0%
09/03/24 12:34:10 INFO mapred.JobClient:  map 100% reduce 10%
09/03/24 12:34:12 INFO mapred.JobClient:  map 100% reduce 13%
09/03/24 12:34:15 INFO mapred.JobClient:  map 100% reduce 20%
09/03/24 12:34:20 INFO mapred.JobClient:  map 100% reduce 23%
09/03/24 12:34:22 INFO mapred.JobClient: Job complete: job_200903232025_0003
09/03/24 12:34:22 INFO mapred.JobClient: Counters: 16
09/03/24 12:34:22 INFO mapred.JobClient:   File Systems
09/03/24 12:34:22 INFO mapred.JobClient:     HDFS bytes read=48245
09/03/24 12:34:22 INFO mapred.JobClient:     HDFS bytes written=1907
09/03/24 12:34:22 INFO mapred.JobClient:     Local bytes read=1549
09/03/24 12:34:22 INFO mapred.JobClient:     Local bytes written=3584
09/03/24 12:34:22 INFO mapred.JobClient:   Job Counters
......
}}}

 * Next, check the results:

{{{
/opt/hadoop$ bin/hadoop fs -ls grep_output
/opt/hadoop$ bin/hadoop fs -cat grep_output/part-00000
}}}

The results are as follows:

{{{
3       dfs.class
3       dfs.
2       dfs.period
1       dfs.http.address
1       dfs.balance.bandwidth
1       dfs.block.size
1       dfs.blockreport.initial
1       dfs.blockreport.interval
1       dfs.client.block.write.retries
1       dfs.client.buffer.dir
1       dfs.data.dir
1       dfs.datanode.address
1       dfs.datanode.dns.interface
1       dfs.datanode.dns.nameserver
1       dfs.datanode.du.pct
1       dfs.datanode.du.reserved
1       dfs.datanode.handler.count
1       dfs.datanode.http.address
1       dfs.datanode.https.address
1       dfs.datanode.ipc.address
1       dfs.default.chunk.view.size
1       dfs.df.interval
1       dfs.file
1       dfs.heartbeat.interval
1       dfs.hosts
1       dfs.hosts.exclude
1       dfs.https.address
1       dfs.impl
1       dfs.max.objects
1       dfs.name.dir
1       dfs.namenode.decommission.interval
1       dfs.namenode.decommission.interval.
1       dfs.namenode.decommission.nodes.per.interval
1       dfs.namenode.handler.count
1       dfs.namenode.logging.level
1       dfs.permissions
1       dfs.permissions.supergroup
1       dfs.replication
1       dfs.replication.consider
1       dfs.replication.interval
1       dfs.replication.max
1       dfs.replication.min
1       dfs.replication.min.
1       dfs.safemode.extension
1       dfs.safemode.threshold.pct
1       dfs.secondary.http.address
1       dfs.servers
1       dfs.web.ugi
1       dfsmetrics.log
}}}

=== 2.2 Hadoop Computation Command: WordCount ===

 * As its name suggests, WordCount counts the occurrences of every word in the input files and sorts the results alphabetically (a-z).

{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar wordcount input wc_output
}}}

Check the output the same way as in section 2.1; a sketch follows.
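
A minimal sketch of that check, assuming the job wrote its results to wc_output and that the single output file follows the usual part-00000 naming convention:

{{{
/opt/hadoop$ bin/hadoop fs -ls wc_output
/opt/hadoop$ bin/hadoop fs -cat wc_output/part-00000
}}}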

=== 2.3 More Computation Commands ===

The available example commands (an example invocation follows the table):
 || aggregatewordcount || An Aggregate based map/reduce program that counts the words in the input files. ||
 || aggregatewordhist || An Aggregate based map/reduce program that computes the histogram of the words in the input files. ||
 || grep || A map/reduce program that counts the matches of a regex in the input. ||
 || join || A job that effects a join over sorted, equally partitioned datasets. ||
 || multifilewc || A job that counts words from several files. ||
 || pentomino || A map/reduce tile laying program to find solutions to pentomino problems. ||
 || pi || A map/reduce program that estimates Pi using the Monte Carlo method. ||
 || randomtextwriter || A map/reduce program that writes 10GB of random textual data per node. ||
 || randomwriter || A map/reduce program that writes 10GB of random data per node. ||
 || sleep || A job that sleeps at each map and reduce task. ||
 || sort || A map/reduce program that sorts the data written by the random writer. ||
 || sudoku || A sudoku solver. ||
 || wordcount || A map/reduce program that counts the words in the input files. ||
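
For instance, the pi estimator can be invoked as below. This is a sketch: the two trailing arguments, the number of maps and the number of samples per map, are assumptions based on the example's usage message and can be adjusted.

{{{
# 10 maps, 100000 samples per map (assumed argument order)
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar pi 10 100000
}}}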

See [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/package-summary.html org.apache.hadoop.examples] for the full API documentation.

The class summary from that page:

|| '''Class''' || '''Summary''' ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/AggregateWordCount.html AggregateWordCount] || This is an example Aggregated Hadoop Map/Reduce application. It reads the text input files, breaks each line into words and counts them. The output is a locally sorted list of words and the count of how often they occurred. To run: bin/hadoop jar hadoop-*-examples.jar aggregatewordcount in-dir out-dir numOfReducers textinputformat ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/AggregateWordHistogram.html AggregateWordHistogram] || This is an example Aggregated Hadoop Map/Reduce application. Computes the histogram of the words in the input texts. To run: bin/hadoop jar hadoop-*-examples.jar aggregatewordhist in-dir out-dir numOfReducers textinputformat ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/ExampleDriver.html ExampleDriver] || A description of an example program based on its class and a human-readable description. ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/Grep.html Grep] ||  ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/Join.html Join] || This is the trivial map/reduce program that does absolutely nothing other than use the framework to fragment and sort the input values. To run: bin/hadoop jar build/hadoop-examples.jar join [-m maps] [-r reduces] [-inFormat input format class] [-outFormat output format class] [-outKey output key class] [-outValue output value class] [-joinOp] [in-dir]* in-dir out-dir ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/RandomTextWriter.html RandomTextWriter] || This program uses map/reduce to just run a distributed job where there is no interaction between the tasks and each task writes a large unsorted random sequence of words. To run: bin/hadoop jar hadoop-${version}-examples.jar randomtextwriter [-outFormat output format class] output ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/RandomWriter.html RandomWriter] || This program uses map/reduce to just run a distributed job where there is no interaction between the tasks and each task writes a large unsorted random binary sequence file of BytesWritable. To run: bin/hadoop jar hadoop-${version}-examples.jar randomwriter [-outFormat output format class] output ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/Sort.html Sort<K,V>] || This is the trivial map/reduce program that does absolutely nothing other than use the framework to fragment and sort the input values. To run: bin/hadoop jar build/hadoop-examples.jar sort [-m maps] [-r reduces] [-inFormat input format class] [-outFormat output format class] [-outKey output key class] [-outValue output value class] [-totalOrder pcnt num samples max splits] in-dir out-dir ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/WordCount.html WordCount] || This is an example Hadoop Map/Reduce application. ||

== Content 3. Browsing Information via the Web GUI ==
    334111 
    335112 * [http://localhost:50030 Map/Reduce Administration]
    336113 * [http://localhost:50070 NameNode ]
    337114 
    338  == Content 4. 練習 ==
    339  
    340  * 刪除在 hdfs 內的一整個的資料夾 input
    341  * 用網頁秀出你在 wordcount練習的輸出結果[[BR]]
    342  [[Image(2009-03-24-135001_872x741_scrot.png)]]
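
A minimal hint for the first exercise, using the Hadoop 0.x shell syntax this lab already uses (fs -rmr deletes a directory and everything under it recursively):

{{{
/opt/hadoop$ bin/hadoop fs -rmr input
}}}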