Changes between Version 1 and Version 2 of III110813/Lab5


Timestamp: Oct 21, 2011, 2:25:46 PM
Author: jazz



◢ <[wiki:III110813/Lab3 Lab 3]> | <[wiki:III110813 Back to the Course Outline]> ▲ | <[wiki:III110813/Lab5 Lab 5]> ◣

= Lab 5 =

[[PageOutline]]
{{{
#!html
}}}

== Example 1: Word Count (WordCount) ==

 * STEP 1 : Practice the command for submitting a MapReduce job: 『__'''hadoop jar <local jar file> <class name> <parameters>'''__』
     
}}}
   * [[BR]][[Image(Hadoop4Win:hadoop4win_22.jpg,width=600)]]
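The WordCount job above counts how many times each word appears in the input files. Its map-then-reduce logic can be sketched in plain Python (a simulation for illustration only, not Hadoop code; the sample text is made up):

```python
from collections import Counter

# Made-up stand-in for the text files uploaded to HDFS.
text = "hello hadoop hello world"

# WordCount's map phase emits (word, 1) pairs and its reduce phase
# sums the counts per word; Counter collapses both steps into one.
counts = Counter(text.split())

for word, n in sorted(counts.items()):
    print(word, n)
```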

== Example 2: Filtering Content with Regular Expressions (grep) ==

 * The grep command extracts text that matches a given pattern from documents. In the Hadoop examples, the '''grep''' job extracts the strings that match a regular expression and counts each matched string.
{{{
Jazz@human /opt/hadoop
$ hadoop jar hadoop-*-examples.jar grep input lab5_out1 'dfs[a-z.]+'
}}}
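The quoted pattern dfs[a-z.]+ matches the literal text dfs followed by one or more lowercase letters or dots. A quick check of the pattern in plain Python (the sample lines are invented for illustration):

```python
import re

# The same pattern passed to the Hadoop grep example.
pattern = re.compile(r"dfs[a-z.]+")

# Invented sample lines resembling Hadoop configuration text.
lines = [
    "dfs.replication controls the number of block copies",
    "use dfsadmin -report to inspect the cluster",
    "DFS in uppercase is not matched",
]

matches = [m.group(0) for line in lines for m in pattern.finditer(line)]
print(matches)  # ['dfs.replication', 'dfsadmin']
```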
 * You should see output like this (the grep example runs two jobs: one to count the matches and a second to sort them):
{{{
Jazz@human /opt/hadoop
$ hadoop jar hadoop-*-examples.jar grep input lab5_out1 'dfs[a-z.]+'
11/10/21 14:17:39 INFO mapred.FileInputFormat: Total input paths to process : 12
11/10/21 14:17:39 INFO mapred.JobClient: Running job: job_201110211130_0002
11/10/21 14:17:40 INFO mapred.JobClient:  map 0% reduce 0%
11/10/21 14:17:54 INFO mapred.JobClient:  map 8% reduce 0%
11/10/21 14:17:57 INFO mapred.JobClient:  map 16% reduce 0%
11/10/21 14:18:03 INFO mapred.JobClient:  map 33% reduce 0%
11/10/21 14:18:13 INFO mapred.JobClient:  map 41% reduce 0%
11/10/21 14:18:16 INFO mapred.JobClient:  map 50% reduce 11%
11/10/21 14:18:19 INFO mapred.JobClient:  map 58% reduce 11%
11/10/21 14:18:23 INFO mapred.JobClient:  map 66% reduce 11%
11/10/21 14:18:30 INFO mapred.JobClient:  map 83% reduce 16%
11/10/21 14:18:33 INFO mapred.JobClient:  map 83% reduce 22%
11/10/21 14:18:36 INFO mapred.JobClient:  map 91% reduce 22%
11/10/21 14:18:39 INFO mapred.JobClient:  map 100% reduce 22%
11/10/21 14:18:42 INFO mapred.JobClient:  map 100% reduce 27%
11/10/21 14:18:48 INFO mapred.JobClient:  map 100% reduce 30%
11/10/21 14:18:54 INFO mapred.JobClient:  map 100% reduce 100%
11/10/21 14:18:56 INFO mapred.JobClient: Job complete: job_201110211130_0002
11/10/21 14:18:56 INFO mapred.JobClient: Counters: 18
11/10/21 14:18:56 INFO mapred.JobClient:   Job Counters
11/10/21 14:18:56 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/21 14:18:56 INFO mapred.JobClient:     Launched map tasks=12
11/10/21 14:18:56 INFO mapred.JobClient:     Data-local map tasks=12
11/10/21 14:18:56 INFO mapred.JobClient:   FileSystemCounters
11/10/21 14:18:56 INFO mapred.JobClient:     FILE_BYTES_READ=888
11/10/21 14:18:56 INFO mapred.JobClient:     HDFS_BYTES_READ=18312
11/10/21 14:18:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1496
11/10/21 14:18:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=280
11/10/21 14:18:56 INFO mapred.JobClient:   Map-Reduce Framework
11/10/21 14:18:56 INFO mapred.JobClient:     Reduce input groups=7
11/10/21 14:18:56 INFO mapred.JobClient:     Combine output records=7
11/10/21 14:18:56 INFO mapred.JobClient:     Map input records=553
11/10/21 14:18:56 INFO mapred.JobClient:     Reduce shuffle bytes=224
11/10/21 14:18:56 INFO mapred.JobClient:     Reduce output records=7
11/10/21 14:18:56 INFO mapred.JobClient:     Spilled Records=14
11/10/21 14:18:56 INFO mapred.JobClient:     Map output bytes=193
11/10/21 14:18:56 INFO mapred.JobClient:     Map input bytes=18312
11/10/21 14:18:56 INFO mapred.JobClient:     Combine input records=10
11/10/21 14:18:56 INFO mapred.JobClient:     Map output records=10
11/10/21 14:18:56 INFO mapred.JobClient:     Reduce input records=7
11/10/21 14:18:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/10/21 14:18:57 INFO mapred.FileInputFormat: Total input paths to process : 1
11/10/21 14:18:57 INFO mapred.JobClient: Running job: job_201110211130_0003
( ... skip ... )
}}}
 * Next, check the computed '''grep''' result on HDFS:
 * This example searches every file under the input directory for strings consisting of dfs followed by one or more lowercase letters or dots
{{{
Jazz@human /opt/hadoop
$ hadoop fs -ls lab5_out1
Found 2 items
drwxr-xr-x   - Jazz supergroup          0 2011-10-21 14:18 /user/Jazz/lab5_out1/_logs
-rw-r--r--   1 Jazz supergroup         96 2011-10-21 14:19 /user/Jazz/lab5_out1/part-00000

Jazz@human /opt/hadoop
$ hadoop fs -cat lab5_out1/part-00000
3       dfs.class
2       dfs.period
1       dfs.file
1       dfs.replication
1       dfs.servers
1       dfsadmin
1       dfsmetrics.log
}}}
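The part-00000 listing above pairs each matched string with its frequency, most frequent first. The aggregation the grep example performs can be sketched in plain Python (the input documents here are made up; only the output shape mirrors the real result):

```python
import re
from collections import Counter

# The same pattern used in the lab command above.
pattern = re.compile(r"dfs[a-z.]+")

# Made-up stand-ins for the files under the input directory.
documents = [
    "dfs.class dfs.class dfs.period",
    "dfs.class dfs.period dfsadmin",
]

# First job: count every matched string; second job: sort by count,
# highest first, like the part-00000 output.
counts = Counter(m for doc in documents for m in pattern.findall(doc))
for term, n in counts.most_common():
    print(n, term)
```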