Changes between Initial Version and Version 1 of NCHCCloudCourse100802/Lab3


Timestamp: Aug 5, 2010, 11:01:48 PM
Author: jazz
{{{
#!html
<div style="text-align: center;"><big
 style="font-weight: bold;"><big><big>Lab 3: Running Basic MapReduce Operations</big></big></big></div>
}}}
[[PageOutline]]

= 1. Hadoop Example Command: grep =

 * In the Hadoop examples, the grep job extracts strings matching a given regular expression from the input files and counts the occurrences of each match.

{{{
$ cd /opt/hadoop
$ bin/hadoop fs -put conf lab3_input
$ bin/hadoop fs -ls lab3_input
$ bin/hadoop jar hadoop-*-examples.jar grep lab3_input lab3_out1 'dfs[a-z.]+'
}}}

 The output during execution looks like:

{{{
09/03/24 12:33:45 INFO mapred.FileInputFormat: Total input paths to process : 9
09/03/24 12:33:45 INFO mapred.FileInputFormat: Total input paths to process : 9
09/03/24 12:33:45 INFO mapred.JobClient: Running job: job_200903232025_0003
09/03/24 12:33:46 INFO mapred.JobClient:  map 0% reduce 0%
09/03/24 12:33:47 INFO mapred.JobClient:  map 10% reduce 0%
09/03/24 12:33:49 INFO mapred.JobClient:  map 20% reduce 0%
09/03/24 12:33:51 INFO mapred.JobClient:  map 30% reduce 0%
09/03/24 12:33:52 INFO mapred.JobClient:  map 40% reduce 0%
09/03/24 12:33:54 INFO mapred.JobClient:  map 50% reduce 0%
09/03/24 12:33:55 INFO mapred.JobClient:  map 60% reduce 0%
09/03/24 12:33:57 INFO mapred.JobClient:  map 70% reduce 0%
09/03/24 12:33:59 INFO mapred.JobClient:  map 80% reduce 0%
09/03/24 12:34:00 INFO mapred.JobClient:  map 90% reduce 0%
09/03/24 12:34:02 INFO mapred.JobClient:  map 100% reduce 0%
09/03/24 12:34:10 INFO mapred.JobClient:  map 100% reduce 10%
09/03/24 12:34:12 INFO mapred.JobClient:  map 100% reduce 13%
09/03/24 12:34:15 INFO mapred.JobClient:  map 100% reduce 20%
09/03/24 12:34:20 INFO mapred.JobClient:  map 100% reduce 23%
09/03/24 12:34:22 INFO mapred.JobClient: Job complete: job_200903232025_0003
09/03/24 12:34:22 INFO mapred.JobClient: Counters: 16
09/03/24 12:34:22 INFO mapred.JobClient:   File Systems
09/03/24 12:34:22 INFO mapred.JobClient:     HDFS bytes read=48245
09/03/24 12:34:22 INFO mapred.JobClient:     HDFS bytes written=1907
09/03/24 12:34:22 INFO mapred.JobClient:     Local bytes read=1549
09/03/24 12:34:22 INFO mapred.JobClient:     Local bytes written=3584
09/03/24 12:34:22 INFO mapred.JobClient:   Job Counters
......
}}}

 * Then inspect the results:

{{{
$ bin/hadoop fs -ls lab3_out1
$ bin/hadoop fs -cat lab3_out1/part-00000
}}}

 The results look like:

{{{
3       dfs.class
3       dfs.
2       dfs.period
1       dfs.http.address
1       dfs.balance.bandwidth
1       dfs.block.size
1       dfs.blockreport.initial
1       dfs.blockreport.interval
1       dfs.client.block.write.retries
1       dfs.client.buffer.dir
1       dfs.data.dir
1       dfs.datanode.address
1       dfs.datanode.dns.interface
1       dfs.datanode.dns.nameserver
1       dfs.datanode.du.pct
1       dfs.datanode.du.reserved
1       dfs.datanode.handler.count
1       dfs.datanode.http.address
1       dfs.datanode.https.address
1       dfs.datanode.ipc.address
1       dfs.default.chunk.view.size
1       dfs.df.interval
1       dfs.file
1       dfs.heartbeat.interval
1       dfs.hosts
1       dfs.hosts.exclude
1       dfs.https.address
1       dfs.impl
1       dfs.max.objects
1       dfs.name.dir
1       dfs.namenode.decommission.interval
1       dfs.namenode.decommission.interval.
1       dfs.namenode.decommission.nodes.per.interval
1       dfs.namenode.handler.count
1       dfs.namenode.logging.level
1       dfs.permissions
1       dfs.permissions.supergroup
1       dfs.replication
1       dfs.replication.consider
1       dfs.replication.interval
1       dfs.replication.max
1       dfs.replication.min
1       dfs.replication.min.
1       dfs.safemode.extension
1       dfs.safemode.threshold.pct
1       dfs.secondary.http.address
1       dfs.servers
1       dfs.web.ugi
1       dfsmetrics.log
}}}
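The grep example chains two jobs: one counts every match of the regular expression, and a second sorts the results by frequency, which is why the output above lists the most frequent matches first. The core logic can be sketched locally in Python (a simplified, non-distributed illustration; the sample lines are hypothetical stand-ins for the conf files, not real lab3_input data):

```python
import re
from collections import Counter

# Hypothetical sample lines standing in for the files under lab3_input.
lines = [
    "<name>dfs.replication</name>",
    "<name>dfs.replication</name>",
    "<name>dfs.name.dir</name>",
]

# Map phase: emit every substring matching the regex
# (the same 'dfs[a-z.]+' pattern passed to the grep job above).
pattern = re.compile(r"dfs[a-z.]+")
matches = [m for line in lines for m in pattern.findall(line)]

# Reduce phase: sum the counts per matched string; the second job then
# sorts the (count, match) pairs so the most frequent come first.
counts = Counter(matches)
for count, word in sorted(((c, w) for w, c in counts.items()), reverse=True):
    print(count, word)
```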

= 2. Hadoop Example Command: WordCount =

 * As its name suggests, WordCount counts the occurrences of every word in the input files and lists the results in alphabetical order (a-z).

{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar wordcount lab3_input lab3_out2
}}}

 Check the output the same way as before:

{{{
$ bin/hadoop fs -ls lab3_out2
$ bin/hadoop fs -cat lab3_out2/part-00000
}}}
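Under the hood, WordCount follows the classic map/shuffle/reduce pattern: mappers emit (word, 1) pairs, the framework groups pairs by key in sorted order, and reducers sum each group. A minimal local sketch in Python (the sample input is hypothetical, not the lab3_input data):

```python
from collections import Counter

# Hypothetical input standing in for the files under lab3_input.
documents = [
    "hello hadoop",
    "hello mapreduce hello world",
]

# Map phase: each mapper tokenizes its lines and emits (word, 1) pairs.
pairs = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle + reduce phase: counts for the same word are summed; the
# framework hands keys to the reducer in sorted order, which is why
# part-00000 comes out alphabetically sorted.
counts = Counter()
for word, one in pairs:
    counts[word] += one
for word in sorted(counts):
    print(word, counts[word])
```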

= 3. Browsing Job Information via the Web GUI =

 * [http://localhost:50030 Check job progress through the Map/Reduce Admin interface]

 * [http://localhost:50070 Browse the results through the NameNode interface]

= 4. More Example Commands =

 List of available example commands:

 || aggregatewordcount || An Aggregate-based map/reduce program that counts the words in the input files. ||
 || aggregatewordhist || An Aggregate-based map/reduce program that computes the histogram of the words in the input files. ||
 || grep || A map/reduce program that counts the matches of a regex in the input. ||
 || join || A job that effects a join over sorted, equally partitioned datasets. ||
 || multifilewc || A job that counts words from several files. ||
 || pentomino || A map/reduce tile-laying program that finds solutions to pentomino problems. ||
 || pi || A map/reduce program that estimates Pi using a Monte Carlo method. ||
 || randomtextwriter || A map/reduce program that writes 10 GB of random textual data per node. ||
 || randomwriter || A map/reduce program that writes 10 GB of random data per node. ||
 || sleep || A job that sleeps at each map and reduce task. ||
 || sort || A map/reduce program that sorts the data written by the random writer. ||
 || sudoku || A sudoku solver. ||
 || wordcount || A map/reduce program that counts the words in the input files. ||
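The pi example is an easy one to try next; its Monte Carlo idea — sample random points in the unit square and measure the fraction landing inside the quarter circle — can be sketched locally in Python (a single-process simplification; the real job spreads the sampling across map tasks):

```python
import random

# Monte Carlo estimate of Pi: for points (x, y) drawn uniformly from
# the unit square, P(x^2 + y^2 <= 1) = Pi/4.
random.seed(0)  # fixed seed so the run is reproducible
samples = 100_000
inside = sum(
    1
    for _ in range(samples)
    if random.random() ** 2 + random.random() ** 2 <= 1.0
)
pi_estimate = 4.0 * inside / samples
print(pi_estimate)
```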

 See [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/package-summary.html org.apache.hadoop.examples] for details.