[[PageOutline]]

= Lab 2: HDFS Shell Command Practice =

== Preface ==

 * This part continues from Lab 1.

== Content 1. Basic Operations ==
=== 1.1 Browse your HDFS directory ===

{{{
/opt/hadoop$ bin/hadoop fs -ls
}}}
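
With no path argument, `-ls` lists your HDFS home directory. A minimal sketch of browsing other paths, assuming the default `/user/<username>` home layout:

{{{
/opt/hadoop$ bin/hadoop fs -ls /
/opt/hadoop$ bin/hadoop fs -lsr /user
}}}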
     15
     16 === 1.2 上傳資料到HDFS目錄 ===
     17 * 上傳
     18
     19{{{
     20/opt/hadoop$ bin/hadoop fs -put conf input
     21}}}
     22
     23 * 檢查
     24
     25{{{
     26/opt/hadoop$ bin/hadoop fs -ls
     27/opt/hadoop$ bin/hadoop fs -ls input
     28}}}
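
`-copyFromLocal` behaves the same as `-put` for local sources; a minimal sketch using a hypothetical target directory `input2`:

{{{
/opt/hadoop$ bin/hadoop fs -copyFromLocal conf input2
/opt/hadoop$ bin/hadoop fs -ls input2
}}}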

=== 1.3 Download data from HDFS to a local directory ===

 * Download

{{{
/opt/hadoop$ bin/hadoop fs -get input fromHDFS
}}}

 * Check

{{{
/opt/hadoop$ ls -al | grep fromHDFS
/opt/hadoop$ ls -al fromHDFS
}}}
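
`-copyToLocal` is the mirror of `-copyFromLocal`; a sketch using a hypothetical local directory `fromHDFS2`:

{{{
/opt/hadoop$ bin/hadoop fs -copyToLocal input fromHDFS2
/opt/hadoop$ ls -al fromHDFS2
}}}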

=== 1.4 Delete files ===

{{{
/opt/hadoop$ bin/hadoop fs -ls input
/opt/hadoop$ bin/hadoop fs -rm input/masters
}}}
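
`-rm` works on individual files; removing a whole directory takes the recursive `-rmr`. A minimal sketch on a hypothetical scratch directory, so the lab data stays intact:

{{{
/opt/hadoop$ bin/hadoop fs -mkdir trash_me
/opt/hadoop$ bin/hadoop fs -rmr trash_me
}}}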

=== 1.5 View a file directly ===

{{{
/opt/hadoop$ bin/hadoop fs -ls input
/opt/hadoop$ bin/hadoop fs -cat input/slaves
}}}
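
For large files, `-tail` prints only the last kilobyte instead of streaming the whole file:

{{{
/opt/hadoop$ bin/hadoop fs -tail input/slaves
}}}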

=== 1.6 More commands ===

{{{
hadooper@vPro:/opt/hadoop$ bin/hadoop fs

Usage: java FsShell
           [-ls <path>]
           [-lsr <path>]
           [-du <path>]
           [-dus <path>]
           [-count[-q] <path>]
           [-mv <src> <dst>]
           [-cp <src> <dst>]
           [-rm <path>]
           [-rmr <path>]
           [-expunge]
           [-put <localsrc> ... <dst>]
           [-copyFromLocal <localsrc> ... <dst>]
           [-moveFromLocal <localsrc> ... <dst>]
           [-get [-ignoreCrc] [-crc] <src> <localdst>]
           [-getmerge <src> <localdst> [addnl]]
           [-cat <src>]
           [-text <src>]
           [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
           [-moveToLocal [-crc] <src> <localdst>]
           [-mkdir <path>]
           [-setrep [-R] [-w] <rep> <path/file>]
           [-touchz <path>]
           [-test -[ezd] <path>]
           [-stat [format] <path>]
           [-tail [-f] <file>]
           [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
           [-chown [-R] [OWNER][:[GROUP]] PATH...]
           [-chgrp [-R] GROUP PATH...]
           [-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
}}}
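
A short sketch that strings a few of these commands together (the directory name `fs_test` is hypothetical):

{{{
/opt/hadoop$ bin/hadoop fs -mkdir fs_test
/opt/hadoop$ bin/hadoop fs -touchz fs_test/empty_file
/opt/hadoop$ bin/hadoop fs -test -e fs_test/empty_file && echo "file exists"
/opt/hadoop$ bin/hadoop fs -du fs_test
/opt/hadoop$ bin/hadoop fs -rmr fs_test
}}}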

== Content 2. Hadoop Computation Commands ==

=== 2.1 Hadoop computation command: grep ===

 * grep extracts strings that match a given pattern. The Hadoop grep example searches the input files for strings matching the given regular expression and counts the number of occurrences of each match.

{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar grep input grep_output 'dfs[a-z.]+'
}}}

The run looks like this:

{{{
09/03/24 12:33:45 INFO mapred.FileInputFormat: Total input paths to process : 9
09/03/24 12:33:45 INFO mapred.FileInputFormat: Total input paths to process : 9
09/03/24 12:33:45 INFO mapred.JobClient: Running job: job_200903232025_0003
09/03/24 12:33:46 INFO mapred.JobClient:  map 0% reduce 0%
09/03/24 12:33:47 INFO mapred.JobClient:  map 10% reduce 0%
09/03/24 12:33:49 INFO mapred.JobClient:  map 20% reduce 0%
09/03/24 12:33:51 INFO mapred.JobClient:  map 30% reduce 0%
09/03/24 12:33:52 INFO mapred.JobClient:  map 40% reduce 0%
09/03/24 12:33:54 INFO mapred.JobClient:  map 50% reduce 0%
09/03/24 12:33:55 INFO mapred.JobClient:  map 60% reduce 0%
09/03/24 12:33:57 INFO mapred.JobClient:  map 70% reduce 0%
09/03/24 12:33:59 INFO mapred.JobClient:  map 80% reduce 0%
09/03/24 12:34:00 INFO mapred.JobClient:  map 90% reduce 0%
09/03/24 12:34:02 INFO mapred.JobClient:  map 100% reduce 0%
09/03/24 12:34:10 INFO mapred.JobClient:  map 100% reduce 10%
09/03/24 12:34:12 INFO mapred.JobClient:  map 100% reduce 13%
09/03/24 12:34:15 INFO mapred.JobClient:  map 100% reduce 20%
09/03/24 12:34:20 INFO mapred.JobClient:  map 100% reduce 23%
09/03/24 12:34:22 INFO mapred.JobClient: Job complete: job_200903232025_0003
09/03/24 12:34:22 INFO mapred.JobClient: Counters: 16
09/03/24 12:34:22 INFO mapred.JobClient:   File Systems
09/03/24 12:34:22 INFO mapred.JobClient:     HDFS bytes read=48245
09/03/24 12:34:22 INFO mapred.JobClient:     HDFS bytes written=1907
09/03/24 12:34:22 INFO mapred.JobClient:     Local bytes read=1549
09/03/24 12:34:22 INFO mapred.JobClient:     Local bytes written=3584
09/03/24 12:34:22 INFO mapred.JobClient:   Job Counters
......
}}}

 * Then check the results:

{{{
/opt/hadoop$ bin/hadoop fs -ls grep_output
/opt/hadoop$ bin/hadoop fs -cat grep_output/part-00000
}}}

The results look like this:

{{{
3       dfs.class
3       dfs.
2       dfs.period
1       dfs.http.address
1       dfs.balance.bandwidth
1       dfs.block.size
1       dfs.blockreport.initial
1       dfs.blockreport.interval
1       dfs.client.block.write.retries
1       dfs.client.buffer.dir
1       dfs.data.dir
1       dfs.datanode.address
1       dfs.datanode.dns.interface
1       dfs.datanode.dns.nameserver
1       dfs.datanode.du.pct
1       dfs.datanode.du.reserved
1       dfs.datanode.handler.count
1       dfs.datanode.http.address
1       dfs.datanode.https.address
1       dfs.datanode.ipc.address
1       dfs.default.chunk.view.size
1       dfs.df.interval
1       dfs.file
1       dfs.heartbeat.interval
1       dfs.hosts
1       dfs.hosts.exclude
1       dfs.https.address
1       dfs.impl
1       dfs.max.objects
1       dfs.name.dir
1       dfs.namenode.decommission.interval
1       dfs.namenode.decommission.interval.
1       dfs.namenode.decommission.nodes.per.interval
1       dfs.namenode.handler.count
1       dfs.namenode.logging.level
1       dfs.permissions
1       dfs.permissions.supergroup
1       dfs.replication
1       dfs.replication.consider
1       dfs.replication.interval
1       dfs.replication.max
1       dfs.replication.min
1       dfs.replication.min.
1       dfs.safemode.extension
1       dfs.safemode.threshold.pct
1       dfs.secondary.http.address
1       dfs.servers
1       dfs.web.ugi
1       dfsmetrics.log
}}}
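
Each output line is a count, a tab, and the matched string, sorted by count in decreasing order. Since the output is plain text, it can be post-processed with ordinary local tools; a minimal sketch that sums the total number of matches:

{{{
/opt/hadoop$ bin/hadoop fs -cat grep_output/part-00000 | awk '{sum += $1} END {print sum}'
}}}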

=== 2.2 Hadoop computation command: WordCount ===

 * As the name suggests, WordCount counts the occurrences of every word in the input files and lists the words sorted from a to z.

{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar wordcount input wc_output
}}}

Check the output the same way as in 2.1.
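
A minimal sketch of that check, reusing the `wc_output` name from the command above and piping through local `head` to keep the listing short:

{{{
/opt/hadoop$ bin/hadoop fs -ls wc_output
/opt/hadoop$ bin/hadoop fs -cat wc_output/part-00000 | head
}}}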

=== 2.3 More computation commands ===

The list of runnable example commands:

|| aggregatewordcount || An Aggregate based map/reduce program that counts the words in the input files. ||
|| aggregatewordhist || An Aggregate based map/reduce program that computes the histogram of the words in the input files. ||
|| grep || A map/reduce program that counts the matches of a regex in the input. ||
|| join || A job that effects a join over sorted, equally partitioned datasets. ||
|| multifilewc || A job that counts words from several files. ||
|| pentomino || A map/reduce tile laying program to find solutions to pentomino problems. ||
|| pi || A map/reduce program that estimates Pi using the Monte Carlo method. ||
|| randomtextwriter || A map/reduce program that writes 10GB of random textual data per node. ||
|| randomwriter || A map/reduce program that writes 10GB of random data per node. ||
|| sleep || A job that sleeps at each map and reduce task. ||
|| sort || A map/reduce program that sorts the data written by the random writer. ||
|| sudoku || A sudoku solver. ||
|| wordcount || A map/reduce program that counts the words in the input files. ||

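For instance, the `pi` estimator takes the number of map tasks and the number of samples per map as arguments; a minimal sketch:

{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar pi 10 1000
}}}
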
See [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/package-summary.html org.apache.hadoop.examples] for details; its class summary is reproduced below.
     245
     246{{{
     247#!html
     248<html lang="zh-tw"><head>
     249
     250<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type"><title>a.html</title>
     251
     252</head><body>
     253<br>
     254
     255<p>
     256</p><table summary="" border="1" cellpadding="3" cellspacing="0" width="100%">
     257<tbody><tr class="TableHeadingColor" bgcolor="#ccccff">
     258<th colspan="2" align="left"><font size="+2">
     259<b>Class Summary</b></font></th>
     260</tr>
     261<tr class="TableRowColor" bgcolor="white">
     262
     263<td width="15%"><b><a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/AggregateWordCount.html" title="class in org.apache.hadoop.examples">AggregateWordCount</a></b></td>
     264<td>This is an example Aggregated Hadoop Map/Reduce application. It
     265reads the text input files, breaks each line into words and counts
     266them. The output is a locally sorted list of words and the count of how
     267often they occurred. To run: bin/hadoop jar hadoop-*-examples.jar
     268aggregatewordcount in-dir out-dir numOfReducers textinputformat </td>
     269</tr>
     270<tr class="TableRowColor" bgcolor="white">
     271<td width="15%"><b><a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/AggregateWordHistogram.html" title="class in org.apache.hadoop.examples">AggregateWordHistogram</a></b></td>
     272<td>This is an example Aggregated Hadoop Map/Reduce application.
     273Computes the histogram of the words in the input texts. To run:
     274bin/hadoop jar hadoop-*-examples.jar aggregatewordhist in-dir out-dir
     275numOfReducers textinputformat </td>
     276</tr>
     277<tr class="TableRowColor" bgcolor="white">
     278<td width="15%"><b><a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/ExampleDriver.html" title="class in org.apache.hadoop.examples">ExampleDriver</a></b></td>
     279<td>A description of an example program based on its class and a human-readable description.</td>
     280</tr>
     281
     282<tr class="TableRowColor" bgcolor="white">
     283<td width="15%"><b><a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/Grep.html" title="class in org.apache.hadoop.examples">Grep</a></b></td>
     284<td>&nbsp;</td>
     285</tr>
     286<tr class="TableRowColor" bgcolor="white">
     287<td width="15%"><b><a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/Join.html" title="class in org.apache.hadoop.examples">Join</a></b></td>
     288<td>This is the trivial map/reduce program that does absolutely nothing
     289other than use the framework to fragment and sort the input values. To
     290run: bin/hadoop jar build/hadoop-examples.jar join [-m maps] [-r
     291reduces] [-inFormat input format class] [-outFormat output format
     292class] [-outKey output key class] [-outValue output value class]
     293[-joinOp] [in-dir]* in-dir out-dir</inner></td>
     294</tr>
     295<tr class="TableRowColor" bgcolor="white">
     296<td width="15%"><b><a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/RandomTextWriter.html" title="class in org.apache.hadoop.examples">RandomTextWriter</a></b></td>
     297<td>This program uses map/reduce to just run a distributed job where
     298there is
     299no interaction between the tasks and each task writes a large unsorted
     300random sequence of words.To run: bin/hadoop jar
     301hadoop-${version}-examples.jar randomtextwriter [-outFormat output
     302format class] output</td>
     303
     304</tr>
     305<tr class="TableRowColor" bgcolor="white">
     306<td width="15%"><b><a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/RandomWriter.html" title="class in org.apache.hadoop.examples">RandomWriter</a></b></td>
     307<td>This program uses map/reduce to just run a distributed job where
     308there is
     309no interaction between the tasks and each task write a large unsorted
     310random binary sequence file of BytesWritable.To run: bin/hadoop jar
     311hadoop-${version}-examples.jar randomwriter [-outFormat output format
     312class] output</td>
     313</tr>
     314<tr class="TableRowColor" bgcolor="white">
     315<td width="15%"><b><a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/Sort.html" title="class in org.apache.hadoop.examples">Sort&lt;K,V&gt;</a></b></td>
     316<td>This is the trivial map/reduce program that does absolutely nothing
     317other than use the framework to fragment and sort the input values.To
     318run: bin/hadoop jar build/hadoop-examples.jar sort [-m maps] [-r
     319reduces] [-inFormat input format class] [-outFormat output format
     320class] [-outKey output key class] [-outValue output value class]
     321[-totalOrder pcnt num samples max splits] in-dir out-dir</td>
     322</tr>
     323<tr class="TableRowColor" bgcolor="white">
     324<td width="15%"><b><a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/WordCount.html" title="class in org.apache.hadoop.examples">WordCount</a></b></td>
     325
     326<td>This is an example Hadoop Map/Reduce application.</td>
     327</tr>
     328</tbody></table>
     329</body></html>
     330}}}

== Content 3. Browsing Information with the Web GUI ==

 * [http://localhost:50030 Map/Reduce Administration]
 * [http://localhost:50070 NameNode]

== Content 4. Exercises ==

 * Delete the entire input folder inside HDFS (a hint is sketched below).
 * Use the web GUI to show the output of your wordcount exercise.[[BR]]
 [[Image(2009-03-24-135001_872x741_scrot.png)]]
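
A hint for the first exercise, using the recursive remove listed in section 1.6:

{{{
/opt/hadoop$ bin/hadoop fs -rmr input
}}}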