Version 8 (modified by waue, 16 years ago)

Lab 2: HDFS Command Practice

Preface

This part continues from Lab 1.

Content 1. Basic Operations

1.1 Browse your HDFS directory

1.2 Upload data to an HDFS directory

1.3 Download data from HDFS to a local directory

1.4 More command operations
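
Sections 1.1 to 1.3 correspond to three `hadoop fs` subcommands: `-ls`, `-put`, and `-get`. The sketch below only builds and prints the command lines (a dry run), so it can be read without a running cluster. The `bin/hadoop` launcher path follows the `bin/hadoop jar ...` form used elsewhere on this page; the names `local_dir`, `input`, and `local_copy` are hypothetical and chosen only for illustration.

```python
import shlex
import subprocess

# Assumption: the launcher is invoked as "bin/hadoop" from the Hadoop
# install directory, as in the "bin/hadoop jar ..." examples on this page.
HADOOP = "bin/hadoop"


def fs_command(*args):
    """Build the argv list for a `hadoop fs` subcommand."""
    return [HADOOP, "fs", *args]


def run(argv, dry_run=True):
    """Print the command line; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


# Hypothetical paths, for illustration only.
list_home = fs_command("-ls")                          # 1.1 list your HDFS home directory
upload = fs_command("-put", "local_dir", "input")      # 1.2 copy local_dir into HDFS as "input"
download = fs_command("-get", "input", "local_copy")   # 1.3 copy "input" back to the local disk

for argv in (list_home, upload, download):
    run(argv)
```

Calling `run(argv, dry_run=False)` would execute the command for real, which assumes a configured Hadoop installation in the current directory.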

Content 2. Hadoop Compute Commands

2.1 Hadoop compute command: grep
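
To make the behavior of the `grep` example concrete, here is a minimal single-process sketch in Python of the idea: count every string that matches a regular expression, then list the matches by descending count. It is not the Hadoop implementation; the sample lines, the pattern `dfs[a-z.]+`, and the tie-breaking rule are assumptions made for illustration.

```python
import re
from collections import Counter


def grep_count(lines, pattern):
    """Count each distinct regex match, then sort by descending count."""
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        # "Map" side: every match in the line contributes one occurrence.
        for match in regex.finditer(line):
            # "Reduce" side: occurrences are summed per matched string.
            counts[match.group(0)] += 1
    # Sorting pass: largest count first. Breaking ties by the matched
    # text is an assumption of this sketch.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical input lines.
sample = ["dfs.replication dfs.name.dir", "mapred.job.tracker", "dfs.replication"]
result = grep_count(sample, r"dfs[a-z.]+")
print(result)
```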

2.2 Hadoop compute command: WordCount
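
The WordCount example can be pictured as two small steps: a map step that emits a `(word, 1)` pair for each word in a line, and a reduce step that sums the values per word. The following is a rough in-memory sketch of that flow in Python, not the Hadoop classes themselves; the function names and the whitespace tokenization are assumptions of this sketch.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every whitespace-separated token."""
    for word in line.split():
        yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each word."""
    totals = defaultdict(int)
    for word, value in pairs:
        totals[word] += value
    return dict(totals)


def word_count(lines):
    """Run the map step over every line and feed all pairs to the reduce step."""
    pairs = (pair for line in lines for pair in map_words(line))
    return reduce_counts(pairs)


# Hypothetical input lines.
counts = word_count(["hello hadoop", "hello hdfs"])
print(sorted(counts.items()))
```

On a real cluster the pairs are grouped by key and distributed across reducers; here a single dictionary plays that role.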

2.3 More compute commands

List of available commands:

aggregatewordcount An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist An Aggregate based map/reduce program that computes the histogram of the words in the input files.
grep A map/reduce program that counts the matches of a regex in the input.
join A job that effects a join over sorted, equally partitioned datasets
multifilewc A job that counts words from several files.
pentomino A map/reduce tile laying program to find solutions to pentomino problems.
pi A map/reduce program that estimates Pi using monte-carlo method.
randomtextwriter A map/reduce program that writes 10GB of random textual data per node.
randomwriter A map/reduce program that writes 10GB of random data per node.
sleep A job that sleeps at each map and reduce task.
sort A map/reduce program that sorts the data written by the random writer.
sudoku A sudoku solver.
wordcount A map/reduce program that counts the words in the input files.
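
As an illustration of the `pi` entry in the list above, the Monte Carlo idea can be sketched in a few lines of Python: draw random points in the unit square and use the fraction that falls inside the quarter circle to approximate Pi. This is a single-process sketch under stated assumptions, not the Hadoop job; the function name, sample count, and seed are hypothetical.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Estimate Pi from the share of random points inside the quarter circle."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle has area Pi/4, the unit square has area 1.
    return 4.0 * inside / num_samples


print(estimate_pi(100_000))
```

In a map/reduce setting, each map task would count its own inside and outside points and a reducer would add the counts before the final division.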

See org.apache.hadoop.examples

SleepJob.SleepInputFormat

Class Summary
AggregateWordCount	This is an example Aggregated Hadoop Map/Reduce application. It reads the text input files, breaks each line into words and counts them. The output is a locally sorted list of words and the count of how often they occurred. To run: bin/hadoop jar hadoop-*-examples.jar aggregatewordcount in-dir out-dir numOfReducers textinputformat
AggregateWordHistogram	This is an example Aggregated Hadoop Map/Reduce application. Computes the histogram of the words in the input texts. To run: bin/hadoop jar hadoop-*-examples.jar aggregatewordhist in-dir out-dir numOfReducers textinputformat
ExampleDriver	A description of an example program based on its class and a human-readable description.
Grep
Join	A job that effects a join over sorted, equally partitioned datasets.
MultiFileWordCount	MultiFileWordCount is an example to demonstrate the usage of MultiFileInputFormat.
MultiFileWordCount.MapClass	This Mapper is similar to the one in `WordCount.MapClass`.
MultiFileWordCount.MultiFileLineRecordReader	RecordReader is responsible for extracting records from the InputSplit.
MultiFileWordCount.MyInputFormat	To use `MultiFileInputFormat`, one should extend it, to return a (custom) `RecordReader`.
MultiFileWordCount.WordOffset	This record keeps <filename,offset> pairs.
PiEstimator	A map/reduce program to estimate the value of Pi using the Monte Carlo method.
PiEstimator.PiMapper	Mapper class for Pi estimation.
RandomTextWriter	This program uses map/reduce to just run a distributed job where there is no interaction between the tasks and each task writes a large unsorted random sequence of words.
RandomWriter	This program uses map/reduce to just run a distributed job where there is no interaction between the tasks and each task writes a large unsorted random binary sequence file of BytesWritable.
SleepJob	Dummy class for testing the MR framework.
Sort<K,V>	This is the trivial map/reduce program that does absolutely nothing other than use the framework to fragment and sort the input values.
WordCount	This is an example Hadoop Map/Reduce application.
WordCount.MapClass	Counts the words in each line.
WordCount.Reduce	A reducer class that just emits the sum of the input values.

Content 3. Browsing Information Through the Web GUI

Exercises
