wiki:jazz/Hadoop_Lab2

Version 8 (modified by waue, 15 years ago) (diff)

--

實作二: HDFS 指令操作練習

前言

  • 此部份接續實做一

Content 1. 基本操作

1.1 瀏覽你HDFS目錄

1.2 上傳資料到HDFS目錄

1.3 下載HDFS的資料到本地目錄

1.4 更多指令操作

Content 2. Hadoop 運算命令

2.1 Hadoop運算命令 grep

2.2 Hadoop運算命令 WordCount

2.3 更多運算命令

可執行的指令一覽表:

aggregatewordcount An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist An Aggregate based map/reduce program that computes the histogram of the words in the input files.
grep A map/reduce program that counts the matches of a regex in the input.
join A job that effects a join over sorted, equally partitioned datasets
multifilewc A job that counts words from several files.
pentomino A map/reduce tile laying program to find solutions to pentomino problems.
pi A map/reduce program that estimates Pi using monte-carlo method.
randomtextwriter A map/reduce program that writes 10GB of random textual data per node.
randomwriter A map/reduce program that writes 10GB of random data per node.
sleep A job that sleeps at each map and reduce task.
sort A map/reduce program that sorts the data written by the random writer.
sudoku A sudoku solver.
wordcount A map/reduce program that counts the words in the input files.

請參考 org.apache.hadoop.examples

SleepJob.SleepInputFormat
Class Summary
AggregateWordCount This is an example Aggregated Hadoop Map/Reduce application. It reads the text input files, breaks each line into words and counts them. The output is a locally sorted list of words and the count of how often they occurred. To run: bin/hadoop jar hadoop-*-examples.jar aggregatewordcount in-dir out-dir numOfReducers textinputformat
AggregateWordHistogram This is an example Aggregated Hadoop Map/Reduce application. Computes the histogram of the words in the input texts. To run: bin/hadoop jar hadoop-*-examples.jar aggregatewordhist in-dir out-dir numOfReducers textinputformat
ExampleDriver A description of an example program based on its class and a human-readable description.
Grep  
Join This is the trivial map/reduce program that does absolutely nothing other than use the framework to fragment and sort the input values.
MultiFileWordCount MultiFileWordCount is an example to demonstrate the usage of MultiFileInputFormat.
MultiFileWordCount.MapClass This Mapper is similar to the one in WordCount.MapClass.
MultiFileWordCount.MultiFileLineRecordReader RecordReader is responsible from extracting records from the InputSplit.
MultiFileWordCount.MyInputFormat To use MultiFileInputFormat, one should extend it, to return a (custom) RecordReader.
MultiFileWordCount.WordOffset This record keeps <filename,offset> pairs.
PiEstimator A Map-reduce program to estimaate the valu eof Pi using monte-carlo method.
PiEstimator.PiMapper Mappper class for Pi estimation.
RandomTextWriter This program uses map/reduce to just run a distributed job where there is no interaction between the tasks and each task writes a large unsorted random sequence of words.
RandomWriter This program uses map/reduce to just run a distributed job where there is no interaction between the tasks and each task write a large unsorted random binary sequence file of BytesWritable.
SleepJob Dummy class for testing MR framefork.
Sort<K,V> This is the trivial map/reduce program that does absolutely nothing other than use the framework to fragment and sort the input values.
WordCount This is an example Hadoop Map/Reduce application.
WordCount.MapClass Counts the words in each line.
WordCount.Reduce A reducer class that just emits the sum of the input values.
 

Content 3. 使用網頁Gui瀏覽訊息

練習

Attachments (1)

Download all attachments as: .zip