Changes between Initial Version and Version 1 of YMU110509/Lab8


Timestamp: May 9, 2011, 11:31:29 AM
Author: jazz

[[PageOutline]]

◢ <[wiki:YMU110509/Lab7 Lab 7]> | <[wiki:YMU110509 Back to Course Outline]> ▲ | <---> ◣

= Lab 8 =

{{{
#!html
<div style="text-align: center;"><big style="font-weight: bold;"><big>Pig Latin in Practice</big></big></div>
}}}

== Aggregation (Local Mode) ==

{{{
~$ wget http://hadoop.nchc.org.tw/excite-small.log
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'lab8_out1';
grunt> quit
~$ head lab8_out1
}}}
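
As a rough illustration of what the `GROUP ... BY user` and `COUNT(log)` steps above compute, here is a minimal Python sketch. It is not how Pig runs the job; the sample rows and the helper name `count_by_user` are hypothetical and only mirror the `(user, timestamp, query)` layout declared in the `LOAD` statement.

```python
from collections import Counter

# Hypothetical rows in the (user, timestamp, query) layout; not real log data.
rows = [
    ("userA", "970916105432", "hadoop"),
    ("userB", "970916001949", "pig latin"),
    ("userA", "970916105501", "mapreduce"),
]

def count_by_user(records):
    """Mimic GROUP log BY user followed by COUNT(log): one count per user."""
    return Counter(user for user, _timestamp, _query in records)

cntd = count_by_user(rows)

# Print tab-separated (user, count) pairs, one per line.
for user, cnt in sorted(cntd.items()):
    print(f"{user}\t{cnt}")
```

Each printed line pairs a user with that user's number of queries, which is the same shape of record that `STORE cntd` writes out.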

== Filter (Local Mode) ==

{{{
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log) AS cnt;
grunt> fltrd = FILTER cntd BY cnt > 50;
grunt> STORE fltrd INTO 'lab8_out2';
grunt> quit
~$ head lab8_out2
}}}
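
The `FILTER cntd BY cnt > 50` step keeps only users with more than 50 queries. The following Python sketch shows the same predicate on a few hypothetical `(user, cnt)` pairs; the data and the helper name `filter_by_count` are assumptions made for illustration.

```python
# Hypothetical (user, cnt) pairs standing in for the cntd relation.
cntd = [("userA", 120), ("userB", 50), ("userC", 51)]

def filter_by_count(pairs, threshold=50):
    """Mimic FILTER cntd BY cnt > threshold (strictly greater than)."""
    return [(user, cnt) for user, cnt in pairs if cnt > threshold]

fltrd = filter_by_count(cntd)
print(fltrd)
```

Note that the comparison is strict, so a user with exactly 50 queries is dropped.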

== Sorting (Local Mode) ==

{{{
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log) AS cnt;
grunt> fltrd = FILTER cntd BY cnt > 50;
grunt> srtd = ORDER fltrd BY cnt;
grunt> STORE srtd INTO 'lab8_out3';
grunt> quit
~$ head lab8_out3
}}}
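
`ORDER fltrd BY cnt` sorts in ascending order by default; appending `DESC` to the statement reverses it. A minimal Python sketch of that ordering, using hypothetical `(user, cnt)` pairs and a hypothetical helper `order_by_count`:

```python
# Hypothetical (user, cnt) pairs standing in for the fltrd relation.
fltrd = [("userA", 120), ("userC", 51), ("userD", 77)]

def order_by_count(pairs, descending=False):
    """Mimic ORDER fltrd BY cnt (ascending) or ORDER fltrd BY cnt DESC."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=descending)

srtd = order_by_count(fltrd)
print(srtd)
print(order_by_count(fltrd, descending=True))
```

With the ascending default, `head lab8_out3` would show the least active of the filtered users first.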

== Connect Pig to Hadoop (Fully Distributed Mode) ==

{{{
~$ hadoop fs -put excite-small.log .
~$ pig
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'lab8_out1';
grunt> quit
~$ hadoop fs -cat lab8_out1/part-*
}}}