Changes between Version 1 and Version 2 of NCTU110329/Lab8


Timestamp: Apr 26, 2011, 1:25:33 PM
Author: jazz

<div style="text-align: center;"><big style="font-weight: bold;"><big>Pig Latin in Practice</big></big></div>
}}}

== Aggregation ==

{{{
~$ wget http://hadoop.nchc.org.tw/excite-small.log
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'lab8_out1';
grunt> quit
~$ head lab8_out1/part-*
}}}
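
As a cross-check of what `GROUP log BY user` followed by `COUNT(log)` computes, here is a rough Python equivalent. It is only a sketch of the semantics, not how Pig runs the job, and it assumes each log line holds tab-separated `user`, `timestamp`, `query` fields (tab is the delimiter of `PigStorage`, which `LOAD` uses when no `USING` clause is given). The function name `count_per_user` and the sample records are hypothetical.

{{{#!python
from collections import Counter


def count_per_user(lines):
    """Count records per user, like GROUP log BY user + COUNT(log).

    Assumes each line is tab-separated: user, timestamp, query.
    """
    counts = Counter()
    for line in lines:
        user = line.rstrip("\n").split("\t")[0]  # first field is the user
        counts[user] += 1
    return dict(counts)


# Hypothetical records, not taken from excite-small.log.
sample = [
    "u1\tts1\tpig latin\n",
    "u2\tts2\thadoop\n",
    "u1\tts3\tgrunt\n",
]
print(count_per_user(sample))  # {'u1': 2, 'u2': 1}
}}}

Pig writes each output tuple as one tab-separated line (for example `u1`, a tab, then `2`), so `head` should show one user and its count per line. Unlike this dict, the order of the groups in Pig's output is not guaranteed.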

== Filter ==

{{{
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log) AS cnt;
grunt> fltrd = FILTER cntd BY cnt > 50;
grunt> STORE fltrd INTO 'lab8_out2';
grunt> quit
~$ head lab8_out2/part-*
}}}
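
The `FILTER` step keeps only the groups whose count is strictly greater than 50. A minimal Python sketch of the same predicate, applied to a hypothetical dict that maps each user to a count (the function name `filter_counts` is illustrative):

{{{#!python
def filter_counts(counts, threshold=50):
    """Keep users whose count is strictly greater than threshold,
    like FILTER cntd BY cnt > 50."""
    return {user: cnt for user, cnt in counts.items() if cnt > threshold}


# Hypothetical per-user counts.
counts = {"u1": 51, "u2": 50, "u3": 7}
print(filter_counts(counts))  # {'u1': 51}
}}}

Because the comparison is `>`, a user with exactly 50 records is dropped; write `FILTER cntd BY cnt >= 50` in Pig if such users should be kept.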

== Sorting ==

{{{
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log) AS cnt;
grunt> fltrd = FILTER cntd BY cnt > 50;
grunt> srtd = ORDER fltrd BY cnt;
grunt> STORE srtd INTO 'lab8_out3';
grunt> quit
~$ head lab8_out3/part-*
}}}
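
`ORDER fltrd BY cnt` sorts the filtered groups by `cnt` in ascending order, which is Pig's default direction. The Python sketch below mirrors that on hypothetical counts; `order_by_count` and its `descending` flag are illustrative names, not Pig options.

{{{#!python
def order_by_count(counts, descending=False):
    """Sort (user, count) pairs by count, like ORDER fltrd BY cnt [DESC]."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=descending)


# Hypothetical filtered counts.
fltrd = {"u1": 120, "u2": 51, "u3": 75}
print(order_by_count(fltrd))  # [('u2', 51), ('u3', 75), ('u1', 120)]
print(order_by_count(fltrd, descending=True))  # [('u1', 120), ('u3', 75), ('u2', 51)]
}}}

To list the heaviest users first in Pig itself, write `ORDER fltrd BY cnt DESC`. Python's `sorted` is stable, so ties keep their input order; Pig's `ORDER` gives no such guarantee for rows with equal `cnt`.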

== Connecting to Hadoop ==

{{{
~$ hadoop fs -put excite-small.log .
~$ pig
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'lab8_out1';
grunt> quit
~$ hadoop fs -cat 'lab8_out1/part-*' | head
}}}
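
In MapReduce mode the result lives in HDFS as a directory of `part-*` files, one per task, rather than as a single local file. Once that directory has been copied to the local disk (for example with `hadoop fs -get lab8_out1 lab8_out1_hdfs`, where the local name is arbitrary), a Python sketch like the one below can read every part file back into `(user, count)` pairs. The function name `read_pig_output` is hypothetical, and it assumes PigStorage's default tab delimiter with exactly two fields per line, which matches what the `STORE cntd` step writes.

{{{#!python
import glob
import os
import tempfile


def read_pig_output(out_dir):
    """Read (user, count) pairs from every part-* file in a Pig output directory.

    Assumes tab-separated lines with exactly two fields, as PigStorage writes them.
    Marker files such as _SUCCESS are skipped because they do not match part-*.
    """
    rows = []
    for path in sorted(glob.glob(os.path.join(out_dir, "part-*"))):
        with open(path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # ignore blank lines
                user, cnt = line.rstrip("\n").split("\t")
                rows.append((user, int(cnt)))
    return rows


if __name__ == "__main__":
    # Self-contained demo: fake an output directory with two part files.
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "part-r-00000"), "w", encoding="utf-8") as f:
            f.write("u1\t2\nu2\t1\n")
        with open(os.path.join(d, "part-r-00001"), "w", encoding="utf-8") as f:
            f.write("u3\t5\n")
        open(os.path.join(d, "_SUCCESS"), "w").close()
        print(read_pig_output(d))  # [('u1', 2), ('u2', 1), ('u3', 5)]
}}}

Sorting the file names keeps the rows in part-file order, which is handy after an `ORDER` step, since each successive part file then holds the next range of keys.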