Version 2 (modified by jazz, 13 years ago)

Lab 8

Pig Latin in Practice

Aggregation

~$ wget http://hadoop.nchc.org.tw/excite-small.log
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'lab8_out1';
grunt> quit
~$ head lab8_out1/part-*
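
As a sanity check on the aggregation above, the same per-user count can be sketched with standard Unix tools. This is only an illustration, not part of the original lab: `count_by_user` is a hypothetical helper name, and it assumes the log is tab-separated with the user ID in the first field, which is how the `LOAD` statement reads it by default.

```shell
# Hypothetical helper (not part of the lab): count lines per user.
# Assumes a tab-separated file whose first field is the user ID.
# Prints "user<TAB>count", sorted by user ID in byte order.
count_by_user() {
  awk -F'\t' '{ n[$1]++ } END { for (u in n) printf "%s\t%d\n", u, n[u] }' "$1" |
    LC_ALL=C sort
}
```

Running `count_by_user excite-small.log | head` should list the same user/count pairs that Pig stored, although Pig does not promise any particular row order, so sort both sides before comparing them.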

Filter

~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log) AS cnt;
grunt> fltrd = FILTER cntd BY cnt > 50;
grunt> STORE fltrd INTO 'lab8_out2';
grunt> quit
~$ head lab8_out2/part-*
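
The filter step keeps only users whose count exceeds a threshold. A rough shell equivalent, under the same assumption of a tab-separated log with the user ID in the first field, might look like the sketch below. `users_over` is a hypothetical name introduced here for illustration.

```shell
# Hypothetical helper (not part of the lab): users with more than MIN lines.
# Usage: users_over FILE MIN
# Prints "user<TAB>count" for every user whose count is strictly greater
# than MIN, mirroring FILTER ... BY cnt > MIN.
users_over() {
  awk -F'\t' -v min="$2" '
    { n[$1]++ }
    END { for (u in n) if (n[u] > min + 0) printf "%s\t%d\n", u, n[u] }
  ' "$1" | LC_ALL=C sort
}
```

For the threshold used in this lab, the call would be `users_over excite-small.log 50`.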

Sorting

~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log) AS cnt;
grunt> fltrd = FILTER cntd BY cnt > 50;
grunt> srtd = ORDER fltrd BY cnt;
grunt> STORE srtd INTO 'lab8_out3';
grunt> quit
~$ head lab8_out3/part-*
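
`ORDER fltrd BY cnt` sorts in ascending order of the count. To check the result by hand, the filtered user/count pairs can be piped through a numeric sort on the second column. The helper name `sort_by_count` is hypothetical, and breaking ties by user ID is a choice made here so the output is deterministic; Pig leaves the order of equal counts unspecified.

```shell
# Hypothetical helper (not part of the lab): sort "user<TAB>count" lines
# read from standard input by count, ascending.
# Ties are broken by user ID so that the output is deterministic.
sort_by_count() {
  LC_ALL=C sort -t "$(printf '\t')" -k2,2n -k1,1
}
```

Feeding it the stored output of the filter step (the two-column user/count rows) should reproduce the ordering of the sorted result, apart from rows with equal counts.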

Connecting to Hadoop

~$ hadoop fs -put excite-small.log .
~$ pig
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'lab8_out1';
grunt> quit
~$ hadoop fs -cat lab8_out1/part-* | head
