Changes between Initial Version and Version 1 of FDC110829/Lab7


Timestamp:
Aug 29, 2011, 11:22:50 PM
Author:
jazz

  • FDC110829/Lab7

[[PageOutline]]

◢ <[wiki:FDC110829/Lab6 Lab 6]> | <[wiki:FDC110829 Back to Course Outline]> ▲ | <---> ◣

= Lab 7 =

{{{
#!html
<div style="text-align: center;"><big style="font-weight: bold;"><big>Pig Latin in Practice</big></big></div>
}}}

 * First, log in to hadoop.nchc.org.tw with your account.

== Aggregation (Local Mode) ==

{{{
~$ wget http://hadoop.nchc.org.tw/excite-small.log
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'lab8_out1';
grunt> quit
~$ head lab8_out1
}}}
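The aggregation above groups the log by its first field and counts the rows in each group. As a rough cross-check, the same per-user counts can be sketched with standard Unix tools. The helper name `count_by_user` is hypothetical, and the sketch assumes the log is tab-separated (the default delimiter when `LOAD` is used without an explicit loader).

```shell
# count_by_user FILE
# Hypothetical helper: print "user<TAB>count" for every distinct value of the
# first tab-separated field, which is what GROUP ... BY user + COUNT produces.
# Assumption: FILE is tab-separated, like excite-small.log.
count_by_user() {
    awk -F '\t' '
        { n[$1]++ }                                   # one counter per user
        END { for (u in n) printf "%s\t%d\n", u, n[u] }
    ' "$1" | sort                                     # awk order is arbitrary, so sort
}

# Example, run next to the downloaded log:
#   count_by_user excite-small.log | head
```

The row order will usually differ from the Pig output, so compare the two after sorting both.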

== Filter (Local Mode) ==

{{{
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log) AS cnt;
grunt> fltrd = FILTER cntd BY cnt > 50;
grunt> STORE fltrd INTO 'lab8_out2';
grunt> quit
~$ head lab8_out2
}}}
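The only new step in the filter example is `FILTER cntd BY cnt > 50`, which keeps the users whose count is strictly greater than 50. A minimal shell sketch of the same idea follows. The helper `users_over` and its argument order are hypothetical, and the input is again assumed to be tab-separated.

```shell
# users_over MIN FILE
# Hypothetical helper: print "user<TAB>count" only for users whose number of
# rows is strictly greater than MIN (mirrors FILTER cntd BY cnt > MIN).
# Assumption: FILE is tab-separated with the user in the first field.
users_over() {
    awk -F '\t' -v min="$1" '
        { n[$1]++ }                                   # count rows per user
        END {
            for (u in n)
                if (n[u] > min)                       # strict comparison, as in the Pig script
                    printf "%s\t%d\n", u, n[u]
        }
    ' "$2" | sort
}

# Example:
#   users_over 50 excite-small.log
```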

== Sorting (Local Mode) ==

{{{
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log) AS cnt;
grunt> fltrd = FILTER cntd BY cnt > 50;
grunt> srtd = ORDER fltrd BY cnt;
grunt> STORE srtd INTO 'lab8_out3';
grunt> quit
~$ head lab8_out3
}}}
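`ORDER fltrd BY cnt` sorts the filtered relation by count in ascending order. For comparison, here is a small shell sketch that sorts `user<TAB>count` lines the same way. The helper name `sort_by_count` is hypothetical. It reads from standard input, so it can be fed any two-column, tab-separated listing.

```shell
# sort_by_count
# Hypothetical helper: read "user<TAB>count" lines on stdin and print them
# ordered by the numeric value of the second field, smallest first.
sort_by_count() {
    # -t sets the field separator to a literal tab; -k2,2n sorts on field 2 numerically.
    sort -t "$(printf '\t')" -k2,2n
}

# Example:
#   printf 'a\t10\nb\t9\nc\t100\n' | sort_by_count
```

A plain `sort` without `n` would compare the counts as text and place 100 before 9, which is why the numeric flag matters here.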

== Connect Pig to Hadoop (Full Distributed Mode) ==

{{{
~$ hadoop fs -put excite-small.log .
~$ pig
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'lab8_out1';
grunt> quit
~$ hadoop fs -cat lab8_out1/part-00000
}}}
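The Pig statements in this section are identical to the local-mode aggregation. What changes is that the input is first uploaded with `hadoop fs -put` and that `pig` is started without `-x local`. One way to make that explicit is to keep the statements in a script file and choose the mode on the command line. The sketch below only writes such a file. The helper `write_lab7_script`, the file name `lab7.pig`, and the output directory `lab7_script_out` are hypothetical names chosen for illustration.

```shell
# write_lab7_script PATH
# Hypothetical helper: save the aggregation statements as a Pig script at PATH.
# The output directory name 'lab7_script_out' is an assumption, picked so it
# does not collide with the lab8_out* directories used above.
write_lab7_script() {
    cat > "$1" <<'EOF'
log  = LOAD 'excite-small.log' AS (user, timestamp, query);
grpd = GROUP log BY user;
cntd = FOREACH grpd GENERATE group, COUNT(log);
STORE cntd INTO 'lab7_script_out';
EOF
}

# Usage (not executed here; needs Pig, and a Hadoop cluster for the second form):
#   write_lab7_script lab7.pig
#   pig -x local lab7.pig    # local mode: paths refer to the local file system
#   pig lab7.pig             # default mode: paths refer to HDFS
```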