Changes between Initial Version and Version 1 of THU120907/Lab8

Timestamp: Sep 6, 2012, 11:47:09 PM
Author: jazz

= Lab 8 =
[[PageOutline]]
{{{
#!html
<div style="text-align: center;"><big style="font-weight: bold;"><big>Pig Latin in Practice</big></big></div>
}}}

{{{
#!text
For the following exercises, connect to hadoop.nchc.org.tw and work there. Below, hXXXX stands for your user name.
}}}

== Aggregation (Local Mode) ==

{{{
~$ wget http://hadoop.nchc.org.tw/excite-small.log
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'lab8_out1';
grunt> quit
~$ head lab8_out1/part-*    # STORE creates a directory of part files, not a single file
}}}
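
The session above can be modeled in a few lines of Python to show what each statement computes. This is a minimal sketch, not how Pig executes the job: it assumes tab-separated input with three fields (the layout Pig's default loader, PigStorage, splits on), the three sample lines are made up rather than taken from excite-small.log, and `load_log` and `count_per_user` are hypothetical names.

{{{
#!python
from collections import defaultdict

# Hypothetical sample lines (user, timestamp, query), tab-separated like the
# input Pig's default loader expects. These are NOT real excite-small.log rows.
SAMPLE_LINES = [
    "userA\t970916105432\tyahoo chat",
    "userB\t970916001949\tweather",
    "userA\t970916001954\tyahoo mail",
]


def load_log(lines):
    """Mimic LOAD ... AS (user, timestamp, query): one tuple per line."""
    return [tuple(line.rstrip("\n").split("\t")) for line in lines]


def count_per_user(records):
    """Mimic GROUP log BY user, then FOREACH grpd GENERATE group, COUNT(log)."""
    bags = defaultdict(list)  # group key -> bag of records, like Pig's GROUP
    for record in records:
        bags[record[0]].append(record)
    # Sorted by key only to make this sketch's output deterministic.
    return [(user, len(bag)) for user, bag in sorted(bags.items())]


if __name__ == "__main__":
    for user, cnt in count_per_user(load_log(SAMPLE_LINES)):
        print(f"{user}\t{cnt}")  # tab-separated, like STORE's default output
}}}

Running it prints `userA` with a count of 2 and `userB` with a count of 1, one tab-separated pair per line, which is the same shape as the lines in `lab8_out1/part-*`.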

== Filter (Local Mode) ==

{{{
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log) AS cnt;
grunt> fltrd = FILTER cntd BY cnt > 50;
grunt> STORE fltrd INTO 'lab8_out2';
grunt> quit
~$ head lab8_out2/part-*
}}}
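
FILTER keeps only the rows whose condition is true, so `cnt > 50` drops every user with 50 or fewer queries. A minimal Python sketch of the same logic, using made-up records and hypothetical function names (`count_per_user`, `filter_by_count`); the threshold of 50 is the only value taken from the lab.

{{{
#!python
from collections import Counter


def count_per_user(records):
    """GROUP BY user + COUNT, expressed with a Counter keyed by user."""
    return Counter(user for user, _timestamp, _query in records)


def filter_by_count(counts, threshold=50):
    """Mimic FILTER cntd BY cnt > 50: keep users strictly above threshold."""
    return {user: cnt for user, cnt in counts.items() if cnt > threshold}


if __name__ == "__main__":
    # Hypothetical data: one heavy user with 60 queries, one light user with 3.
    records = [("heavy", str(i), "q") for i in range(60)]
    records += [("light", str(i), "q") for i in range(3)]
    print(filter_by_count(count_per_user(records)))  # prints {'heavy': 60}
}}}

Because the comparison is strict, a user with exactly 50 queries is filtered out; `cnt >= 50` would keep it.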

== Sorting (Local Mode) ==

{{{
~$ pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log) AS cnt;
grunt> fltrd = FILTER cntd BY cnt > 50;
grunt> srtd = ORDER fltrd BY cnt;
grunt> STORE srtd INTO 'lab8_out3';
grunt> quit
~$ head lab8_out3/part-*
}}}
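
ORDER ... BY sorts in ascending order unless DESC is given. A minimal Python sketch of the sort step, assuming the filtered (user, count) pairs are already in memory; the sample pairs and the `order_by_count` name are hypothetical.

{{{
#!python
def order_by_count(rows, descending=False):
    """Mimic ORDER fltrd BY cnt: sort (user, count) pairs by count.

    Ascending by default, like Pig's ORDER; descending=True mirrors BY cnt DESC.
    """
    return sorted(rows, key=lambda row: row[1], reverse=descending)


if __name__ == "__main__":
    # Hypothetical output of the FILTER step (every count is above 50).
    filtered = [("userA", 75), ("userB", 52), ("userC", 120)]
    for user, cnt in order_by_count(filtered):
        print(f"{user}\t{cnt}")
}}}

In Pig, `srtd = ORDER fltrd BY cnt DESC;` would list the heaviest users first. One difference to keep in mind: Python's `sorted` is stable, whereas Pig does not promise any particular order among users that share the same count.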

== Connect Pig to Hadoop (Fully Distributed Mode) ==

{{{
~$ hadoop fs -put excite-small.log .    # copy the log into your HDFS home directory
~$ pig                                  # no -x flag: MapReduce mode on the cluster
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'lab8_out1';
grunt> quit
~$ hadoop fs -cat 'lab8_out1/part-*'    # quoted so HDFS, not the local shell, expands the glob
}}}
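
Because STORE writes a directory of part files rather than a single file, in local mode and on HDFS alike, a script that post-processes the results has to read every `part-*` file in that directory. A minimal Python sketch, assuming the HDFS output has first been copied to local disk (for example with `hadoop fs -get lab8_out1 lab8_out1_hdfs`) and that each line holds a tab-separated user and count; `read_pig_output` and the directory name `lab8_out1_hdfs` are hypothetical.

{{{
#!python
import glob
import os


def read_pig_output(out_dir):
    """Collect (user, count) rows from every part-* file under out_dir.

    Other entries Hadoop may leave in the directory (such as a _SUCCESS
    marker) do not match the part-* pattern and are skipped.
    """
    rows = []
    for path in sorted(glob.glob(os.path.join(out_dir, "part-*"))):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.rstrip("\n")
                if not line:
                    continue  # ignore blank lines
                user, count = line.split("\t")
                rows.append((user, int(count)))
    return rows


if __name__ == "__main__":
    # Hypothetical local copy made with: hadoop fs -get lab8_out1 lab8_out1_hdfs
    for user, cnt in read_pig_output("lab8_out1_hdfs")[:10]:
        print(f"{user}\t{cnt}")
}}}

Sorting the file names keeps the rows in part-file order, which is the order `hadoop fs -cat` prints them in; if the directory does not exist, the glob matches nothing and the function returns an empty list.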