Changes between Version 11 and Version 12 of LogParser
Timestamp: Jul 4, 2008, 6:05:33 PM (17 years ago)
LogParser
v11  v12
1    1    = Purpose =
2    2    This program will parse your Apache log and store it into HBase.
3
4    3
5    4    = How to use =
6         * 1. Upload Apache logs ( /var/log/apache2/access.log* ) to HDFS (default: /user/waue/apache-log) \
7
     5    1 Upload Apache logs ( /var/log/apache2/access.log* ) to HDFS (default: /user/waue/apache-log)
8    6    {{{
9    7    $ bin/hadoop dfs -put /var/log/apache2/ apache-log
10   8    }}}
11        * 2. The parameter "dir" in main contains the logs.
12
13        * 3. You should filter the exception contents manually,
     9    2 The parameter "dir" in main contains the logs.
     10   3 You should filter the exception contents manually,
14   11   {{{
15   12   ex: ::1 - - [29/Jun/2008:07:35:15 +0800] "GET / HTTP/1.0" 200 729 "...
…    …
20   17   {{{
21   18   hql > select * from apache-log;
22
23   19   }}}
24   20   2 Results
…    …
26   22   {{{
27   23   +-------------------------+-------------------------+-------------------------+
28
29   24   | Row | Column | Cell |
30
31   25   +-------------------------+-------------------------+-------------------------+
32
33   26   | 118.170.101.250 | http:agent | Mozilla/4.0 (compatible;|
34
35   27   | | | MSIE 4.01; Windows 95) |
36   28   ..........(skip)........
37   29   +-------------------------+-------------------------+-------------------------+
38
39        | 118.170.101.250 | http:bytesize | 318 |
40
     30   | 87.65.93.58 | http:method | OPTIONS |
41   31   +-------------------------+-------------------------+-------------------------+
42
43        ..........(skip)........
44
45        +-------------------------+-------------------------+-------------------------+
46
47        | 87.65.93.58 | http:method | OPTIONS |
48
49        +-------------------------+-------------------------+-------------------------+
50
51   32   | 87.65.93.58 | http:protocol | HTTP/1.1 |
52
53        +-------------------------+-------------------------+-------------------------+
54
55        | 87.65.93.58 | referrer:- | * |
56
57        +-------------------------+-------------------------+-------------------------+
58
59        | 87.65.93.58 | url:* | - |
60
61        +-------------------------+-------------------------+-------------------------+
62
63   33   31 row(s) in set. (0.58 sec)
64
65
66   34   }}}
67   35
…    …
70   38
71   39   {{{
     40   private String ip;
     41   private String protocol;
     42   private String method;
     43   private String url;
     44   private String code;
     45   private String byteSize;
     46   private String referrer;
     47   private String agent;
     48   private long timestamp;
72   49   private static Pattern p = Pattern
73   50   .compile("([^ ]*) ([^ ]*) ([^ ]*) \\[([^]]*)\\] \"([^\"]*)\"" +
…    …
83   60   then this regular expression can be read as[[br]]
84   61
85        || 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 ||
86   62   || ([^ ]*) || ([^ ]*) || ([^ ]*) || \\[([^]]*)\\] || \"([^\"]*)\" || ([^ ]*) || ([^ ]*) || \"([^\"]*)\" || \"([^\"]*)\".* ||
87   63   || ip || - || - || time || "http " || status code || length || "referrer" || "agent" ||
…    …
113  89   Next, define the constructor. It declares a [http://java.sun.com/javase/6/docs/api/java/util/regex/Matcher.html java.util.regex.Matcher], an object that works together with the Pattern declared earlier.[[br]] The template p declared above has a method matcher(String), which presses the material (the String) into the shape of the template and calls the pressed-out object matcher. To read the n-th segment of matcher afterwards, call matcher.group(n), which returns the content of that segment as a String.[[br]]
114  90   Compare this against the content that is passed in:
115
116
117       [[br]]
     91   || 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 ||
     92   || ip || - || - || time || "http " || status code || length || "referrer" || "agent" ||
     93   || 140.110.138.176 || - || - || [02/Jul/2008:16:55:02 +0800] || "GET /hbase-0.1.3.zip HTTP/1.0" || 200 || 10249801 || " -" || "Wget/1.10.2" ||
     94   The rest is straightforward: after reading each value with matcher.group(n), assign it to the matching field through this.<field>. The code also compiles without this; using it for the class's own fields inside a constructor is only a habit (perhaps to distinguish them from fields inherited from a parent class?). The time needs a small conversion with SimpleDateFormat, and the http content needs split() to break it down further.
118  95
119  96   {{{
…    …
141  118  }
142  119  }}}
     120  This function only checks whether the IP is in a valid format.
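
To make the constructor walkthrough in this diff concrete (match against the pattern, read each piece with matcher.group(n), convert the time with SimpleDateFormat, break the request apart with split()), here is a minimal self-contained Java sketch. It is not the page's actual LogParser source, most of which is elided in the diff. The class name AccessLogEntry, the factory method parse, the timestamp format string, and the use of IllegalArgumentException for bad input are all assumptions made for this example. The field names follow the added lines 40 to 48, and the pattern is assembled from the nine pieces listed in the table, with the bracket inside the character class escaped for clarity.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; AccessLogEntry and parse() are hypothetical names.
class AccessLogEntry {
    // Nine groups: ip, "-", "-", time, request, status code, byte size, referrer, agent.
    private static final Pattern LINE = Pattern.compile(
            "([^ ]*) ([^ ]*) ([^ ]*) \\[([^\\]]*)\\] \"([^\"]*)\""
            + " ([^ ]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\".*");

    final String ip, method, url, protocol, code, byteSize, referrer, agent;
    final long timestamp; // milliseconds since the Unix epoch

    private AccessLogEntry(Matcher m, String[] request, long timestamp) {
        this.ip = m.group(1);
        this.timestamp = timestamp;
        this.method = request[0];
        this.url = request[1];
        this.protocol = request[2];
        this.code = m.group(6);
        this.byteSize = m.group(7);
        this.referrer = m.group(8);
        this.agent = m.group(9);
    }

    static AccessLogEntry parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized log line: " + line);
        }
        // Group 5 is the request line, e.g. "GET /index.html HTTP/1.0".
        String[] request = m.group(5).split(" ");
        if (request.length != 3) {
            throw new IllegalArgumentException("Unexpected request: " + m.group(5));
        }
        long millis;
        try {
            // Assumed format for times like 02/Jul/2008:16:55:02 +0800.
            // Locale.US keeps the month name parseable on any machine;
            // SimpleDateFormat is not thread-safe, so one is created per call.
            millis = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.US)
                    .parse(m.group(4)).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad timestamp: " + m.group(4), e);
        }
        return new AccessLogEntry(m, request, millis);
    }

    public static void main(String[] args) {
        AccessLogEntry e = parse("140.110.138.176 - - [02/Jul/2008:16:55:02 +0800] "
                + "\"GET /hbase-0.1.3.zip HTTP/1.0\" 200 10249801 \"-\" \"Wget/1.10.2\"");
        System.out.println(e.ip + " " + e.method + " " + e.url + " " + e.protocol
                + " " + e.code + " " + e.byteSize + " " + e.agent + " " + e.timestamp);
    }
}
```

Rejecting lines that do not match, instead of storing partial data, is one way to handle the "exception contents" that the usage notes say must otherwise be filtered by hand.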