- Timestamp: Sep 1, 2011, 1:56:24 PM
- Author: waue
- Comment: --
v7 | v8 |
9 | 9 | [[PageOutline]] |
10 | 10 | |
| 11 | = Change Log = |
| 12 | * <update> 20110901 Rebuilt IKAnalyzer3.2.8_waue.jar |
11 | 13 | = Build = |
12 | 14 | |
… |
… |
|
72 | 74 | |
73 | 75 | * Download IKAnalyzer3.2.8.jar (2011/07/29) and extract it |
| 76 | * You can build from the official source here, but the IKAnalyzer that nutch loads when indexing pages must be a patched one; you can use my fixed build directly: [http://trac.nchc.org.tw/cloud/attachment/wiki/waue/2011/0801/IKAnalyzer3.2.8_waue.jar IKAnalyzer3.2.8_waue.jar] |
74 | 77 | [http://code.google.com/p/ik-analyzer/downloads/list] |
75 | 78 | |
… |
… |
|
137 | 140 | |
138 | 141 | |
139 | | |
140 | | |
141 | | |
142 | 142 | * Recompile nutch and generate nutch-job-1.2.job |
143 | 143 | |
… |
… |
|
155 | 155 | = Deployment = |
156 | 156 | |
157 | | Place IKAnalyzer3.2.8.jar ; nutch-1.2.jar ; nutch-1.2.job in the following directories |
| 157 | Place [http://trac.nchc.org.tw/cloud/attachment/wiki/waue/2011/0801/IKAnalyzer3.2.8_waue.jar IKAnalyzer3.2.8-waue.jar(fixed)] ; nutch-1.2.jar ; nutch-1.2.job in the following directories |
158 | 158 | |
159 | 159 | || Directory || Files to place || |
160 | | || /opt/crawlzilla/nutch/lib/ || IKAnalyzer3.2.8.jar || |
| 160 | || /opt/crawlzilla/nutch/lib/ || IKAnalyzer3.2.8-waue.jar || |
161 | 161 | || /opt/crawlzilla/nutch || nutch-1.2.jar [[BR]] nutch-1.2.job || |
162 | | || /opt/crawlzilla/tomcat/webapps/default/WEB-INF/lib/ || IKAnalyzer3.2.8.jar [[BR]] nutch-1.2.jar || |
| 162 | || /opt/crawlzilla/tomcat/webapps/default/WEB-INF/lib/ || IKAnalyzer3.2.8-waue.jar [[BR]] nutch-1.2.jar || |
163 | 163 | |
164 | 164 | * Finally, fetch pages with nutch's crawl command; the search results are then matched on the Chinese words segmented by IK |
| 165 | |
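| 166 | * Without the patched IK segmentation library, nutch still crawls without problems and builds a correct segmented index, but the page returned when searching the index is completely blank; see the Debug section |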
| 167 | |
165 | 168 | |
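The copy step described by the deployment table above can be scripted. The following is a minimal Python sketch, not part of the original page: the `deploy` function, the `build_dir` folder, and the `root` parameter are hypothetical names introduced for illustration, while the target directories and file names are taken from the v8 table. Note that the attachment is downloaded as IKAnalyzer3.2.8_waue.jar (underscore), whereas the table lists IKAnalyzer3.2.8-waue.jar (hyphen), so the file may need renaming, or the mapping adjusting, before this runs.

```python
import shutil
from pathlib import Path

# Target directory -> files to place there, as listed in the deployment table (v8).
DEPLOY_MAP = {
    "/opt/crawlzilla/nutch/lib/": ["IKAnalyzer3.2.8-waue.jar"],
    "/opt/crawlzilla/nutch": ["nutch-1.2.jar", "nutch-1.2.job"],
    "/opt/crawlzilla/tomcat/webapps/default/WEB-INF/lib/": [
        "IKAnalyzer3.2.8-waue.jar",
        "nutch-1.2.jar",
    ],
}


def deploy(build_dir, root="/"):
    """Copy each built file into its target directory and return the copied paths.

    build_dir: hypothetical folder holding the three built files (assumption).
    root: prefix prepended to every target, so the copy can be rehearsed
          in a scratch directory before touching /opt/crawlzilla.
    """
    build_dir = Path(build_dir)
    copied = []
    for target, names in DEPLOY_MAP.items():
        dest_dir = Path(root) / target.lstrip("/")
        dest_dir.mkdir(parents=True, exist_ok=True)
        for name in names:
            src = build_dir / name
            if not src.is_file():
                raise FileNotFoundError(f"missing build artifact: {src}")
            # copy2 preserves timestamps, which helps when comparing jar versions later.
            copied.append(Path(shutil.copy2(src, dest_dir / name)))
    return copied
```

Calling `deploy("build", root="/tmp/rehearsal")` first gives a dry run in a scratch tree; calling `deploy("build")` writes to the real locations and will typically need write permission on /opt/crawlzilla.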
166 | 169 | = Modifications =