| 15 | === 5/8 === |
| 16 | 1. For security reasons, limit nutch access by client IP. Edit Tomcat's conf/server.xml and add: |
| 17 | {{{ |
| 18 | |
| 19 | <Context path="/path/to/secret_files" ...> |
| 20 | <Valve className="org.apache.catalina.valves.RemoteAddrValve" |
| 21 | allow="127.0.0.1" deny=""/> |
| 22 | </Context> |
| 23 | }}} |
| 24 | |
| 25 | 2. Tomcat tuning guides: |
| 26 | [http://www.oreilly.com.tw/column_editor.php?id=e137 Chinese], [http://www.onjava.com/lpt/a/3909 English] |
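
The RemoteAddrValve entry in item 1 can be reasoned about with a small model. This is a rough Python sketch, not Tomcat code: it assumes that `allow` and `deny` are each treated as a single regular expression matched against the whole remote address, and that an empty value means "not set". The function name `is_allowed` is hypothetical.

```python
import re

def is_allowed(remote_addr, allow="", deny=""):
    """Rough model of an allow/deny address filter (assumption, not Tomcat source)."""
    allow_re = re.compile(allow) if allow else None
    deny_re = re.compile(deny) if deny else None

    # A matching deny pattern always rejects the request.
    if deny_re is not None and deny_re.fullmatch(remote_addr):
        return False
    # A matching allow pattern accepts it.
    if allow_re is not None and allow_re.fullmatch(remote_addr):
        return True
    # With only a deny pattern configured, anything not denied passes.
    if deny_re is not None and allow_re is None:
        return True
    return False

print(is_allowed("127.0.0.1", allow="127.0.0.1"))  # True
print(is_allowed("10.0.0.8", allow="127.0.0.1"))   # False
```

Under these assumptions, `allow="127.0.0.1" deny=""` admits only requests from the local machine. Because the value is a regular expression, the unescaped dots match any character; `127\.0\.0\.1` is the stricter form.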
| 27 | |
| 28 | === 5/7 === |
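| 29 | 1. nutch now runs successfully on the Management Regulations section (管理規範專區) and parses PDF and Word content. The fix was to add the following to nutch-site.xml: |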
| 30 | |
| 31 | {{{ |
| 32 | <property> |
| 33 | <name>plugin.includes</name> |
| 34 | <value>protocol-http|urlfilter-regex|parse-(text|html|js|pdf|msword|rss|rtf|oo|msexcel|mspowerpoint)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value> |
| 35 | <description>... |
| 36 | </description> |
| 37 | </property> |
| 38 | }}} |
| 39 | |
| 40 | The names inside parse-(text|html|js|pdf|msword|rss|rtf|oo|msexcel|mspowerpoint) must match the XXX part of the parse-XXX plugin directories under plugins. |
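
To check which plugins a `plugin.includes` value would enable, the pattern can be tested against the plugin names. This is a minimal Python sketch, assuming nutch matches the whole plugin id against the regular expression. The helper `enabled_plugins` and the shortened patterns are hypothetical, for illustration only.

```python
import re

def enabled_plugins(includes, plugin_ids):
    """Return the plugin ids that the includes pattern matches in full."""
    pattern = re.compile(includes)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]

# Example plugin directory names (illustrative subset).
PLUGIN_IDS = ["protocol-http", "parse-pdf", "parse-msword",
              "parse-mspowerpoint", "parse-swf", "index-basic"]

# Repeating the "parse-" prefix inside the group would require an id of
# "parse-parse-mspowerpoint", so the PowerPoint parser is not selected.
nested = r"protocol-http|parse-(pdf|msword|parse-mspowerpoint)|index-basic"
# Listing only the suffix inside the group selects it as intended.
flat = r"protocol-http|parse-(pdf|msword|mspowerpoint)|index-basic"

print(enabled_plugins(nested, PLUGIN_IDS))
print(enabled_plugins(flat, PLUGIN_IDS))
```

Only the second pattern selects `parse-mspowerpoint`. `parse-swf` is skipped by both because it is not listed.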
| 41 | |
| 42 | === 5/5 === |
| 43 | 1. nutch runs successfully on the Management Regulations section (管理規範專區), but the indexed content does not include pdf, word, ... |
| 44 | |