=== 5/8 ===
1. For security reasons, restrict which IP addresses may browse nutch: edit Tomcat's conf/server.xml and add
{{{
<Context path="/path/to/secret_files" ...>
  <Valve className="org.apache.catalina.valves.RemoteAddrValve"
         allow="127.0.0.1" deny=""/>
</Context>
}}}
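To make the effect of the `allow` and `deny` attributes easier to reason about, here is a rough Python model of the check. It is only a sketch and not Tomcat's code: it assumes the older attribute form, where each value is a comma-separated list of regular expressions that must match the whole client address, and the function names `split_patterns` and `is_allowed` are made up for illustration.

```python
import re


def split_patterns(value):
    """Split a comma-separated attribute value into compiled regexes."""
    return [re.compile(p.strip()) for p in value.split(",") if p.strip()]


def is_allowed(remote_addr, allow="", deny=""):
    """Return True if the address would pass the modelled allow/deny check."""
    allows = split_patterns(allow)
    denies = split_patterns(deny)
    # A matching deny pattern always rejects the request.
    if any(p.fullmatch(remote_addr) for p in denies):
        return False
    # A matching allow pattern accepts the request.
    if any(p.fullmatch(remote_addr) for p in allows):
        return True
    # With only deny patterns configured, anything not denied passes;
    # otherwise the request is rejected.
    return bool(denies) and not allows


if __name__ == "__main__":
    # Same values as the Valve above: only the local host gets through.
    print(is_allowed("127.0.0.1", allow="127.0.0.1", deny=""))
    print(is_allowed("192.168.1.5", allow="127.0.0.1", deny=""))
```

Because the values are regular expressions, an unescaped `.` matches any character; writing `127\.0\.0\.1` is the stricter form.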

2. Tomcat tuning methods:
[http://www.oreilly.com.tw/column_editor.php?id=e137 Chinese], [http://www.onjava.com/lpt/a/3909 English]

=== 5/7 ===
1. nutch now runs successfully against the Management Regulations section (管理規範專區) and parses the contents of pdf and word files. The fix is to add the following to nutch-site.xml

{{{
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|js|pdf|msword|rss|rtf|oo|msexcel|mspowerpoint)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>...
  </description>
</property>
}}}

Each name inside parse-(text|html|js|pdf|msword|rss|rtf|oo|msexcel|mspowerpoint) must match the name of a parse-XXX plugin in plugins; the parse- prefix is written once, outside the parentheses.
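A quick way to check which plugin names a `plugin.includes` value would select is to test the regular expression against the plugin directory names. The sketch below assumes that nutch requires the whole plugin name to match the expression; the helper `enabled_plugins` and the sample directory list are hypothetical. Note that a name written with its prefix inside the group, such as `parse-(pdf|parse-mspowerpoint)`, would only match a plugin called `parse-parse-mspowerpoint`.

```python
import re


def enabled_plugins(includes, plugin_names):
    """Return the plugin names that the includes regex matches in full."""
    pattern = re.compile(includes)
    return [name for name in plugin_names if pattern.fullmatch(name)]


if __name__ == "__main__":
    # Hypothetical listing of the plugins directory.
    names = ["parse-pdf", "parse-msword", "parse-mspowerpoint",
             "parse-swf", "index-basic"]

    # Prefix written once, outside the parentheses.
    print(enabled_plugins("parse-(pdf|msword|mspowerpoint)|index-basic", names))

    # Prefix repeated inside the group: parse-mspowerpoint is not selected.
    print(enabled_plugins("parse-(pdf|parse-mspowerpoint)", names))
```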

=== 5/5 ===
1. nutch runs successfully against the Management Regulations section (管理規範專區), but the content does not include pdf, word, ...
