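| Description | Path | Owner |
| nutchez home directory | /opt/nutchez/ | nutchuser |
| nutch home directory | /opt/nutchez/nutch | nutchuser |
| nutch working directory | /var/nutchez/nutch-nutchuser | nutchuser |
| nutch log files | /var/nutchez/logs | nutchuser |
| nutch configuration files | /opt/nutchez/nutch/conf | nutchuser |
| tomcat home directory | /opt/nutchez/tomcat | nutchuser |
| nutchez user directory | /home/nutchuser/nutchez/ | nutchuser |
| nutchez index database | /home/nutchuser/nutchez/search/ | generated by nutch after a crawl completes |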
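The layout above can be sanity-checked with a short shell sketch. It is only an illustration: the function name `check_layout` and its optional root-prefix argument are hypothetical (the prefix exists so the check can run against a staging tree), and the index directory is left out because it only appears after a crawl.

```shell
# Hypothetical helper: report every expected nutchez directory that is missing.
# $1 is an optional root prefix (an assumption for testing); empty means "/".
check_layout() {
    root="${1:-}"
    for p in \
        /opt/nutchez \
        /opt/nutchez/nutch \
        /var/nutchez/nutch-nutchuser \
        /var/nutchez/logs \
        /opt/nutchez/nutch/conf \
        /opt/nutchez/tomcat \
        /home/nutchuser/nutchez
    do
        # -d is true only when the path exists and is a directory
        [ -d "$root$p" ] || echo "missing: $p"
    done
}
```

Calling `check_layout` with no argument checks the live system; no output means every directory is present.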
- Edit hadoop-site.xml in /opt/nutchez/nutch/conf/
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://secuse.nchc.org.tw:9000</value>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <value>secuse.nchc.org.tw:9001</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/nutchez/nutch-nutchuser</value>
      </property>
    </configuration>
- Change the Tomcat port => server.xml in /opt/nutchez/tomcat/conf/
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443"
               URIEncoding="UTF-8"
               useBodyEncodingForURI="true" />
- Location of the final search results => nutch-site.xml in /opt/nutchez/tomcat/webapps/ROOT/WEB-INF/classes/
    <configuration>
      <property>
        <name>searcher.dir</name>
        <value>/home/nutchuser/nutchez/search</value>
      </property>
    </configuration>
- The /opt/nutchez/nutch/bin/nutch launcher script has been modified
    NUTCH_HOME=/opt/nutchez/nutch
    NUTCH_CONF_DIR=/opt/nutchez/nutch/conf
    NUTCH_LOG_DIR=/var/nutchez/logs
- Even with the modified hadoop that ships with nutchez, you still have to run format and start-all.sh
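The format and start-up step can be sketched as below. This is a guess at the sequence, not the documented nutchez procedure: it assumes the `hadoop` launcher and `start-all.sh` sit in /opt/nutchez/nutch/bin (the page does not say where they live), that "format" means `hadoop namenode -format`, and that the commands are run as nutchuser. `NUTCH_BIN` and `init_hadoop` are hypothetical names.

```shell
# Assumption: hadoop and start-all.sh live in the nutch bin directory.
NUTCH_BIN="${NUTCH_BIN:-/opt/nutchez/nutch/bin}"

# Hypothetical helper: format HDFS, then start the Hadoop daemons.
init_hadoop() {
    # Formatting wipes existing HDFS metadata, so do it on first setup only.
    "$NUTCH_BIN/hadoop" namenode -format || return 1
    # Start the HDFS and MapReduce daemons.
    "$NUTCH_BIN/start-all.sh"
}
```

Run `init_hadoop` once after hadoop-site.xml has been edited; on later restarts only `start-all.sh` should be needed.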
Last modified on May 21, 2010, 6:25:44 PM