wiki:waue/2010/0521

Version 10 (modified by waue, 14 years ago)

Description                 Path                             Owner
nutchez home directory      /opt/nutchez/                    nutchuser
nutch home directory        /opt/nutchez/nutch               nutchuser
nutch working directory     /var/nutchez/nutch-nutchuser     nutchuser
nutch log files             /var/nutchez/logs                nutchuser
nutch configuration files   /opt/nutchez/nutch/conf          nutchuser
tomcat home directory       /opt/nutchez/tomcat              nutchuser
nutchez user directory      /home/nutchuser/nutchez/         nutchuser
nutchez index database      /home/nutchuser/nutchez/search/  (created by nutch after a crawl completes)
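
The layout above can be sanity-checked with a small script. This is a minimal sketch, not part of nutchez: `check_owner` and `check_layout` are hypothetical helper names, and the owner is read from the third column of `ls -ld`, which assumes the usual POSIX long-listing format.

```shell
#!/bin/sh
# Sketch: confirm that each path exists and belongs to the expected user.
# check_owner and check_layout are hypothetical helpers, not nutchez tools.

check_owner() {
    # $1 = path, $2 = expected owner
    if [ ! -e "$1" ]; then
        echo "MISSING $1"
        return 1
    fi
    # The third field of `ls -ld` is the owning user.
    owner=$(ls -ld "$1" | awk '{print $3}')
    if [ "$owner" != "$2" ]; then
        echo "WRONG_OWNER $1 ($owner, expected $2)"
        return 1
    fi
    echo "OK $1"
}

check_layout() {
    # $1 = expected owner; remaining arguments = paths to check
    expected=$1
    shift
    rc=0
    for p in "$@"; do
        check_owner "$p" "$expected" || rc=1
    done
    return $rc
}

# Example call for the paths in the table (left commented out on purpose):
# check_layout nutchuser /opt/nutchez/ /opt/nutchez/nutch \
#     /var/nutchez/nutch-nutchuser /var/nutchez/logs \
#     /opt/nutchez/nutch/conf /opt/nutchez/tomcat /home/nutchuser/nutchez/
```

The index directory /home/nutchuser/nutchez/search/ is left out of the example call because it only exists after a crawl has finished.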
  • Edit hadoop-site.xml in /opt/nutchez/nutch/conf/:
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://secuse.nchc.org.tw:9000</value>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <value>secuse.nchc.org.tw:9001</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/nutchez/nutch-nutchuser</value>
      </property>
    </configuration>
    
  • Change the Tomcat port in server.xml under /opt/nutchez/tomcat/conf/:
   <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" URIEncoding="UTF-8"
               useBodyEncodingForURI="true" />
  • For the final search results, set searcher.dir in nutch-site.xml under /opt/nutchez/tomcat/webapps/ROOT/WEB-INF/classes/:
    <configuration>
      <property>
        <name>searcher.dir</name>
        <value>/home/nutchuser/nutchez/search</value>
      </property>
    </configuration>
  • The /opt/nutchez/nutch/bin/nutch launcher script has been modified to set:
NUTCH_HOME=/opt/nutchez/nutch
NUTCH_CONF_DIR=/opt/nutchez/nutch/conf
NUTCH_LOG_DIR=/var/nutchez/logs
  • Even with the modified Hadoop that ships with nutchez, HDFS still has to be formatted and the daemons started with start-all.sh.
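
The format-and-start step might look like the sketch below. `hadoop namenode -format` and `start-all.sh` are the stock Hadoop commands of that era; their location under /opt/nutchez/nutch/bin is an assumption based on the NUTCH_HOME value above. The `run` wrapper and the `DRY_RUN` switch are hypothetical additions for illustration. Formatting the namenode erases existing HDFS metadata, so the sketch only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only. Assumptions: the Hadoop scripts sit under $NUTCH_HOME/bin,
# and DRY_RUN / run / init_hadoop are illustrative names, not nutchez features.
NUTCH_HOME="${NUTCH_HOME:-/opt/nutchez/nutch}"

run() {
    # Print the command in dry-run mode (the default); otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

init_hadoop() {
    # Format HDFS first, then start the daemons only if formatting succeeded.
    run "$NUTCH_HOME/bin/hadoop" namenode -format &&
        run "$NUTCH_HOME/bin/start-all.sh"
}

# To execute for real, as nutchuser:
#   DRY_RUN=0; init_hadoop
```

Calling `init_hadoop` with the defaults lists the two commands in order without touching the cluster, which is a cheap way to confirm the paths before running them.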