= demo.crawlzilla.info =
== 2013-04-03 ==
* Recently, connections to demo.crawlzilla.info have often been taking a long time, yet Munin does not reveal any system-level cause.
* Output of `top`:
{{{
top - 23:16:21 up 15 days, 7:45, 2 users, load average: 1.25, 1.37, 1.35
Tasks: 5 total, 0 running, 5 sleeping, 0 stopped, 0 zombie
Cpu0 : 2.3%us, 0.7%sy, 0.0%ni, 97.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 1.6%us, 0.0%sy, 0.0%ni, 96.4%id, 0.0%wa, 0.0%hi, 2.0%si, 0.0%st
Cpu2 : 43.3%us, 0.3%sy, 0.0%ni, 56.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 57.2%us, 0.7%sy, 0.0%ni, 42.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8028708k total, 4986732k used, 3041976k free, 189008k buffers
Swap: 19803128k total, 0k used, 19803128k free, 3185932k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15371 crawler 20 0 2472m 778m 14m S 102 9.9 516:36.60 java
15474 crawler 20 0 1375m 130m 11m S 7 1.7 135:58.46 java
15628 crawler 20 0 1366m 78m 11m S 0 1.0 1:16.75 java
15554 crawler 20 0 1345m 98m 11m S 0 1.3 1:23.92 java
15408 crawler 20 0 1308m 99m 11m S 0 1.3 0:40.57 java
}}}
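* The two processes using the most memory turn out to be Tomcat (listed by `jps` as `Bootstrap`, PID 15371) and !JobTracker (PID 15474):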
{{{
jazz@CrawlzillaServ:~$ sudo jps
15371 Bootstrap
15554 TaskTracker
15628 DataNode
15408 NameNode
1529 Jps
15474 JobTracker
}}}
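The PIDs were matched against the `jps` listing by hand above. As a minimal sketch, the same lookup can be scripted. The helper name `jps_name` is hypothetical (it is not part of the JDK), and the sample data is copied from the `sudo jps` output above.

```shell
#!/bin/sh
# Sample data copied from the `sudo jps` listing on this page.
jps_sample='15371 Bootstrap
15554 TaskTracker
15628 DataNode
15408 NameNode
15474 JobTracker'

# jps_name PID: read "PID NAME" lines on stdin and print the name for PID.
# (Hypothetical helper, for illustration only.)
jps_name() {
    awk -v pid="$1" '$1 == pid { print $2 }'
}

# Look up the two PIDs with the largest RES values in the top output.
for pid in 15371 15474; do
    printf '%s %s\n' "$pid" "$(printf '%s\n' "$jps_sample" | jps_name "$pid")"
done
```

On the live host the sample variable would presumably be replaced by the real command, for example `sudo jps | jps_name 15371`.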
* In addition, Tomcat appears to use only about one core's worth of CPU (102% in the `top` output above).
* [Question] How can Tomcat be made to use multiple cores?
* [Reference] http://www.mulesoft.com/tomcat-performance
* [Reference] http://www.tomcatexpert.com/blog/2011/11/22/performance-tuning-jvm-running-tomcat
* [Reference] [http://stackoverflow.com/questions/13631994/what-is-the-best-practice-for-tomcat-performance-tuning-in-amazon-ubuntu-instanc What is the best practice for tomcat performance tuning in Amazon ubuntu instance?]
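One possible way to look into the single-core question is sketched here; it is not taken from these notes. A reading of about 100% often means one busy thread rather than a limit on the number of cores, but that is an assumption. `top -H -p <pid>` lists CPU usage per thread, and `jstack` labels each thread with a hexadecimal `nid=` field, so converting the hot thread's ID to hex lets you find its stack trace. The helper name `tid_to_nid` and the thread ID 15400 are made up for illustration.

```shell
#!/bin/sh
# tid_to_nid TID: print a decimal Linux thread ID in the hexadecimal form
# that jstack shows in its "nid=" field. (Hypothetical helper name.)
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Example with a made-up thread ID; on the live host the ID would come
# from the PID column of `top -H -p 15371`.
tid_to_nid 15400
```

With the hex value in hand, something like `sudo -u crawler jstack 15371 | grep -A 10 "nid=$(tid_to_nid 15400)"` would show what that thread is doing. `jstack` typically has to run as the user that owns the JVM.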
* We also found that Apache2's '''Keep-Alive time is very long''', so the mod_proxy settings of Apache2 need some adjustment.
* [Reference] http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass
{{{
ProxyPass / http://140.110.X.X:8080/ connectiontimeout=2 timeout=5 ttl=5
ProxyPassReverse / http://140.110.X.X:8080/
SetEnv proxy-nokeepalive 1
}}}
* Likewise, we lowered Tomcat's own Keep-Alive time and shortened its timeout so that resources are released quickly.
* [Reference] [http://stackoverflow.com/questions/1542502/java-server-cpu-usage-at-100-after-two-days-continous-running-with-about-110-us Java server cpu usage at 100% after two days continous running with about 110 users]
* [Reference] [http://www.virtualzone.de/2010/11/tomcat-apache-high-cpu-usage.html Tomcat & Apache: High CPU Usage]
{{{
## Edit /opt/crawlzilla/tomcat/conf/server.xml
}}}
== 2013-05-11 ==
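* Perhaps because the number of registered users has grown, Tomcat has started running out of memory, so a few small adjustments were made:
* Hadoop's resource usage is low, so HADOOP_HEAPSIZE in hadoop-env.sh was lowered to 256 MB - /opt/crawlzilla/nutch/conf/hadoop-env.sh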
{{{
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=256
}}}
* Next, the maximum numbers of simultaneous map and reduce tasks per task tracker were changed in mapred-site.xml. - /opt/crawlzilla/nutch/conf/mapred-site.xml
{{{
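<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>6</value>
  <description>The maximum number of map tasks that will be run
  simultaneously by a task tracker.</description>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>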
}}}
* Raised Tomcat's maximum heap size to 4 GB (initial heap 1 GB) - /opt/crawlzilla/tomcat/bin/catalina.sh
{{{
case "`uname`" in
CYGWIN*) cygwin=true;;
OS400*) os400=true;;
Darwin*) darwin=true;;
esac
## Added by Jazz - 2013-05-11
export CATALINA_OPTS="-Xms1024m -Xmx4096m"
# resolve links - $0 may be a softlink
PRG="$0"
}}}
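As a rough sanity check on these three changes, the configured maximum heaps can be added up and compared with the physical memory reported by `top`. This is a sketch under stated assumptions. The 200 MB per task JVM is the stock `mapred.child.java.opts` value in Hadoop 1.x and is assumed here, since this page does not show that setting. The function name `heap_budget_mb` is hypothetical.

```shell
#!/bin/sh
# Figures taken from this page.
mem_total_kb=8028708   # "Mem: ... total" in the top output
tomcat_xmx_mb=4096     # -Xmx in CATALINA_OPTS
hadoop_heap_mb=256     # HADOOP_HEAPSIZE
hadoop_daemons=4       # NameNode, DataNode, JobTracker, TaskTracker
map_slots=6            # mapred.tasktracker.map.tasks.maximum
reduce_slots=2         # mapred.tasktracker.reduce.tasks.maximum

# ASSUMPTION: each task JVM runs with the stock -Xmx200m from
# mapred.child.java.opts; the actual value is not recorded on this page.
child_xmx_mb=200

# heap_budget_mb: worst-case sum of all configured maximum heaps, in MB.
heap_budget_mb() {
    echo $(( tomcat_xmx_mb + hadoop_daemons * hadoop_heap_mb + (map_slots + reduce_slots) * child_xmx_mb ))
}

echo "max heap total: $(heap_budget_mb) MB"
echo "physical RAM:   $(( mem_total_kb / 1024 )) MB"
```

This counts heap only. Each JVM also uses non-heap memory, so the real footprint would be somewhat higher. After restarting Tomcat, `sudo jps -v` lists the JVM arguments of each process, which is one way to confirm that `-Xmx4096m` was picked up.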