demo.crawlzilla.info

2013-04-03

  • Recently, connections to demo.crawlzilla.info have often been taking a long time, yet Munin does not point to a system-level cause.
  • According to top:
    top - 23:16:21 up 15 days,  7:45,  2 users,  load average: 1.25, 1.37, 1.35
    Tasks:   5 total,   0 running,   5 sleeping,   0 stopped,   0 zombie
    Cpu0  :  2.3%us,  0.7%sy,  0.0%ni, 97.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu1  :  1.6%us,  0.0%sy,  0.0%ni, 96.4%id,  0.0%wa,  0.0%hi,  2.0%si,  0.0%st
    Cpu2  : 43.3%us,  0.3%sy,  0.0%ni, 56.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu3  : 57.2%us,  0.7%sy,  0.0%ni, 42.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   8028708k total,  4986732k used,  3041976k free,   189008k buffers
    Swap: 19803128k total,        0k used, 19803128k free,  3185932k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                   
    15371 crawler   20   0 2472m 778m  14m S  102  9.9 516:36.60 java                                                                       
    15474 crawler   20   0 1375m 130m  11m S    7  1.7 135:58.46 java                                                                       
    15628 crawler   20   0 1366m  78m  11m S    0  1.0   1:16.75 java                                                                       
    15554 crawler   20   0 1345m  98m  11m S    0  1.3   1:23.92 java                                                                       
    15408 crawler   20   0 1308m  99m  11m S    0  1.3   0:40.57 java                                                                       
    
  • The two processes using the most memory turn out to be Tomcat (Bootstrap, PID 15371) and JobTracker (PID 15474):
    jazz@CrawlzillaServ:~$ sudo jps
    15371 Bootstrap
    15554 TaskTracker
    15628 DataNode
    15408 NameNode
    1529 Jps
    15474 JobTracker
    
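  • Tomcat also seems able to use only one CPU core (about 102% CPU in top).
  • [Question] How can Tomcat be made to use multiple cores?
  • [Reference] http://www.mulesoft.com/tomcat-performance
  • [Reference] http://www.tomcatexpert.com/blog/2011/11/22/performance-tuning-jvm-running-tomcat
  • [Reference] What is the best practice for tomcat performance tuning in Amazon ubuntu instance?
  • In addition, Apache2's Keep-Alive time turned out to be very long, so Apache2's mod_proxy settings need some adjustment.
  • Likewise, we lowered Tomcat's own Keep-Alive time and shortened its timeout so that resources are released quickly.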
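
One way to follow up on the single-core observation is to look at per-thread CPU usage inside the Tomcat JVM and map the busiest thread to its Java stack. The sketch below assumes a Linux host with procps `top` and the JDK's `jstack` on the PATH. `tid_to_nid` and `busy_thread_stack` are hypothetical helper names, and PID 15371 is the Tomcat process from the listing above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Crawlzilla.

# Convert a decimal Linux thread id (as shown by `top -H`) to the
# hexadecimal "nid=0x..." form that jstack prints for each Java thread.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Print the start of the jstack entry for one thread of a JVM.
# $1 = JVM pid, $2 = decimal thread id taken from `top -H -p <pid>`.
busy_thread_stack() {
    jstack "$1" | grep -A 10 "nid=$(tid_to_nid "$2") "
}

# Typical use (run as the user that owns the JVM, here "crawler"):
#   top -H -b -n 1 -p 15371 | head -n 20    # per-thread CPU for Tomcat
#   busy_thread_stack 15371 <tid-of-busiest-thread>
# Example: tid_to_nid 15371 prints 0x3c0b
```

If a single thread accounts for nearly all of the 102%, the load is probably one hot request or loop rather than a limit on how many cores Tomcat can use.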

2013-05-11

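  • Perhaps because the number of registered users has grown, Tomcat has started running out of memory. A few small adjustments were made:
  • Because Hadoop's resource usage is modest, lower HADOOP_HEAPSIZE in hadoop-env.sh to 256 MB - /opt/crawlzilla/nutch/conf/hadoop-env.sh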
    # The maximum amount of heap to use, in MB. Default is 1000.
    export HADOOP_HEAPSIZE=256
    
  • Next, change the maximum number of concurrent map and reduce tasks per TaskTracker in mapred-site.xml. - /opt/crawlzilla/nutch/conf/mapred-site.xml
      <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>6</value>
        <description>The maximum number of map tasks that will be run
        simultaneously by a task tracker.
        </description>
      </property>
      <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>2</value>
        <description>
        </description>
      </property>
    
  • Raise Tomcat's heap size to 4 GB - /opt/crawlzilla/tomcat/bin/catalina.sh
  • Raise Tomcat's MaxPermSize to 256 MB, because of a java.lang.OutOfMemoryError: PermGen space error
    case "`uname`" in
    CYGWIN*) cygwin=true;;
    OS400*) os400=true;;
    Darwin*) darwin=true;;
    esac
    
    ## Add by Jazz - 2013-05-11
    export CATALINA_OPTS="-Xms4096m -Xmx4096m -XX:MaxPermSize=256m"
    
    # resolve links - $0 may be a softlink
    PRG="$0"
    
  • Lower Tomcat's keep-alive (both timeouts are in milliseconds) - /opt/crawlzilla/tomcat/conf/server.xml
    ## Edit /opt/crawlzilla/tomcat/conf/server.xml
      <Service name="Catalina">
    
        <Connector port="8080" protocol="HTTP/1.1"
                   keepAliveTimeout="100"
                   connectionTimeout="100"
    
  • Reduce the number of concurrent Apache processes by setting MaxClients to 2 - /etc/apache2/apache2.conf
    <IfModule mpm_event_module>
        StartServers          1
        MinSpareThreads       1
        MaxSpareThreads       1
        ThreadLimit           1
        ThreadsPerChild       1
        MaxClients            2
        MaxRequestsPerChild  20
    </IfModule>
    
  • Shorten the Timeout values
    #
    # Timeout: The number of seconds before receives and sends time out.
    #
    Timeout 10
    
    #
    # KeepAlive: Whether or not to allow persistent connections (more than
    # one request per connection). Set to "Off" to deactivate.
    #
    KeepAlive On
    
    #
    # MaxKeepAliveRequests: The maximum number of requests to allow
    # during a persistent connection. Set to 0 to allow an unlimited amount.
    # We recommend you leave this number high, for maximum performance.
    #
    MaxKeepAliveRequests 50
    
    #
    # KeepAliveTimeout: Number of seconds to wait for the next request from the
    # same client on the same connection.
    #
    KeepAliveTimeout 2
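
The heap settings above can be sanity-checked against physical memory with some rough arithmetic. This is only a sketch: it reuses the 8028708k total reported by top on 2013-04-03, and it assumes each map or reduce task runs in a child JVM with the stock `-Xmx200m` (`mapred.child.java.opts`), which this page does not confirm. JVMs also use memory beyond heap and PermGen, so the real footprint is higher.

```shell
#!/bin/sh
# Rough worst-case heap budget in MB. Sketch only; the value marked
# "assumed" is not taken from this page.

ram_mb=$((8028708 / 1024))               # "Mem: 8028708k total" from top
tomcat_mb=$((4096 + 256))                # -Xmx4096m plus -XX:MaxPermSize=256m
daemons_mb=$((4 * 256))                  # NameNode, DataNode, JobTracker, TaskTracker
child_heap_mb=200                        # assumed: mapred.child.java.opts=-Xmx200m
tasks_mb=$(( (6 + 2) * child_heap_mb ))  # 6 map slots + 2 reduce slots

total_mb=$((tomcat_mb + daemons_mb + tasks_mb))
echo "RAM ${ram_mb} MB, worst-case heap ${total_mb} MB, headroom $((ram_mb - total_mb)) MB"
```

Under these assumptions the configured heaps add up to 6976 MB of the 7840 MB available, so there is little room left for Apache and the page cache if every task slot is busy.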
    

2013-12-12

  • <Symptom> Approving a new user fails with a Too many open files error.
    HTTP Status 500 -
    
    type Exception report
    
    message
    
    description The server encountered an internal error () that prevented it from fulfilling this request.
    
    exception
    
    java.io.IOException: Cannot run program "mv": java.io.IOException: error=24, Too many open files
    	java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    	java.lang.Runtime.exec(Runtime.java:593)
    	java.lang.Runtime.exec(Runtime.java:431)
    	java.lang.Runtime.exec(Runtime.java:328)
    	org.nchc.crawlzilla.bean.RegisterBean.acceptUser(RegisterBean.java:130)
    	org.nchc.crawlzilla.servlet.memberManager.doPost(memberManager.java:69)
    	javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
    	javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    
    root cause
    
    java.io.IOException: java.io.IOException: error=24, Too many open files
    	java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    	java.lang.ProcessImpl.start(ProcessImpl.java:65)
    	java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
    	java.lang.Runtime.exec(Runtime.java:593)
    	java.lang.Runtime.exec(Runtime.java:431)
    	java.lang.Runtime.exec(Runtime.java:328)
    	org.nchc.crawlzilla.bean.RegisterBean.acceptUser(RegisterBean.java:130)
    	org.nchc.crawlzilla.servlet.memberManager.doPost(memberManager.java:69)
    	javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
    	javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    
    note The full stack trace of the root cause is available in the Apache Tomcat/6.0.32 logs.
    
  • <Fix> 1. Raise the ulimit on the number of open files
    • /etc/security/limits.conf

      old new
       49  49  #ftp             -       chroot          /ftp
       50  50  #@student        -       maxlogins       4
       51  51
           52  *      soft        nofile        4096
           53  *      hard        nofile        743964
           54
       52  55  # End of file
  • <Fix> 2. Force a higher open-file ulimit in /etc/profile, then restart Crawlzilla
    • /etc/profile

      old new
       26  26  fi
       27  27
       28  28  umask 022
           29
           30  ## increase limits to open files
           31  ulimit -n 743964
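
After restarting Crawlzilla, it may help to confirm that the running Tomcat JVM actually picked up the new limit and to see how many descriptors it holds. The following is a minimal sketch that assumes a Linux `/proc` filesystem; `open_fd_count` and `soft_nofile_limit` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. Requires Linux /proc.

# Number of file descriptors currently open in process $1.
open_fd_count() {
    ls "/proc/$1/fd" | wc -l
}

# Soft "Max open files" limit that process $1 is actually running with.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Typical use against the Tomcat JVM (shown as Bootstrap by jps); reading
# another user's /proc/<pid>/fd normally needs root:
#   pid=$(sudo jps | awk '$2 == "Bootstrap" { print $1 }')
#   sudo sh -c "ls /proc/$pid/fd | wc -l"
#   soft_nofile_limit "$pid"
```

If the count keeps climbing toward the limit over time, raising the limit only postpones the error, and the application is likely leaking descriptors.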
Last modified on Dec 12, 2013, 11:42:06 AM