Version 3 (modified by jazz, 15 years ago)

--

2009-07-02

  • [Symptom] The Namenode repeatedly fails to start.
  • [Fix] Copy edits from namesecondary to name, then restart the Namenode:
    jazz@hadoop:~$ sudo cp /var/lib/hadoop/cache/hadoop/dfs/namesecondary/current/edits /var/lib/hadoop/cache/hadoop/dfs/name/current/edits
    jazz@hadoop:~$ sudo /etc/init.d/hadoop-namenode restart
    
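  • [Error tracing] The most likely cause is that the system disk ran out of space. However, I mainly traced /var/log/hadoop/hadoop-hadoop-namenode-hadoop.out, where the error message points at jobs that the hadoop045 account was running: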
    ERROR dfs.LeaseManager: /user/hadoop045/trend/output/countries/_logs/history
    
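  • Next, I also found many hadoop045 file entries in /var/lib/hadoop/cache/hadoop/dfs/namesecondary/current/edits, so I suspect that the edits file records which reads and writes have not yet completed. Since the Namenode could not start at all, those unfinished jobs could never finish either, so I simply reset edits. The original Namenode error output follows: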
    09/07/02 09:18:06 INFO dfs.NameNode: STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = hadoop/140.110.X.X
    STARTUP_MSG:   args = []
    STARTUP_MSG:   version = 0.18.3-4cloudera0.3.0
    STARTUP_MSG:   build =  -r ; compiled by 'root' on Fri May 29 23:29:49 UTC 2009
    ************************************************************/
    09/07/02 09:18:06 INFO metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
    09/07/02 09:18:06 INFO dfs.NameNode: Namenode up at: hadoop.nchc.org.tw/140.110.X.X:8020
    09/07/02 09:18:06 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
    09/07/02 09:18:06 INFO dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
    09/07/02 09:18:06 INFO fs.FSNamesystem: fsOwner=hadoop,hadoop
    09/07/02 09:18:06 INFO fs.FSNamesystem: supergroup=supergroup
    09/07/02 09:18:06 INFO fs.FSNamesystem: isPermissionEnabled=true
    09/07/02 09:18:06 INFO dfs.FSNamesystemMetrics: Initializing FSNamesystemMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
    09/07/02 09:18:06 INFO fs.FSNamesystem: Registered FSNamesystemStatusMBean
    09/07/02 09:18:06 INFO dfs.Storage: Number of files = 3723
    09/07/02 09:18:06 INFO dfs.Storage: Number of files under construction = 6
    09/07/02 09:18:06 INFO dfs.Storage: Image file of size 776834 loaded in 0 seconds.
    09/07/02 09:18:06 ERROR dfs.LeaseManager: /user/hadoop045/trend/output/countries/_logs/history/hadoop.nchc.org.tw_1244039499361_job_200906031031_0393_conf.xml not found in lease.paths (=[/user/hadoop045/trend/output/bayesian_input/_logs/history/hadoop.nchc.org.tw_1244039499361_job_200906031031_0391_hadoop045_word+count, /user/hadoop045/trend/output/countries/_logs/history/hadoop.nchc.org.tw_1244039499361_job_200906031031_0393_hadoop045_linecountC])
    09/07/02 09:18:06 ERROR fs.FSNamesystem: FSNamesystem initialization failed.
    java.io.EOFException
    	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
    	at org.apache.hadoop.io.UTF8.readFields(UTF8.java:103)
    	at org.apache.hadoop.dfs.FSImage.readString(FSImage.java:1368)
    	at org.apache.hadoop.dfs.FSEditLog.readLong(FSEditLog.java:1119)
    	at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:449)
    	at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:846)
    	at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
    	at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
    	at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
    	at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
    	at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
    	at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
    	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
    	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
    	at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
    	at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
    09/07/02 09:18:06 INFO ipc.Server: Stopping server on 8020
    09/07/02 09:18:06 ERROR dfs.NameNode: java.io.EOFException
    	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
    	at org.apache.hadoop.io.UTF8.readFields(UTF8.java:103)
    	at org.apache.hadoop.dfs.FSImage.readString(FSImage.java:1368)
    	at org.apache.hadoop.dfs.FSEditLog.readLong(FSEditLog.java:1119)
    	at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:449)
    	at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:846)
    	at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
    	at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
    	at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
    	at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
    	at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
    	at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
    	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
    	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
    	at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
    	at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
    
    09/07/02 09:18:06 INFO dfs.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at hadoop/140.110.X.X
    ************************************************************/
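
The recovery steps above can be sketched as a small shell helper. This is only a sketch under assumptions, not part of the original notes: the function names free_kb and restore_edits and the edits.bak.* backup naming are hypothetical, and the default paths are the ones quoted in the fix above. It adds two precautions the notes do not show: checking free space first (a full disk was the suspected cause) and keeping a copy of the damaged edits file before overwriting it. Note that overwriting edits with the namesecondary copy most likely discards any namespace changes recorded after the last checkpoint.

```shell
#!/bin/sh
# Default locations, taken from the commands in the notes.
NAME_DIR_DEFAULT=/var/lib/hadoop/cache/hadoop/dfs/name/current
SECONDARY_DIR_DEFAULT=/var/lib/hadoop/cache/hadoop/dfs/namesecondary/current

# free_kb DIR: print the free space, in KiB, of the filesystem holding DIR.
# (hypothetical helper; uses the POSIX "df -P" fixed output format)
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# restore_edits [NAME_DIR] [SECONDARY_DIR]
# Back up NAME_DIR/edits, then overwrite it with SECONDARY_DIR/edits.
# (hypothetical helper, not a Hadoop command)
restore_edits() {
    re_name=${1:-$NAME_DIR_DEFAULT}
    re_sec=${2:-$SECONDARY_DIR_DEFAULT}

    # Refuse to continue if the secondary copy is missing or empty.
    if [ ! -s "$re_sec/edits" ]; then
        echo "no usable edits file in $re_sec" >&2
        return 1
    fi

    # Keep the damaged file so it can still be inspected later.
    if [ -e "$re_name/edits" ]; then
        re_backup="$re_name/edits.bak.$(date +%Y%m%d%H%M%S)"
        cp -p "$re_name/edits" "$re_backup" || return 1
    fi

    cp "$re_sec/edits" "$re_name/edits"
}

# Example (as root), with the default paths from the notes:
#   free_kb /var/lib/hadoop/cache/hadoop/dfs/name/current
#   restore_edits && /etc/init.d/hadoop-namenode restart
```

If free_kb reports little or no free space, the copy may fail or produce another truncated edits file, so space should be freed before retrying.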