= 2009-07-02 =

 * [Symptom] The NameNode would not start.
 * [Solution] Copy edits from namesecondary to name, then restart the NameNode.
{{{
jazz@hadoop:~$ sudo cp /var/lib/hadoop/cache/hadoop/dfs/namesecondary/current/edits /var/lib/hadoop/cache/hadoop/dfs/name/current/edits
jazz@hadoop:~$ sudo /etc/init.d/hadoop-namenode restart
}}}
 * [Error trace] The cause was most likely that the system disk ran out of space. I mainly followed /var/log/hadoop/hadoop-hadoop-namenode-hadoop.out, where the error message pointed to jobs run by the hadoop045 account.
{{{
ERROR dfs.LeaseManager: /user/hadoop045/trend/output/countries/_logs/history
}}}
 * Next, /var/lib/hadoop/cache/hadoop/dfs/namesecondary/current/edits also turned out to contain many hadoop045 file entries, so I suspect that the edits file records which reads and writes have not completed. Since the NameNode could not start anyway, those unfinished jobs were never going to finish, so I simply reset edits. The EOFException thrown from FSEditLog.loadFSEdits in the log is consistent with a truncated edits file. The original NameNode error messages follow:
{{{
09/07/02 09:18:06 INFO dfs.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop/140.110.X.X
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.18.3-4cloudera0.3.0
STARTUP_MSG: build = -r ; compiled by 'root' on Fri May 29 23:29:49 UTC 2009
************************************************************/
09/07/02 09:18:06 INFO metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
09/07/02 09:18:06 INFO dfs.NameNode: Namenode up at: hadoop.nchc.org.tw/140.110.X.X:8020
09/07/02 09:18:06 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
09/07/02 09:18:06 INFO dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
09/07/02 09:18:06 INFO fs.FSNamesystem: fsOwner=hadoop,hadoop
09/07/02 09:18:06 INFO fs.FSNamesystem: supergroup=supergroup
09/07/02 09:18:06 INFO fs.FSNamesystem: isPermissionEnabled=true
09/07/02 09:18:06 INFO dfs.FSNamesystemMetrics: Initializing FSNamesystemMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
09/07/02 09:18:06 INFO fs.FSNamesystem: Registered FSNamesystemStatusMBean
09/07/02 09:18:06 INFO dfs.Storage: Number of files = 3723
09/07/02 09:18:06 INFO dfs.Storage: Number of files under construction = 6
09/07/02 09:18:06 INFO dfs.Storage: Image file of size 776834 loaded in 0 seconds.
09/07/02 09:18:06 ERROR dfs.LeaseManager: /user/hadoop045/trend/output/countries/_logs/history/hadoop.nchc.org.tw_1244039499361_job_200906031031_0393_conf.xml not found in lease.paths (=[/user/hadoop045/trend/output/bayesian_input/_logs/history/hadoop.nchc.org.tw_1244039499361_job_200906031031_0391_hadoop045_word+count, /user/hadoop045/trend/output/countries/_logs/history/hadoop.nchc.org.tw_1244039499361_job_200906031031_0393_hadoop045_linecountC])
09/07/02 09:18:06 ERROR fs.FSNamesystem: FSNamesystem initialization failed.
java.io.EOFException
        at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
        at org.apache.hadoop.io.UTF8.readFields(UTF8.java:103)
        at org.apache.hadoop.dfs.FSImage.readString(FSImage.java:1368)
        at org.apache.hadoop.dfs.FSEditLog.readLong(FSEditLog.java:1119)
        at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:449)
        at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:846)
        at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
        at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
        at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
        at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
        at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
09/07/02 09:18:06 INFO ipc.Server: Stopping server on 8020
09/07/02 09:18:06 ERROR dfs.NameNode: java.io.EOFException
        at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
        at org.apache.hadoop.io.UTF8.readFields(UTF8.java:103)
        at org.apache.hadoop.dfs.FSImage.readString(FSImage.java:1368)
        at org.apache.hadoop.dfs.FSEditLog.readLong(FSEditLog.java:1119)
        at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:449)
        at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:846)
        at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
        at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
        at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
        at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
        at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
09/07/02 09:18:06 INFO dfs.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop/140.110.X.X
************************************************************/
}}}
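
Because the fix overwrites the NameNode's edits file in place, it may be worth keeping a copy of the damaged file first. The following is only a minimal sketch of that idea, not part of the procedure that was actually run: the function name restore_edits, the backup directory argument, and the timestamped backup file name are all hypothetical. It also prints the free space on the name directory's filesystem, since a full disk is the suspected cause.

```shell
#!/bin/sh
# Sketch (hypothetical helper): save the damaged edits file, then copy the
# SecondaryNameNode's edits over it.
# Arguments:
#   $1  NameNode "current" directory
#   $2  SecondaryNameNode "current" directory
#   $3  directory that receives the backup (created if missing)
restore_edits() {
    name_dir="$1"
    secondary_dir="$2"
    backup_dir="$3"

    # Refuse to touch anything if the secondary copy is missing.
    if [ ! -f "$secondary_dir/edits" ]; then
        echo "no edits file in $secondary_dir" >&2
        return 1
    fi

    # Show free space on the filesystem holding the name directory.
    df -P "$name_dir" >&2

    # Keep the damaged file so it can be inspected later.
    if [ -f "$name_dir/edits" ]; then
        mkdir -p "$backup_dir" || return 1
        stamp=$(date +%Y%m%d%H%M%S)
        cp -p "$name_dir/edits" "$backup_dir/edits.$stamp" || return 1
    fi

    # Same copy as in the fix above, with the paths passed as arguments.
    cp "$secondary_dir/edits" "$name_dir/edits"
}
```

Run as root, the call would take /var/lib/hadoop/cache/hadoop/dfs/name/current and /var/lib/hadoop/cache/hadoop/dfs/namesecondary/current as its first two arguments plus a backup directory of your choice, followed by the same hadoop-namenode restart as above.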