 * This lab is based on Ubuntu 8.04 LTS. For the Ubuntu 8.04 installation procedure, please refer to "[wiki:jazz/Hardy Ubuntu 8.04 Server installation steps]".
 * The classroom machines used for this course run Ubuntu 8.04 Server with the xubuntu desktop installed on top.
 * Some of the commands on this page use a "lazy" setup style aimed at users who are not familiar with Linux text editors; you are free to make the same changes with the editor you normally use (e.g. vi, nano, joe).
 * On this page, the black-background/white-text blocks are commands. Copy and paste only the text after the prompt: "$" denotes an ordinary user, "#" denotes the root administrator.

 * Login information

 || User || Hadooper ||
 || Group || Hadoop ||
 || Password || ****** ||

 * The hadooper account has sudo privileges.

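 * One way to grant those privileges (an illustrative sketch, not part of the original lab; on Ubuntu 8.04 members of the ''admin'' group may use sudo) is:
{{{
$ sudo adduser hadooper admin
}}}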
----
 * Notes to self:

This user has to be added on every machine:
{{{
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hadooper
}}}
To check: whether it matters if JAVA_HOME is not also set in .bashrc.
----

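 * (Optional check, not part of the original notes) confirm on each machine that the account and group were created:
{{{
$ id hadooper
}}}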
 * You have two machines at hand. Assume the machine you have just been working on is node1 and the other one is node2. In the setup below node1 acts as the server (master) and node2 as the slave.
 * This lab builds Hadoop running on a cluster, so if your machines still carry the environment from Lab 1, first do step 0 below to remove the old settings.


=== Clean up everything set up in Lab 1 ===

 * Run this on node1 (the machine on which Lab 1 was done):
{{{
~$ killall java
~$ rm -rf /tmp/hadoop-hadooper*
~$ rm -rf /opt/hadoop/logs/* /tmp/hadoop-hadooper
~$ rm -rf /opt/hadoop
~$ rm -rf ~/.ssh
}}}
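 * (Optional check, not part of the original steps) confirm that nothing is left behind:
{{{
~$ ps aux | grep [j]ava
~$ ls /tmp | grep hadoop
~$ ls /opt
}}}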

=== Set the hostnames ===

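 * A minimal sketch of this step (not the original wording; node1_ip and node2_ip stand for the machines' real IP addresses): make the two names resolvable on '''both''' machines, for example by appending them to /etc/hosts:
{{{
~$ echo "node1_ip node1" | sudo tee -a /etc/hosts
~$ echo "node2_ip node2" | sudo tee -a /etc/hosts
}}}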

 * Edit conf/hadoop-env.sh
{{{
/opt/hadoop$ gedit conf/hadoop-env.sh
}}}

Edit hadoop-env.sh as follows:

{{{
#!diff
--- hadoop-0.18.3/conf/hadoop-env.sh.org
+++ hadoop-0.18.3/conf/hadoop-env.sh
@@ -6,7 +6,10 @@
 # remote nodes.
 # The java implementation to use.  Required.
-# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
+export JAVA_HOME=/usr/lib/jvm/java-6-sun
+export HADOOP_HOME=/opt/hadoop
+export HADOOP_CONF_DIR=/opt/hadoop/conf
+export HADOOP_LOG_DIR=/home/hadooper/logs
+export HADOOP_PID_DIR=/home/hadooper/pids

 # Extra Java CLASSPATH elements.  Optional.
 # export HADOOP_CLASSPATH=
}}}
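
 * (A suggested extra step, not in the original text) HADOOP_LOG_DIR and HADOOP_PID_DIR now point to directories under /home/hadooper, so it does no harm to create them up front:
{{{
~$ mkdir -p /home/hadooper/logs /home/hadooper/pids
}}}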

 * Next, edit conf/hadoop-site.xml
{{{
/opt/hadoop# gedit conf/hadoop-site.xml
}}}

Edit hadoop-site.xml as follows:

{{{
#!diff
--- hadoop-0.18.3/conf/hadoop-site.xml.org
+++ hadoop-0.18.3/conf/hadoop-site.xml
@@ -4,5 +4,31 @@
 <!-- Put site-specific property overrides in this file. -->
 <configuration>
-
+  <property>
+    <name>fs.default.name</name>
+    <value>hdfs://node1_ip:9000/</value>
+    <description>
+      The name of the default file system. Either the literal string
+      "local" or a host:port for NDFS.
+    </description>
+  </property>
+  <property>
+    <name>mapred.job.tracker</name>
+    <value>hdfs://node1_ip:9001</value>
+    <description>
+      The host and port that the MapReduce job tracker runs at. If
+      "local", then jobs are run in-process as a single map and
+      reduce task.
+    </description>
+  </property>
+  <property>
+    <name>hadoop.tmp.dir</name>
+    <value>/tmp/hadoop/hadoop-${user.name}</value>
+    <description>A base for other temporary directories.</description>
+  </property>
 </configuration>
}}}

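 * (Optional check, not part of the original text) make sure the values point at node1 before the configuration is copied to node2:
{{{
/opt/hadoop$ grep -A 1 "<name>" conf/hadoop-site.xml
}}}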
== step 6. Set up masters and slaves ==

 * Next we decide which host acts as the namenode; any other hosts become datanodes.
 * Edit conf/slaves
{{{
/opt/hadoop$ gedit conf/slaves
}}}
Contents:
{{{
#!diff
--- hadoop/conf/slaves.org
+++ hadoop/conf/slaves
@@ -1 +1,2 @@
-localhost
+node1
+node2
}}}

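 * (Side note, not in the original step) conf/masters is left unchanged here; in Hadoop 0.18 it lists the host on which the secondary namenode is started. To run it on node1 explicitly, the file can be set like this:
{{{
/opt/hadoop$ echo "node1" > conf/masters
}}}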
== step 7. Copy everything under Hadoop_Home to the other hosts ==

 * From node1, create the /opt/hadoop directory on the remote node2 and give it the right ownership:
{{{
/opt/hadoop$ ssh node2_ip "sudo mkdir /opt/hadoop"
/opt/hadoop$ ssh node2_ip "sudo chown -R hadooper:hadoop /opt/hadoop"
}}}

 * Copy node1's hadoop directory over to node2:
{{{
/opt/hadoop$ scp -r /opt/hadoop/* node2_ip:/opt/hadoop/
}}}

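 * (Optional check, not part of the original steps) confirm that node2 received the files:
{{{
/opt/hadoop$ ssh node2_ip "ls /opt/hadoop"
}}}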
== step 8. Format HDFS ==


 * Hadoop is now installed and configured for the cluster. Next we start Hadoop, but first we format HDFS.

 * Run this on node1:

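 * The formatting command itself (shown here as a sketch; it is the same namenode format command used in the single-node setup):
{{{
/opt/hadoop$ bin/hadoop namenode -format
}}}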
== step 9. Start Hadoop ==

 * On node1, run the following command to start HDFS:
{{{
/opt/hadoop$ bin/start-dfs.sh
}}}

 * The bin/start-dfs.sh script consults ${HADOOP_CONF_DIR}/slaves on the NameNode and starts a DataNode daemon on every slave listed there.

 * Then start Map/Reduce on node2; from node1 this can be done over ssh:
{{{
$ ssh node2_ip "/opt/hadoop/bin/start-mapred.sh"
}}}

 * The bin/start-mapred.sh script consults ${HADOOP_CONF_DIR}/slaves on the JobTracker and starts a TaskTracker daemon on every slave listed there.

== step 10. Check that everything is running ==
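
 * A minimal sketch of this step (assuming the Sun JDK configured above and Hadoop 0.18's default web UI ports): list the running daemons with jps on each node, and browse the web interfaces.
{{{
~$ /usr/lib/jvm/java-6-sun/bin/jps
}}}
 * NameNode status: http://node1_ip:50070/
 * JobTracker status: port 50030 on the host where start-mapred.sh was run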