Deploy Hadoop to PC Classroom using DRBL
- Java is required for Hadoop, so install a Java runtime or JDK first.
~$ echo "deb http://free.nchc.org.tw/debian/ etch non-free" > /tmp/etch-non-free.list
~$ sudo mv /tmp/etch-non-free.list /etc/apt/sources.list.d/.
~$ sudo apt-get update
~$ sudo apt-get install sun-java5-jdk
- download Hadoop 0.18.2
~$ wget http://ftp.twaren.net/Unix/Web/apache/hadoop/core/hadoop-0.18.2/hadoop-0.18.2.tar.gz
~$ tar zxvf hadoop-0.18.2.tar.gz
- setup JAVA_HOME environment variable
~$ echo "export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun" >> ~/.bash_profile
~$ source ~/.bash_profile
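To confirm the export takes effect, the same line can be tried against a scratch profile first (a sketch; the mktemp file is only a stand-in for ~/.bash_profile):

```shell
# Sketch: do the JAVA_HOME export against a throwaway profile and check
# the value before touching the real ~/.bash_profile
PROFILE=$(mktemp)
echo "export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun" >> "$PROFILE"
. "$PROFILE"
echo "$JAVA_HOME"   # prints /usr/lib/jvm/java-1.5.0-sun
rm -f "$PROFILE"
```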
- edit hadoop-0.18.2/conf/hadoop-env.sh: replace the commented-out line
  "# export JAVA_HOME=/usr/lib/j2sdk1.5-sun" with these three lines:

  export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun
  export HADOOP_HOME=/home/jazz/hadoop-0.18.2
  export HADOOP_CONF_DIR=$HADOOP_HOME/conf
- here is the current DRBL setup
Your DRBL environment configuration:
******************************************************
NIC NIC IP Clients
+------------------------------+
| DRBL SERVER |
| |
| +-- [eth0] 140.110.25.101 +- to WAN
| |
| +-- [eth1] 192.168.61.254 +- to clients group 1 [ 16 clients, their IP
| | from 192.168.61.1 - 192.168.61.16]
+------------------------------+
******************************************************
Total clients: 16
******************************************************
- Hadoop uses SSH for communication between nodes, so we have to set up passwordless SSH key authentication.
~$ ssh-keygen
~$ cp .ssh/id_rsa.pub .ssh/authorized_keys
~$ sudo apt-get install dsh
~$ mkdir -p .dsh
~$ for ((i=1;i<=16;i++)); do echo "192.168.61.$i" >> .dsh/machines.list; done
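The loop above can be sanity-checked in a scratch directory before writing the real .dsh/machines.list (a sketch using the same 192.168.61.1-16 client range; the mktemp directory is only for illustration):

```shell
# Sketch: regenerate the dsh machine list in a scratch dir and verify
# that all 16 client addresses from the DRBL setup are present
DSHDIR=$(mktemp -d)
for i in $(seq 1 16); do
  echo "192.168.61.$i" >> "$DSHDIR/machines.list"
done
wc -l < "$DSHDIR/machines.list"      # prints 16
head -1 "$DSHDIR/machines.list"      # prints 192.168.61.1
```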
- edit hadoop-0.18.2/conf/hadoop-site.xml, adding the following properties inside the empty <configuration> block:

  <configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.61.254:9000/</value>
    <description>
      The name of the default file system. Either the literal string
      "local" or a host:port for NDFS.
    </description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.61.254:9001</value>
    <description>
      The host and port that the MapReduce job tracker runs at. If
      "local", then jobs are run in-process as a single map and
      reduce task.
    </description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/dfs/data</value>
    <description>Determines where on the local filesystem a DFS data node
      should store its blocks. If this is a comma-delimited
      list of directories, then data will be stored in all named
      directories, typically on different devices.
      Directories that do not exist are ignored.
    </description>
  </property>
  </configuration>
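As a quick sanity check that the config points at the DRBL server's address, the key property can be grepped for (a sketch against a scratch copy; on a real deployment the file is hadoop-0.18.2/conf/hadoop-site.xml):

```shell
# Sketch: write the fs.default.name property to a scratch file and grep
# for the DRBL server address (192.168.61.254) used in the edit above
SITE=$(mktemp)
cat > "$SITE" << 'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.61.254:9000/</value>
  </property>
</configuration>
EOF
grep -q 'hdfs://192.168.61.254:9000/' "$SITE" && echo "fs.default.name OK"
```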
- edit /etc/rc.local on the DRBL Server so it runs the Hadoop namenode, adding these lines before the final "exit 0":

  echo 3 > /proc/sys/vm/drop_caches
  /home/jazz/hadoop-0.18.2/bin/hadoop namenode -format
  /home/jazz/hadoop-0.18.2/bin/hadoop-daemon.sh start namenode
- create an rc.local for the DRBL clients so each one runs a Hadoop datanode
~$ cat > rc.local << EOF
#!/bin/sh -e
echo 3 > /proc/sys/vm/drop_caches
/home/jazz/hadoop-0.18.2/bin/hadoop-daemon.sh start datanode
exit 0
EOF
~$ chmod a+x rc.local
~$ sudo /opt/drbl/sbin/drbl-cp-host rc.local /etc/
# Since DRBL clients do not run rc.local by default, it has to be added manually.
# Also, to avoid running rc.local before the disks finish mounting, we deliberately schedule rc.local to run last (priority 99).
~$ sudo /opt/drbl/bin/drbl-doit update-rc.d rc.local defaults 99
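The client rc.local steps above can be dry-run in a scratch directory before pushing the file out with drbl-cp-host (a sketch; the mktemp directory is only for illustration):

```shell
# Sketch: build the client rc.local in a scratch dir, mark it executable,
# and check the shebang, mirroring the cat/chmod steps above
DIR=$(mktemp -d)
cat > "$DIR/rc.local" << 'EOF'
#!/bin/sh -e
echo 3 > /proc/sys/vm/drop_caches
/home/jazz/hadoop-0.18.2/bin/hadoop-daemon.sh start datanode
exit 0
EOF
chmod a+x "$DIR/rc.local"
[ -x "$DIR/rc.local" ] && head -1 "$DIR/rc.local"   # prints #!/bin/sh -e
```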
- shut down the DRBL clients
- reboot the DRBL server
- use "Wake on LAN" to boot the DRBL clients
- browse http://192.168.61.254:50070 for DFS status