
Deploy Hadoop to a PC Classroom using DRBL

  • Java is required for Hadoop, so you need to install a Java runtime or JDK first. Here we add the Debian Etch non-free repository and install Sun's JDK 5:
    ~$ echo "deb http://free.nchc.org.tw/debian/ etch non-free" > /tmp/etch-non-free.list
    ~$ sudo mv /tmp/etch-non-free.list /etc/apt/sources.list.d/.
    ~$ sudo apt-get update
    ~$ sudo apt-get install sun-java5-jdk
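
  • you can sanity-check the install before going further; the JVM path below is the one the Debian sun-java5-jdk package uses (and the one we point JAVA_HOME at in a later step):
    ~$ java -version
    ~$ ls /usr/lib/jvm/java-1.5.0-sun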
    
  • download Hadoop 0.18.2
    ~$ wget http://ftp.twaren.net/Unix/Web/apache/hadoop/core/hadoop-0.18.2/hadoop-0.18.2.tar.gz
    ~$ tar zxvf hadoop-0.18.2.tar.gz
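
  • optionally verify the tarball: Apache mirrors usually publish a checksum file next to each release, so you can compare md5sum's output against it if your mirror carries one:
    ~$ md5sum hadoop-0.18.2.tar.gz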
    
  • set up the JAVA_HOME environment variable
    ~$ echo "export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun" >> ~/.bash_profile
    ~$ source ~/.bash_profile
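
  • confirm the variable is set in the current shell (it should print the JVM path we just configured):
    ~$ echo $JAVA_HOME
    /usr/lib/jvm/java-1.5.0-sun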
    
  • edit hadoop-0.18.2/conf/hadoop-env.sh (adjust HADOOP_HOME to wherever you unpacked the tarball; added lines are marked with +, the removed line with -):

       # remote nodes.

       # The java implementation to use.  Required.
      -# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
      +export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun
      +export HADOOP_HOME=/home/jazz/hadoop-0.18.2
      +export HADOOP_CONF_DIR=$HADOOP_HOME/conf

       # Extra Java CLASSPATH elements.  Optional.
       # export HADOOP_CLASSPATH=
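
  • a quick way to confirm the settings took effect is Hadoop's version command, which starts the JVM but touches no cluster state:
    ~$ hadoop-0.18.2/bin/hadoop version
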
  • here is the current DRBL setup:
              NIC    NIC IP                    Clients
    +------------------------------+
    |         DRBL SERVER          |
    |                              |
    |    +-- [eth0] X.X.X.X        +- to WAN
    |                              |
    |    +-- [eth1] 192.168.61.254 +- to clients group 1 [ 10 clients, their IP
    |                              |             from 192.168.61.1 - 192.168.61.10]
    |    +-- [eth2] 192.168.62.254 +- to clients group 2 [ 11 clients, their IP
    |                              |             from 192.168.62.1 - 192.168.62.11]
    |    +-- [eth3] 192.168.63.254 +- to clients group 3 [ 10 clients, their IP
    |                              |             from 192.168.63.1 - 192.168.63.10]
    |    +-- [eth4] 192.168.64.254 +- to clients group 4 [ 10 clients, their IP
    |                              |             from 192.168.64.1 - 192.168.64.10]
    +------------------------------+
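
  • Hadoop's conf/slaves file will later need to list every client that should run a DataNode and TaskTracker. A minimal sketch that builds it from the layout above (assuming all 41 clients join the cluster; note group 2 has 11 machines):
    ~$ for i in $(seq 1 10); do echo "192.168.61.$i"; echo "192.168.63.$i"; echo "192.168.64.$i"; done > hadoop-0.18.2/conf/slaves
    ~$ for i in $(seq 1 11); do echo "192.168.62.$i"; done >> hadoop-0.18.2/conf/slaves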
    
  • Hadoop uses SSH for its internal communication, so we have to set up passwordless SSH key exchange. Since DRBL clients mount the server's home directories over NFS, one authorized_keys file covers every node; we also install dsh so commands can be run on all clients at once, and build its machine list.
    ~$ ssh-keygen
    ~$ cp .ssh/id_rsa.pub .ssh/authorized_keys
    ~$ sudo apt-get install dsh
    ~$ mkdir -p .dsh
    ~$ nmap -v -sP 192.168.61-64.1-11 | grep '(.*) .* up' | awk '{ print $3 }' | sort -n | sed 's#(##' | sed 's#)##' > .dsh/machines.list

    if you would rather not scan the network, build the list statically from the known IP ranges instead:

    ~$ for i in $(seq 1 10); do echo "192.168.61.$i" >> .dsh/machines.list; echo "192.168.63.$i" >> .dsh/machines.list; echo "192.168.64.$i" >> .dsh/machines.list; done
    ~$ for i in $(seq 1 11); do echo "192.168.62.$i" >> .dsh/machines.list; done
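
    once the list exists, you can test passwordless SSH to every client in one shot with dsh (assuming the Debian dsh package, where -M prefixes each output line with the machine name; the very first connection to each client may still prompt you to accept its host key):
    ~$ dsh -a -M -- uname -n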
    
  • edit