How to run R-MPI on multiple machines with normal user permissions

Configure mpd for a normal user

  • First, log in as a normal user. Here we log in with user id 'jazz'. Then generate an SSH key pair and copy the public key to each compute node.
    login as: jazz
    jazz@bio-cluster-12's password:
    jazz@bio-cluster-12:~$ ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/jazz/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/jazz/.ssh/id_rsa.
    Your public key has been saved in /home/jazz/.ssh/id_rsa.pub.
    The key fingerprint is:
    XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX jazz@bio-cluster-12
    jazz@bio-cluster-12:~$ for i in 11 10 09 08 07 06; do scp .ssh/id_rsa.pub bio-cluster-$i:.ssh/authorized_keys; done
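
The scp loop above copies id_rsa.pub over authorized_keys, so any keys already installed on a node are replaced. A minimal sketch of an append-only alternative follows. The helper name append_key and the placeholder key strings are made up for illustration, and the demo works on a temporary directory rather than a real ~/.ssh.

```shell
# append_key PUBKEY_FILE AUTH_FILE
# Add the key in PUBKEY_FILE to AUTH_FILE unless an identical line is
# already present. Existing lines in AUTH_FILE are left untouched.
append_key() {
    ak_pub=$1
    ak_auth=$2
    touch "$ak_auth"
    chmod 600 "$ak_auth"
    # -x whole line, -F fixed string, -f read the pattern from the key file
    if ! grep -qxFf "$ak_pub" "$ak_auth"; then
        cat "$ak_pub" >> "$ak_auth"
    fi
}

# Demo on throwaway files (the key material is a placeholder, not a real key).
demo_dir=$(mktemp -d)
echo "ssh-rsa PLACEHOLDER-OLD-KEY someone@elsewhere" > "$demo_dir/authorized_keys"
echo "ssh-rsa PLACEHOLDER-NEW-KEY jazz@bio-cluster-12" > "$demo_dir/id_rsa.pub"
append_key "$demo_dir/id_rsa.pub" "$demo_dir/authorized_keys"
append_key "$demo_dir/id_rsa.pub" "$demo_dir/authorized_keys"   # second call adds nothing
cat "$demo_dir/authorized_keys"
rm -rf "$demo_dir"
```

On a real cluster, the same effect could probably be had per node with ssh bio-cluster-$i 'cat >> .ssh/authorized_keys' < ~/.ssh/id_rsa.pub (assuming ~/.ssh already exists there), or with OpenSSH's ssh-copy-id where it is installed.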
    
  • Set up .mpd.conf in $HOME on each compute node.
    jazz@bio-cluster-12:~$ echo "MPD_SECRETWORD=${USER}$$" > ~/.mpd.conf
    jazz@bio-cluster-12:~$ chmod 600 .mpd.conf
    jazz@bio-cluster-12:~$ for i in 11 10 09 08 07 06
    > do
    > scp .mpd.conf bio-cluster-$i:.
    > done
    .mpd.conf                                     100%   21     0.0KB/s   00:00
    .mpd.conf                                     100%   21     0.0KB/s   00:00
    .mpd.conf                                     100%   21     0.0KB/s   00:00
    .mpd.conf                                     100%   21     0.0KB/s   00:00
    .mpd.conf                                     100%   21     0.0KB/s   00:00
    .mpd.conf                                     100%   21     0.0KB/s   00:00
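
The secret word above is built from the user name and the shell's process ID, both of which are fairly easy to guess. As a hedged sketch, the file could instead be filled with random bytes from /dev/urandom. The helper name write_mpd_conf is hypothetical, and the demo writes to a temporary file instead of ~/.mpd.conf.

```shell
# write_mpd_conf CONF_PATH SECRET
# Write "MPD_SECRETWORD=SECRET" to CONF_PATH, readable by the owner only.
write_mpd_conf() {
    wm_path=$1
    wm_secret=$2
    # umask 077 keeps a newly created file private from the start;
    # chmod 600 covers the case where the file already existed.
    ( umask 077 && printf 'MPD_SECRETWORD=%s\n' "$wm_secret" > "$wm_path" )
    chmod 600 "$wm_path"
}

# 8 random bytes rendered as 16 hex digits.
secret=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')

demo_conf=$(mktemp)
write_mpd_conf "$demo_conf" "$secret"
ls -l "$demo_conf"
rm -f "$demo_conf"
```

To use it for real, the target would be "$HOME/.mpd.conf", followed by the same scp loop as above to distribute the file.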
    
  • Set up mpd.hosts on the local host.
    jazz@bio-cluster-12:~$ cat > mpd.hosts << EOF
    > bio-cluster-11
    > bio-cluster-10
    > bio-cluster-09
    > bio-cluster-08
    > bio-cluster-07
    > bio-cluster-06
    > EOF
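
The heredoc above lists the six remote nodes by hand. A small sketch that generates the same list from the node numbers already used in the scp loops is shown here; make_hosts is a hypothetical helper name.

```shell
# make_hosts PREFIX NUMBER...
# Print one host name per line, built as PREFIX followed by each NUMBER.
make_hosts() {
    mk_prefix=$1
    shift
    for mk_n in "$@"; do
        printf '%s%s\n' "$mk_prefix" "$mk_n"
    done
}

make_hosts bio-cluster- 11 10 09 08 07 06
```

Redirecting the output (make_hosts bio-cluster- 11 10 09 08 07 06 > mpd.hosts) would produce the same file as the heredoc.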
    
  • Run mpdboot for 7 nodes, then use mpdtrace to check the status of the mpd process on each compute node. mpdringtest measures the speed of message passing around the ring, and mpdallexit terminates all mpd processes.
    jazz@bio-cluster-12:~$ mpdboot -n 7
    jazz@bio-cluster-12:~$ mpdtrace -l
    bio-cluster-12_54092 (10.220.202.219)
    bio-cluster-08_38361 (10.220.202.223)
    bio-cluster-09_52923 (10.220.202.222)
    bio-cluster-11_33377 (10.220.202.220)
    bio-cluster-10_33103 (10.220.202.221)
    bio-cluster-06_59631 (10.220.202.225)
    bio-cluster-07_59533 (10.220.202.224)
    jazz@bio-cluster-12:~$ mpdringtest 100
    time for 100 loops = 0.0729811191559 seconds
    jazz@bio-cluster-12:~$ mpdallexit
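
mpdboot -n 7 asks for seven mpds: the six hosts in mpd.hosts plus the local one, which matches the seven lines printed by mpdtrace -l. The sketch below derives that number from the host file and lists any host that is absent from a trace. It assumes one bare host name per line in mpd.hosts and the "host_port (address)" trace format shown above. The function names are hypothetical, and the demo trace is a copy of the output above with bio-cluster-07 deliberately left out.

```shell
# ring_size HOSTS_FILE
# Number of mpds to request: non-empty lines in HOSTS_FILE plus the local host.
ring_size() {
    echo $(( $(grep -c . "$1") + 1 ))
}

# missing_hosts HOSTS_FILE < trace
# Print every host from HOSTS_FILE that does not appear in the trace on stdin.
missing_hosts() {
    ms_file=$1
    # Strip the "_port (address)" suffix from every trace line.
    ms_seen=$(sed 's/_[0-9][0-9]* (.*//')
    while IFS= read -r ms_host; do
        [ -n "$ms_host" ] || continue
        printf '%s\n' "$ms_seen" | grep -qxF -- "$ms_host" || printf '%s\n' "$ms_host"
    done < "$ms_file"
}

demo_hosts=$(mktemp)
printf 'bio-cluster-%s\n' 11 10 09 08 07 06 > "$demo_hosts"

# Prints 7.
ring_size "$demo_hosts"

# Prints bio-cluster-07, the only listed host without a trace line.
missing_hosts "$demo_hosts" <<'EOF'
bio-cluster-12_54092 (10.220.202.219)
bio-cluster-08_38361 (10.220.202.223)
bio-cluster-09_52923 (10.220.202.222)
bio-cluster-11_33377 (10.220.202.220)
bio-cluster-10_33103 (10.220.202.221)
bio-cluster-06_59631 (10.220.202.225)
EOF
rm -f "$demo_hosts"
```

Against a live ring, the calls would look like mpdboot -n "$(ring_size mpd.hosts)" followed by mpdtrace -l | missing_hosts mpd.hosts; empty output means every listed host has joined.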
    

Test 1: single mpd and R-MPI on localhost

  • Run mpd on localhost
    jazz@bio-cluster-12:~$ mpd &
    [1] 1505
    jazz@bio-cluster-12:~$ mpdtrace -l
    bio-cluster-12_37007 (10.220.202.219)
    
  • Alternatively, mpdboot can also start mpd on localhost
    jazz@bio-cluster-12:~$ mpdboot
    [1]+  Done                    mpd
    jazz@bio-cluster-12:~$ mpdtrace -l
    bio-cluster-12_44810 (10.220.202.219)
    
  • Run R-MPI with a single mpd on localhost
    jazz@bio-cluster-12:~$ R
    
    R version 2.4.0 Patched (2006-11-25 r39997)
    Copyright (C) 2006 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    
    > library(Rmpi)
    > mpi.spawn.Rslaves()
            1 slaves are spawned successfully. 0 failed.
    master (rank 0, comm 1) of size 2 is running on: bio-cluster-12
    slave1 (rank 1, comm 1) of size 2 is running on: bio-cluster-12
    > mpi.close.Rslaves()
    mpi.close.Rslaves()
    [1] 1
    > mpi.quit(save="no")
    mpi.quit(save="no")
    jazz@bio-cluster-12:~$
    

Test 2: Compile an MPICH2 sample program and run it on multiple compute nodes

  • Sample program from Wade.
    jazz@bio-cluster-12:~$ cat > demo1.c << EOF
    > /* Program:
    >  *   Each node prints its own id and the total number of nodes taking
    >  *   part in the run, along with its own host name.
    >  * History:
    >  *   2008-04-09 BETA
    >  *   2008-06-25 Added host name display
    >  */
    >
    > #include <stdio.h>
    > #include "mpi.h"
    > int main (int argc, char **argv)
    > {
    >   int myid, numprocs, len;
    >   char name[MPI_MAX_PROCESSOR_NAME];
    >   MPI_Init(&argc, &argv);
    >
    >   /* Get the total number of nodes */
    >   MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    >
    >   /* Get this node's id / rank */
    >   MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    >
    >   /* Get this node's host name */
    >   MPI_Get_processor_name(name, &len);
    >   printf("This is machine %d of %d  name = %s\n", myid, numprocs, name);
    >   MPI_Finalize();
    >   return 0;
    > }
    > EOF
    jazz@bio-cluster-12:~$ mpicc -I /usr/include/mpich2/ -lmpich demo1.c -o demo1
    jazz@bio-cluster-12:~$ for i in 11 10 09 08 07 06
    > do
    > scp demo1 bio-cluster-$i:.
    > done
    demo1                                         100%  557KB 557.3KB/s   00:00
    demo1                                         100%  557KB 557.3KB/s   00:00
    demo1                                         100%  557KB 557.3KB/s   00:00
    demo1                                         100%  557KB 557.3KB/s   00:00
    demo1                                         100%  557KB 557.3KB/s   00:00
    demo1                                         100%  557KB 557.3KB/s   00:00
    jazz@bio-cluster-12:~$ mpdboot -n 7
    jazz@bio-cluster-12:~$ mpdtrace -l
    bio-cluster-12_41632 (10.220.202.219)
    bio-cluster-08_33197 (10.220.202.223)
    bio-cluster-09_40371 (10.220.202.222)
    bio-cluster-10_54199 (10.220.202.221)
    bio-cluster-06_54334 (10.220.202.225)
    bio-cluster-07_42302 (10.220.202.224)
    bio-cluster-11_36534 (10.220.202.220)
    jazz@bio-cluster-12:~$ mpiexec -n 7 /home/jazz/demo1
    This is machine 0 of 7  name = bio-cluster-12
    This is machine 1 of 7  name = bio-cluster-08
    This is machine 2 of 7  name = bio-cluster-09
    This is machine 3 of 7  name = bio-cluster-10
    This is machine 5 of 7  name = bio-cluster-07
    This is machine 6 of 7  name = bio-cluster-11
    This is machine 4 of 7  name = bio-cluster-06
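
The lines above are not in rank order (rank 4 is printed last) because each process writes to standard output independently. When ordered output matters, one option is to sort on the rank column after the fact. A sketch, with the hypothetical helper name sort_by_rank and three lines copied from the output above as demo input:

```shell
# sort_by_rank: read demo1 output on stdin and order it by rank.
# The rank is the fourth whitespace-separated field of
# "This is machine <rank> of <size>  name = <host>".
sort_by_rank() {
    sort -k4,4n
}

sort_by_rank <<'EOF'
This is machine 5 of 7  name = bio-cluster-07
This is machine 6 of 7  name = bio-cluster-11
This is machine 4 of 7  name = bio-cluster-06
EOF
```

In practice this would be used as mpiexec -n 7 /home/jazz/demo1 | sort_by_rank. The numeric sort key matters once there are ten or more ranks, since a plain text sort would place 10 before 2.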
    

Test 3: run multiple mpds and R-MPI on multiple compute nodes

  • Note: mpdboot uses rsh as its default communication channel. On Debian 4.0, rsh is a symbolic link that resolves, through the alternatives system, to ssh.
    jazz@bio-cluster-12:~$ which rsh
    /usr/bin/rsh
    jazz@bio-cluster-12:~$ ls -al /usr/bin/rsh
    lrwxrwxrwx 1 root root 21 2008-04-09 20:52 /usr/bin/rsh -> /etc/alternatives/rsh
    jazz@bio-cluster-12:~$ ls -al /etc/alternatives/rsh
    lrwxrwxrwx 1 root root 12 2008-05-29 19:04 /etc/alternatives/rsh -> /usr/bin/ssh
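
The two ls calls above follow the symlink chain by hand. A sketch that resolves the whole chain in one step is shown below. It relies on readlink -f, which is part of GNU coreutils (an assumption about the system). resolve_cmd is a hypothetical helper name, and the demo builds a chain of the same shape inside a temporary directory instead of touching /usr/bin.

```shell
# resolve_cmd NAME
# Print the real file that the command NAME ends up at after all symlinks.
resolve_cmd() {
    rc_path=$(command -v "$1") || return 1
    readlink -f "$rc_path"
}

# Demo: rsh -> alternatives_rsh -> ssh, all inside a temporary directory.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\n' > "$demo_dir/ssh"
chmod +x "$demo_dir/ssh"
ln -s "$demo_dir/ssh" "$demo_dir/alternatives_rsh"
ln -s "$demo_dir/alternatives_rsh" "$demo_dir/rsh"
# The subshell keeps the PATH change local; the printed path ends in /ssh.
( PATH="$demo_dir:$PATH"; resolve_cmd rsh )
rm -rf "$demo_dir"
```

On the machine in the transcript above, resolve_cmd rsh would be expected to print /usr/bin/ssh.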
    
Last modified on Jul 2, 2008, 11:50:04 AM