Version 2 (modified by chwhs, 16 years ago)

--

Data Challenge

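  • (Goal?): First devise a more advantageous mechanism and, through design, tune how Torque, GXP, and Gfarm operate together, so that data movement is kept as low as possible and jobs run on the best available resources. When a resource fails, its job should be handed over promptly to another resource that continues the work rather than re-running it from scratch, so that jobs complete seamlessly and finish sooner.
  • Simple test examples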
    • $ gxpc e --help
      Usage:
        gxpc e  [OPTION ...] CMD
        gxpc mw [OPTION ...] CMD
        gxpc ep [OPTION ...] file
      
      Description:
        Execute the command on the selected nodes.
      
      Option (for mw only):
        --master 'command'
          equivalent to e --updown '3:4:command' ...
        if --master is not given, it is equivalent to e --updown 3:4 ...
      
      Options (for e, mw, and ep):
        --withmask,-m MASK
          execute on a set of nodes saved by savemask or pushmask
        --withhostmask,-h HOSTMASK
          execute on a set of nodes whose names match regexp HOSTMASK
        --withhostnegmask,-H HOSTMASK
          execute on a set of nodes whose names do not match regexp HOSTMASK
        --up FD0[:FD1]
          collect output from FD0 of CMD, and output them to FD1 of gxpc.
          if :FD1 is omitted, it is treated as if FD1 == FD0
        --down FD0[:FD1]
          broadcast input to FD0 of gxpc to FD1 of CMD.
          if :FD1 is omitted, it is treated as if FD1 == FD0
        --updown FD1:FD2[:MASTER]
          if :MASTER is omitted, collect output from FD1 of CMD,
          and broadcast them to FD2 of CMD.
          if :MASTER is given, run MASTER on the local host, collect
          output from FD1 of CMD, feed them to stdin of the MASTER.
          broadcast stdout of the MASTER to FD1 of CMD.
        --pty
          assign pseudo tty for stdin/stdout/stderr of CMD
      
      By default,
      
      - stdin of gxpc are broadcast to stdin of CMD
      - stdout of CMD are output to stdout of gxpc
      - stderr of CMD are output to stderr of gxpc
      
      This is as if `--down 0 --up 1 --up 2' are specified.  In this
      case, stdout/stderr are block-buffered by default.  You may need
      to do setbuf in your program or flush stdout/err, to display
      CMD's output without delay.  --pty overwrites this and turn them
      to line-buffered (by default).  both stdout/err of CMD now goto
      stdout of gxpc (they are merged).  CMD's stdout/err should appear
      as soon as they are newlined.
      
      See Also:
        smask savemask pushmask rmask restoremask popmask
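
The buffering note in the help text above says a program may need to flush stdout/stderr so that its output shows up without delay under `gxpc e`. A minimal sketch of that advice in Python follows. The helper name `report` is hypothetical and not part of GXP; the sketch only illustrates flushing after every line when stdout is not a terminal.

```python
import sys


def report(lines, stream=sys.stdout):
    """Write each line and flush at once, so it is not held in a block buffer."""
    for line in lines:
        stream.write(line + "\n")
        # Without this flush, output sent through a pipe may only appear
        # once the buffer fills or the program exits.
        stream.flush()


if __name__ == "__main__":
    report(["step 1 done", "step 2 done"])
```

Passing `--pty`, as the help text describes, is the alternative when the program itself cannot be changed.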
      
    • $ gxpc use --help
      Usage:
        gxpc use          [--as USER] RSH_NAME SRC [TARGET]
        gxpc use --delete [--as USER] RSH_NAME SRC [TARGET]
        gxpc use
        gxpc use --delete [idx]
      
      Description:
      
        Configure rsh-like commands used to login targets matching a
      particular pattern from hosts matching a particular pattern. The
      typical usage is `gxpc use RSH_NAME SRC TARGET', which says gxp can
      use an rsh-like command RSH_NAME for SRC to login TARGET. gxpc
      remembers these facts to decide which hosts should issue which
      commands to login which hosts, when explore command is issued. See the
      section of explore command and the tutorial section of the manual.
      
      Examples:
        gxpc use           ssh abc000.def.com pqr.xyz.ac.jp
        gxpc use           ssh abc000 pqr
        gxpc use           ssh abc
        gxpc use           rsh abc
        gxpc use --as taue ssh abc000 pqr
        gxpc use qrsh      abc
        gxpc use qrsh_host abc
        gxpc use sge       abc
        gxpc use torque    abc
      
      The first line says that, if gxpc is told to login pqr.xyz.ac.jp by
      explore command, hosts named abc000.def.com can use `ssh' method to do
      so.  How it translates into the actual ssh command line can be shown
      by `show_explore' command (try `gxpc help show_explore') and can be
      configured by `rsh' command (try `gxpc help rsh').
      
      SRC and TARGET are actually regular expressions, so the line like the
      first one can often be written like the second one.  The first line
      is equivalent to the second line as long as there is only one host
      beginning with abc000 and there is only one target beginning with pqr.
      In general, the specification:
      
        gxpc use RSH_NAME SRC TARGET
      
      is read: if gxpc is told to login a target matching regular
      expression TARGET, a host matching regular expression SRC can use
      RSH_NAME to do so.
      
      Note that the effect of use command is NOT to specify which target
      gxpc should login, but to specify HOW it can do so, if it is told
      to. It is the role of explore command to specify which target hosts it
      should login
      
      If the TARGET argument is omitted as in the third line, it is
      treated as if TARGET expression is SRC. That is, the third line
      is equivalent to:
      
        gxpc use ssh abc abc
      
      This is often useful to express that ssh login is possible
      between hosts within a single cluster, which typically have a
      common prefix in their host names. If the traditional rsh command
      is allowed within a single cluster, the fourth line may be useful
      too.
      
      If --as user option is given, login is issued using an explicit user
      name. The fifth line says when gxp attempts to login pqr from abc000,
      the explicit user name `taue' should be given. You do not need this as
      long as the underlying rsh-like command will complement it by a
      configuration file. e.g., ssh will read ~/.ssh/config to complement
      user name used to login a particular host.
      
      qrsh_host uses command qrsh, with an explicit hostname argument
      to login a particular host (i.e., qrsh -l hostname=...).  This is
      useful in environments where direct ssh is discouraged or
      disallowed and qrsh is preferred.
      
      qrsh also uses qrsh, but without an explicit hostname. The host
      is selected by the scheduler. Therefore it does not make sense to
      try to specify a particular hostname as TARGET.  Thus, the
      effect of the line
      
        gxpc use qrsh abc
      
      is if targets beginning with abc is given (upon explore command),
      a host beginning with abc will issue qrsh, and get whichever host
      is allocated by the scheduler.
      
      See Also:
        explore conf_explore
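
The rule "a host matching SRC can use RSH_NAME to login a target matching TARGET" can be modelled in a few lines. This is only a sketch of the rule as the help text states it, not GXP's implementation. The names `make_rule` and `login_methods` are hypothetical, and matching at the start of the host name (`re.match`) is an assumption drawn from the help text's "beginning with" wording.

```python
import re


def make_rule(rsh_name, src, target=None):
    """Model one `gxpc use RSH_NAME SRC [TARGET]` line as a tuple."""
    # Per the help text, an omitted TARGET is treated as if it were SRC.
    return (rsh_name, src, src if target is None else target)


def login_methods(rules, src_host, target_host):
    """Return the RSH_NAMEs that src_host may use to log in to target_host."""
    # Assumption: SRC and TARGET are regular expressions matched from the
    # beginning of the host name.
    return [name for name, src, target in rules
            if re.match(src, src_host) and re.match(target, target_host)]


if __name__ == "__main__":
    rules = [make_rule("ssh", "abc000", "pqr"), make_rule("rsh", "abc")]
    print(login_methods(rules, "abc000.def.com", "pqr.xyz.ac.jp"))
    print(login_methods(rules, "abc001", "abc002"))
```

As in the help text, the rules only say how a login may happen; which targets are actually contacted is decided by `explore`.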
      
    • $ gxpc explore --help
      Usage:
        gxpc explore [OPTIONS] TARGET TARGET ...
      
      Description:
        Login target hosts specified by OPTIONS and TARGET.
      
      Options:
        --dry
          dryrun. only show target hosts
        --hostfile,-h HOSTS_FILE
          give known hosts by file
        --hostcmd HOSTS_CMD
          give known hosts by command output
        --targetfile,-t TARGETS_FILE
          give target hosts by file
        --targetcmd TARGETS_CMD
          give target hosts by command output
        --timeout SECONDS
          specify the time to wait for a remote host's response
          until gxp considers it dead
        --children_soft_limit N (>= 2)
          control the shape of the explore tree. if this value is N, gxpc
          tries to keep the number of children of a single host no more than N,
          unless it is absolutely necessary to reach requested nodes.
        --children_hard_limit N
          control the shape of the explore tree. if this value is N, gxpc
          keeps the number of children of a single host no more than N, in any event.
        --verbosity N (0 <= N <= 2)
          set verbosity level (the larger the more verbose)
        --set_default
          if you set this option, options specified in this explore becomes the default.
          for example, if you say --timeout 20.0 and --set_default, timeout is set to
          20.0 in subsequent explores, even if you do not specify --timeout.
        --reset_default
          reset the default values set by --set_default.
        --show_settings
          show effective explore options, considering those given by command line and
          those specified as default values.
      
      Execution of an explore command will conceptually consist of the
      following three steps.
      
      (1) Known Hosts: Know names of existing hosts, either by
      --hostfile, --hostcmd, or a default rule. These are called
      'known hosts.' -h is an acronym of --hostfile.
      
      (2) Targets: Extract login targets from known hosts. They are
      extracted by regular expressions given either by --targetfile,
      --targetcmd, or directly by command line arguments. -t is
      an acronym of --targetfile.
      
      (3) gxpc will attempt to login these targets according to the
      rules specified by `use' commands.
      
      Known hosts are specified by a file using --hostfile option, or
      by output of a command using --hostcmd. Formats of the two are
      common and very simple. In the simplest format, a single file
      contains a single hostname. For example,
      
         hongo001
         hongo002
         hongo004
         hongo005
         hongo006
         hongo007
         hongo008
      
      is a valid HOSTS_FILE. If you specify a command that outputs
      a list of files in the above format, the effect is the same
      as giving a file having the list by --hostfile. For example,
      
        --hostcmd 'for i in `seq 1 8` ; do printf "%03d\n" $i ; done'
      
      has the same effect as giving the above file to --hostfile.
      
      The format of a HOSTS_FILE is actually a so-called /etc/hosts
      format, each line of which may contain several aliases of the
      same host, as well as their IP address. gxpc simply regards them
      as aliases of a single host, without giving any significance to
      which columns they are in. Anything after `#' on each line is a
      comment and ignored. Lines not containing any name, such as
      empty lines, are also ignored.  The above simple format is
      obviously a special case of this.
      
      It is sometimes convenient to specify /etc/hosts as an argument
      to --hostfile or to specify `ypcat hosts' as an argument to
      --hostcmd. As a matter of fact, if you do not specify any of
      --hostfile, --hostcmd, --targetfile, and --targetcmd, it is
      treated as if --hostfile /etc/hosts is given.
      
      Login targets are specified by a file using --targetfile option,
      --targetcmd option, or by directly listing targets in the command
      line. Format of them are common and only slightly different from
      HOSTS_FILE.  The format of the list of targets in the command
      line is as follows.
      
         TARGET_REGEXP [N] TARGET_REGEXP [N] TARGET_REGEXP [N] ...
      
      where N is an integer and TARGET_REGEXP is any string that cannot
      be parsed as an integer. That is, it is a list of regular
      expressions, each item of which may optionally be followed by an
      integer. The integer indicates how many logins should occur to
      the target matching TARGET_REGEXP. The following is a valid
      command line.
      
        gxpc explore -h hosts_file hongo00
      
      which says you want to target all hosts beginning with hongo00,
      among all hosts listed in hosts_file.  If, for example, you have
      specified by `use' command that the local host can login these
      hosts by ssh, you will reach hosts whose names begin with
      hongo00.  If you instead say
      
        gxpc explore -h hosts_file hongo00 2
      
      you will get two processes on each of these hosts.
      
      If you do not give any of --targetfile, --targetcmd, and command
      line targets, it is treated as if a regular expression matching
      any string is given as the command line target. That is, all
      known hosts are targets.
      
      Format of targets_host is simply a list of lines each of which
      is like the list of arguments just explained above. Thus, the
      following is a valid TARGETS_FILE.
      
        hongo00 2
        chiba0
        istbs
        sheep
      
      which says you want to get two processes on each host beginning
      with hongo00 and one process on each host beginning with chiba0,
      istbs, or sheep. Just to illustrate the syntax, the same thing
      can be alternatively written with different arrangement into
      lines.
      
        hongo00 2 chiba0
        istbs sheep
      
      Similar to hosts_file, you may instead specify a command line
      producing the output conforming to the format of TARGETS_FILE.
      
      We have so far explained that target_regexp is matched against a
      pool of known hosts to generate the actual list of targets.
      There is an exception to this. If TARGET_REGEXP does not match
      any host in the pool of known hosts, it is treated as if the
      TARGET_REGEXP is itself a known host. Thus,
      
        gxpc explore hongo000 hongo001
      
      will login hongo000 and hongo001, because neither hosts_file nor
      hosts_cmd hosts are given so these expressions obviously won't
      match any known host. Using this rule, you may have a file that
      explicitly lists all hosts and solely use it to specify targets
      without using separate HOSTS_FILE. For example, if you have a
      long TARGETS_FILE called targets like:
      
        abc000
        abc001
          ...
        abc099
        def000
        def001
          ...
        def049
        pqr000
        pqr001
          ...
        pqr149
      
      and say
      
        gxpc explore -t targets
      
      you say you want to get these 300 targets using whatever methods
      you specified by `use' commands.
      
      Unlike HOSTS_FILE, an empty line in TARGETS_FILE is treated as if
      it is the end of file. By inserting an empty line, you can easily
      let gxpc ignore the rest of the file. This rule is sometimes
      convenient when targeting a small number of hosts within a
      TARGETS_FILE.
      
      Here are some examples.
      
      1.
      
        gxpc explore -h hosts_file chiba hongo
      
      Hosts beginning with chiba or hongo in hosts_file
      become the targets.
      
      2.
      
        gxpc explore -h hosts_file -t targets_file
      
      Hosts matching any regular expression in targets_file become
      the targets.
      
      3.
      
        gxpc explore -h hosts_file
      
      All hosts in hosts_file become the targets.  Equivalent to `gxpc
      explore -h hosts_file .'  (`.' is a regular expression matching
      any non-empty string).
      
      4.
      
        gxpc explore -t targets_file
      
      All hosts in targetfile become the targets. This is similar to the
      previous case, but the file format is different.  Note that in
      this case, strings in targets_file won't be matched against
      anything, so they should be literal target names.
      
      5.
      
        gxpc explore chiba000 chiba001 chiba002 chiba003
      
      chiba000, chiba001, chiba002, and chiba003 become the targets.
      
      6.
      
        gxpc explore chiba0
      
      Equivalent to `gxpc explore -h /etc/hosts chiba0' which is hosts
      beginning with chiba0 in /etc/hosts become the targets. Useful
      when you use a single cluster and all necessary hosts are listed
      in that file.
      
      7.
      
        gxpc explore
      
      Equivalent to `gxpc explore -h /etc/hosts' which is in turn
      equivalent to `gxpc explore -h /etc/hosts .'  That is, all hosts
      in /etc/hosts become the targets.  This will be rarely useful
      because /etc/hosts typically includes hosts you don't want to
      use.
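
The three conceptual steps of `explore` described above (known hosts, targets, matching) can be sketched as a small model. This is not GXP's code. The function names are hypothetical, matching from the start of the name (`re.match`) is an assumption based on the "beginning with" wording, and reporting the first matching alias of a host is a simplification chosen for the sketch.

```python
import re


def parse_hosts(text):
    """Parse /etc/hosts-style text into one list of aliases per host."""
    hosts = []
    for line in text.splitlines():
        names = line.split("#", 1)[0].split()  # drop the comment, split columns
        if names:                              # lines without any name are ignored
            hosts.append(names)
    return hosts


def parse_targets(text):
    """Parse TARGETS_FILE-style text into (regexp, count) pairs."""
    targets = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            break                              # an empty line acts as end of file
        expect_count = False
        for tok in tokens:
            if expect_count and tok.isdigit():
                targets[-1] = (targets[-1][0], int(tok))
                expect_count = False
            else:
                targets.append((tok, 1))       # one login unless a count follows
                expect_count = True
    return targets


def resolve(targets, hosts):
    """Expand (regexp, count) pairs against the known hosts."""
    resolved = []
    for regexp, count in targets:
        matched = []
        for names in hosts:
            hit = next((n for n in names if re.match(regexp, n)), None)
            if hit is not None:
                matched.append(hit)
        # If nothing matches, the regexp itself is treated as a known host.
        resolved.extend((name, count) for name in (matched or [regexp]))
    return resolved


if __name__ == "__main__":
    hosts = parse_hosts("hongo001\nhongo002\nhongo004\n")
    print(resolve(parse_targets("hongo00 2\nsheep\n"), hosts))
```

The fallback in `resolve` is what makes `gxpc explore hongo000 hongo001` work without any hosts file, as the help text explains.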
      
    • Example on the chiba cluster (hosts chiba000-157 exist; chiba000-003 are used here):
      $ dach005@chiba000:~$ gxpc e hostname (use gxpc e to run the command "hostname"; checked first, before logging in to any other hosts)
      chiba000  
      
      $ dach005@chiba000:~$ gxpc use ssh chiba (declares that ssh may be used to log in to hosts in the chiba cluster; both the source hosts and the target hosts are in this cluster)
      
      $ dach005@chiba000:~$ gxpc use (shows the list of login rules currently configured)
      0 : use ssh chiba chiba
      
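      $ dach005@chiba000:~$ gxpc explore chiba00[[ 1-3 ]] (the explore command is what actually logs in to the remote hosts) (the spaces inside the brackets are only there because of wiki syntax; omit them when typing the command)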
      reached : chiba001
      reached : chiba002
      reached : chiba003
      
      $ dach005@chiba000:~$ gxpc e hostname (run it again to see the list of hosts we can now reach)
      chiba000
      chiba003
      chiba002
      chiba001
      
      $ dach005@chiba000:~$ qsub test_3.sh (with Torque alone, without GXP, the command can only be issued on the local host)
      64.chiba000.intrigger.nii.ac.jp
      
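      $ dach005@chiba000:~$ gxpc e qsub test_3.sh (with GXP in front of Torque, the command is issued on the local host and the remote hosts at the same time --> chiba000-003 are GXP's execution nodes, not Torque's execution nodes)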
      67.chiba000.intrigger.nii.ac.jp
      68.chiba000.intrigger.nii.ac.jp
      65.chiba000.intrigger.nii.ac.jp
      66.chiba000.intrigger.nii.ac.jp
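
Judging from the transcript above, `chiba00[[1-3]]` stands for chiba001, chiba002, and chiba003. The sketch below expands that notation in Python so the intended host list can be checked before running `explore`. It is not GXP's parser. The function name `expand` is hypothetical, only the first `[[a-b]]` range in a pattern is handled, and zero-padding to the width of the first number is an assumption.

```python
import re

# Matches one [[a-b]] range; optional spaces are allowed, as on the wiki page.
RANGE = re.compile(r"\[\[\s*(\d+)\s*-\s*(\d+)\s*\]\]")


def expand(pattern):
    """Expand the first [[a-b]] range in pattern into a list of host names."""
    m = RANGE.search(pattern)
    if m is None:
        return [pattern]                 # nothing to expand
    lo, hi = m.group(1), m.group(2)
    width = len(lo)                      # assumption: pad to the width of `a`
    head, tail = pattern[:m.start()], pattern[m.end():]
    return [f"{head}{i:0{width}d}{tail}" for i in range(int(lo), int(hi) + 1)]


if __name__ == "__main__":
    print(expand("chiba00[[1-3]]"))
```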
      
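  • Testing the gxpc smask and gxpc rmask commands (used to select specific nodes)
    • For reference, see this URL (click me)
      • Near the bottom of that page (the entry dated October 13, 2007) there is a worked example of running MPI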
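    • Selection steps
      • 1. Issue a command that succeeds on the nodes you want to select
      • 2. Issue smask, so that the other nodes, where the command did not succeed, are masked out
      • 3. Use e hostname to confirm exactly which nodes were selected
      • 4. Issue rmask to restore the nodes that were masked out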
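    • Command notes
      • "gxpc e hostname | grep chiba00[02]" (when only chiba000 and chiba002 are wanted; the character class [02] matches 0 or 2, not 1) and "e uptime | awk '{ if ($(NF-2) > 0.5) print "H" }' | grep H": these pick out the nodes we want.
      • "gxpc smask": masks out (sets the mask on) the nodes where the previous command failed, leaving the others selected.
      • "gxpc e hostname": a double check that shows which nodes are currently selected.
      • "gxpc rmask": resets the mask, going back to the default selection of all reached nodes.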
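    • Open question
      • The test example on that page shows smask and rmask clearly working, but when I followed the same steps they had no effect: I could not pick out the nodes I wanted, and the nodes not selected by the previous command (that is, where the command failed) were not masked out.
      • I may have made a mistake in my test. (To be checked)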
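
The four selection steps listed above can be modelled as a small state machine, which makes the expected behaviour explicit when comparing against a real session. This is a sketch of the described behaviour only, not GXP's implementation. The class `MaskModel` and its methods are hypothetical, and the `succeeds` callback stands in for each node's exit status.

```python
class MaskModel:
    """Toy model of the smask / rmask workflow described in the steps above."""

    def __init__(self, nodes):
        self.nodes = list(nodes)       # every reached node
        self.selected = set(nodes)     # nodes that run the next command
        self.last_ok = set(nodes)      # nodes where the last command succeeded

    def e(self, succeeds):
        """Run a command on the selected nodes and remember where it succeeded."""
        ran = [n for n in self.nodes if n in self.selected]
        self.last_ok = {n for n in ran if succeeds(n)}
        return ran

    def smask(self):
        """Keep only the nodes where the last command succeeded."""
        self.selected = set(self.last_ok)

    def rmask(self):
        """Go back to selecting every reached node."""
        self.selected = set(self.nodes)


if __name__ == "__main__":
    m = MaskModel(["chiba000", "chiba001", "chiba002", "chiba003"])
    m.e(lambda n: n in {"chiba000", "chiba002"})   # step 1
    m.smask()                                      # step 2
    print(m.e(lambda n: True))                     # step 3
    m.rmask()                                      # step 4
    print(m.e(lambda n: True))
```

One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: typed unquoted, `gxpc e hostname | grep chiba00[02]` lets the local shell pipe gxpc's merged output into a single local grep, so every node runs only `hostname` and succeeds. In the model above that corresponds to a `succeeds` callback that returns True everywhere, after which smask masks nothing. Quoting the whole pipeline as one CMD argument would make the filter run on each node.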