Data Challenge
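- (Purpose?): First devise a more advantageous mechanism and, by design, adjust how Torque, GXP, and Gfarm operate, so that data movement is kept as low as possible and jobs run on the best available resources. In addition, when a resource fails, its job should be handed over quickly to another resource that continues the work instead of re-running the job from the start, so that jobs complete seamlessly and finish sooner.
- Simple test example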
- $ gxpc e --help
  Usage:
    gxpc e  [OPTION ...] CMD
    gxpc mw [OPTION ...] CMD
    gxpc ep [OPTION ...] file

  Description:
    Execute the command on the selected nodes.

  Option (for mw only):
    --master 'command'
      equivalent to e --updown '3:4:command' ...
    if --master is not given, it is equivalent to e --updown 3:4 ...

  Options (for e, mw, and ep):
    --withmask,-m MASK
      execute on a set of nodes saved by savemask or pushmask
    --withhostmask,-h HOSTMASK
      execute on a set of nodes whose names match regexp HOSTMASK
    --withhostnegmask,-H HOSTMASK
      execute on a set of nodes whose names do not match regexp HOSTMASK
    --up FD0[:FD1]
      collect output from FD0 of CMD, and output them to FD1 of gxpc.
      if :FD1 is omitted, it is treated as if FD1 == FD0
    --down FD0[:FD1]
      broadcast input to FD0 of gxpc to FD1 of CMD.
      if :FD1 is omitted, it is treated as if FD1 == FD0
    --updown FD1:FD2[:MASTER]
      if :MASTER is omitted, collect output from FD1 of CMD, and broadcast them to FD2 of CMD.
      if :MASTER is given, run MASTER on the local host, collect output from FD1 of CMD, feed them to stdin of the MASTER. broadcast stdout of the MASTER to FD1 of CMD.
    --pty
      assign pseudo tty for stdin/stdout/stderr of CMD

  By default,
    - stdin of gxpc are broadcast to stdin of CMD
    - stdout of CMD are output to stdout of gxpc
    - stderr of CMD are output to stderr of gxpc
  This is as if `--down 0 --up 1 --up 2' are specified. In this case, stdout/stderr are block-buffered by default. You may need to do setbuf in your program or flush stdout/err, to display CMD's output without delay.

  --pty overwrites this and turn them to line-buffered (by default). both stdout/err of CMD now goto stdout of gxpc (they are merged). CMD's stdout/err should appear as soon as they are newlined.

  See Also:
    smask savemask pushmask rmask restoremask popmask
- $ gxpc use --help
  Usage:
    gxpc use [--as USER] RSH_NAME SRC [TARGET]
    gxpc use --delete [--as USER] RSH_NAME SRC [TARGET]
    gxpc use
    gxpc use --delete [idx]

  Description:
    Configure rsh-like commands used to login targets matching a particular pattern from hosts matching a particular pattern. The typical usage is `gxpc use RSH_NAME SRC TARGET', which says gxp can use an rsh-like command RSH_NAME for SRC to login TARGET. gxpc remembers these facts to decide which hosts should issue which commands to login which hosts, when explore command is issued. See the section of explore command and the tutorial section of the manual.

  Examples:
    gxpc use ssh abc000.def.com pqr.xyz.ac.jp
    gxpc use ssh abc000 pqr
    gxpc use ssh abc
    gxpc use rsh abc
    gxpc use --as taue ssh abc000 pqr
    gxpc use qrsh abc
    gxpc use qrsh_host abc
    gxpc use sge abc
    gxpc use torque abc

  The first line says that, if gxpc is told to login pqr.xyz.ac.jp by explore command, hosts named abc000.def.com can use `ssh' method to do so. How it translates into the actual ssh command line can be shown by `show_explore' command (try `gxpc help show_explore') and can be configured by `rsh' command (try `gxpc help rsh').

  SRC and TARGET are actually regular expressions, so the line like the first one can often be written like the second one. The first line is equivalent to the second line as long as there is only one host beginning with abc000 and there is only one target beginning with pqr.

  In general, the specification:
    gxpc use RSH_NAME SRC TARGET
  is read: if gxpc is told to login a target matching regular expression TARGET, a host matching regular expression SRC can use RSH_NAME to do so.

  Note that the effect of use command is NOT to specify which target gxpc should login, but to specify HOW it can do so, if it is told to. It is the role of explore command to specify which target hosts it should login.

  If the TARGET argument is omitted as in the third line, it is treated as if TARGET expression is SRC. That is, the third line is equivalent to:
    gxpc use ssh abc abc
  This is often useful to express that ssh login is possible between hosts within a single cluster, which typically have a common prefix in their host names. If the traditional rsh command is allowed within a single cluster, the fourth line may be useful too.

  If --as user option is given, login is issued using an explicit user name. The fifth line says when gxp attempts to login pqr from abc000, the explicit user name `taue' should be given. You do not need this as long as the underlying rsh-like command will complement it by a configuration file. e.g., ssh will read ~/.ssh/config to complement user name used to login a particular host.

  qrsh_host uses command qrsh, with an explicit hostname argument to login a particular host (i.e., qrsh -l hostname=...). This is useful in environments where direct ssh is discouraged or disallowed and qrsh is preferred.

  qrsh also uses qrsh, but without an explicit hostname. The host is selected by the scheduler. Therefore it does not make sense to try to specify a particular hostname as TARGET. Thus, the effect of the line
    gxpc use qrsh abc
  is if targets beginning with abc is given (upon explore command), a host beginning with abc will issue qrsh, and get whichever host is allocated by the scheduler.

  See Also:
    explore conf_explore
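The reading of `gxpc use RSH_NAME SRC [TARGET]` given in the help text can be modelled locally with a few lines of shell. This is only a sketch of the rule as described, not GXP's own code: `matching_methods` is a hypothetical helper, the rules are fed in as plain text lines, and it assumes that the regular expressions are anchored at the beginning of the host name (the help text speaks of hosts "beginning with" a prefix).

```shell
#!/bin/sh
# matching_methods SRC_HOST TARGET_HOST
#   reads rules "RSH_NAME SRC_REGEXP [TARGET_REGEXP]" on stdin and prints the
#   RSH_NAME of every rule that would let SRC_HOST log in to TARGET_HOST.
#   Hypothetical helper; it does not call gxpc.
matching_methods() {
    awk -v src="$1" -v dst="$2" '
    NF >= 2 {
        # An omitted TARGET is treated as if TARGET were SRC.
        target_re = (NF >= 3) ? $3 : $2
        # Assumption: regexps are anchored at the start of the host name.
        if (src ~ ("^(" $2 ")") && dst ~ ("^(" target_re ")"))
            print $1
    }'
}

# Two rules taken from the examples in the help text.
printf '%s\n' 'ssh abc000 pqr' 'rsh abc' | matching_methods abc000 abc001
```

With these two rules, only the `rsh abc` rule applies to a login from abc000 to abc001, while a login from abc000 to a host beginning with pqr is covered by the `ssh` rule.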
- $ gxpc explore --help
  Usage:
    gxpc explore [OPTIONS] TARGET TARGET ...

  Description:
    Login target hosts specified by OPTIONS and TARGET.

  Options:
    --dry
      dryrun. only show target hosts
    --hostfile,-h HOSTS_FILE
      give known hosts by file
    --hostcmd HOSTS_CMD
      give known hosts by command output
    --targetfile,-t TARGETS_FILE
      give target hosts by file
    --targetcmd TARGETS_CMD
      give target hosts by command output
    --timeout SECONDS
      specify the time to wait for a remote host's response until gxp considers it dead
    --children_soft_limit N (>= 2)
      control the shape of the explore tree. if this value is N, gxpc tries to keep the number of children of a single host no more than N, unless it is absolutely necessary to reach requested nodes.
    --children_hard_limit N
      control the shape of the explore tree. if this value is N, gxpc keeps the number of children of a single host no more than N, in any event.
    --verbosity N (0 <= N <= 2)
      set verbosity level (the larger the more verbose)
    --set_default
      if you set this option, options specified in this explore becomes the default. for example, if you say --timeout 20.0 and --set_default, timeout is set to 20.0 in subsequent explores, even if you do not specify --timeout.
    --reset_default
      reset the default values set by --set_default.
    --show_settings
      show effective explore options, considering those given by command line and those specified as default values.

  Execution of an explore command will conceptually consist of the following three steps.

  (1) Known Hosts: Know names of existing hosts, either by --hostfile, --hostcmd, or a default rule. These are called 'known hosts.' -h is an acronym of --hostfile.
  (2) Targets: Extract login targets from known hosts. They are extracted by regular expressions given either by --targetfile, --targetcmd, or directly by command line arguments. -t is an acronym of --targetfile.
  (3) gxpc will attempt to login these targets according to the rules specified by `use' commands.

  Known hosts are specified by a file using --hostfile option, or by output of a command using --hostcmd. Formats of the two are common and very simple. In the simplest format, a single file contains a single hostname. For example,

    hongo001
    hongo002
    hongo004
    hongo005
    hongo006
    hongo007
    hongo008

  is a valid HOSTS_FILE. If you specify a command that outputs a list of files in the above format, the effect is the same as giving a file having the list by --hostfile. For example,

    --hostcmd 'for i in `seq 1 8` ; do printf "%03d\n" $i ; done'

  has the same effect as giving the above file to --hostfile.

  The format of a HOSTS_FILE is actually a so-called /etc/hosts format, each line of which may contain several aliases of the same host, as well as their IP address. gxpc simply regards them as aliases of a single host, without giving any significance to which columns they are in. Anything after `#' on each line is a comment and ignored. Lines not containing any name, such as empty lines, are also ignored. The above simple format is obviously a special case of this. It is sometimes convenient to specify /etc/hosts as an argument to --hostfile or to specify `ypcat hosts' as an argument to --hostcmd. As a matter of fact, if you do not specify any of --hostfile, --hostcmd, --targetfile, and --targetcmd, it is treated as if --hostfile /etc/hosts is given.

  Login targets are specified by a file using --targetfile option, --targetcmd option, or by directly listing targets in the command line. Format of them are common and only slightly different from HOSTS_FILE. The format of the list of targets in the command line is as follows.

    TARGET_REGEXP [N] TARGET_REGEXP [N] TARGET_REGEXP [N] ...

  where N is an integer and TARGET_REGEXP is any string that cannot be parsed as an integer. That is, it is a list of regular expressions, each item of which may optionally be followed by an integer. The integer indicates how many logins should occur to the target matching TARGET_REGEXP. The following is a valid command line.

    gxpc explore -h hosts_file hongo00

  which says you want to target all hosts beginning with hongo00, among all hosts listed in hosts_file. If, for example, you have specified by `use' command that the local host can login these hosts by ssh, you will reach hosts whose names begin with hongo00. If you instead say

    gxpc explore -h hosts_file hongo00 2

  you will get two processes on each of these hosts. If you do not give any of --targetfile, --targetcmd, and command line targets, it is treated as if a regular expression matching any string is given as the command line target. That is, all known hosts are targets.

  Format of targets_host is simply a list of lines each of which is like the list of arguments just explained above. Thus, the following is a valid TARGETS_FILE.

    hongo00 2
    chiba0
    istbs
    sheep

  which says you want to get two processes on each host beginning with hongo00 and one process on each host beginning with chiba0, istbs, or sheep. Just to illustrate the syntax, the same thing can be alternatively written with different arrangement into lines.

    hongo00 2 chiba0 istbs sheep

  Similar to hosts_file, you may instead specify a command line producing the output conforming to the format of TARGETS_FILE.

  We have so far explained that target_regexp is matched against a pool of known hosts to generate the actual list of targets. There is an exception to this. If TARGET_REGEXP does not match any host in the pool of known hosts, it is treated as if the TARGET_REGEXP is itself a known host. Thus,

    gxpc explore hongo000 hongo001

  will login hongo000 and hongo001, because neither hosts_file nor hosts_cmd hosts are given so these expressions obviously won't match any known host. Using this rule, you may have a file that explicitly lists all hosts and solely use it to specify targets without using separate HOSTS_FILE. For example, if you have a long TARGETS_FILE called targets like:

    abc000
    abc001
    ...
    abc099
    def000
    def001
    ...
    def049
    pqr000
    pqr001
    ...
    pqr149

  and say

    gxpc explore -t targets

  you say you want to get these 300 targets using whatever methods you specified by `use' commands.

  Unlike HOSTS_FILE, an empty line in TARGETS_FILE is treated as if it is the end of file. By inserting an empty line, you can easily let gxpc ignore the rest of the file. This rule is sometimes convenient when targeting a small number of hosts within a TARGETS_FILE.

  Here are some examples.

  1. gxpc explore -h hosts_file chiba hongo
     Hosts beginning with chiba or hongo in hosts_file become the targets.
  2. gxpc explore -h hosts_file -t targets_file
     Hosts matching any regular expression in targets_file become the targets.
  3. gxpc explore -h hosts_file
     All hosts in hosts_file become the targets. Equivalent to `gxpc explore -h hosts_file .' (`.' is a regular expression matching any non-empty string).
  4. gxpc explore -t targets_file
     All hosts in targetfile become the targets. This is similar to the previous case, but the file format is different. Note that in this case, strings in targets_file won't be matched against anything, so they should be literal target names.
  5. gxpc explore chiba000 chiba001 chiba002 chiba003
     chiba000, chiba001, chiba002, and chiba003 become the targets.
  6. gxpc explore chiba0
     Equivalent to `gxpc explore -h /etc/hosts chiba0' which is hosts beginning with chiba0 in /etc/hosts become the targets. Useful when you use a single cluster and all necessary hosts are listed in that file.
  7. gxpc explore
     Equivalent to `gxpc explore -h /etc/hosts' which is in turn equivalent to `gxpc explore -h /etc/hosts .' That is, all hosts in /etc/hosts become the targets. This will be rarely useful because /etc/hosts typically includes hosts you don't want to use.
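The target selection rule in the help text (each target regexp is matched against the known hosts, and a regexp that matches no known host is taken as a host name itself) can be tried out locally with grep. The following is a rough model only, not what gxpc runs internally: `select_targets` is a hypothetical helper, it assumes a hosts file with one name per line, and it assumes the regexp is anchored at the beginning of the name.

```shell
#!/bin/sh
# select_targets HOSTS_FILE REGEXP...
#   prints the known hosts whose names begin with a match of each REGEXP;
#   a REGEXP that matches no known host is printed as a literal host name.
#   Hypothetical helper; it does not call gxpc.
select_targets() {
    hosts_file=$1
    shift
    for re in "$@"; do
        # Assumption: the regexp is anchored at the start of the host name.
        matched=$(grep -E "^($re)" "$hosts_file" || true)
        if [ -n "$matched" ]; then
            printf '%s\n' "$matched"
        else
            # No known host matched: treat the expression as a host name.
            printf '%s\n' "$re"
        fi
    done
}
```

For example, with a hosts file that lists hongo001, hongo002 and chiba000, `select_targets hosts_file hongo00` prints the two hongo hosts, and `select_targets /dev/null hongo000 hongo001` prints both names unchanged, which mirrors the `gxpc explore hongo000 hongo001` case above. The optional repeat count N is not modelled.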
- Example within the chiba cluster (the machines are chiba000 through chiba157; chiba000 through chiba003 are used here):
  (Use gxpc e to run a command, here "hostname". This first check is made before logging in to any other host.)
  dach005@chiba000:~$ gxpc e hostname
  chiba000

  (Declare that ssh can be used to log in to hosts in the chiba cluster, i.e. both the source hosts and the target hosts are in this cluster.)
  dach005@chiba000:~$ gxpc use ssh chiba

  (Show the current list of use rules.)
  dach005@chiba000:~$ gxpc use
  0 : use ssh chiba chiba

  (The explore command is what actually logs in to the remote machines. The range is typed without spaces inside the brackets.)
  dach005@chiba000:~$ gxpc explore chiba00[[1-3]]
  reached : chiba001
  reached : chiba002
  reached : chiba003

  (Run hostname again to see the list of machines that can now be reached.)
  dach005@chiba000:~$ gxpc e hostname
  chiba000
  chiba003
  chiba002
  chiba001

  (When Torque is used without GXP, the command can only be issued on the local machine.)
  dach005@chiba000:~$ qsub test_3.sh
  64.chiba000.intrigger.nii.ac.jp

  (Combined with GXP, the same command is issued on the local and the remote machines at once. Note that chiba000 through chiba003 are the nodes GXP runs on, not Torque's execution nodes.)
  dach005@chiba000:~$ gxpc e qsub test_3.sh
  67.chiba000.intrigger.nii.ac.jp
  68.chiba000.intrigger.nii.ac.jp
  65.chiba000.intrigger.nii.ac.jp
  66.chiba000.intrigger.nii.ac.jp
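The explore help text says that targets may also come from a file (`-t`) or from a command's output (`--targetcmd`). A minimal sketch of a generator for the three host names reached in the session follows; `host_range` is a hypothetical helper written for this page, not part of GXP.

```shell
#!/bin/sh
# host_range PREFIX FIRST LAST
#   prints PREFIX followed by each number from FIRST to LAST, one name per line.
#   Hypothetical helper for building a target list.
host_range() {
    prefix=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        printf '%s%d\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# Prints chiba001, chiba002 and chiba003, one per line.
host_range chiba00 1 3
```

Going by the help text, the output could be saved to a file and passed to `gxpc explore -t`, since names in a targets file that match no known host are taken literally. That combination has not been tried on this cluster.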
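- Testing the gxpc smask and gxpc rmask commands (used to select specific nodes)
- For reference, see this page (link)
- Near the bottom of that page, the post dated October 13, 2007 explains an example that runs MPI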
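- Selection steps
- 1. Send a command that succeeds only on the nodes you want to select.
- 2. Send smask, which masks out the other nodes, where the command did not succeed.
- 3. Use e hostname to confirm exactly which nodes were selected.
- 4. Send rmask to restore the nodes that were masked out.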
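The steps above depend on each node's exit status from the last command. The idea can be modelled locally, without GXP, as a filter over "host status" pairs. `apply_smask` is a hypothetical name and the statuses below are made up for illustration.

```shell
#!/bin/sh
# Local model of the smask idea; this does not call gxpc.
# Input:  lines of "HOST EXIT_STATUS" for the last command on each node.
# Output: the hosts that stay selected, i.e. those whose exit status was 0.
apply_smask() {
    awk '$2 == 0 { print $1 }'
}

# Made-up example: a grep matched on chiba000 and chiba002 (status 0)
# and found nothing on chiba001 and chiba003 (status 1).
printf '%s\n' 'chiba000 0' 'chiba001 1' 'chiba002 0' 'chiba003 1' | apply_smask
```

In this model, step 3 (e hostname) would list only chiba000 and chiba002, and step 4 (rmask) would bring back all four nodes.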
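- Command descriptions
- "gxpc e hostname | grep chiba00[02]" (when only chiba000 and chiba002 are wanted) and "e uptime | awk '{ if ($(NF-2) > 0.5) print "H" }' | grep H": pick out the nodes we want.
- The brackets are typed without spaces; spaces were placed around 02 on the page only because of wiki syntax.
- "gxpc smask": takes the nodes on which the previous command failed and sets the mask on them.
- "gxpc e hostname": a double check that shows which nodes are currently selected.
- "gxpc rmask": resets the mask, returning to the default selection of all reached nodes.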
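In the uptime filter, `$(NF-2)` is the 1-minute load average, which in the usual Linux `uptime` layout carries a trailing comma (for example `0.72,`), so awk may end up comparing it as a string rather than as a number. A sketch of a variant that strips the comma and reports the result through its exit status follows. `is_loaded` is a hypothetical helper and the sample line is invented.

```shell
#!/bin/sh
# is_loaded THRESHOLD
#   reads one line of `uptime` output on stdin and exits 0 if the 1-minute
#   load average (third field from the end) is greater than THRESHOLD,
#   otherwise exits 1. Hypothetical helper.
is_loaded() {
    awk -v limit="$1" '{
        load = $(NF-2)
        sub(/,$/, "", load)              # "0.72," becomes "0.72"
        found = (load + 0 > limit + 0)   # force a numeric comparison
    }
    END { exit(found ? 0 : 1) }'
}

# Invented sample line in the usual Linux uptime layout.
sample=' 10:55:45 up 12 days,  3:02,  2 users,  load average: 0.72, 0.40, 0.31'
if printf '%s\n' "$sample" | is_loaded 0.5; then
    echo H
fi
```

Because the helper returns an exit status directly, the extra `| grep H` step of the original pipeline would not be needed in this variant.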
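- Open question
- The test example on that web page shows smask and rmask working, but when I followed the same steps they had no effect: I could not pick out the nodes I wanted, and the nodes that the previous command did not select (i.e. where the command failed) were not masked either.
- I may have made a mistake somewhere in my test. (To be investigated.)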
Last modified on Jul 7, 2008, 10:55:45 AM