| 1 | |
| 2 | == '''''Data Challenge''''' == |
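| 3 | * (Goal?): First devise a more advantageous mechanism and, through design, tune how Torque, GXP, and Gfarm operate, so as to minimize data movement and run jobs on the best available resources. When a resource fails, another resource should take over its work as quickly as possible instead of re-running the job, so that jobs complete seamlessly and finish sooner. |
| 4 | * Simple test examples |
| 5 | * $ '''gxpc e --help''' |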
| 6 | {{{ |
| 7 | Usage: |
| 8 | gxpc e [OPTION ...] CMD |
| 9 | gxpc mw [OPTION ...] CMD |
| 10 | gxpc ep [OPTION ...] file |
| 11 | |
| 12 | Description: |
| 13 | Execute the command on the selected nodes. |
| 14 | |
| 15 | Option (for mw only): |
| 16 | --master 'command' |
| 17 | equivalent to e --updown '3:4:command' ... |
| 18 | if --master is not given, it is equivalent to e --updown 3:4 ... |
| 19 | |
| 20 | Options (for e, mw, and ep): |
| 21 | --withmask,-m MASK |
| 22 | execute on a set of nodes saved by savemask or pushmask |
| 23 | --withhostmask,-h HOSTMASK |
| 24 | execute on a set of nodes whose names match regexp HOSTMASK |
| 25 | --withhostnegmask,-H HOSTMASK |
| 26 | execute on a set of nodes whose names do not match regexp HOSTMASK |
| 27 | --up FD0[:FD1] |
| 28 | collect output from FD0 of CMD, and output them to FD1 of gxpc. |
| 29 | if :FD1 is omitted, it is treated as if FD1 == FD0 |
| 30 | --down FD0[:FD1] |
| 31 | broadcast input to FD0 of gxpc to FD1 of CMD. |
| 32 | if :FD1 is omitted, it is treated as if FD1 == FD0 |
| 33 | --updown FD1:FD2[:MASTER] |
| 34 | if :MASTER is omitted, collect output from FD1 of CMD, |
| 35 | and broadcast them to FD2 of CMD. |
| 36 | if :MASTER is given, run MASTER on the local host, collect |
| 37 | output from FD1 of CMD, feed them to stdin of the MASTER. |
| 38 | broadcast stdout of the MASTER to FD1 of CMD. |
| 39 | --pty |
| 40 | assign pseudo tty for stdin/stdout/stderr of CMD |
| 41 | |
| 42 | By default, |
| 43 | |
| 44 | - stdin of gxpc are broadcast to stdin of CMD |
| 45 | - stdout of CMD are output to stdout of gxpc |
| 46 | - stderr of CMD are output to stderr of gxpc |
| 47 | |
| 48 | This is as if `--down 0 --up 1 --up 2' are specified. In this |
| 49 | case, stdout/stderr are block-buffered by default. You may need |
| 50 | to do setbuf in your program or flush stdout/err, to display |
| 51 | CMD's output without delay. --pty overwrites this and turn them |
| 52 | to line-buffered (by default). both stdout/err of CMD now goto |
| 53 | stdout of gxpc (they are merged). CMD's stdout/err should appear |
| 54 | as soon as they are newlined. |
| 55 | |
| 56 | See Also: |
| 57 | smask savemask pushmask rmask restoremask popmask |
| 58 | }}} |
| 59 | * $ '''gxpc use --help''' |
| 60 | {{{ |
| 61 | Usage: |
| 62 | gxpc use [--as USER] RSH_NAME SRC [TARGET] |
| 63 | gxpc use --delete [--as USER] RSH_NAME SRC [TARGET] |
| 64 | gxpc use |
| 65 | gxpc use --delete [idx] |
| 66 | |
| 67 | Description: |
| 68 | |
| 69 | Configure rsh-like commands used to login targets matching a |
| 70 | particular pattern from hosts matching a particular pattern. The |
| 71 | typical usage is `gxpc use RSH_NAME SRC TARGET', which says gxp can |
| 72 | use an rsh-like command RSH_NAME for SRC to login TARGET. gxpc |
| 73 | remembers these facts to decide which hosts should issue which |
| 74 | commands to login which hosts, when explore command is issued. See the |
| 75 | section of explore command and the tutorial section of the manual. |
| 76 | |
| 77 | Examples: |
| 78 | gxpc use ssh abc000.def.com pqr.xyz.ac.jp |
| 79 | gxpc use ssh abc000 pqr |
| 80 | gxpc use ssh abc |
| 81 | gxpc use rsh abc |
| 82 | gxpc use --as taue ssh abc000 pqr |
| 83 | gxpc use qrsh abc |
| 84 | gxpc use qrsh_host abc |
| 85 | gxpc use sge abc |
| 86 | gxpc use torque abc |
| 87 | |
| 88 | The first line says that, if gxpc is told to login pqr.xyz.ac.jp by |
| 89 | explore command, hosts named abc000.def.com can use `ssh' method to do |
| 90 | so. How it translates into the actual ssh command line can be shown |
| 91 | by `show_explore' command (try `gxpc help show_explore') and can be |
| 92 | configured by `rsh' command (try `gxpc help rsh'). |
| 93 | |
| 94 | SRC and TARGET are actually regular expressions, so the line like the |
| 95 | first one can often be written like the second one. The first line |
| 96 | is equivalent to the second line as long as there is only one host |
| 97 | begining with abc000 and there is only one target beginning with pqr. |
| 98 | In general, the specification: |
| 99 | |
| 100 | gxpc use RSH_NAME SRC TARGET |
| 101 | |
| 102 | is read: if gxpc is told to login a target matching regular |
| 103 | expession TARGET, a host matching regular expression SRC can use |
| 104 | RSH_NAME to do so. |
| 105 | |
| 106 | Note that the effect of use command is NOT to specify which target |
| 107 | gxpc should login, but to specify HOW it can do so, if it is told |
| 108 | to. It is the role of explore command to specify which target hosts it |
| 109 | should login |
| 110 | |
| 111 | If the TARGET argument is omitted as in the third line, it is |
| 112 | treated as if TARGET expression is SRC. That is, the third line |
| 113 | is equivalent to: |
| 114 | |
| 115 | gxpc use ssh abc abc |
| 116 | |
| 117 | This is often useful to express that ssh login is possible |
| 118 | between hosts within a single cluster, which typically have a |
| 119 | common prefix in their host names. If the traditional rsh command |
| 120 | is allowed within a single cluster, the fourth line may be useful |
| 121 | too. |
| 122 | |
| 123 | If --as user option is given, login is issued using an explicit user |
| 124 | name. The fifth line says when gxp attempts to login pqr from abc000, |
| 125 | the explicit user name `taue' should be given. You do not need this as |
| 126 | long as the underlying rsh-like command will complement it by a |
| 127 | configuration file. e.g., ssh will read ~/.ssh/config to complement |
| 128 | user name used to login a particular host. |
| 129 | |
| 130 | qrsh_host uses command qrsh, with an explicit hostname argument |
| 131 | to login a particular host (i.e., qrsh -l hostname=...). This is |
| 132 | useful in environments where direct ssh is discouraged or |
| 133 | disallowed and qrsh is preferred. |
| 134 | |
| 135 | qrsh also uses qrsh, but without an explicit hostname. The host |
| 136 | is selected by the scheduler. Therefore it does not make sense to |
| 137 | try to speficify a particular hostname as TARGET. Thus, the |
| 138 | effect of the line |
| 139 | |
| 140 | gxpc use qrsh abc |
| 141 | |
| 142 | is if targets beginning with abc is given (upon explore command), |
| 143 | a host beginning with abc will issue qrsh, and get whichever host |
| 144 | is allocated by the scheduler. |
| 145 | |
| 146 | See Also: |
| 147 | explore conf_explore |
| 148 | }}} |
| 149 | * $ '''gxpc explore --help''' |
| 150 | {{{ |
| 151 | Usage: |
| 152 | gxpc explore [OPTIONS] TARGET TARGET ... |
| 153 | |
| 154 | Description: |
| 155 | Login target hosts specified by OPTIONS and TARGET. |
| 156 | |
| 157 | Options: |
| 158 | --dry |
| 159 | dryrun. only show target hosts |
| 160 | --hostfile,-h HOSTS_FILE |
| 161 | give known hosts by file |
| 162 | --hostcmd HOSTS_CMD |
| 163 | give known hosts by command output |
| 164 | --targetfile,-t TARGETS_FILE |
| 165 | give target hosts by file |
| 166 | --targetcmd TARGETS_CMD |
| 167 | give target hosts by command output |
| 168 | --timeout SECONDS |
| 169 | specify the time to wait for a remote host's response |
| 170 | until gxp considers it dead |
| 171 | --children_soft_limit N (>= 2) |
| 172 | control the shape of the explore tree. if this value is N, gxpc |
| 173 | tries to keep the number of children of a single host no more than N, |
| 174 | unless it is absolutely necessary to reach requested nodes. |
| 175 | --children_hard_limit N |
| 176 | control the shape of the explore tree. if this value is N, gxpc |
| 177 | keeps the number of children of a single host no more than N, in any event. |
| 178 | --verbosity N (0 <= N <= 2) |
| 179 | set verbosity level (the larger the more verbose) |
| 180 | --set_default |
| 181 | if you set this option, options specified in this explore becomes the default. |
| 182 | for example, if you say --timeout 20.0 and --set_default, timeout is set to |
| 183 | 20.0 in subsequent explores, even if you do not specify --timeout. |
| 184 | --reset_default |
| 185 | reset the default values set by --set_default. |
| 186 | --show_settings |
| 187 | show effective explore options, considering those given by command line and |
| 188 | those specified as default values. |
| 189 | |
| 190 | Execution of an explore command will conceptually consist of the |
| 191 | following three steps. |
| 192 | |
| 193 | (1) Known Hosts: Know names of existing hosts, either by |
| 194 | --hostfile, --hostcmd, or a default rule. These are called |
| 195 | 'known hosts.' -h is an acronym of --hostfile. |
| 196 | |
| 197 | (2) Targets: Extract login targets from known hosts. They are |
| 198 | extracted by regular expressions given either by --targetfile, |
| 199 | --targetcmd, or directly by command line arguments. -t is |
| 200 | an acronym of --targetfile. |
| 201 | |
| 202 | (3) gxpc will attempt to login these targets according to the |
| 203 | rules specified by `use' commands. |
| 204 | |
| 205 | Known hosts are specified by a file using --hostfile option, or |
| 206 | by output of a command using --hostcmd. Formats of the two are |
| 207 | common and very simple. In the simplest format, a single file |
| 208 | contains a single hostname. For example, |
| 209 | |
| 210 | hongo001 |
| 211 | hongo002 |
| 212 | hongo004 |
| 213 | hongo005 |
| 214 | hongo006 |
| 215 | hongo007 |
| 216 | hongo008 |
| 217 | |
| 218 | is a valid HOSTS_FILE. If you specify a command that outputs |
| 219 | a list of files in the above format, the effect is the same |
| 220 | as giving a file having the list by --hostfile. For example, |
| 221 | |
| 222 | --hostcmd 'for i in `seq 1 8` ; do printf "%03d\n" $i ; done' |
| 223 | |
| 224 | has the same effect as giving the above file to --hostfile. |
| 225 | |
| 226 | The format of a HOSTS_FILE is actually a so-called /etc/hosts |
| 227 | format, each line of which may contain several aliases of the |
| 228 | same host, as well as their IP address. gxpc simply regards them |
| 229 | as aliases of a single host, wihtout giving any significance to |
| 230 | which columns they are in. Anything after `#' on each line is a |
| 231 | comment and ignored. Lines not containning any name, such as |
| 232 | empty lines, are also ignored. The above simple format is |
| 233 | obviously a special case of this. |
| 234 | |
| 235 | It is sometimes convenient to specify /etc/hosts as an argument |
| 236 | to --hostfile or to specify `ypcat hosts' as an argument to |
| 237 | --hostcmd. As a matter of fact, if you do not specify any of |
| 238 | --hostfile, --hostcmd, --targetfile, and --targetcmd, it is |
| 239 | treated as if --hostfile /etc/hosts is given. |
| 240 | |
| 241 | Login targets are specified by a file using --targetfile option, |
| 242 | --targetcmd option, or by directly listing targets in the command |
| 243 | line. Format of them are common and only slightly different from |
| 244 | HOSTS_FILE. The format of the list of targets in the command |
| 245 | line is as follows. |
| 246 | |
| 247 | TARGET_REGEXP [N] TARGET_REGEXP [N] TARGET_REGEXP [N] ... |
| 248 | |
| 249 | where N is an integer and TARGET_REGEXP is any string that cannot |
| 250 | be parsed as an integer. That is, it is a list of regular |
| 251 | expressions, each item of which may optionally be followed by an |
| 252 | integer. The integer indicates how many logins should occur to |
| 253 | the target matching TARGET_REGEXP. The following is a valid |
| 254 | command line. |
| 255 | |
| 256 | gxpc explore -h hosts_file hongo00 |
| 257 | |
| 258 | which says you want to target all hosts beginning with hongo00, |
| 259 | among all hosts listed in hosts_file. If, for example, you have |
| 260 | specified by `use' command that the local host can login these |
| 261 | hosts by ssh, you will reach hosts whose names begin with |
| 262 | hongo00. If you instead say |
| 263 | |
| 264 | gxpc explore -h hosts_file hongo00 2 |
| 265 | |
| 266 | you will get two processes on each of these hosts. |
| 267 | |
| 268 | If you do not give any of --targetfile, --targetcmd, and command |
| 269 | line targets, it is treated as if a regular expression mathing |
| 270 | any string is given as the command line target. That is, all |
| 271 | known hosts are targets. |
| 272 | |
| 273 | Format of targets_host is simply a list of lines each of which |
| 274 | is like the list of arguments just explained above. Thus, the |
| 275 | following is a valid TARGETS_FILE. |
| 276 | |
| 277 | hongo00 2 |
| 278 | chiba0 |
| 279 | istbs |
| 280 | sheep |
| 281 | |
| 282 | which says you want to get two processes on each host beginning |
| 283 | with hongo00 and one process on each host beginning with chiba0, |
| 284 | istbs, or sheep. Just to illustrate the syntax, the same thing |
| 285 | can be alternatively written with different arrangement into |
| 286 | lines. |
| 287 | |
| 288 | hongo00 2 chiba0 |
| 289 | istbs sheep |
| 290 | |
| 291 | Similar to hosts_file, you may instead specify a command line |
| 292 | producing the output conforming to the format of TARGETS_FILE. |
| 293 | |
| 294 | We have so far explained that target_regexp is matched against a |
| 295 | pool of known hosts to generate the actual list of targets. |
| 296 | There is an exception to this. If TARGET_REGEXP does not match |
| 297 | any host in the pool of known hosts, it is treated as if the |
| 298 | TARGET_REGEXP is itself a known host. Thus, |
| 299 | |
| 300 | gxpc explore hongo000 hongo001 |
| 301 | |
| 302 | will login hongo000 and hongo001, because neither hosts_file nor |
| 303 | hosts_cmd hosts are given so these expressions obviously won't |
| 304 | match any known host. Using this rule, you may have a file that |
| 305 | explicitly lists all hosts and solely use it to specify targets |
| 306 | without using separate HOSTS_FILE. For example, if you have a |
| 307 | long TARGETS_FILE called targets like: |
| 308 | |
| 309 | abc000 |
| 310 | abc001 |
| 311 | ... |
| 312 | abc099 |
| 313 | def000 |
| 314 | def001 |
| 315 | ... |
| 316 | def049 |
| 317 | pqr000 |
| 318 | pqr001 |
| 319 | ... |
| 320 | pqr149 |
| 321 | |
| 322 | and say |
| 323 | |
| 324 | gxpc explore -t targets |
| 325 | |
| 326 | you say you want to get these 300 targets using whatever methods |
| 327 | you specified by `use' commands. |
| 328 | |
| 329 | Unlike HOSTS_FILE, an empty line in TARGETS_FILE is treated as if |
| 330 | it is the end of file. By inserting an empty line, you can easily |
| 331 | let gxpc ignore the rest of the file. This rule is sometimes |
| 332 | convenient when targeting a small number of hosts within a |
| 333 | TARGETS_FILE. |
| 334 | |
| 335 | Here are some examples. |
| 336 | |
| 337 | 1. |
| 338 | |
| 339 | gxpc explore -h hosts_file chiba hongo |
| 340 | |
| 341 | Hosts beginning with chiba or hongo in hosts_file |
| 342 | become the targets. |
| 343 | |
| 344 | 2. |
| 345 | |
| 346 | gxpc explore -h hosts_file -t targets_file |
| 347 | |
| 348 | Hosts matching any regular expression in targets_file become |
| 349 | the targets. |
| 350 | |
| 351 | 3. |
| 352 | |
| 353 | gxpc explore -h hosts_file |
| 354 | |
| 355 | All hosts in hosts_file become the targets. Equivalent to `gxpc |
| 356 | explore -h hosts_file .' (`.' is a regular expression mathing |
| 357 | any non-empty string). |
| 358 | |
| 359 | 4. |
| 360 | |
| 361 | gxpc explore -t targets_file |
| 362 | |
| 363 | All hosts in targetfile become the targets. This is simiar to the |
| 364 | previous case, but the file format is different. Note that in |
| 365 | this case, strings in targets_file won't be matched against |
| 366 | anything, so they should be literal target names. |
| 367 | |
| 368 | 5. |
| 369 | |
| 370 | gxpc explore chiba000 chiba001 chiba002 chiba003 |
| 371 | |
| 372 | chiba000, chiba001, chiba002, and chiba003 become the targets. |
| 373 | |
| 374 | 6. |
| 375 | |
| 376 | gxpc explore chiba0 |
| 377 | |
| 378 | Equivalent to `gxpc explore -h /etc/hosts chiba0' which is hosts |
| 379 | beginning with chiba0 in /etc/hosts become the targets. Useful |
| 380 | when you use a single cluster and all necessary hosts are listed |
| 381 | in that file. |
| 382 | |
| 383 | 7. |
| 384 | |
| 385 | gxpc explore |
| 386 | |
| 387 | Equivalent to `gxpc explore -h /etc/hosts' which is in turn |
| 388 | equivalent to `gxpc explore -h /etc/hosts .' That is, all hosts |
| 389 | in /etc/hosts become the targets. This will be rarely useful |
| 390 | because /etc/hosts typically includes hosts you don't want to |
| 391 | use. |
| 392 | }}} |
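The target-selection rules described in the help text above can be sketched in Python. This is only an illustration of the documented behavior, not GXP's actual implementation: the function names `parse_hosts`, `parse_targets`, and `resolve` are hypothetical, and anchoring each regular expression at the start of the host name (`re.match`) is an assumption based on the help text's wording "hosts beginning with ...".

```python
import re


def parse_hosts(text):
    """Parse /etc/hosts-style text into a list of alias lists (one per host).

    Anything after '#' is a comment; lines without any name are ignored.
    """
    hosts = []
    for line in text.splitlines():
        names = line.split("#", 1)[0].split()
        if names:
            hosts.append(names)
    return hosts


def parse_targets(text):
    """Parse TARGETS_FILE-style text into (regexp, count) pairs.

    An integer token sets the login count of the preceding regexp.
    An empty line is treated as the end of the file.
    """
    targets = []
    for line in text.splitlines():
        if not line.strip():
            break  # the rest of the file is ignored
        for token in line.split():
            if token.isdigit() and targets:
                regexp, _ = targets[-1]
                targets[-1] = (regexp, int(token))
            else:
                targets.append((token, 1))
    return targets


def resolve(targets, hosts):
    """Match each target regexp against the known hosts.

    A regexp that matches no known host is treated as a literal host name.
    Assumption: matching is anchored at the start of the name.
    """
    result = []
    for regexp, count in targets:
        matched = [names[0] for names in hosts
                   if any(re.match(regexp, name) for name in names)]
        if not matched:
            matched = [regexp]
        result.extend((host, count) for host in matched)
    return result
```

For example, with known hosts `hongo001`, `hongo002`, and `chiba000`, the targets `hongo00 2 chiba0` followed by `istbs sheep` would resolve to two processes on each hongo host, one on `chiba000`, and one each on the literal names `istbs` and `sheep`.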
| 393 | * Example on the chiba cluster (hosts chiba000-157 exist; chiba000-003 are used here): |
| 394 | {{{ |
| 395 | $ dach005@chiba000:~$ gxpc e hostname (use gxpc e to run the command "hostname"; checked first, before logging in to any other hosts) |
| 396 | chiba000 |
| 397 | |
| 398 | $ dach005@chiba000:~$ gxpc use ssh chiba (declares that ssh can be used to log in to hosts in the chiba cluster, i.e. both the source hosts and the target hosts are in this cluster) |
| 399 | |
| 400 | $ dach005@chiba000:~$ gxpc use (shows the current list of ssh login rules) |
| 401 | 0 : use ssh chiba chiba |
| 402 | |
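| 403 | $ dach005@chiba000:~$ gxpc explore chiba00[[ 1-3 ]] (the explore command actually logs in to the remote hosts) (the spaces inside the brackets are due to a syntax issue; omit them when issuing the command) |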
| 404 | reached : chiba001 |
| 405 | reached : chiba002 |
| 406 | reached : chiba003 |
| 407 | |
| 408 | $ dach005@chiba000:~$ gxpc e hostname (run it again to see the list of hosts we can now reach) |
| 409 | chiba000 |
| 410 | chiba003 |
| 411 | chiba002 |
| 412 | chiba001 |
| 413 | |
| 414 | $ dach005@chiba000:~$ qsub test_3.sh (when Torque is used without GXP, commands can only be issued from the local host) |
| 415 | 64.chiba000.intrigger.nii.ac.jp |
| 416 | |
| 417 | $ dach005@chiba000:~$ gxpc e qsub test_3.sh (combined with GXP, the command is issued on the local host and the remote hosts at the same time; note that chiba000-003 are the nodes GXP runs on, not Torque execution nodes) |
| 418 | 67.chiba000.intrigger.nii.ac.jp |
| 419 | 68.chiba000.intrigger.nii.ac.jp |
| 420 | 65.chiba000.intrigger.nii.ac.jp |
| 421 | 66.chiba000.intrigger.nii.ac.jp |
| 422 | }}} |