| | 1 | |
| | 2 | == '''''Data Challenge''''' == |
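| | 3 | * (Goal?): First devise a more favorable mechanism and, by design, tune how Torque, GXP, and Gfarm work together, so that data movement is minimized and the best available resources are used to run jobs. In addition, when a resource fails, its work should be handed over quickly to another resource that continues it rather than re-running the job from scratch, so that jobs complete seamlessly and, as a result, finish sooner. |
| | 4 | * Simple test examples |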
| | 5 | * '''$ gxpc e --help''' |
| | 6 | {{{ |
| | 7 | Usage: |
| | 8 | gxpc e [OPTION ...] CMD |
| | 9 | gxpc mw [OPTION ...] CMD |
| | 10 | gxpc ep [OPTION ...] file |
| | 11 | |
| | 12 | Description: |
| | 13 | Execute the command on the selected nodes. |
| | 14 | |
| | 15 | Option (for mw only): |
| | 16 | --master 'command' |
| | 17 | equivalent to e --updown '3:4:command' ... |
| | 18 | if --master is not given, it is equivalent to e --updown 3:4 ... |
| | 19 | |
| | 20 | Options (for e, mw, and ep): |
| | 21 | --withmask,-m MASK |
| | 22 | execute on a set of nodes saved by savemask or pushmask |
| | 23 | --withhostmask,-h HOSTMASK |
| | 24 | execute on a set of nodes whose names match regexp HOSTMASK |
| | 25 | --withhostnegmask,-H HOSTMASK |
| | 26 | execute on a set of nodes whose names do not match regexp HOSTMASK |
| | 27 | --up FD0[:FD1] |
| | 28 | collect output from FD0 of CMD, and output them to FD1 of gxpc. |
| | 29 | if :FD1 is omitted, it is treated as if FD1 == FD0 |
| | 30 | --down FD0[:FD1] |
| | 31 | broadcast input to FD0 of gxpc to FD1 of CMD. |
| | 32 | if :FD1 is omitted, it is treated as if FD1 == FD0 |
| | 33 | --updown FD1:FD2[:MASTER] |
| | 34 | if :MASTER is omitted, collect output from FD1 of CMD, |
| | 35 | and broadcast them to FD2 of CMD. |
| | 36 | if :MASTER is given, run MASTER on the local host, collect |
| | 37 | output from FD1 of CMD, feed them to stdin of the MASTER. |
| | 38 | broadcast stdout of the MASTER to FD2 of CMD. |
| | 39 | --pty |
| | 40 | assign pseudo tty for stdin/stdout/stderr of CMD |
| | 41 | |
| | 42 | By default, |
| | 43 | |
| | 44 | - stdin of gxpc is broadcast to stdin of CMD |
| | 45 | - stdout of CMD is output to stdout of gxpc |
| | 46 | - stderr of CMD is output to stderr of gxpc |
| | 47 | |
| | 48 | This is as if `--down 0 --up 1 --up 2' were specified. In this |
| | 49 | case, stdout/stderr are block-buffered by default. You may need |
| | 50 | to call setbuf in your program or flush stdout/stderr, to display |
| | 51 | CMD's output without delay. --pty overrides this and makes them |
| | 52 | line-buffered (by default). Both stdout and stderr of CMD then go to |
| | 53 | stdout of gxpc (they are merged). CMD's stdout/stderr should appear |
| | 54 | as soon as a newline is written. |
| | 55 | |
| | 56 | See Also: |
| | 57 | smask savemask pushmask rmask restoremask popmask |
| | 58 | }}} |
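The buffering note in the help text above can be illustrated with a small sketch. It assumes CMD is a Python script (the help text itself only mentions setbuf and flushing in general), and the function name `report_progress` is hypothetical. When stdout is a pipe rather than a terminal, output is block-buffered, so each line is flushed explicitly:

```python
import sys


def report_progress(steps, out=sys.stdout):
    """Write one line per step and flush it immediately.

    Without the explicit flush, output sent through a pipe (as under
    `gxpc e` without --pty) may sit in a block buffer until the buffer
    fills or the program exits.
    """
    for i in range(1, steps + 1):
        out.write(f"step {i} done\n")
        out.flush()  # push the line out now instead of waiting for the buffer


if __name__ == "__main__":
    report_progress(3)
```

Passing the stream as a parameter is only a convenience for trying the function against an in-memory buffer; a real script would normally write to `sys.stdout` directly.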
| | 59 | * $ '''gxpc use --help''' |
| | 60 | {{{ |
| | 61 | Usage: |
| | 62 | gxpc use [--as USER] RSH_NAME SRC [TARGET] |
| | 63 | gxpc use --delete [--as USER] RSH_NAME SRC [TARGET] |
| | 64 | gxpc use |
| | 65 | gxpc use --delete [idx] |
| | 66 | |
| | 67 | Description: |
| | 68 | |
| | 69 | Configure rsh-like commands used to login targets matching a |
| | 70 | particular pattern from hosts matching a particular pattern. The |
| | 71 | typical usage is `gxpc use RSH_NAME SRC TARGET', which says gxp can |
| | 72 | use an rsh-like command RSH_NAME for SRC to login TARGET. gxpc |
| | 73 | remembers these facts to decide which hosts should issue which |
| | 74 | commands to login which hosts, when explore command is issued. See the |
| | 75 | section of explore command and the tutorial section of the manual. |
| | 76 | |
| | 77 | Examples: |
| | 78 | gxpc use ssh abc000.def.com pqr.xyz.ac.jp |
| | 79 | gxpc use ssh abc000 pqr |
| | 80 | gxpc use ssh abc |
| | 81 | gxpc use rsh abc |
| | 82 | gxpc use --as taue ssh abc000 pqr |
| | 83 | gxpc use qrsh abc |
| | 84 | gxpc use qrsh_host abc |
| | 85 | gxpc use sge abc |
| | 86 | gxpc use torque abc |
| | 87 | |
| | 88 | The first line says that, if gxpc is told to login pqr.xyz.ac.jp by |
| | 89 | explore command, hosts named abc000.def.com can use `ssh' method to do |
| | 90 | so. How it translates into the actual ssh command line can be shown |
| | 91 | by `show_explore' command (try `gxpc help show_explore') and can be |
| | 92 | configured by `rsh' command (try `gxpc help rsh'). |
| | 93 | |
| | 94 | SRC and TARGET are actually regular expressions, so the line like the |
| | 95 | first one can often be written like the second one. The first line |
| | 96 | is equivalent to the second line as long as there is only one host |
| | 97 | beginning with abc000 and there is only one target beginning with pqr. |
| | 98 | In general, the specification: |
| | 99 | |
| | 100 | gxpc use RSH_NAME SRC TARGET |
| | 101 | |
| | 102 | is read: if gxpc is told to login a target matching regular |
| | 103 | expression TARGET, a host matching regular expression SRC can use |
| | 104 | RSH_NAME to do so. |
| | 105 | |
| | 106 | Note that the effect of use command is NOT to specify which target |
| | 107 | gxpc should login, but to specify HOW it can do so, if it is told |
| | 108 | to. It is the role of explore command to specify which target hosts it |
| | 109 | should login. |
| | 110 | |
| | 111 | If the TARGET argument is omitted as in the third line, it is |
| | 112 | treated as if TARGET expression is SRC. That is, the third line |
| | 113 | is equivalent to: |
| | 114 | |
| | 115 | gxpc use ssh abc abc |
| | 116 | |
| | 117 | This is often useful to express that ssh login is possible |
| | 118 | between hosts within a single cluster, which typically have a |
| | 119 | common prefix in their host names. If the traditional rsh command |
| | 120 | is allowed within a single cluster, the fourth line may be useful |
| | 121 | too. |
| | 122 | |
| | 123 | If --as user option is given, login is issued using an explicit user |
| | 124 | name. The fifth line says when gxp attempts to login pqr from abc000, |
| | 125 | the explicit user name `taue' should be given. You do not need this as |
| | 126 | long as the underlying rsh-like command will complement it by a |
| | 127 | configuration file. e.g., ssh will read ~/.ssh/config to complement |
| | 128 | user name used to login a particular host. |
| | 129 | |
| | 130 | qrsh_host uses command qrsh, with an explicit hostname argument |
| | 131 | to login a particular host (i.e., qrsh -l hostname=...). This is |
| | 132 | useful in environments where direct ssh is discouraged or |
| | 133 | disallowed and qrsh is preferred. |
| | 134 | |
| | 135 | qrsh also uses qrsh, but without an explicit hostname. The host |
| | 136 | is selected by the scheduler. Therefore it does not make sense to |
| | 137 | try to specify a particular hostname as TARGET. Thus, the |
| | 138 | effect of the line |
| | 139 | |
| | 140 | gxpc use qrsh abc |
| | 141 | |
| | 142 | is that if targets beginning with abc are given (upon explore command), |
| | 143 | a host beginning with abc will issue qrsh, and get whichever host |
| | 144 | is allocated by the scheduler. |
| | 145 | |
| | 146 | See Also: |
| | 147 | explore conf_explore |
| | 148 | }}} |
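The rule semantics described in the `use` help text can be sketched as follows. This is an illustrative model, not GXP's own code: the names `UseRule`, `make_rule`, and `methods_for` are hypothetical, and matching patterns at the start of the host name (`re.match`) is an assumption based on the help text's "beginning with" wording.

```python
import re
from typing import List, NamedTuple, Optional


class UseRule(NamedTuple):
    rsh_name: str  # e.g. "ssh", "rsh", "qrsh"
    src: str       # regular expression for hosts that issue the login
    target: str    # regular expression for hosts to be logged in to


def make_rule(rsh_name: str, src: str, target: Optional[str] = None) -> UseRule:
    """Model `gxpc use RSH_NAME SRC [TARGET]`: TARGET defaults to SRC."""
    return UseRule(rsh_name, src, src if target is None else target)


def methods_for(rules: List[UseRule], src_host: str, target_host: str) -> List[str]:
    """Return the rsh-like methods src_host may use to reach target_host.

    Assumption: a pattern matches when it matches the start of the name.
    """
    return [
        r.rsh_name
        for r in rules
        if re.match(r.src, src_host) and re.match(r.target, target_host)
    ]


rules = [
    make_rule("ssh", "abc000", "pqr"),  # gxpc use ssh abc000 pqr
    make_rule("rsh", "abc"),            # gxpc use rsh abc  (TARGET = SRC)
]
print(methods_for(rules, "abc000.def.com", "pqr.xyz.ac.jp"))
print(methods_for(rules, "abc000.def.com", "abc017.def.com"))
print(methods_for(rules, "pqr.xyz.ac.jp", "abc000.def.com"))
```

The last call returns an empty list because no rule names pqr as a source: the rules only say how a login may be made, not which hosts to log in to, which remains the job of `explore`.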
| | 149 | * $ '''gxpc explore --help''' |
| | 150 | {{{ |
| | 151 | Usage: |
| | 152 | gxpc explore [OPTIONS] TARGET TARGET ... |
| | 153 | |
| | 154 | Description: |
| | 155 | Login target hosts specified by OPTIONS and TARGET. |
| | 156 | |
| | 157 | Options: |
| | 158 | --dry |
| | 159 | dryrun. only show target hosts |
| | 160 | --hostfile,-h HOSTS_FILE |
| | 161 | give known hosts by file |
| | 162 | --hostcmd HOSTS_CMD |
| | 163 | give known hosts by command output |
| | 164 | --targetfile,-t TARGETS_FILE |
| | 165 | give target hosts by file |
| | 166 | --targetcmd TARGETS_CMD |
| | 167 | give target hosts by command output |
| | 168 | --timeout SECONDS |
| | 169 | specify the time to wait for a remote host's response |
| | 170 | until gxp considers it dead |
| | 171 | --children_soft_limit N (>= 2) |
| | 172 | control the shape of the explore tree. if this value is N, gxpc |
| | 173 | tries to keep the number of children of a single host no more than N, |
| | 174 | unless it is absolutely necessary to reach requested nodes. |
| | 175 | --children_hard_limit N |
| | 176 | control the shape of the explore tree. if this value is N, gxpc |
| | 177 | keeps the number of children of a single host no more than N, in any event. |
| | 178 | --verbosity N (0 <= N <= 2) |
| | 179 | set verbosity level (the larger the more verbose) |
| | 180 | --set_default |
| | 181 | if you set this option, options specified in this explore become the default. |
| | 182 | for example, if you say --timeout 20.0 and --set_default, timeout is set to |
| | 183 | 20.0 in subsequent explores, even if you do not specify --timeout. |
| | 184 | --reset_default |
| | 185 | reset the default values set by --set_default. |
| | 186 | --show_settings |
| | 187 | show effective explore options, considering those given by command line and |
| | 188 | those specified as default values. |
| | 189 | |
| | 190 | Execution of an explore command will conceptually consist of the |
| | 191 | following three steps. |
| | 192 | |
| | 193 | (1) Known Hosts: Know names of existing hosts, either by |
| | 194 | --hostfile, --hostcmd, or a default rule. These are called |
| | 195 | 'known hosts.' -h is an abbreviation of --hostfile. |
| | 196 | |
| | 197 | (2) Targets: Extract login targets from known hosts. They are |
| | 198 | extracted by regular expressions given either by --targetfile, |
| | 199 | --targetcmd, or directly by command line arguments. -t is |
| | 200 | an abbreviation of --targetfile. |
| | 201 | |
| | 202 | (3) gxpc will attempt to login these targets according to the |
| | 203 | rules specified by `use' commands. |
| | 204 | |
| | 205 | Known hosts are specified by a file using --hostfile option, or |
| | 206 | by output of a command using --hostcmd. Formats of the two are |
| | 207 | common and very simple. In the simplest format, a single line |
| | 208 | contains a single hostname. For example, |
| | 209 | |
| | 210 | hongo001 |
| | 211 | hongo002 |
| | 212 | hongo004 |
| | 213 | hongo005 |
| | 214 | hongo006 |
| | 215 | hongo007 |
| | 216 | hongo008 |
| | 217 | |
| | 218 | is a valid HOSTS_FILE. If you specify a command that outputs |
| | 219 | a list of hosts in the above format, the effect is the same |
| | 220 | as giving a file having the list by --hostfile. For example, |
| | 221 | |
| | 222 | --hostcmd 'for i in 1 2 4 5 6 7 8 ; do printf "hongo%03d\n" $i ; done' |
| | 223 | |
| | 224 | has the same effect as giving the above file to --hostfile. |
| | 225 | |
| | 226 | The format of a HOSTS_FILE is actually a so-called /etc/hosts |
| | 227 | format, each line of which may contain several aliases of the |
| | 228 | same host, as well as their IP address. gxpc simply regards them |
| | 229 | as aliases of a single host, without giving any significance to |
| | 230 | which columns they are in. Anything after `#' on each line is a |
| | 231 | comment and ignored. Lines not containing any name, such as |
| | 232 | empty lines, are also ignored. The above simple format is |
| | 233 | obviously a special case of this. |
| | 234 | |
| | 235 | It is sometimes convenient to specify /etc/hosts as an argument |
| | 236 | to --hostfile or to specify `ypcat hosts' as an argument to |
| | 237 | --hostcmd. As a matter of fact, if you do not specify any of |
| | 238 | --hostfile, --hostcmd, --targetfile, and --targetcmd, it is |
| | 239 | treated as if --hostfile /etc/hosts is given. |
| | 240 | |
| | 241 | Login targets are specified by a file using --targetfile option, |
| | 242 | --targetcmd option, or by directly listing targets in the command |
| | 243 | line. Their formats are common and only slightly different from |
| | 244 | HOSTS_FILE. The format of the list of targets in the command |
| | 245 | line is as follows. |
| | 246 | |
| | 247 | TARGET_REGEXP [N] TARGET_REGEXP [N] TARGET_REGEXP [N] ... |
| | 248 | |
| | 249 | where N is an integer and TARGET_REGEXP is any string that cannot |
| | 250 | be parsed as an integer. That is, it is a list of regular |
| | 251 | expressions, each item of which may optionally be followed by an |
| | 252 | integer. The integer indicates how many logins should occur to |
| | 253 | the target matching TARGET_REGEXP. The following is a valid |
| | 254 | command line. |
| | 255 | |
| | 256 | gxpc explore -h hosts_file hongo00 |
| | 257 | |
| | 258 | which says you want to target all hosts beginning with hongo00, |
| | 259 | among all hosts listed in hosts_file. If, for example, you have |
| | 260 | specified by `use' command that the local host can login these |
| | 261 | hosts by ssh, you will reach hosts whose names begin with |
| | 262 | hongo00. If you instead say |
| | 263 | |
| | 264 | gxpc explore -h hosts_file hongo00 2 |
| | 265 | |
| | 266 | you will get two processes on each of these hosts. |
| | 267 | |
| | 268 | If you do not give any of --targetfile, --targetcmd, and command |
| | 269 | line targets, it is treated as if a regular expression matching |
| | 270 | any string is given as the command line target. That is, all |
| | 271 | known hosts are targets. |
| | 272 | |
| | 273 | Format of TARGETS_FILE is simply a list of lines each of which |
| | 274 | is like the list of arguments just explained above. Thus, the |
| | 275 | following is a valid TARGETS_FILE. |
| | 276 | |
| | 277 | hongo00 2 |
| | 278 | chiba0 |
| | 279 | istbs |
| | 280 | sheep |
| | 281 | |
| | 282 | which says you want to get two processes on each host beginning |
| | 283 | with hongo00 and one process on each host beginning with chiba0, |
| | 284 | istbs, or sheep. Just to illustrate the syntax, the same thing |
| | 285 | can be alternatively written with different arrangement into |
| | 286 | lines. |
| | 287 | |
| | 288 | hongo00 2 chiba0 |
| | 289 | istbs sheep |
| | 290 | |
| | 291 | Similar to hosts_file, you may instead specify a command line |
| | 292 | producing the output conforming to the format of TARGETS_FILE. |
| | 293 | |
| | 294 | We have so far explained that target_regexp is matched against a |
| | 295 | pool of known hosts to generate the actual list of targets. |
| | 296 | There is an exception to this. If TARGET_REGEXP does not match |
| | 297 | any host in the pool of known hosts, it is treated as if the |
| | 298 | TARGET_REGEXP is itself a known host. Thus, |
| | 299 | |
| | 300 | gxpc explore hongo000 hongo001 |
| | 301 | |
| | 302 | will login hongo000 and hongo001, because neither hosts_file nor |
| | 303 | hosts_cmd hosts are given so these expressions obviously won't |
| | 304 | match any known host. Using this rule, you may have a file that |
| | 305 | explicitly lists all hosts and solely use it to specify targets |
| | 306 | without using separate HOSTS_FILE. For example, if you have a |
| | 307 | long TARGETS_FILE called targets like: |
| | 308 | |
| | 309 | abc000 |
| | 310 | abc001 |
| | 311 | ... |
| | 312 | abc099 |
| | 313 | def000 |
| | 314 | def001 |
| | 315 | ... |
| | 316 | def049 |
| | 317 | pqr000 |
| | 318 | pqr001 |
| | 319 | ... |
| | 320 | pqr149 |
| | 321 | |
| | 322 | and say |
| | 323 | |
| | 324 | gxpc explore -t targets |
| | 325 | |
| | 326 | you say you want to get these 300 targets using whatever methods |
| | 327 | you specified by `use' commands. |
| | 328 | |
| | 329 | Unlike HOSTS_FILE, an empty line in TARGETS_FILE is treated as if |
| | 330 | it is the end of file. By inserting an empty line, you can easily |
| | 331 | let gxpc ignore the rest of the file. This rule is sometimes |
| | 332 | convenient when targeting a small number of hosts within a |
| | 333 | TARGETS_FILE. |
| | 334 | |
| | 335 | Here are some examples. |
| | 336 | |
| | 337 | 1. |
| | 338 | |
| | 339 | gxpc explore -h hosts_file chiba hongo |
| | 340 | |
| | 341 | Hosts beginning with chiba or hongo in hosts_file |
| | 342 | become the targets. |
| | 343 | |
| | 344 | 2. |
| | 345 | |
| | 346 | gxpc explore -h hosts_file -t targets_file |
| | 347 | |
| | 348 | Hosts matching any regular expression in targets_file become |
| | 349 | the targets. |
| | 350 | |
| | 351 | 3. |
| | 352 | |
| | 353 | gxpc explore -h hosts_file |
| | 354 | |
| | 355 | All hosts in hosts_file become the targets. Equivalent to `gxpc |
| | 356 | explore -h hosts_file .' (`.' is a regular expression matching |
| | 357 | any non-empty string). |
| | 358 | |
| | 359 | 4. |
| | 360 | |
| | 361 | gxpc explore -t targets_file |
| | 362 | |
| | 363 | All hosts in targets_file become the targets. This is similar to the |
| | 364 | previous case, but the file format is different. Note that in |
| | 365 | this case, strings in targets_file won't be matched against |
| | 366 | anything, so they should be literal target names. |
| | 367 | |
| | 368 | 5. |
| | 369 | |
| | 370 | gxpc explore chiba000 chiba001 chiba002 chiba003 |
| | 371 | |
| | 372 | chiba000, chiba001, chiba002, and chiba003 become the targets. |
| | 373 | |
| | 374 | 6. |
| | 375 | |
| | 376 | gxpc explore chiba0 |
| | 377 | |
| | 378 | Equivalent to `gxpc explore -h /etc/hosts chiba0'; that is, hosts |
| | 379 | beginning with chiba0 in /etc/hosts become the targets. Useful |
| | 380 | when you use a single cluster and all necessary hosts are listed |
| | 381 | in that file. |
| | 382 | |
| | 383 | 7. |
| | 384 | |
| | 385 | gxpc explore |
| | 386 | |
| | 387 | Equivalent to `gxpc explore -h /etc/hosts' which is in turn |
| | 388 | equivalent to `gxpc explore -h /etc/hosts .' That is, all hosts |
| | 389 | in /etc/hosts become the targets. This will be rarely useful |
| | 390 | because /etc/hosts typically includes hosts you don't want to |
| | 391 | use. |
| | 392 | }}} |
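The three conceptual steps of `explore` (known hosts, targets, and the fallback for unmatched expressions) can be modelled with a short sketch. It is not GXP's implementation: the function names are hypothetical, treating any all-digit token as the count N is a simplification, and matching a regexp at the start of an alias (`re.match`) is an assumption.

```python
import re
from typing import List, Tuple


def parse_hosts(text: str) -> List[List[str]]:
    """Parse /etc/hosts-style text into one alias list per host.

    Everything after '#' is a comment; lines with no name are skipped.
    All columns (names and IP addresses alike) count as aliases.
    """
    hosts = []
    for line in text.splitlines():
        names = line.split("#", 1)[0].split()
        if names:
            hosts.append(names)
    return hosts


def parse_targets(text: str) -> List[Tuple[str, int]]:
    """Parse TARGETS_FILE-style text into (regexp, count) pairs.

    An integer token sets the count of the preceding regexp (default 1).
    An empty line ends the input, as the help text describes.
    """
    targets: List[Tuple[str, int]] = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            break  # an empty line is treated as end of file
        for tok in tokens:
            if tok.isdigit() and targets:
                targets[-1] = (targets[-1][0], int(tok))
            else:
                targets.append((tok, 1))
    return targets


def resolve(hosts: List[List[str]], targets: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Expand targets against known hosts; unmatched regexps stay literal."""
    result = []
    for regexp, count in targets:
        matched = []
        for aliases in hosts:
            # Report the first alias of this host that the regexp matches.
            hit = next((a for a in aliases if re.match(regexp, a)), None)
            if hit is not None:
                matched.append(hit)
        for name in matched or [regexp]:  # fallback: treat regexp as a host
            result.append((name, count))
    return result


hosts = parse_hosts("hongo001\nhongo002 # spare\n\n192.168.0.7 chiba000\n")
targets = parse_targets("hongo00 2 chiba0\n\nignored\n")
print(resolve(hosts, targets))
print(resolve(hosts, [("sheep", 1)]))
```

In the usage lines, `ignored` never becomes a target because it follows an empty line, and `sheep` is returned as a literal host name because it matches nothing in the pool of known hosts.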
| | 393 | * Example within the chiba cluster (the machines are chiba000-157; chiba000-003 are used here): |
| | 394 | {{{ |
| | 395 | dach005@chiba000:~$ gxpc e hostname # run the command "hostname" through gxpc e; checked first, before logging in to any other host |
| | 396 | chiba000 |
| | 397 | |
| | 398 | dach005@chiba000:~$ gxpc use ssh chiba # ssh may be used to log in to hosts in the chiba cluster, i.e. both source and target hosts are in this cluster |
| | 399 | |
| | 400 | dach005@chiba000:~$ gxpc use # show the list of login rules currently configured |
| | 401 | 0 : use ssh chiba chiba |
| | 402 | |
| | 403 | dach005@chiba000:~$ gxpc explore chiba00[[ 1-3 ]] # explore performs the actual logins to the remote machines; the spaces inside [[ ]] are here only because of wiki syntax, so omit them when typing the command |
| | 404 | reached : chiba001 |
| | 405 | reached : chiba002 |
| | 406 | reached : chiba003 |
| | 407 | |
| | 408 | dach005@chiba000:~$ gxpc e hostname # run it again to see the list of machines that can now be reached |
| | 409 | chiba000 |
| | 410 | chiba003 |
| | 411 | chiba002 |
| | 412 | chiba001 |
| | 413 | |
| | 414 | dach005@chiba000:~$ qsub test_3.sh # Torque without GXP: the command can only be issued on the local machine |
| | 415 | 64.chiba000.intrigger.nii.ac.jp |
| | 416 | |
| | 417 | dach005@chiba000:~$ gxpc e qsub test_3.sh # Torque combined with GXP: the command is issued on the local and remote machines at the same time; note that chiba000-003 are the nodes GXP runs on, not Torque execution nodes |
| | 418 | 67.chiba000.intrigger.nii.ac.jp |
| | 419 | 68.chiba000.intrigger.nii.ac.jp |
| | 420 | 65.chiba000.intrigger.nii.ac.jp |
| | 421 | 66.chiba000.intrigger.nii.ac.jp |
| | 422 | }}} |