| 4 | |
| 5 | == Torque == |
| 6 | |
| 7 | * Write a PBS script |
| 8 | {{{ |
| 9 | jazz@bio037:~$ cat myscript |
| 10 | #!/bin/bash |
| 11 | ### Job name |
| 12 | #PBS -N mytest |
| 13 | ### Output files (stderr and stdout) |
| 14 | #PBS -e /home/jazz/mytest.err |
| 15 | #PBS -o /home/jazz/mytest.log |
| 16 | ###================================================ |
| 17 | # Show directory and time information |
| 18 | echo "Working directory is $PBS_O_WORKDIR" |
| 19 | cd "$PBS_O_WORKDIR" |
| 20 | echo "Running on host $(hostname)" |
| 21 | echo "Time is $(date)" |
| 22 | echo "Directory is $(pwd)" |
| 23 | # Run the program |
| 24 | date |
| 25 | }}} |
| 26 | * Submit the job |
| 27 | {{{ |
| 28 | jazz@bio037:~$ qsub < myscript |
| 29 | 30.bio037 |
| 30 | }}} |
| 31 | * Trace the job's execution |
| 32 | {{{ |
| 33 | jazz@bio037:~$ tracejob 30 |
| 34 | /var/spool/torque/mom_logs/20091015: No matching job records located |
| 35 | |
| 36 | Job: 30.bio037 |
| 37 | |
| 38 | 10/15/2009 00:38:59 S enqueuing into batch, state 1 hop 1 |
| 39 | 10/15/2009 00:38:59 S Job Queued at request of jazz@bio037, owner = jazz@bio037, job name = mytest, queue = |
| 40 | batch |
| 41 | 10/15/2009 00:38:59 S Job Modified at request of Scheduler@bio037 |
| 42 | 10/15/2009 00:38:59 S Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb |
| 43 | resources_used.vmem=0kb resources_used.walltime=00:00:00 |
| 44 | 10/15/2009 00:38:59 L Job Run |
| 45 | 10/15/2009 00:38:59 S Job Run at request of Scheduler@bio037 |
| 46 | 10/15/2009 00:38:59 A queue=batch |
| 47 | 10/15/2009 00:38:59 A user=jazz group=jazz jobname=mytest queue=batch ctime=1255538339 qtime=1255538339 |
| 48 | etime=1255538339 start=1255538339 owner=jazz@bio037 exec_host=bio013/0 |
| 49 | Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 |
| 50 | 10/15/2009 00:38:59 A user=jazz group=jazz jobname=mytest queue=batch ctime=1255538339 qtime=1255538339 |
| 51 | etime=1255538339 start=1255538339 owner=jazz@bio037 exec_host=bio013/0 |
| 52 | Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 session=5366 |
| 53 | end=1255538339 Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb |
| 54 | resources_used.vmem=0kb resources_used.walltime=00:00:00 |
| 55 | 10/15/2009 00:39:07 S Post job file processing error |
| 56 | 10/15/2009 00:39:07 S dequeuing from batch, state COMPLETE |
| 57 | }}} |
| 58 | * For any job, you can use its job ID to look up which hosts ran it; they are listed in the exec_host field |
| 59 | {{{ |
| 60 | etime=1255538339 start=1255538339 owner=jazz@bio037 exec_host=bio013/0 |
| 61 | }}} |
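| 62 | * The "No matching job records located" message in the tracejob output shows that each pbs_mom records the jobs it has run in /var/spool/torque/mom_logs/<date>, where the date is written as YYYYMMDD |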
| 63 | {{{ |
| 64 | /var/spool/torque/mom_logs/20091015 |
| 65 | }}} |
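
The last two steps (reading exec_host from the tracejob output, then locating the pbs_mom log for that day) can be scripted. The following is a minimal sketch, not part of Torque: the helper names `exec_hosts` and `mom_log_path` are hypothetical, and the handling of multi-node jobs assumes that exec_host joins entries with `+` (for example `bio013/0+bio014/0`), which the single-node job above does not demonstrate.

```shell
#!/bin/sh
# Hypothetical helper: read tracejob output on stdin and print each
# distinct execution host once.
exec_hosts() {
    # Keep only the exec_host=... tokens, strip the key, split on '+'
    # (assumed separator for multi-node jobs), then drop the "/N"
    # slot index after each host name.
    grep -o 'exec_host=[^ ]*' \
        | sed -e 's/^exec_host=//' \
        | tr '+' '\n' \
        | sed -e 's|/.*$||' \
        | sort -u
}

# Hypothetical helper: map a date in tracejob's MM/DD/YYYY format to the
# pbs_mom log file for that day. The body runs in a subshell so the
# temporary variables do not leak into the caller.
mom_log_path() (
    mm=${1%%/*}          # text before the first '/'
    rest=${1#*/}         # text after the first '/'
    dd=${rest%%/*}
    yyyy=${rest#*/}
    echo "/var/spool/torque/mom_logs/${yyyy}${mm}${dd}"
)
```

With these defined, `tracejob 30 | exec_hosts` should print `bio013` for the job above, and `mom_log_path 10/15/2009` yields `/var/spool/torque/mom_logs/20091015`, the file to inspect on that host.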