- Write a PBS script
jazz@bio037:~$ cat myscript
#!/bin/bash
### Job name
#PBS -N mytest
### Output files (stderr and stdout)
#PBS -e /home/jazz/mytest.err
#PBS -o /home/jazz/mytest.log
###================================================
# Show directory and time information
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
# Run the program
date
- Submit the job
jazz@bio037:~$ qsub < myscript
30.bio037
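qsub prints the full job ID (here 30.bio037), while the tracejob call in the next step is given only the numeric part. A minimal sketch of a helper that strips the server suffix; the function name job_number is hypothetical and not part of Torque:

```shell
# Hypothetical helper: print the numeric part of a Torque job ID,
# e.g. "30.bio037" -> "30".
job_number() {
    # Remove everything from the first dot onward.
    printf '%s\n' "${1%%.*}"
}

# Possible usage on the submit host (assumes qsub and tracejob are in PATH):
#   jobid=$(qsub < myscript)
#   tracejob "$(job_number "$jobid")"
```

The helper only does string manipulation, so it can be tried without a running PBS server.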
- Trace the job's execution
jazz@bio037:~$ tracejob 30
/var/spool/torque/mom_logs/20091015: No matching job records located
Job: 30.bio037
10/15/2009 00:38:59 S enqueuing into batch, state 1 hop 1
10/15/2009 00:38:59 S Job Queued at request of jazz@bio037, owner = jazz@bio037, job name = mytest, queue = batch
10/15/2009 00:38:59 S Job Modified at request of Scheduler@bio037
10/15/2009 00:38:59 S Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb
resources_used.vmem=0kb resources_used.walltime=00:00:00
10/15/2009 00:38:59 L Job Run
10/15/2009 00:38:59 S Job Run at request of Scheduler@bio037
10/15/2009 00:38:59 A queue=batch
10/15/2009 00:38:59 A user=jazz group=jazz jobname=mytest queue=batch ctime=1255538339 qtime=1255538339
etime=1255538339 start=1255538339 owner=jazz@bio037 exec_host=bio013/0
Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1
10/15/2009 00:38:59 A user=jazz group=jazz jobname=mytest queue=batch ctime=1255538339 qtime=1255538339
etime=1255538339 start=1255538339 owner=jazz@bio037 exec_host=bio013/0
Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 session=5366
end=1255538339 Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb
resources_used.vmem=0kb resources_used.walltime=00:00:00
10/15/2009 00:39:07 S Post job file processing error
10/15/2009 00:39:07 S dequeuing from batch, state COMPLETE
- For any job, you can use its job ID to look up which hosts it ran on; they are listed in the exec_host field
etime=1255538339 start=1255538339 owner=jazz@bio037 exec_host=bio013/0
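The lookup described above can be sketched as a small filter over tracejob-style text. The function name exec_hosts is hypothetical. The sketch assumes exec_host values have the form host/slot, with multiple entries joined by "+", as in bio013/0 above.

```shell
# Hypothetical helper: read tracejob-style text on stdin and print
# the unique host names found in exec_host=... fields.
exec_hosts() {
    grep -o 'exec_host=[^ ]*' |   # keep only the exec_host fields
        cut -d= -f2 |             # drop the "exec_host=" prefix
        tr '+' '\n' |             # one host/slot entry per line
        sed 's|/.*||' |           # strip the "/slot" suffix
        sort -u                   # report each host once
}

# Possible usage on the server (job 30 is the one traced above):
#   tracejob 30 | exec_hosts
```

Because the accounting records repeat the same exec_host value, the final sort -u keeps the output to one line per host.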
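- The error message on the first line of the tracejob output shows that each pbs_mom records the jobs it has run in /var/spool/torque/mom_logs/<date>, one file per day named YYYYMMDD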
/var/spool/torque/mom_logs/20091015
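To go from a date as printed by tracejob (MM/DD/YYYY) to the matching mom log file, a minimal sketch follows. The function name mom_log_for_date is hypothetical, and the directory is the one shown in the error message above; other installations may use a different spool path.

```shell
# Hypothetical helper: map a tracejob date (MM/DD/YYYY) to the
# pbs_mom log path for that day, e.g. 10/15/2009 -> .../20091015.
mom_log_for_date() {
    _month=${1%%/*}        # text before the first slash
    _rest=${1#*/}          # text after the first slash
    _day=${_rest%%/*}
    _year=${_rest#*/}
    printf '/var/spool/torque/mom_logs/%s%s%s\n' "$_year" "$_month" "$_day"
}

# Possible usage, assuming ssh access to the execution host bio013:
#   ssh bio013 grep 30.bio037 "$(mom_log_for_date 10/15/2009)"
```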