
2009-10-15

Torque

  • Write a PBS script
    jazz@bio037:~$ cat myscript
    #!/bin/bash
    ### Job name
    #PBS -N mytest
    ### Output files (stderr and stdout)
    #PBS -e /home/jazz/mytest.err
    #PBS -o /home/jazz/mytest.log
    ###================================================
    # Show directory and time information
    echo "Working directory is $PBS_O_WORKDIR"
    cd "$PBS_O_WORKDIR"
    echo "Running on host $(hostname)"
    echo "Time is $(date)"
    echo "Directory is $(pwd)"
    # Run the program
    date
    
  • Submit the job
    jazz@bio037:~$ qsub < myscript
    30.bio037
    
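qsub prints the ID of the newly queued job (here 30.bio037), while tracejob is called later with only the numeric part. A minimal sketch of capturing that number in a script, assuming the `<number>.<server>` format shown above; the helper name `job_number` is hypothetical and not part of Torque:

```shell
#!/bin/bash
# job_number is a hypothetical helper: strip everything from the first "."
# onward in a Torque job ID such as "30.bio037", leaving the numeric part.
job_number() {
    echo "${1%%.*}"
}

# Typical use on a host where qsub and tracejob are installed:
#   jobid=$(qsub < myscript)
#   tracejob "$(job_number "$jobid")"

# Demonstration with the job ID from the transcript above:
job_number 30.bio037
```

With the ID from this page, the demonstration line prints 30.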
  • Trace the job's execution
    jazz@bio037:~$ tracejob 30
    /var/spool/torque/mom_logs/20091015: No matching job records located
    
    Job: 30.bio037
    
    10/15/2009 00:38:59  S    enqueuing into batch, state 1 hop 1
    10/15/2009 00:38:59  S    Job Queued at request of jazz@bio037, owner = jazz@bio037, job name = mytest, queue =
                              batch
    10/15/2009 00:38:59  S    Job Modified at request of Scheduler@bio037
    10/15/2009 00:38:59  S    Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb
                              resources_used.vmem=0kb resources_used.walltime=00:00:00
    10/15/2009 00:38:59  L    Job Run
    10/15/2009 00:38:59  S    Job Run at request of Scheduler@bio037
    10/15/2009 00:38:59  A    queue=batch
    10/15/2009 00:38:59  A    user=jazz group=jazz jobname=mytest queue=batch ctime=1255538339 qtime=1255538339
                              etime=1255538339 start=1255538339 owner=jazz@bio037 exec_host=bio013/0
                              Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1
    10/15/2009 00:38:59  A    user=jazz group=jazz jobname=mytest queue=batch ctime=1255538339 qtime=1255538339
                              etime=1255538339 start=1255538339 owner=jazz@bio037 exec_host=bio013/0
                              Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 session=5366
                              end=1255538339 Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb
                              resources_used.vmem=0kb resources_used.walltime=00:00:00
    10/15/2009 00:39:07  S    Post job file processing error
    10/15/2009 00:39:07  S    dequeuing from batch, state COMPLETE
    
  • For any job, you can use its job ID to look up which hosts it ran on; they are listed in the exec_host field
                              etime=1255538339 start=1255538339 owner=jazz@bio037 exec_host=bio013/0
    
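The exec_host lookup can be scripted rather than read by eye. The sketch below filters tracejob output for that field; the helper name `exec_hosts` is hypothetical. It assumes that multi-node jobs join their entries with `+` (for example `a/0+b/0`) and that the part after `/` is a slot index, which this page's single-node example does not confirm.

```shell
#!/bin/bash
# exec_hosts is a hypothetical helper: read tracejob output on stdin and
# print each distinct execution host name once.
# Assumption: entries look like host/slot and are joined with "+".
exec_hosts() {
    grep -o 'exec_host=[^ ]*' | cut -d= -f2 | tr '+' '\n' | cut -d/ -f1 | sort -u
}

# Typical use on the server: tracejob 30 | exec_hosts

# Demonstration on a line copied from the accounting record above:
printf '%s\n' 'etime=1255538339 start=1255538339 owner=jazz@bio037 exec_host=bio013/0' | exec_hosts
```

For the record on this page, the demonstration prints bio013.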
  • The error message shows that each pbs_mom records the jobs it has run in /var/spool/torque/mom_logs/<date>, where <date> is YYYYMMDD
    /var/spool/torque/mom_logs/20091015
    
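Since the log file is named after the date, its path can be built with date(1). A minimal sketch, with a hypothetical helper named `mom_log_path`; the directory is the one shown above, and searching the file for the job ID assumes the pbs_mom log lines mention it. In the example, tracejob on bio037 found no records in its local mom log, presumably because the job ran on bio013, so the search would need to run on that host.

```shell
#!/bin/bash
# mom_log_path is a hypothetical helper: build the pbs_mom log path for a
# date given as YYYYMMDD (default: today), using the directory shown above.
mom_log_path() {
    day=${1:-$(date +%Y%m%d)}
    echo "/var/spool/torque/mom_logs/$day"
}

# Typical use on the execution host (bio013 in the example):
#   grep '30.bio037' "$(mom_log_path 20091015)"

# Demonstration with the date from this page:
mom_log_path 20091015
```

The demonstration prints /var/spool/torque/mom_logs/20091015, the same path that tracejob reported.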
Last modified on Oct 15, 2009, 12:40:14 AM