Changes between Version 1 and Version 2 of jazz/09-10-15


Timestamp:
Oct 15, 2009, 12:40:14 AM (15 years ago)
Author:
jazz
Comment:

--

  • jazz/09-10-15

 * [http://www.summit09.ca/ Summit'09] - OGF 27

== Torque ==

 * Write a PBS script
{{{
jazz@bio037:~$ cat myscript
#!/bin/bash
### Job name
#PBS -N mytest
### Output files (stderr and stdout)
#PBS -e /home/jazz/mytest.err
#PBS -o /home/jazz/mytest.log
###================================================
# Show the working directory, host, and time
echo "Working directory is $PBS_O_WORKDIR"
cd "$PBS_O_WORKDIR"
echo "Running on host $(hostname)"
echo "Time is $(date)"
echo "Directory is $(pwd)"
# Run the actual command
date
}}}
 * Submit the job
{{{
jazz@bio037:~$ qsub < myscript
30.bio037
}}}
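
qsub prints the full job ID (`30.bio037` here), while the tracejob call in this note uses only the numeric part. A minimal sketch, assuming bash or another POSIX shell, of stripping the server suffix from a captured ID. `job_number` is a hypothetical helper name for illustration, not part of Torque.

```shell
#!/bin/bash
# job_number: hypothetical helper (not part of Torque).
# Strips everything from the first "." onward, so "30.bio037" becomes "30".
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Possible usage on a submit host (needs a running Torque server):
#   jobid=$(qsub < myscript)
#   tracejob "$(job_number "$jobid")"
```

Capturing the ID at submit time avoids copying it by hand from the terminal.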
 * Trace the job's execution
{{{
jazz@bio037:~$ tracejob 30
/var/spool/torque/mom_logs/20091015: No matching job records located

Job: 30.bio037

10/15/2009 00:38:59  S    enqueuing into batch, state 1 hop 1
10/15/2009 00:38:59  S    Job Queued at request of jazz@bio037, owner = jazz@bio037, job name = mytest, queue =
                          batch
10/15/2009 00:38:59  S    Job Modified at request of Scheduler@bio037
10/15/2009 00:38:59  S    Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb
                          resources_used.vmem=0kb resources_used.walltime=00:00:00
10/15/2009 00:38:59  L    Job Run
10/15/2009 00:38:59  S    Job Run at request of Scheduler@bio037
10/15/2009 00:38:59  A    queue=batch
10/15/2009 00:38:59  A    user=jazz group=jazz jobname=mytest queue=batch ctime=1255538339 qtime=1255538339
                          etime=1255538339 start=1255538339 owner=jazz@bio037 exec_host=bio013/0
                          Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1
10/15/2009 00:38:59  A    user=jazz group=jazz jobname=mytest queue=batch ctime=1255538339 qtime=1255538339
                          etime=1255538339 start=1255538339 owner=jazz@bio037 exec_host=bio013/0
                          Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 session=5366
                          end=1255538339 Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb
                          resources_used.vmem=0kb resources_used.walltime=00:00:00
10/15/2009 00:39:07  S    Post job file processing error
10/15/2009 00:39:07  S    dequeuing from batch, state COMPLETE
}}}
 * For any job, the job ID can be used to find which hosts ran it; they are listed in the exec_host field
{{{
                          etime=1255538339 start=1255538339 owner=jazz@bio037 exec_host=bio013/0
}}}
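
The exec_host field can also be pulled out of the tracejob output instead of being read by eye. A minimal sketch, assuming `grep -o` is available (GNU and BSD grep both provide it). `exec_hosts` is a hypothetical helper name, not a Torque command.

```shell
#!/bin/bash
# exec_hosts: hypothetical helper (not part of Torque).
# Reads tracejob-style text on stdin and prints each distinct exec_host value.
exec_hosts() {
    grep -o 'exec_host=[^[:space:]]*' | cut -d= -f2- | sort -u
}

# Possible usage on the server host:
#   tracejob 30 | exec_hosts
```

`sort -u` is there because tracejob repeats the field in both the start and end accounting records.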
 * The tracejob error message ("No matching job records located") shows that every job a pbs_mom has run is logged on that node under /var/spool/torque/mom_logs/<date>
{{{
/var/spool/torque/mom_logs/20091015
}}}
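
Since the log file is named after the date, its path can be built from a YYYYMMDD string. A minimal sketch, assuming the spool directory shown in the tracejob message (other installations may use a different one). `mom_log_path` is a hypothetical helper name, not part of Torque.

```shell
#!/bin/bash
# mom_log_path: hypothetical helper (not part of Torque).
# Prints the pbs_mom log path for a YYYYMMDD date; defaults to today.
# Assumes the spool directory seen in the tracejob message of this note.
mom_log_path() {
    printf '/var/spool/torque/mom_logs/%s\n' "${1:-$(date +%Y%m%d)}"
}

# Possible usage on the execution host (bio013 for job 30):
#   grep '30.bio037' "$(mom_log_path 20091015)"
```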