<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="execution_environments.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="monitoring_debugging_stats.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="chapter" title="Chapter 7. Submit Directory Details">
<div class="titlepage"><div><div><h2 class="title">
<a name="submit_directory"></a>Chapter 7. Submit Directory Details</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="submit_directory.php#submit_directory_layout">7.1. Layout</a></span></dt>
<dt><span class="section"><a href="submit_directory.php#condor_dagman_file">7.2. Condor DAGMan File</a></span></dt>
<dt><span class="section"><a href="submit_directory.php#kickstart_xml_record">7.3. Kickstart XML Record</a></span></dt>
<dt><span class="section"><a href="submit_directory.php#jobstate_log_file">7.4. Jobstate.Log File</a></span></dt>
<dt><span class="section"><a href="submit_directory.php#braindump_file">7.5. Braindump File</a></span></dt>
<dt><span class="section"><a href="submit_directory.php#static_bp_file">7.6. Pegasus static.bp File</a></span></dt>
</dl></div>
<p>This chapter describes the contents of the submit directory after Pegasus has
  planned a workflow. Pegasus takes an abstract workflow (DAX) as input and
  generates an executable workflow (DAG) in the submit directory.</p>
<p>It also describes the main files found in that directory: the Condor
  DAGMan file, the kickstart XML record, the jobstate.log file, the braindump
  file and the static.bp file.</p>
<div class="section" title="7.1. Layout">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="submit_directory_layout"></a>7.1. Layout</h2></div></div></div>
<p>Each executable workflow is associated with a submit directory, and
    includes the following:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem">
<p><span class="bold"><strong>&lt;daxlabel-daxindex&gt;.dag
        </strong></span></p>
<p>This is the Condor DAGMan dag file corresponding to the
        executable workflow generated by Pegasus. The dag file describes the
        edges in the DAG and information about the jobs in the DAG. A
        Pegasus-generated .dag file usually contains the following
        information for each job:</p>
<div class="orderedlist"><ol class="orderedlist" type="a">
<li class="listitem"><p>The job submit file for each job in the DAG.</p></li>
<li class="listitem"><p>The post script that is to be invoked when a job completes.
            This is usually located at <span class="bold"><strong>$PEGASUS_HOME/bin/exitpost</strong></span>; it parses the
            kickstart record in the job's <span class="bold"><strong>.out
            file</strong></span> and determines the exitcode.</p></li>
<li class="listitem"><p>JOB RETRY - the number of times the job is to be retried in
            case of failure. In Pegasus, the job postscript exits with a
            non-zero exitcode if it determines that a failure occurred.</p></li>
</ol></div>
</li>
<li class="listitem">
<p><span class="bold"><strong>&lt;daxlabel-daxindex&gt;.dag.dagman.out</strong></span></p>
<p>When a DAG (.dag file) is executed by Condor DAGMan, DAGMan
        writes its output to the <span class="bold"><strong>&lt;daxlabel-daxindex&gt;.dag.dagman.out</strong></span> file.
        This file records the progress of the workflow and can be used to
        determine the status of the workflow. Most Pegasus tools mine the
        <span class="bold"><strong>dagman.out</strong></span> or <span class="bold"><strong>jobstate.log</strong></span> file to determine the progress of the
        workflows.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>&lt;daxlabel-daxindex&gt;.static.bp</strong></span></p>
<p>This file contains NetLogger events that link jobs in the DAG
        with the jobs in the DAX. It is parsed by pegasus-monitord when
        a workflow starts, and its contents are loaded into the Stampede backend.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>&lt;daxlabel-daxindex&gt;.notify</strong></span></p>
<p>This file contains all the notifications that need to be set for
        the workflow and the jobs in the executable workflow. The format of
        the notify file is described <a class="link" href="reference.php#pegasus_notify_file" title="10.7.2. Notify File created by Pegasus in the submit directory">here</a>.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>&lt;daxlabel-daxindex&gt;.replica.store</strong></span></p>
<p>This is a file-based replica catalog that lists only the file
        locations mentioned in the DAX.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>&lt;daxlabel-daxindex&gt;.dot
        </strong></span></p>
<p>Pegasus creates a dot file for the executable workflow in
        addition to the .dag file. This can be used to visualize the
        executable workflow using the dot program.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>&lt;job&gt;.sub</strong></span></p>
<p>Each job in the executable workflow is associated with its own
        submit file. The submit file tells Condor how to execute the
        job.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>&lt;job&gt;.out.00n</strong></span></p>
<p>The stdout of the executable referred to in the job submit file. In
        Pegasus, most jobs are launched via kickstart. Hence, this file
        contains the kickstart XML provenance record that captures runtime
        provenance on the remote node where the job was executed. n varies
        from 1 to N, where N is the JOB RETRY value in the .dag file. The exitpost
        executable is invoked on the &lt;job&gt;.out file and it moves
        &lt;job&gt;.out to &lt;job&gt;.out.00n so that the job's .out
        files are preserved across retries.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>&lt;job&gt;.err.00n</strong></span></p>
<p>The stderr of the executable referred to in the job submit file. In
        Pegasus, most jobs are launched via kickstart. Hence,
        this file contains the stderr of kickstart. It is usually empty unless
        there is an error in kickstart, e.g. kickstart segfaults or the kickstart
        location specified in the submit file is incorrect. The exitpost
        executable is invoked on the <span class="bold"><strong>&lt;job&gt;.out</strong></span> file and it moves <span class="bold"><strong>&lt;job&gt;.err to &lt;job&gt;.err.00n</strong></span> so that
        the job's <span class="bold"><strong>.err</strong></span> files are
        preserved across retries.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>jobstate.log</strong></span></p>
<p>The jobstate.log file is written out by the pegasus-monitord
        daemon that is launched when a workflow is submitted for execution by
        pegasus-run. The pegasus-monitord daemon parses the dagman.out file
        and writes out the jobstate.log that is easier to parse. The
        jobstate.log captures the various states through which a job goes
        during the workflow. There are other monitoring related files that are
        explained in the monitoring <a class="link" href="reference.php#monitoring-files" title="10.8.1.2. Monitoring related files in the workflow directory">chapter</a>.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>braindump.txt </strong></span></p>
<p>Contains information about the Pegasus version, the DAX file, the DAG file
        and the DAX label.</p>
</li>
</ol></div>
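<p>Because the &lt;job&gt;.out.00n files are rotated on every retry, a tool
    that inspects a submit directory often wants only the most recent record
    per job. The following is a minimal Python sketch of that selection; the
    function name <code>latest_out_files</code> and the three-digit suffix
    pattern are assumptions made for illustration, not part of Pegasus.</p>

```python
import re
from collections import defaultdict

# Matches rotated kickstart output names such as "preprocess_ID000001.out.002".
# The three-digit suffix is an assumption based on the <job>.out.00n naming.
ROTATED_OUT = re.compile(r"^(?P<job>.+)\.out\.(?P<attempt>\d{3})$")


def latest_out_files(filenames):
    """Map each job name to its highest-numbered <job>.out.00n file."""
    attempts = defaultdict(list)
    for name in filenames:
        match = ROTATED_OUT.match(name)
        if match:
            attempts[match.group("job")].append((int(match.group("attempt")), name))
    # max() compares the attempt number first, so it picks the latest retry.
    return {job: max(found)[1] for job, found in attempts.items()}


# Hypothetical directory listing used only to exercise the function.
listing = [
    "preprocess_ID000001.sub",
    "preprocess_ID000001.out.001",
    "preprocess_ID000001.out.002",
    "stage_in_local_isi_viz_0.out.001",
    "jobstate.log",
]
latest = latest_out_files(listing)
```

<p>In practice the listing would come from something like
    <code>os.listdir()</code> on the submit directory.</p>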
</div>
<div class="section" title="7.2. Condor DAGMan File">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="condor_dagman_file"></a>7.2. Condor DAGMan File</h2></div></div></div>
<div class="toc"><dl><dt><span class="section"><a href="submit_directory.php#idp9515088">7.2.1. Sample Condor DAG File</a></span></dt></dl></div>
<p>The Condor DAGMan file (.dag) is the input to Condor DAGMan (the
    workflow executor used by Pegasus).</p>
<p>A Pegasus-generated .dag file usually contains the following
    information for each job:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>The job submit file for each job in the DAG.</p></li>
<li class="listitem"><p>The post script that is to be invoked when a job completes. This
        is usually found in <span class="bold"><strong>$PEGASUS_HOME/bin/exitpost</strong></span>; it parses the
        kickstart record in the job's .out file and determines the
        exitcode.</p></li>
<li class="listitem"><p>JOB RETRY - the number of times the job is to be retried in case
        of failure. In Pegasus, the job postscript exits with a non-zero
        exitcode if it determines that a failure occurred.</p></li>
<li class="listitem"><p>The pre script to be invoked before running a job. This is
        usually set for the DAX jobs in the DAX; the pre script is a pegasus-plan
        invocation for the sub-DAX.</p></li>
</ol></div>
<p>The last section of the DAG file lists the relations between the jobs,
    which identify the underlying DAG structure.</p>
<div class="section" title="7.2.1. Sample Condor DAG File">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp9515088"></a>7.2.1. Sample Condor DAG File</h3></div></div></div>
<pre class="programlisting">#####################################################################
# PEGASUS WMS GENERATED DAG FILE
# DAG blackdiamond
# Index = 0, Count = 1
######################################################################

<span class="bold"><strong>JOB</strong></span> create_dir_blackdiamond_0_isi_viz create_dir_blackdiamond_0_isi_viz.sub
<span class="bold"><strong>SCRIPT POST</strong></span> create_dir_blackdiamond_0_isi_viz /pegasus/bin/pegasus-exitcode   \
                                   /submit-dir/create_dir_blackdiamond_0_isi_viz.out
<span class="bold"><strong>RETRY</strong></span> create_dir_blackdiamond_0_isi_viz 3

JOB create_dir_blackdiamond_0_local create_dir_blackdiamond_0_local.sub
SCRIPT POST create_dir_blackdiamond_0_local /pegasus/bin/pegasus-exitcode   \
                                   /submit-dir/create_dir_blackdiamond_0_local.out

JOB pegasus_concat_blackdiamond_0 pegasus_concat_blackdiamond_0.sub

JOB stage_in_local_isi_viz_0 stage_in_local_isi_viz_0.sub
SCRIPT POST stage_in_local_isi_viz_0 /pegasus/bin/pegasus-exitcode   \
                                     /submit-dir/stage_in_local_isi_viz_0.out

JOB chmod_preprocess_ID000001_0 chmod_preprocess_ID000001_0.sub
SCRIPT POST chmod_preprocess_ID000001_0 /pegasus/bin/pegasus-exitcode \
                                        /submit-dir/chmod_preprocess_ID000001_0.out

JOB preprocess_ID000001 preprocess_ID000001.sub
SCRIPT POST preprocess_ID000001 /pegasus/bin/pegasus-exitcode   \
                                         /submit-dir/preprocess_ID000001.out

JOB subdax_black_ID000002 subdax_black_ID000002.sub
<span class="bold"><strong>SCRIPT PRE</strong></span> subdax_black_ID000002 /pegasus/bin/pegasus-plan  \
      -Dpegasus.user.properties=/submit-dir/./dag_1/test_ID000002/pegasus.3862379342822189446.properties\
      -Dpegasus.log.*=/submit-dir/subdax_black_ID000002.pre.log \
      -Dpegasus.dir.exec=app_domain/app -Dpegasus.dir.storage=duncan -Xmx1024 -Xms512\
      --dir /pegasus-features/dax-3.2/dags \
      --relative-dir user/pegasus/blackdiamond/run0005/user/pegasus/blackdiamond/run0005/./dag_1 \
      --relative-submit-dir user/pegasus/blackdiamond/run0005/./dag_1/test_ID000002\
      --basename black --sites dax_site \
      --output local --force  --nocleanup  \
      --verbose  --verbose  --verbose  --verbose  --verbose  --verbose  --verbose \
      --verbose  --monitor  --deferred  --group pegasus --rescue 0 \
      --dax /submit-dir/./dag_1/test_ID000002/dax/blackdiamond_dax.xml 

JOB stage_out_local_isi_viz_0_0 stage_out_local_isi_viz_0_0.sub
SCRIPT POST stage_out_local_isi_viz_0_0 /pegasus/bin/pegasus-exitcode   /submit-dir/stage_out_local_isi_viz_0_0.out

<span class="bold"><strong>SUBDAG EXTERNAL</strong></span> subdag_black_ID000003 /Users/user/Pegasus/work/dax-3.2/black.dag DIR /duncan/test

JOB clean_up_stage_out_local_isi_viz_0_0 clean_up_stage_out_local_isi_viz_0_0.sub
SCRIPT POST clean_up_stage_out_local_isi_viz_0_0 /lfs1/devel/Pegasus/pegasus/bin/pegasus-exitcode  \
                                          /submit-dir/clean_up_stage_out_local_isi_viz_0_0.out

JOB clean_up_preprocess_ID000001 clean_up_preprocess_ID000001.sub
SCRIPT POST clean_up_preprocess_ID000001 /lfs1/devel/Pegasus/pegasus/bin/pegasus-exitcode  \
                                     /submit-dir/clean_up_preprocess_ID000001.out
<span class="bold"><strong>
PARENT create_dir_blackdiamond_0_isi_viz CHILD pegasus_concat_blackdiamond_0</strong></span>
PARENT create_dir_blackdiamond_0_local CHILD pegasus_concat_blackdiamond_0
PARENT stage_out_local_isi_viz_0_0 CHILD clean_up_stage_out_local_isi_viz_0_0
PARENT stage_out_local_isi_viz_0_0 CHILD clean_up_preprocess_ID000001
PARENT preprocess_ID000001 CHILD subdax_black_ID000002
PARENT preprocess_ID000001 CHILD stage_out_local_isi_viz_0_0
PARENT subdax_black_ID000002 CHILD subdag_black_ID000003
PARENT stage_in_local_isi_viz_0 CHILD chmod_preprocess_ID000001_0
PARENT stage_in_local_isi_viz_0 CHILD preprocess_ID000001
PARENT chmod_preprocess_ID000001_0 CHILD preprocess_ID000001
PARENT pegasus_concat_blackdiamond_0 CHILD stage_in_local_isi_viz_0
######################################################################
# End of DAG
######################################################################
</pre>
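<p>To make the structure of the sample concrete, the following Python
    sketch reads the JOB, RETRY and PARENT ... CHILD lines of a DAG file into
    simple data structures. It is a simplified illustration under stated
    assumptions: the function name <code>parse_dag</code> is hypothetical, and
    SCRIPT, SUBDAG EXTERNAL and continuation lines are deliberately
    ignored.</p>

```python
def parse_dag(text):
    """Return (jobs, retries, edges) extracted from the text of a .dag file.

    jobs    maps job name -> submit file
    retries maps job name -> retry count
    edges   is a list of (parent, child) pairs
    """
    jobs, retries, edges = {}, {}, []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # blank line or comment
        if tokens[0] == "JOB" and len(tokens) >= 3:
            jobs[tokens[1]] = tokens[2]
        elif tokens[0] == "RETRY" and len(tokens) >= 3:
            retries[tokens[1]] = int(tokens[2])
        elif tokens[0] == "PARENT" and "CHILD" in tokens:
            split = tokens.index("CHILD")
            # A single line may name several parents and several children.
            for parent in tokens[1:split]:
                for child in tokens[split + 1:]:
                    edges.append((parent, child))
        # Everything else (SCRIPT, SUBDAG EXTERNAL, wrapped arguments) is skipped.
    return jobs, retries, edges


# Abbreviated excerpt of the sample DAG file.
SAMPLE = """\
JOB create_dir_blackdiamond_0_isi_viz create_dir_blackdiamond_0_isi_viz.sub
RETRY create_dir_blackdiamond_0_isi_viz 3
JOB pegasus_concat_blackdiamond_0 pegasus_concat_blackdiamond_0.sub
PARENT create_dir_blackdiamond_0_isi_viz CHILD pegasus_concat_blackdiamond_0
"""
jobs, retries, edges = parse_dag(SAMPLE)
```

<p>The edge list is enough to rebuild the graph, for example to find the
    jobs that have no parents.</p>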
</div>
</div>
<div class="section" title="7.3. Kickstart XML Record">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="kickstart_xml_record"></a>7.3. Kickstart XML Record</h2></div></div></div>
<div class="toc"><dl><dt><span class="section"><a href="submit_directory.php#idp6841024">7.3.1. Reading a Kickstart Output File</a></span></dt></dl></div>
<p>Kickstart is a lightweight C executable that is shipped with the
    Pegasus worker package. All jobs are launched via Kickstart on the remote
    end, unless explicitly disabled at the time of running
    pegasus-plan.</p>
<p>Kickstart does not work with:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>Condor Standard Universe Jobs</p></li>
<li class="listitem"><p>MPI Jobs</p></li>
</ol></div>
<p>Pegasus automatically disables kickstart for the above jobs.</p>
<p>Kickstart captures useful runtime provenance information about the
    job it launches on the remote node, and puts it in an XML record that it
    writes to its own stdout. The stdout appears in the workflow submit
    directory as &lt;job&gt;.out.00n. The following information is captured
    by kickstart and logged:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>The exitcode with which the job it launched exited.</p></li>
<li class="listitem"><p>The duration of the job</p></li>
<li class="listitem"><p>The start time for the job</p></li>
<li class="listitem"><p>The node on which the job ran</p></li>
<li class="listitem"><p>The stdout and stderr of the job</p></li>
<li class="listitem"><p>The arguments with which it launched the job</p></li>
<li class="listitem"><p>The environment that was set for the job before it was
        launched.</p></li>
<li class="listitem"><p>The machine information about the node that the job ran
        on</p></li>
</ol></div>
<p>Of the information above, the job duration and start time can also be
    estimated from the dagman.out file, but only at a coarser granularity.</p>
<div class="section" title="7.3.1. Reading a Kickstart Output File">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp6841024"></a>7.3.1. Reading a Kickstart Output File</h3></div></div></div>
<p>The kickstart file below has the following fields
      highlighted:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>The host on which the job executed and the IP address of that
          host</p></li>
<li class="listitem"><p>The duration and start time of the job. The time here is in
          reference to the clock on the remote node where the job is
          executed.</p></li>
<li class="listitem"><p>The exitcode with which the job exited</p></li>
<li class="listitem"><p>The arguments with which the job was launched.</p></li>
<li class="listitem"><p>The directory in which the job executed on the remote
          site</p></li>
<li class="listitem"><p>The stdout of the job</p></li>
<li class="listitem"><p>The stderr of the job</p></li>
<li class="listitem"><p>The environment of the job</p></li>
</ol></div>
<pre class="programlisting">&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;

&lt;invocation xmlns="http://pegasus.isi.edu/schema/invocation" \
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" \
       xsi:schemaLocation="http://pegasus.isi.edu/schema/invocation http://pegasus.isi.edu/schema/iv-2.0.xsd" \
       version="2.0" start="2009-01-30T19:17:41.157-06:00" duration="0.321" transformation="pegasus::dirmanager"\
      derivation="pegasus::dirmanager:1.0" resource="cobalt" wf-label="scb" \
      wf-stamp="2009-01-30T17:12:55-08:00"<span class="bold"><strong> hostaddr="141.142.30.219" hostname="co-login.ncsa.uiuc.edu"</strong></span>\
      pid="27714" uid="29548" user="vahi" gid="13872" group="bvr" umask="0022"&gt;

<span class="bold"><strong>&lt;mainjob start="2009-01-30T19:17:41.426-06:00" duration="0.052" pid="27783"&gt;
</strong></span>
&lt;usage utime="0.036" stime="0.004" minflt="739" majflt="0" nswap="0" nsignals="0" nvcsw="36" nivcsw="3"/&gt;

<span class="bold"><strong>&lt;status raw="0"&gt;&lt;regular exitcode="0"/&gt;&lt;/status&gt;</strong></span>

&lt;statcall error="0"&gt;
&lt;!-- deferred flag: 0 --&gt;
&lt;file name="/u/ac/vahi/SOFTWARE/pegasus/default/bin/dirmanager"&gt;23212F7573722F62696E2F656E762070&lt;/file&gt;
&lt;statinfo mode="0100755" size="8202" inode="85904615883" nlink="1" blksize="16384" \
    blocks="24" mtime="2008-09-22T18:52:37-05:00" atime="2009-01-30T14:54:18-06:00" \
    ctime="2009-01-13T19:09:47-06:00" uid="29548" user="vahi" gid="13872" group="bvr"/&gt;
&lt;/statcall&gt;

<span class="bold"><strong>&lt;argument-vector&gt;
&lt;arg nr="1"&gt;--create&lt;/arg&gt;
&lt;arg nr="2"&gt;--dir&lt;/arg&gt;
&lt;arg nr="3"&gt;/u/ac/vahi/globus-test/EXEC/vahi/pegasus/scb/run0001&lt;/arg&gt;
&lt;/argument-vector&gt;</strong></span>

&lt;/mainjob&gt;<span class="bold"><strong>

&lt;cwd&gt;/u/ac/vahi/globus-test/EXEC&lt;/cwd&gt;</strong></span>

&lt;usage utime="0.012" stime="0.208" minflt="4232" majflt="0" nswap="0" nsignals="0" nvcsw="15" nivcsw="74"/&gt;
&lt;machine page-size="16384" provider="LINUX"&gt;
&lt;stamp&gt;2009-01-30T19:17:41.157-06:00&lt;/stamp&gt;
&lt;uname system="linux" nodename="co-login" release="2.6.16.54-0.2.5-default" machine="ia64"&gt;#1 SMP Mon Jan 21\
         13:29:51 UTC 2008&lt;/uname&gt;
&lt;ram total="148299268096" free="123371929600" shared="0" buffer="2801664"/&gt;
&lt;swap total="1179656486912" free="1179656486912"/&gt;
&lt;boot idle="1315786.920"&gt;2009-01-15T10:19:50.283-06:00&lt;/boot&gt;
&lt;cpu count="32" speed="1600" vendor=""&gt;&lt;/cpu&gt;
&lt;load min1="3.50" min5="3.50" min15="2.60"/&gt;
&lt;proc total="841" running="5" sleeping="828" stopped="5" vmsize="10025418752" rss="2524299264"/&gt;
&lt;task total="1125" running="6" sleeping="1114" stopped="5"/&gt;
&lt;/machine&gt;
&lt;statcall error="0" id="stdin"&gt;
&lt;!-- deferred flag: 0 --&gt;
&lt;file name="/dev/null"/&gt;
&lt;statinfo mode="020666" size="0" inode="68697" nlink="1" blksize="16384" blocks="0" \
     mtime="2007-05-04T05:54:02-05:00" atime="2007-05-04T05:54:02-05:00" \
   ctime="2009-01-15T10:21:54-06:00" uid="0" user="root" gid="0" group="root"/&gt;
&lt;/statcall&gt;

<span class="bold"><strong>&lt;statcall error="0" id="stdout"&gt;
&lt;temporary name="/tmp/gs.out.s9rTJL" descriptor="3"/&gt;
&lt;statinfo mode="0100600" size="29" inode="203420686" nlink="1" blksize="16384" blocks="128" \
 mtime="2009-01-30T19:17:41-06:00" atime="2009-01-30T19:17:41-06:00"\
 ctime="2009-01-30T19:17:41-06:00" uid="29548" user="vahi" gid="13872" group="bvr"/&gt;
&lt;data&gt;mkdir finished successfully.
&lt;/data&gt;
&lt;/statcall&gt;
&lt;statcall error="0" id="stderr"&gt;
&lt;temporary name="/tmp/gs.err.kobn3S" descriptor="5"/&gt;
&lt;statinfo mode="0100600" size="0" inode="203420689" nlink="1" blksize="16384" blocks="0" \
 mtime="2009-01-30T19:17:41-06:00" atime="2009-01-30T19:17:41-06:00" \
ctime="2009-01-30T19:17:41-06:00" uid="29548" user="vahi" gid="13872" group="bvr"/&gt;
&lt;/statcall&gt;
</strong></span>
&lt;statcall error="0" id="gridstart"&gt;
&lt;!-- deferred flag: 0 --&gt;
&lt;file name="/u/ac/vahi/SOFTWARE/pegasus/default/bin/kickstart"&gt;7F454C46020101000000000000000000&lt;/file&gt;
&lt;statinfo mode="0100755" size="255445" inode="85904615876" nlink="1" blksize="16384" blocks="504" \
  mtime="2009-01-30T18:06:28-06:00" atime="2009-01-30T19:17:41-06:00"\
 ctime="2009-01-30T18:06:28-06:00" uid="29548" user="vahi" gid="13872" group="bvr"/&gt;
&lt;/statcall&gt;
&lt;statcall error="0" id="logfile"&gt;
&lt;descriptor number="1"/&gt;
&lt;statinfo mode="0100600" size="0" inode="53040253" nlink="1" blksize="16384" blocks="0" \
 mtime="2009-01-30T19:17:39-06:00" atime="2009-01-30T19:17:39-06:00" \
ctime="2009-01-30T19:17:39-06:00" uid="29548" user="vahi" gid="13872" group="bvr"/&gt;
&lt;/statcall&gt;
&lt;statcall error="0" id="channel"&gt;
&lt;fifo name="/tmp/gs.app.Ien1m0" descriptor="7" count="0" rsize="0" wsize="0"/&gt;
&lt;statinfo mode="010640" size="0" inode="203420696" nlink="1" blksize="16384" blocks="0" \
  mtime="2009-01-30T19:17:41-06:00" atime="2009-01-30T19:17:41-06:00" \
ctime="2009-01-30T19:17:41-06:00" uid="29548" user="vahi" gid="13872" group="bvr"/&gt;
&lt;/statcall&gt;

<span class="bold"><strong>&lt;environment&gt;
&lt;env key="GLOBUS_GRAM_JOB_CONTACT"&gt;https://co-login.ncsa.uiuc.edu:50001/27456/1233364659/&lt;/env&gt;
&lt;env key="GLOBUS_GRAM_MYJOB_CONTACT"&gt;URLx-nexus://co-login.ncsa.uiuc.edu:50002/&lt;/env&gt;
&lt;env key="GLOBUS_LOCATION"&gt;/usr/local/prews-gram-4.0.7-r1/&lt;/env&gt;
....
&lt;/environment&gt;
</strong></span>
&lt;resource&gt;
&lt;soft id="RLIMIT_CPU"&gt;unlimited&lt;/soft&gt;
&lt;hard id="RLIMIT_CPU"&gt;unlimited&lt;/hard&gt;
&lt;soft id="RLIMIT_FSIZE"&gt;unlimited&lt;/soft&gt;
....
&lt;/resource&gt;
&lt;/invocation&gt;</pre>
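<p>The highlighted fields can also be read programmatically. The sketch
    below uses Python's standard <code>xml.etree.ElementTree</code> module to
    pull the hostname, start time, duration, exitcode and arguments out of a
    record. The function name <code>summarize_kickstart</code> is hypothetical,
    the embedded record is a cut-down version of the sample above, and the
    sketch assumes the file holds exactly one well-formed invocation
    record.</p>

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <invocation> element of the sample record.
NS = {"ks": "http://pegasus.isi.edu/schema/invocation"}


def summarize_kickstart(xml_text):
    """Extract a few commonly needed fields from one kickstart record."""
    root = ET.fromstring(xml_text)
    mainjob = root.find("ks:mainjob", NS)
    regular = mainjob.find("ks:status/ks:regular", NS)
    return {
        "hostname": root.get("hostname"),
        "start": mainjob.get("start"),
        "duration": float(mainjob.get("duration")),
        # <regular> is absent when the job did not exit normally.
        "exitcode": int(regular.get("exitcode")) if regular is not None else None,
        "args": [arg.text for arg in mainjob.findall("ks:argument-vector/ks:arg", NS)],
    }


# Cut-down copy of the sample record, keeping only the fields read above.
SAMPLE = """\
<invocation xmlns="http://pegasus.isi.edu/schema/invocation"
            hostname="co-login.ncsa.uiuc.edu">
  <mainjob start="2009-01-30T19:17:41.426-06:00" duration="0.052" pid="27783">
    <status raw="0"><regular exitcode="0"/></status>
    <argument-vector>
      <arg nr="1">--create</arg>
      <arg nr="2">--dir</arg>
    </argument-vector>
  </mainjob>
</invocation>
"""
summary = summarize_kickstart(SAMPLE)
```

<p>Every element lookup has to be namespace-qualified because the record
    declares a default XML namespace.</p>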
</div>
</div>
<div class="section" title="7.4. Jobstate.Log File">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="jobstate_log_file"></a>7.4. Jobstate.Log File</h2></div></div></div>
<div class="toc"><dl><dt><span class="section"><a href="submit_directory.php#submit_directory-delays">7.4.1. Pegasus Workflow Job States and Delays</a></span></dt></dl></div>
<p>The jobstate.log file logs the various states that a job goes
    through during workflow execution. It is created by the <span class="bold"><strong>pegasus-monitord</strong></span> daemon that is launched when a
    workflow is submitted to Condor DAGMan by pegasus-run. <span class="bold"><strong>pegasus-monitord</strong></span> parses the dagman.out file and
    writes out the jobstate.log file, the format of which is more amenable to
    parsing.</p>
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>The jobstate.log file is not created if a user uses
      condor_submit_dag to submit a workflow to Condor DAGMan.</p>
</div>
<p>The jobstate.log file can be created after a workflow has finished
    executing by running <span class="bold"><strong>pegasus-monitord</strong></span> on
    the .dagman.out file in the workflow submit directory.</p>
<p>Below is a snippet from the jobstate.log for a single job executed
    via Condor-G:</p>
<pre class="programlisting">1239666049 create_dir_blackdiamond_0_isi_viz SUBMIT 3758.0 isi_viz - 1
1239666059 create_dir_blackdiamond_0_isi_viz EXECUTE 3758.0 isi_viz - 1
1239666059 create_dir_blackdiamond_0_isi_viz GLOBUS_SUBMIT 3758.0 isi_viz - 1
1239666059 create_dir_blackdiamond_0_isi_viz GRID_SUBMIT 3758.0 isi_viz - 1
1239666064 create_dir_blackdiamond_0_isi_viz JOB_TERMINATED 3758.0 isi_viz - 1
1239666064 create_dir_blackdiamond_0_isi_viz JOB_SUCCESS 0 isi_viz - 1
1239666064 create_dir_blackdiamond_0_isi_viz POST_SCRIPT_STARTED - isi_viz - 1
1239666069 create_dir_blackdiamond_0_isi_viz POST_SCRIPT_TERMINATED 3758.0 isi_viz - 1
1239666069 create_dir_blackdiamond_0_isi_viz POST_SCRIPT_SUCCESS - isi_viz - 1</pre>
<p>Each entry in jobstate.log has the following:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>The timestamp, in seconds since the Unix epoch, at which the
        particular event happened.</p></li>
<li class="listitem"><p>The name of the job.</p></li>
<li class="listitem"><p>The event recorded by DAGMan for the job.</p></li>
<li class="listitem"><p>The condor id of the job in the queue on the submit node.</p></li>
<li class="listitem"><p>The pegasus site to which the job is mapped.</p></li>
<li class="listitem"><p>The job time requirements from the submit file.</p></li>
<li class="listitem"><p>The job submit sequence for this workflow.</p></li>
</ol></div>
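<p>The seven fields map directly onto a small record type. The following
    Python sketch splits one jobstate.log line into those fields; the names
    <code>JobStateEntry</code> and <code>parse_jobstate_line</code> are
    hypothetical, and lines that do not have exactly seven whitespace-separated
    fields are simply skipped.</p>

```python
from collections import namedtuple
from datetime import datetime, timezone

# Field names follow the order of the list above; the names are illustrative.
JobStateEntry = namedtuple(
    "JobStateEntry",
    "timestamp job event condor_id site walltime sequence",
)


def parse_jobstate_line(line):
    """Parse one jobstate.log line, or return None if it has another shape."""
    fields = line.split()
    if len(fields) != 7:
        return None
    ts, job, event, condor_id, site, walltime, sequence = fields
    return JobStateEntry(
        datetime.fromtimestamp(int(ts), tz=timezone.utc),  # epoch seconds
        job,
        event,
        condor_id,
        site,
        walltime,  # "-" when no time requirement is recorded
        int(sequence),
    )


entry = parse_jobstate_line(
    "1239666049 create_dir_blackdiamond_0_isi_viz SUBMIT 3758.0 isi_viz - 1"
)
```

<p>The condor id is kept as a string because, for JOB_SUCCESS and
    JOB_FAILURE entries, that column carries the exit code instead.</p>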
<div class="table">
<a name="idp16347456"></a><p class="title"><b>Table 7.1. The job lifecycle when executed as part of the
      workflow</b></p>
<div class="table-contents"><table summary="The job lifecycle when executed as part of the
      workflow" border="1">
<colgroup>
<col>
<col>
</colgroup>
<tbody>
<tr>
<td><span class="bold"><strong>STATE/EVENT</strong></span></td>
<td><span class="bold"><strong>DESCRIPTION</strong></span></td>
</tr>
<tr>
<td>SUBMIT</td>
<td>job is submitted by condor schedd for execution.</td>
</tr>
<tr>
<td>EXECUTE</td>
<td>condor schedd detects that a job has started
            execution.</td>
</tr>
<tr>
<td>GLOBUS_SUBMIT</td>
<td>the job has been submitted to the remote resource. It's
            only written for GRAM jobs (i.e. gt2 and gt4).</td>
</tr>
<tr>
<td>GRID_SUBMIT</td>
<td>same as the GLOBUS_SUBMIT event. The ULOG_GRID_SUBMIT event is
            written for all grid universe jobs.</td>
</tr>
<tr>
<td>JOB_TERMINATED</td>
<td>job terminated on the remote node.</td>
</tr>
<tr>
<td>JOB_SUCCESS</td>
<td>job succeeded on the remote host; the condor id field holds the
            exit code, which is zero.</td>
</tr>
<tr>
<td>JOB_FAILURE</td>
<td>job failed on the remote host; the condor id field holds the job's
            exit code.</td>
</tr>
<tr>
<td>POST_SCRIPT_STARTED</td>
<td>post script started by DAGMan on the submit host, usually
            to parse the kickstart output</td>
</tr>
<tr>
<td>POST_SCRIPT_TERMINATED</td>
<td>post script finished on the submit node.</td>
</tr>
<tr>
<td>POST_SCRIPT_SUCCESS | POST_SCRIPT_FAILURE</td>
<td>post script succeeded or failed.</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p>There are other monitoring related files that are explained in the
    monitoring <a class="link" href="reference.php#monitoring-files" title="10.8.1.2. Monitoring related files in the workflow directory">chapter</a>.</p>
<div class="section" title="7.4.1. Pegasus Workflow Job States and Delays">
<div class="titlepage"><div><div><h3 class="title">
<a name="submit_directory-delays"></a>7.4.1. Pegasus Workflow Job States and Delays</h3></div></div></div>
<p>The various states that a job goes through (as captured in
      the dagman.out and jobstate.log files) during its lifecycle are
      illustrated below. The figure highlights the various local and
      remote delays during the job lifecycle.</p>
<div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/Pegasus_Job_State_Delay.jpg" height="360"></td></tr></table></div>
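<p>Some of these delays can be approximated from jobstate.log alone by
    subtracting the timestamps of two events. The Python sketch below does
    this for three intervals. The interval names (<code>queue_delay</code>,
    <code>runtime</code>, <code>postscript</code>) are labels chosen for this
    example rather than terms used by Pegasus, and only the first occurrence
    of each event per job is considered, so retries are not accounted
    for.</p>

```python
# Each illustrative interval is the time between a start and an end event.
INTERVALS = {
    "queue_delay": ("SUBMIT", "EXECUTE"),
    "runtime": ("EXECUTE", "JOB_TERMINATED"),
    "postscript": ("POST_SCRIPT_STARTED", "POST_SCRIPT_TERMINATED"),
}


def job_delays(lines):
    """Return {job: {interval_name: seconds}} from jobstate.log lines."""
    seen = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 7:
            continue  # not a regular job entry
        timestamp, job, event = int(fields[0]), fields[1], fields[2]
        # Keep only the first timestamp recorded for each (job, event) pair.
        seen.setdefault(job, {}).setdefault(event, timestamp)
    return {
        job: {
            name: events[end] - events[start]
            for name, (start, end) in INTERVALS.items()
            if start in events and end in events
        }
        for job, events in seen.items()
    }


# Entries taken from the jobstate.log snippet shown earlier in this section.
SNIPPET = """\
1239666049 create_dir_blackdiamond_0_isi_viz SUBMIT 3758.0 isi_viz - 1
1239666059 create_dir_blackdiamond_0_isi_viz EXECUTE 3758.0 isi_viz - 1
1239666064 create_dir_blackdiamond_0_isi_viz JOB_TERMINATED 3758.0 isi_viz - 1
1239666064 create_dir_blackdiamond_0_isi_viz POST_SCRIPT_STARTED - isi_viz - 1
1239666069 create_dir_blackdiamond_0_isi_viz POST_SCRIPT_TERMINATED 3758.0 isi_viz - 1
""".splitlines()
delays = job_delays(SNIPPET)
```

<p>These values are measured on the submit host clock, so they include
    any latency in event propagation and are coarser than the times recorded
    by kickstart on the remote node.</p>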
</div>
</div>
<div class="section" title="7.5. Braindump File">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="braindump_file"></a>7.5. Braindump File</h2></div></div></div>
<p>The braindump file is created per workflow in the submit directory and
    contains metadata about the workflow.</p>
<div class="table">
<a name="idp8898656"></a><p class="title"><b>Table 7.2. Information Captured in Braindump File</b></p>
<div class="table-contents"><table summary="Information Captured in Braindump File" border="1">
<colgroup>
<col>
<col>
</colgroup>
<tbody>
<tr>
<td><span class="bold"><strong>KEY</strong></span></td>
<td><span class="bold"><strong>DESCRIPTION</strong></span></td>
</tr>
<tr>
<td>user</td>
<td>the username of the user that ran pegasus-plan</td>
</tr>
<tr>
<td>grid_dn</td>
<td>the Distinguished Name in the proxy</td>
</tr>
<tr>
<td>submit_hostname</td>
<td>the hostname of the submit host</td>
</tr>
<tr>
<td>root_wf_uuid</td>
<td>the workflow uuid of the root workflow</td>
</tr>
<tr>
<td>wf_uuid</td>
<td>the workflow uuid of the current workflow, i.e. the one in whose
            submit directory the braindump file resides.</td>
</tr>
<tr>
<td>dax</td>
<td>the path to the dax file</td>
</tr>
<tr>
<td>dax_label</td>
<td>the label attribute in the adag element of the dax</td>
</tr>
<tr>
<td>dax_index</td>
<td>the index in the dax.</td>
</tr>
<tr>
<td>dax_version</td>
<td>the version of the DAX schema that the DAX refers to.</td>
</tr>
<tr>
<td>pegasus_wf_name</td>
<td>the workflow name constructed by pegasus when
            planning</td>
</tr>
<tr>
<td>timestamp</td>
<td>the timestamp when planning occurred</td>
</tr>
<tr>
<td>basedir</td>
<td>the base submit directory</td>
</tr>
<tr>
<td>submit_dir</td>
<td>the full path for the submit directory</td>
</tr>
<tr>
<td>properties</td>
<td>the full path to the properties file in the submit
            directory</td>
</tr>
<tr>
<td>planner</td>
<td>the planner used to construct the executable workflow.
            always pegasus</td>
</tr>
<tr>
<td>planner_version</td>
<td>the version of the planner</td>
</tr>
<tr>
<td>pegasus_build</td>
<td>the build timestamp</td>
</tr>
<tr>
<td>planner_arguments</td>
<td>the arguments with which the planner is invoked.</td>
</tr>
<tr>
<td>jsd</td>
<td>the path to the jobstate file</td>
</tr>
<tr>
<td>rundir</td>
<td>the rundir in the numbering scheme for the submit
            directories</td>
</tr>
<tr>
<td>pegasushome</td>
<td>the root directory of the pegasus installation</td>
</tr>
<tr>
<td>vogroup</td>
<td>the vo group to which the user belongs. Defaults to
            pegasus</td>
</tr>
<tr>
<td>condor_log</td>
<td>the full path to condor common log in the submit
            directory</td>
</tr>
<tr>
<td>notify</td>
<td>the notify file that contains any notifications that need
            to be sent for the workflow.</td>
</tr>
<tr>
<td>dag</td>
<td>the basename of the dag file created</td>
</tr>
<tr>
<td>type</td>
<td>the type of executable workflow. Can be dag | shell</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p>A Sample Braindump File is displayed below:</p>
<pre class="programlisting">user vahi
grid_dn null
submit_hostname obelix
root_wf_uuid a4045eb6-317a-4710-9a73-96a745cb1fe8
wf_uuid a4045eb6-317a-4710-9a73-96a745cb1fe8
dax /data/scratch/vahi/examples/synthetic-scec/Test.dax
dax_label Stampede-Test
dax_index 0
dax_version 3.3
pegasus_wf_name Stampede-Test-0
timestamp 20110726T153746-0700
basedir /data/scratch/vahi/examples/synthetic-scec/dags
submit_dir /data/scratch/vahi/examples/synthetic-scec/dags/vahi/pegasus/Stampede-Test/run0005
properties pegasus.6923599674234553065.properties
planner /data/scratch/vahi/software/install/pegasus/default/bin/pegasus-plan
planner_version 3.1.0cvs
pegasus_build 20110726221240Z
planner_arguments "--conf ./conf/properties --dax Test.dax --sites local --output local --dir dags --force --submit "
jsd jobstate.log
rundir run0005
pegasushome /data/scratch/vahi/software/install/pegasus/default
vogroup pegasus
condor_log Stampede-Test-0.log
notify Stampede-Test-0.notify
dag Stampede-Test-0.dag
type dag
</pre>
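<p>Since each line is a key followed by a value, the file is easy to load
    into a dictionary. The Python sketch below shows one way to do so; the
    function name <code>parse_braindump</code> is hypothetical, and the
    handling of the quoted <code>planner_arguments</code> value (stripping the
    surrounding double quotes) is an assumption based on the sample
    above.</p>

```python
def parse_braindump(text):
    """Load braindump key/value lines into a dictionary of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The key is the first word; everything after it is the value.
        key, _, value = line.partition(" ")
        info[key] = value.strip().strip('"').strip()
    return info


# A few lines copied from the sample braindump file.
SAMPLE = """\
user vahi
grid_dn null
dax_label Stampede-Test
dax_index 0
planner_arguments "--conf ./conf/properties --dax Test.dax --sites local --output local --dir dags --force --submit "
dag Stampede-Test-0.dag
"""
braindump = parse_braindump(SAMPLE)
```

<p>All values are kept as strings; a caller that needs
    <code>dax_index</code> as a number has to convert it explicitly.</p>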
</div>
<div class="section" title="7.6. Pegasus static.bp File">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="static_bp_file"></a>7.6. Pegasus static.bp File</h2></div></div></div>
<p>Pegasus creates a workflow.static.bp file that links jobs in the DAG
    with the jobs in the DAX. The contents of the file are in NetLogger
    format. The purpose of this file is to make it possible to link an invocation
    record of a task to the corresponding job in the DAX.</p>
<p>In the file name, workflow is replaced by the name of the workflow, i.e. the same
    prefix as the .dag file.</p>
<p>In the file there are five types of events:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
<p>task.info</p>
<p>This event is used to capture information about all the tasks in
        the DAX (abstract workflow).</p>
</li>
<li class="listitem">
<p>task.edge</p>
<p>This event is used to capture information about the edges
        between the tasks in the DAX (abstract workflow).</p>
</li>
<li class="listitem">
<p>job.info</p>
<p>This event is used to capture information about the jobs in the
        DAG (executable workflow generated by Pegasus).</p>
</li>
<li class="listitem">
<p>job.edge</p>
<p>This event is used to capture information about edges between
        the jobs in the DAG (executable workflow).</p>
</li>
<li class="listitem">
<p>wf.map.task_job</p>
<p>This event is used to associate the tasks in the DAX with the
        corresponding jobs in the DAG.</p>
</li>
</ul></div>
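<p>As a rough illustration of how these events might be tallied, the
    Python sketch below counts events by type. It assumes that each line of
    the file is a sequence of whitespace-separated <code>name=value</code>
    pairs with the event type stored under an <code>event</code> key, which is
    the usual NetLogger layout but is not spelled out in this chapter. The
    sample lines and their other fields (<code>ts</code>,
    <code>task.id</code>, <code>job.id</code>) are hypothetical, as is the
    function name <code>count_events</code>.</p>

```python
import shlex
from collections import Counter


def count_events(lines):
    """Count static.bp lines by the value of their 'event' field."""
    counts = Counter()
    for line in lines:
        # shlex.split keeps quoted values containing spaces in one token.
        tokens = shlex.split(line)
        fields = dict(token.split("=", 1) for token in tokens if "=" in token)
        if "event" in fields:
            counts[fields["event"]] += 1
    return counts


# Hypothetical lines; only the event names come from the list above.
SAMPLE = [
    "ts=2011-07-26T22:37:47Z event=task.info task.id=ID000001",
    "ts=2011-07-26T22:37:47Z event=task.info task.id=ID000002",
    "ts=2011-07-26T22:37:47Z event=task.edge parent.task.id=ID000001 child.task.id=ID000002",
    "ts=2011-07-26T22:37:47Z event=job.info job.id=preprocess_ID000001",
    "ts=2011-07-26T22:37:47Z event=wf.map.task_job task.id=ID000001 job.id=preprocess_ID000001",
]
counts = count_events(SAMPLE)
```

<p>A count of task.info events against job.info events gives a quick
    sense of how many auxiliary jobs Pegasus added during planning.</p>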
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="execution_environments.php">Prev</a> </td>
<td width="20%" align="center"> </td>
<td width="40%" align="right"> <a accesskey="n" href="monitoring_debugging_stats.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Chapter 6. Execution Environments </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> Chapter 8. Monitoring, Debugging and Statistics</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
