<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="large_workflows.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="data_transfers.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="section" title="10.4. Hierarchical Workflows">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="hierarchial_workflows"></a>10.4. Hierarchical Workflows</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="hierarchial_workflows.php#idp47740976">10.4.1. Introduction</a></span></dt>
<dt><span class="section"><a href="hierarchial_workflows.php#idp47728432">10.4.2. Specifying a DAX Job in the DAX</a></span></dt>
<dt><span class="section"><a href="hierarchial_workflows.php#idp47697216">10.4.3. Specifying a DAG Job in the DAX</a></span></dt>
<dt><span class="section"><a href="hierarchial_workflows.php#idp47679584">10.4.4. File Dependencies Across DAX Jobs</a></span></dt>
<dt><span class="section"><a href="hierarchial_workflows.php#idp47683040">10.4.5. Recursion in Hierarchical Workflows</a></span></dt>
<dt><span class="section"><a href="hierarchial_workflows.php#idp47660336">10.4.6. Example</a></span></dt>
</dl></div>
<div class="section" title="10.4.1. Introduction">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp47740976"></a>10.4.1. Introduction</h3></div></div></div>
<p>In addition to compute jobs, the Abstract Workflow can contain jobs
      that refer to other workflows. This is useful for running large
      workflows or ensembles of workflows.</p>
<p>Users can embed two types of workflow jobs in the DAX:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem">
<p>daxjob - refers to a sub workflow represented as a DAX. During
          planning, each DAX job is mapped to a Condor DAGMan job whose
          prescript invokes pegasus-plan on the DAX referred to in the
          DAX job.</p>
<div class="figure">
<a name="idp47738112"></a><p class="title"><b>Figure 10.6. Planning of a DAX Job</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="./images/daxjob-mapping.png" height="360" alt="Planning of a DAX Job"></td></tr></table></div></div>
</div>
<br class="figure-break">
</li>
<li class="listitem">
<p>dagjob - refers to a sub workflow represented as a DAG. During
          planning, each DAG job is mapped to a Condor DAGMan job that
          refers to the DAG file mentioned in the DAG job.</p>
<div class="figure">
<a name="idp47733184"></a><p class="title"><b>Figure 10.7. Planning of a DAG Job</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="./images/dagjob-mapping.png" height="360" alt="Planning of a DAG Job"></td></tr></table></div></div>
</div>
<br class="figure-break">
</li>
</ol></div>
</div>
<div class="section" title="10.4.2. Specifying a DAX Job in the DAX">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp47728432"></a>10.4.2. Specifying a DAX Job in the DAX</h3></div></div></div>
<p>A DAXJob is specified in a DAX in much the same way as a normal
      compute job. The differences are the XML element name (dax instead of
      job) and the attributes specified. The DAXJob XML specification is
      described in detail in the <a class="link" href="api.php" title="Chapter 14. API Reference">chapter
      on DAX API</a>. An example DAX Job in a DAX is shown below.</p>
<a name="dax_job_example"></a><pre class="programlisting">  &lt;dax id="ID000002" name="black.dax" node-label="bar" &gt;
    &lt;profile namespace="dagman" key="maxjobs"&gt;10&lt;/profile&gt;
    &lt;argument&gt;-Xmx1024 -Xms512 -Dpegasus.dir.storage=storagedir  -Dpegasus.dir.exec=execdir -o local -vvvvv --force -s dax_site &lt;/argument&gt;
  &lt;/dax&gt;</pre>
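<p>The structure of the dax element in the listing above can also be
      sketched programmatically. The following is a minimal illustration
      that uses only the Python standard library, not the Pegasus DAX API.
      The helper name build_dax_job and its parameters are assumptions made
      for this example, and the argument string is a shortened version of
      the one in the listing.</p>

```python
import xml.etree.ElementTree as ET


def build_dax_job(job_id, dax_lfn, node_label, arguments, dagman_profiles):
    """Build a dax element shaped like the listing above (illustrative helper)."""
    dax = ET.Element("dax", {"id": job_id, "name": dax_lfn, "node-label": node_label})
    # Each dagman profile becomes a profile child element.
    for key, value in dagman_profiles.items():
        profile = ET.SubElement(dax, "profile", {"namespace": "dagman", "key": key})
        profile.text = str(value)
    # The argument element carries the options handed to pegasus-plan.
    argument = ET.SubElement(dax, "argument")
    argument.text = arguments
    return dax


job = build_dax_job(
    "ID000002",
    "black.dax",
    "bar",
    "-o local --force -s dax_site",
    {"maxjobs": 10},
)
print(ET.tostring(job, encoding="unicode"))
```

<p>The printed element still has to be placed inside a complete DAX
      document; the sketch only shows how the attributes, the profile and
      the argument fit together.</p>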
<div class="section" title="10.4.2.1. DAX File Locations">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp47725792"></a>10.4.2.1. DAX File Locations</h4></div></div></div>
<p>The name attribute in the dax element refers to the LFN
        (Logical File Name) of the DAX file. The location of the DAX file can
        be catalogued in either of the following:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>Replica Catalog</p></li>
<li class="listitem">
<p>Replica Catalog Section in the <a class="link" href="api.php#dax_replica_catalog" title="14.1.1.3.1. The Replica Catalog Section">DAX</a>.</p>
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>Currently, only file URLs on the local site (submit
                host) can be specified as DAX file locations.</p>
</div>
</li>
</ol></div>
</div>
<div class="section" title="10.4.2.2. Arguments for a DAX Job">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp47719568"></a>10.4.2.2. Arguments for a DAX Job</h4></div></div></div>
<p>Users can pass arguments to DAX Jobs. These arguments are passed
        to the pegasus-plan invocation in the prescript of the corresponding
        Condor DAGMan job in the executable workflow.</p>
<p>The following pegasus-plan options are inherited from the
        pegasus-plan invocation of the parent workflow. An option specified
        in the arguments section of the DAX Job overrides the inherited
        value.</p>
<div class="table">
<a name="idp47717712"></a><p class="title"><b>Table 10.2. Options inherited from parent workflow</b></p>
<div class="table-contents"><table summary="Options inherited from parent workflow" border="1">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th>Option Name</th>
<th>Description</th>
</tr></thead>
<tbody><tr>
<td>--sites</td>
<td>list of execution sites.</td>
</tr></tbody>
</table></div>
</div>
<br class="table-break"><p>It is highly recommended that users <span class="bold"><strong>do not
        specify</strong></span> directory-related options in the arguments section
        for the DAX Jobs. Pegasus assigns values to the following options for
        the sub workflows automatically.</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>--relative-dir</p></li>
<li class="listitem"><p>--dir</p></li>
<li class="listitem"><p>--relative-submit-dir</p></li>
</ol></div>
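<p>The two rules above (inherited options can be overridden, directory
        options should be left to Pegasus) can be checked before a DAX is
        submitted. The following is a minimal sketch; the function names are
        hypothetical, and it only recognizes the long option names listed in
        this section, so short forms such as -s are not detected.</p>

```python
import shlex

# Options the text lists as inherited from the parent pegasus-plan invocation.
INHERITED_OPTIONS = ("--sites",)
# Directory options the text recommends leaving to Pegasus.
DIRECTORY_OPTIONS = ("--relative-dir", "--dir", "--relative-submit-dir")


def option_names(argument_string):
    """Return the long option names present in an argument string."""
    names = set()
    for token in shlex.split(argument_string):
        if token.startswith("--"):
            # Accept both "--opt value" and "--opt=value" spellings.
            names.add(token.split("=", 1)[0])
    return names


def check_dax_job_arguments(argument_string):
    """Report which options override inherited values and which are discouraged."""
    present = option_names(argument_string)
    return {
        "overrides": sorted(present.intersection(INHERITED_OPTIONS)),
        "discouraged": sorted(present.intersection(DIRECTORY_OPTIONS)),
    }


report = check_dax_job_arguments("--sites dax_site --dir /tmp/sub --force")
print(report)
```

<p>For the sample string, the report lists --sites as an override of
        the inherited value and --dir as a discouraged directory
        option.</p>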
</div>
<div class="section" title="10.4.2.3. Profiles for DAX Job">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp47707088"></a>10.4.2.3. Profiles for DAX Job</h4></div></div></div>
<p>Users can specify dagman profiles with the DAX Job to
        control the behavior of the corresponding Condor DAGMan instance in
        the executable workflow. In the example <a class="link" href="hierarchial_workflows.php#dax_job_example">above</a>, maxjobs is set to 10 for the
        sub workflow.</p>
</div>
<div class="section" title="10.4.2.4. Execution of the PRE script and Condor DAGMan instance">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp47704816"></a>10.4.2.4. Execution of the PRE script and Condor DAGMan instance</h4></div></div></div>
<p>The pegasus-plan invocation in the prescript of the Condor DAGMan
        job is executed on the submit host. Its output is redirected to a
        log file (ending with the suffix pre.log) in the submit directory of
        the workflow that contains the DAX Job. The path to pegasus-plan is
        determined automatically.</p>
<p>The DAX Job maps to a Condor DAGMan job. The path to the condor
        dagman binary is determined by the following rules, applied in
        order:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>use the entry in the transformation catalog for
            condor::dagman on site local, else</p></li>
<li class="listitem"><p>if CONDOR_HOME is set in the environment, use
            $CONDOR_HOME/bin/condor_dagman, else</p></li>
<li class="listitem"><p>if CONDOR_LOCATION is set in the environment, use
            $CONDOR_LOCATION/bin/condor_dagman, else</p></li>
<li class="listitem"><p>pick up the path to condor dagman from the user's
            PATH.</p></li>
</ol></div>
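<p>The lookup order listed above can be sketched as a small function.
        This is an illustration of the rules only, not the code Pegasus
        uses. The tc_entry parameter is a hypothetical stand-in for the
        transformation catalog lookup of condor::dagman on site local.</p>

```python
import os
import shutil


def find_condor_dagman(tc_entry=None, environ=None):
    """Resolve the condor_dagman path in the order listed above.

    tc_entry stands in for a transformation catalog entry for
    condor::dagman on site local (hypothetical parameter).
    """
    env = os.environ if environ is None else environ
    # Rule 1: a transformation catalog entry wins.
    if tc_entry:
        return tc_entry
    # Rules 2 and 3: CONDOR_HOME first, then CONDOR_LOCATION.
    for variable in ("CONDOR_HOME", "CONDOR_LOCATION"):
        root = env.get(variable)
        if root:
            return os.path.join(root, "bin", "condor_dagman")
    # Rule 4: search the PATH; returns None if nothing is found there.
    return shutil.which("condor_dagman", path=env.get("PATH"))


print(find_condor_dagman(environ={"CONDOR_HOME": "/opt/condor"}))
```

<p>Passing an explicit environ dictionary makes the order easy to
        try out without changing the real environment.</p>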
<div class="tip" title="Tip" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Tip</h3>
<p>It is recommended that users set dagman.maxpre in their properties
          file to control the maximum number of pegasus-plan instances
          launched by each running dagman instance.</p>
</div>
</div>
</div>
<div class="section" title="10.4.3. Specifying a DAG Job in the DAX">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp47697216"></a>10.4.3. Specifying a DAG Job in the DAX</h3></div></div></div>
<p>A DAGJob is specified in a DAX in much the same way as a normal
      compute job. The differences are the XML element name (dag instead of
      job) and the attributes specified. For DAGJob XML details, see the
      <a class="link" href="api.php" title="Chapter 14. API Reference">API Reference</a> chapter.
      An example DAG Job in a DAX is shown below.</p>
<a name="dag_job_example"></a><pre class="programlisting">  &lt;dag id="ID000003" name="black.dag" node-label="foo" &gt;
    &lt;profile namespace="dagman" key="maxjobs"&gt;10&lt;/profile&gt;
    &lt;profile namespace="dagman" key="DIR"&gt;/dag-dir/test&lt;/profile&gt;
  &lt;/dag&gt;</pre>
<div class="section" title="10.4.3.1. DAG File Locations">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp47694672"></a>10.4.3.1. DAG File Locations</h4></div></div></div>
<p>The name attribute in the dag element refers to the LFN
        (Logical File Name) of the DAG file. The location of the DAG file can
        be catalogued in either of the following:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>Replica Catalog</p></li>
<li class="listitem">
<p>Replica Catalog Section in the DAX.</p>
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>Currently, only file URLs on the local site (submit
                host) can be specified as DAG file locations.</p>
</div>
</li>
</ol></div>
</div>
<div class="section" title="10.4.3.2. Profiles for DAG Job">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp47689552"></a>10.4.3.2. Profiles for DAG Job</h4></div></div></div>
<p>Users can specify dagman profiles with the DAG Job to
        control the behavior of the corresponding Condor DAGMan instance in
        the executable workflow. In the example above, maxjobs is set to 10
        for the sub workflow.</p>
<p>The dagman profile DIR allows users to specify the directory in
        which they want the Condor DAGMan instance to execute. In the example
        <a class="link" href="hierarchial_workflows.php#dag_job_example">above</a>, black.dag is set to be
        executed in the directory /dag-dir/test. This directory must be
        created beforehand.</p>
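<p>Because the DIR directory must exist before the workflow runs, it
        can be created as part of whatever script generates the DAX. The
        following is a minimal sketch; the helper name ensure_dag_dir is an
        assumption, and a temporary location stands in for /dag-dir/test so
        that the example runs without special permissions.</p>

```python
import tempfile
from pathlib import Path


def ensure_dag_dir(directory):
    """Create the directory named in the DIR profile if it is missing."""
    path = Path(directory)
    # parents=True creates intermediate directories; exist_ok avoids an
    # error when the directory is already there.
    path.mkdir(parents=True, exist_ok=True)
    return path


# A temporary base directory stands in for the filesystem root here.
base = Path(tempfile.mkdtemp())
dag_dir = ensure_dag_dir(base / "dag-dir" / "test")
print(dag_dir.is_dir())
```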
</div>
</div>
<div class="section" title="10.4.4. File Dependencies Across DAX Jobs">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp47679584"></a>10.4.4. File Dependencies Across DAX Jobs</h3></div></div></div>
<p>In hierarchical workflows, if a sub workflow generates output
      files required by another sub workflow, there should be an edge
      connecting the two DAX jobs. Pegasus will ensure that the prescript for
      the child sub-workflow has the path to the cache file generated during
      the planning of the parent sub workflow. The cache file in the submit
      directory for a workflow is a textual replica catalog that lists the
      locations of all the output files created in the remote workflow
      execution directory when the workflow executes.</p>
<p>Passing the cache file automatically to a child sub-workflow
      ensures that the datasets from the same workflow run are used. However,
      it also means that Pegasus will prefer the locations in the cache file
      over all other locations in the Replica Catalog. If you need Replica
      Selection to consider locations in the Replica Catalog as well, set the
      following property.</p>
<pre class="programlisting"><span class="bold"><strong>pegasus.catalog.replica.cache.asrc  true</strong></span></pre>
<p>This is useful when you stage out the output files to a storage
      site and want the child sub workflow to stage these files from the
      output storage site instead of the workflow execution directory where
      the files were originally created.</p>
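<p>The effect of the property can be pictured with a simplified model
      of which locations are considered for a file. This is only a sketch of
      the behavior described in this section, not the Pegasus replica
      selector. The function name, the dictionaries and the URLs are all
      hypothetical.</p>

```python
def candidate_locations(lfn, cache, replica_catalog, cache_asrc=False):
    """Model which locations replica selection would consider for lfn.

    cache and replica_catalog are plain dicts mapping an LFN to a list of
    URLs; both are illustrative stand-ins, not Pegasus data structures.
    """
    from_cache = list(cache.get(lfn, []))
    from_catalog = list(replica_catalog.get(lfn, []))
    if cache_asrc:
        # Property set to true: Replica Catalog locations are considered too.
        return from_cache + from_catalog
    # Default: cache entries from the parent sub workflow win when present.
    return from_cache or from_catalog


cache = {"f.b": ["file:///scratch/run0001/f.b"]}
catalog = {"f.b": ["gsiftp://storage.example.org/outputs/f.b"]}
print(candidate_locations("f.b", cache, catalog))
print(candidate_locations("f.b", cache, catalog, cache_asrc=True))
```

<p>In this model the first call yields only the execution directory
      copy, while the second also offers the storage site copy to the
      selection step.</p>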
</div>
<div class="section" title="10.4.5. Recursion in Hierarchal Workflows">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp47683040"></a>10.4.5. Recursion in Hierarchical Workflows</h3></div></div></div>
<p>A DAX referred to by a DAX job can itself contain DAX jobs.
      Pegasus does not place a limit on how many levels of recursion a user
      can have in their workflows. From the Pegasus perspective, recursion in
      hierarchical workflows ends when a DAX with only compute jobs is
      encountered. However, the levels of recursion are limited by the system
      resources consumed by the DAGMan processes that are running (each
      level of nesting produces another DAGMan process).</p>
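<p>To get a feel for how nesting translates into DAGMan instances, the
      recursion can be modelled on a plain data structure. The following is
      a minimal sketch; the dictionary layout with the keys jobs and dax_jobs
      is a hypothetical representation chosen for this example, not a
      Pegasus format.</p>

```python
def nesting_depth(workflow):
    """Return how many levels of DAX jobs sit below this workflow.

    A workflow without dax_jobs holds only compute jobs, which is where
    the recursion ends.
    """
    sub_workflows = workflow.get("dax_jobs", [])
    if not sub_workflows:
        return 0
    return 1 + max(nesting_depth(sub) for sub in sub_workflows)


def count_sub_workflows(workflow):
    """Count DAX jobs at every level; each one maps to a DAGMan job."""
    subs = workflow.get("dax_jobs", [])
    return len(subs) + sum(count_sub_workflows(sub) for sub in subs)


leaf = {"jobs": ["compute"]}
middle = {"jobs": ["compute"], "dax_jobs": [leaf, leaf]}
top = {"jobs": ["compute"], "dax_jobs": [middle]}
print(nesting_depth(top), count_sub_workflows(top))
```

<p>The count of sub workflows gives a rough idea of how many additional
      DAGMan processes may need to run on the submit host over the lifetime
      of the top-level workflow.</p>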
<p>The figure below illustrates an example with recursion 2 levels
      deep.</p>
<div class="figure">
<a name="idp47681920"></a><p class="title"><b>Figure 10.8. Recursion in Hierarchical Workflows</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><img src="./images/recursion_in_hierarchal_workflows.png" align="middle" height="500" alt="Recursion in Hierarchal Workflows"></div></div>
</div>
<br class="figure-break"><p>The execution time-line of the various jobs in the above figure is
      illustrated below.</p>
<div class="figure">
<a name="idp47675984"></a><p class="title"><b>Figure 10.9. Execution Time-line for Hierarchical Workflows</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><img src="./images/hierarchal_workflows_execution_timeline.png" align="middle" height="500" alt="Execution Time-line for Hierarchal Workflows"></div></div>
</div>
<br class="figure-break">
</div>
<div class="section" title="10.4.6. Example">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp47660336"></a>10.4.6. Example</h3></div></div></div>
<p>The Galactic Plane workflow is a Hierarchical workflow of many
      Montage workflows. For details, see <a class="link" href="example_workflows.php" title="Chapter 8. Example Workflows">Workflow of Workflows</a>.</p>
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="large_workflows.php">Prev</a> </td>
<td width="20%" align="center"><a accesskey="u" href="optimization.php">Up</a></td>
<td width="40%" align="right"> <a accesskey="n" href="data_transfers.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">10.3. How to Scale Large Workflows </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> 10.5. Optimizing Data Transfers</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
