<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="ch02s05.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="ch02s07.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="idp50972976"></a>2.6. Planning the Workflow</h2></div></div></div>
<p>In the planning stage, Pegasus maps the abstract DAX to one or
    more execution sites. Planning includes:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>Adding a job to create the remote working directory</p></li>
<li class="listitem"><p>Adding stage-in jobs to transfer input data to the remote
        working directory</p></li>
<li class="listitem"><p>Adding cleanup jobs to remove data from the remote working
        directory when it is no longer needed</p></li>
<li class="listitem"><p>Adding stage-out jobs to transfer data to the final output
        location as it is generated</p></li>
<li class="listitem"><p>Adding registration jobs to register the data in a replica
        catalog</p></li>
<li class="listitem"><p>Clustering tasks to combine several short-running jobs into a
        single, longer-running job, so that the short-running jobs run
        more efficiently</p></li>
<li class="listitem"><p>Adding wrappers to the jobs to collect provenance information so
        that statistics and plots can be created when the workflow is
        finished</p></li>
</ol></div>
<p>The <code class="literal">pegasus-plan</code> command is used to plan a
    workflow. This command takes quite a few arguments, so we created a
    <code class="filename">plan_dax.sh</code> wrapper script that has all of the
    arguments required for the diamond workflow:</p>
<pre class="programlisting">$ <span class="bold"><strong>more plan_dax.sh</strong></span>
...</pre>
<p>The script invokes the <code class="literal">pegasus-plan</code> command with
    arguments for the configuration file (<code class="literal">--conf</code>), the DAX
    file (<code class="literal">-d</code>), the submit directory
    (<code class="literal">--dir</code>), the execution site
    (<code class="literal">--sites</code>), the output site (<code class="literal">-o</code>) and
    two extra arguments that prevent Pegasus from removing any jobs from the
    workflow (<code class="literal">--force</code>) and that prevent Pegasus from adding
    cleanup jobs to the workflow (<code class="literal">--nocleanup</code>).</p>
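<p>As a rough illustration only, a wrapper along the following lines could
    assemble those arguments. This is a sketch, not the contents of the
    tutorial's <code class="filename">plan_dax.sh</code>: the configuration file name
    <code class="filename">pegasus.conf</code>, the submit directory name
    <code class="filename">submit</code>, the site names
    <code class="literal">EXEC_SITE</code> and <code class="literal">local</code>, and the
    <code class="literal">build_plan_command</code> helper are all assumptions made for the
    example. It prints the command rather than running it, so it works without
    Pegasus installed.</p>
<pre class="programlisting">#!/bin/bash
# Hypothetical sketch of a planning wrapper; NOT the tutorial's plan_dax.sh.
# The file, directory, and site names below are assumed placeholder values.

build_plan_command() {
    local daxfile="$1"
    PLAN_CMD=(pegasus-plan
        --conf pegasus.conf     # configuration file (assumed name)
        -d "$daxfile"           # DAX file to plan
        --dir submit            # submit directory (assumed name)
        --sites EXEC_SITE       # execution site (placeholder)
        -o local                # output site (assumed name)
        --force                 # do not remove any jobs from the workflow
        --nocleanup)            # do not add cleanup jobs
}

build_plan_command "diamond.dax"

# Print the assembled command on one line instead of executing it.
printf '%s\n' "${PLAN_CMD[*]}"</pre>
<p>Keeping the arguments in a Bash array means each one stays a separate
    word even if a path contains spaces. To actually run the planner, the
    final line could be replaced with <code class="literal">"${PLAN_CMD[@]}"</code>.</p>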
<p>To plan the diamond workflow, invoke the
    <code class="filename">plan_dax.sh</code> script with the path to the DAX
    file:</p>
<pre class="programlisting">$ <span class="bold"><strong>./plan_dax.sh diamond.dax</strong></span>
2012.07.24 21:11:03.256 EDT:

I have concretized your abstract workflow. The workflow has been entered
into the workflow database with a state of "planned". The next step is to
start or execute your workflow. The invocation required is:

pegasus-run  /home/tutorial/submit/tutorial/pegasus/diamond/run0001


2012.07.24 21:11:03.257 EDT:   Time taken to execute is 1.103 seconds
</pre>
<p>Note the line in the output that starts with
    <code class="literal">pegasus-run</code>. That is the command we will use to
    submit the workflow. Its argument is the path to the submit
    directory, where all of the files required to submit and monitor the
    workflow are stored.</p>
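<p>If the submit directory is needed in a script, one possible approach is
    to pick it out of the planner output. The sketch below assumes the output
    keeps the format shown above; the <code class="literal">sample_output</code> and
    <code class="literal">submit_dir</code> variable names are made up for the example, and
    the sample text is copied from the output above rather than taken from a
    live run.</p>
<pre class="programlisting">#!/bin/bash
# Sketch: extract the submit directory from planner output.
# sample_output holds lines copied from the example output, not a live run.
sample_output='start or execute your workflow. The invocation required is:

pegasus-run  /home/tutorial/submit/tutorial/pegasus/diamond/run0001
'

# Select the line whose first field is pegasus-run and keep its second field.
submit_dir=$(printf '%s\n' "$sample_output" | awk '$1 == "pegasus-run" { print $2; exit }')

echo "$submit_dir"</pre>
<p>Because <code class="literal">awk</code> splits fields on runs of whitespace, the
    two spaces after <code class="literal">pegasus-run</code> in the output do not
    matter.</p>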
<p>This is what the diamond workflow looks like after Pegasus has
    finished planning the DAX:</p>
<div class="figure">
<a name="idp50996384"></a><p class="title"><b>Figure 2.2. Diamond DAG</b></p>
<div class="figure-contents"><div class="mediaobject"><img src="images/concepts-diamond-dag.png" width="378" alt="Diamond DAG"></div></div>
</div>
<br class="figure-break"><p>For this workflow, the only jobs Pegasus needs to add are a directory
    creation job, a stage-in job (for <code class="filename">f.a</code>), and a stage-out job (for <code class="filename">f.d</code>). No
    registration jobs are added because all the files in the DAX are marked
    <code class="literal">register="false"</code>, and no cleanup jobs are added because we passed the
    <code class="literal">--nocleanup</code> argument to
    <code class="literal">pegasus-plan</code>.</p>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="ch02s05.php">Prev</a> </td>
<td width="20%" align="center"><a accesskey="u" href="tutorial.php">Up</a></td>
<td width="40%" align="right"> <a accesskey="n" href="ch02s07.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">2.5. Configuring Pegasus </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> 2.7. Submitting the Workflow</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
