<?php  
            require('/srv/new-pegasus.isi.edu/includes/common.php'); 
            pegasus_header("Chapter 4. Creating Workflows");
        ?><div class="chapter">
<div class="titlepage"><div><div><h1 class="title">
<a name="creating_workflows"></a>Chapter 4. Creating Workflows</h1></div></div></div>
<div class="toc"><dl class="toc">
<dt><span class="section"><a href="creating_workflows.php#abstract_workflows">4.1. Abstract Workflows (DAX)</a></span></dt>
<dt><span class="section"><a href="replica.php">4.2. Data Discovery (Replica Catalog)</a></span></dt>
<dt><span class="section"><a href="site.php">4.3. Resource Discovery (Site Catalog)</a></span></dt>
<dt><span class="section"><a href="transformation.php">4.4. Executable Discovery (Transformation Catalog)</a></span></dt>
<dt><span class="section"><a href="variable_expansion.php">4.5. Variable Expansion</a></span></dt>
</dl></div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="abstract_workflows"></a>4.1. Abstract Workflows (DAX)</h2></div></div></div>
<p>The DAX is a description of an abstract workflow in XML format that
    is used as the primary input into Pegasus. The DAX schema is described in
    <a class="ulink" href="schemas/dax-3.4/dax-3.4.xsd" target="_top">dax-3.4.xsd</a>. The
    documentation of the schema and its elements can be found in <a class="ulink" href="schemas/dax-3.4/dax-3.4.html" target="_top">dax-3.4.html</a>.</p>
<p>Any user can create a DAX with the DAX generating API, which is
    available in Java, Perl, and Python.</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
       We highly recommend using the DAX API. 
    </div>
<p>Advanced users who can read XML schema definitions can generate a
    DAX directly from a script.</p>
<p>The sample workflow below incorporates some of the elementary graph
    structures used in all abstract workflows.</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
<p><span class="bold"><strong>fan-out</strong></span>, <span class="bold"><strong>scatter</strong></span>, and <span class="bold"><strong>diverge</strong></span> all describe the fact that multiple
        siblings are dependent on fewer parents.</p>
<p>The example shows how the <span class="bold"><strong>Job 2 and
        Job 3</strong></span> nodes depend on the <span class="bold"><strong>Job 1</strong></span>
        node.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>fan-in</strong></span>, <span class="bold"><strong>gather</strong></span>, <span class="bold"><strong>join</strong></span>,
        and <span class="bold"><strong>converge</strong></span> describe how multiple
        siblings are merged into fewer dependent child nodes.</p>
<p>The example shows how the <span class="bold"><strong>Job 4</strong></span>
        node depends on both the <span class="bold"><strong>Job 2 and Job 3</strong></span>
        nodes.</p>
</li>
</ul></div>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p><span class="bold"><strong>serial execution</strong></span> implies that
        nodes are dependent on one another, like pearls on a string.</p></li>
<li class="listitem"><p><span class="bold"><strong>parallel execution</strong></span> implies that
        nodes do not depend on one another and can be executed in parallel.</p></li>
</ul></div>
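<p>The graph structures above can be sketched in a few lines of Python.
    This is a minimal illustration, not part of any Pegasus API: the
    <code class="filename">parents</code> mapping and the
    <code class="filename">execution_levels</code> helper are hypothetical names
    chosen for this example. Jobs that land in the same level can run in
    parallel, while the levels themselves run serially.</p>

```python
# Hypothetical in-memory model of the diamond workflow (not a Pegasus API).
# Each job maps to the set of parent jobs it depends on.
parents = {
    "j1": set(),           # root job
    "j2": {"j1"},          # fan-out: j2 and j3 both depend on j1
    "j3": {"j1"},
    "j4": {"j2", "j3"},    # fan-in: j4 depends on j2 and j3
}


def execution_levels(parents):
    """Group jobs into levels.

    Jobs within one level have no dependencies on each other and can run
    in parallel; successive levels must run one after another.
    """
    remaining = {job: set(deps) for job, deps in parents.items()}
    levels = []
    while remaining:
        # A job is ready once all of its parents have been scheduled.
        ready = sorted(job for job, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cycle detected: a workflow must be acyclic")
        levels.append(ready)
        for job in ready:
            del remaining[job]
        for deps in remaining.values():
            deps.difference_update(ready)
    return levels


levels = execution_levels(parents)
print(levels)  # [['j1'], ['j2', 'j3'], ['j4']]
```

<p>The middle level holds two jobs, which is where the diamond workflow
    exposes parallelism.</p>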
<div class="figure">
<a name="components_blackdiamond"></a><p class="title"><b>Figure 4.1. Sample Workflow</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" style="cellpadding: 0; cellspacing: 0;"><tr><td align="center" valign="middle"><img src="images/DiamondWorkflow.png" align="middle" alt="Sample Workflow"></td></tr></table></div></div>
</div>
<p><br class="figure-break"></p>
<p>The example diamond workflow consists of four nodes representing
    jobs, which are linked by six files.</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p>Required input files must be registered with the Replica Catalog
        in order for Pegasus to find them and integrate them into the
        workflow.</p></li>
<li class="listitem"><p>Leaf files are a product or output of a workflow. Output files
        can be collected at a location.</p></li>
<li class="listitem"><p>The remaining files all have lines leading to them and
        originating from them. These files are produced by some job steps
        (lines leading to them) and consumed by other job steps (lines
        leading out of them). Often, these files represent intermediary
        results that can be cleaned up.</p></li>
</ul></div>
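<p>The three file roles described above follow directly from which jobs
    produce and consume each file. The sketch below illustrates this for the
    diamond workflow; the <code class="filename">uses</code> table and the
    <code class="filename">classify_files</code> helper are hypothetical names
    for this example and are not part of Pegasus.</p>

```python
# Hypothetical summary of which files each job of the diamond workflow
# reads ("input") and writes ("output"). Not a Pegasus data structure.
uses = {
    "j1": {"input": ["f.a"], "output": ["f.b1", "f.b2"]},
    "j2": {"input": ["f.b1"], "output": ["f.c1"]},
    "j3": {"input": ["f.b2"], "output": ["f.c2"]},
    "j4": {"input": ["f.c1", "f.c2"], "output": ["f.d"]},
}


def classify_files(uses):
    """Split the workflow's files into required inputs, intermediary
    files, and leaf (final output) files."""
    consumed = {name for job in uses.values() for name in job["input"]}
    produced = {name for job in uses.values() for name in job["output"]}
    return {
        # Consumed but never produced: must come from the Replica Catalog.
        "input": sorted(consumed - produced),
        # Produced by one job and consumed by another.
        "intermediate": sorted(consumed & produced),
        # Produced but never consumed: products of the workflow.
        "output": sorted(produced - consumed),
    }


roles = classify_files(uses)
print(roles["input"])         # ['f.a']
print(roles["intermediate"])  # ['f.b1', 'f.b2', 'f.c1', 'f.c2']
print(roles["output"])        # ['f.d']
```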
<p>There are two main ways of generating a DAX:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem">
<p>Using a DAX generating API in <a class="link" href="dax_generator_api.php#api-java" title="16.2.1. The Java DAX Generator API">Java</a>, <a class="link" href="dax_generator_api.php#api-perl" title="16.2.3. The Perl DAX Generator">Perl</a>
        or <a class="link" href="dax_generator_api.php#api-python" title="16.2.2. The Python DAX Generator API">Python</a>.</p>
<p><span class="bold"><strong>Note:</strong></span> We recommend this
        option.</p>
</li>
<li class="listitem">
<p>Generating XML directly from your script.</p>
<p><span class="bold"><strong>Note:</strong></span> This option should only
        be considered by advanced users who can also read XML schema
        definitions.</p>
</li>
</ol></div>
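<p>As a rough illustration of the second option, the sketch below emits a
    skeleton DAX using only the Python standard library. It is a simplified
    example under stated assumptions: it writes only the
    <code class="filename">adag</code>, <code class="filename">job</code>,
    <code class="filename">child</code> and <code class="filename">parent</code>
    elements, omits arguments, file usage and the schema location, and has
    not been validated against the DAX schema. The helper names
    <code class="filename">q</code> and <code class="filename">build_dax</code>
    are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Namespace used by the adag element of a DAX document.
DAX_NS = "http://pegasus.isi.edu/schema/DAX"
# Serialize the DAX namespace as the default namespace (no prefix).
ET.register_namespace("", DAX_NS)


def q(tag):
    """Return the namespace-qualified form of a DAX tag name."""
    return "{%s}%s" % (DAX_NS, tag)


def build_dax(name, jobs, dependencies):
    """Build a skeleton DAX tree.

    jobs:         list of (job id, logical transformation name)
    dependencies: list of (child id, [parent ids])
    """
    adag = ET.Element(q("adag"), {"version": "3.6", "name": name})
    for job_id, transformation in jobs:
        ET.SubElement(adag, q("job"), {
            "id": job_id,
            "namespace": "pegasus",
            "name": transformation,
            "version": "4.0",
        })
    for child_id, parent_ids in dependencies:
        child = ET.SubElement(adag, q("child"), {"ref": child_id})
        for parent_id in parent_ids:
            ET.SubElement(child, q("parent"), {"ref": parent_id})
    return adag


jobs = [("j1", "preprocess"), ("j2", "findrange"),
        ("j3", "findrange"), ("j4", "analyze")]
dependencies = [("j2", ["j1"]), ("j3", ["j1"]), ("j4", ["j2", "j3"])]

dax = build_dax("diamond", jobs, dependencies)
xml_text = ET.tostring(dax, encoding="unicode")
print(xml_text)
```

<p>Hand-written generators like this one must track schema changes
    themselves, which is one reason the DAX generating API is the
    recommended option.</p>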
<p>A DAX representing the example workflow can look like the
    following:</p>
<pre class="programlisting">
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!-- generated on: 2016-01-21T10:36:39-08:00 --&gt;
&lt;!-- generated by: vahi [ ?? ] --&gt;
&lt;adag xmlns="http://pegasus.isi.edu/schema/DAX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-3.6.xsd" version="3.6" name="diamond" index="0" count="1"&gt;

&lt;!-- Section 1: Metadata attributes for the workflow (can be empty)  --&gt;

   &lt;metadata key="name"&gt;diamond&lt;/metadata&gt;
   &lt;metadata key="createdBy"&gt;Karan Vahi&lt;/metadata&gt;

&lt;!-- Section 2: Invokes - Adds notifications for a workflow (can be empty) --&gt;

   &lt;invoke when="start"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
   &lt;invoke when="at_end"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;

&lt;!-- Section 3: Files - Acts as a Replica Catalog (can be empty) --&gt;

   &lt;file name="f.a"&gt;
      &lt;metadata key="size"&gt;1024&lt;/metadata&gt;
      &lt;pfn url="file:///Volumes/Work/lfs1/work/pegasus-features/PM-902/f.a" site="local"/&gt;
   &lt;/file&gt;

&lt;!-- Section 4: Executables - Acts as a Transformation Catalog (can be empty) --&gt;

   &lt;executable namespace="pegasus" name="preprocess" version="4.0" installed="true" arch="x86" os="linux"&gt;
      &lt;metadata key="size"&gt;2048&lt;/metadata&gt;
      &lt;pfn url="file:///usr/bin/keg" site="TestCluster"/&gt;
   &lt;/executable&gt;
   &lt;executable namespace="pegasus" name="findrange" version="4.0" installed="true" arch="x86" os="linux"&gt;
      &lt;pfn url="file:///usr/bin/keg" site="TestCluster"/&gt;
   &lt;/executable&gt;
   &lt;executable namespace="pegasus" name="analyze" version="4.0" installed="true" arch="x86" os="linux"&gt;
      &lt;pfn url="file:///usr/bin/keg" site="TestCluster"/&gt;
   &lt;/executable&gt;

&lt;!-- Section 5: Transformations - Aggregates executables and Files (can be empty) --&gt;


&lt;!-- Section 6: Jobs, DAXes or DAGs - Defines a JOB or DAX or DAG (At least 1 required) --&gt;

   &lt;job id="j1" namespace="pegasus" name="preprocess" version="4.0"&gt;
      &lt;metadata key="time"&gt;60&lt;/metadata&gt;
      &lt;argument&gt;-a preprocess -T 60 -i  &lt;file name="f.a"/&gt; -o  &lt;file name="f.b1"/&gt;   &lt;file name="f.b2"/&gt;&lt;/argument&gt;
      &lt;uses name="f.a" link="input"&gt;
         &lt;metadata key="size"&gt;1024&lt;/metadata&gt;
      &lt;/uses&gt;
      &lt;uses name="f.b1" link="output" transfer="true" register="true"/&gt;
      &lt;uses name="f.b2" link="output" transfer="true" register="true"/&gt;
      &lt;invoke when="start"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
      &lt;invoke when="at_end"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
   &lt;/job&gt;
   &lt;job id="j2" namespace="pegasus" name="findrange" version="4.0"&gt;
      &lt;metadata key="time"&gt;60&lt;/metadata&gt;
      &lt;argument&gt;-a findrange -T 60 -i  &lt;file name="f.b1"/&gt; -o  &lt;file name="f.c1"/&gt;&lt;/argument&gt;
      &lt;uses name="f.b1" link="input"/&gt;
      &lt;uses name="f.c1" link="output" transfer="true" register="true"/&gt;
      &lt;invoke when="start"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
      &lt;invoke when="at_end"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
   &lt;/job&gt;
   &lt;job id="j3" namespace="pegasus" name="findrange" version="4.0"&gt;
      &lt;metadata key="time"&gt;60&lt;/metadata&gt;
      &lt;argument&gt;-a findrange -T 60 -i  &lt;file name="f.b2"/&gt; -o  &lt;file name="f.c2"/&gt;&lt;/argument&gt;
      &lt;uses name="f.b2" link="input"/&gt;
      &lt;uses name="f.c2" link="output" transfer="true" register="true"/&gt;
      &lt;invoke when="start"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
      &lt;invoke when="at_end"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
   &lt;/job&gt;
   &lt;job id="j4" namespace="pegasus" name="analyze" version="4.0"&gt;
      &lt;metadata key="time"&gt;60&lt;/metadata&gt;
      &lt;argument&gt;-a analyze -T 60 -i  &lt;file name="f.c1"/&gt;   &lt;file name="f.c2"/&gt; -o  &lt;file name="f.d"/&gt;&lt;/argument&gt;
      &lt;uses name="f.c1" link="input"/&gt;
      &lt;uses name="f.c2" link="input"/&gt;
      &lt;uses name="f.d" link="output" transfer="true" register="true"/&gt;
      &lt;invoke when="start"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
      &lt;invoke when="at_end"&gt;/pegasus/libexec/notification/email -t notify@example.com&lt;/invoke&gt;
   &lt;/job&gt;

&lt;!-- Section 7: Dependencies - Parent Child relationships (can be empty) --&gt;

   &lt;child ref="j2"&gt;
      &lt;parent ref="j1"/&gt;
   &lt;/child&gt;
   &lt;child ref="j3"&gt;
      &lt;parent ref="j1"/&gt;
   &lt;/child&gt;
   &lt;child ref="j4"&gt;
      &lt;parent ref="j2"/&gt;
      &lt;parent ref="j3"/&gt;
   &lt;/child&gt;
&lt;/adag&gt;</pre>
<p>A DAX normally relies on external catalogs: a transformation
    catalog (TC) to resolve the logical job names (such as
    pegasus::preprocess:4.0), and a replica catalog (RC) to resolve the input
    file <code class="filename">f.a</code>. The listing above also embeds
    equivalent entries in its Sections 3 and 4. The workflow defines the four
    jobs shown in the figure and the files that flow between them. Every
    output file in this listing, including the intermediary files, is marked
    transfer="true" and register="true", so all of them are staged out and
    registered. To treat the intermediary files as transient and stage out
    only the final result file <code class="filename">f.d</code>, set both
    attributes to false on the intermediary files.</p>
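<p>Whether a file is staged out is determined by the transfer attribute
    on the corresponding <code class="filename">uses</code> element. As a
    minimal sketch, the following Python reads a trimmed copy of job j4 from
    the listing and reports which output files are marked for transfer. The
    <code class="filename">staged_outputs</code> helper is a hypothetical name
    for this example, not a Pegasus tool.</p>

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the DAX namespace used in ElementTree path queries.
NS = {"dax": "http://pegasus.isi.edu/schema/DAX"}

# Trimmed copy of job j4 from the listing (arguments and invokes omitted).
DAX_FRAGMENT = """
<adag xmlns="http://pegasus.isi.edu/schema/DAX" version="3.6" name="diamond">
   <job id="j4" namespace="pegasus" name="analyze" version="4.0">
      <uses name="f.c1" link="input"/>
      <uses name="f.c2" link="input"/>
      <uses name="f.d" link="output" transfer="true" register="true"/>
   </job>
</adag>
"""


def staged_outputs(dax_text):
    """Return the names of output files marked with transfer="true"."""
    root = ET.fromstring(dax_text)
    staged = []
    for uses in root.findall("dax:job/dax:uses", NS):
        if uses.get("link") == "output" and uses.get("transfer") == "true":
            staged.append(uses.get("name"))
    return staged


staged = staged_outputs(DAX_FRAGMENT)
print(staged)  # ['f.d']
```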
</div>
</div><?php  
            pegasus_footer();
        ?>
