<?php  
            require('/srv/new-pegasus.isi.edu/includes/common.php'); 
            pegasus_header("15.3. The Pegasus DAX and Jupyter Python APIs");
        ?><div class="breadcrumbs">
</div><hr><div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="jupyter-api"></a>15.3. The Pegasus DAX and Jupyter Python APIs</h2></div></div></div>
<div class="toc"><dl class="toc">
<dt><span class="section"><a href="jupyter-api.php#jupyter-api-dax">15.3.1. Creating an Abstract Workflow</a></span></dt>
<dt><span class="section"><a href="jupyter-api.php#jupyter-api-catalogs">15.3.2. Creating the Catalogs</a></span></dt>
<dt><span class="section"><a href="jupyter-api.php#jupyter-api-exec">15.3.3. Workflow Execution</a></span></dt>
</dl></div>
<p>To use Pegasus from a Jupyter notebook, first import the Pegasus
    Jupyter Python API. The instance module automatically loads the
    Pegasus DAX3 API and the catalog APIs.</p>
<pre class="programlisting">from Pegasus.jupyter.instance import *</pre>
<p>By default, the API creates a folder in the user's $HOME directory
    based on the workflow name. To store the workflow files elsewhere,
    define the path explicitly:</p>
<pre class="programlisting">workflow_dir = '/home/pegasus/wf-split-tutorial'</pre>
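<p>As a minimal sketch of what such a default might resolve to, the snippet
    below joins a home directory and a workflow name using only the standard
    library. The helper <span class="emphasis"><em>default_workflow_dir</em></span> is
    hypothetical and not part of the Pegasus API, and the directory layout the
    API actually chooses may differ.</p>

```python
import os


def default_workflow_dir(workflow_name, home=None):
    """Return a per-workflow folder under the user's home directory.

    Hypothetical helper for illustration only; not part of the Pegasus API.
    """
    # Fall back to the current user's home directory when none is given.
    base = home if home is not None else os.path.expanduser('~')
    return os.path.join(base, workflow_name)


workflow_dir = default_workflow_dir('wf-split-tutorial', home='/home/pegasus')
print(workflow_dir)
```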
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="jupyter-api-dax"></a>15.3.1. Creating an Abstract Workflow</h3></div></div></div>
<p>Creating a workflow within Jupyter follows the same steps used to
      generate a DAX with the <a class="link" href="dax_generator_api.php" title="16.2. DAX Generator API">DAX Generator
      API</a>.</p>
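<p>To make the shape of an abstract workflow concrete, the following
      standard-library sketch models a split-style workflow as jobs plus
      dependencies and derives an execution order that respects them. It does
      not use the DAX Generator API: the job names (one
      <span class="emphasis"><em>split</em></span> job feeding four
      <span class="emphasis"><em>wc</em></span> jobs) and the
      <span class="emphasis"><em>execution_order</em></span> helper are illustrative
      assumptions.</p>

```python
from collections import deque

# Each job maps to the list of jobs it depends on (its parents).
# Job names are illustrative assumptions: one split job feeding four wc jobs.
parents = {'split': []}
for suffix in 'abcd':
    parents['wc_' + suffix] = ['split']


def execution_order(parents):
    """Return the jobs in an order that respects every dependency (Kahn's algorithm)."""
    # Number of unfinished parents per job.
    remaining = {job: len(deps) for job, deps in parents.items()}
    # Reverse mapping: parent -> jobs that wait on it.
    children = {job: [] for job in parents}
    for job, deps in parents.items():
        for dep in deps:
            children[dep].append(job)
    ready = deque(sorted(job for job, count in remaining.items() if count == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for child in sorted(children[job]):
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    if len(order) != len(parents):
        raise ValueError('dependency cycle detected')
    return order


print(execution_order(parents))
```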
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="jupyter-api-catalogs"></a>15.3.2. Creating the Catalogs</h3></div></div></div>
<p>The <span class="emphasis"><em><a class="link" href="replica.php" title="4.2. Data Discovery (Replica Catalog)">Replica
      Catalog</a></em></span> (RC) tells Pegasus where to find each of the
      input files for the workflow. We provide a Python API for creating the
      RC programmatically. Detailed information on how the RC works and
      its semantics can be found <a class="link" href="replica.php" title="4.2. Data Discovery (Replica Catalog)">here</a>, and the
      auto-generated Python documentation for this API can be found <a class="ulink" href="python/replica_catalog.html" target="_top">here</a>.</p>
<pre class="programlisting">rc = ReplicaCatalog(workflow_dir)
rc.add('pegasus.html', 'file:///home/pegasus/pegasus.html', site='local')
</pre>
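<p>The mapping that the RC maintains can be pictured as a lookup from a
      logical file name to one or more physical locations. The sketch below is
      a plain-Python illustration, not the
      <span class="emphasis"><em>ReplicaCatalog</em></span> class: the
      <span class="emphasis"><em>replicas</em></span> dictionary and the
      <span class="emphasis"><em>add_replica</em></span> and
      <span class="emphasis"><em>lookup</em></span> helpers are hypothetical. It also
      shows one way to build the file:// URL from an absolute path.</p>

```python
from urllib.parse import quote

# Logical file name -> list of (physical URL, site) pairs.
replicas = {}


def add_replica(lfn, path, site):
    """Record that logical file `lfn` is available at absolute `path` on `site`."""
    # Percent-encode unsafe characters and prefix the file scheme.
    url = 'file://' + quote(path)
    replicas.setdefault(lfn, []).append((url, site))
    return url


def lookup(lfn, site):
    """Return the physical URLs registered for `lfn` on `site`."""
    return [url for url, s in replicas.get(lfn, []) if s == site]


add_replica('pegasus.html', '/home/pegasus/pegasus.html', 'local')
print(lookup('pegasus.html', 'local'))
```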
<p>The <span class="emphasis"><em><a class="link" href="transformation.php" title="4.4. Executable Discovery (Transformation Catalog)">Transformation
      Catalog</a></em></span> (TC) describes all of the executables (called
      "transformations") used by the workflow. The Python Jupyter API also
      provides methods to manage this catalog. A detailed description of the
      TC properties can be found <a class="link" href="transformation.php" title="4.4. Executable Discovery (Transformation Catalog)">here</a>,
      and the auto-generated Python documentation for this API can be found
      <a class="ulink" href="python/transformation_catalog.html" target="_top">here</a>.</p>
<pre class="programlisting">e_split = Executable('split', arch=Arch.X86_64, os=OSType.LINUX, installed=True)
e_split.addPFN(PFN('file:///usr/bin/split', 'condorpool'))

e_wc = Executable('wc', arch=Arch.X86_64, os=OSType.LINUX, installed=True)
e_wc.addPFN(PFN('file:///usr/bin/wc', 'condorpool'))

tc = TransformationCatalog(workflow_dir)
tc.add(e_split)
tc.add(e_wc)
</pre>
<p>The <span class="emphasis"><em><a class="link" href="site.php" title="4.3. Resource Discovery (Site Catalog)">Site Catalog</a></em></span>
      (SC) describes the sites where the workflow jobs are to be executed. A
      detailed description of the SC properties and handlers can be found
      <a class="link" href="site.php" title="4.3. Resource Discovery (Site Catalog)">here</a>, and the auto-generated
      Python documentation for this API can be found <a class="ulink" href="python/sites_catalog.html" target="_top">here</a>.</p>
<pre class="programlisting">sc = SitesCatalog(workflow_dir)
sc.add_site('condorpool', arch=Arch.X86_64, os=OSType.LINUX)
sc.add_site_profile('condorpool', namespace=Namespace.PEGASUS, key='style', value='condor')
sc.add_site_profile('condorpool', namespace=Namespace.CONDOR, key='universe', value='vanilla')
</pre>
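<p>Profiles are grouped per site and per namespace. As a rough mental
      model only, the two <span class="emphasis"><em>add_site_profile</em></span> calls
      above amount to filling a nested lookup table. The
      <span class="emphasis"><em>profiles</em></span> dictionary and the
      <span class="emphasis"><em>add_profile</em></span> helper are hypothetical and do
      not reflect how <span class="emphasis"><em>SitesCatalog</em></span> stores its
      data.</p>

```python
# site -> namespace -> key -> value
profiles = {}


def add_profile(site, namespace, key, value):
    """Store one profile entry, creating the nested levels on demand."""
    profiles.setdefault(site, {}).setdefault(namespace, {})[key] = value


add_profile('condorpool', 'pegasus', 'style', 'condor')
add_profile('condorpool', 'condor', 'universe', 'vanilla')
print(profiles['condorpool'])
```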
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="jupyter-api-exec"></a>15.3.3. Workflow Execution</h3></div></div></div>
<p>Workflow execution and management are performed using an
      <span class="emphasis"><em>Instance</em></span> object. An instance receives a DAX object
      (created with the <a class="link" href="dax_generator_api.php" title="16.2. DAX Generator API">DAX Generator
      API</a>) and the catalog objects (replica, transformation, and
      site). A path to the workflow directory can also be provided:</p>
<pre class="programlisting">instance = Instance(dax, replica_catalog=rc, transformation_catalog=tc, sites_catalog=sc, workflow_dir=workflow_dir)</pre>
<p>An instance object represents a workflow run, from which the
      workflow execution can be launched, monitored, and managed. The
      <span class="emphasis"><em>run</em></span> method starts the workflow execution.</p>
<pre class="programlisting">instance.run(site='condorpool')</pre>
<p>After the workflow has been submitted, you can monitor it using the
      <span class="emphasis"><em>status()</em></span> method. This method takes two
      arguments:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="emphasis"><em>loop</em></span>: whether the status command should
          be invoked once or continuously until the workflow completes or a
          failure is detected.</p></li>
<li class="listitem"><p><span class="emphasis"><em>delay</em></span>: the interval (in seconds) at which
          the status is refreshed. The default value is 10 seconds.</p></li>
</ol></div>
<pre class="programlisting">instance.status(loop=True, delay=5)</pre>
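<p>The interaction between <span class="emphasis"><em>loop</em></span> and
      <span class="emphasis"><em>delay</em></span> can be sketched as a simple polling
      loop. This is not the implementation of
      <span class="emphasis"><em>status()</em></span>: the
      <span class="emphasis"><em>poll_status</em></span> function, the
      <span class="emphasis"><em>get_state</em></span> callable, and the state names are
      assumptions made for illustration.</p>

```python
import time


def poll_status(get_state, loop=False, delay=10):
    """Call `get_state` once, or repeatedly until a terminal state is seen.

    `get_state` is a hypothetical callable returning a string such as
    'Running', 'Success', or 'Failure'; these names are assumptions.
    """
    terminal = {'Success', 'Failure'}
    while True:
        state = get_state()
        print('workflow state:', state)
        # Stop after one check when not looping, or once the run has ended.
        if not loop or state in terminal:
            return state
        time.sleep(delay)  # wait before refreshing the status


# Simulated sequence of states standing in for a real workflow.
states = iter(['Running', 'Running', 'Success'])
final_state = poll_status(lambda: next(states), loop=True, delay=0)
```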
</div>
</div><div class="navfooter">
</div><?php  
            pegasus_footer();
        ?>
