<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="chapter" title="Chapter 4. Creating Workflows">
<div class="titlepage"><div><div><h2 class="title">
<a name="creating_workflows"></a>Chapter 4. Creating Workflows</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="creating_workflows.php#abstract_workflows">4.1. Abstract Workflows (DAX)</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#replica">4.2. Data Discovery (Replica Catalog)</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#site">4.3. Resource Discovery (Site Catalog)</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#transformation">4.4. Executable Discovery (Transformation Catalog)</a></span></dt>
</dl></div>
<div class="section" title="4.1. Abstract Workflows (DAX)">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="abstract_workflows"></a>4.1. Abstract Workflows (DAX)</h2></div></div></div>
<p>The DAX is a description of an abstract workflow in XML format that
    is used as the primary input into Pegasus. The DAX schema is described in
    <a class="ulink" href="http://pegasus.isi.edu/wms/docs/schemas/dax-3.4/dax-3.4.xsd" target="_top">dax-3.4.xsd</a>.
    The documentation of the schema and its elements can be found in <a class="ulink" href="http://pegasus.isi.edu/wms/docs/schemas/dax-3.4/dax-3.4.html" target="_top">dax-3.4.html</a>.</p>
<p>Any user can create a DAX with the DAX generating API, which is
    available for Java, Perl, and Python.</p>
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
       We highly recommend using the DAX API. 
    </div>
<p>Advanced users who can read XML schema definitions can generate a
    DAX directly from a script.</p>
<p>The sample workflow below incorporates some of the elementary graph
    structures used in all abstract workflows.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
<p><span class="bold"><strong>fan-out</strong></span>, <span class="bold"><strong>scatter</strong></span>, and <span class="bold"><strong>diverge</strong></span> all describe the fact that multiple
        siblings are dependent on fewer parents.</p>
<p>The example shows how the <span class="bold"><strong>Job 2 and
        Job 3</strong></span> nodes depend on the <span class="bold"><strong>Job 1</strong></span>
        node.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>fan-in</strong></span>, <span class="bold"><strong>gather</strong></span>, <span class="bold"><strong>join</strong></span>,
        and <span class="bold"><strong>converge</strong></span> describe how multiple
        siblings are merged into fewer dependent child nodes.</p>
<p>The example shows how the <span class="bold"><strong>Job 4</strong></span>
        node depends on both <span class="bold"><strong>Job 2 and Job 3</strong></span>
        nodes.</p>
</li>
</ul></div>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>serial execution</strong></span> implies that
        nodes are dependent on one another, like pearls on a string.</p></li>
<li class="listitem"><p><span class="bold"><strong>parallel execution</strong></span> implies that
        nodes do not depend on one another and can be executed at the same time.</p></li>
</ul></div>
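<p>As a rough illustration of these structures (this is not part of Pegasus itself), the sketch below uses Python's standard <code class="literal">graphlib</code> module to group the four jobs of a diamond-shaped workflow into levels whose members could run in parallel. The job names <code class="literal">job1</code> to <code class="literal">job4</code> and the helper <code class="literal">execution_levels</code> are hypothetical names chosen for this example.</p>

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job names; each key maps a child job to the set of its parents.
DIAMOND = {
    "job2": {"job1"},          # fan-out: job2 and job3 both depend on job1
    "job3": {"job1"},
    "job4": {"job2", "job3"},  # fan-in: job4 depends on job2 and job3
}


def execution_levels(parents_of):
    """Group jobs into levels; jobs within one level do not depend on each other."""
    sorter = TopologicalSorter(parents_of)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose parents are all done
        levels.append(ready)
        sorter.done(*ready)
    return levels


if __name__ == "__main__":
    for depth, jobs in enumerate(execution_levels(DIAMOND)):
        print(depth, jobs)
```

<p>Levels that contain more than one job correspond to parallel execution, while the ordering between levels corresponds to serial execution.</p>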
<div class="figure">
<a name="components_blackdiamond"></a><p class="title"><b>Figure 4.1. Sample Workflow</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0"><tr><td align="center" valign="middle"><img src="images/DiamondWorkflow.png" align="middle" alt="Sample Workflow"></td></tr></table></div></div>
</div>
<p><br class="figure-break"></p>
<p>The example diamond workflow consists of four nodes representing
    jobs, which are linked by six files.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p>Required input files must be registered with the Replica Catalog
        in order for Pegasus to find them and integrate them into the
        workflow.</p></li>
<li class="listitem"><p>Leaf files are a product or output of a workflow. Output files
        can be collected at a location.</p></li>
<li class="listitem"><p>The remaining files all have lines leading to them and
        originating from them. These files are products of some job steps
        (lines leading to them), and consumed by other job steps (lines
        leading out of them). Often, these files represent intermediary
        results that can be cleaned.</p></li>
</ul></div>
<p>There are two main ways of generating a DAX:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem">
<p>Using a DAX generating API in <a class="link" href="reference.php#api-java" title="10.9.2.1. The Java DAX Generator API">Java</a>, <a class="link" href="reference.php#api-perl" title="10.9.2.3. The Perl DAX Generator">Perl</a>
        or <a class="link" href="reference.php#api-python" title="10.9.2.2. The Python DAX Generator API">Python</a>.</p>
<p><span class="bold"><strong>Note:</strong></span> We recommend this
        option.</p>
</li>
<li class="listitem">
<p>Generating XML directly from your script.</p>
<p><span class="bold"><strong>Note:</strong></span> This option should only
        be considered by advanced users who can also read XML schema
        definitions.</p>
</li>
</ol></div>
<p>A DAX representing the example workflow can look
    like the following:</p>
<pre class="programlisting">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!-- generated: 2010-11-22T22:55:08Z --&gt;
&lt;adag xmlns="http://pegasus.isi.edu/schema/DAX"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-3.2.xsd"
      version="3.2" name="diamond" index="0" count="1"&gt;
  &lt;!-- part 2: definition of all jobs (at least one) --&gt;
  &lt;job namespace="diamond" name="preprocess" version="2.0" id="ID000001"&gt;
    &lt;argument&gt;-a preprocess -T60 -i &lt;file name="f.a" /&gt; -o &lt;file name="f.b1" /&gt; &lt;file name="f.b2" /&gt;&lt;/argument&gt;
    &lt;uses name="f.b2" link="output" register="false" transfer="false" /&gt;
    &lt;uses name="f.b1" link="output" register="false" transfer="false" /&gt;
    &lt;uses name="f.a" link="input" /&gt;
  &lt;/job&gt;
  &lt;job namespace="diamond" name="findrange" version="2.0" id="ID000002"&gt;
    &lt;argument&gt;-a findrange -T60 -i &lt;file name="f.b1" /&gt; -o &lt;file name="f.c1" /&gt;&lt;/argument&gt;
    &lt;uses name="f.b1" link="input" register="false" transfer="false" /&gt;
    &lt;uses name="f.c1" link="output" register="false" transfer="false" /&gt;
  &lt;/job&gt;
  &lt;job namespace="diamond" name="findrange" version="2.0" id="ID000003"&gt;
    &lt;argument&gt;-a findrange -T60 -i &lt;file name="f.b2" /&gt; -o &lt;file name="f.c2" /&gt;&lt;/argument&gt;
    &lt;uses name="f.c2" link="output" register="false" transfer="false" /&gt;
    &lt;uses name="f.b2" link="input" register="false" transfer="false" /&gt;
  &lt;/job&gt;
  &lt;job namespace="diamond" name="analyze" version="2.0" id="ID000004"&gt;
    &lt;argument&gt;-a analyze -T60 -i &lt;file name="f.c1" /&gt; &lt;file name="f.c2" /&gt; -o &lt;file name="f.d" /&gt;&lt;/argument&gt;
    &lt;uses name="f.c2" link="input" register="false" transfer="false" /&gt;
    &lt;uses name="f.d" link="output" register="false" transfer="true" /&gt;
    &lt;uses name="f.c1" link="input" register="false" transfer="false" /&gt;
  &lt;/job&gt;
  &lt;!-- part 3: list of control-flow dependencies --&gt;
  &lt;child ref="ID000002"&gt;
    &lt;parent ref="ID000001" /&gt;
  &lt;/child&gt;
  &lt;child ref="ID000003"&gt;
    &lt;parent ref="ID000001" /&gt;
  &lt;/child&gt;
  &lt;child ref="ID000004"&gt;
    &lt;parent ref="ID000002" /&gt;
    &lt;parent ref="ID000003" /&gt;
  &lt;/child&gt;
&lt;/adag&gt;</pre>
<p>The DAX representation of the example workflow requires
    external catalogs, such as a transformation catalog (TC) to resolve the
    logical job names (such as diamond::preprocess:2.0), and a replica catalog
    (RC) to resolve the input file <code class="filename">f.a</code>. The above
    workflow defines the four jobs shown in the example figure, and the
    files that flow between the jobs. The intermediary files are neither
    registered nor staged out, and can be considered transient. Only the final
    result file <code class="filename">f.d</code> is staged out.</p>
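<p>For readers who generate the XML directly from a script, the following is a minimal sketch of how the jobs and dependencies of the listing above could be emitted with Python's standard <code class="literal">xml.etree.ElementTree</code> module. It does not use the Pegasus DAX generator API, it omits the <code class="literal">argument</code> elements and the schema location for brevity, and the function name <code class="literal">build_dax</code> is a hypothetical name chosen for this example.</p>

```python
import xml.etree.ElementTree as ET

DAX_NS = "http://pegasus.isi.edu/schema/DAX"

# (job id, transformation name, input files, output files), as in the listing above.
JOBS = [
    ("ID000001", "preprocess", ["f.a"], ["f.b1", "f.b2"]),
    ("ID000002", "findrange", ["f.b1"], ["f.c1"]),
    ("ID000003", "findrange", ["f.b2"], ["f.c2"]),
    ("ID000004", "analyze", ["f.c1", "f.c2"], ["f.d"]),
]
# Child job id mapped to the ids of its parent jobs.
PARENTS = {
    "ID000002": ["ID000001"],
    "ID000003": ["ID000001"],
    "ID000004": ["ID000002", "ID000003"],
}


def build_dax():
    """Build an adag element holding the jobs and dependencies of the diamond."""
    adag = ET.Element("adag", {"xmlns": DAX_NS, "version": "3.2", "name": "diamond"})
    for job_id, name, inputs, outputs in JOBS:
        job = ET.SubElement(adag, "job", {"namespace": "diamond", "name": name,
                                          "version": "2.0", "id": job_id})
        for lfn in inputs:
            ET.SubElement(job, "uses", {"name": lfn, "link": "input"})
        for lfn in outputs:
            # Only the final result f.d is staged out; intermediates are transient.
            transfer = "true" if lfn == "f.d" else "false"
            ET.SubElement(job, "uses", {"name": lfn, "link": "output",
                                        "register": "false", "transfer": transfer})
    for child_id, parent_ids in PARENTS.items():
        child = ET.SubElement(adag, "child", {"ref": child_id})
        for parent_id in parent_ids:
            ET.SubElement(child, "parent", {"ref": parent_id})
    return adag


if __name__ == "__main__":
    print(ET.tostring(build_dax(), encoding="unicode"))
```

<p>A document produced this way should still be validated against the DAX schema before it is handed to Pegasus.</p>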
</div>
<div class="section" title="4.2. Data Discovery (Replica Catalog)">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="replica"></a>4.2. Data Discovery (Replica Catalog)</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="creating_workflows.php#rc-FILE">4.2.1. File</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#idp11063152">4.2.2. Regex</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#idp11074560">4.2.3. Directory</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#rc-JDBCRC">4.2.4. JDBCRC</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#rc-RLS">4.2.5. Replica Location Service</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#rc-MRC">4.2.6. MRC</a></span></dt>
</dl></div>
<p>The Replica Catalog keeps mappings of logical file ids/names (LFNs)
    to physical file ids/names (PFNs). A single LFN can map to several PFNs.
    A PFN consists of a URL with protocol, host and port information and a
    path to a file. Along with the PFN one can also store additional key/value
    attributes to be associated with that PFN.</p>
<p>Pegasus supports the following implementations of the Replica
    Catalog.</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>File</strong></span> (Default)</p></li>
<li class="listitem"><p><span class="bold"><strong>Regex</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>Directory</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>Database via JDBC</strong></span></p></li>
<li class="listitem">
<p><span class="bold"><strong>Replica Location Service</strong></span></p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>RLS</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>LRC</strong></span></p></li>
</ul></div>
</li>
<li class="listitem"><p><span class="bold"><strong>MRC</strong></span></p></li>
</ol></div>
<div class="section" title="4.2.1. File">
<div class="titlepage"><div><div><h3 class="title">
<a name="rc-FILE"></a>4.2.1. File</h3></div></div></div>
<p>In this mode, Pegasus queries a file based replica catalog. The
      file format is a simple multicolumn format. It is not
      transactionally safe and is not advised for production use:
      multiple concurrent instances will conflict with each other. The
      site attribute should be specified whenever possible. The attribute key
      for the site attribute is <span class="bold"><strong>"pool"</strong></span>.</p>
<pre class="programlisting">
LFN PFN
LFN PFN a=b [..]
LFN PFN a="b" [..]
"LFN w/LWS" "PFN w/LWS" [..]
      </pre>
<p>The LFN may or may not be quoted. If it contains linear
      whitespace, quotes, a backslash or an equals sign, it must be quoted and
      escaped. The same conditions apply to the PFN. The attribute key-value
      pairs are separated by an equals sign without any surrounding whitespace. The
      value may be quoted. The same quoting rules as for the LFN apply.</p>
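<p>To make the column layout concrete, here is a small sketch of how one catalog line might be split into its parts. It relies on Python's standard <code class="literal">shlex</code> module, whose handling of double quotes and backslash escapes only approximates the rules described above; the actual parser in Pegasus may behave differently, and <code class="literal">parse_rc_line</code> is a hypothetical helper name.</p>

```python
import shlex


def parse_rc_line(line):
    """Split one catalog line into (lfn, pfn, attributes)."""
    # shlex honours double quotes and backslash escapes, which approximates
    # (but is not guaranteed to match) the quoting rules of the catalog format.
    # Unpacking raises ValueError if the line lacks an LFN or a PFN.
    lfn, pfn, *rest = shlex.split(line)
    attributes = {}
    for token in rest:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError("attribute without '=': %r" % token)
        attributes[key] = value
    return lfn, pfn, attributes


if __name__ == "__main__":
    print(parse_rc_line('f.a file:///data/f.a pool="local"'))
    print(parse_rc_line('"my file" "file:///data/my file" a=b'))
```

<p>Note that an attribute is split at its first equals sign, so a quoted value may itself contain further equals signs.</p>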
<p>The File mode is the default mode. In order to use the File mode
      you have to set the following properties:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica=File</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.file=<em class="replaceable"><code>&lt;path to
            the replica catalog file&gt;</code></em></strong></span></p></li>
</ol></div>
</div>
<div class="section" title="4.2.2. Regex">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp11063152"></a>4.2.2. Regex</h3></div></div></div>
<p>In this mode, Pegasus queries a file based replica catalog. The
      file format is a simple multicolumn format. It is not
      transactionally safe and is not advised for production use:
      multiple concurrent instances will conflict with each other. The site
      attribute should be specified whenever possible. The attribute key for
      the site attribute is <span class="bold"><strong>"pool"</strong></span>.</p>
<p>In addition, users can specify regular expression based LFNs. A
      regular expression based entry should be qualified with an attribute
      named 'regex'. When set to true, the regex attribute identifies the
      catalog entry as a regular expression based entry. Regular expressions
      should follow Java regular expression syntax.</p>
<p>For example, consider a replica catalog as shown below.</p>
<p>Entry 1 refers to an entry which does not use a regular
      expression. This entry would only match a file named 'f.a', and nothing
      else.</p>
<p>Entry 2 refers to an entry which uses a regular expression. In
      this entry f.a refers to files having a name of the form f&lt;any-character&gt;a,
      i.e. faa, f.a, f0a, etc.</p>
<pre class="programlisting">#1
f.a file:///Volumes/data/input/f.a pool="local"
#2
f.a file:///Volumes/data/input/f.a pool="local" <span class="bold"><strong>regex</strong></span>="true"
</pre>
<p>Regular expression based entries also support substitutions. For
      example, consider the regular expression based entry shown below.</p>
<p>Entry 3 will match files with name alpha.csv, alpha.txt,
      alpha.xml. In addition, values matched in the expression can be used to
      generate a PFN.</p>
<p>For the entry below, if the file being looked up is alpha.csv, the
      PFN for the file would be generated as
      file:///Volumes/data/input/csv/alpha.csv. Similarly, if the file being
      looked up was alpha.xml, the PFN for the file would be generated as
      file:///Volumes/data/input/xml/alpha.xml, i.e. the sections [0] and [1] will
      be replaced. Section [0] refers to the entire string, e.g. alpha.csv.
      Section [1] refers to a partial match in the input, i.e. csv, txt, or
      xml. Users can utilize as many sections as they wish.</p>
<pre class="programlisting">#3
alpha\.(csv|txt|xml) file:///Volumes/data/input/<span class="bold"><strong>[1]</strong></span>/<span class="bold"><strong>[0]</strong></span> pool="local" <span class="bold"><strong>regex</strong></span>="true"</pre>
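<p>The substitution described for entry 3 can be sketched as follows. This is an illustration only: it uses Python's <code class="literal">re</code> module rather than Java regular expressions (the two agree for this simple pattern), it assumes that the whole LFN has to match the expression, and <code class="literal">resolve_pfn</code> is a hypothetical helper name.</p>

```python
import re

# Entry 3 from the catalog above: an LFN pattern and a PFN template.
LFN_PATTERN = r"alpha\.(csv|txt|xml)"
PFN_TEMPLATE = "file:///Volumes/data/input/[1]/[0]"


def resolve_pfn(lfn, pattern=LFN_PATTERN, template=PFN_TEMPLATE):
    """Return the PFN generated for lfn, or None if the pattern does not match."""
    match = re.fullmatch(pattern, lfn)  # assumption: the whole LFN must match
    if match is None:
        return None
    # Replace each [n] in the template with capture group n ([0] is the full match).
    return re.sub(r"\[(\d+)\]", lambda m: match.group(int(m.group(1))), template)


if __name__ == "__main__":
    for name in ("alpha.csv", "alpha.xml", "beta.csv"):
        print(name, resolve_pfn(name))
```

<p>With this sketch, a lookup for a name that the expression does not cover, such as beta.csv, yields no PFN.</p>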
</div>
<div class="section" title="4.2.3. Directory">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp11074560"></a>4.2.3. Directory</h3></div></div></div>
<p>In this mode, Pegasus does a directory listing on an input
      directory to create the LFN to PFN mappings. The directory listing is
      performed recursively, resulting in deep LFN mappings. For example, if
      an input directory $input is specified with the following
      structure</p>
<pre class="programlisting">$input
$input/f.1
$input/f.2
$input/D1
$input/D1/f.3</pre>
<p>Pegasus will create the following LFN to PFN mappings
      internally:</p>
<pre class="programlisting">f.1 file://$input/f.1  pool="local"
f.2 file://$input/f.2  pool="local"
D1/f.3 file://$input/D1/f.3 pool="local"</pre>
<p>Users can optionally specify additional properties to configure
      the behavior of this implementation.</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.directory.site</strong></span> to
          specify a site attribute other than local to associate with the
          mappings.</p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.directory.flat.lfn</strong></span> to
          specify whether you want deep LFNs to be constructed or not. If not
          specified, the value defaults to false, i.e. deep LFNs are constructed
          for the mappings.</p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.directory.url.prefix</strong></span>
          to associate a URL prefix for the PFN's constructed. If not
          specified, the URL defaults to file://</p></li>
</ol></div>
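<p>The recursive listing can be pictured with the sketch below, which walks a directory with Python's standard <code class="literal">os.walk</code> and builds the LFN to PFN mappings in memory. The parameters <code class="literal">site</code>, <code class="literal">flat_lfn</code> and <code class="literal">url_prefix</code> loosely mirror the three properties listed above; the function name and the reading of a flat LFN as the bare file name are assumptions made for this example, not a description of the Pegasus implementation.</p>

```python
import os


def directory_mappings(input_dir, site="local", flat_lfn=False, url_prefix="file://"):
    """Walk input_dir recursively and return a dict of LFN to (PFN, site)."""
    root = os.path.abspath(input_dir)
    mappings = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            if flat_lfn:
                lfn = filename  # assumption: a flat LFN is just the file name
            else:
                # Deep LFN: path relative to the input directory, e.g. D1/f.3
                lfn = os.path.relpath(path, root).replace(os.sep, "/")
            mappings[lfn] = (url_prefix + path, site)
    return mappings


if __name__ == "__main__":
    for lfn, (pfn, pool) in sorted(directory_mappings(".").items()):
        print(lfn, pfn, 'pool="%s"' % pool)
```

<p>In this sketch, flat LFNs can collide when two subdirectories contain files with the same name; the later entry simply overwrites the earlier one.</p>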
<div class="tip" title="Tip" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Tip</h3>
<p>pegasus-plan has an <span class="bold"><strong>--input-dir</strong></span>
        option that can be used to specify an input directory on the command
        line. This allows you to specify a separate replica catalog to catalog
        the locations of output files.</p>
</div>
</div>
<div class="section" title="4.2.4. JDBCRC">
<div class="titlepage"><div><div><h3 class="title">
<a name="rc-JDBCRC"></a>4.2.4. JDBCRC</h3></div></div></div>
<p>In this mode, Pegasus queries a SQL based replica catalog that is
      accessed via JDBC. The SQL schemas for this catalog can be
      found in the <span class="bold"><strong>$PEGASUS_HOME/sql</strong></span> directory.
      You will have to install the schema into either PostgreSQL or MySQL by
      running the appropriate commands to load the two schemas <span class="bold"><strong>create-XX-init.sql</strong></span> and <span class="bold"><strong>create-XX-rc.sql</strong></span>, where <span class="bold"><strong>XX</strong></span> is either <span class="bold"><strong>my</strong></span>
      (for MySQL) or <span class="bold"><strong>pg</strong></span> (for
      PostgreSQL).</p>
<p>To use JDBCRC, the user additionally needs to set the following
      properties:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica=JDBCRC</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.db.driver=mysql</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.db.url=<em class="replaceable"><code>&lt;jdbc url
          to the database&gt; e.g
          jdbc:mysql://database-host.isi.edu/database-name</code></em></strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.db.user=<em class="replaceable"><code>&lt;database
          user&gt;</code></em></strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.db.password=<em class="replaceable"><code>&lt;database
          password&gt;</code></em></strong></span></p></li>
</ol></div>
<p>Users can use the command line client
      <span class="emphasis"><em>pegasus-rc-client</em></span> to query, insert and
      remove entries in the JDBCRC backend.</p>
</div>
<div class="section" title="4.2.5. Replica Location Service">
<div class="titlepage"><div><div><h3 class="title">
<a name="rc-RLS"></a>4.2.5. Replica Location Service</h3></div></div></div>
<p>Replica Location Service (RLS) is a distributed replica catalog
      that ships with Globus. There is an index service called Replica
      Location Index (RLI) to which one or more Local Replica Catalogs (LRCs)
      report. Each LRC can contain all or a subset of mappings.</p>
<p>Details about RLS can be found at <a class="ulink" href="http://www.globus.org/toolkit/data/rls/" target="_top">http://www.globus.org/toolkit/data/rls/</a></p>
<div class="section" title="4.2.5.1. RLS">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp11104848"></a>4.2.5.1. RLS</h4></div></div></div>
<p>In this mode, Pegasus queries the central RLI to discover in
        which LRCs the mappings for an LFN reside. It then queries
        the individual LRCs for the PFNs. To use this
        mode the following properties need to be set:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica=RLS</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.url=<em class="replaceable"><code>&lt;url to
              the globus LRC&gt;</code></em></strong></span></p></li>
</ol></div>
</div>
<div class="section" title="4.2.5.2. LRC">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp11110480"></a>4.2.5.2. LRC</h4></div></div></div>
<p>This mode is available if the user does not want to query the RLI
        (Replica Location Index), but instead wishes to directly query a
        single Local Replica Catalog. To use the LRC mode the following
        properties need to be set:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica=<em class="replaceable"><code>LRC</code></em></strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.url=<em class="replaceable"><code>&lt;url to
              the globus LRC&gt;</code></em></strong></span></p></li>
</ol></div>
<p>Details about Globus Replica Catalog and LRC can be found at
        <a class="ulink" href="http://www.globus.org/toolkit/data/rls/" target="_top">http://www.globus.org/toolkit/data/rls/</a></p>
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>Replica Location Service is no longer officially supported by
          Globus.</p>
</div>
</div>
</div>
<div class="section" title="4.2.6. MRC">
<div class="titlepage"><div><div><h3 class="title">
<a name="rc-MRC"></a>4.2.6. MRC</h3></div></div></div>
<p>In this mode, Pegasus queries multiple replica catalogs to
      discover the file locations on the grid.</p>
<p>To use it, set:</p>
<div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica=<em class="replaceable"><code>MRC</code></em></strong></span></p></li></ol></div>
<p>Each associated replica catalog can be configured via properties
      as follows.</p>
<p>The user associates a variable name referred to as [value] for
      each of the catalogs, where [value] is any legal identifier (concretely
      [A-Za-z][_A-Za-z0-9]*). For each associated replica catalog the user
      specifies the following properties:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.mrc.[value]
          </strong></span>- specifies the type of replica catalog.</p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.mrc.[value].key
          </strong></span>- specifies a property name key for a particular
          catalog</p></li>
</ul></div>
<p>For example, to query two LRCs at the same time specify the
      following:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.mrc.lrc1=LRC</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.mrc.lrc1.url=<em class="replaceable"><code>&lt;url
            to the 1st globus LRC&gt;</code></em></strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.mrc.lrc2=LRC</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.replica.mrc.lrc2.url=<em class="replaceable"><code>&lt;url
            to the 2nd globus LRC&gt;</code></em></strong></span></p></li>
</ul></div>
<p>In the above example, <span class="bold"><strong>lrc1</strong></span> and
      <span class="bold"><strong>lrc2</strong></span> are arbitrary valid identifier names
      and <span class="bold"><strong>url</strong></span> is the property key that needs
      to be specified.</p>
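<p>The naming scheme of these properties can be illustrated with a short sketch that groups a flat set of properties by their [value] identifier. It is only meant to show how the keys are structured; the function name <code class="literal">group_mrc_properties</code>, the result key <code class="literal">type</code> and the example URLs are assumptions made for this example and say nothing about how Pegasus reads its properties internally.</p>

```python
import re

PREFIX = "pegasus.catalog.replica.mrc."
# [value] must be a legal identifier, optionally followed by ".key".
KEY_RE = re.compile(re.escape(PREFIX) + r"([A-Za-z][_A-Za-z0-9]*)(?:\.(.+))?")


def group_mrc_properties(properties):
    """Group flat MRC properties into one dict per associated catalog."""
    catalogs = {}
    for name, value in properties.items():
        match = KEY_RE.fullmatch(name)
        if match is None:
            continue  # not an MRC property
        identifier, key = match.groups()
        entry = catalogs.setdefault(identifier, {})
        # Without a ".key" suffix the property names the catalog type.
        entry["type" if key is None else key] = value
    return catalogs


if __name__ == "__main__":
    example = {
        "pegasus.catalog.replica": "MRC",
        "pegasus.catalog.replica.mrc.lrc1": "LRC",
        "pegasus.catalog.replica.mrc.lrc1.url": "rlsn://lrc1.example.org",  # placeholder URL
        "pegasus.catalog.replica.mrc.lrc2": "LRC",
        "pegasus.catalog.replica.mrc.lrc2.url": "rlsn://lrc2.example.org",  # placeholder URL
    }
    print(group_mrc_properties(example))
```

<p>Each identifier ends up with its catalog type and the keys that belong to it, which is the grouping the MRC mode expects the user to express through property names.</p>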
<div class="section" title="4.2.6.1. Replica Catalog Client pegasus-rc-client">
<div class="titlepage"><div><div><h4 class="title">
<a name="pegasus-rc-client"></a>4.2.6.1. Replica Catalog Client pegasus-rc-client</h4></div></div></div>
<p>The client used to interact with the Replica Catalogs is
        pegasus-rc-client. The implementation that the client talks to is
        configured using Pegasus properties.</p>
<p>Let's assume you create a file f.a in your home directory as shown
        below.</p>
<pre class="screen"><span class="command"><strong>$ date &gt; $HOME/f.a </strong></span></pre>
<p>We now need to register this file in the <span class="bold"><strong>File</strong></span> replica catalog located in <span class="bold"><strong>$HOME/rc</strong></span> using the pegasus-rc-client. Replace
        the <span class="bold"><strong>gsiftp://url</strong></span> with the appropriate
        parameters for your grid site.</p>
<pre class="screen"><span class="emphasis"><em>$<span class="command"><strong> pegasus-rc-client -Dpegasus.catalog.replica=File -Dpegasus.catalog.replica.file=$HOME/rc insert \
 f.a</strong></span> <em class="replaceable"><code>gsiftp://somehost:port/path/to/file/f.a pool=local</code></em></em></span></pre>
<p>You may first want to verify that the registration is in
        the replica catalog. Since we are using a File catalog we can look at
        the file <span class="bold"><strong>$HOME/rc</strong></span> to view
        the entries.</p>
<pre class="screen"><span class="command"><strong>$ cat $HOME/rc</strong></span><code class="computeroutput">
    
# file-based replica catalog: 2010-11-10T17:52:53.405-07:00
f.a gsiftp://somehost:port/path/to/file/f.a pool=local</code></pre>
<p>The above line shows that the entry for file <span class="bold"><strong>f.a</strong></span> was made correctly.</p>
<p>You can also use the <span class="bold"><strong>pegasus-rc-client</strong></span> to look for entries.</p>
<pre class="screen"><span class="command"><strong>$ pegasus-rc-client -Dpegasus.catalog.replica=File -Dpegasus.catalog.replica.file=$HOME/rc lookup LFN f.a</strong></span><code class="computeroutput">

f.a gsiftp://somehost:port/path/to/file/f.a pool=local</code></pre>
</div>
</div>
</div>
<div class="section" title="4.3. Resource Discovery (Site Catalog)">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="site"></a>4.3. Resource Discovery (Site Catalog)</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="creating_workflows.php#sc-XML4">4.3.1. XML4</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#sc-XML3">4.3.2. XML3</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#pegasus-sc-client">4.3.3. Site Catalog Client pegasus-sc-client</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#idp11227056">4.3.4. Site Catalog Converter pegasus-sc-converter</a></span></dt>
</dl></div>
<p>The Site Catalog describes the compute resources (which are often
    clusters) that we intend to run the workflow upon. A site is a homogeneous
    part of a cluster that has at least a single GRAM gatekeeper with a
    <span class="bold"><strong>jobmanager-fork</strong></span>
    and <span class="emphasis"><em>jobmanager-&lt;scheduler&gt;</em></span> interface and at
    least one <span class="bold"><strong>gridftp</strong></span> server along with a
    shared file system. The GRAM gatekeeper can be either WS GRAM or Pre-WS
    GRAM. A site can also be a Condor pool or glidein pool with a shared file
    system.</p>
<p>The Site Catalog is described in XML. Pegasus currently
    supports two schemas for the Site Catalog:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>XML4</strong></span> (Default) Corresponds to
        the schema described <a class="ulink" href="http://pegasus.isi.edu/wms/docs/schemas/sc-4.0/sc-4.0.html" target="_top">here</a>.</p></li>
<li class="listitem"><p><span class="bold"><strong>XML3</strong></span> (Deprecated) Corresponds to
        the schema described <a class="ulink" href="http://pegasus.isi.edu/wms/docs/schemas/sc-3.0/sc-3.0.html" target="_top">here</a>.</p></li>
</ol></div>
<div class="section" title="4.3.1. XML4">
<div class="titlepage"><div><div><h3 class="title">
<a name="sc-XML4"></a>4.3.1. XML4</h3></div></div></div>
<p>This is the default format for Pegasus 4.2. This format allows
      defining both shared and local filesystems, on the head node of
      the remote cluster as well as on the backend nodes.</p>
<div class="figure">
<a name="idp11163776"></a><p class="title"><b>Figure 4.2. Schema Image of the Site Catalog XML4</b></p>
<div class="figure-contents"><div class="mediaobject"><img src="images/sc-4.0_p2.png" alt="Schema Image of the Site Catalog XML4"></div></div>
</div>
<br class="figure-break"><p>Below is an example of the XML4 site catalog</p>
<pre class="programlisting">
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd"
             version="4.0"&gt;

    &lt;site  handle="local" arch="x86_64" os="LINUX"&gt;
        &lt;directory type="shared-scratch" path="/tmp/workflows/scratch"&gt;
            &lt;file-server operation="all" url="file:///tmp/workflows/scratch"/&gt;
        &lt;/directory&gt;
        &lt;directory type="local-storage" path="/tmp/workflows/outputs"&gt;
            &lt;file-server operation="all" url="file:///tmp/workflows/outputs"/&gt;
        &lt;/directory&gt;
    &lt;/site&gt;

    &lt;site  handle="condor_pool" arch="x86_64" os="LINUX"&gt;
        &lt;grid type="gt5" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="auxillary"/&gt;
        &lt;grid type="gt5" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="compute"/&gt;
        &lt;directory type="shared-scratch" path="/lustre"&gt;
            &lt;file-server operation="all" url="gsiftp://smarty.isi.edu/lustre"/&gt;
        &lt;/directory&gt;
        &lt;replica-catalog type="LRC" url="rlsn://smarty.isi.edu"/&gt;
    &lt;/site&gt;

    &lt;site  handle="staging_site" arch="x86_64" os="LINUX"&gt;
        &lt;directory type="shared-scratch" path="/data"&gt;
            &lt;file-server operation="put" url="scp://obelix.isi.edu/data"/&gt;
            &lt;file-server operation="get" url="http://obelix.isi.edu/data"/&gt;
        &lt;/directory&gt;
    &lt;/site&gt;

&lt;/sitecatalog&gt;
      </pre>
<p>Described below are some of the entries in the site
      catalog.</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>site</strong></span> - A site
            identifier.</p></li>
<li class="listitem">
<p><span class="bold"><strong>directory</strong></span> - Information about
            filesystems Pegasus can use for storing temporary and long-term
            files. There are several configurations:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>shared-scratch</strong></span> - This
                describes a scratch file system. Pegasus will use this to
                store intermediate data between jobs and other temporary
                files.</p></li>
<li class="listitem"><p><span class="bold"><strong>local-storage</strong></span> - This
                describes a long-term storage file system. This is the
                directory Pegasus will stage output files to.</p></li>
<li class="listitem"><p><span class="bold"><strong>local-scratch</strong></span> - This
                describes a scratch file system available locally on a
                compute node. This parameter is not commonly used and can be
                left unset in most cases.</p></li>
</ul></div>
<p>For each of the directories, you can specify access methods.
            Allowed methods are <span class="bold"><strong>put</strong></span>,
            <span class="bold"><strong>get</strong></span>, and <span class="bold"><strong>all</strong></span>, which means both put and get. For each
            method, specify a URL including the protocol. For example, if you
            want to share data via http using the /var/www/staging directory, you
            can use scp://hostname/var/www for the put element and
            http://hostname/staging for the get element.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>arch, os, osrelease, osversion,
            glibc</strong></span> - The arch/os/osrelease/osversion/glibc of the
            site. OSRELEASE, OSVERSION and GLIBC are optional.</p>
<p>ARCH can have one of the following values: X86, X86_64,
            SPARCV7, SPARCV9, AIX, PPC.</p>
<p>OS can have one of the following values: LINUX, SUNOS, MACOSX.
            If none is specified, sysinfo defaults to
            X86::LINUX.</p>
</li>
<li class="listitem"><p><span class="bold"><strong>replica-catalog</strong></span> - URL for a
            local replica catalog (LRC) to register your files in. It is only used
            by the RLS implementation of the RC and is optional.</p></li>
<li class="listitem">
<p><span class="bold"><strong>Profiles</strong></span> - One or more
            profiles can be attached to a site.</p>
<p>One example is the environment variables to be set on a remote
            site.</p>
</li>
</ol></div>
<p>To use this site catalog, the following property needs to be
      set:</p>
<div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p><span class="bold"><strong>pegasus.catalog.site.file=<em class="replaceable"><code>&lt;path to the
          site catalog file&gt;</code></em></strong></span></p></li></ol></div>
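<p>The put, get and all access methods described above can be sketched
      in a few lines of Python. This is only an illustration, not part of
      Pegasus: the helper names tag, build_staging_site and file_server_urls
      are hypothetical, and the catalog is rebuilt in memory from the
      staging_site entry of the example rather than read from a file.</p>

```python
import xml.etree.ElementTree as ET

# Namespace copied from the xmlns attribute of the example catalog.
SC_NS = "http://pegasus.isi.edu/schema/sitecatalog"


def tag(name):
    """Qualify an element name with the site catalog namespace."""
    return "{" + SC_NS + "}" + name


def build_staging_site():
    """Rebuild the "staging_site" entry of the example as an element tree."""
    catalog = ET.Element(tag("sitecatalog"), {"version": "4.0"})
    site = ET.SubElement(catalog, tag("site"),
                         {"handle": "staging_site", "arch": "x86_64", "os": "LINUX"})
    directory = ET.SubElement(site, tag("directory"),
                              {"type": "shared-scratch", "path": "/data"})
    ET.SubElement(directory, tag("file-server"),
                  {"operation": "put", "url": "scp://obelix.isi.edu/data"})
    ET.SubElement(directory, tag("file-server"),
                  {"operation": "get", "url": "http://obelix.isi.edu/data"})
    return catalog


def file_server_urls(catalog, handle, dir_type, operation):
    """Return the file-server URLs of one directory usable for `operation`.

    `operation` is "put" or "get"; a server declared with operation="all"
    matches both.
    """
    urls = []
    for site in catalog.iter(tag("site")):
        if site.get("handle") != handle:
            continue
        for directory in site.iter(tag("directory")):
            if directory.get("type") != dir_type:
                continue
            for server in directory.iter(tag("file-server")):
                if server.get("operation") in (operation, "all"):
                    urls.append(server.get("url"))
    return urls


catalog = build_staging_site()
print(file_server_urls(catalog, "staging_site", "shared-scratch", "put"))
print(file_server_urls(catalog, "staging_site", "shared-scratch", "get"))
```

<p>To run the same query against a catalog on disk, the tree could be
      loaded with ET.parse(path).getroot() instead of being built in
      memory.</p>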
</div>
<div class="section" title="4.3.2. XML3">
<div class="titlepage"><div><div><h3 class="title">
<a name="sc-XML3"></a>4.3.2. XML3</h3></div></div></div>
<div class="warning" title="Warning" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Warning</h3>
<p>This format is now deprecated in favor of the XML4 format. If
          you are still using the XML3 format, you should convert it to the XML4
          format using the pegasus-sc-converter client.</p>
</div>
<p>This is the default format for Pegasus 3.0. It allows you to
      define both shared and local file systems on the head node of
      the remote cluster as well as on the worker nodes.</p>
<div class="figure">
<a name="idp11194320"></a><p class="title"><b>Figure 4.3. Schema Image of the Site Catalog XML 3</b></p>
<div class="figure-contents"><div class="mediaobject"><img src="images/sc-3.0_p2.png" alt="Schema Image of the Site Catalog XML 3"></div></div>
</div>
<br class="figure-break"><p>Below is an example of the XML3 site catalog:</p>
<pre class="programlisting">&lt;sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog 
http://pegasus.isi.edu/schema/sc-3.0.xsd" version="3.0"&gt;
  &lt;site  handle="isi" arch="x86" os="LINUX" osrelease="" osversion="" glibc=""&gt;
      &lt;grid  type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="auxillary"/&gt;
      &lt;grid  type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="compute"/&gt;
          &lt;head-fs&gt;
               &lt;scratch&gt;
                  &lt;shared&gt;
                     &lt;file-server protocol="gsiftp" url="gsiftp://skynet-data.isi.edu"
                                  mount-point="/nfs/scratch01" /&gt;
                     &lt;internal-mount-point mount-point="/nfs/scratch01"/&gt;
                  &lt;/shared&gt;
               &lt;/scratch&gt;
               &lt;storage&gt;
                  &lt;shared&gt;
                     &lt;file-server protocol="gsiftp" url="gsiftp://skynet-data.isi.edu" 
                                  mount-point="/exports/storage01"/&gt;
                     &lt;internal-mount-point mount-point="/exports/storage01"/&gt;
                  &lt;/shared&gt;
               &lt;/storage&gt;
          &lt;/head-fs&gt;
      &lt;replica-catalog  type="LRC" url="rlsn://smarty.isi.edu"/&gt;
      &lt;profile namespace="env" key="PEGASUS_HOME" &gt;/nfs/vdt/pegasus&lt;/profile&gt;
      &lt;profile namespace="env" key="GLOBUS_LOCATION" &gt;/vdt/globus&lt;/profile&gt;
  &lt;/site&gt;
&lt;/sitecatalog&gt;</pre>
<p>Described below are some of the entries in the site
      catalog.</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>site</strong></span> - A site
            identifier.</p></li>
<li class="listitem"><p><span class="bold"><strong>replica-catalog</strong></span> - URL for a
            local replica catalog (LRC) to register your files in. It is only used
            by the RLS implementation of the RC and is optional.</p></li>
<li class="listitem">
<p><span class="bold"><strong>File Systems</strong></span> - Information about
            file systems mounted on the remote cluster's head node or worker
            nodes. There are several configurations:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>head-fs/scratch</strong></span> - This
                describes the scratch file systems (temporary, for execution)
                available on the head node.</p></li>
<li class="listitem"><p><span class="bold"><strong>head-fs/storage</strong></span> - This
                describes the storage file systems (long term) available on
                the head node.</p></li>
<li class="listitem"><p><span class="bold"><strong>worker-fs/scratch</strong></span> -
                This describes the scratch file systems (temporary, for
                execution) available on the worker node.</p></li>
<li class="listitem"><p><span class="bold"><strong>worker-fs/storage</strong></span> -
                This describes the storage file systems (long term) available
                on the worker node.</p></li>
</ul></div>
<p>Each scratch and storage entry can contain two
            sub-entries:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>SHARED</strong></span> for shared file systems
                such as NFS or LUSTRE.</p></li>
<li class="listitem"><p><span class="bold"><strong>LOCAL</strong></span> for local file systems
                (local to the node/machine).</p></li>
</ul></div>
<p>Each of the file systems is defined using a file-server
            element. Protocol defines the protocol used to access the files,
            URL defines the URL prefix to obtain the files from, and
            mount-point is the mount point exposed by the file server.</p>
<p>Along with this, an internal-mount-point needs to be defined to
            access the files directly from the machine without any file
            servers.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>arch, os, osrelease, osversion,
            glibc</strong></span> - The arch/os/osrelease/osversion/glibc of the
            site. OSRELEASE, OSVERSION and GLIBC are optional.</p>
<p>ARCH can have one of the following values: X86, X86_64,
            SPARCV7, SPARCV9, AIX, PPC.</p>
<p>OS can have one of the following values: LINUX, SUNOS, MACOSX.
            If none is specified, sysinfo defaults to
            X86::LINUX.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>Profiles</strong></span> - One or more
            profiles can be attached to a site.</p>
<p>One example is the environment variables to be set on a remote
            site.</p>
</li>
</ol></div>
<p>To use this site catalog, the following property needs to be
      set:</p>
<div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p><span class="bold"><strong>pegasus.catalog.site.file=<em class="replaceable"><code>&lt;path to the
          site catalog file&gt;</code></em></strong></span></p></li></ol></div>
</div>
<div class="section" title="4.3.3. Site Catalog Client pegasus-sc-client">
<div class="titlepage"><div><div><h3 class="title">
<a name="pegasus-sc-client"></a>4.3.3. Site Catalog Client pegasus-sc-client</h3></div></div></div>
<p>The pegasus-sc-client can be used to generate a site catalog for
      the Open Science Grid (OSG) by querying its monitoring interfaces, such as
      VORS or OSGMM. See pegasus-sc-client --help for more details.</p>
</div>
<div class="section" title="4.3.4. Site Catalog Converter pegasus-sc-converter">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp11227056"></a>4.3.4. Site Catalog Converter pegasus-sc-converter</h3></div></div></div>
<p>Pegasus 4.2 by default parses site catalogs conforming
      to the SC schema 4.0 (XML4), available <a class="ulink" href="http://pegasus.isi.edu/wms/docs/schemas/sc-4.0/sc-4.0.xsd" target="_top">here</a>. The format is explained in detail in the Catalog
      Properties section of <a class="link" href="running_workflows.php" title="Chapter 5. Running Workflows">Running
      Workflows</a>.</p>
<p>Pegasus 4.2 comes with a pegasus-sc-converter that will convert
      a user's old site catalog (XML3) to the XML4 format. Sample usage is given
      below.</p>
<pre class="programlisting"><span class="bold"><strong>$ pegasus-sc-converter -i sample.sites.xml -I XML3 -o sample.sites.xml4 -O XML4
</strong></span>
2010.11.22 12:55:14.169 PST:   Written out the converted file to sample.sites.xml4
</pre>
<p>To use the converted site catalog, in the properties do the
      following:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>unset pegasus.catalog.site or set pegasus.catalog.site to
          XML</p></li>
<li class="listitem"><p>point pegasus.catalog.site.file to the converted site
          catalog</p></li>
</ol></div>
</div>
</div>
<div class="section" title="4.4. Executable Discovery (Transformation Catalog)">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="transformation"></a>4.4. Executable Discovery (Transformation Catalog)</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="creating_workflows.php#tc-Text">4.4.1. MultiLine Text based TC (Text)</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#tc-File">4.4.2. Singleline Text based TC (File)</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#tc-Database">4.4.3. Database TC (Database)</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#pegasus-tc-client">4.4.4. TC Client pegasus-tc-client</a></span></dt>
<dt><span class="section"><a href="creating_workflows.php#idp12368816">4.4.5. TC Converter Client pegasus-tc-converter</a></span></dt>
</dl></div>
<p>The Transformation Catalog maps logical transformations to physical
    executables on the system. It also provides additional information about
    each transformation, such as the system it is compiled for and the profiles
    or environment variables that need to be set when the transformation is
    invoked.</p>
<p>Pegasus currently supports three implementations of the Transformation
    Catalog:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>Text: </strong></span>A multiline text based
        Transformation Catalog (DEFAULT)</p></li>
<li class="listitem"><p><span class="bold"><strong>File:</strong></span> A simple multi-column
        text based Transformation Catalog</p></li>
<li class="listitem"><p><span class="bold"><strong>Database:</strong></span> A database backend
        (MySQL or PostgreSQL) via JDBC</p></li>
</ol></div>
<p>In this guide we will look at the format of the Multiline Text based
    TC.</p>
<div class="section" title="4.4.1. MultiLine Text based TC (Text)">
<div class="titlepage"><div><div><h3 class="title">
<a name="tc-Text"></a>4.4.1. MultiLine Text based TC (Text)</h3></div></div></div>
<p>The multiline text based TC is the new default TC in Pegasus.
      This format allows you to define the transformations.</p>
<p>The file is read and cached in memory. Any modification, such as
      adding or deleting an entry, updates the in-memory copy and then the file
      underneath. All queries are done against the in-memory representation. The
      file sample.tc.text in the etc directory contains an example.</p>
<pre class="programlisting">tr example::keg:1.0 { 

#specify profiles that apply to all the sites for the transformation 
#in each site entry the profile can be overridden 

  profile env "APP_HOME" "/tmp/myscratch"
  profile env "JAVA_HOME" "/opt/java/1.6"

  site isi {
    profile env "HELLo" "WORLD"
    profile condor "FOO" "bar"
    profile env "JAVA_HOME" "/bin/java.1.6"
    pfn "/path/to/keg"
    arch "x86"
    os "linux"
    osrelease "fc"
    osversion "4"
    type "INSTALLED"
  }

  site wind {
    profile env "CPATH" "/usr/cpath"
    profile condor "universe" "condor"
    pfn "file:///path/to/keg"
    arch "x86"
    os "linux"
    osrelease "fc"
    osversion "4"
    type "STAGEABLE"
  }
}</pre>
<p>The entries in this catalog have the following meaning:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>tr</strong></span> - A transformation
            identifier, normally of the form Namespace::Name:Version. The Namespace
            and Version are optional.</p></li>
<li class="listitem"><p><span class="bold"><strong>pfn</strong></span> - URL or file path for
            the location of the executable. The pfn is a file path if the
            transformation is of type INSTALLED and generally a URL (file:///
            or http:// or gsiftp://) if of type STAGEABLE.</p></li>
<li class="listitem"><p><span class="bold"><strong>site</strong></span> - The site identifier
            for the site where the transformation is available.</p></li>
<li class="listitem"><p><span class="bold"><strong>type</strong></span> - The type of
            transformation: whether it is installed ("INSTALLED") on the
            remote site or is available to stage ("STAGEABLE").</p></li>
<li class="listitem">
<p><span class="bold"><strong>arch, os, osrelease,
            osversion</strong></span> - The arch/os/osrelease/osversion of the
            transformation. osrelease and osversion are optional.</p>
<p>ARCH can have one of the following values: x86, x86_64,
            sparcv7, sparcv9, ppc, aix. The default value for arch is
            x86.</p>
<p>OS can have one of the following values: linux, sunos, macosx.
            The default value for OS, if none is specified, is linux.</p>
</li>
<li class="listitem"><p><span class="bold"><strong>Profiles</strong></span> - One or more
            profiles can be attached to a transformation for all sites or to a
            transformation on a particular site.</p></li>
</ol></div>
<p>To use this format of the Transformation Catalog, you need to set
      the following properties:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.transformation=Text</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.transformation.file=<em class="replaceable"><code>&lt;path
            to the transformation catalog
            file&gt;</code></em></strong></span></p></li>
</ol></div>
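<p>As a rough illustration of the layout described above, the following
      Python sketch renders one tr block from plain data structures. The
      function render_tc_entry and its argument layout are hypothetical and
      not part of Pegasus; the sketch only mirrors the keywords visible in
      the sample (tr, site, profile, pfn, arch, os, osrelease, osversion,
      type) and does no validation.</p>

```python
def render_tc_entry(name, sites, profiles=()):
    """Render one `tr` block in the multiline Text layout.

    name     -- transformation identifier, e.g. "example::keg:1.0"
    sites    -- dict mapping a site handle to a dict of fields ("pfn",
                "arch", "os", "type", ...) plus an optional "profiles" list
    profiles -- (namespace, key, value) triples applied to all sites
    """
    lines = [f"tr {name} {{"]
    for ns, key, value in profiles:
        lines.append(f'  profile {ns} "{key}" "{value}"')
    for handle, entry in sites.items():
        lines.append(f"  site {handle} {{")
        # Site-level profiles come first, as in the sample.
        for ns, key, value in entry.get("profiles", ()):
            lines.append(f'    profile {ns} "{key}" "{value}"')
        for field in ("pfn", "arch", "os", "osrelease", "osversion", "type"):
            if field in entry:
                lines.append(f'    {field} "{entry[field]}"')
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


entry = render_tc_entry(
    "example::keg:1.0",
    {"isi": {"pfn": "/path/to/keg", "arch": "x86", "os": "linux",
             "type": "INSTALLED"}},
    profiles=[("env", "APP_HOME", "/tmp/myscratch")],
)
print(entry)
```

<p>Generating entries from data in this way keeps the quoting and
      nesting consistent when many transformations share the same site
      settings.</p>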
</div>
<div class="section" title="4.4.2. Singleline Text based TC (File)">
<div class="titlepage"><div><div><h3 class="title">
<a name="tc-File"></a>4.4.2. Singleline Text based TC (File)</h3></div></div></div>
<div class="warning" title="Warning" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Warning</h3>
<p>This format is now deprecated in favor of the multiline TC. If
        you are still using the single line TC, you should convert it to
        multiline using the pegasus-tc-converter client.</p>
</div>
<p>The format of this TC is as follows.</p>
<pre class="programlisting">#site  logicaltr   physicaltr   type  system  profiles(NS::KEY="VALUE")

site1 sys::date:1.0 /usr/bin/date  INSTALLED INTEL32::LINUX:FC4.2:3.6 ENV::PATH="/usr/bin",PEGASUS_HOME="/usr/local/pegasus"</pre>
<p>The system and profile entries are optional and will use default
      values if not specified. The entries in the file format have the
      following meaning:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>site</strong></span> - A site
          identifier.</p></li>
<li class="listitem"><p><span class="bold"><strong>logicaltr</strong></span> - The logical
          transformation name. The format is NAMESPACE::NAME:VERSION where
          NAMESPACE and VERSION are optional.</p></li>
<li class="listitem">
<p><span class="bold"><strong>physicaltr</strong></span> - The physical
          transformation path or URL.</p>
<p>If the transformation type is INSTALLED, then it needs to be an
          absolute path to the executable. If the type is STAGEABLE, then the
          path needs to be an HTTP, FTP or gsiftp URL.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>type</strong></span> - The type of
          transformation. Can have one of two values:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>INSTALLED</strong></span>: This means
              that the transformation is installed on the remote site.</p></li>
<li class="listitem"><p><span class="bold"><strong>STAGEABLE</strong></span>: This means
              that the transformation is available as a static binary and can
              be staged to a remote site.</p></li>
</ul></div>
</li>
<li class="listitem">
<p><span class="bold"><strong>system</strong></span> - The system for which
          the transformation is compiled.</p>
<p>The format of the system is ARCH::OS:OSVERSION:GLIBC where
          the GLIBC and OSVERSION are optional. ARCH can have one of the
          following values: INTEL32, INTEL64, SPARCV7, SPARCV9, AIX, AMD64. OS
          can have one of the following values: LINUX, SUNOS. The default value
          for system, if none is specified, is INTEL32::LINUX.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>Profiles</strong></span> - The profiles
          associated with the transformation. For in-depth information about
          profiles and their priorities, read the Profile Guide.</p>
<p>The format for profiles is NS::KEY="VALUE" where NS is the
          namespace of the profile, e.g. Pegasus, condor, DAGMan, env, globus. The
          key and value can be any strings. Remember to quote the value with
          double quotes. If you need to specify several profiles, you can do it
          in one of two ways:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
<p>NS1::KEY1="VALUE1",KEY2="VALUE2";NS2::KEY3="VALUE3",KEY4="VALUE4"</p>
<p>This is the most compact form. Multiple key-value pairs for
              the same namespace are separated by a comma "," and different
              namespaces are separated by a semicolon ";".</p>
</li>
<li class="listitem">
<p>NS1::KEY1="VALUE1";NS1::KEY2="VALUE2";NS2::KEY3="VALUE3";NS2::KEY4="VALUE4"</p>
<p>You can also just repeat the NS::KEY="VALUE" triple,
              separated by semicolons, for a simpler format.</p>
</li>
</ul></div>
</li>
</ol></div>
<p>To use this format of the Transformation Catalog, you need to set
      the following properties:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.transformation=File</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.transformation.file=<em class="replaceable"><code>&lt;path
            to the transformation catalog
            file&gt;</code></em></strong></span></p></li>
</ol></div>
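<p>The profile and system notations described above can be parsed with a
      few string operations. The Python sketch below is only illustrative:
      parse_profiles and parse_system are hypothetical helpers, not Pegasus
      functions, and the profile parser assumes that values contain no
      commas or semicolons.</p>

```python
def parse_profiles(text):
    """Parse a File-format profile string into {namespace: {key: value}}.

    Accepts both notations: keys of one namespace joined by commas, and
    fully qualified NS::KEY="VALUE" triples joined by semicolons.
    Simplifying assumption: values contain no "," or ";".
    """
    profiles = {}
    for group in text.split(";"):
        group = group.strip()
        if not group:
            continue
        namespace, sep, pairs = group.partition("::")
        if not sep:
            raise ValueError(f"missing namespace in {group!r}")
        for pair in pairs.split(","):
            key, eq, value = pair.partition("=")
            if not eq:
                raise ValueError(f"missing '=' in {pair!r}")
            # Drop the surrounding double quotes from the value.
            profiles.setdefault(namespace, {})[key.strip()] = value.strip().strip('"')
    return profiles


def parse_system(text="INTEL32::LINUX"):
    """Split ARCH::OS:OSVERSION:GLIBC; OSVERSION and GLIBC may be absent."""
    arch, _, rest = text.partition("::")
    parts = rest.split(":")
    return {
        "arch": arch,
        "os": parts[0],
        "osversion": parts[1] if len(parts) > 1 else None,
        "glibc": parts[2] if len(parts) > 2 else None,
    }


print(parse_profiles('NS1::KEY1="VALUE1",KEY2="VALUE2";NS2::KEY3="VALUE3"'))
print(parse_profiles('NS1::KEY1="VALUE1";NS1::KEY2="VALUE2"'))
print(parse_system("INTEL32::LINUX:FC4.2:3.6"))
```

<p>Both profile notations yield the same nested mapping, which is why
      the compact and the repeated form can be used interchangeably.</p>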
</div>
<div class="section" title="4.4.3. Database TC (Database)">
<div class="titlepage"><div><div><h3 class="title">
<a name="tc-Database"></a>4.4.3. Database TC (Database)</h3></div></div></div>
<p>The database TC allows you to use a relational database. To use the
      database TC, you need to have installed a MySQL or PostgreSQL server. The
      schema for the database is available in the $PEGASUS_HOME/sql directory. You
      will have to install the schema into either PostgreSQL or MySQL by
      running the appropriate commands to load the two schemas <span class="bold"><strong>create-XX-init.sql</strong></span> and <span class="bold"><strong>create-XX-tc.sql</strong></span>, where XX is either <span class="bold"><strong>my</strong></span> (for MySQL) or <span class="bold"><strong>pg</strong></span> (for PostgreSQL).</p>
<p>To use the Database TC, you need to set the following
      properties:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.transformation.db.driver=MySQL |
            Postgres</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.transformation.db.url=<em class="replaceable"><code>&lt;jdbc
            url to the database&gt;</code></em></strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.transformation.db.user=<em class="replaceable"><code>&lt;database
            user&gt;</code></em></strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>pegasus.catalog.transformation.db.password=<em class="replaceable"><code>&lt;database
            password&gt;</code></em></strong></span></p></li>
</ol></div>
</div>
<div class="section" title="4.4.4. TC Client pegasus-tc-client">
<div class="titlepage"><div><div><h3 class="title">
<a name="pegasus-tc-client"></a>4.4.4. TC Client pegasus-tc-client</h3></div></div></div>
<p>We need to map our declared transformations (preprocess,
      findrange, and analyze) from the example DAX above to a simple "mock
      application" named "keg" ("canonical example for the grid"), which reads
      input files designated by arguments, writes them back onto output files,
      and produces on STDOUT a summary of where and when it was run. Keg ships
      with Pegasus in the bin directory. Run keg on the command line to see
      how it works.</p>
<pre class="screen"><span class="command"><strong>$ keg -o /dev/fd/1</strong></span>
<code class="computeroutput">
Timestamp Today: 20040624T054607-05:00 (1088073967.418;0.022)
Applicationname: keg @ 10.10.0.11 (VPN)
Current Workdir: /home/unique-name
Systemenvironm.: i686-Linux 2.4.18-3
Processor Info.: 1 x Pentium III (Coppermine) @ 797.425
Output Filename: /dev/fd/1</code></pre>
<p>Now we need to map all 3 transformations onto the "keg"
      executable. We place these mappings in our File transformation catalog
      for site clus1.</p>
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>In earlier versions of Pegasus, users had to define entries for
        Pegasus executables such as transfer, replica client, dirmanager, etc.
        on each site as well as site "local". This is no longer required.
        Pegasus versions 2.0 and later automatically pick up the paths for
        these binaries from the environment profile PEGASUS_HOME set in the
        site catalog for each site.</p>
<p>A single entry needs to be on one line. The above example is
        just formatted for convenience.</p>
</div>
<p>Alternatively, you can use the pegasus-tc-client to add
      entries to any implementation of the transformation catalog. The
      following example shows the addition of the last entry to the File based
      transformation catalog.</p>
<pre class="screen"><span class="command"><strong>$ pegasus-tc-client -Dpegasus.catalog.transformation=File \
-Dpegasus.catalog.transformation.file=$HOME/tc -a -r clus1 -l black::analyze:1.0 \
-p gsiftp://clus1.com/opt/nfs/vdt/pegasus/bin/keg  -t STAGEABLE -s INTEL32::LINUX \
-e ENV::KEY3="VALUE3"</strong></span><code class="computeroutput">

2007.07.11 16:12:03.712 PDT: [INFO] Added tc entry sucessfully</code></pre>
<p>To verify that the entry was correctly added to the transformation
      catalog, you can use the pegasus-tc-client to query it.</p>
<pre class="screen"><span class="command"><strong>$ pegasus-tc-client -Dpegasus.catalog.transformation=File \
-Dpegasus.catalog.transformation.file=$HOME/tc -q -P -l black::analyze:1.0</strong></span>

<code class="computeroutput">#RESID     LTX          PFN                  TYPE              SYSINFO

clus1    black::analyze:1.0    gsiftp://clus1.com/opt/nfs/vdt/pegasus/bin/keg
                STAGEABLE    INTEL32::LINUX</code></pre>
</div>
<div class="section" title="4.4.5. TC Converter Client pegasus-tc-converter">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp12368816"></a>4.4.5. TC Converter Client pegasus-tc-converter</h3></div></div></div>
<p>Pegasus 3.0 by default parses a file-based multiline text
      format of the Transformation Catalog. The new Text format is explained in
      detail in the chapter on Catalogs.</p>
<p>Pegasus 3.0 comes with a pegasus-tc-converter that will convert
      a user's old transformation catalog (File) to the Text format. Sample
      usage is given below.</p>
<pre class="programlisting"><span class="bold"><strong>$ pegasus-tc-converter -i sample.tc.data -I File -o sample.tc.text -O Text
</strong></span>
2010.11.22 12:53:16.661 PST:   Successfully converted Transformation Catalog from File to Text 
2010.11.22 12:53:16.666 PST:   The output transfomation catalog is in file  /lfs1/software/install/pegasus/pegasus-3.0.0cvs/etc/sample.tc.text 
</pre>
<p>To use the converted transformation catalog, in the properties do
      the following:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>unset pegasus.catalog.transformation or set
          pegasus.catalog.transformation to Text</p></li>
<li class="listitem"><p>point pegasus.catalog.transformation.file to the converted
          transformation catalog</p></li>
</ol></div>
</div>
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="installation.php">Prev</a> </td>
<td width="20%" align="center"> </td>
<td width="40%" align="right"> <a accesskey="n" href="running_workflows.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Chapter 3. Installation </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> Chapter 5. Running Workflows</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
