<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="replica.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="transformation.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="section" title="4.3. Resource Discovery (Site Catalog)">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="site"></a>4.3. Resource Discovery (Site Catalog)</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="site.php#sc-XML4">4.3.1. XML4</a></span></dt>
<dt><span class="section"><a href="site.php#sc-XML3">4.3.2. XML3</a></span></dt>
<dt><span class="section"><a href="site.php#idp5929504">4.3.3. Site Catalog Converter pegasus-sc-converter</a></span></dt>
</dl></div>
<p>The Site Catalog describes the compute resources (often clusters)
    on which the workflow will run. A site is a homogeneous
    part of a cluster that has at least one GRAM gatekeeper with a
    <span class="bold"><strong>jobmanager-fork</strong></span>
    and a <span class="emphasis"><em>jobmanager-&lt;scheduler&gt;</em></span> interface, at
    least one <span class="bold"><strong>gridftp</strong></span> server, and a
    shared file system. The GRAM gatekeeper can be either WS GRAM or Pre-WS
    GRAM. A site can also be a Condor pool or glidein pool with a shared file
    system.</p>
<p>The Site Catalog is described in an XML file. Pegasus currently
    supports two schemas for the Site Catalog:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>XML4</strong></span> (Default) Corresponds to
        the schema described <a class="ulink" href="http://pegasus.isi.edu/wms/docs/schemas/sc-4.0/sc-4.0.html" target="_top">here</a>.</p></li>
<li class="listitem"><p><span class="bold"><strong>XML3</strong></span> (Deprecated) Corresponds to
        the schema described <a class="ulink" href="http://pegasus.isi.edu/wms/docs/schemas/sc-3.0/sc-3.0.html" target="_top">here</a>.</p></li>
</ol></div>
<div class="section" title="4.3.1. XML4">
<div class="titlepage"><div><div><h3 class="title">
<a name="sc-XML4"></a>4.3.1. XML4</h3></div></div></div>
<p>This is the default format for Pegasus 4.2. It allows
      both shared and local file systems to be defined on the head node of
      the remote cluster as well as on the backend nodes.</p>
<div class="figure">
<a name="idp5867952"></a><p class="title"><b>Figure 4.3. Schema Image of the Site Catalog XML4</b></p>
<div class="figure-contents"><div class="mediaobject"><img src="images/sc-4.0_p2.png" alt="Schema Image of the Site Catalog XML4"></div></div>
</div>
<br class="figure-break"><p>Below is an example of an XML4 site catalog:</p>
<pre class="programlisting">
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd"
             version="4.0"&gt;

    &lt;site  handle="local" arch="x86_64" os="LINUX"&gt;
        &lt;directory type="shared-scratch" path="/tmp/workflows/scratch"&gt;
            &lt;file-server operation="all" url="file:///tmp/workflows/scratch"/&gt;
        &lt;/directory&gt;
        &lt;directory type="local-storage" path="/tmp/workflows/outputs"&gt;
            &lt;file-server operation="all" url="file:///tmp/workflows/outputs"/&gt;
        &lt;/directory&gt;
    &lt;/site&gt;

    &lt;site  handle="condor_pool" arch="x86_64" os="LINUX"&gt;
        &lt;grid type="gt5" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="auxillary"/&gt;
        &lt;grid type="gt5" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="compute"/&gt;
        &lt;directory type="shared-scratch" path="/lustre"&gt;
            &lt;file-server operation="all" url="gsiftp://smarty.isi.edu/lustre"/&gt;
        &lt;/directory&gt;
        &lt;replica-catalog type="LRC" url="rlsn://smarty.isi.edu"/&gt;
    &lt;/site&gt;

    &lt;site  handle="staging_site" arch="x86_64" os="LINUX"&gt;
        &lt;directory type="shared-scratch" path="/data"&gt;
            &lt;file-server operation="put" url="scp://obelix.isi.edu/data"/&gt;
            &lt;file-server operation="get" url="http://obelix.isi.edu/data"/&gt;
        &lt;/directory&gt;
    &lt;/site&gt;

&lt;/sitecatalog&gt;
      </pre>
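<p>As a rough illustration of how the entries in the example above fit
      together, the following Python sketch lists every file server declared
      in an XML4 catalog, along with its site, directory type and operation.
      It is not part of Pegasus: the function name
      <code>list_file_servers</code> is hypothetical, and the namespace URI is
      taken from the <code>xmlns</code> attribute of the example.</p>

```python
import xml.etree.ElementTree as ET

# Namespace copied from the xmlns attribute of the example catalog.
SC_NS = {"sc": "http://pegasus.isi.edu/schema/sitecatalog"}


def list_file_servers(catalog_xml):
    """Return (site handle, directory type, operation, url) tuples.

    Hypothetical helper for illustration; catalog_xml is the text of an
    XML4 site catalog.
    """
    root = ET.fromstring(catalog_xml)
    entries = []
    for site in root.findall("sc:site", SC_NS):
        for directory in site.findall("sc:directory", SC_NS):
            for server in directory.findall("sc:file-server", SC_NS):
                entries.append((
                    site.get("handle"),
                    directory.get("type"),
                    server.get("operation"),
                    server.get("url"),
                ))
    return entries
```

<p>Called on the example catalog, this sketch would report the
      <code>staging_site</code> directory twice: once for the
      <code>put</code> URL and once for the <code>get</code> URL.</p>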
<p>Described below are some of the entries in the site
      catalog.</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>site</strong></span> - A site
            identifier.</p></li>
<li class="listitem">
<p><span class="bold"><strong>Directory</strong></span> - Information about
            the file systems Pegasus can use for storing temporary and long-term
            files. There are several configurations:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>shared-scratch</strong></span> - This
                describes a scratch file system. Pegasus uses it to
                store intermediate data between jobs and other temporary
                files.</p></li>
<li class="listitem"><p><span class="bold"><strong>local-storage</strong></span> - This
                describes a long-term storage file system. This is the
                directory Pegasus stages output files to.</p></li>
<li class="listitem"><p><span class="bold"><strong>local-scratch</strong></span> - This
                describes a scratch file system available locally on a
                compute node. This parameter is not commonly used and can be
                left unset in most cases.</p></li>
</ul></div>
<p>For each directory, you can specify access methods.
            Allowed methods are <span class="bold"><strong>put</strong></span>,
            <span class="bold"><strong>get</strong></span>, and <span class="bold"><strong>all</strong></span>, which means both put and get. For each
            method, specify a URL including the protocol. For example, if you
            want to share data via HTTP using the /var/www/staging directory, you
            can use scp://hostname/var/www/staging for the put element and
            http://hostname/staging for the get element.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>arch, os, osrelease, osversion,
            glibc</strong></span> - The architecture, operating system, OS release, OS version and glibc version of the
            site. OSRELEASE, OSVERSION and GLIBC are optional.</p>
<p>ARCH can have one of the following values: X86, X86_64,
            SPARCV7, SPARCV9, AIX, PPC.</p>
<p>OS can have one of the following values: LINUX, SUNOS, MACOSX.
            The default value for sysinfo, if none is specified, is
            X86::LINUX.</p>
</li>
<li class="listitem"><p><span class="bold"><strong>replica-catalog</strong></span> - URL of a
            local replica catalog (LRC) in which to register your files. It is only used
            by the RLS implementation of the Replica Catalog. This entry is optional, and support for
            RLS was dropped in the Pegasus 4.5.0 release.</p></li>
<li class="listitem">
<p><span class="bold"><strong>Profiles</strong></span> - One or more
            profiles can be attached to a site.</p>
<p>One example is the set of environment variables to be set on a remote
            site.</p>
</li>
</ol></div>
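<p>The allowed ARCH and OS values listed above can be checked before a
      catalog is handed to Pegasus. The Python sketch below shows one way to
      do that. The function name <code>resolve_sysinfo</code> is hypothetical,
      and two behaviours are assumptions rather than documented Pegasus
      behaviour: values are compared case-insensitively (the example catalog
      writes <code>x86_64</code> in lower case), and the X86::LINUX default is
      applied to each missing attribute separately.</p>

```python
# Allowed values copied from the lists above.
ALLOWED_ARCH = {"X86", "X86_64", "SPARCV7", "SPARCV9", "AIX", "PPC"}
ALLOWED_OS = {"LINUX", "SUNOS", "MACOSX"}


def resolve_sysinfo(arch=None, os_name=None):
    """Return (ARCH, OS) in upper case, falling back to X86 and LINUX.

    Hypothetical helper: the case-insensitive comparison and the
    per-attribute default are assumptions, not Pegasus behaviour.
    """
    arch = (arch or "X86").upper()
    os_name = (os_name or "LINUX").upper()
    if arch not in ALLOWED_ARCH:
        raise ValueError("unsupported arch: " + arch)
    if os_name not in ALLOWED_OS:
        raise ValueError("unsupported os: " + os_name)
    return arch, os_name
```

<p>A value outside the lists, such as an architecture that the schema does
      not name, raises <code>ValueError</code> in this sketch instead of
      being passed through.</p>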
<p>To use this site catalog, the following property needs to be
      set:</p>
<div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p><span class="bold"><strong>pegasus.catalog.site.file=<em class="replaceable"><code>&lt;path to the
          site catalog file&gt;</code></em></strong></span></p></li></ol></div>
</div>
<div class="section" title="4.3.2. XML3">
<div class="titlepage"><div><div><h3 class="title">
<a name="sc-XML3"></a>4.3.2. XML3</h3></div></div></div>
<div class="warning" title="Warning" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Warning</h3>
<p>This format is deprecated in favor of the XML4 format. If
          you are still using the XML3 format, convert your catalog to the XML4
          format using the pegasus-sc-converter client.</p>
</div>
<p>This was the default format in Pegasus 3.0. It allows
      both shared and local file systems to be defined on the head node of
      the remote cluster as well as on the backend nodes.</p>
<div class="figure">
<a name="idp5898560"></a><p class="title"><b>Figure 4.4. Schema Image of the Site Catalog XML3</b></p>
<div class="figure-contents"><div class="mediaobject"><img src="images/sc-3.0_p2.png" alt="Schema Image of the Site Catalog XML3"></div></div>
</div>
<br class="figure-break"><p>Below is an example of an XML3 site catalog:</p>
<pre class="programlisting">&lt;sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog 
http://pegasus.isi.edu/schema/sc-3.0.xsd" version="3.0"&gt;
  &lt;site  handle="isi" arch="x86" os="LINUX" osrelease="" osversion="" glibc=""&gt;
      &lt;grid  type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="auxillary"/&gt;
      &lt;grid  type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="compute"/&gt;
          &lt;head-fs&gt;
               &lt;scratch&gt;
                  &lt;shared&gt;
                     &lt;file-server protocol="gsiftp" url="gsiftp://skynet-data.isi.edu"
                                  mount-point="/nfs/scratch01" /&gt;
                     &lt;internal-mount-point mount-point="/nfs/scratch01"/&gt;
                  &lt;/shared&gt;
               &lt;/scratch&gt;
               &lt;storage&gt;
                  &lt;shared&gt;
                     &lt;file-server protocol="gsiftp" url="gsiftp://skynet-data.isi.edu" 
                                  mount-point="/exports/storage01"/&gt;
                     &lt;internal-mount-point mount-point="/exports/storage01"/&gt;
                  &lt;/shared&gt;
               &lt;/storage&gt;
          &lt;/head-fs&gt;
      &lt;replica-catalog  type="LRC" url="rlsn://smarty.isi.edu"/&gt;
      &lt;profile namespace="env" key="PEGASUS_HOME" &gt;/nfs/vdt/pegasus&lt;/profile&gt;
      &lt;profile namespace="env" key="GLOBUS_LOCATION" &gt;/vdt/globus&lt;/profile&gt;
  &lt;/site&gt;
&lt;/sitecatalog&gt;</pre>
<p>Described below are some of the entries in the site
      catalog.</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>site</strong></span> - A site
            identifier.</p></li>
<li class="listitem"><p><span class="bold"><strong>replica-catalog</strong></span> - URL of a
            local replica catalog (LRC) in which to register your files. It is only used
            by the RLS implementation of the Replica Catalog. This entry is optional, and support for
            RLS was dropped in Pegasus 4.5.0.</p></li>
<li class="listitem">
<p><span class="bold"><strong>File Systems</strong></span> - Information about
            the file systems mounted on the remote cluster's head node or worker
            nodes. There are several configurations:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>head-fs/scratch</strong></span> - This
                describes the scratch file systems (temporary, for execution)
                available on the head node.</p></li>
<li class="listitem"><p><span class="bold"><strong>head-fs/storage</strong></span> - This
                describes the storage file systems (long term) available on
                the head node.</p></li>
<li class="listitem"><p><span class="bold"><strong>worker-fs/scratch</strong></span> -
                This describes the scratch file systems (temporary, for
                execution) available on the worker node.</p></li>
<li class="listitem"><p><span class="bold"><strong>worker-fs/storage</strong></span> -
                This describes the storage file systems (long term) available
                on the worker node.</p></li>
</ul></div>
<p>Each scratch and storage entry can contain two
            sub-entries:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>shared</strong></span> - for shared file systems
                such as NFS or Lustre.</p></li>
<li class="listitem"><p><span class="bold"><strong>local</strong></span> - for local file systems
                (local to the node or machine).</p></li>
</ul></div>
<p>Each file system is defined using a file-server
            element. The protocol attribute names the protocol used to access the files,
            url gives the URL prefix from which the files are obtained, and
            mount-point is the mount point exposed by the file server.</p>
<p>In addition, an internal-mount-point needs to be defined so that
            the files can be accessed directly from the machine without going through a file
            server.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>arch, os, osrelease, osversion,
            glibc</strong></span> - The architecture, operating system, OS release, OS version and glibc version of the
            site. OSRELEASE, OSVERSION and GLIBC are optional.</p>
<p>ARCH can have one of the following values: X86, X86_64,
            SPARCV7, SPARCV9, AIX, PPC.</p>
<p>OS can have one of the following values: LINUX, SUNOS, MACOSX.
            The default value for sysinfo, if none is specified, is
            X86::LINUX.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>Profiles</strong></span> - One or more
            profiles can be attached to a site.</p>
<p>One example is the set of environment variables to be set on a remote
            site, such as PEGASUS_HOME and GLOBUS_LOCATION in the example above.</p>
</li>
</ol></div>
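<p>To make the relationship between the <code>url</code> prefix and the
      <code>mount-point</code> of a file-server concrete, the Python sketch
      below reads the shared head-node file systems of an XML3 catalog and
      joins the two attributes. The function name
      <code>head_fs_urls</code> is hypothetical, and the sketch assumes that
      the full URL of a directory is the URL prefix followed directly by the
      mount point.</p>

```python
import xml.etree.ElementTree as ET

# Namespace copied from the xmlns attribute of the example catalog.
SC_NS = {"sc": "http://pegasus.isi.edu/schema/sitecatalog"}


def head_fs_urls(catalog_xml):
    """Map (site handle, "scratch" or "storage") to prefix + mount point.

    Hypothetical helper; only head-fs entries of the shared type are read.
    If a file system lists several file servers, the last one wins.
    """
    root = ET.fromstring(catalog_xml)
    result = {}
    for site in root.findall("sc:site", SC_NS):
        for kind in ("scratch", "storage"):
            path = "sc:head-fs/sc:" + kind + "/sc:shared/sc:file-server"
            for server in site.findall(path, SC_NS):
                # Assumption: full URL = URL prefix + mount point.
                full_url = server.get("url") + server.get("mount-point")
                result[(site.get("handle"), kind)] = full_url
    return result
```

<p>For the example catalog above, the scratch entry of the
      <code>isi</code> site would resolve to
      <code>gsiftp://skynet-data.isi.edu/nfs/scratch01</code> under this
      assumption.</p>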
<p>To use this site catalog, the following property needs to be
      set:</p>
<div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p><span class="bold"><strong>pegasus.catalog.site.file=<em class="replaceable"><code>&lt;path to the
          site catalog file&gt;</code></em></strong></span></p></li></ol></div>
</div>
<div class="section" title="4.3.3. Site Catalog Converter pegasus-sc-converter">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp5929504"></a>4.3.3. Site Catalog Converter pegasus-sc-converter</h3></div></div></div>
<p>By default, Pegasus 4.2 parses site catalogs that conform
      to the SC schema 4.0 (XML4), available <a class="ulink" href="http://pegasus.isi.edu/wms/docs/schemas/sc-4.0/sc-4.0.xsd" target="_top">here</a>. The format is explained in detail in the Catalog
      Properties section of <a class="link" href="running_workflows.php" title="Chapter 5. Running Workflows">Running
      Workflows</a>.</p>
<p>Pegasus 4.2 comes with pegasus-sc-converter, a client that converts
      a user's old site catalog (XML3) to the XML4 format. Sample usage is given
      below.</p>
<pre class="programlisting"><span class="bold"><strong>$ pegasus-sc-converter -i sample.sites.xml -I XML3 -o sample.sites.xml4 -O XML4
</strong></span>
2010.11.22 12:55:14.169 PST:   Written out the converted file to sample.sites.xml4
</pre>
<p>To use the converted site catalog, do the following in the
      properties file:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>Unset pegasus.catalog.site, or set pegasus.catalog.site to
          XML.</p></li>
<li class="listitem"><p>Point pegasus.catalog.site.file to the converted site
          catalog.</p></li>
</ol></div>
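<p>When many catalogs need converting, the two steps above can be
      scripted. The Python sketch below is one possible wrapper and is not
      shipped with Pegasus: the function names are hypothetical, the
      converter flags are exactly those in the sample usage, and
      <code>convert</code> assumes that pegasus-sc-converter is on the
      PATH.</p>

```python
import subprocess


def build_converter_command(source, target):
    """Argument list for the XML3 to XML4 conversion shown in the sample."""
    return [
        "pegasus-sc-converter",
        "-i", source, "-I", "XML3",
        "-o", target, "-O", "XML4",
    ]


def convert(source, target):
    """Run the converter; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_converter_command(source, target), check=True)


def converted_catalog_properties(target):
    """Property lines that point Pegasus at the converted catalog."""
    return (
        "pegasus.catalog.site=XML\n"
        "pegasus.catalog.site.file=" + target + "\n"
    )
```

<p>Keeping the command construction separate from the call that runs it
      makes it possible to inspect or log the exact arguments before any
      file is written.</p>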
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="replica.php">Prev</a> </td>
<td width="20%" align="center"><a accesskey="u" href="creating_workflows.php">Up</a></td>
<td width="40%" align="right"> <a accesskey="n" href="transformation.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">4.2. Data Discovery (Replica Catalog) </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> 4.4. Executable Discovery (Transformation Catalog)</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
