<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="workflow_of_workflows.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="transfer.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="chapter" title="Chapter 9. Data Management">
<div class="titlepage"><div><div><h2 class="title">
<a name="data_management"></a>Chapter 9. Data Management</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="data_management.php#replica_selection">9.1. Replica Selection</a></span></dt>
<dt><span class="section"><a href="transfer.php">9.2. Data Transfers</a></span></dt>
<dt><span class="section"><a href="cred_staging.php">9.3. Credentials Management</a></span></dt>
<dt><span class="section"><a href="ref_output_mapper.php">9.4. Output Mappers</a></span></dt>
<dt><span class="section"><a href="data_cleanup.php">9.5. Data Cleanup</a></span></dt>
</dl></div>
<div class="section" title="9.1. Replica Selection">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="replica_selection"></a>9.1. Replica Selection</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="data_management.php#idp9918016">9.1.1. Configuration</a></span></dt>
<dt><span class="section"><a href="data_management.php#idp6978112">9.1.2. Supported Replica Selectors</a></span></dt>
</dl></div>
<p>Each job in the DAX may be associated with input LFNs
    denoting the files that the job requires to run. To determine the
    physical replica (PFN) for an LFN, Pegasus queries the Replica Catalog to
    get all the PFNs (replicas) associated with that LFN. The Replica
    Catalog may return multiple PFNs for each LFN queried, so
    Pegasus must select a single PFN from those returned
    for each LFN. This process is known as replica selection in Pegasus. Users
    can specify the replica selector to use in the properties file.</p>
<p>This section describes the replica selection strategies available in
    Pegasus.</p>
<div class="section" title="9.1.1. Configuration">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp9918016"></a>9.1.1. Configuration</h3></div></div></div>
<p>The user properties determine which replica selector the Pegasus
      Workflow Mapper uses. The property <span class="bold"><strong>pegasus.selector.replica</strong></span> is used to specify the
      replica selection strategy. The currently supported replica selection
      strategies are:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>Default</p></li>
<li class="listitem"><p>Restricted</p></li>
<li class="listitem"><p>Regex</p></li>
<li class="listitem"><p>Local</p></li>
</ol></div>
<p>The values are case sensitive. For example, the following property
      setting will throw a Factory Exception:</p>
<pre class="programlisting">pegasus.selector.replica  default</pre>
<p>The correct setting is:</p>
<pre class="programlisting">pegasus.selector.replica  Default</pre>
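<p>The case-sensitive check described above can be illustrated with a small
      Python sketch. This is not Pegasus code (Pegasus itself is written in Java):
      the names <code>SUPPORTED_SELECTORS</code>, <code>parse_properties</code> and
      <code>selector_name</code> are hypothetical, the parser only handles the
      whitespace-separated form used on this page, and a plain
      <code>ValueError</code> stands in for the Factory Exception.</p>

```python
# Illustrative sketch only; all names here are hypothetical, not Pegasus APIs.
SUPPORTED_SELECTORS = {"Default", "Restricted", "Regex", "Local"}


def parse_properties(text):
    """Parse whitespace-separated 'key value' lines into a dict."""
    props = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines, comment lines, and keys without a value.
        if len(parts) != 2 or parts[0].startswith("#"):
            continue
        props[parts[0]] = parts[1].strip()
    return props


def selector_name(props):
    """Return the configured selector, or 'Default' when the property is unset."""
    name = props.get("pegasus.selector.replica", "Default")
    # Exact, case-sensitive comparison: 'default' is rejected.
    if name not in SUPPORTED_SELECTORS:
        raise ValueError("unsupported replica selector: " + repr(name))
    return name
```

<p>With this sketch, a properties text containing
      <code>pegasus.selector.replica  default</code> is rejected, while
      <code>Default</code> is accepted.</p>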
</div>
<div class="section" title="9.1.2. Supported Replica Selectors">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp6978112"></a>9.1.2. Supported Replica Selectors</h3></div></div></div>
<p>The replica selectors supported by the Pegasus Workflow Mapper
      are described below.</p>
<div class="section" title="9.1.2.1. Default">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp5773712"></a>9.1.2.1. Default</h4></div></div></div>
<p>This is the default replica selector used in the Pegasus
        Workflow Mapper. If the property pegasus.selector.replica is not
        defined in the properties, then Pegasus uses this selector.</p>
<p>This selector looks at each PFN returned for an LFN and checks
        whether</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>the PFN is a file URL (starting with file:///)</p></li>
<li class="listitem"><p>the PFN has a pool attribute matching the site handle of
            the site where the compute job that requires the input file is to
            run.</p></li>
</ol></div>
<p>If a PFN matching the conditions above exists, then that PFN is
        returned by the selector.</p>
<p><span class="bold"><strong>Else,</strong></span> a random PFN is selected
        from all the PFNs that have a pool attribute matching
        the site handle of the site where the compute job is to
        run.</p>
<p><span class="bold"><strong>Else,</strong></span> a random PFN is selected
        from all the PFNs.</p>
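<p>The selection order described above can be sketched in Python as follows.
        This is only an illustration of the three-step fallback, not the Pegasus
        implementation: the function name <code>select_default</code> and the
        representation of a replica as a <code>(pfn, pool)</code> tuple are
        assumptions, and the sketch assumes that both listed conditions must hold
        for the first step.</p>

```python
import random


def select_default(replicas, site, rng=random):
    """Pick one replica for a job running at `site`.

    replicas: non-empty list of (pfn, pool) tuples (hypothetical representation).
    """
    on_site = [r for r in replicas if r[1] == site]
    # Step 1: a file URL whose pool attribute matches the compute site.
    local_files = [r for r in on_site if r[0].startswith("file:///")]
    if local_files:
        return local_files[0]
    # Step 2: otherwise, a random replica registered at the compute site.
    if on_site:
        return rng.choice(on_site)
    # Step 3: otherwise, a random replica from the full list.
    return rng.choice(replicas)
```

<p>Passing the random source as a parameter keeps the sketch easy to test
        with a seeded generator.</p>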
<p>To use this replica selector set the following
        property</p>
<pre class="programlisting">pegasus.selector.replica                  Default</pre>
</div>
<div class="section" title="9.1.2.2. Restricted">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp7613808"></a>9.1.2.2. Restricted</h4></div></div></div>
<p>This replica selector allows the user to specify good sites and
        bad sites for staging in data to a particular compute site. A good
        site for a compute site X is a preferred site from which replicas
        should be staged to site X. If more than one good site
        has a particular replica, then a random site is selected from
        these preferred sites.</p>
<p>A bad site for a compute site X is a site from which
        replicas should not be staged. The reasons for not accessing
        a replica from a bad site can range from the link being down to the user
        not having permissions on that site's data.</p>
<p>The good and bad sites are specified by the following
        properties:</p>
<pre class="programlisting">pegasus.selector.replica.*.prefer.stagein.sites
pegasus.selector.replica.*.ignore.stagein.sites</pre>
<p>where the * in the property name stands for the name of the compute
        site. A literal * in the property key is taken to mean all sites. The value of
        each property is a comma-separated list of sites.</p>
<p>For example, the following settings</p>
<pre class="programlisting">pegasus.selector.replica.*.prefer.stagein.sites            usc
pegasus.selector.replica.uwm.prefer.stagein.sites          isi,cit
</pre>
<p>mean that replicas from site usc are preferred for staging in to
        any compute site. For uwm, however, a tighter constraint applies: only
        replicas from site isi or cit are preferred. The pool attribute associated with
        a PFN tells the replica selector which site the replica is
        associated with.</p>
<p>The pegasus.selector.replica.*.prefer.stagein.sites property takes
        precedence over the pegasus.selector.replica.*.ignore.stagein.sites property, i.e.
        if for a site X a site Y is specified in both the ignored and the
        preferred set, then site Y is treated only as a preferred site
        for site X.</p>
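<p>A minimal Python sketch of these rules is given below. It is an
        illustration under stated assumptions, not the Pegasus implementation: the
        helper names <code>sites_for</code> and <code>select_restricted</code> are
        hypothetical, replicas are modelled as <code>(pfn, pool)</code> tuples, the
        property keys use the <code>pegasus.selector.replica.</code> prefix, a
        site-specific key is assumed to override the <code>*</code> key, and
        falling back to a random non-ignored replica when no preferred replica
        exists is also an assumption.</p>

```python
import random

# Assumed key layout; the compute site (or '*') fills the first slot.
KEY_TEMPLATE = "pegasus.selector.replica.{}.{}.stagein.sites"


def sites_for(props, kind, compute_site):
    """Return the 'prefer' or 'ignore' site set, falling back to the '*' key."""
    value = props.get(KEY_TEMPLATE.format(compute_site, kind))
    if value is None:
        value = props.get(KEY_TEMPLATE.format("*", kind), "")
    return {s.strip() for s in value.split(",") if s.strip()}


def select_restricted(replicas, compute_site, props, rng=random):
    """Pick a replica for `compute_site`, honouring preferred and ignored sites."""
    preferred = sites_for(props, "prefer", compute_site)
    # A site listed in both sets counts only as preferred.
    ignored = sites_for(props, "ignore", compute_site) - preferred
    good = [r for r in replicas if r[1] in preferred]
    if good:
        return rng.choice(good)
    # Assumption: otherwise use any replica that is not on an ignored site.
    allowed = [r for r in replicas if r[1] not in ignored]
    return rng.choice(allowed) if allowed else None
```

<p>With the example settings above, a job at uwm would only draw from
        replicas at isi or cit, while a job at any other site would draw from
        replicas at usc.</p>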
<p>To use this replica selector set the following property</p>
<pre class="programlisting">pegasus.selector.replica                  Restricted</pre>
</div>
<div class="section" title="9.1.2.3. Regex">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp6448016"></a>9.1.2.3. Regex</h4></div></div></div>
<p>This replica selector allows the user to
        specify regular expressions that are used to rank the
        PFNs returned from the Replica Catalog for a particular
        LFN. This replica selector selects the highest ranked PFN, i.e. the
        replica with the lowest rank value.</p>
<p>Each regular expression is assigned a rank that
        determines the order in which the expressions are applied. The rank
        values for the regular expressions can be expressed in user properties using the
        property</p>
<pre class="programlisting">pegasus.selector.replica.regex.rank.<span class="bold"><strong>[value]</strong></span>                  regex-expression</pre>
<p>The <span class="bold"><strong>[value]</strong></span> in the above
        property is an integer that denotes the rank of an expression,
        with a rank value of 1 being the highest rank.</p>
<p>For example, a user can specify the following regular expressions
        to ask Pegasus to prefer file URLs over gsiftp URLs from
        example.isi.edu:</p>
<pre class="programlisting">pegasus.selector.replica.regex.rank.1                       file://.*
pegasus.selector.replica.regex.rank.2                       gsiftp://example\.isi\.edu.*</pre>
<p>Users can specify as many regular expressions as they want.</p>
<p>Since Pegasus is written in Java, the supported regular expression
        syntax is that of Java, which is close to what Perl supports. More
        details can be found at
        http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html</p>
<p>Before any regular expressions are applied to the PFNs
        for a particular LFN that has to be staged to a site X, the file
        URLs that don't match site X are explicitly filtered
        out.</p>
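<p>The ranking and filtering steps can be sketched in Python as follows.
        This is an illustration only: Python's <code>re</code> module stands in for
        Java's regular expression engine (the two behave the same for the simple
        patterns shown on this page, but not for every construct), the names
        <code>ranked_patterns</code> and <code>select_regex</code> are hypothetical,
        replicas are modelled as <code>(pfn, pool)</code> tuples, and requiring the
        expression to match the whole PFN is an assumption.</p>

```python
import re

RANK_PREFIX = "pegasus.selector.replica.regex.rank."


def ranked_patterns(props):
    """Return (rank, compiled pattern) pairs sorted so that rank 1 comes first."""
    pairs = []
    for key, value in props.items():
        if key.startswith(RANK_PREFIX):
            rank = int(key[len(RANK_PREFIX):])
            pairs.append((rank, re.compile(value)))
    return sorted(pairs, key=lambda pair: pair[0])


def select_regex(replicas, site, props):
    """Return the replica matched by the best-ranked expression, or None."""
    # Filter out file URLs that are not registered at the compute site.
    candidates = [r for r in replicas
                  if not r[0].startswith("file:") or r[1] == site]
    for _rank, pattern in ranked_patterns(props):
        for replica in candidates:
            # Assumption: the expression must match the entire PFN.
            if pattern.fullmatch(replica[0]):
                return replica
    return None
```

<p>With the two example rank properties, a file URL registered at the
        compute site is chosen first; otherwise a gsiftp URL on example.isi.edu is
        chosen.</p>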
<p>To use this replica selector set the following
        property</p>
<pre class="programlisting">pegasus.selector.replica                  Regex</pre>
</div>
<div class="section" title="9.1.2.4. Local">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp7433088"></a>9.1.2.4. Local</h4></div></div></div>
<p>This replica selector always prefers replicas from the local
        host (pool attribute set to local) that start with a file: URL
        scheme. It is useful when users want to stage in files to a remote
        site from the submit host using the Condor file transfer
        mechanism.</p>
<p>To use this replica selector set the following
        property</p>
<pre class="programlisting">pegasus.selector.replica                  Local</pre>
</div>
</div>
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="workflow_of_workflows.php">Prev</a> </td>
<td width="20%" align="center"> </td>
<td width="40%" align="right"> <a accesskey="n" href="transfer.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">8.5. Workflow of Workflows </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> 9.2. Data Transfers</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
