<?php  
            require('/srv/new-pegasus.isi.edu/includes/common.php'); 
            pegasus_header("Chapter 10. Data Management");
        ?><div class="breadcrumbs">
<span class="breadcrumb-link"><a href="index.php">Pegasus 4.8.0 User Guide</a></span> &gt; <span class="breadcrumb-node">Data Management</span>
</div><hr><div class="chapter">
<div class="titlepage"><div><div><h1 class="title">
<a name="data_management"></a>Chapter 10. Data Management</h1></div></div></div>
<div class="toc"><dl class="toc">
<dt><span class="section"><a href="data_management.php#replica_selection">10.1. Replica Selection</a></span></dt>
<dt><span class="section"><a href="transfer.php">10.2. Data Transfers</a></span></dt>
<dt><span class="section"><a href="cred_staging.php">10.3. Credentials Management</a></span></dt>
<dt><span class="section"><a href="ref_staging_mapper.php">10.4. Staging Mappers</a></span></dt>
<dt><span class="section"><a href="ref_output_mapper.php">10.5. Output Mappers</a></span></dt>
<dt><span class="section"><a href="data_cleanup.php">10.6. Data Cleanup</a></span></dt>
<dt><span class="section"><a href="metadata.php">10.7. Metadata</a></span></dt>
</dl></div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="replica_selection"></a>10.1. Replica Selection</h2></div></div></div>
<div class="toc"><dl class="toc">
<dt><span class="section"><a href="data_management.php#idm4060">10.1.1. Configuration</a></span></dt>
<dt><span class="section"><a href="data_management.php#idm4077">10.1.2. Supported Replica Selectors</a></span></dt>
</dl></div>
<p>Each job in the DAX may be associated with input LFNs (logical file
    names) denoting the files that are required for the job to run. To
    determine the physical replica (PFN) for an LFN, Pegasus queries the
    Replica Catalog to get all the PFNs (replicas) associated with that LFN.
    The Replica Catalog may return multiple PFNs for each LFN queried, so
    Pegasus needs to select a single PFN from among the PFNs returned for
    each LFN. This process is known as replica selection in Pegasus. Users
    can specify the replica selector to use in the properties file.</p>
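<p>As a rough illustration of the lookup-then-select flow described above, the following Java sketch models the Replica Catalog as an in-memory map from each LFN to all of its PFNs and delegates the choice of a single PFN to a pluggable selector. The names ReplicaLookupSketch, ReplicaSelector and resolveInputs are hypothetical and are not part of the Pegasus API.</p>

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Pegasus API: the Replica Catalog is modeled
// as an in-memory map from each LFN to every PFN registered for it.
class ReplicaLookupSketch {

    // A replica selector picks exactly one PFN from the candidates for an LFN.
    interface ReplicaSelector {
        String select(String lfn, List<String> pfns);
    }

    // For each input LFN of a job, fetch all PFNs from the catalog and let
    // the selector choose one. Fails if an LFN has no registered replica.
    static Map<String, String> resolveInputs(List<String> inputLfns,
                                             Map<String, List<String>> catalog,
                                             ReplicaSelector selector) {
        Map<String, String> chosen = new LinkedHashMap<>();
        for (String lfn : inputLfns) {
            List<String> pfns = catalog.getOrDefault(lfn, List.of());
            if (pfns.isEmpty()) {
                throw new IllegalStateException("No replica registered for LFN " + lfn);
            }
            chosen.put(lfn, selector.select(lfn, pfns));
        }
        return chosen;
    }
}
```

<p>Each strategy in the rest of this section can be read as a different implementation of the select step, replacing an arbitrary pick with an ordering or filtering rule.</p>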
<p>This section describes the replica selection strategies supported in
    Pegasus.</p>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="idm4060"></a>10.1.1. Configuration</h3></div></div></div>
<p>The user properties determine which replica selector the Pegasus
      Workflow Mapper uses. The property <span class="bold"><strong>pegasus.selector.replica</strong></span> is used to specify the
      replica selection strategy. The currently supported replica selection
      strategies are:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>Default</p></li>
<li class="listitem"><p>Regex</p></li>
<li class="listitem"><p>Restricted</p></li>
<li class="listitem"><p>Local</p></li>
</ol></div>
<p>The values are case sensitive. For example, the following property
      setting will throw a Factory Exception:</p>
<pre class="programlisting">pegasus.selector.replica  default</pre>
<p>The correct way to specify it is:</p>
<pre class="programlisting">pegasus.selector.replica  Default</pre>
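<p>The effect of case sensitivity can be sketched in Java with java.util.Properties: the value is compared exactly against the four selector names, so "default" is rejected while "Default" is accepted. This is a hypothetical illustration rather than Pegasus source code; the class name SelectorNameCheck and the use of IllegalArgumentException in place of Pegasus's Factory Exception are assumptions. Falling back to Default when the property is unset follows the description of the Default selector in this chapter.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch, not Pegasus source: shows why "default" is rejected
// while "Default" is accepted when the selector name is matched exactly.
class SelectorNameCheck {

    static final String PROPERTY = "pegasus.selector.replica";
    static final Set<String> KNOWN = Set.of("Default", "Regex", "Restricted", "Local");

    // Returns the selector name, using Default when the property is unset.
    // The comparison is case sensitive.
    static String selectorName(Properties props) {
        String name = props.getProperty(PROPERTY, "Default").trim();
        if (!KNOWN.contains(name)) {
            // Stands in for the Factory Exception that Pegasus reports.
            throw new IllegalArgumentException("Unknown replica selector: " + name);
        }
        return name;
    }

    // Parses properties-file text such as "pegasus.selector.replica  Default".
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```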
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="idm4077"></a>10.1.2. Supported Replica Selectors</h3></div></div></div>
<p>The replica selectors supported in the Pegasus Workflow Mapper are
      explained below.</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>Starting with the 4.6.0 release, the Default and Regex replica
        selectors return an ordered list of replicas with priorities set. At
        runtime, pegasus-transfer fails over to the alternate URLs if a
        higher-priority source URL is inaccessible.</p>
</div>
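<p>The failover idea in the note above amounts to trying the source URLs in priority order and stopping at the first one that works. The Java sketch below only illustrates that idea; pegasus-transfer is a separate tool whose actual retry and protocol handling is not shown, and the names FailoverSketch and transferWithFailover, as well as the Predicate standing in for a transfer attempt, are hypothetical.</p>

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of priority-ordered failover; the real pegasus-transfer
// logic (retries, protocol handling, credentials) is not modeled here.
class FailoverSketch {

    // Tries each source URL in order, highest priority first, and returns the
    // first URL for which the transfer attempt reports success.
    static String transferWithFailover(List<String> orderedSources,
                                       Predicate<String> tryTransfer) {
        for (String url : orderedSources) {
            if (tryTransfer.test(url)) {
                return url;
            }
        }
        throw new IllegalStateException("All source URLs failed: " + orderedSources);
    }
}
```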
<div class="section">
<div class="titlepage"><div><div><h4 class="title">
<a name="replica_selection_default"></a>10.1.2.1. Default</h4></div></div></div>
<p>This is the default replica selector used in the Pegasus
        Workflow Mapper. If the property pegasus.selector.replica is not
        defined in the properties, Pegasus uses this selector.</p>
<p>The selector orders the candidate replicas according to the
        following rules:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>valid file URLs, that is, URLs whose site
            attribute matches the site where the executable
            <span class="emphasis"><em>pegasus-transfer</em></span> is executed.</p></li>
<li class="listitem"><p>all URLs from the preferred site (usually the
            compute site)</p></li>
<li class="listitem"><p>all other remotely accessible (non-file) URLs</p></li>
</ol></div>
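<p>The three ordering rules can be sketched in Java as a partition of the candidates into three tiers that are then concatenated. This is a hypothetical illustration, not the Pegasus implementation: the Replica class, the parameter names transferSite and preferredSite, and the choice to drop file URLs that belong to any other site (on the assumption that they are not reachable from the transfer site) are all assumptions.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Default selector's ordering rules, not Pegasus
// source code. A replica is a URL plus the site attribute from the catalog.
class DefaultOrderingSketch {

    static final class Replica {
        final String url;
        final String site;
        Replica(String url, String site) {
            this.url = url;
            this.site = site;
        }
        boolean isFileUrl() {
            return url.startsWith("file:");
        }
    }

    // transferSite: site where pegasus-transfer runs.
    // preferredSite: usually the compute site.
    static List<Replica> order(List<Replica> candidates, String transferSite, String preferredSite) {
        List<Replica> validFile = new ArrayList<>();     // rule 1
        List<Replica> fromPreferred = new ArrayList<>(); // rule 2
        List<Replica> otherRemote = new ArrayList<>();   // rule 3
        for (Replica r : candidates) {
            if (r.isFileUrl()) {
                // Assumption: file URLs on any other site are dropped.
                if (r.site.equals(transferSite)) {
                    validFile.add(r);
                }
            } else if (r.site.equals(preferredSite)) {
                fromPreferred.add(r);
            } else {
                otherRemote.add(r);
            }
        }
        List<Replica> ordered = new ArrayList<>(validFile);
        ordered.addAll(fromPreferred);
        ordered.addAll(otherRemote);
        return ordered;
    }
}
```

<p>Within each tier, replicas keep the order in which the Replica Catalog returned them, which yields a priority-ordered list of the kind pegasus-transfer can fail over along.</p>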
<p>To use this replica selector, set the following
        property:</p>
<pre class="programlisting">pegasus.selector.replica                  Default</pre>
</div>
<div class="section">
<div class="titlepage"><div><div><h4 class="title">
<a name="idm4096"></a>10.1.2.2. Regex</h4></div></div></div>
<p>This replica selector allows the user to specify regular
        expressions that are used to rank the PFNs returned from the
        Replica Catalog for a particular LFN. The selector orders the
        replicas by rank: the lower the rank value, the higher the
        preference.</p>
<p>Each regular expression is assigned a rank that determines the
        order in which the expressions are applied. The rank values are
        expressed in the user properties using the following
        property:</p>
<pre class="programlisting">pegasus.selector.replica.regex.rank.<span class="bold"><strong>[value]</strong></span>                  regex-expression</pre>
<p>The <span class="bold"><strong>[value]</strong></span> in the above
        property is an integer that denotes the rank of the expression,
        with a rank value of 1 being the highest rank.</p>
<p>For example, a user can specify the following regular expressions
        to make Pegasus prefer file URLs over gsiftp URLs from
        example.isi.edu:</p>
<pre class="programlisting">pegasus.selector.replica.regex.rank.1                       file://.*
pegasus.selector.replica.regex.rank.2                       gsiftp://example\.isi\.edu.*</pre>
<p>Users can specify as many regular expressions as they want.</p>
<p>Since Pegasus is written in Java, the supported regular expression
        syntax is the one Java supports, which is close to Perl's. More
        details can be found at
        http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html</p>
<p>Before any regular expressions are applied to the PFNs of an LFN
        that has to be staged to a site X, file URLs that do not match
        site X are explicitly filtered out.</p>
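<p>The filtering and ranking steps can be sketched in Java as follows. This is a hypothetical illustration, not the Pegasus implementation: it assumes each expression must match the whole PFN (Matcher.matches() semantics, which the trailing .* in the examples suggests) and that PFNs matching no expression are placed after all ranked ones. The class RegexRankSketch and its methods are invented for this sketch; only the property prefix comes from this section.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical sketch of regex-based replica ranking; not Pegasus source.
class RegexRankSketch {

    static final String PREFIX = "pegasus.selector.replica.regex.rank.";

    static final class Replica {
        final String url;
        final String site;
        Replica(String url, String site) {
            this.url = url;
            this.site = site;
        }
    }

    // Collects rank -> compiled pattern. TreeMap keeps ranks sorted so that
    // rank 1 (highest preference) is tried first. A non-integer suffix
    // raises NumberFormatException.
    static TreeMap<Integer, Pattern> loadRanks(Properties props) {
        TreeMap<Integer, Pattern> ranks = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                int rank = Integer.parseInt(key.substring(PREFIX.length()));
                ranks.put(rank, Pattern.compile(props.getProperty(key).trim()));
            }
        }
        return ranks;
    }

    // Rank of the first expression matching the whole URL; assumption:
    // URLs matching no expression sort after every ranked URL.
    static int rankOf(String url, TreeMap<Integer, Pattern> ranks) {
        for (Map.Entry<Integer, Pattern> e : ranks.entrySet()) {
            if (e.getValue().matcher(url).matches()) {
                return e.getKey();
            }
        }
        return Integer.MAX_VALUE;
    }

    // Drops file URLs that do not belong to the target site, then sorts the
    // remaining URLs by rank. List.sort is stable, so ties keep catalog order.
    static List<String> order(List<Replica> candidates, String targetSite,
                              TreeMap<Integer, Pattern> ranks) {
        List<String> urls = new ArrayList<>();
        for (Replica r : candidates) {
            if (r.url.startsWith("file:") && !r.site.equals(targetSite)) {
                continue;
            }
            urls.add(r.url);
        }
        urls.sort(Comparator.comparingInt(u -> rankOf(u, ranks)));
        return urls;
    }
}
```

<p>In practice the Properties object would be filled from the user's properties file with Properties.load before calling loadRanks.</p>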
<p>To use this replica selector, set the following
        property:</p>
<pre class="programlisting">pegasus.selector.replica                  Regex</pre>
</div>
<div class="section">
<div class="titlepage"><div><div><h4 class="title">
<a name="idm4111"></a>10.1.2.3. Restricted</h4></div></div></div>
<p>This replica selector allows the user to specify good sites and
        bad sites for staging in data to a particular compute site. A good
        site for a compute site X is a preferred site from which replicas
        should be staged to site X. If more than one good site has a
        particular replica, a site is selected at random from among these
        preferred sites.</p>
<p>A bad site for a compute site X is a site from which replicas
        should not be staged. Reasons for not accessing replicas from a bad
        site range from the link being down to the user not having
        permissions on that site's data.</p>
<p>The good and bad sites are specified by the following
        properties:</p>
<pre class="programlisting">pegasus.replica.*.prefer.stagein.sites
pegasus.replica.*.ignore.stagein.sites</pre>
<p>where the * in the property name is replaced by the name of the
        compute site. A literal * in the property key is taken to mean all
        sites. The value of each property is a comma-separated list of
        sites.</p>
<p>For example, the following settings</p>
<pre class="programlisting">pegasus.selector.replica.*.prefer.stagein.sites            usc
pegasus.replica.uwm.prefer.stagein.sites                   isi,cit
</pre>
<p>mean that replicas from site usc are preferred when staging in to
        any compute site. For compute site uwm, however, a tighter constraint
        applies: only replicas from sites isi or cit are preferred. The pool
        attribute associated with each PFN tells the replica selector which
        site the replica/PFN is associated with.</p>
<p>The pegasus.replica.*.prefer.stagein.sites property takes
        precedence over the pegasus.replica.*.ignore.stagein.sites property;
        that is, if for a site X a site Y is specified in both the ignored
        and the preferred sets, site Y is treated only as a preferred site
        for site X.</p>
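<p>The selection rules of the Restricted selector can be sketched in Java as follows. The sketch is hypothetical: the prefer and ignore maps stand for the parsed prefer.stagein.sites and ignore.stagein.sites properties keyed by compute site (with "*" for all sites), a site-specific entry is assumed to replace the "*" entry (as the uwm example suggests), and falling back to a random non-ignored replica when no preferred replica exists is an assumption not stated in this section.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch of restricted replica selection; not Pegasus source.
class RestrictedSketch {

    static final class Replica {
        final String url;
        final String site; // the pool attribute from the Replica Catalog
        Replica(String url, String site) {
            this.url = url;
            this.site = site;
        }
    }

    // Assumption: a site-specific entry replaces the "*" entry.
    static Set<String> lookup(Map<String, Set<String>> bySite, String computeSite) {
        if (bySite.containsKey(computeSite)) {
            return bySite.get(computeSite);
        }
        return bySite.getOrDefault("*", Set.of());
    }

    static Replica select(List<Replica> candidates, String computeSite,
                          Map<String, Set<String>> prefer,
                          Map<String, Set<String>> ignore, Random random) {
        Set<String> preferred = lookup(prefer, computeSite);
        Set<String> ignored = lookup(ignore, computeSite);
        List<Replica> good = new ArrayList<>();
        List<Replica> allowed = new ArrayList<>();
        for (Replica r : candidates) {
            if (preferred.contains(r.site)) {
                good.add(r);    // preferred wins even if the site is also ignored
            } else if (!ignored.contains(r.site)) {
                allowed.add(r); // neither preferred nor ignored
            }
        }
        // Random pick among good sites; otherwise (assumption) among the rest.
        List<Replica> pool = good.isEmpty() ? allowed : good;
        if (pool.isEmpty()) {
            throw new IllegalStateException("No usable replica for site " + computeSite);
        }
        return pool.get(random.nextInt(pool.size()));
    }
}
```

<p>Passing a seeded java.util.Random makes the random choice among good sites reproducible in tests.</p>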
<p>To use this replica selector, set the following property:</p>
<pre class="programlisting">pegasus.selector.replica                  Restricted</pre>
</div>
<div class="section">
<div class="titlepage"><div><div><h4 class="title">
<a name="replica_selection_local"></a>10.1.2.4. Local</h4></div></div></div>
<p>This replica selector always prefers replicas from the local
        host (pool attribute set to local) that start with a file: URL
        scheme. It is useful when users want to stage in files to a remote
        site from the submit host using the Condor file transfer
        mechanism.</p>
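<p>The preference rule of the Local selector can be sketched in Java as a search for a replica whose pool attribute is local and whose URL uses the file: scheme. This is a hypothetical illustration, not Pegasus source; the class LocalSelectorSketch is invented, and because this section does not say what happens when no such replica exists, the sketch returns an empty Optional and leaves that case to the caller.</p>

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the Local selector's preference, not Pegasus source.
class LocalSelectorSketch {

    static final class Replica {
        final String url;
        final String site; // pool attribute from the Replica Catalog
        Replica(String url, String site) {
            this.url = url;
            this.site = site;
        }
    }

    // Returns the first replica on the "local" pool with a file: URL.
    // The no-match case is left to the caller (behavior not described here).
    static Optional<Replica> pickLocal(List<Replica> candidates) {
        for (Replica r : candidates) {
            if ("local".equals(r.site) && r.url.startsWith("file:")) {
                return Optional.of(r);
            }
        }
        return Optional.empty();
    }
}
```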
<p>To use this replica selector, set the following
        property:</p>
<pre class="programlisting">pegasus.selector.replica                  Local</pre>
</div>
</div>
</div>
</div><div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="workflow_of_workflows.php">Prev</a> </td>
<td width="20%" align="center"> </td>
<td width="40%" align="right"> <a accesskey="n" href="transfer.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">9.6. Workflow of Workflows </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> 10.2. Data Transfers</td>
</tr>
</table>
</div><?php  
            pegasus_footer();
        ?>
