<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="running_workflows.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="submit_directory.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="chapter" title="Chapter 6. Execution Environments">
<div class="titlepage"><div><div><h2 class="title">
<a name="execution_environments"></a>Chapter 6. Execution Environments</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="execution_environments.php#localhost">6.1. Localhost</a></span></dt>
<dt><span class="section"><a href="execution_environments.php#condor_pool">6.2. Condor Pool</a></span></dt>
<dt><span class="section"><a href="execution_environments.php#cloud">6.3. Infrastructure Clouds</a></span></dt>
<dt><span class="section"><a href="execution_environments.php#globus_gram">6.4. Remote Cluster using Globus GRAM</a></span></dt>
<dt><span class="section"><a href="execution_environments.php#creamce_submission">6.5. Remote Cluster using CREAMCE</a></span></dt>
<dt><span class="section"><a href="execution_environments.php#glite">6.6. Local Cluster Using Glite</a></span></dt>
<dt><span class="section"><a href="execution_environments.php#idp18424000">6.7. Remote Cluster using BOSCO and SSH submissions</a></span></dt>
<dt><span class="section"><a href="execution_environments.php#campus_cluster">6.8. Campus Cluster</a></span></dt>
<dt><span class="section"><a href="execution_environments.php#xsede">6.9. XSEDE</a></span></dt>
<dt><span class="section"><a href="execution_environments.php#open_science_grid">6.10. Open Science Grid Using glideinWMS</a></span></dt>
</dl></div>
<p>Pegasus supports a number of execution environments. An execution
  environment is the setup in which the jobs of a workflow run.</p>
<div class="section" title="6.1. Localhost">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="localhost"></a>6.1. Localhost</h2></div></div></div>
<p>In this configuration, Pegasus schedules the jobs to run locally on
    the submit host. Running locally is a good approach for smaller workflows,
    testing workflows, and for demonstrations such as the <a class="link" href="tutorial.php" title="Chapter 2. Tutorial">Pegasus tutorial</a>. Pegasus supports two methods
    of local execution: local Condor pool, and shell planner. The former is
    preferred as the latter does not support all Pegasus features (such as
    notifications).</p>
<p>Running on a local Condor pool is achieved by executing the workflow
    on site local (the <span class="bold"><strong>--sites local</strong></span> option to
    pegasus-plan). The site "local" is a reserved site in Pegasus and causes
    the jobs to run on the submit host in the Condor local universe. The site
    catalog can be left very simple in this case:</p>
<pre class="programlisting">
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd"
             version="4.0"&gt;

    &lt;site  handle="local" arch="x86_64" os="LINUX"&gt;
        &lt;directory type="shared-scratch" path="/tmp/wf/work"&gt;
            &lt;file-server operation="all" url="file:///tmp/wf/work"/&gt;
        &lt;/directory&gt;
        &lt;directory type="local-storage" path="/tmp/wf/storage"&gt;
            &lt;file-server operation="all" url="file:///tmp/wf/storage"/&gt;
        &lt;/directory&gt;
    &lt;/site&gt;

&lt;/sitecatalog&gt;
</pre>
<p>The simplest execution environment does not involve Condor. Pegasus
    is capable of planning small workflows for local execution using a shell
    planner. Please refer to the <code class="filename">share/pegasus/examples</code> directory in your
    Pegasus installation, the shell planner's <a class="link" href="example_workflows.php#local_shell_examples" title="9.3. Local Shell Examples">documentation section</a>, or the
    tutorials, for details.</p>
</div>
<div class="section" title="6.2. Condor Pool">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="condor_pool"></a>6.2. Condor Pool</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="execution_environments.php#glideins">6.2.1. Glideins</a></span></dt>
<dt><span class="section"><a href="execution_environments.php#idp13033264">6.2.2. CondorC</a></span></dt>
</dl></div>
<p>A Condor pool is a set of machines that use Condor for resource
    management. A Condor pool can be a cluster of dedicated machines or a set
    of distributively owned machines. Pegasus can generate concrete workflows
    that can be executed on a Condor pool.</p>
<div class="figure">
<a name="idp8198368"></a><p class="title"><b>Figure 6.1. The distributed resources appear to be part of a Condor
      pool.</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/condor_layout.png" height="360" alt="The distributed resources appear to be part of a Condor pool."></td></tr></table></div></div>
</div>
<br class="figure-break"><p>The workflow is submitted using DAGMan from one of the job
    submission machines in the Condor pool. It is the responsibility of the
    Central Manager of the pool to match the task in the workflow submitted by
    DAGMan to the execution machines in the pool. This matching process can be
    guided by including Condor specific attributes in the submit files of the
    tasks. If the user wants to execute the workflow on the execution machines
    (worker nodes) in a Condor pool, there should be a resource defined in the
    site catalog which represents these execution machines. The universe
    attribute of the resource should be vanilla. There can be multiple
    resources associated with a single Condor pool, where each resource
    identifies a subset of machines (worker nodes) in the pool.</p>
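<p>As a minimal sketch of the last point, the two site entries below split
    one pool into two resources by giving each a different Condor requirements
    expression, assuming the condorio data configuration so that no storage
    directories are needed. The site handles and the FileSystemDomain values
    are hypothetical placeholders, not names from an actual pool:</p>
<pre class="programlisting">
    &lt;!-- hypothetical: worker nodes in the first filesystem domain --&gt;
    &lt;site  handle="condorpool_a" arch="x86_64" os="LINUX"&gt;
        &lt;profile namespace="pegasus" key="style" &gt;condor&lt;/profile&gt;
        &lt;profile namespace="condor" key="universe" &gt;vanilla&lt;/profile&gt;
        &lt;profile namespace="condor" key="requirements"&gt;(Target.FileSystemDomain == &amp;quot;a.example.org&amp;quot;)&lt;/profile&gt;
    &lt;/site&gt;

    &lt;!-- hypothetical: worker nodes in the second filesystem domain --&gt;
    &lt;site  handle="condorpool_b" arch="x86_64" os="LINUX"&gt;
        &lt;profile namespace="pegasus" key="style" &gt;condor&lt;/profile&gt;
        &lt;profile namespace="condor" key="universe" &gt;vanilla&lt;/profile&gt;
        &lt;profile namespace="condor" key="requirements"&gt;(Target.FileSystemDomain == &amp;quot;b.example.org&amp;quot;)&lt;/profile&gt;
    &lt;/site&gt;
</pre>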
<p>When running on a Condor pool, the user has to decide how Pegasus
    should transfer data. Please see the <a class="link" href="running_workflows.php#data_staging_configuration" title="5.3. Data Staging Configuration">Data Staging Configuration</a> for
    the options. The easiest is to use <span class="bold"><strong>condorio</strong></span> as that mode does not require any extra
    setup - Condor will do the transfers using the existing Condor daemons.
    For an example of this mode see the example workflow in
    <code class="filename">share/pegasus/examples/condor-blackdiamond-condorio/</code>.
    In condorio mode, the site catalog for the execution site is very simple
    as storage is provided by Condor:</p>
<pre class="programlisting">
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd"
             version="4.0"&gt;

    &lt;site  handle="local" arch="x86_64" os="LINUX"&gt;
        &lt;directory type="shared-scratch" path="/tmp/wf/work"&gt;
            &lt;file-server operation="all" url="file:///tmp/wf/work"/&gt;
        &lt;/directory&gt;
        &lt;directory type="local-storage" path="/tmp/wf/storage"&gt;
            &lt;file-server operation="all" url="file:///tmp/wf/storage"/&gt;
        &lt;/directory&gt;
    &lt;/site&gt;

    &lt;site  handle="condorpool" arch="x86_64" os="LINUX"&gt;
        &lt;profile namespace="pegasus" key="style" &gt;condor&lt;/profile&gt;
        &lt;profile namespace="condor" key="universe" &gt;vanilla&lt;/profile&gt;
    &lt;/site&gt;

&lt;/sitecatalog&gt;
</pre>
<p>There is a set of Condor profiles which are used commonly when
    running Pegasus workflows. You may have to set some or all of these
    depending on the setup of the Condor pool:</p>
<pre class="programlisting">  &lt;!-- Change the style to Condor for jobs to be executed in the Condor Pool.
       By default, Pegasus creates jobs suitable for grid execution. --&gt;
  &lt;profile namespace="pegasus" key="style"&gt;condor&lt;/profile&gt;

  &lt;!-- Change the universe to vanilla to make the jobs go to remote compute
       nodes. The default is local which will only run jobs on the submit host --&gt;
  &lt;profile namespace="condor" key="universe" &gt;vanilla&lt;/profile&gt;

  &lt;!-- The requirements expression allows you to limit where your jobs go --&gt;
  &lt;profile namespace="condor" key="requirements"&gt;(Target.FileSystemDomain != &amp;quot;yggdrasil.isi.edu&amp;quot;)&lt;/profile&gt;

  &lt;!-- The following two profiles force Condor to always transfer files. This
       has to be used if the pool does not have a shared filesystem --&gt;
  &lt;profile namespace="condor" key="should_transfer_files"&gt;True&lt;/profile&gt;
  &lt;profile namespace="condor" key="when_to_transfer_output"&gt;ON_EXIT&lt;/profile&gt;</pre>
<div class="section" title="6.2.1. Glideins">
<div class="titlepage"><div><div><h3 class="title">
<a name="glideins"></a>6.2.1. Glideins</h3></div></div></div>
<p>In this section we describe how machines from different
      administrative domains and supercomputing centers can be dynamically
      added to a Condor pool for a certain timeframe. These machines join the
      Condor pool temporarily and can be used to execute jobs in a
      non-preemptive manner. This functionality is achieved using a Condor feature
      called <span class="bold"><strong>glideins</strong></span> (see <a class="ulink" href="http://cs.wisc.edu/condor/glidein" target="_top">http://cs.wisc.edu/condor/glidein</a>).
      The startd daemon is the Condor daemon which provides the compute
      slots and runs the jobs. In the glidein case, the submit machine is
      usually a static machine and the glideins are configured to report
      to that submit machine. The glideins can be submitted to any type of
      resource: a GRAM enabled cluster, a campus cluster, a cloud environment
      such as Amazon AWS, or even another Condor cluster.</p>
<div class="tip" title="Tip" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Tip</h3>
<p>As glideins usually come from different compute resources,
        and/or run in an administrative domain different
        from the submit node, there is usually no shared filesystem available.
        Thus the most common <a class="link" href="running_workflows.php#data_staging_configuration" title="5.3. Data Staging Configuration">data
        staging modes</a> are <span class="bold"><strong>condorio</strong></span> and
        <span class="bold"><strong>nonsharedfs</strong></span>.</p>
</div>
<p>There are many useful tools which submit and manage glideins for
      you:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><a class="ulink" href="http://www.uscms.org/SoftwareComputing/Grid/WMS/glideinWMS/" target="_top">GlideinWMS</a>
          is a tool and host environment used mostly on the <a class="ulink" href="http://www.opensciencegrid.org/" target="_top">Open Science
          Grid</a>.</p></li>
<li class="listitem"><p><a class="ulink" href="http://pegasus.isi.edu/projects/corralwms" target="_top">CorralWMS</a> is
          a personal frontend for GlideinWMS. CorralWMS was developed by the
          Pegasus team and works very well for high throughput
          workflows.</p></li>
<li class="listitem"><p><a class="ulink" href="http://research.cs.wisc.edu/condor/manual/v7.6/condor_glidein.html" target="_top">condor_glidein</a>
          is a simple glidein tool for Globus GRAM clusters. condor_glidein is
          shipped with Condor.</p></li>
<li class="listitem"><p>Glideins can also be created by hand or by scripts. This is a
          useful solution, for example, for clusters which have no external job
          submit mechanisms or do not allow outside networking.</p></li>
</ul></div>
</div>
<div class="section" title="6.2.2. CondorC">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp13033264"></a>6.2.2. CondorC</h3></div></div></div>
<p>Using CondorC, users can submit workflows to remote Condor pools.
      CondorC is a Condor-specific solution for remote submission that does
      not involve setting up GRAM on the head node. To enable CondorC
      submission to a site, the user needs to set the pegasus profile key
      style to the value condorc. If the remote Condor pool does not
      have a shared filesystem between the nodes making up the pool, users
      should use Pegasus in the condorio data configuration. In this mode, all
      the data is staged to the remote node in the Condor pool using Condor
      file transfers and the job is executed using PegasusLite.</p>
<p>A sample site catalog for submission to a CondorC enabled site is
      listed below:</p>
<pre class="programlisting">
&lt;sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd"
             version="4.0"&gt;
      
    &lt;site  handle="local" arch="x86_64" os="LINUX"&gt;
        &lt;directory type="shared-scratch" path="/tmp/wf/work"&gt;
            &lt;file-server operation="all" url="file:///tmp/wf/work"/&gt;
        &lt;/directory&gt;
        &lt;directory type="local-storage" path="/tmp/wf/storage"&gt;
            &lt;file-server operation="all" url="file:///tmp/wf/storage"/&gt;
        &lt;/directory&gt;
    &lt;/site&gt;

    &lt;site  handle="condorcpool" arch="x86_64" os="LINUX"&gt;
         &lt;!-- the grid gateway entries are used to designate
              the remote schedd for the CondorC pool --&gt;
         &lt;grid type="condor" contact="ccg-condorctest.isi.edu" scheduler="Condor" jobtype="compute" /&gt;
         &lt;grid type="condor" contact="ccg-condorctest.isi.edu" scheduler="Condor" jobtype="auxillary" /&gt;
        
        &lt;!-- enable submission using condorc --&gt;
        &lt;profile namespace="pegasus" key="style"&gt;condorc&lt;/profile&gt;

        &lt;!-- specify which condor collector to use. 
             If not specified defaults to remote schedd specified in grid gateway --&gt;
        &lt;profile namespace="condor" key="condor_collector"&gt;condorc-collector.isi.edu&lt;/profile&gt;
        
        &lt;profile namespace="condor" key="should_transfer_files"&gt;Yes&lt;/profile&gt;
        &lt;profile namespace="condor" key="when_to_transfer_output"&gt;ON_EXIT&lt;/profile&gt;
        &lt;profile namespace="env" key="PEGASUS_HOME" &gt;/usr&lt;/profile&gt;
        &lt;profile namespace="condor" key="universe"&gt;vanilla&lt;/profile&gt;

    &lt;/site&gt;

&lt;/sitecatalog&gt;
</pre>
<p>To enable PegasusLite in CondorIO mode, users should set the
      following in their properties:</p>
<pre class="programlisting"># pegasus properties
pegasus.data.configuration    condorio</pre>
</div>
</div>
<div class="section" title="6.3. Infrastructure Clouds">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="cloud"></a>6.3. Infrastructure Clouds</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="execution_environments.php#amazon_aws">6.3.1. Amazon EC2</a></span></dt>
<dt><span class="section"><a href="execution_environments.php#futuregrid_nimbus">6.3.2. FutureGrid</a></span></dt>
</dl></div>
<div class="figure">
<a name="concepts-fig-cloud-layout"></a><p class="title"><b>Figure 6.2. Cloud Sample Site Layout</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td align="center" valign="middle"><img src="images/fg-pwms-prefio.3.png" align="middle" height="360" alt="Cloud Sample Site Layout"></td></tr></table></div></div>
</div>
<p><br class="figure-break"></p>
<p>This figure shows a sample environment for executing Pegasus on
    multiple clouds (known as "sky computing"). At this point, it is up to the
    user to provision the remote resources with a proper VM image that
    includes a Condor worker that is configured to report back to a Condor
    master outside the cloud.</p>
<p>In this discussion, the <span class="emphasis"><em>submit host</em></span> (SH) is
    located logically external to the cloud provider(s). The SH is the point
    where a user submits Pegasus workflows for execution. This site typically
    runs a Condor collector to gather resource announcements, or is part of a
    larger Condor pool that collects these announcements. Condor makes the
    remote resources available to the submit host's Condor
    installation.</p>
<p>The <a class="link" href="execution_environments.php#concepts-fig-cloud-layout" title="Figure 6.2. Cloud Sample Site Layout">figure above</a>
    shows the way Pegasus WMS is deployed in cloud computing resources,
    ignoring how these resources were provisioned. The figure
    shows multiple resources per provisioning request.</p>
<p>The provisioning broker -- Nimbus, Eucalyptus or EC2 -- at the
    remote site is responsible for allocating and setting up the resources. For a
    multi-node request, the worker nodes often require access to a form of
    shared data storage. Concretely, either a POSIX-compliant shared file
    system (e.g. NFS, PVFS) is available to the nodes, or one can be brought up
    for the lifetime of the application workflow. The task steps of the
    application workflow use shared file systems to exchange
    intermediary results between tasks on the same cloud site. Pegasus also
    supports an S3 data mode for the application workflow data staging.</p>
<p>The initial stage-in and final stage-out of application data into
    and out of the node set is part of any Pegasus-planned workflow. Several
    configuration options exist in Pegasus to deal with the dynamics of push
    and pull of data, and when to stage data. In many use-cases, some form of
    external access to or from the shared file system that is visible to the
    application workflow is required to facilitate successful data staging.
    However, Pegasus is prepared to deal with a set of boundary cases.</p>
<p>The data server in the figure is shown at the submit host. This is
    not a strict requirement. The data server for consumed data and data
    products may both be different and external to the submit host.</p>
<p>Once resources begin appearing in the pool managed by the submit
    machine's Condor collector, the application workflow can be
    submitted to Condor. A Condor DAGMan will manage the application workflow
    execution. Pegasus run-time tools obtain timing, performance, and
    provenance information as the application workflow is executed. At this
    point, it is the user's responsibility to de-provision the allocated
    resources.</p>
<p>In the figure, the cloud resources on the right side are assumed to
    have uninhibited outside connectivity. This enables the Condor I/O to
    communicate with the resources. The right side includes a setup where the
    worker nodes use only private IP addresses, but have outgoing connectivity and a NAT
    router to talk to the internet. The <span class="emphasis"><em>Condor connection
    broker</em></span> (CCB) facilitates this setup almost effortlessly.</p>
<p>The left side shows a more difficult setup where the connectivity is
    fully firewalled without any connectivity except to in-site nodes. In this
    case, a proxy server process, the <span class="emphasis"><em>generic connection
    broker</em></span> (GCB), needs to be set up in the DMZ of the cloud site
    to facilitate Condor I/O between the submit host and worker nodes.</p>
<p>If the cloud supports data storage servers, Pegasus is starting to
    support workflows that require staging in two steps: Consumed data is
    first staged to a data server in the remote site's DMZ, and then a second
    staging task moves the data from the data server to the worker node where
    the job runs. For staging out, data needs to be first staged from the
    job's worker node to the site's data server, and possibly from there to
    another data server external to the site. Pegasus is capable of planning both
    steps: normal staging to the site's data server, and the worker-node
    staging from and to the site's data server as part of the job.</p>
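<p>As a sketch of how such a site-level data server could be described, the
    entry below reuses the directory and file-server elements from the other
    site catalogs in this chapter. The site handle, host name, path, and the
    use of GridFTP are assumptions made for illustration only, and how the
    planner is told to stage through this site is not shown:</p>
<pre class="programlisting">
    &lt;!-- hypothetical data server in the DMZ of the cloud site --&gt;
    &lt;site  handle="cloud_staging" arch="x86_64" os="LINUX"&gt;
        &lt;directory type="shared-scratch" path="/data/scratch"&gt;
            &lt;!-- placeholder URL; use the protocol your data server actually offers --&gt;
            &lt;file-server operation="all" url="gsiftp://data.example.org/data/scratch"/&gt;
        &lt;/directory&gt;
    &lt;/site&gt;
</pre>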
<div class="section" title="6.3.1. Amazon EC2">
<div class="titlepage"><div><div><h3 class="title">
<a name="amazon_aws"></a>6.3.1. Amazon EC2</h3></div></div></div>
<p>There are many different ways to set up an execution environment
      in Amazon EC2. The easiest way is to use a submit machine outside the
      cloud, and to provision several worker nodes and a file server node in
      the cloud as shown here:</p>
<div class="figure">
<a name="ec2"></a><p class="title"><b>Figure 6.3. Amazon EC2</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td align="center" valign="middle"><img src="images/ec2.png" align="middle" height="360" alt="Amazon EC2"></td></tr></table></div></div>
</div>
<p><br class="figure-break"></p>
<p>The submit machine runs Pegasus and a Condor master (collector,
      schedd, negotiator). The workers run a Condor startd. And the file
      server node exports an NFS file system. The startd on the workers is
      configured to connect to the master running outside the cloud, and the
      workers also mount the NFS file system. More information on setting up
      Condor for this environment can be found at <a class="ulink" href="http://www.isi.edu/~gideon/condor-ec2/" target="_top">http://www.isi.edu/~gideon/condor-ec2</a>.</p>
<p>The site catalog entry for this configuration is similar to what
      you would create for running on a local <a class="link" href="execution_environments.php#condor_pool" title="6.2. Condor Pool">Condor pool</a> with a shared file
      system.</p>
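<p>A rough sketch of such an entry follows, assuming the NFS file system is
      mounted at /nfs/scratch on every worker and that the file server node
      also runs a GridFTP server reachable from the submit machine. The site
      handle, host name, path, and the choice of GridFTP are placeholders for
      illustration; adjust them to whatever your VM image actually
      provides:</p>
<pre class="programlisting">
    &lt;site  handle="ec2" arch="x86_64" os="LINUX"&gt;
        &lt;!-- hypothetical NFS mount shared by all worker nodes --&gt;
        &lt;directory type="shared-scratch" path="/nfs/scratch"&gt;
            &lt;file-server operation="all" url="gsiftp://fileserver.example.org/nfs/scratch"/&gt;
        &lt;/directory&gt;

        &lt;!-- same profiles as for a local Condor pool --&gt;
        &lt;profile namespace="pegasus" key="style" &gt;condor&lt;/profile&gt;
        &lt;profile namespace="condor" key="universe" &gt;vanilla&lt;/profile&gt;
    &lt;/site&gt;
</pre>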
</div>
<div class="section" title="6.3.2. FutureGrid">
<div class="titlepage"><div><div><h3 class="title">
<a name="futuregrid_nimbus"></a>6.3.2. FutureGrid</h3></div></div></div>
<p><a class="ulink" href="https://portal.futuregrid.org/" target="_top">FutureGrid</a> is
      a distributed testbed for cloud computing. There is a tutorial on how to
      run Pegasus on FutureGrid using the Nimbus cloud management system here:
      <a class="ulink" href="http://pegasus.isi.edu/futuregrid/tutorials/" target="_top">http://pegasus.isi.edu/futuregrid/tutorials</a></p>
</div>
</div>
<div class="section" title="6.4. Remote Cluster using Globus GRAM">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="globus_gram"></a>6.4. Remote Cluster using Globus GRAM</h2></div></div></div>
<div class="figure">
<a name="concepts-fig-site-layout"></a><p class="title"><b>Figure 6.4. Grid Sample Site Layout</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td align="center" valign="middle"><img src="images/concepts-site-layout.jpg" align="middle" height="360" alt="Grid Sample Site Layout"></td></tr></table></div></div>
</div>
<p><br class="figure-break"></p>
<p>A generic grid environment is shown in the figure <a class="link" href="execution_environments.php#concepts-fig-site-layout" title="Figure 6.4. Grid Sample Site Layout">above</a>. We will work from the
    left to the right top, then the right bottom.</p>
<p>On the left side, you have a submit machine where Pegasus runs,
    Condor schedules jobs, and workflows are executed. We call it the
    <span class="emphasis"><em>submit host</em></span> (SH), though its functionality can be
    assumed by a virtual machine image. In order to properly communicate over
    secured channels, it is important that the submit machine has a proper
    notion of time, i.e. runs an NTP daemon to keep accurate time. To be able
    to connect to remote clusters and receive connections from the remote
    clusters, the submit host has a public IP address to facilitate this
    communication.</p>
<p>In order to send a job request to the remote cluster, Condor wraps
    the job into Globus calls via Condor-G. Globus uses GRAM to manage jobs on
    remote sites. In terms of a software stack, Pegasus wraps the job into
    Condor. Condor wraps the job into Globus. Globus transports the job to the
    remote site, and unwraps the Globus component, sending it to the remote
    site's <span class="emphasis"><em>resource manager</em></span> (RM).</p>
<p>To be able to communicate using the Globus security infrastructure
    (GSI), the submit machine needs to have the certificate authority (CA)
    certificates configured and, in certain circumstances, a host certificate;
    the user needs a user certificate that is enabled on the
    remote site. On the remote end, the remote gatekeeper node requires a host
    certificate, all signing CA certificate chains and policy files, and a
    good time source.</p>
<p>In a grid environment, there are one or more clusters accessible via
    grid middleware like the <a class="ulink" href="http://www.globus.org/" target="_top">Globus
    Toolkit</a>. In the case of Globus, there is the Globus gatekeeper
    listening on TCP port 2119 of the remote cluster. The port is opened to a
    single machine called the <span class="emphasis"><em>head node</em></span> (HN). The head node is
    typically located in a de-militarized zone (DMZ) of the firewall setup, as
    it requires limited outside connectivity and a public IP address so that
    it can be contacted. Additionally, once the gatekeeper has accepted a job, it
    passes it on to a jobmanager. Often, these jobmanagers require a limited
    port range, in the example TCP ports 40000-41000, to call back to the
    submit machine.</p>
<p>For the user to be able to run jobs on the remote site, the user
    must have some form of an account on the remote site. The user's grid
    identity is passed from the submit host. An entity called <span class="emphasis"><em>grid
    mapfile</em></span> on the gatekeeper maps the user's grid identity into a
    remote account. While most sites do not permit account sharing, it is
    possible to map multiple user certificates to the same account.</p>
<p>The gatekeeper is the interface through which jobs are submitted to
    the remote cluster's resource manager. A resource manager is a scheduling
    system like PBS, Maui, LSF, FBSNG or Condor that queues tasks and
    allocates worker nodes. The <span class="emphasis"><em>worker nodes</em></span> (WN) in the
    remote cluster might not have outside connectivity and often use all
    private IP addresses. The Globus toolkit requires a shared filesystem to
    properly stage files between the head node and worker nodes.</p>
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>The shared filesystem requirement is imposed by Globus. Pegasus is
      capable of supporting advanced site layouts that do not require a shared
      filesystem. Please contact us for details, should you require such a
      setup.</p>
</div>
<p>To stage data between external sites for the job, it is recommended
    to enable a GridFTP server. If a shared networked filesystem is involved,
    the GridFTP server should be located as close to the file-server as
    possible. The GridFTP server requires TCP port 2811 for the control
    channel, and a limited port range for data channels, here as an example
    the TCP ports from 40000 to 41000. The GridFTP server requires a host
    certificate, the signing CA chain and policy files, a stable time source,
    and a gridmap file that maps between a user's grid identity and the user's
    account on the remote site.</p>
<p>The GridFTP server is often installed on the head node, the same as
    the gatekeeper, so that they can share the grid mapfile, CA certificate
    chains and other setups. However, for performance purposes it is
    recommended that the GridFTP server has its own machine.</p>
<p>An example site catalog entry for a GRAM enabled site looks as
    follows:</p>
<pre class="programlisting">&lt;sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd"
             version="4.0"&gt;
      
     &lt;site handle="Trestles" arch="x86_64" os="LINUX"&gt;
        &lt;grid type="gt5" contact="trestles.sdsc.edu/jobmanager-fork" scheduler="Fork" jobtype="auxillary"/&gt;
        &lt;grid type="gt5" contact="trestles.sdsc.edu/jobmanager-pbs" scheduler="unknown" jobtype="compute"/&gt;

        &lt;directory type="shared-scratch" path="/oasis/projects/nsf/USERNAME"&gt;
            &lt;file-server operation="all" url="gsiftp://trestles-dm1.sdsc.edu/oasis/projects/nsf/USERNAME"/&gt;
        &lt;/directory&gt;

        &lt;!-- specify the path to a PEGASUS WORKER INSTALL on the site --&gt;
        &lt;profile namespace="env" key="PEGASUS_HOME" &gt;/path/to/PEGASUS/INSTALL&lt;/profile&gt;
    &lt;/site&gt;


 &lt;/sitecatalog&gt;</pre>
</div>
<div class="section" title="6.5. Remote Cluster using CREAMCE">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="creamce_submission"></a>6.5. Remote Cluster using CREAMCE</h2></div></div></div>
<p><a class="ulink" href="https://wiki.italiangrid.it/twiki/bin/view/CREAM/FunctionalDescription" target="_top">CREAM</a>
    is a web services based job submission front end for remote compute
    clusters. It can be viewed as a replacement for Globus GRAM and is mainly
    popular in Europe. It is widely used in the Italian Grid.</p>
<p>In order to submit a workflow to a compute site using the CREAMCE
    front end, the user needs to specify the following for the site in their
    site catalog:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>pegasus</strong></span> profile <span class="bold"><strong>style</strong></span> with value set to <span class="bold"><strong>cream</strong></span></p></li>
<li class="listitem"><p><span class="bold"><strong>grid gateway</strong></span> defined for the
        site with the <span class="bold"><strong>contact</strong></span> attribute set to the
        CREAMCE frontend and the <span class="bold"><strong>scheduler</strong></span>
        attribute set to the remote scheduler.</p></li>
<li class="listitem"><p>a remote queue can be optionally specified using <span class="bold"><strong>globus</strong></span> profile <span class="bold"><strong>queue</strong></span> with value set to <span class="bold"><strong>queue-name</strong></span></p></li>
</ol></div>
<p>An example site catalog entry for a CREAMCE site looks as
    follows:</p>
<pre class="programlisting">&lt;sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd"
             version="4.0"&gt;
      
    &lt;site  handle="creamce" arch="x86" os="LINUX"&gt;
        &lt;grid type="cream" contact="https://ce01-lcg.cr.cnaf.infn.it:8443/ce-cream/services/CREAM2" scheduler="LSF" jobtype="compute" /&gt;
        &lt;grid type="cream" contact="https://ce01-lcg.cr.cnaf.infn.it:8443/ce-cream/services/CREAM2" scheduler="LSF" jobtype="auxillary" /&gt;

        &lt;directory type="shared-scratch" path="/home/virgo034"&gt;
            &lt;file-server operation="all" url="gsiftp://ce01-lcg.cr.cnaf.infn.it/home/virgo034"/&gt;
        &lt;/directory&gt;

        &lt;profile namespace="pegasus" key="style"&gt;cream&lt;/profile&gt;
        &lt;profile namespace="globus" key="queue"&gt;virgo&lt;/profile&gt;
    &lt;/site&gt;

 &lt;/sitecatalog&gt;</pre>
<p>The Pegasus distribution comes with CREAMCE examples in the examples
    directory. They can be used as a starting point to configure your
    setup.</p>
<div class="tip" title="Tip" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Tip</h3>
<p>Usually, the CREAMCE frontends accept VOMS user proxies generated
      with the voms-proxy-init command. Steps for generating a VOMS proxy are
      listed in the <a class="ulink" href="https://wiki.italiangrid.it/twiki/bin/view/CREAM/UserGuide#1_1_Before_starting_get_your_use" target="_top">CREAM User Guide</a>.</p>
</div>
</div>
<div class="section" title="6.6. Local Cluster Using Glite">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="glite"></a>6.6. Local Cluster Using Glite</h2></div></div></div>
<div class="toc"><dl><dt><span class="section"><a href="execution_environments.php#idp18402832">6.6.1. Changes to Jobs</a></span></dt></dl></div>
<p>This section describes the changes required in the site
    catalog for Pegasus to generate an executable workflow that uses gLite
    blahp to submit directly to PBS on the local machine. This mode of
    submission should only be used when Condor on the submit host can
    talk directly to the scheduler running on the cluster. It is recommended that
    the cluster that gLite talks to be designated as a separate compute site
    in the Pegasus site catalog. To tag a site as a gLite site, the following
    two profiles need to be specified for the site in the site catalog:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>pegasus</strong></span> profile <span class="bold"><strong>style</strong></span> with value set to <span class="bold"><strong>glite</strong></span>.</p></li>
<li class="listitem"><p><span class="bold"><strong>condor</strong></span> profile <span class="bold"><strong>grid_resource</strong></span> with value set to <span class="bold"><strong>pbs|lsf</strong></span></p></li>
</ol></div>
<p>An example site catalog entry for a gLite site looks as
    follows:</p>
<pre class="programlisting">
&lt;sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd"
             version="4.0"&gt;
    
    &lt;site  handle="local" arch="x86" os="LINUX"&gt;
        &lt;directory type="shared-scratch" path="/lfs/shared-scratch/glite-sharedfs-example/work"&gt;
            &lt;file-server operation="all" url="file:///lfs/local-scratch/glite-sharedfs-example/work"/&gt;
        &lt;/directory&gt;
        &lt;directory type="local-storage" path="/shared-scratch/glite-sharedfs-example/outputs"&gt;
            &lt;file-server operation="all" url="file:///lfs/local-scratch/glite-sharedfs-example/outputs"/&gt;
        &lt;/directory&gt;
    &lt;/site&gt;

    &lt;site  handle="local-pbs" arch="x86" os="LINUX"&gt;
        
        &lt;!-- the following is a shared directory shared amongst all the nodes in the cluster --&gt;
        &lt;directory type="shared-scratch" path="/lfs/glite-sharedfs-example/local-pbs/shared-scratch"&gt;
            &lt;file-server operation="all" url="file:///lfs/glite-sharedfs-example/local-pbs/shared-scratch"/&gt;
        &lt;/directory&gt;

        &lt;profile namespace="env" key="PEGASUS_HOME"&gt;/lfs/software/pegasus/pegasus-4.2.0&lt;/profile&gt;

        &lt;profile namespace="pegasus" key="style" &gt;glite&lt;/profile&gt;
        &lt;profile namespace="pegasus" key="change.dir"&gt;true&lt;/profile&gt;

        &lt;profile namespace="condor" key="grid_resource"&gt;pbs&lt;/profile&gt;
        &lt;profile namespace="condor" key="batch_queue"&gt;batch&lt;/profile&gt;
        &lt;profile namespace="globus" key="maxwalltime"&gt;30000&lt;/profile&gt;
    &lt;/site&gt;


&lt;/sitecatalog&gt;

</pre>
<div class="tip" title="Tip" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Tip</h3>
<p>Starting with 4.2.1, the examples directory contains a gLite
      shared filesystem example that you can use to test this
      configuration.</p>
</div>
<div class="section" title="6.6.1. Changes to Jobs">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp18402832"></a>6.6.1. Changes to Jobs</h3></div></div></div>
<p>When this style is applied to a job, it adds the
      following ClassAd expressions to the job description:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>+remote_queue - value picked up from globus profile
          queue</p></li>
<li class="listitem"><p>+remote_cerequirements - See below</p></li>
</ol></div>
<div class="section" title="6.6.1.1. Remote CE Requirements">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp18406896"></a>6.6.1.1. Remote CE Requirements</h4></div></div></div>
<p>The remote CE requirements are constructed from the
        profiles associated with the job. The profiles for a job are derived
        from the following sources:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>transformation catalog</p></li>
<li class="listitem"><p>site catalog</p></li>
<li class="listitem"><p>DAX</p></li>
<li class="listitem"><p>user properties</p></li>
</ol></div>
<p>The following globus profiles, if associated with the job, are
        picked up and translated to the corresponding gLite keys:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>hostcount -&gt; PROCS</p></li>
<li class="listitem"><p>count -&gt; NODES</p></li>
<li class="listitem"><p>maxwalltime -&gt; WALLTIME</p></li>
</ol></div>
<p>The following condor profiles, if associated with the job, are
        picked up and translated to the corresponding gLite keys:</p>
<div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>priority -&gt; PRIORITY</p></li></ol></div>
<p>All the env profiles are translated to MYENV.</p>
<p>The remote_cerequirements expression is constructed on the basis
        of the profiles associated with the job. An example
        +remote_cerequirements ClassAd expression in the submit file is listed
        below:</p>
<pre class="programlisting"><span class="bold"><strong>+remote_cerequirements = "PROCS==18 &amp;&amp; NODES==1 &amp;&amp; PRIORITY==10 &amp;&amp; WALLTIME==3600 \
   &amp;&amp; PASSENV==1 &amp;&amp; JOBNAME==\"TEST JOB\" &amp;&amp; MYENV ==\"JAVA_HOME=/bin/java,APP_HOME=/bin/app\""</strong></span></pre>
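<p>As an illustrative sketch, profiles along the following lines, placed
        inside a site entry tagged with the glite style, would supply the
        inputs for the translation described in this section. All values are
        hypothetical and chosen only for illustration. The exact values that
        appear in the generated +remote_cerequirements expression (for
        example, the units used for WALLTIME) are not specified here and
        should be checked in the generated submit file.</p>
<pre class="programlisting">&lt;!-- hypothetical values, for illustration only --&gt;
&lt;profile namespace="globus" key="hostcount"&gt;18&lt;/profile&gt;     &lt;!-- picked up as PROCS --&gt;
&lt;profile namespace="globus" key="count"&gt;1&lt;/profile&gt;          &lt;!-- picked up as NODES --&gt;
&lt;profile namespace="globus" key="maxwalltime"&gt;60&lt;/profile&gt;   &lt;!-- picked up as WALLTIME --&gt;
&lt;profile namespace="condor" key="priority"&gt;10&lt;/profile&gt;      &lt;!-- picked up as PRIORITY --&gt;
&lt;profile namespace="env" key="JAVA_HOME"&gt;/bin/java&lt;/profile&gt; &lt;!-- folded into MYENV --&gt;
&lt;profile namespace="env" key="APP_HOME"&gt;/bin/app&lt;/profile&gt;   &lt;!-- folded into MYENV --&gt;</pre>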
</div>
<div class="section" title="6.6.1.2. Specifying directory for the jobs">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp18421984"></a>6.6.1.2. Specifying directory for the jobs</h4></div></div></div>
<p>gLite blahp does not follow the remote_initialdir or initialdir
        ClassAd directives. Hence, jobs that have this style applied
        do not have a remote directory specified in their submit files.
        Instead, Pegasus relies on kickstart to change to the working
        directory when the job is launched on the remote node.</p>
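<p>The example site catalog entry for the local-pbs site earlier in
        this section sets the pegasus profile change.dir to true alongside
        the glite style. Assuming that this profile is what asks kickstart to
        change into the working directory (an assumption that should be
        verified against the profile reference for your Pegasus version), the
        relevant fragment of the site entry is:</p>
<pre class="programlisting">&lt;!-- fragment of the local-pbs site entry shown earlier in this section --&gt;
&lt;profile namespace="pegasus" key="style"&gt;glite&lt;/profile&gt;
&lt;!-- assumption: makes kickstart change to the working directory on the remote node --&gt;
&lt;profile namespace="pegasus" key="change.dir"&gt;true&lt;/profile&gt;</pre>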
</div>
</div>
</div>
<div class="section" title="6.7. Remote Cluster using BOSCO and SSH submissions">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="idp18424000"></a>6.7. Remote Cluster using BOSCO and SSH submissions</h2></div></div></div>
<p><a class="ulink" href="http://bosco.opensciencegrid.org/about/" target="_top">BOSCO</a>
    enables users to submit jobs to remote clusters using SSH. This
    section describes how to specify a site catalog entry for a site to
    which jobs can be submitted over SSH. To tag a site for SSH submission,
    the following profiles need to be specified for the site in the site
    catalog:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p><span class="bold"><strong>pegasus</strong></span> profile <span class="bold"><strong>style</strong></span> with value set to <span class="bold"><strong>ssh</strong></span></p></li>
<li class="listitem"><p>Specify the service information as grid gateways. This should
        match what BOSCO provided when the cluster was set up.</p></li>
</ol></div>
<p>An example site catalog entry for a BOSCO site looks as
    follows:</p>
<pre class="programlisting">
&lt;sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd"
             version="4.0"&gt;
    
    &lt;site  handle="USC_HPCC_Bosco" arch="x86_64" os="LINUX"&gt;

        &lt;!-- Specify the service information as grid gateways. This should match what Bosco provided when the cluster
             was set up. --&gt;
        &lt;grid type="batch" contact="username@hpc-login2.usc.edu" scheduler="PBS" jobtype="compute"/&gt;
        &lt;grid type="batch" contact="username@hpc-login2.usc.edu" scheduler="PBS" jobtype="auxillary"/&gt;

        &lt;!-- Scratch directory on the cluster --&gt;
        &lt;directory type="shared-scratch" path="/home/rcf-40/tmp"&gt;
            &lt;file-server operation="all" url="scp://username@hpc-login2.usc.edu/home/rcf-40/tmp"/&gt;
        &lt;/directory&gt;

        &lt;!-- SSH is the style to use for Bosco SSH submits --&gt;
        &lt;profile namespace="pegasus" key="style"&gt;ssh&lt;/profile&gt;

        &lt;!-- Bosco is using the grid universe, which means the globus
             namespace can be used to control the jobs --&gt;
        &lt;profile namespace="globus" key="queue"&gt;default&lt;/profile&gt;
        &lt;profile namespace="globus" key="maxwalltime"&gt;30&lt;/profile&gt;

    &lt;/site&gt;



&lt;/sitecatalog&gt;

</pre>
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>It is recommended to configure a submit node either as a
      BOSCO submit node or as a vanilla HTCondor node. HTCondor cannot be
      configured both as a BOSCO install and as a traditional HTCondor submit
      node at the same time, because BOSCO will override the traditional HTCondor
      pool in the user environment.</p>
</div>
<p>Starting with 4.3, there is a bosco-shared-fs example in the examples
    directory of the distribution.</p>
</div>
<div class="section" title="6.8. Campus Cluster">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="campus_cluster"></a>6.8. Campus Cluster</h2></div></div></div>
<p>There are almost as many different configurations of campus clusters
    as there are campus clusters, and because of that it can be hard to
    determine the best way to run Pegasus workflows. Below is an ordered
    checklist with some ideas we have collected from working with users in the
    past:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>If the cluster scheduler is Condor, please see the Condor Pool
        section.</p></li>
<li class="listitem"><p>If the cluster is Globus GRAM enabled, see the Globus GRAM
        section. If you have a lot of short jobs, also read the Glidein
        section.</p></li>
<li class="listitem"><p>For clusters without GRAM, you might be able to use glideins. If
        outbound network connectivity is allowed, your submit host can be
        anywhere. If the cluster is set up to not allow any network connections
        to the outside, you will probably have to run the submit host inside
        the cluster as well.</p></li>
</ol></div>
<p>If the cluster you are trying to use does not fit any of the above
    scenarios, please post to the <a class="ulink" href="http://pegasus.isi.edu/support" target="_top">Pegasus users mailing list</a>
    and we will help you find a solution.</p>
</div>
<div class="section" title="6.9. XSEDE">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="xsede"></a>6.9. XSEDE</h2></div></div></div>
<p>The <a class="ulink" href="https://www.xsede.org/" target="_top">Extreme Science and
    Engineering Discovery Environment (XSEDE)</a> provides a set of High
    Performance Computing (HPC) and High Throughput Computing (HTC)
    resources.</p>
<p>For the HPC resources, it is recommended to run using <a class="link" href="execution_environments.php#globus_gram" title="6.4. Remote Cluster using Globus GRAM">Globus GRAM</a> or <a class="link" href="execution_environments.php#glideins" title="6.2.1. Glideins">glideins</a>. Most of these resources have fast
    parallel file systems, so running with <a class="link" href="running_workflows.php#data_staging_configuration" title="5.3. Data Staging Configuration">sharedfs data staging</a> is
    recommended. Below is an example site catalog and pegasusrc to run on <a class="ulink" href="http://www.sdsc.edu/us/resources/trestles/" target="_top">SDSC
    Trestles</a>:</p>
<pre class="programlisting">
&lt;sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd"
             version="4.0"&gt;
      
    &lt;site  handle="local" arch="x86_64" os="LINUX"&gt;
        &lt;directory type="shared-scratch" path="/tmp/wf/work"&gt;
            &lt;file-server operation="all" url="file:///tmp/wf/work"/&gt;
        &lt;/directory&gt;
        &lt;directory type="local-storage" path="/tmp/wf/storage"&gt;
            &lt;file-server operation="all" url="file:///tmp/wf/storage"/&gt;
        &lt;/directory&gt;
    &lt;/site&gt;

    &lt;site handle="Trestles" arch="x86_64" os="LINUX"&gt;
       &lt;grid type="gt5" contact="trestles.sdsc.edu:2119/jobmanager-fork" scheduler="PBS" jobtype="auxillary"/&gt;
       &lt;grid type="gt5" contact="trestles.sdsc.edu:2119/jobmanager-pbs" scheduler="PBS" jobtype="compute"/&gt;
       &lt;directory type="shared-scratch" path="/phase1/USERNAME"&gt;
           &lt;file-server operation="all" url="gsiftp://trestles-dm1.sdsc.edu/phase1/USERNAME"/&gt;
       &lt;/directory&gt;
    &lt;/site&gt;

&lt;/sitecatalog&gt;
</pre>
<p>pegasusrc:</p>
<pre class="programlisting">pegasus.catalog.replica=SimpleFile
pegasus.catalog.replica.file=rc

pegasus.catalog.site.file=sites.xml

pegasus.catalog.transformation=Text
pegasus.catalog.transformation.file=tc

pegasus.data.configuration = sharedfs

# Pegasus might not be installed, or be of a different version
# so stage the worker package
pegasus.transfer.worker.package = true
</pre>
<p>The HTC resources available on XSEDE are all Condor based, so
    a standard <a class="link" href="execution_environments.php#condor_pool" title="6.2. Condor Pool">Condor Pool</a> setup will work
    fine.</p>
<p>If you need to run high throughput workloads on the HPC machines
    (for example, post processing after a large parallel job), <a class="link" href="execution_environments.php#glideins" title="6.2.1. Glideins">glideins</a> can be useful, as they are a more efficient
    method for running small jobs on these systems.</p>
</div>
<div class="section" title="6.10. Open Science Grid Using glideinWMS">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="open_science_grid"></a>6.10. Open Science Grid Using glideinWMS</h2></div></div></div>
<div class="section" title="6.10.1. ">
<div class="titlepage"></div>
<p><a class="ulink" href="http://www.uscms.org/SoftwareComputing/Grid/WMS/glideinWMS/" target="_top">glideinWMS</a>
      is a glidein system widely used on Open Science Grid. Running on top of
      glideinWMS is like running on a <a class="link" href="execution_environments.php#condor_pool" title="6.2. Condor Pool">Condor
      Pool</a> without a shared filesystem.</p>
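<p>A minimal sketch of a site catalog entry for such a setup is shown
      below. It assumes the conventions of a plain Condor pool: the site
      handle osg is hypothetical, and the condor style and vanilla universe
      profiles are assumptions carried over from a Condor pool configuration
      rather than glideinWMS-specific requirements. Because there is no
      shared filesystem, the sketch declares no shared-scratch directory, and
      a non-shared data staging configuration would need to be selected in
      the properties file.</p>
<pre class="programlisting">&lt;!-- hypothetical site entry; the handle and profile values are assumptions --&gt;
&lt;site handle="osg" arch="x86_64" os="LINUX"&gt;
    &lt;!-- jobs are submitted to the Condor pool provided by the glideins --&gt;
    &lt;profile namespace="pegasus" key="style"&gt;condor&lt;/profile&gt;
    &lt;profile namespace="condor" key="universe"&gt;vanilla&lt;/profile&gt;
    &lt;!-- no shared-scratch directory: pair this with a non-shared data staging
         setting for pegasus.data.configuration (for example condorio, if your
         Pegasus version supports it) --&gt;
&lt;/site&gt;</pre>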
</div>
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="running_workflows.php">Prev</a> </td>
<td width="20%" align="center"> </td>
<td width="40%" align="right"> <a accesskey="n" href="submit_directory.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Chapter 5. Running Workflows </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> Chapter 7. Submit Directory Details</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
