<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="index.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="workflow_gallery.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="chapter" title="Chapter 1. Introduction">
<div class="titlepage"><div><div><h2 class="title">
<a name="about"></a>Chapter 1. Introduction</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="about.php#overview">1.1. Overview and Features</a></span></dt>
<dt><span class="section"><a href="workflow_gallery.php">1.2. Workflow Gallery</a></span></dt>
<dt><span class="section"><a href="about_document.php">1.3. About this Document</a></span></dt>
</dl></div>
<div class="section" title="1.1. Overview and Features">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="overview"></a>1.1. Overview and Features</h2></div></div></div>
<p><a class="ulink" href="http://pegasus.isi.edu" target="_top">Pegasus WMS</a> is a
    configurable system for mapping and executing abstract application
    workflows over a wide range of execution environments, including a laptop, a
    campus cluster, a Grid, or a commercial or academic cloud. Today, Pegasus
    runs workflows on Amazon EC2, Nimbus, Open Science Grid, the TeraGrid, and
    many campus clusters. One workflow can run on a single system or across a
    heterogeneous set of resources. Pegasus can run workflows ranging from
    just a few computational tasks up to 1 million.</p>
<p>Pegasus WMS bridges the scientific domain and the execution
    environment by automatically mapping high-level workflow descriptions onto
    distributed resources. It automatically locates the input data
    and computational resources necessary for workflow execution. Pegasus
    enables scientists to construct workflows in abstract terms without
    worrying about the details of the underlying execution environment or the
    particulars of the low-level specifications required by the middleware
    (Condor, Globus, or Amazon EC2). Pegasus WMS also bridges the current
    cyberinfrastructure by effectively coordinating multiple distributed
    resources. The input to Pegasus is a description of the abstract workflow
    in XML format.</p>
<p>Pegasus allows researchers to translate complex computational tasks
    into workflows that link and manage ensembles of dependent tasks and
    related data files. Pegasus automatically chains dependent tasks together,
    so that a single scientist can complete complex computations that once
    required many different people. New users are encouraged to explore the
    <a class="link" href="tutorial.php" title="Chapter 2. Tutorial">tutorial chapter</a> to become
    familiar with how to operate Pegasus for their own workflows. In the
    tutorial, users create and run a sample project that demonstrates Pegasus
    capabilities. Users can also browse the
    <a class="link" href="useful_tips.php" title="Chapter 16. Useful Tips">Useful Tips</a> chapter for
    help in designing their workflows.</p>
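<p>The idea of chaining dependent tasks can be illustrated with a minimal
    sketch. It is not Pegasus code and does not use any Pegasus API: the task
    names and the <span class="emphasis"><em>execution_order</em></span> helper are hypothetical, and the
    ordering is done with the Python standard library (Python 3.9 or later is
    assumed). It only shows how a description of which task depends on which
    can be turned into a valid execution order.</p>
<pre class="programlisting">
```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "preprocess": set(),
    "analyze": {"preprocess"},
    "plot": {"analyze"},
    "report": {"analyze", "plot"},
}


def execution_order(deps):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```
</pre>
<p>Because each task in this toy example depends on the one before it, only
    one order is possible. A real workflow usually has independent branches
    that can run in parallel.</p>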
<p>Pegasus has a number of features that contribute to its usability
    and effectiveness.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
<p><span class="bold"><strong>Portability / Reuse</strong></span></p>
<p>User-created workflows can easily be run in different
        environments without alteration. Pegasus currently runs workflows on
        top of Condor, Grid infrastructures such as Open Science Grid and
        TeraGrid, Amazon EC2, Nimbus, and many campus clusters. The same
        workflow can run on a single system or across a heterogeneous set of
        resources.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>Performance</strong></span></p>
<p>The Pegasus mapper can reorder, group, and prioritize tasks in
        order to increase the overall workflow performance.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>Scalability</strong></span></p>
<p>Pegasus can easily scale both the size of the workflow and the
        resources that the workflow is distributed over. Pegasus runs
        workflows ranging from just a few computational tasks up to 1 million.
        The number of resources involved in executing a workflow can scale as
        needed without any impediments to performance.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>Provenance</strong></span></p>
<p>By default, all jobs in Pegasus are launched via the <span class="bold"><strong>kickstart</strong></span> process that captures runtime
        provenance of the job and helps in debugging. The provenance data is
        collected in a database, and the data can be summarized with tools such
        as <span class="bold"><strong>pegasus-statistics</strong></span>, <span class="bold"><strong>pegasus-plots</strong></span>, or directly with SQL
        queries.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>Data Management</strong></span></p>
<p>Pegasus handles replica selection, data transfers, and output
        registrations in data catalogs. These tasks are added to a workflow as
        auxiliary jobs by the Pegasus planner.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>Reliability</strong></span></p>
<p>Jobs and data transfers are automatically retried in case of
        failures. Debugging tools such as <span class="bold"><strong>pegasus-analyzer</strong></span> help the user to debug the
        workflow in case of non-recoverable failures.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>Error Recovery</strong></span></p>
<p>When errors occur, Pegasus tries to recover when possible by
        retrying tasks, by retrying the entire workflow, by providing
        workflow-level checkpointing, by re-mapping portions of the workflow,
        by trying alternative data sources for staging data, and, when all
        else fails, by providing a rescue workflow containing a description of
        only the work that remains to be done. It cleans up storage as the
        workflow is executed so that data-intensive workflows have enough
        space to execute on storage-constrained resources. Pegasus keeps track
        of what has been done (provenance) including the locations of data
        used and produced, and which software was used with which
        parameters.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>Operating Environments</strong></span></p>
<p>Pegasus workflows can be deployed across a variety of
        environments:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="circle">
<li class="listitem">
<p><span class="emphasis"><em>Local Execution</em></span></p>
<p>Pegasus can run a workflow on a single computer with
            Internet access. Running in a local environment is quicker to
            deploy as the user does not need to gain access to multiple
            resources in order to execute a workflow.</p>
</li>
<li class="listitem">
<p><span class="emphasis"><em>Condor Pools and Glideins</em></span></p>
<p>Condor is a specialized workload management system for
            compute-intensive jobs. Condor queues workflows, then schedules
            and monitors the execution of each one. Condor Pools and Glideins
            are tools for submitting and executing the Condor daemons on a
            Globus resource. As long as the daemons continue to run, the
            remote machine running them appears as part of your Condor pool.
            For a more complete description of Condor, see the <a class="ulink" href="http://www.cs.wisc.edu/condor/description.html" target="_top">Condor
            Project Pages</a>.</p>
</li>
<li class="listitem">
<p><span class="emphasis"><em>Grids</em></span></p>
<p>Pegasus WMS is entirely compatible with Grid computing. Grid
            computing relies on the concept of distributed computations.
            Pegasus apportions pieces of a workflow to run on distributed
            resources.</p>
</li>
<li class="listitem">
<p><span class="emphasis"><em>Clouds</em></span></p>
<p>Cloud computing uses a network as a means to connect a
            Pegasus end user to distributed resources that are based in the
            cloud.</p>
</li>
</ul></div>
</li>
</ul></div>
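<p>The retry and rescue behaviour described under Error Recovery can be
    sketched in a few lines. This is an illustration only, not how Pegasus is
    implemented: the function names, the task names, and the default of two
    retries are all assumptions made for the example. Tasks run in order, a
    failing task is retried a limited number of times, and if it still fails
    the sketch returns a list of only the work that remains.</p>
<pre class="programlisting">
```python
def run_with_retries(task, action, max_retries):
    """Call action(task) until it succeeds or the retry budget is spent."""
    for _ in range(1 + max_retries):
        try:
            action(task)
            return True
        except RuntimeError:
            continue  # treat the error as transient and try again
    return False


def run_workflow(ordered_tasks, action, max_retries=2):
    """Run tasks in order and return (done, rescue).

    rescue holds only the tasks that still need to run after a task
    has failed on every attempt; it is empty when everything succeeded.
    """
    done = []
    for index, task in enumerate(ordered_tasks):
        if not run_with_retries(task, action, max_retries):
            return done, list(ordered_tasks[index:])
        done.append(task)
    return done, []


# Hypothetical demo: "analyze" fails once, "plot" fails every time.
attempts = {}


def flaky_action(task):
    attempts[task] = attempts.get(task, 0) + 1
    if task == "analyze" and attempts[task] == 1:
        raise RuntimeError("transient failure")
    if task == "plot":
        raise RuntimeError("permanent failure")


done, rescue = run_workflow(
    ["preprocess", "analyze", "plot", "report"], flaky_action
)
print("completed:", done)
print("rescue workflow:", rescue)
```
</pre>
<p>Passing the rescue list back to the same function after the underlying
    problem is fixed resumes the run without repeating completed tasks, which
    is the purpose a rescue workflow serves.</p>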
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="index.php">Prev</a> </td>
<td width="20%" align="center"> </td>
<td width="40%" align="right"> <a accesskey="n" href="workflow_gallery.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Pegasus 4.5.2 User Guide </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> 1.2. Workflow Gallery</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
