<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="data_cleanup.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="job_clustering.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="chapter">
<div class="titlepage"><div><div><h1 class="title">
<a name="optimization"></a>Chapter 10. Optimizing Workflows for Efficiency and Scalability</h1></div></div></div>
<div class="toc"><dl class="toc">
<dt><span class="section"><a href="optimization.php#short_jobs">10.1. Optimizing Short Jobs / Scheduling Delays</a></span></dt>
<dt><span class="section"><a href="job_clustering.php">10.2. Job Clustering</a></span></dt>
<dt><span class="section"><a href="large_workflows.php">10.3. How to Scale Large Workflows</a></span></dt>
<dt><span class="section"><a href="hierarchial_workflows.php">10.4. Hierarchical Workflows</a></span></dt>
<dt><span class="section"><a href="data_transfers.php">10.5. Optimizing Data Transfers</a></span></dt>
<dt><span class="section"><a href="job_throttling.php">10.6. Job Throttling</a></span></dt>
</dl></div>
<p>By default, Pegasus generates workflows that target the most common
  use cases and execution environments. For more specialized environments or
  workflows, the following sections provide hints on how to optimize your
  workflow so that it scales better and runs more efficiently. Below are some
  common issues and solutions.</p>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="short_jobs"></a>10.1. Optimizing Short Jobs / Scheduling Delays</h2></div></div></div>
<p><span class="emphasis"><em>Issue:</em></span> Even though HTCondor is a high
    throughput system, there are overheads when scheduling short jobs. Common
    overheads include scheduling, data transfers, state notifications, and
    task bookkeeping. These overheads can be very noticeable for short jobs,
    but are hardly noticeable for longer jobs, because the ratio of
    computation to overhead is higher.</p>
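<p>The effect of that ratio can be illustrated with a little arithmetic. The
    following is a minimal sketch, not a measurement: the 60 second per-job
    overhead and both task runtimes are assumed values chosen for illustration,
    and <code class="literal">overhead_fraction</code> is a hypothetical helper
    that is not part of Pegasus or HTCondor.</p>

```python
# Illustrative numbers only: none of these values are measured from
# Pegasus or HTCondor. They just show how the ratio behaves.

def overhead_fraction(runtime_s, per_job_overhead_s):
    """Return the fraction of a job's wall time that is overhead."""
    return per_job_overhead_s / (runtime_s + per_job_overhead_s)


PER_JOB_OVERHEAD_S = 60  # assumed: scheduling, transfers, notifications, bookkeeping

short_job = overhead_fraction(10, PER_JOB_OVERHEAD_S)    # a 10 second job
long_job = overhead_fraction(3600, PER_JOB_OVERHEAD_S)   # a one hour job

print(f"short job: {short_job:.0%} of wall time is overhead")
print(f"long job:  {long_job:.0%} of wall time is overhead")
```

<p>Under these assumed numbers, overhead dominates the short job and is
    close to negligible for the long one, even though the absolute overhead is
    identical in both cases.</p>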
<p><span class="emphasis"><em>Solution:</em></span> If you have many short tasks to run,
    you can minimize the overheads by using <a class="link" href="job_clustering.php" title="10.2. Job Clustering">task clustering</a>. This instructs Pegasus to
    take a set of tasks, selected <a class="link" href="job_clustering.php#horizontal_clustering" title="10.2.1.1.1. Horizontal Clustering">horizontally</a>, by <a class="link" href="job_clustering.php#label_clustering" title="10.2.1.1.3. Label Clustering">labels</a>, or by <a class="link" href="job_clustering.php#runtime_clustering" title="10.2.1.1.2. Runtime Clustering">runtime</a>, and create jobs that each
    contain that whole set of tasks. The result is more efficient jobs, for which the
    overheads are less noticeable.</p>
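<p>The grouping idea can be sketched in a few lines. This is only a
    conceptual illustration that assumes fixed-size groups of independent
    tasks. It is not how Pegasus implements clustering, and
    <code class="literal">cluster_tasks</code> and the task names are
    hypothetical. In practice clustering is configured through Pegasus itself,
    as described in the Job Clustering section.</p>

```python
# Hypothetical helper, not part of the Pegasus API. It only illustrates
# the idea of packing many short tasks into fewer, longer jobs.

def cluster_tasks(tasks, cluster_size):
    """Split a list of tasks into consecutive groups of at most cluster_size."""
    return [
        tasks[start:start + cluster_size]
        for start in range(0, len(tasks), cluster_size)
    ]


# Ten made-up short tasks, grouped four per clustered job.
tasks = [f"task_{n}" for n in range(10)]
jobs = cluster_tasks(tasks, cluster_size=4)

for index, job in enumerate(jobs):
    print(f"job_{index}: {', '.join(job)}")
```

<p>Here ten tasks become three jobs, so the per-job overhead is paid three
    times instead of ten. The last job is smaller because the task count is
    not a multiple of the group size.</p>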
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="data_cleanup.php">Prev</a> </td>
<td width="20%" align="center"> </td>
<td width="40%" align="right"> <a accesskey="n" href="job_clustering.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">9.5. Data Cleanup </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> 10.2. Job Clustering</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
