<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="optimization.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="large_workflows.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="job_clustering"></a>10.2. Job Clustering</h2></div></div></div>
<div class="toc"><dl class="toc"><dt><span class="section"><a href="job_clustering.php#idp68222048">10.2.1. Overview</a></span></dt></dl></div>
<p>Many workflows executed through the Pegasus Workflow Management
    System are composed of jobs that run for only a few seconds. The overhead
    of running any job on the grid is usually 60 seconds or more, so it makes
    sense to cluster small independent jobs into a larger job. This is done
    while mapping an abstract workflow to an executable workflow.
    Site-specific and transformation-specific criteria are taken into
    consideration while clustering smaller jobs into a larger job in the
    executable workflow. The user can control the granularity of this
    clustering on a per-transformation, per-site basis.</p>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp68222048"></a>10.2.1. Overview</h3></div></div></div>
<p>The abstract workflow is mapped onto the various sites by the Site
      Selector. This semi-executable workflow is then passed to the clustering
      module. The clustering of the workflow can be one of the following:</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p>level-based horizontal clustering, where you specify how
          many jobs get clustered into a single clustered job per level, or
          how many clustered jobs should be created per level of the
          workflow</p></li>
<li class="listitem"><p>level-based runtime clustering, which is similar to
          horizontal clustering but takes the job runtimes into account while
          creating the clusters per level</p></li>
<li class="listitem"><p>label-based clustering (label clustering), where jobs
          that carry the same user-assigned label are clustered together</p></li>
</ul></div>
<p>The clustering module clusters the jobs into larger (clustered)
      jobs that can then be executed on the remote sites. The execution can
      either be sequential on a single node or on multiple nodes using MPI. To
      specify which clustering technique to use, the user passes the
      <span class="bold"><strong>--cluster</strong></span> option to <span class="bold"><strong>pegasus-plan</strong></span>.</p>
<div class="section">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp68215728"></a>10.2.1.1. Generating Clustered Executable Workflow</h4></div></div></div>
<p>The clustering of a workflow is activated by passing the
        <span class="bold"><strong>--cluster|-C</strong></span> option to <span class="bold"><strong>pegasus-plan</strong></span>. The clustering granularity of a
        particular logical transformation on a particular site depends
        on the clustering technique being used. The executable that is used
        for running the clustered job on a particular site is determined as
        explained in section 7.</p>
<pre class="programlisting">#Running pegasus-plan to generate clustered workflows

$ pegasus-plan --dax example.dax --dir ./dags -s siteX --output local \
               --cluster [comma separated list of clustering techniques] --verbose

Valid clustering techniques are horizontal and label.</pre>
<p>The naming convention of submit files of the clustered jobs
        is <span class="bold"><strong>merge_NAME_IDX.sub</strong></span>. The NAME is
        derived from the logical transformation name. The IDX is an integer
        number between 1 and the total number of jobs in a cluster. Each of
        the submit files has a corresponding input file, following the naming
        convention <span class="bold"><strong>merge_NAME_IDX.in</strong></span>. The
        input file contains the respective execution targets and the arguments
        for each of the jobs that make up the clustered job.</p>
<div class="section">
<div class="titlepage"><div><div><h5 class="title">
<a name="horizontal_clustering"></a>10.2.1.1.1. Horizontal Clustering</h5></div></div></div>
<p>In horizontal clustering, each job in the workflow is
          associated with a level. The levels of the workflow are determined
          by a modified breadth-first traversal of the workflow starting
          from the root nodes. The level associated with a node is its
          furthest distance from a root node, rather than the
          shortest distance as in a normal BFS. For each level, the jobs are
          grouped by the site on which they have been scheduled by the Site
          Selector. Only jobs of the same type (txnamespace, txname, txversion)
          can be clustered into a larger job. To use horizontal clustering, the
          user needs to set the <span class="bold"><strong>--cluster</strong></span>
          option of <span class="bold"><strong>pegasus-plan</strong></span> to
          <span class="bold"><strong>horizontal</strong></span>.</p>
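<p>The level assignment described above can be sketched in Python as a
          longest-path computation over the workflow graph. This is a minimal
          illustration and not the planner's actual code: the function name
          assign_levels, the edge-list input, and the choice to number root
          jobs as level 1 are all assumptions made for the example.</p>

```python
from collections import defaultdict, deque


def assign_levels(edges, jobs):
    """Return {job: level}, where level is the furthest distance from a root.

    `edges` is a list of (parent, child) pairs and `jobs` lists every job id.
    Both are hypothetical inputs chosen for this sketch.
    """
    children = defaultdict(list)
    indegree = {job: 0 for job in jobs}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Root jobs have no parents; numbering them as level 1 is an assumption.
    level = {job: 1 for job in jobs if indegree[job] == 0}
    ready = deque(level)

    # Visit jobs in topological order: a child is released only after all of
    # its parents were seen, so it ends up with the furthest distance.
    while ready:
        job = ready.popleft()
        for child in children[job]:
            level[child] = max(level.get(child, 1), level[job] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return level


# Diamond-shaped workflow with an extra shortcut edge from A to D.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("A", "D")]
levels = assign_levels(edges, ["A", "B", "C", "D"])
print(levels)
```

<p>In this example a shortest-distance BFS would put D on the same level
          as B and C because of the direct A to D edge, whereas the sketch
          places D one level below them.</p>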
<div class="section">
<div class="titlepage"><div><div><h6 class="title">
<a name="idp68206464"></a>10.2.1.1.1.1. Controlling Clustering Granularity</h6></div></div></div>
<p>The number of jobs that are clustered into a single
            large job is determined by the value of two parameters associated
            with the smaller jobs. Both parameters are specified using
            PEGASUS namespace profile keys. The keys can be specified
            at any of the placeholders for the profiles (abstract
            transformation in the DAX, site in the site catalog,
            transformation in the transformation catalog). The normal
            overloading semantics apply, i.e. a profile in the transformation
            catalog overrides the one in the site catalog, which in turn
            overrides the one in the DAX. The two parameters are described
            below.</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
<p><span class="bold"><strong>clusters.size
                factor</strong></span></p>
<p>The clusters.size factor denotes how many jobs need to
                be merged into a single clustered job. It is specified via
                the PEGASUS namespace profile key
                &ldquo;clusters.size&rdquo;. For example, suppose that at a
                particular level, 4 jobs referring to logical
                transformation B have been scheduled to siteX, and the
                clusters.size factor associated with job B for siteX is 3.
                This will result in 2 clustered jobs, one composed of 3 jobs
                and another holding the 1 remaining job. The clusters.size
                factor can be specified in the transformation catalog as
                follows:</p>
<pre class="programlisting"># multiple line text-based transformation catalog: 2014-09-30T16:05:01.731-07:00
tr B {
        site siteX {
                profile pegasus "clusters.size" "3" 
                pfn "/shared/PEGASUS/bin/jobB"
                arch "x86"
                os "LINUX"
                type "INSTALLED"
        }
}

tr C {
        site siteX {
                profile pegasus "clusters.size" "2" 
                pfn "/shared/PEGASUS/bin/jobC"
                arch "x86"
                os "LINUX"
                type "INSTALLED"
        }
}

</pre>
<div class="figure">
<a name="idp68203088"></a><p class="title"><b>Figure 10.1. Clustering by clusters.size</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><img src="images/advanced-clustering-1.png" align="middle" height="750" alt="Clustering by clusters.size"></div></div>
</div>
<br class="figure-break">
</li>
<li class="listitem">
<p><span class="bold"><strong>clusters.num
                factor</strong></span></p>
<p>The clusters.num factor denotes how many clustered jobs
                the user wants to see per level per site. It is specified
                via the PEGASUS namespace profile key
                &ldquo;clusters.num&rdquo;. For example, suppose that at a
                particular level, 4 jobs referring to logical
                transformation B have been scheduled to siteX, and the
                &ldquo;clusters.num&rdquo; factor associated with job
                B for siteX is 3. This will result in 3 clustered jobs,
                one composed of 2 jobs and the others of a single job each. The
                clusters.num factor can be specified in the transformation
                catalog as follows:</p>
<pre class="programlisting"># multiple line text-based transformation catalog: 2014-09-30T16:06:23.397-07:00
tr B {
        site siteX {
                profile pegasus "clusters.num" "3" 
                pfn "/shared/PEGASUS/bin/jobB"
                arch "x86"
                os "LINUX"
                type "INSTALLED"
        }
}

tr C {
        site siteX {
                profile pegasus "clusters.num" "2" 
                pfn "/shared/PEGASUS/bin/jobC"
                arch "x86"
                os "LINUX"
                type "INSTALLED"
        }
}

</pre>
<p>When both factors are associated with
                the job, the clusters.num value supersedes the clusters.size
                value.</p>
<pre class="programlisting"># multiple line text-based transformation catalog: 2014-09-30T16:08:01.537-07:00
tr B {
        site siteX {
                profile pegasus "clusters.num" "3" 
                profile pegasus "clusters.size" "3" 
                pfn "/shared/PEGASUS/bin/jobB"
                arch "x86"
                os "LINUX"
                type "INSTALLED"
        }
}
</pre>
<p>In the above case, the jobs referring to logical
                transformation B scheduled on siteX will be clustered on the
                basis of the &ldquo;clusters.num&rdquo; value. Hence, if
                there are 4 jobs referring to logical transformation B
                scheduled to siteX, then 3 clustered jobs will be
                created.</p>
<div class="figure">
<a name="idp68194656"></a><p class="title"><b>Figure 10.2. Clustering by clusters.num</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><img src="images/advanced-clustering-2.png" align="middle" height="750" alt="Clustering by clusters.num"></div></div>
</div>
<br class="figure-break">
</li>
</ul></div>
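<p>The effect of the two factors on one group of jobs (same
            transformation, same site, same level) can be sketched in Python as
            follows. This is only an illustration under stated assumptions:
            the function names are hypothetical, and the way the planner
            distributes leftover jobs between clusters is assumed here rather
            than taken from the Pegasus source.</p>

```python
def cluster_by_size(jobs, size):
    """Split jobs into consecutive clusters of at most `size` jobs."""
    return [jobs[i:i + size] for i in range(0, len(jobs), size)]


def cluster_by_num(jobs, num):
    """Split jobs into `num` clusters whose sizes differ by at most one."""
    if not jobs:
        return []
    num = min(num, len(jobs))
    base, extra = divmod(len(jobs), num)
    clusters, start = [], 0
    for index in range(num):
        # The first `extra` clusters take one additional job (assumption).
        end = start + base + (1 if extra > index else 0)
        clusters.append(jobs[start:end])
        start = end
    return clusters


def cluster_group(jobs, profiles):
    """Apply clusters.num when present, otherwise clusters.size."""
    if "clusters.num" in profiles:
        return cluster_by_num(jobs, int(profiles["clusters.num"]))
    if "clusters.size" in profiles:
        return cluster_by_size(jobs, int(profiles["clusters.size"]))
    # No clustering profile: every job stays on its own.
    return [[job] for job in jobs]


jobs = ["B1", "B2", "B3", "B4"]
by_size = cluster_group(jobs, {"clusters.size": "3"})
by_num = cluster_group(jobs, {"clusters.num": "3"})
by_both = cluster_group(jobs, {"clusters.num": "3", "clusters.size": "3"})
print(by_size, by_num, by_both)
```

<p>With 4 jobs, a clusters.size of 3 yields groups of 3 and 1 jobs, and a
            clusters.num of 3 yields groups of 2, 1 and 1 jobs. When both keys
            are present the sketch follows clusters.num, mirroring the
            precedence described above.</p>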
</div>
</div>
<div class="section">
<div class="titlepage"><div><div><h5 class="title">
<a name="runtime_clustering"></a>10.2.1.1.2. Runtime Clustering</h5></div></div></div>
<p>Workflows often consist of jobs of the same type that have varying
          run times. Two or more instances of the same job, with varying
          inputs, can differ significantly in their runtimes. A simple way to
          think about this is running the same program on two distinct input
          sets, where one input is small (1 MB) and the other
          is 10 GB in size. In such a case the two jobs will have
          significantly differing run times. When such jobs are clustered
          using horizontal clustering, the benefits of job clustering may be
          lost if all smaller jobs get clustered together, while the larger
          jobs are clustered together. In such scenarios it is
          beneficial to cluster jobs together such that all
          clustered jobs have similar runtimes.</p>
<p>In runtime clustering, jobs in the workflow are
          associated with a level. The levels of the workflow are determined
          in the same manner as in horizontal clustering. For each level, the
          jobs are grouped by the site on which they have been scheduled by
          the Site Selector. Only jobs of the same type (txnamespace, txname,
          txversion) can be clustered into a larger job. To use runtime
          clustering, the user needs to set the <span class="bold"><strong>--cluster</strong></span> option of <span class="bold"><strong>pegasus-plan</strong></span> to <span class="bold"><strong>horizontal</strong></span>, and set the
          Pegasus property <span class="bold"><strong>pegasus.clusterer.preference</strong></span> to <span class="bold"><strong>Runtime</strong></span>.</p>
<p>Runtime clustering supports two modes of operation.</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem">
<p>Cluster jobs together such that the clustered job's
              runtime does not exceed a user-specified maxruntime.</p>
<p>The basic algorithm for grouping jobs into clusters is as
              follows:</p>
<pre class="programlisting">// clusters.maxruntime - the maximum runtime for which the clustered job should run.
// j.runtime - the runtime of the job j.
1. Create a set of jobs of the same type (txnamespace, txname, txversion) that run on the same site.
2. Sort the jobs in decreasing order of their runtime.
3. For each job j, repeat
  a. If j.runtime &gt; clusters.maxruntime then
        ignore j and continue with the next job.
  // j fits into a bin if: sum of runtimes of jobs already in the bin + j.runtime &lt;= clusters.maxruntime
  b. If j can be added to any existing bin (clustered job) then
        Add j to that bin
     Else
        Add a new bin
        Add job j to the newly added bin</pre>
<p>The runtime of a job and
              the maximum runtime for which a clustered job should run are
              determined by the value of two parameters associated with the
              jobs.</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
<p><span class="bold"><strong>runtime</strong></span></p>
<p>expected runtime for a job</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>clusters.maxruntime</strong></span></p>
<p>maximum runtime for the clustered job, i.e. group as many
                    jobs as possible into a cluster, as long as the clustered
                    job's runtime does not exceed clusters.maxruntime.</p>
</li>
</ul></div>
</li>
<li class="listitem">
<p>Cluster all the jobs into a fixed number of clusters
              (clusters.num), such that the runtimes of the clustered jobs are
              similar.</p>
<p>The basic algorithm for grouping jobs into clusters is as
              follows:</p>
<pre class="programlisting">// clusters.num - the number of clustered jobs to create.
// j.runtime - the runtime of the job j.
1. Create a set of jobs of the same type (txnamespace, txname, txversion) that run on the same site.
2. Sort the jobs in decreasing order of their runtime.
3. Create a heap containing clusters.num clustered jobs.
4. For each job j, repeat
  a. Get the clustered job cj that has the shortest runtime
  b. Add job j to clustered job cj</pre>
<p>The runtime of a job and the number of clustered jobs to
              create are determined by the value of two parameters associated
              with the jobs.</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
<p><span class="bold"><strong>runtime</strong></span></p>
<p>expected runtime for a job</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>clusters.num</strong></span></p>
<p>the number of clustered jobs the user wants to see
                  per level per site</p>
</li>
</ul></div>
</li>
</ol></div>
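<p>The two grouping algorithms above can be sketched in Python as
          follows. This is a minimal illustration, not the planner's
          implementation: the function names, the dictionary of job runtimes,
          and the decision to return over-long jobs in a separate list are
          assumptions made for the example.</p>

```python
import heapq


def cluster_by_maxruntime(runtimes, maxruntime):
    """First-fit over jobs sorted by decreasing runtime.

    Returns (clusters, ignored). Jobs longer than `maxruntime` are not
    clustered and are reported in `ignored`.
    """
    bins = []      # each entry is [total_runtime, [job, ...]]
    ignored = []
    for job, runtime in sorted(runtimes.items(), key=lambda item: -item[1]):
        if runtime > maxruntime:
            ignored.append(job)
            continue
        for entry in bins:
            # The job fits if the bin total stays within maxruntime.
            if maxruntime >= entry[0] + runtime:
                entry[0] += runtime
                entry[1].append(job)
                break
        else:
            bins.append([runtime, [job]])
    return [jobs for _, jobs in bins], ignored


def cluster_by_num(runtimes, num):
    """Always add the next-longest job to the currently shortest cluster."""
    # The index breaks ties so that the job lists are never compared.
    heap = [(0, index, []) for index in range(num)]
    heapq.heapify(heap)
    for job, runtime in sorted(runtimes.items(), key=lambda item: -item[1]):
        total, index, jobs = heapq.heappop(heap)
        jobs.append(job)
        heapq.heappush(heap, (total + runtime, index, jobs))
    return [jobs for _, _, jobs in sorted(heap, key=lambda entry: entry[1])]


runtimes = {"j1": 100, "j2": 200, "j3": 50, "j4": 150, "j5": 400}
bounded, ignored = cluster_by_maxruntime(runtimes, 300)
balanced = cluster_by_num(runtimes, 2)
print(bounded, ignored)
print(balanced)
```

<p>For these sample runtimes, the bounded mode leaves j5 unclustered
          because it alone exceeds 300, and the balanced mode produces two
          clusters whose runtimes both sum to 450.</p>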
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>Users should specify either clusters.maxruntime or
              clusters.num. If both are specified, the clusters.num
              profile is ignored by the clustering engine.</p>
</div>
<p>All of these parameters are specified using PEGASUS
          namespace profile keys. The keys can be specified at any of the
          placeholders for the profiles (abstract transformation in the DAX,
          site in the site catalog, transformation in the transformation
          catalog). The normal overloading semantics apply, i.e. a profile in
          the transformation catalog overrides the one in the site catalog,
          which in turn overrides the one in the DAX. The following
          transformation catalog shows both modes.</p>
<pre class="programlisting"># multiple line text-based transformation catalog: 2014-09-30T16:09:40.610-07:00
#Cluster all jobs of type B at siteX, into 2 clusters such that the 2 clusters have similar runtimes
tr B {
        site siteX {
                profile pegasus "clusters.num" "2" 
                profile pegasus "runtime" "100" 
                pfn "/shared/PEGASUS/bin/jobB"
                arch "x86"
                os "LINUX"
                type "INSTALLED"
        }
}

#Cluster all jobs of type C at siteX, such that the duration of the clustered job does not exceed 300.
tr C {
        site siteX {
                profile pegasus "clusters.maxruntime" "300" 
                profile pegasus "runtime" "100" 
                pfn "/shared/PEGASUS/bin/jobC"
                arch "x86"
                os "LINUX"
                type "INSTALLED"
        }
}

</pre>
<div class="figure">
<a name="idp68166880"></a><p class="title"><b>Figure 10.3. Clustering by runtime</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><img src="images/advanced-clustering-5.png" align="middle" height="750" alt="Clustering by runtime"></div></div>
</div>
<br class="figure-break"><p>In the above case, the jobs referring to logical transformation
          B scheduled on siteX will be clustered into 2 clustered jobs with
          similar runtimes, while the jobs referring to logical transformation
          C will be clustered such that no clustered job runs for longer than
          the duration specified by the clusters.maxruntime profile. We assume
          that all jobs referring to transformations B and C run for 100
          seconds, as given by the runtime profile. For jobs with
          significantly differing runtimes, the runtime profile should be
          associated with the individual jobs in the DAX.</p>
<p>In addition to the above two profiles, we need to inform
          pegasus-plan to use runtime clustering. This is done by setting the
          following property:</p>
<pre class="programlisting"><span class="bold"><strong> pegasus.clusterer.preference          Runtime</strong></span> </pre>
</div>
<div class="section">
<div class="titlepage"><div><div><h5 class="title">
<a name="label_clustering"></a>10.2.1.1.3. Label Clustering</h5></div></div></div>
<p>In label based clustering, the user labels the workflow. All
          jobs having the same label value are clustered into a single
          clustered job. This allows the user to create clusters or use a
          clustering technique that is specific to their workflows. If there is
          no label associated with a job, the job is not clustered and is
          executed as is.</p>
<div class="figure">
<a name="idp68158896"></a><p class="title"><b>Figure 10.4. Label-based clustering</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><img src="images/advanced-clustering-3.png" align="middle" height="750" alt="Label-based clustering"></div></div>
</div>
<p><br class="figure-break"></p>
<p>Since the jobs in a cluster are not independent in this case,
          it is important that the jobs are executed in the correct order. This is
          done by doing a topological sort on the jobs in each cluster. To use
          label based clustering the user needs to set the <span class="bold"><strong>--cluster</strong></span> option of <span class="bold"><strong>pegasus-plan</strong></span> to label.</p>
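<p>Grouping jobs by label and ordering each cluster topologically can
          be sketched in Python as follows. This is an illustrative sketch
          only: the function name and both input dictionaries are
          hypothetical, and it relies on the standard-library graphlib module
          (Python 3.9 or later) rather than on anything in Pegasus.</p>

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def cluster_by_label(labels, parents):
    """Group jobs by label and order every group for sequential execution.

    `labels` maps each job to its label (or None when unlabelled) and
    `parents` maps a job to the set of jobs it depends on.
    Returns {label: [jobs in a valid execution order]}.
    """
    members = defaultdict(set)
    for job, label in labels.items():
        if label is not None:
            members[label].add(job)

    ordered = {}
    for label, jobs in members.items():
        # Keep only the dependencies that stay inside this cluster.
        graph = {job: parents.get(job, set()) & jobs for job in sorted(jobs)}
        ordered[label] = list(TopologicalSorter(graph).static_order())
    return ordered


labels = {"A": "p1", "B": "p1", "C": "p1", "D": None}
parents = {"B": {"A"}, "C": {"B"}, "D": {"C"}}
clusters = cluster_by_label(labels, parents)
print(clusters)
```

<p>Job D carries no label, so it is left out of every cluster and would
          run as a regular job, while A, B and C form one cluster that is
          executed in dependency order.</p>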
<div class="section">
<div class="titlepage"><div><div><h6 class="title">
<a name="idp68151904"></a>10.2.1.1.3.1. Labelling the Workflow</h6></div></div></div>
<p>The labels for the jobs in the workflow are specified by
            associating <span class="bold"><strong>pegasus</strong></span> profile keys
            with the jobs during the DAX generation process. The user can
            choose which profile key to use for labeling the workflow. By
            default, it is assumed that the user is using the PEGASUS profile
            key label to associate the labels. To use another key in the
            <span class="bold"><strong>pegasus</strong></span> namespace, the user needs
            to set the following property:</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>pegasus.clusterer.label.key</p></li></ul></div>
<p>For example, if the user sets <span class="bold"><strong>pegasus.clusterer.label.key</strong></span> to <span class="bold"><strong>user_label</strong></span>, then the job description in the
            DAX looks as follows:</p>
<pre class="programlisting">&lt;adag &gt;
...
  &lt;job id="ID000004" namespace="app" name="analyze" version="1.0" level="1" &gt;
    &lt;argument&gt;-a bottom -T60  -i &lt;filename file="user.f.c1"/&gt;  -o &lt;filename file="user.f.d"/&gt;&lt;/argument&gt;
    &lt;profile namespace="pegasus" key="user_label"&gt;p1&lt;/profile&gt;
    &lt;uses file="user.f.c1" link="input" register="true" transfer="true"/&gt;
    &lt;uses file="user.f.c2" link="input" register="true" transfer="true"/&gt;
    &lt;uses file="user.f.d" link="output" register="true" transfer="true"/&gt;
  &lt;/job&gt;
...
&lt;/adag&gt;</pre>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p>The above states that the <span class="bold"><strong>pegasus</strong></span> profiles with key as <span class="bold"><strong>user_label</strong></span> are to be used for
                designating clusters.</p></li>
<li class="listitem"><p>Each job with the same value for <span class="bold"><strong>pegasus</strong></span> profile key <span class="bold"><strong>user_label </strong></span>appears in the same
                cluster.</p></li>
</ul></div>
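<p>A DAX generator could attach such a label profile to a job along
            the following lines. This Python sketch uses only the standard
            library and builds a bare job element for illustration; the helper
            name labelled_job is hypothetical, and a real generator would
            normally go through the Pegasus DAX API and emit the remaining
            job attributes and file usages as well.</p>

```python
import xml.etree.ElementTree as ET


def labelled_job(job_id, name, label, key="label"):
    """Build a job element that carries a pegasus profile used as its label.

    `key` defaults to "label"; pass the value configured for
    pegasus.clusterer.label.key when a different key is used.
    """
    job = ET.Element("job", {"id": job_id, "name": name})
    profile = ET.SubElement(job, "profile", {"namespace": "pegasus", "key": key})
    profile.text = label
    return job


job = labelled_job("ID000004", "analyze", "p1", key="user_label")
xml_text = ET.tostring(job, encoding="unicode")
print(xml_text)
```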
</div>
</div>
<div class="section">
<div class="titlepage"><div><div><h5 class="title">
<a name="idp68141568"></a>10.2.1.1.4. Recursive Clustering</h5></div></div></div>
<p>In some cases, a user may want to use a combination of
          clustering techniques. For example, a user may want some jobs in the
          workflow to be horizontally clustered and some to be label
          clustered. This can be achieved by specifying a comma-separated list
          of clustering techniques to the <span class="bold"><strong>--cluster</strong></span> option of <span class="bold"><strong>pegasus-plan</strong></span>. In this case the clustering
          techniques are applied one after the other on the workflow in the
          order specified on the command line.</p>
<p>For example:</p>
<pre class="programlisting">$ <span class="emphasis"><em>pegasus-plan --dax example.dax --dir ./dags --cluster label,horizontal -s siteX --output local --verbose</em></span></pre>
<div class="figure">
<a name="idp68137200"></a><p class="title"><b>Figure 10.5. Recursive clustering</b></p>
<div class="figure-contents"><div class="mediaobject" align="center"><img src="images/advanced-clustering-4.png" align="middle" height="1000" alt="Recursive clustering"></div></div>
</div>
<br class="figure-break">
</div>
</div>
<div class="section">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp68126864"></a>10.2.1.2. Execution of the Clustered Job</h4></div></div></div>
<p>The execution of the clustered job on the remote site involves
        the execution of the smaller constituent jobs either</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
<p><span class="bold"><strong>sequentially on a single node of the
            remote site</strong></span></p>
<p>The clustered job is executed using <span class="bold"><strong>pegasus-cluster</strong></span>, a wrapper tool written in
            C that is distributed as part of Pegasus. It takes the jobs
            passed to it and executes them sequentially on a single
            node. To use pegasus-cluster for executing any clustered job on
            siteX, there needs to be an entry in the transformation catalog
            for an executable with the logical name seqexec and the namespace
            pegasus.</p>
<pre class="programlisting"><span class="bold"><strong>#site  transformation   pfn            type                 architecture    profiles</strong></span>

siteX    pegasus::seqexec     /usr/pegasus/bin/pegasus-cluster INSTALLED       INTEL32::LINUX NULL</pre>
<p>If the entry is not specified, Pegasus will attempt to create a
            default path on the basis of the environment profile PEGASUS_HOME
            specified in the site catalog for the remote site.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>on multiple nodes of the remote site
            using an MPI-based task management tool called Pegasus MPI Cluster
            (PMC)</strong></span></p>
<p>The clustered job is executed using <span class="bold"><strong>pegasus-mpi-cluster</strong></span>, a wrapper MPI program
            written in C that is distributed as part of Pegasus. A PMC job
            consists of a single master process (this process is rank 0 in MPI
            parlance) and several worker processes. These processes follow the
            standard master-worker architecture. The master process manages
            the workflow and assigns workflow tasks to workers for execution.
            The workers execute the tasks and return the results to the
            master. Communication between the master and the workers is
            accomplished using a simple text-based protocol implemented using
            MPI_Send and MPI_Recv. PMC relies on a shared filesystem on the
            remote site to manage the individual tasks' stdout and stderr and
            stages them back to the submit host as part of its own
            stdout/stderr.</p>
<p>The input format for PMC is a DAG based format similar to
            Condor DAGMan's. PMC follows the dependencies specified in the DAG
            to release the jobs in the right order and executes parallel jobs
            via the workers when possible. The input file for PMC is
            automatically generated by the Pegasus Planner when generating the
            executable workflow. PMC allows for a finer grained control on how
            each task is executed. This can be enabled by associating the
            following pegasus profiles with the jobs in the DAX</p>
<div class="table">
<a name="idp68124112"></a><p class="title"><b>Table 10.1. Pegasus Profiles that can be associated with jobs
              in the DAX for PMC</b></p>
<div class="table-contents"><table summary="Table : Pegasus Profiles that can be associated with jobs
              in the DAX for PMC" border="1">
<colgroup>
<col>
<col>
</colgroup>
<tbody>
<tr>
<td><span class="bold"><strong>Key</strong></span></td>
<td><span class="bold"><strong>Description</strong></span></td>
</tr>
<tr>
<td>pmc_request_memory</td>
<td>This key is used to set the -m option for
                    pegasus-mpi-cluster. It specifies the amount of memory in
                    MB that a job requires. This profile is usually set in the
                    DAX for each job.</td>
</tr>
<tr>
<td>pmc_request_cpus</td>
<td>This key is used to set the -c option for
                    pegasus-mpi-cluster. It specifies the number of CPUs that
                    a job requires. This profile is usually set in the DAX for
                    each job.</td>
</tr>
<tr>
<td>pmc_priority</td>
<td>This key is used to set the -p option for
                    pegasus-mpi-cluster. It specifies the priority for a job.
                    This profile is usually set in the DAX for each job.
                    Negative values are allowed for priorities.</td>
</tr>
<tr>
<td>pmc_task_arguments</td>
<td>This key is used to pass any extra arguments to the
                    PMC task at planning time. They are added to the
                    very end of the argument string constructed for the task
                    in the PMC file. Hence, it allows overriding of any
                    argument constructed by the planner for any particular
                    task in the PMC job.</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p>Refer to the pegasus-mpi-cluster man page in the <a class="link" href="cli.php#pegasus-cli-chapter">command line tools chapter</a> to
            know more about PMC and how it schedules individual tasks.</p>
<p>It is recommended to have a pegasus::mpiexec entry in the
            transformation catalog to specify the path to PMC on the remote
            site and to specify the relevant globus profiles such as xcount,
            host_xcount and maxwalltime to control the size of the MPI job.</p>
<pre class="programlisting"># multiple line text-based transformation catalog: 2014-09-30T16:11:11.947-07:00
tr pegasus::mpiexec {
        site siteX {
                profile globus "host_xcount" "1" 
                profile globus "xcount" "32" 
                pfn "/usr/pegasus/bin/pegasus-mpi-cluster"
                arch "x86"
                os "LINUX"
                type "INSTALLED"
        }
}</pre>
<p>If the entry is not specified, Pegasus will attempt to create a
            default path on the basis of the environment profile PEGASUS_HOME
            specified in the site catalog for the remote site.</p>
<div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Tip</h3>
<p>Users are encouraged to use label based clustering in
              conjunction with PMC.</p>
</div>
</li>
</ul></div>
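<p>The master-worker scheduling that PMC performs over a DAG of tasks
        can be illustrated with the following Python sketch. It is a
        simplified stand-in under clear assumptions: it uses a thread pool
        from the standard library in place of MPI processes and MPI_Send /
        MPI_Recv messages, and the function name run_dag and its inputs are
        hypothetical rather than part of PMC.</p>

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_dag(tasks, parents, workers=2):
    """Run `tasks` ({name: callable}) while respecting `parents`
    ({name: set of parent names}). Returns names in completion order."""
    remaining = {name: set(parents.get(name, ())) for name in tasks}
    running = {}    # future -> task name
    finished = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining or running:
            # Master: release every task whose parents have all finished.
            ready = [name for name, deps in remaining.items() if not deps]
            for name in ready:
                del remaining[name]
                running[pool.submit(tasks[name])] = name
            if not running:
                raise ValueError("cycle detected in task graph")
            # Workers report back; the master records the completions.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any failure from the worker
                finished.append(name)
                for deps in remaining.values():
                    deps.discard(name)
    return finished


tasks = {name: (lambda: None) for name in "ABCD"}
parents = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
order = run_dag(tasks, parents)
print(order)
```

<p>Tasks B and C become eligible together once A completes, so they may
        run in parallel and finish in either order, while D is released only
        after both have finished.</p>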
<div class="section">
<div class="titlepage"><div><div><h5 class="title">
<a name="idp68110640"></a>10.2.1.2.1. Specification of Method of Execution for Clustered
          Jobs</h5></div></div></div>
<p>The method of execution of the clustered job (whether to launch
          via mpiexec or seqexec) can be specified</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem">
<p><span class="bold"><strong>globally in the properties
              file</strong></span></p>
<p>The user can set a property in the properties file that
              results in all the clustered jobs of the workflow being executed
              by the same type of executable.</p>
<pre class="programlisting"><span class="bold"><strong>#PEGASUS PROPERTIES FILE</strong></span>
pegasus.clusterer.job.aggregator seqexec|mpiexec</pre>
<p>In the above example, all the clustered jobs on the remote
              sites are going to be launched via the executable named by the
              property value, as long as the property value is not overridden
              in the site catalog.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>associating profile key
              job.aggregator with the site in the site
              catalog</strong></span></p>
<pre class="programlisting">&lt;site handle="siteX" gridlaunch = "/shared/PEGASUS/bin/kickstart"&gt;
    &lt;profile namespace="env" key="GLOBUS_LOCATION" &gt;/home/shared/globus&lt;/profile&gt;
    &lt;profile namespace="env" key="LD_LIBRARY_PATH"&gt;/home/shared/globus/lib&lt;/profile&gt;
    &lt;profile namespace="pegasus" key="job.aggregator" &gt;seqexec&lt;/profile&gt;
    &lt;lrc url="rls://siteX.edu" /&gt;
    &lt;gridftp  url="gsiftp://siteX.edu/" storage="/home/shared/work" major="2" minor="4" patch="0" /&gt;
    &lt;jobmanager universe="transfer" url="siteX.edu/jobmanager-fork" major="2" minor="4" patch="0" /&gt;
    &lt;jobmanager universe="vanilla" url="siteX.edu/jobmanager-condor" major="2" minor="4" patch="0" /&gt;
    &lt;workdirectory &gt;/home/shared/storage&lt;/workdirectory&gt;
  &lt;/site&gt;</pre>
<p>In the above example, all the clustered jobs on siteX
              are going to be executed via seqexec, as long as the value is
              not overridden in the transformation catalog.</p>
</li>
<li class="listitem">
<p><span class="bold"><strong>associating profile key
              job.aggregator with the transformation that is being clustered,
              in the transformation catalog</strong></span></p>
<pre class="programlisting"># multiple line text-based transformation catalog: 2014-09-30T16:11:52.230-07:00
tr B {
        site siteX {
                profile pegasus "clusters.size" "3" 
                profile pegasus "job.aggregator" "mpiexec" 
                pfn "/shared/PEGASUS/bin/jobB"
                arch "x86"
                os "LINUX"
                type "INSTALLED"
        }
}
</pre>
<p>In the above example, all the clustered jobs that consist
              of transformation B on siteX will be executed via
              mpiexec.</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p><span class="bold"><strong>The clustering of jobs on a site
                happens only if</strong></span></p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p>there exists an entry in the transformation
                      catalog for the clustering executable that has been
                      determined by the above 3 rules</p></li>
<li class="listitem"><p>the number of jobs being clustered on the site is
                      more than 1</p></li>
</ul></div>
</div>
</li>
</ol></div>
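<p>The precedence among the three places described above can be
          summarised with a small Python sketch. The function name and the
          plain dictionaries standing in for the properties file and the two
          catalogs are hypothetical; the sketch only mirrors the override
          order stated in this section and returns None when no source sets
          a value.</p>

```python
def resolve_aggregator(properties, site_profiles, tc_profiles):
    """Pick the clustering executable, letting later sources override.

    Order of increasing precedence: properties file, site catalog
    profile, transformation catalog profile.
    """
    value = properties.get("pegasus.clusterer.job.aggregator")
    for profiles in (site_profiles, tc_profiles):
        value = profiles.get("job.aggregator", value)
    return value


properties = {"pegasus.clusterer.job.aggregator": "seqexec"}
site_only = resolve_aggregator(properties, {"job.aggregator": "mpiexec"}, {})
tc_wins = resolve_aggregator(
    properties, {"job.aggregator": "seqexec"}, {"job.aggregator": "mpiexec"}
)
print(site_only, tc_wins)
```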
</div>
</div>
<div class="section">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp68095968"></a>10.2.1.3. Outstanding Issues</h4></div></div></div>
<div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem">
<p><span class="bold"><strong>Label Clustering</strong></span></p>
<p>More rigorous checks are required to ensure that the
            labeling scheme applied by the user is valid.</p>
</li></ol></div>
</div>
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="optimization.php">Prev</a> </td>
<td width="20%" align="center"><a accesskey="u" href="optimization.php">Up</a></td>
<td width="40%" align="right"> <a accesskey="n" href="large_workflows.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Chapter 10. Optimizing Workflows for Efficiency and Scalability </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> 10.3. How to Scale Large Workflows</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
