<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="submit_directory.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="example_workflows.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="chapter" title="Chapter 8. Monitoring, Debugging and Statistics">
<div class="titlepage"><div><div><h2 class="title">
<a name="monitoring_debugging_stats"></a>Chapter 8. Monitoring, Debugging and Statistics</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="monitoring_debugging_stats.php#workflow_status">8.1. Workflow Status</a></span></dt>
<dt><span class="section"><a href="monitoring_debugging_stats.php#plotting_statistics">8.2. Plotting and Statistics</a></span></dt>
<dt><span class="section"><a href="monitoring_debugging_stats.php#idp16239184">8.3. Dashboard</a></span></dt>
</dl></div>
<p>Pegasus comes bundled with useful tools that help users debug
  workflows and generate useful statistics and plots about their workflow
  runs. These tools internally parse the Condor log files and have a similar
  interface. With the exception of pegasus-monitord (see below), all tools
  take in the submit directory as an argument. Users can invoke the tools
  listed in this chapter as follows:</p>
<pre class="programlisting">$ pegasus-[toolname]   &lt;path to the submit directory&gt;</pre>
<p>All these utilities query a database (usually a SQLite database in the
  workflow submit directory) that is populated by the monitoring daemon
  <span class="bold"><strong>pegasus-monitord</strong></span>.</p>
<div class="section" title="8.1. Workflow Status">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="workflow_status"></a>8.1. Workflow Status</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="monitoring_debugging_stats.php#monitoring_pegasus-status">8.1.1. pegasus-status</a></span></dt>
<dt><span class="section"><a href="monitoring_debugging_stats.php#monitoring_pegasus-analyzer">8.1.2. pegasus-analyzer</a></span></dt>
<dt><span class="section"><a href="monitoring_debugging_stats.php#monitoring_pegasus-remove">8.1.3. pegasus-remove</a></span></dt>
<dt><span class="section"><a href="monitoring_debugging_stats.php#idp18200544">8.1.4. Resubmitting failed workflows</a></span></dt>
</dl></div>
<p>As the number of jobs and tasks in a workflow increases, the ability
    to track progress and quickly debug the workflow becomes more and more
    important. Pegasus comes with a series of utilities that can be used to
    monitor and debug workflows both in real time and after execution
    has completed.</p>
<div class="section" title="8.1.1. pegasus-status">
<div class="titlepage"><div><div><h3 class="title">
<a name="monitoring_pegasus-status"></a>8.1.1. pegasus-status</h3></div></div></div>
<p>To monitor the execution of the workflow, run the
      <span class="command"><strong>pegasus-status</strong></span> command as suggested by the output of
      the <span class="command"><strong>pegasus-run</strong></span> command.
      <span class="command"><strong>pegasus-status</strong></span> shows the current status of the Condor
      Q as it pertains to the master workflow in the workflow directory you
      point it to. In a second section, it shows a summary of the
      state of all jobs in the workflow and all of its sub-workflows.</p>
<p>The details of <span class="command"><strong>pegasus-status</strong></span> are described in
      its respective <a class="link" href="cli-pegasus-status.php" title="pegasus-status">manual page</a>.
      There are many options to help you get the most out of this tool,
      including a watch mode that repeatedly redraws the information, various modes that
      add more information, and legends in case you are new to the tool or need to
      present its output.</p>
<pre class="programlisting"><span class="command"><strong>$ pegasus-status /Workflow/dags/directory</strong></span>
STAT  IN_STATE  JOB
Run      05:08  level-3-0
Run      04:32   |-sleep_ID000005
Run      04:27   \_subdax_level-2_ID000004
Run      03:51      |-sleep_ID000003
Run      03:46      \_subdax_level-1_ID000002
Run      03:10         \_sleep_ID000001
Summary: 6 Condor jobs total (R:6)

UNREADY   READY     PRE  QUEUED    POST SUCCESS FAILURE %DONE
      0       0       0       6       0       3       0  33.3
Summary: 3 DAGs total (Running:3)</pre>
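<p>In the sample output above, the %DONE column appears to be the SUCCESS
      count divided by the sum of all state counters (3 of 9 jobs gives 33.3).
      The following Python sketch recomputes it under that assumption. The
      relationship is inferred from the sample output only, and the function
      names <code>parse_counters</code> and <code>percent_done</code> are
      hypothetical helpers, not part of Pegasus.</p>

```python
# Illustrative only: recompute the %DONE column from the state counters
# printed by pegasus-status. The formula is inferred from the sample output,
# not taken from the pegasus-status implementation.

COLUMNS = ["UNREADY", "READY", "PRE", "QUEUED", "POST", "SUCCESS", "FAILURE"]


def parse_counters(values_line: str) -> dict:
    """Map the numeric row printed under the header to column names."""
    # Large counts are printed with thousands separators, e.g. "7,137".
    fields = values_line.replace(",", "").split()
    # The last field is %DONE itself, so only the first seven are counters.
    return dict(zip(COLUMNS, (int(f) for f in fields[:len(COLUMNS)])))


def percent_done(counters: dict) -> float:
    """Share of jobs in the SUCCESS state, rounded to one decimal place."""
    total = sum(counters.values())
    if total == 0:
        return 0.0
    return round(100.0 * counters["SUCCESS"] / total, 1)


row = "      0       0       0       6       0       3       0  33.3"
print(percent_done(parse_counters(row)))  # 33.3
```

<p>The same helpers give 100.0 for the finished workflow and 0.0 for the
      failed workflow shown later in this section.</p>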
<p>Without the <em class="parameter"><code>-l</code></em> option, only a summary
      of the workflow statistics is shown under the current queue status.
      With the <em class="parameter"><code>-l</code></em> option, each
      sub-workflow is shown separately:</p>
<pre class="programlisting"><span class="command"><strong>$ pegasus-status -l /Workflow/dags/directory</strong></span>
STAT  IN_STATE  JOB
Run      07:01  level-3-0
Run      06:25   |-sleep_ID000005
Run      06:20   \_subdax_level-2_ID000004
Run      05:44      |-sleep_ID000003
Run      05:39      \_subdax_level-1_ID000002
Run      05:03         \_sleep_ID000001
Summary: 6 Condor jobs total (R:6)

UNRDY READY   PRE  IN_Q  POST  DONE  FAIL %DONE STATE   DAGNAME
    0     0     0     1     0     1     0  50.0 Running level-2_ID000004/level-1_ID000002/level-1-0.dag
    0     0     0     2     0     1     0  33.3 Running level-2_ID000004/level-2-0.dag
    0     0     0     3     0     1     0  25.0 Running *level-3-0.dag
    0     0     0     6     0     3     0  33.3         TOTALS (9 jobs)
Summary: 3 DAGs total (Running:3)</pre>
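<p>The TOTALS row in the long listing above is the column-wise sum of the
      per-DAG rows. As a minimal Python sketch of that arithmetic (the helper
      name <code>total_row</code> is hypothetical, and the rows are copied by
      hand from the sample output rather than parsed):</p>

```python
# Illustrative only: rebuild the TOTALS row of "pegasus-status -l" from the
# per-DAG rows. Column order follows the sample output above.

def total_row(rows):
    """Sum each counter column across all DAG rows."""
    return [sum(column) for column in zip(*rows)]


# UNRDY, READY, PRE, IN_Q, POST, DONE, FAIL for the three DAGs in the sample
rows = [
    [0, 0, 0, 1, 0, 1, 0],  # level-1-0.dag
    [0, 0, 0, 2, 0, 1, 0],  # level-2-0.dag
    [0, 0, 0, 3, 0, 1, 0],  # level-3-0.dag
]

totals = total_row(rows)
job_count = sum(totals)   # every job is in exactly one state column
done = totals[5]          # index 5 is the DONE column
print(totals, job_count, round(100.0 * done / job_count, 1))
# [0, 0, 0, 6, 0, 3, 0] 9 33.3
```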
<p>The following output shows the summary of a successful workflow
      after it has finished.</p>
<pre class="programlisting"><span class="command"><strong>$ pegasus-status work/2011080514</strong></span>
(no matching jobs found in Condor Q)
UNREADY   READY     PRE  QUEUED    POST SUCCESS FAILURE %DONE
      0       0       0       0       0   7,137       0 100.0
Summary: 44 DAGs total (Success:44)</pre>
<div class="warning" title="Warning" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Warning</h3>
<p>For large workflows with many jobs, please note that
          <span class="command"><strong>pegasus-status</strong></span> will take time to compile state
          from all workflow files. This typically affects the initial run;
          subsequent runs are faster due to the file system's buffer cache.
          However, on a low-RAM machine, thrashing is a possibility.</p>
</div>
<p>The following output shows a failed workflow after no more
      jobs from it exist. Note that no active jobs are shown, and that the
      workflow as a whole is reported as failed.</p>
<pre class="programlisting"><span class="command"><strong>$ pegasus-status work/submit</strong></span>
(no matching jobs found in Condor Q)
UNREADY   READY     PRE  QUEUED    POST SUCCESS FAILURE %DONE
     20       0       0       0       0       0       2   0.0
Summary: 1 DAG total (Failure:1)</pre>
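<p>A script that needs to react to the final state can read the last
      summary line of the output. The sketch below parses lines of the form
      shown in the samples above. It assumes the text inside the parentheses is
      a series of <code>State:count</code> pairs; the separator between
      several pairs is not shown in this chapter, so the pattern does not depend
      on it. The function name <code>parse_dag_summary</code> is
      hypothetical.</p>

```python
import re


def parse_dag_summary(line):
    """Return (total_dags, {state: count}) for a 'Summary: N DAG(s) total (...)' line."""
    match = re.match(r"Summary: (\d+) DAGs? total \((.*)\)", line.strip())
    if match is None:
        raise ValueError("not a DAG summary line: " + line)
    # Collect every State:count pair, whatever separates them.
    pairs = re.findall(r"(\w+):(\d+)", match.group(2))
    states = {name: int(count) for name, count in pairs}
    return int(match.group(1)), states


total, states = parse_dag_summary("Summary: 1 DAG total (Failure:1)")
if states.get("Failure", 0):
    print("workflow failed:", states)
```

<p>The Condor job summary line (for example
      <code>Summary: 6 Condor jobs total (R:6)</code>) is deliberately rejected
      by this pattern, so the two kinds of summary lines cannot be
      confused.</p>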
</div>
<div class="section" title="8.1.2. pegasus-analyzer">
<div class="titlepage"><div><div><h3 class="title">
<a name="monitoring_pegasus-analyzer"></a>8.1.2. pegasus-analyzer</h3></div></div></div>
<p>pegasus-analyzer is a command-line utility that parses several
      files in the workflow directory and summarizes useful information for
      the user. It should be used after the workflow has finished
      execution. pegasus-analyzer quickly goes through the jobstate.log file
      and isolates jobs that did not complete successfully. It then parses
      their submit and kickstart output files, and prints detailed
      information to help the user debug what happened to the
      workflow.</p>
<p>The simplest way to invoke pegasus-analyzer is to give it a
      workflow run directory, as in the example below:</p>
<pre class="programlisting">$ pegasus-analyzer  /home/user/run0004
pegasus-analyzer: initializing...

************************************Summary*************************************

 Total jobs         :     26 (100.00%)
 # jobs succeeded   :     25 (96.15%)
 # jobs failed      :      1 (3.84%)
 # jobs unsubmitted :      0 (0.00%)

******************************Failed jobs' details******************************

============================register_viz_glidein_7_0============================

 last state: POST_SCRIPT_FAILURE
       site: local
submit file: /home/user/run0004/register_viz_glidein_7_0.sub
output file: /home/user/run0004/register_viz_glidein_7_0.out.002
 error file: /home/user/run0004/register_viz_glidein_7_0.err.002

-------------------------------Task #1 - Summary--------------------------------

site        : local
executable  : /lfs1/software/install/pegasus/default/bin/rc-client
arguments   : -Dpegasus.user.properties=/lfs1/work/pegasus/run0004/pegasus.15181.properties \
-Dpegasus.catalog.replica.url=rlsn://smarty.isi.edu --insert register_viz_glidein_7_0.in
exitcode    : 1
working dir : /lfs1/work/pegasus/run0004

---------Task #1 - pegasus::rc-client - pegasus::rc-client:1.0 - stdout---------

2009-02-20 16:25:13.467 ERROR [root] You need to specify the pegasus.catalog.replica property
2009-02-20 16:25:13.468 WARN  [root] non-zero exit-code 1</pre>
<p>In
      the case above, pegasus-analyzer's output contains a brief summary
      section, showing how many jobs have succeeded and how many have failed.
      After that, pegasus-analyzer will print information about each job that
      failed, showing its last known state, along with the location of its
      submit, output, and error files. pegasus-analyzer will also display any
      stdout and stderr from the job, as recorded in its kickstart record.
      Please consult pegasus-analyzer's man page for more examples and a
      detailed description of its various command-line options.</p>
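<p>When the summary needs to be consumed by a script, its counters can
      be pulled out of the text with a regular expression. The following Python
      sketch works on the summary lines exactly as printed in the example above.
      The function name <code>parse_analyzer_summary</code> is hypothetical, and
      the line layout is assumed to match the sample; other releases may
      format the summary differently.</p>

```python
import re

# Summary lines copied from the pegasus-analyzer example above.
SAMPLE = """\
 Total jobs         :     26 (100.00%)
 # jobs succeeded   :     25 (96.15%)
 # jobs failed      :      1 (3.84%)
 # jobs unsubmitted :      0 (0.00%)
"""


def parse_analyzer_summary(text):
    """Return {label: count} for lines shaped like 'label : N (P%)'."""
    counts = {}
    for line in text.splitlines():
        # Optional leading '#', a label, a colon, an integer, then '('.
        match = re.match(r"^\s*#?\s*(.*?)\s*:\s*(\d+)\s*\(", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_analyzer_summary(SAMPLE)
print(counts["jobs failed"], "of", counts["Total jobs"], "jobs failed")
# 1 of 26 jobs failed
```

<p>Lines such as <code>exitcode    : 1</code> are not picked up, because
      they lack the parenthesized percentage that follows each counter.</p>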
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>Starting with the 4.0 release, pegasus-analyzer queries
        the database by default to debug the workflow. If you want it to use the files in the
        submit directory instead, use the <span class="bold"><strong>--files</strong></span>
        option.</p>
</div>
</div>
<div class="section" title="8.1.3. pegasus-remove">
<div class="titlepage"><div><div><h3 class="title">
<a name="monitoring_pegasus-remove"></a>8.1.3. pegasus-remove</h3></div></div></div>
<p>If you want to abort your workflow for any reason, you can use the
      pegasus-remove command listed in the output of the pegasus-run invocation, or
      run pegasus-remove yourself and specify the DAG directory of the workflow you want to
      terminate.</p>
<pre class="programlisting"><span class="bold"><strong>$ pegasus-remove /Path/To/Workflow/Directory</strong></span></pre>
</div>
<div class="section" title="8.1.4. Resubmitting failed workflows">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp18200544"></a>8.1.4. Resubmitting failed workflows</h3></div></div></div>
<p>When a workflow is aborted, Pegasus removes the DAGMan and all the jobs related to the
      DAGMan from the Condor queue. A rescue DAG is generated in case you
      want to resubmit the same workflow and continue execution from where it
      last stopped. A rescue DAG only skips jobs that have completely
      finished. It does not continue a partially run job unless the
      executable supports checkpointing.</p>
<p>To resubmit an aborted or failed workflow with the same submit
      files and rescue DAG, rerun the pegasus-run command:</p>
<pre class="programlisting"><span class="bold"><strong>$ pegasus-run /Path/To/Workflow/Directory</strong></span></pre>
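<p>The rule that a rescue DAG skips only completely finished jobs can be
      pictured with a small Python sketch. The state labels and job names here
      are hypothetical and chosen for illustration; they are not the format
      DAGMan uses to record job state.</p>

```python
# Illustrative only: which jobs would run again after a resubmit, given that
# only completely finished jobs are skipped. State labels are hypothetical.

def jobs_left_to_run(job_states):
    """Return the names of all jobs that have not completely finished."""
    return sorted(name for name, state in job_states.items() if state != "DONE")


job_states = {
    "preprocess_ID0000001": "DONE",     # finished, skipped on resubmit
    "findrange_ID0000002": "RUNNING",   # partially run, starts over
    "findrange_ID0000003": "FAILED",    # failed, runs again
    "analyze_ID0000004": "UNREADY",     # never started, still has to run
}

print(jobs_left_to_run(job_states))
# ['analyze_ID0000004', 'findrange_ID0000002', 'findrange_ID0000003']
```

<p>Note that the partially run job is executed from the beginning unless
      its executable supports checkpointing.</p>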
</div>
</div>
<div class="section" title="8.2. Plotting and Statistics">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="plotting_statistics"></a>8.2. Plotting and Statistics</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="monitoring_debugging_stats.php#idp9871424">8.2.1. pegasus-statistics</a></span></dt>
<dt><span class="section"><a href="monitoring_debugging_stats.php#idp15186112">8.2.2. pegasus-plots</a></span></dt>
</dl></div>
<p>The Pegasus plotting and statistics tools query the Stampede database
    created by pegasus-monitord to generate their output. The Stampede schema
    can be found <a class="link" href="reference.php#stampede-schema">here</a>.</p>
<p>The statistics and plotting tools use the following terminology for
    tasks, jobs, and related entities. Pegasus takes in a DAX, which is composed of
    tasks. Pegasus plans it into a Condor DAG (the executable workflow), which
    consists of jobs. When clustering is used, multiple tasks in the DAX can be
    combined into a single job in the executable workflow. When DAGMan
    executes a job, a job instance is populated. Job instances capture
    information as seen by DAGMan. If DAGMan retries a job after detecting a
    failure, a new job instance is populated. When DAGMan finds that a job
    instance has finished, an invocation is associated with the job instance. For
    a clustered job, multiple invocations are associated with a
    single job instance. If a pre script or post script is associated with a
    job instance, invocations are also populated in the database for the
    corresponding job instance.</p>
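<p>The relationships between tasks, jobs, job instances, and invocations
    can be sketched as a small data model. The Python classes below are purely
    illustrative: the class names, field names, and sample identifiers are
    assumptions made for this example and do not reproduce the Stampede
    schema.</p>

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Invocation:
    name: str      # a task, pre script, or post script that was executed
    exitcode: int


@dataclass
class JobInstance:
    try_number: int  # 1 for the first attempt, 2 for the first retry, ...
    invocations: List[Invocation] = field(default_factory=list)


@dataclass
class Job:
    name: str
    task_ids: List[str]  # more than one task when the job is clustered
    instances: List[JobInstance] = field(default_factory=list)

    def new_instance(self) -> JobInstance:
        """Record a new attempt, as happens each time DAGMan (re)runs the job."""
        instance = JobInstance(try_number=len(self.instances) + 1)
        self.instances.append(instance)
        return instance


# A clustered job wrapping two DAX tasks, retried once after a failure.
job = Job(name="merge_cluster_1", task_ids=["ID0000002", "ID0000003"])
first = job.new_instance()
first.invocations.append(Invocation("ID0000002", exitcode=1))
second = job.new_instance()
second.invocations += [Invocation("ID0000002", 0), Invocation("ID0000003", 0)]

print(len(job.instances), len(second.invocations))  # 2 2
```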
<div class="section" title="8.2.1. pegasus-statistics">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp9871424"></a>8.2.1. pegasus-statistics</h3></div></div></div>
<p>pegasus-statistics can compute statistics over one or more
      workflow runs.</p>
<p>The command to generate statistics over a single run is shown
      below.</p>
<pre class="programlisting">$ <span class="emphasis"><em><span class="bold"><strong>pegasus-statistics /scratch/grid-setup/run0001/ -s all</strong></span> </em></span>


#
# Pegasus Workflow Management System - http://pegasus.isi.edu
#
# Workflow summary:
#   Summary of the workflow execution. It shows total
#   tasks/jobs/sub workflows run, how many succeeded/failed etc.
#   In case of hierarchical workflow the calculation shows the
#   statistics across all the sub workflows.It shows the following
#   statistics about tasks, jobs and sub workflows.
#     * Succeeded - total count of succeeded tasks/jobs/sub workflows.
#     * Failed - total count of failed tasks/jobs/sub workflows.
#     * Incomplete - total count of tasks/jobs/sub workflows that are
#       not in succeeded or failed state. This includes all the jobs
#       that are not submitted, submitted but not completed etc. This
#       is calculated as  difference between 'total' count and sum of
#       'succeeded' and 'failed' count.
#     * Total - total count of tasks/jobs/sub workflows.
#     * Retries - total retry count of tasks/jobs/sub workflows.
#     * Total+Retries - total count of tasks/jobs/sub workflows executed
#       during workflow run. This is the cumulative of retries,
#       succeeded and failed count.
# Workflow wall time:
#   The walltime from the start of the workflow execution to the end as
#   reported by the DAGMAN.In case of rescue dag the value is the
#   cumulative of all retries.
# Workflow cumulative job wall time:
#   The sum of the walltime of all jobs as reported by kickstart.
#   In case of job retries the value is the cumulative of all retries.
#   For workflows having sub workflow jobs (i.e SUBDAG and SUBDAX jobs),
#   the walltime value includes jobs from the sub workflows as well.
# Cumulative job walltime as seen from submit side:
#   The sum of the walltime of all jobs as reported by DAGMan.
#   This is similar to the regular cumulative job walltime, but includes
#   job management overhead and delays. In case of job retries the value
#   is the cumulative of all retries. For workflows having sub workflow
#   jobs (i.e SUBDAG and SUBDAX jobs), the walltime value includes jobs
#   from the sub workflows as well.
------------------------------------------------------------------------------
Type           Succeeded Failed  Incomplete  Total     Retries   Total+Retries
Tasks          4         0       0           4         0         4            
Jobs           17        0       0           17        0         17           
Sub-Workflows  0         0       0           0         0         0            
------------------------------------------------------------------------------

Workflow wall time                               : 5 mins, 18 secs
Workflow cumulative job wall time                : 4 mins, 2 secs
Cumulative job walltime as seen from submit side : 4 mins, 10 secs

</pre>
<p>By default, the output is written to a statistics folder inside
      the submit directory. The output generated by pegasus-statistics
      depends on the value of the command line option -s (statistics_level).
      In the sample run, -s is set to 'all' to
      generate all the statistics information for the workflow run. Please
      consult the pegasus-statistics man page for a detailed description
      of the various command line options.</p>
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>In the case of hierarchical workflows, the metrics that are displayed
        on stdout take into account all the jobs/tasks/sub workflows that make
        up the workflow by recursively iterating through each sub
        workflow.</p>
</div>
<p>The command to generate statistics over all workflow runs populated in
      a single database is shown below.</p>
<pre class="programlisting">$ <span class="emphasis"><em><span class="bold"><strong>pegasus-statistics -Dpegasus.monitord.output='mysql://s_user:s_user123@127.0.0.1:3306/stampede' -o /scratch/workflow_1_2/statistics -s all --multiple-wf</strong></span> </em></span>


#
# Pegasus Workflow Management System - http://pegasus.isi.edu
#
# Workflow summary:
#   Summary of the workflow execution. It shows total
#   tasks/jobs/sub workflows run, how many succeeded/failed etc.
#   In case of hierarchical workflow the calculation shows the
#   statistics across all the sub workflows.It shows the following
#   statistics about tasks, jobs and sub workflows.
#     * Succeeded - total count of succeeded tasks/jobs/sub workflows.
#     * Failed - total count of failed tasks/jobs/sub workflows.
#     * Incomplete - total count of tasks/jobs/sub workflows that are
#       not in succeeded or failed state. This includes all the jobs
#       that are not submitted, submitted but not completed etc. This
#       is calculated as  difference between 'total' count and sum of
#       'succeeded' and 'failed' count.
#     * Total - total count of tasks/jobs/sub workflows.
#     * Retries - total retry count of tasks/jobs/sub workflows.
#     * Total+Retries - total count of tasks/jobs/sub workflows executed
#       during workflow run. This is the cumulative of retries,
#       succeeded and failed count.
# Workflow wall time:
#   The walltime from the start of the workflow execution to the end as
#   reported by the DAGMAN.In case of rescue dag the value is the
#   cumulative of all retries.
# Workflow cumulative job wall time:
#   The sum of the walltime of all jobs as reported by kickstart.
#   In case of job retries the value is the cumulative of all retries.
#   For workflows having sub workflow jobs (i.e SUBDAG and SUBDAX jobs),
#   the walltime value includes jobs from the sub workflows as well.
# Cumulative job walltime as seen from submit side:
#   The sum of the walltime of all jobs as reported by DAGMan.
#   This is similar to the regular cumulative job walltime, but includes
#   job management overhead and delays. In case of job retries the value
#   is the cumulative of all retries. For workflows having sub workflow
#   jobs (i.e SUBDAG and SUBDAX jobs), the walltime value includes jobs
#   from the sub workflows as well.
------------------------------------------------------------------------------
Type           Succeeded Failed  Incomplete  Total     Retries   Total+Retries
Tasks          8         0       0           8         0         8            
Jobs           34        0       0           34        0         34           
Sub-Workflows  0         0       0           0         0         0            
------------------------------------------------------------------------------

Workflow cumulative job wall time                : 8 mins, 5 secs
Cumulative job walltime as seen from submit side : 8 mins, 35 secs

</pre>
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>When computing statistics over multiple workflows, please
          note:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>All workflow run information should be populated in a
              single STAMPEDE database.</p></li>
<li class="listitem"><p>The --output argument must be specified.</p></li>
<li class="listitem"><p>Job statistics information is not computed.</p></li>
<li class="listitem"><p>Workflow wall time information is not computed.</p></li>
</ol></div>
</div>
<p>pegasus-statistics can also compute statistics over a few
      specified workflow runs, by specifying either the submit
      directories or the workflow UUIDs.</p>
<pre class="programlisting">pegasus-statistics -Dpegasus.monitord.output='&lt;DB_URL&gt;' -o &lt;OUTPUT_DIR&gt; &lt;SUBMIT_DIR_1&gt; &lt;SUBMIT_DIR_2&gt; .. &lt;SUBMIT_DIR_n&gt;

OR

pegasus-statistics -Dpegasus.monitord.output='&lt;DB_URL&gt;' -o &lt;OUTPUT_DIR&gt; <span class="bold"><strong>--isuuid</strong></span> &lt;UUID_1&gt; &lt;UUID_2&gt; .. &lt;UUID_n&gt;

</pre>
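<p>When these invocations are generated from a script, it can help to
      assemble the argument list in one place. The Python sketch below builds
      the two command forms shown above, using only the options that appear in
      this section. The function name <code>statistics_command</code>, the
      database URL, and the paths are hypothetical placeholders.</p>

```python
import shlex


def statistics_command(db_url, output_dir, workflows, use_uuids=False):
    """Build the pegasus-statistics argument list for several workflow runs.

    workflows holds submit directories, or workflow UUIDs when use_uuids is True.
    """
    argv = [
        "pegasus-statistics",
        "-Dpegasus.monitord.output=" + db_url,
        "-o", output_dir,
    ]
    if use_uuids:
        argv.append("--isuuid")
    argv.extend(workflows)
    return argv


# Placeholder URL and paths, for illustration only.
argv = statistics_command(
    "mysql://user:secret@127.0.0.1:3306/stampede",
    "/scratch/statistics",
    ["/scratch/run0001", "/scratch/run0002"],
)
# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in argv))
```

<p>Keeping the command as a list avoids quoting problems if it is later
      passed to <code>subprocess.run</code>, since no shell has to re-parse the
      database URL.</p>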
<p>The pegasus-statistics summary printed on stdout contains
      the following information.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
<p><span class="bold"><strong>Workflow summary</strong></span> - Summary of
          the workflow execution. In the case of a hierarchical workflow, the
          calculation shows the statistics across all the sub workflows. It
          shows the following statistics about tasks, jobs, and sub
          workflows.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="circle">
<li class="listitem"><p><span class="bold"><strong>Succeeded</strong></span> - total count
              of succeeded tasks/jobs/sub workflows.</p></li>
<li class="listitem"><p><span class="bold"><strong>Failed</strong></span> - total count of
              failed tasks/jobs/sub workflows.</p></li>
<li class="listitem"><p><span class="bold"><strong>Incomplete</strong></span> - total count
              of tasks/jobs/sub workflows that are not in succeeded or failed
              state. This includes all the jobs that are not submitted,
              submitted but not completed etc. This is calculated as
              difference between 'total' count and sum of 'succeeded' and
              'failed' count.</p></li>
<li class="listitem"><p><span class="bold"><strong>Total</strong></span> - total count of
              tasks/jobs/sub workflows.</p></li>
<li class="listitem"><p><span class="bold"><strong>Retries</strong></span> - total retry
              count of tasks/jobs/sub workflows.</p></li>
<li class="listitem"><p><span class="bold"><strong>Total+Retries</strong></span> - total count
              of tasks/jobs/sub workflows executed during the workflow run. This
              is the sum of the retries, succeeded, and failed
              counts.</p></li>
</ul></div>
</li>
<li class="listitem"><p><span class="bold"><strong>Workflow wall time</strong></span> - The
          walltime from the start of the workflow execution to the end as
          reported by DAGMan. In the case of a rescue DAG, the value is the
          cumulative of all retries.</p></li>
<li class="listitem"><p><span class="bold"><strong>Workflow cumulative job wall
          time</strong></span> - The sum of the walltime of all jobs as reported by
          kickstart. In the case of job retries, the value is the cumulative of all
          retries. For workflows having sub workflow jobs (i.e. SUBDAG and
          SUBDAX jobs), the walltime value includes jobs from the sub
          workflows as well. This value is multiplied by the multiplier_factor
          in the job instance table.</p></li>
<li class="listitem"><p><span class="bold"><strong>Cumulative job walltime as seen from
          submit side</strong></span> - The sum of the walltime of all jobs as
          reported by DAGMan. This is similar to the regular cumulative job
          walltime, but includes job management overhead and delays. In the case
          of job retries, the value is the cumulative of all retries. For
          workflows having sub workflow jobs (i.e. SUBDAG and SUBDAX jobs), the
          walltime value includes jobs from the sub workflows as well. This value is
          multiplied by the multiplier_factor in the job instance
          table.</p></li>
</ul></div>
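<p>The two derived counters in the workflow summary follow directly from
      the definitions above: Incomplete is the total minus the sum of succeeded
      and failed, and the last column is the sum of retries, succeeded, and
      failed. A minimal Python sketch of that arithmetic, checked against the
      Jobs row of the single-run sample output (the function name
      <code>summary_row</code> and the dictionary keys are hypothetical):</p>

```python
def summary_row(succeeded, failed, total, retries):
    """Derive the Incomplete and Total+Retries counters from the raw counts."""
    incomplete = total - (succeeded + failed)
    total_with_retries = retries + succeeded + failed
    return {
        "succeeded": succeeded,
        "failed": failed,
        "incomplete": incomplete,
        "total": total,
        "retries": retries,
        "total_with_retries": total_with_retries,
    }


# Jobs row from the single-run example: 17 succeeded, 0 failed, 17 total, 0 retries.
row = summary_row(succeeded=17, failed=0, total=17, retries=0)
print(row["incomplete"], row["total_with_retries"])  # 0 17
```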
<p>pegasus-statistics generates the following statistics files based
      on the command line options set.</p>
<p><span class="bold"><strong>Workflow statistics file per workflow
      [workflow.txt]</strong></span></p>
<p>The workflow statistics file contains the following
      information about each workflow run. In the case of hierarchical workflows,
      the file contains a table for each sub workflow. The file also contains
      a 'Total' table at the bottom, which is the cumulative of all the
      individual statistics details.</p>
<p>A sample is shown in Table 8.1. The table reports the following statistics
      about tasks, jobs, and sub workflows.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>Workflow retries</strong></span> - number of
          times a workflow was retried.</p></li>
<li class="listitem"><p><span class="bold"><strong>Succeeded</strong></span> - total count of
          succeeded tasks/jobs/sub workflows.</p></li>
<li class="listitem"><p><span class="bold"><strong>Failed</strong></span> - total count of
          failed tasks/jobs/sub workflows.</p></li>
<li class="listitem"><p><span class="bold"><strong>Incomplete</strong></span> - total count of
          tasks/jobs/sub workflows that are not in succeeded or failed state.
          This includes all the jobs that are not submitted, submitted but not
          completed etc. This is calculated as difference between 'total'
          count and sum of 'succeeded' and 'failed' count.</p></li>
<li class="listitem"><p><span class="bold"><strong>Total</strong></span> - total count of
          tasks/jobs/sub workflows.</p></li>
<li class="listitem"><p><span class="bold"><strong>Retries</strong></span> - total retry count
          of tasks/jobs/sub workflows.</p></li>
<li class="listitem"><p><span class="bold"><strong>Total Run</strong></span> - total count of
          tasks/jobs/sub workflows executed during workflow run. This is the
          cumulative of total retries, succeeded and failed count.</p></li>
</ul></div>
<div class="table">
<a name="idp10104000"></a><p class="title"><b>Table 8.1. Workflow Statistics</b></p>
<div class="table-contents"><table summary="Workflow Statistics" border="1">
<colgroup>
<col>
<col>
<col>
<col>
<col>
<col>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th align="center">#</th>
<th align="center">Type</th>
<th align="center">Succeeded</th>
<th align="center">Failed</th>
<th align="center">Incomplete</th>
<th align="center">Total</th>
<th align="center">Retries</th>
<th align="center">Total Run</th>
<th align="center">Workflow Retries</th>
</tr></thead>
<tbody>
<tr>
<td align="center">2a6df11b-9972-4ba0-b4ba-4fd39c357af4</td>
<td align="center"> </td>
<td align="center"> </td>
<td align="center"> </td>
<td align="center"> </td>
<td align="center"> </td>
<td align="center"> </td>
<td align="center"> </td>
<td align="center">0</td>
</tr>
<tr>
<td align="center"> </td>
<td align="center">Tasks</td>
<td align="center">4</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">4</td>
<td align="center">0</td>
<td align="center">4</td>
<td align="center"> </td>
</tr>
<tr>
<td align="center"> </td>
<td align="center">Jobs</td>
<td align="center">13</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">13</td>
<td align="center">0</td>
<td align="center">13</td>
<td align="center"> </td>
</tr>
<tr>
<td align="center"> </td>
<td align="center">Sub Workflows</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center"> </td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p><span class="bold"><strong>Job statistics file per workflow
      [jobs.txt]</strong></span></p>
<p>The job statistics file contains the following details
      about the job instances in each workflow. A sample is shown
      in Table 8.2.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>Job</strong></span> - the name of the job
          instance</p></li>
<li class="listitem"><p><span class="bold"><strong>Try</strong></span> - the number representing
          the job instance run count.</p></li>
<li class="listitem"><p><span class="bold"><strong>Site</strong></span> - the site where the job
          instance ran.</p></li>
<li class="listitem"><p><span class="bold"><strong>Kickstart(sec.)</strong></span> - the actual
          duration of the job instance in seconds on the remote compute
          node.</p></li>
<li class="listitem"><p><span class="bold"><strong>Mult</strong></span> - multiplier factor from
          the job instance table for the job.</p></li>
<li class="listitem"><p><span class="bold"><strong>Kickstart_Mult</strong></span> - value of the
          Kickstart column multiplied by Mult.</p></li>
<li class="listitem"><p><span class="bold"><strong>CPU-Time</strong></span> - remote CPU time
          computed as the stime + utime (when Kickstart is not used, this is
          empty).</p></li>
<li class="listitem"><p><span class="bold"><strong>Post(sec.)</strong></span> - the postscript
          time as reported by DAGMan.</p></li>
<li class="listitem"><p><span class="bold"><strong>CondorQTime(sec.)</strong></span> - the time
          between submission by DAGMan and the remote Grid submission. It is
          an estimate of the time spent in the Condor queue on the submit
          node.</p></li>
<li class="listitem"><p><span class="bold"><strong>Resource(sec.)</strong></span> - the time
          between the remote Grid submission and the start of remote execution.
          It is an estimate of the time the job instance spent in the remote
          queue.</p></li>
<li class="listitem"><p><span class="bold"><strong>Runtime(sec.)</strong></span> - the time
          spent on the resource as seen by Condor DAGMan. It is always
          &gt;= the Kickstart value.</p></li>
<li class="listitem"><p><span class="bold"><strong>Seqexec(sec.)</strong></span> - the time
          taken for the completion of a clustered job instance.</p></li>
<li class="listitem"><p><span class="bold"><strong>Seqexec-Delay(sec.)</strong></span> - the
          time difference between the completion time of a clustered
          job instance and the sum of the kickstart times of all its individual
          tasks.</p></li>
</ul></div>
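<p>Two of the columns described above are derived from other values:
      Kickstart_Mult is the Kickstart value multiplied by Mult, and
      Seqexec-Delay is the Seqexec time minus the sum of the kickstart times of
      the individual tasks. The Python sketch below shows that arithmetic. The
      function names are hypothetical, the rounding to three decimals is a
      presentation choice made for this example, and the clustered-job numbers
      in the second call are invented for illustration.</p>

```python
def kickstart_mult(kickstart_sec, mult):
    """Kickstart duration scaled by the job's multiplier factor."""
    return round(kickstart_sec * mult, 3)


def seqexec_delay(seqexec_sec, task_kickstart_secs):
    """Clustered-job overhead: Seqexec time minus the summed task kickstart times."""
    return round(seqexec_sec - sum(task_kickstart_secs), 3)


# A job with a 60.001 second kickstart time and a multiplier of 10.
print(kickstart_mult(60.001, 10))  # 600.01

# Hypothetical clustered job: 125.0 seconds overall, two tasks of about 60 seconds.
print(seqexec_delay(125.0, [60.001, 60.002]))
```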
<div class="table">
<a name="idp16089104"></a><p class="title"><b>Table 8.2. Job statistics</b></p>
<div class="table-contents"><table summary="Job statistics" border="1">
<colgroup>
<col>
<col>
<col>
<col>
<col>
<col>
<col>
<col>
<col>
<col>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th align="center">Job</th>
<th align="center">Try</th>
<th align="center">Site</th>
<th align="center">Kickstart</th>
<th align="center">Mult</th>
<th align="center">Kickstart_Mult</th>
<th align="center">CPU-Time</th>
<th align="center">Post</th>
<th align="center">CondorQTime</th>
<th align="center">Resource</th>
<th align="center">Runtime</th>
<th align="center">Seqexec</th>
<th align="center">Seqexec-Delay</th>
</tr></thead>
<tbody>
<tr>
<td align="center">analyze_ID0000004</td>
<td align="center">1</td>
<td align="center">local</td>
<td align="center">60.002</td>
<td align="center">1</td>
<td align="center">60.002</td>
<td align="center">59.843</td>
<td align="center">5.0</td>
<td align="center">0.0</td>
<td align="center">-</td>
<td align="center">62.0</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">create_dir_diamond_0_local</td>
<td align="center">1</td>
<td align="center">local</td>
<td align="center">0.027</td>
<td align="center">1</td>
<td align="center">0.027</td>
<td align="center">0.003</td>
<td align="center">5.0</td>
<td align="center">5.0</td>
<td align="center">-</td>
<td align="center">0.0</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">findrange_ID0000002</td>
<td align="center">1</td>
<td align="center">local</td>
<td align="center">60.001</td>
<td align="center">10</td>
<td align="center">600.01</td>
<td align="center">59.921</td>
<td align="center">5.0</td>
<td align="center">0.0</td>
<td align="center">-</td>
<td align="center">60.0</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">findrange_ID0000003</td>
<td align="center">1</td>
<td align="center">local</td>
<td align="center">60.002</td>
<td align="center">10</td>
<td align="center">600.02</td>
<td align="center">59.912</td>
<td align="center">5.0</td>
<td align="center">10.0</td>
<td align="center">-</td>
<td align="center">61.0</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">preprocess_ID0000001</td>
<td align="center">1</td>
<td align="center">local</td>
<td align="center">60.002</td>
<td align="center">1</td>
<td align="center">60.002</td>
<td align="center">59.898</td>
<td align="center">5.0</td>
<td align="center">5.0</td>
<td align="center">-</td>
<td align="center">60.0</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">register_local_1_0</td>
<td align="center">1</td>
<td align="center">local</td>
<td align="center">0.459</td>
<td align="center">1</td>
<td align="center">0.459</td>
<td align="center">0.432</td>
<td align="center">6.0</td>
<td align="center">5.0</td>
<td align="center">-</td>
<td align="center">0.0</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">register_local_1_1</td>
<td align="center">1</td>
<td align="center">local</td>
<td align="center">0.338</td>
<td align="center">1</td>
<td align="center">0.338</td>
<td align="center">0.331</td>
<td align="center">5.0</td>
<td align="center">5.0</td>
<td align="center">-</td>
<td align="center">0.0</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">register_local_2_0</td>
<td align="center">1</td>
<td align="center">local</td>
<td align="center">0.348</td>
<td align="center">1</td>
<td align="center">0.348</td>
<td align="center">0.342</td>
<td align="center">5.0</td>
<td align="center">5.0</td>
<td align="center">-</td>
<td align="center">0.0</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">stage_in_local_local_0</td>
<td align="center">1</td>
<td align="center">local</td>
<td align="center">0.39</td>
<td align="center">1</td>
<td align="center">0.39</td>
<td align="center">0.032</td>
<td align="center">5.0</td>
<td align="center">5.0</td>
<td align="center">-</td>
<td align="center">0.0</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">stage_out_local_local_0_0</td>
<td align="center">1</td>
<td align="center">local</td>
<td align="center">0.165</td>
<td align="center">1</td>
<td align="center">0.165</td>
<td align="center">0.108</td>
<td align="center">5.0</td>
<td align="center">10.0</td>
<td align="center">-</td>
<td align="center">0.0</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">stage_out_local_local_1_0</td>
<td align="center">1</td>
<td align="center">local</td>
<td align="center">0.147</td>
<td align="center">1</td>
<td align="center">0.147</td>
<td align="center">0.098</td>
<td align="center">7.0</td>
<td align="center">5.0</td>
<td align="center">-</td>
<td align="center">0.0</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">stage_out_local_local_1_1</td>
<td align="center">1</td>
<td align="center">local</td>
<td align="center">0.139</td>
<td align="center">1</td>
<td align="center">0.139</td>
<td align="center">0.089</td>
<td align="center">5.0</td>
<td align="center">6.0</td>
<td align="center">-</td>
<td align="center">0.0</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">stage_out_local_local_2_0</td>
<td align="center">1</td>
<td align="center">local</td>
<td align="center">0.145</td>
<td align="center">1</td>
<td align="center">0.145</td>
<td align="center">0.101</td>
<td align="center">5.0</td>
<td align="center">5.0</td>
<td align="center">-</td>
<td align="center">0.0</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p><span class="bold"><strong>Transformation statistics file per workflow
      [breakdown.txt]</strong></span></p>
<p>Transformation statistics file per workflow contains information
      about the invocations in each workflow grouped by transformation name. A
      sample file is shown below.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>Transformation</strong></span> - name of the
          transformation.</p></li>
<li class="listitem"><p><span class="bold"><strong>Count</strong></span> - the number of times
          invocations with a given transformation name were executed.</p></li>
<li class="listitem"><p><span class="bold"><strong>Succeeded</strong></span> - the count of
          succeeded invocations with a given logical transformation
          name.</p></li>
<li class="listitem"><p><span class="bold"><strong>Failed</strong></span> - the count of failed
          invocations with a given logical transformation name.</p></li>
<li class="listitem"><p><span class="bold"><strong>Min (sec.)</strong></span> - the minimum
          runtime value of invocations with a given logical transformation
          name times the multiplier_factor.</p></li>
<li class="listitem"><p><span class="bold"><strong>Max (sec.)</strong></span> - the maximum
          runtime value of invocations with a given logical transformation
          name times the multiplier_factor.</p></li>
<li class="listitem"><p><span class="bold"><strong>Mean (sec.)</strong></span> - the mean of the
          invocation runtimes with a given logical transformation name times
          the multiplier_factor.</p></li>
<li class="listitem"><p><span class="bold"><strong>Total (sec.)</strong></span> - the cumulative
          runtime of invocations with a given logical transformation
          name times the multiplier_factor.</p></li>
</ul></div>
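<p>As a rough illustration of how these columns relate to individual
      invocations, the sketch below groups a few records by transformation
      name and derives the same fields. It is not the pegasus-statistics
      implementation: the record layout and the function name
      <code class="literal">breakdown</code> are hypothetical, and it assumes
      that an exit code of 0 marks a succeeded invocation. The runtimes are
      copied from the sample Job statistics table.</p>
<pre class="programlisting">
```python
from collections import defaultdict

# Hypothetical record layout:
# (transformation, runtime in seconds, multiplier factor, exit code).
invocations = [
    ("pegasus::pegasus-transfer", 0.39, 1, 0),
    ("pegasus::pegasus-transfer", 0.165, 1, 0),
    ("pegasus::pegasus-transfer", 0.147, 1, 0),
    ("pegasus::pegasus-transfer", 0.139, 1, 0),
    ("pegasus::pegasus-transfer", 0.145, 1, 0),
    ("pegasus::rc-client", 0.459, 1, 0),
    ("pegasus::rc-client", 0.338, 1, 0),
    ("pegasus::rc-client", 0.348, 1, 0),
]


def breakdown(records):
    """Group invocations by transformation and summarize their runtimes."""
    grouped = defaultdict(list)
    for name, runtime, mult, exit_code in records:
        # Each runtime is scaled by its multiplier factor before aggregation.
        grouped[name].append((runtime * mult, exit_code))

    stats = {}
    for name, rows in grouped.items():
        times = [scaled for scaled, _ in rows]
        # Assumption: exit code 0 means the invocation succeeded.
        succeeded = sum(1 for _, code in rows if code == 0)
        stats[name] = {
            "count": len(rows),
            "succeeded": succeeded,
            "failed": len(rows) - succeeded,
            "min": round(min(times), 3),
            "max": round(max(times), 3),
            "mean": round(sum(times) / len(times), 3),
            "total": round(sum(times), 3),
        }
    return stats


for name, row in sorted(breakdown(invocations).items()):
    print(name, row)
```
</pre>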
<div class="table">
<a name="idp15594240"></a><p class="title"><b>Table 8.3. Transformation Statistics</b></p>
<div class="table-contents"><table summary="Transformation Statistics" border="1">
<colgroup>
<col>
<col>
<col>
<col>
<col>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th align="center">Transformation</th>
<th align="center">Count</th>
<th align="center">Succeeded</th>
<th align="center">Failed</th>
<th align="center">Min</th>
<th align="center">Max</th>
<th align="center">Mean</th>
<th align="center">Total</th>
</tr></thead>
<tbody>
<tr>
<td align="center">dagman::post</td>
<td align="center">13</td>
<td align="center">13</td>
<td align="center">0</td>
<td align="center">5.0</td>
<td align="center">7.0</td>
<td align="center">5.231</td>
<td align="center">68.0</td>
</tr>
<tr>
<td align="center">diamond::analyze</td>
<td align="center">1</td>
<td align="center">1</td>
<td align="center">0</td>
<td align="center">60.002</td>
<td align="center">60.002</td>
<td align="center">60.002</td>
<td align="center">60.002</td>
</tr>
<tr>
<td align="center">diamond::findrange</td>
<td align="center">2</td>
<td align="center">2</td>
<td align="center">0</td>
<td align="center">600.01</td>
<td align="center">600.02</td>
<td align="center">600.02</td>
<td align="center">1200.03</td>
</tr>
<tr>
<td align="center">diamond::preprocess</td>
<td align="center">1</td>
<td align="center">1</td>
<td align="center">0</td>
<td align="center">60.002</td>
<td align="center">60.002</td>
<td align="center">60.002</td>
<td align="center">60.002</td>
</tr>
<tr>
<td align="center">pegasus::dirmanager</td>
<td align="center">1</td>
<td align="center">1</td>
<td align="center">0</td>
<td align="center">0.027</td>
<td align="center">0.027</td>
<td align="center">0.027</td>
<td align="center">0.027</td>
</tr>
<tr>
<td align="center">pegasus::pegasus-transfer</td>
<td align="center">5</td>
<td align="center">5</td>
<td align="center">0</td>
<td align="center">0.139</td>
<td align="center">0.39</td>
<td align="center">0.197</td>
<td align="center">0.986</td>
</tr>
<tr>
<td align="center">pegasus::rc-client</td>
<td align="center">3</td>
<td align="center">3</td>
<td align="center">0</td>
<td align="center">0.338</td>
<td align="center">0.459</td>
<td align="center">0.382</td>
<td align="center">1.145</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p><span class="bold"><strong>Time statistics file
      [time.txt]</strong></span></p>
<p>The time statistics file contains job instance and invocation
      statistics grouped by time and host. The time grouping can be by day
      or by hour. The file contains the following tables: Job instance
      statistics per day/hour, Invocation statistics per day/hour, Job
      instance statistics by host per day/hour and Invocation statistics by
      host per day/hour. A sample Invocation statistics by host per day table
      is shown below.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>Job instance statistics per
          day/hour</strong></span> - the number of job instances run and their
          total runtime, sorted by day/hour.</p></li>
<li class="listitem"><p><span class="bold"><strong>Invocation statistics per
          day/hour</strong></span> - the number of invocations and their total
          runtime, sorted by day/hour.</p></li>
<li class="listitem"><p><span class="bold"><strong>Job instance statistics by host per
          day/hour</strong></span> - the number of job instances run and their
          total runtime on each host, sorted by day/hour.</p></li>
<li class="listitem"><p><span class="bold"><strong>Invocation statistics by host per
          day/hour</strong></span> - the number of invocations and their total
          runtime on each host, sorted by day/hour.</p></li>
</ul></div>
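<p>The grouping behind the "by host per day/hour" tables can be
      sketched as follows. This is only an illustration of the idea, not the
      code used by pegasus-statistics: the records, their layout and the
      function name <code class="literal">invocations_by_host</code> are
      hypothetical.</p>
<pre class="programlisting">
```python
from collections import defaultdict
from datetime import datetime

# Hypothetical invocation records: (start time, host, runtime in seconds).
records = [
    ("2011-07-15 10:05:00", "butterfly.isi.edu", 60.002),
    ("2011-07-15 10:40:00", "butterfly.isi.edu", 0.39),
    ("2011-07-15 11:10:00", "butterfly.isi.edu", 60.001),
]


def invocations_by_host(rows, per="day"):
    """Return sorted (time bucket, host, count, total runtime) tuples."""
    bucket_format = "%Y-%m-%d" if per == "day" else "%Y-%m-%d %H"
    totals = defaultdict(lambda: [0, 0.0])
    for start, host, runtime in rows:
        started = datetime.strptime(start, "%Y-%m-%d %H:%M:%S")
        entry = totals[(started.strftime(bucket_format), host)]
        entry[0] += 1        # invocation count
        entry[1] += runtime  # cumulative runtime in seconds
    return sorted(
        (bucket, host, count, round(total, 3))
        for (bucket, host), (count, total) in totals.items()
    )


for line in invocations_by_host(records, per="day"):
    print(line)
for line in invocations_by_host(records, per="hour"):
    print(line)
```
</pre>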
<div class="table">
<a name="idp15177584"></a><p class="title"><b>Table 8.4. Invocation statistics by host per day</b></p>
<div class="table-contents"><table summary="Invocation statistics by host per day" border="1">
<colgroup>
<col>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th align="center">Date [YYYY-MM-DD]</th>
<th align="center">Host</th>
<th align="center">Count</th>
<th align="center">Runtime (Sec.)</th>
</tr></thead>
<tbody><tr>
<td align="center">2011-07-15</td>
<td align="center">butterfly.isi.edu</td>
<td align="center">54</td>
<td align="center">625.094</td>
</tr></tbody>
</table></div>
</div>
<br class="table-break">
</div>
<div class="section" title="8.2.2. pegasus-plots">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp15186112"></a>8.2.2. pegasus-plots</h3></div></div></div>
<p>pegasus-plots generates graphs and charts to visualize workflow
      execution. To generate them, run the command as shown
      below.</p>
<pre class="programlisting">$ <span class="emphasis"><em>pegasus-plots  -p all  /scratch/grid-setup/run0001/</em></span>


...

******************************************** SUMMARY ********************************************

Graphs and charts generated by pegasus-plots can be viewed by opening the generated html file in the web browser  : 
/scratch/grid-setup/run0001/plots/index.html
 
**************************************************************************************************</pre>
<p>By default, the output is generated in the plots folder inside the
      submit directory. The output that pegasus-plots generates depends on
      the value of the command line option 'p' (plotting_level). In the
      sample run, the command line option 'p' is set to 'all' to generate all
      the charts and graphs for the workflow run. Please consult the
      pegasus-plots man page for a detailed description of the various
      command line options. pegasus-plots generates an index.html file which
      provides links to all the generated charts and plots. A sample
      index.html page is shown below.</p>
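<p>When pegasus-plots is driven from a script, the invocation and the
      location of the generated index page can be wrapped as in the sketch
      below. It relies only on the <code class="literal">-p all</code>
      option and the default <code class="literal">plots/index.html</code>
      location described above; the function names are hypothetical, and
      <code class="literal">run_plots</code> assumes that pegasus-plots is on
      the PATH.</p>
<pre class="programlisting">
```python
import os
import subprocess


def plots_command(submit_dir, level="all"):
    """Build the pegasus-plots command line shown in the sample run."""
    return ["pegasus-plots", "-p", level, submit_dir]


def plots_index(submit_dir):
    """Default location of the generated index page."""
    return os.path.join(submit_dir, "plots", "index.html")


def run_plots(submit_dir, level="all"):
    """Run pegasus-plots (assumed to be on the PATH) and return the
    path of the index page; raises CalledProcessError on failure."""
    subprocess.run(plots_command(submit_dir, level), check=True)
    return plots_index(submit_dir)


print(" ".join(plots_command("/scratch/grid-setup/run0001/")))
print(plots_index("/scratch/grid-setup/run0001/"))
```
</pre>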
<div class="figure">
<a name="idp15189952"></a><p class="title"><b>Figure 8.1. pegasus-plots index page</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/pegasus_plots_index.png" width="100%" alt="pegasus-plots index page"></td></tr></table></div></div>
</div>
<br class="figure-break"><p>pegasus-plots generates the following plots and charts.</p>
<p><span class="bold"><strong>DAX Graph</strong></span></p>
<p>Graph representation of the DAX file. A sample page is shown
      below.</p>
<div class="figure">
<a name="idp15194544"></a><p class="title"><b>Figure 8.2. DAX Graph</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/dax_page.png" width="100%" alt="DAX Graph"></td></tr></table></div></div>
</div>
<br class="figure-break"><p><span class="bold"><strong>DAG Graph</strong></span></p>
<p>Graph representation of the DAG file. A sample page is shown
      below.</p>
<div class="figure">
<a name="idp16216064"></a><p class="title"><b>Figure 8.3. DAG Graph</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/dag_page.png" width="100%" alt="DAG Graph"></td></tr></table></div></div>
</div>
<br class="figure-break"><p><span class="bold"><strong>Gantt workflow execution
      chart</strong></span></p>
<p>Gantt chart of the workflow execution run. A sample page is shown
      below.</p>
<div class="figure">
<a name="idp16220240"></a><p class="title"><b>Figure 8.4. Gantt Chart</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/gantt_chart_page.png" width="100%" alt="Gantt Chart"></td></tr></table></div></div>
</div>
<br class="figure-break"><p>The toolbar at the top provides zoom in/out, pan
      left/right/top/bottom and show/hide job name functionality. The toolbar
      at the bottom can be used to show/hide job states. Failed job instances
      are shown with a red border in the chart. Clicking on a sub workflow job
      instance will take you to the corresponding sub workflow chart.</p>
<p><span class="bold"><strong>Host over time chart</strong></span></p>
<p>Host over time chart of the workflow execution run. A sample page
      is shown below.</p>
<div class="figure">
<a name="idp16225296"></a><p class="title"><b>Figure 8.5. Host over time chart</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/host_chart_page.png" width="100%" alt="Host over time chart"></td></tr></table></div></div>
</div>
<br class="figure-break"><p>The toolbar at the top provides zoom in/out, pan
      left/right/top/bottom and show/hide host name functionality. The toolbar
      at the bottom can be used to show/hide job states. Failed job instances
      are shown with a red border in the chart. Clicking on a sub workflow job
      instance will take you to the corresponding sub workflow chart.</p>
<p><span class="bold"><strong>Time chart</strong></span></p>
<p>Time chart shows job instance/invocation count and runtime of the
      workflow run over time. A sample page is shown below.</p>
<div class="figure">
<a name="idp16230400"></a><p class="title"><b>Figure 8.6. Time chart</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/time_chart_page.png" width="100%" alt="Time chart"></td></tr></table></div></div>
</div>
<br class="figure-break"><p>The toolbar at the top provides zoom in/out and pan
      left/right/top/bottom functionality. The toolbar at the bottom can be
      used to switch between job instances/invocations and day/hour
      filtering.</p>
<p><span class="bold"><strong>Breakdown chart</strong></span></p>
<p>Breakdown chart shows invocation count and runtime of the workflow
      run grouped by transformation name. A sample page is shown below.</p>
<div class="figure">
<a name="idp16235376"></a><p class="title"><b>Figure 8.7. Breakdown chart</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/breakdown_chart_page.png" width="100%" alt="Breakdown chart"></td></tr></table></div></div>
</div>
<br class="figure-break"><p>The toolbar at the bottom can be used to switch between invocation
      count and runtime filtering. Legends can be clicked to get more
      details.</p>
</div>
</div>
<div class="section" title="8.3. Dashboard">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="idp16239184"></a>8.3. Dashboard</h2></div></div></div>
<div class="toc"><dl><dt><span class="section"><a href="monitoring_debugging_stats.php#idp16240672">8.3.1. Workflow Dashboard</a></span></dt></dl></div>
<p>As the number of jobs and tasks in workflows increases, the ability
    to track the progress of a workflow and debug it quickly becomes more and
    more important. The dashboard provides users with a browser-based tool to
    monitor and debug workflows, both in real time and after execution has
    completed.</p>
<div class="section" title="8.3.1. Workflow Dashboard">
<div class="titlepage"><div><div><h3 class="title">
<a name="idp16240672"></a>8.3.1. Workflow Dashboard</h3></div></div></div>
<p>The Pegasus Workflow Dashboard is bundled with the Pegasus service
      layer. This is available as a separate project on <a class="ulink" href="https://github.com/pegasus-isi/pegasus-service" target="_top">GitHub</a>. The
      pegasus-service-server is developed in Python and uses the Flask
      framework to implement the web interface. Users can then connect to
      this server using a browser to monitor/debug workflows.</p>
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>The workflow dashboard can only monitor workflows which have
          been executed using Pegasus 4.2.0 and above.</p>
</div>
<p>By default, the server is configured to listen on all network
      interfaces on port 5000. A user can view the dashboard on
      http://&lt;IP_ADDRESS&gt;:5000/</p>
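<p>A script can build the dashboard address from these defaults and
      check that the server answers, as in the sketch below. The function
      names are hypothetical; the only facts taken from the text above are
      the default port 5000 and the URL layout. The example address
      192.0.2.10 is a documentation placeholder, not a real server.</p>
<pre class="programlisting">
```python
from urllib.error import URLError
from urllib.request import urlopen


def dashboard_url(host, port=5000):
    """Build the dashboard URL; 5000 is the default port."""
    return "http://{}:{}/".format(host, port)


def dashboard_is_up(host, port=5000, timeout=3):
    """Return True if the dashboard home page answers with HTTP 200."""
    try:
        with urlopen(dashboard_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False


print(dashboard_url("192.0.2.10"))  # http://192.0.2.10:5000/
```
</pre>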
<p>By default, the dashboard server can only monitor workflows run by
      the current user, i.e. the user who is running the
      pegasus-service-server.</p>
<p>The Dashboard's home page lists all workflows that have been run
      by the current user. The home page shows the status of each
      workflow, i.e. Running/Successful/Failed. The home page lists only the
      top level workflows (Pegasus supports hierarchical workflows, i.e.
      workflows within a workflow). The rows in the table are color
      coded:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="bold"><strong>Green</strong></span>: indicates workflow
          finished successfully.</p></li>
<li class="listitem"><p><span class="bold"><strong>Red</strong></span>: indicates workflow
          finished with a failure.</p></li>
<li class="listitem"><p><span class="bold"><strong>Blue</strong></span>: indicates a workflow is
          currently running.</p></li>
</ul></div>
<div class="figure">
<a name="idp16250368"></a><p class="title"><b>Figure 8.8. Dashboard Home Page</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/dashboard_home.png" width="100%" alt="Dashboard Home Page"></td></tr></table></div></div>
</div>
<br class="figure-break"><p>To view details specific to a workflow, the user can click on the
      corresponding workflow label. The workflow details page lists workflow
      specific information like workflow label, workflow status, location of
      the submit directory, etc. The details page also displays pie charts
      showing the distribution of jobs based on status.</p>
<p>In addition, the details page displays a tab listing all
      sub-workflows and their statuses. Additional tabs exist which list
      information for all running, failed, and successful jobs.</p>
<p>The information displayed for a job depends on its status. For
      example, the failed jobs tab displays the job name, exit code, and
      links to the available standard output and standard error contents.</p>
<div class="figure">
<a name="idp16255328"></a><p class="title"><b>Figure 8.9. Dashboard Workflow Page</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/dashboard_workflow_details.png" width="100%" alt="Dashboard Workflow Page"></td></tr></table></div></div>
</div>
<br class="figure-break"><p>To view details specific to a job, the user can click on the
      corresponding job label. The job details page lists information
      relevant to a specific job. For example, the page lists information like
      job name, exit code, run time, etc.</p>
<p>The job details page also shows tabs for failed and successful
      task invocations (Pegasus allows users to group multiple smaller tasks
      into a single job, i.e. a job may consist of one or more tasks).</p>
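<p>The relationship between a job and its task invocations can be
      pictured with a small data structure. This is only an illustrative
      sketch: the class and field names are hypothetical and do not reflect
      the dashboard's actual schema, and it assumes that a non-zero exit code
      marks a failed invocation.</p>
<pre class="programlisting">
```python
from dataclasses import dataclass, field


@dataclass
class Invocation:
    task_name: str
    exit_code: int


@dataclass
class Job:
    # A job may consist of one or more task invocations.
    name: str
    invocations: list = field(default_factory=list)

    def failed(self):
        # Assumption: a non-zero exit code marks a failed invocation.
        return [inv for inv in self.invocations if inv.exit_code != 0]

    def successful(self):
        return [inv for inv in self.invocations if inv.exit_code == 0]


# Made-up clustered job with three tasks, one of which failed.
job = Job("merged_findrange", [
    Invocation("findrange_ID0000002", 0),
    Invocation("findrange_ID0000003", 1),
    Invocation("analyze_ID0000004", 0),
])
print([inv.task_name for inv in job.failed()])
print([inv.task_name for inv in job.successful()])
```
</pre>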
<div class="figure">
<a name="idp6627152"></a><p class="title"><b>Figure 8.10. Dashboard Job Description Page</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/dashboard_job_details.png" width="100%" alt="Dashboard Job Description Page"></td></tr></table></div></div>
</div>
<br class="figure-break"><p>The task invocation details page provides task specific
      information like task name, exit code, duration, etc. Task details
      differ from job details in that they are more granular.</p>
<div class="figure">
<a name="idp6630704"></a><p class="title"><b>Figure 8.11. Dashboard Invocation Page</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/dashboard_invocation_details.png" width="100%" alt="Dashboard Invocation Page"></td></tr></table></div></div>
</div>
<br class="figure-break"><p>The dashboard also has web pages for workflow statistics and
      workflow charts, which graphically render the information provided by
      the pegasus-statistics and pegasus-plots commands, respectively.</p>
<p>The Statistics page shows the following statistics.</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>Workflow level statistics</p></li>
<li class="listitem"><p>Job breakdown statistics</p></li>
<li class="listitem"><p>Job specific statistics</p></li>
</ol></div>
<div class="figure">
<a name="idp6638400"></a><p class="title"><b>Figure 8.12. Dashboard Statistics Page</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/dashboard_statistics.png" width="100%" alt="Dashboard Statistics Page"></td></tr></table></div></div>
</div>
<br class="figure-break"><p>The Charts page shows the following charts.</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>Job Distribution by Count/Time</p></li>
<li class="listitem"><p>Time Chart by Job/Invocation</p></li>
<li class="listitem"><p>Workflow Execution Gantt Chart</p></li>
</ol></div>
<p>The chart below shows the invocation distribution by count or
      time.</p>
<div class="figure">
<a name="idp6645984"></a><p class="title"><b>Figure 8.13. Dashboard Plots - Job Distribution</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/dashboard_plots_job_dist.png" width="100%" alt="Dashboard Plots - Job Distribution"></td></tr></table></div></div>
</div>
<br class="figure-break"><p>The time chart below shows the number of jobs/invocations in
      the workflow and their total runtime.</p>
<div class="figure">
<a name="idp6649456"></a><p class="title"><b>Figure 8.14. Dashboard Plots - Time Chart</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/dashboard_plots_time_charts.png" width="100%" alt="Dashboard Plots - Time Chart"></td></tr></table></div></div>
</div>
<br class="figure-break"><p>The workflow Gantt chart lays out the execution of the jobs in the
      workflow over time.</p>
<div class="figure">
<a name="idp6652912"></a><p class="title"><b>Figure 8.15. Dashboard Plots - Workflow Gantt Chart</b></p>
<div class="figure-contents"><div class="mediaobject"><table border="0" summary="manufactured viewport for HTML img" cellspacing="0" cellpadding="0" width="100%"><tr><td><img src="images/dashboard_plots_wf_gantt.png" width="100%" alt="Dashboard Plots - Workflow Gantt Chart"></td></tr></table></div></div>
</div>
<br class="figure-break">
</div>
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="submit_directory.php">Prev</a> </td>
<td width="20%" align="center"> </td>
<td width="40%" align="right"> <a accesskey="n" href="example_workflows.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Chapter 7. Submit Directory Details </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> Chapter 9. Example Workflows</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
