<?php  
            include_once( $_SERVER['DOCUMENT_ROOT']."/static/includes/common.inc.php" );
            do_html_header("Documentation");
        ?><div id="content">
<div class="navheader">
<table width="100%" summary="Navigation header"><tr>
<td width="20%" align="left">
<a accesskey="p" href="notifications.php">Prev</a> </td>
<td width="60%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="20%" align="right"> <a accesskey="n" href="execution_environments.php">Next</a>
</td>
</tr></table>
<hr>
</div>
<div class="section" title="6.5. Monitoring Database">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="monitoring"></a>6.5. Monitoring Database</h2></div></div></div>
<div class="toc"><dl>
<dt><span class="section"><a href="monitoring.php#monitoring_pegasus-monitord">6.5.1. pegasus-monitord</a></span></dt>
<dt><span class="section"><a href="monitoring.php#stampede_schema_overview">6.5.2. Overview of the Workflow Database Schema.</a></span></dt>
</dl></div>
<p>Pegasus launches one monitoring daemon, pegasus-monitord, per
    workflow (a single daemon is launched if a user submits a hierarchical
    workflow). pegasus-monitord parses the workflow and job logs in the
    submit directory and populates a database with the results. This chapter gives an
    overview of pegasus-monitord and describes the schema of the runtime
    database.</p>
<div class="section" title="6.5.1. pegasus-monitord">
<div class="titlepage"><div><div><h3 class="title">
<a name="monitoring_pegasus-monitord"></a>6.5.1. pegasus-monitord</h3></div></div></div>
<p><span class="bold"><strong>Pegasus-monitord</strong></span> is used to
      follow workflows, parsing the output of DAGMan's dagman.out file. In
      addition to generating the jobstate.log file, which contains the various
      states that a job goes through during the workflow execution, <span class="bold"><strong>pegasus-monitord</strong></span> can also be used to mine
      information from jobs' submit and output files, and either populate a
      database, or write a file with NetLogger events containing this
      information. <span class="bold"><strong>Pegasus-monitord</strong></span> can also
      send notifications to users in real-time as it parses the workflow
      execution logs.</p>
<p><span class="bold"><strong>Pegasus-monitord</strong></span> is automatically
      invoked by <span class="bold"><strong>pegasus-run</strong></span>, and tracks
      workflows in real-time. By default, it produces the jobstate.log file,
      and a SQLite database, which contains all the information listed in the
      <a class="link" href="monitoring.php#stampede-schema">Stampede schema</a>. When a workflow
      fails and is re-submitted with a rescue DAG, <span class="bold"><strong>pegasus-monitord</strong></span> will automatically pick up
      where it left off and continue to write the jobstate.log file and
      populate the database.</p>
<p>If, after the workflow has already finished, users need to
      re-create the jobstate.log file, or re-populate the database from
      scratch, <span class="bold"><strong>pegasus-monitord</strong></span>'s <span class="bold"><strong>--replay</strong></span> option should be used when running it
      manually.</p>
<div class="section" title="6.5.1.1. Populating to different backend databases">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp43110944"></a>6.5.1.1. Populating to different backend databases</h4></div></div></div>
<p>In addition to SQLite, <span class="bold"><strong>pegasus-monitord</strong></span> supports other types of
        databases, such as MySQL and Postgres. Users will need to install the
        low-level database drivers, and can use the <span class="bold"><strong>--dest</strong></span> command-line option, or the <span class="bold"><strong>pegasus.monitord.output</strong></span> property to select
        where the logs should go.</p>
<p>As an example, the command:</p>
<pre class="programlisting">$ pegasus-monitord -r diamond-0.dag.dagman.out</pre>
<p>will launch <span class="bold"><strong>pegasus-monitord</strong></span> in
        replay mode. In this case, if a jobstate.log file already exists, it
        will be rotated and a new file will be created. It will also
        create/use a SQLite database in the workflow's run directory, with the
        name of diamond-0.stampede.db. If the database already exists, it will
        make sure to remove any references to the current workflow before it
        populates the database. In this case, <span class="bold"><strong>pegasus-monitord</strong></span> will process the workflow
        information from start to finish, including any restarts that may have
        happened.</p>
<p>Users can specify an alternative database for the events, as
        illustrated by the following examples:</p>
<pre class="programlisting">$ pegasus-monitord -r -d mysql://username:userpass@hostname/database_name diamond-0.dag.dagman.out</pre>
<pre class="programlisting">$ pegasus-monitord -r -d sqlite:////tmp/diamond-0.db diamond-0.dag.dagman.out</pre>
<p>In the first example, <span class="bold"><strong>pegasus-monitord</strong></span> will send the data to the
        <span class="bold"><strong>database_name</strong></span> database located at
        server <span class="bold"><strong>hostname</strong></span>, using the <span class="bold"><strong>username</strong></span> and <span class="bold"><strong>userpass</strong></span> provided. In the second example,
        <span class="bold"><strong>pegasus-monitord</strong></span> will store the data
        in the /tmp/diamond-0.db SQLite database.</p>
<div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>For absolute paths, four slashes are required when specifying
          an alternative database path in SQLite.</p>
</div>
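<p>To see why the SQLite example needs four slashes, the following Python
        sketch splits the example URLs with the standard library's
        urllib.parse module. It assumes the common convention in which
        everything after the third slash is the database path; the helper
        name describe_db_url is hypothetical and is not part of
        Pegasus.</p>
<pre class="programlisting">from urllib.parse import urlsplit


def describe_db_url(url):
    """Split a database URL into its parts (illustration only)."""
    parts = urlsplit(url)
    # urlsplit keeps the slash that separates the host from the path,
    # so drop that one character. For sqlite:////tmp/diamond-0.db the
    # remainder still starts with "/", which makes the path absolute.
    database = parts.path[1:]
    return {
        "backend": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "database": database,
    }


print(describe_db_url("mysql://username:userpass@hostname/database_name"))
print(describe_db_url("sqlite:////tmp/diamond-0.db"))
print(describe_db_url("sqlite:///diamond-0.db"))</pre>
<p>Under this convention, three slashes leave a relative path
        (diamond-0.db), while the fourth slash is the leading slash of the
        absolute path /tmp/diamond-0.db.</p>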
<p>Users should also be aware that in all cases, with the exception
        of SQLite, the database should exist before <span class="bold"><strong>pegasus-monitord</strong></span> is run (as it creates all
        needed tables but does not create the database itself).</p>
<p>Finally, the following example:</p>
<pre class="programlisting">$ pegasus-monitord -r --dest diamond-0.bp diamond-0.dag.dagman.out</pre>
<p>sends events to the diamond-0.bp file. (Please note that in
        replay mode, any data in the file will be overwritten.)</p>
<p>One important detail is that while processing a workflow,
        <span class="bold"><strong>pegasus-monitord</strong></span> will automatically
        detect if/when sub-workflows are initiated, and will automatically
        track those sub-workflows as well. In this case, although <span class="bold"><strong>pegasus-monitord</strong></span> will create a separate
        jobstate.log file in each workflow directory, the database at the
        top-level workflow will contain the information from not only the main
        workflow, but also from all sub-workflows.</p>
</div>
<div class="section" title="6.5.1.2. Monitoring related files in the workflow directory">
<div class="titlepage"><div><div><h4 class="title">
<a name="monitoring-files"></a>6.5.1.2. Monitoring related files in the workflow directory</h4></div></div></div>
<p><span class="bold"><strong>Pegasus-monitord</strong></span> generates a
        number of files in each workflow directory:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="bold"><strong>jobstate.log</strong></span>: contains a
            summary of workflow and job execution.</p></li>
<li class="listitem"><p><span class="bold"><strong>monitord.log</strong></span>: contains any
            log messages generated by <span class="bold"><strong>pegasus-monitord</strong></span>. It is not overwritten
            when it restarts. This file is not generated in replay mode, as
            all log messages from <span class="bold"><strong>pegasus-monitord</strong></span> are output to the console.
            Also, when sub-workflows are involved, only the top-level workflow
            will have this log file. Starting with releases 3.1.1 and 4.0,
            the monitord.log file is rotated if it already exists.</p></li>
<li class="listitem"><p><span class="bold"><strong>monitord.started</strong></span>: contains
            a timestamp indicating when <span class="bold"><strong>pegasus-monitord</strong></span> was started. This file is
            overwritten every time <span class="bold"><strong>pegasus-monitord</strong></span> starts.</p></li>
<li class="listitem"><p><span class="bold"><strong>monitord.done</strong></span>: contains a
            timestamp indicating when <span class="bold"><strong>pegasus-monitord</strong></span> finished. This file is
            overwritten every time <span class="bold"><strong>pegasus-monitord</strong></span> starts.</p></li>
<li class="listitem"><p><span class="bold"><strong>monitord.info</strong></span>: contains
            <span class="bold"><strong>pegasus-monitord</strong></span> state
            information, which allows it to resume processing if a workflow
            does not finish properly and a rescue dag is submitted. This file
            is erased when <span class="bold"><strong>pegasus-monitord</strong></span>
            is executed in replay mode.</p></li>
<li class="listitem"><p><span class="bold"><strong>monitord.recover</strong></span>: contains
            <span class="bold"><strong>pegasus-monitord</strong></span> state
            information that allows it to detect that a previous instance of
            <span class="bold"><strong>pegasus-monitord</strong></span> failed (or was
            killed) midway through parsing a workflow's execution logs. This
            file is only present while <span class="bold"><strong>pegasus-monitord</strong></span> is running, as it is
            deleted when it ends and the <span class="bold"><strong>monitord.info</strong></span> file is generated.</p></li>
<li class="listitem"><p><span class="bold"><strong>monitord.subwf.db</strong></span>: contains
            information that helps <span class="bold"><strong>pegasus-monitord</strong></span> track when
            sub-workflows fail and are re-planned/re-tried. It is overwritten
            when <span class="bold"><strong>pegasus-monitord</strong></span> is started
            in replay mode.</p></li>
<li class="listitem"><p><span class="bold"><strong>monitord-notifications.log</strong></span>:
            contains the log file for notification-related messages. Normally,
            this file only includes logs for failed notifications, but can be
            populated with all notification information when <span class="bold"><strong>pegasus-monitord</strong></span> is run in verbose mode via
            the <span class="bold"><strong>-v</strong></span> command-line
            option.</p></li>
</ul></div>
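<p>As a rough illustration of how these files can be used, the Python
        sketch below reports which of them exist in a workflow directory.
        The file names come from the list above; the function names are
        hypothetical, and the check on monitord.recover relies only on the
        statement that this file is present while <span class="bold"><strong>pegasus-monitord</strong></span> is running or after it
        was interrupted.</p>
<pre class="programlisting">from pathlib import Path

# File names taken from the list above.
MONITORD_FILES = [
    "jobstate.log",
    "monitord.log",
    "monitord.started",
    "monitord.done",
    "monitord.info",
    "monitord.recover",
    "monitord.subwf.db",
    "monitord-notifications.log",
]


def monitord_file_report(workflow_dir):
    """Map each monitoring file name to True if it exists in workflow_dir."""
    base = Path(workflow_dir)
    return {name: (base / name).is_file() for name in MONITORD_FILES}


def looks_running_or_interrupted(workflow_dir):
    """monitord.recover exists only while monitord runs or after it was killed."""
    return monitord_file_report(workflow_dir)["monitord.recover"]


# Report on the current directory.
for name, present in monitord_file_report(".").items():
    print(name, "present" if present else "missing")</pre>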
</div>
</div>
<div class="section" title="6.5.2. Overview of the Workflow Database Schema.">
<div class="titlepage"><div><div><h3 class="title">
<a name="stampede_schema_overview"></a>6.5.2. Overview of the Workflow Database Schema.</h3></div></div></div>
<p>Pegasus takes in a DAX, which is composed of tasks, and plans
      it into a Condor DAG (the executable workflow) that consists of jobs. When
      clustering is used, multiple tasks in the DAX can be captured in a
      single job in the executable workflow. When DAGMan executes a job, a job
      instance is populated. Job instances capture information as seen by
      DAGMan. If DAGMan retries a job after detecting a failure, a new job
      instance is populated. When DAGMan finds that a job instance has finished,
      an invocation is associated with the job instance. For a clustered job,
      multiple invocations will be associated with a single job instance. If a
      pre script or post script is associated with a job instance, then
      invocations are also populated in the database for the corresponding job
      instance.</p>
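<p>The relationships described above can be pictured with a small
      Python sketch. The class and field names are illustrative assumptions
      only and do not reproduce the actual table or column names of the
      schema; the example job and task names are made up.</p>
<pre class="programlisting">from dataclasses import dataclass, field


@dataclass
class Invocation:
    name: str      # task, pre script or post script that was run
    exitcode: int


@dataclass
class JobInstance:
    attempt: int   # one job instance per DAGMan try
    invocations: list = field(default_factory=list)


@dataclass
class Job:
    name: str
    task_names: list   # DAX tasks captured in this job
    instances: list = field(default_factory=list)


# A clustered job that wraps two DAX tasks and needed one retry.
job = Job(name="merge_cluster_1", task_names=["taskA", "taskB"])
first = JobInstance(attempt=1,
                    invocations=[Invocation("taskA", 0), Invocation("taskB", 1)])
retry = JobInstance(attempt=2,
                    invocations=[Invocation("taskA", 0), Invocation("taskB", 0)])
job.instances.extend([first, retry])

total_invocations = sum(len(inst.invocations) for inst in job.instances)
print(len(job.instances), "job instances,", total_invocations, "invocations")</pre>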
<p>The current schema version is <span class="bold"><strong>4.0</strong></span>,
      which is stored in the schema_info table.</p>
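<p>For a SQLite workflow database, the contents of the schema_info
      table can be inspected with Python's built-in sqlite3 module. The
      sketch below is a minimal example that makes no assumption about the
      table's column names (it selects every column); the function name
      read_schema_info is hypothetical.</p>
<pre class="programlisting">import sqlite3


def read_schema_info(db_path):
    """Return every row of the schema_info table as a list of dicts."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite3.Row lets each row be converted to a column-name mapping.
        conn.row_factory = sqlite3.Row
        return [dict(row) for row in conn.execute("SELECT * FROM schema_info")]
    finally:
        conn.close()</pre>
<p>For example, calling read_schema_info("diamond-0.stampede.db") in a
      workflow's run directory would return the stored rows, from which the
      schema version can be read.</p>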
<div class="figure">
<a name="idp43162624"></a><p class="title"><b>Figure 6.16. Workflow Database Schema</b></p>
<div class="figure-contents"><div class="mediaobject"><img src="images/stampede-schema-small.png" alt="Workflow Database Schema"></div></div>
</div>
<br class="figure-break"><div class="section" title="6.5.2.1. Stampede Schema Upgrade Tool">
<div class="titlepage"><div><div><h4 class="title">
<a name="schema_upgrade_tool"></a>6.5.2.1. Stampede Schema Upgrade Tool</h4></div></div></div>
<p>Starting with Pegasus 4.x, the monitoring and statistics database
        schema has changed. If you want to use pegasus-statistics,
        pegasus-analyzer and pegasus-plots against a 3.x database, you will
        need to upgrade the schema first using the schema upgrade tool,
        /usr/share/pegasus/sql/schema_tool.py or
        /path/to/pegasus-4.x/share/pegasus/sql/schema_tool.py.</p>
<p>Upgrading the schema is required for people who use a MySQL
        database to store their monitoring information, if it was set up with
        the 3.x monitoring tools.</p>
<p>If your setup uses the default SQLite database, new
        databases created by Pegasus 4.x runs automatically have the
        correct schema. In this case, you only need to upgrade the SQLite
        databases from older runs if you wish to query them with the newer
        clients.</p>
<p>To upgrade the database:</p>
<pre class="programlisting">For SQLite Database

<span class="bold"><strong>cd /to/the/workflow/directory/with/3.x.monitord.db</strong></span>

Check the db version<span class="bold"><strong>

/usr/share/pegasus/sql/schema_tool.py -c connString=sqlite:////to/the/workflow/directory/with/workflow.stampede.db</strong></span>
2012-02-29T01:29:43.330476Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.init |
2012-02-29T01:29:43.330708Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.check_schema.start |
2012-02-29T01:29:43.348995Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.check_schema
                                   | Current version set to: 3.1.
2012-02-29T01:29:43.349133Z ERROR  netlogger.analysis.schema.schema_check.SchemaCheck.check_schema
                                   | Schema version 3.1 found - expecting 4.0 - database admin will need to run upgrade tool.


Convert the Database to be version 4.x compliant<span class="bold"><strong>

/usr/share/pegasus/sql/schema_tool.py -u connString=sqlite:////to/the/workflow/directory/with/workflow.stampede.db
</strong></span>2012-02-29T01:35:35.046317Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.init |
2012-02-29T01:35:35.046554Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.check_schema.start |
2012-02-29T01:35:35.064762Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.check_schema
                                  | Current version set to: 3.1.
2012-02-29T01:35:35.064902Z ERROR  netlogger.analysis.schema.schema_check.SchemaCheck.check_schema
                                  | Schema version 3.1 found - expecting 4.0 - database admin will need to run upgrade tool.
2012-02-29T01:35:35.065001Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.upgrade_to_4_0
                                  | Upgrading to schema version 4.0.

Verify if the database has been converted to Version 4.x<span class="bold"><strong>

/usr/share/pegasus/sql/schema_tool.py -c connString=sqlite:////to/the/workflow/directory/with/workflow.stampede.db</strong></span>
2012-02-29T01:39:17.218902Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.init |
2012-02-29T01:39:17.219141Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.check_schema.start |
2012-02-29T01:39:17.237492Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.check_schema | Current version set to: 4.0.
2012-02-29T01:39:17.237624Z INFO   netlogger.analysis.schema.schema_check.SchemaCheck.check_schema | Schema up to date.

For upgrading a MySQL database the steps remain the same. The only thing that changes is the connection string to the database.
E.g.<span class="bold"><strong>

/usr/share/pegasus/sql/schema_tool.py -u connString=mysql://username:password@server:port/dbname

</strong></span></pre>
<p>After the database has been upgraded, you can use either 3.x or
        4.x clients to query the database with <span class="bold"><strong>pegasus-statistics</strong></span>, as well as <span class="bold"><strong>pegasus-plots</strong></span> and <span class="bold"><strong>pegasus-analyzer</strong></span>.</p>
</div>
<div class="section" title="6.5.2.2. Storing of Exitcode in the database">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp33330480"></a>6.5.2.2. Storing of Exitcode in the database</h4></div></div></div>
<p>Kickstart records capture the raw status in addition to the exitcode.
        The exitcode is derived from the raw status. Starting with the Pegasus
        4.0 release, all exitcode columns (i.e., the invocation and job instance
        table columns) are stored with the raw status by pegasus-monitord. If
        an exitcode is encountered while parsing the DAGMan log files, the
        value is converted to the corresponding raw status before it is
        stored. All user tools, pegasus-analyzer and pegasus-statistics, then
        convert the raw status to an exitcode when retrieving it from the
        database.</p>
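<p>The relationship between the two values can be sketched in Python
        as follows. This is a minimal illustration that assumes the raw
        status uses the usual Linux wait() encoding (terminating signal in
        the low seven bits, exit code in the next byte); it is not the
        conversion code used by the Pegasus tools. Reporting a signal as a
        negative number mirrors Python's own os.waitstatus_to_exitcode and
        is a choice made for this example.</p>
<pre class="programlisting">def raw_status_to_exitcode(raw_status):
    """Convert a wait()-style raw status to an exit code.

    Assumes the usual Linux encoding: the low 7 bits hold the number of
    the terminating signal, the next byte holds the exit code.
    """
    signal_number = raw_status % 128
    if signal_number != 0:
        # Process was killed by a signal; report it as a negative number.
        return -signal_number
    return (raw_status // 256) % 256


def exitcode_to_raw_status(exitcode):
    """Inverse for a normal exit: place the exit code in the second byte."""
    return exitcode * 256


for raw in (0, 256, 512, 9):
    print(raw, raw_status_to_exitcode(raw))</pre>
<p>Under these assumptions, an exitcode of 1 found in a log file would
        be stored as the raw status 256 and converted back to 1 on
        retrieval.</p>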
</div>
<div class="section" title="6.5.2.3. Multiplier Factor">
<div class="titlepage"><div><div><h4 class="title">
<a name="idp33332432"></a>6.5.2.3. Multiplier Factor</h4></div></div></div>
<p>Starting with the 4.0 release, there is a multiplier factor
        associated with the jobs in the job_instance table. It defaults to
        one, unless the user associates a Pegasus profile key named <span class="bold"><strong>cores</strong></span> with the job in the DAX. The factor can
        be used to get more accurate statistics for jobs that run on
        multiple processors/cores or MPI jobs.</p>
<p>The multiplier factor is used by pegasus-statistics to compute the
        following metrics:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p>In the summary, the workflow cumulative job wall time</p></li>
<li class="listitem"><p>In the summary, the cumulative job wall time as seen from
            the submit side</p></li>
<li class="listitem"><p>In the jobs file, the multiplier factor is listed along with
            the multiplied kickstart time.</p></li>
<li class="listitem"><p>In the breakdown file, where statistics are listed per
            transformation, the mean, min, max and average values take into
            account the multiplier factor.</p></li>
</ul></div>
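<p>The effect of the multiplier factor on a cumulative wall time can be
        sketched in a few lines of Python. The job names and timings are
        made-up values, and the calculation assumes that each wall time is
        simply multiplied by its factor before summing; it is not taken
        from the pegasus-statistics source.</p>
<pre class="programlisting"># (job name, wall time in seconds, multiplier factor); values are made up.
job_instances = [
    ("preprocess", 120.0, 1),    # default factor of one
    ("mpi_solver", 300.0, 16),   # job with the cores profile key set to 16
]


def cumulative_wall_time(instances):
    """Sum the wall times, weighting each one by its multiplier factor."""
    return sum(seconds * factor for _name, seconds, factor in instances)


def unweighted_wall_time(instances):
    """Plain sum of wall times, ignoring the multiplier factor."""
    return sum(seconds for _name, seconds, _factor in instances)


print("weighted:  ", cumulative_wall_time(job_instances))
print("unweighted:", unweighted_wall_time(job_instances))</pre>
<p>In this example the 16-core job dominates the weighted total, which
        is the intended effect for jobs that occupy several cores.</p>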
</div>
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left">
<a accesskey="p" href="notifications.php">Prev</a> </td>
<td width="20%" align="center"><a accesskey="u" href="monitoring_debugging_stats.php">Up</a></td>
<td width="40%" align="right"> <a accesskey="n" href="execution_environments.php">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top">6.4. Notifications </td>
<td width="20%" align="center"><a accesskey="h" href="index.php">Table of Contents</a></td>
<td width="40%" align="right" valign="top"> Chapter 7. Execution Environments</td>
</tr>
</table>
</div>
</div><?php  
            do_html_footer();
        ?>
