Stat Aggregation : Stat PDI Jobs/Transformations - Example Configuration
  


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<kettle-jobs>
<job filename="qmatic/visit-transaction.ktr" max-events-before-processing="10000" batch-limit="10000" max-time-before-processing="10" >
<event name="VISIT_END" />
<event name="VISIT_END_TRANSACTION" />
<event name="VISIT_REMOVE" />
<event name="END_SERVICE" />
<event name="VISIT_TRANSFER_FROM_QUEUE" />
<event name="VISIT_NOSHOW" />
</job>
<job filename="qmatic/service-point-session.ktr" max-events-before-processing="10000" batch-limit="10000" max-time-before-processing="10" >
<event name="SERVICE_POINT_CLOSE" />
</job>
<job filename="qmatic/staff.ktr" max-events-before-processing="10000" batch-limit="10000" max-time-before-processing="10" >
<event name="USER_SESSION_END" />
</job>
</kettle-jobs>
 
The root element <kettle-jobs> contains all configurations. It holds a list of <job> elements that define which jobs and transformations are configured in the system. A job definition can be either a Kettle job (.kjb file) or a Kettle transformation (.ktr file). Each job definition also contains settings for how often the job should run and, optionally, a list of <event> sub-elements that define which events trigger the job.
The transformation file to process when the event arrives is configured with the filename attribute, which is resolved relative to Orchestra’s conf/stat-jobs folder. Here filename is qmatic/visit-transaction.ktr, which means the file must be in the folder conf/stat-jobs/qmatic. If any configured file is missing at Orchestra startup, Stat will fail with an error message.
Attribute max-events-before-processing controls how many events should pass through the system before the job is executed. If this is set to 1, or left out entirely, the job will execute for every event (this is not advisable for events with a high throughput). All events defined for the job are counted towards this max-events-before-processing setting.
The attribute max-time-before-processing controls the maximum time, in minutes, that the system waits before triggering the job.
The job is executed when the number of events configured by max-events-before-processing has passed through the system, or when the time configured by max-time-before-processing has elapsed, whichever comes first. Each time the job is triggered, the event counter and the timer are reset.
The batch-limit attribute is passed to the job as a parameter and tells the job how many events it should process in one execution.
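Because Stat fails at startup when a configured file is missing, it can be useful to check stat-jobs.xml before restarting Orchestra. The following Python sketch is one possible way to do that. The function name missing_job_files and its is_file parameter are hypothetical helpers for illustration and are not part of Orchestra.

```python
import os
import xml.etree.ElementTree as ET


def missing_job_files(config_xml, stat_jobs_dir, is_file=os.path.isfile):
    """Return the filename attributes of <job> elements whose file is not found.

    config_xml    -- contents of stat-jobs.xml as a string (or bytes)
    stat_jobs_dir -- path to Orchestra's conf/stat-jobs folder
    is_file       -- predicate used to test for existence (replaceable in tests)
    """
    root = ET.fromstring(config_xml)
    missing = []
    # iter("job") visits every <job> element, whatever its nesting depth.
    for job in root.iter("job"):
        filename = job.get("filename")
        if filename and not is_file(os.path.join(stat_jobs_dir, filename)):
            missing.append(filename)
    return missing
```

Calling missing_job_files(xml_text, "conf/stat-jobs") returns an empty list when every referenced .ktr or .kjb file is in place.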

Qmatic and Custom jobs/transformations

There are two separate folders in the conf/stat-jobs directory: qmatic and custom. The qmatic folder contains all internal Orchestra jobs/transformations that Stat requires to function correctly.
These (or their configuration) must not be altered in any way.
Any custom requirements are met by creating custom jobs/transformations and adding them to the custom directory. Custom jobs/transformations are referenced in stat-jobs.xml by prefixing the job filename with the “custom/” folder.
Example:
<job filename="custom/test.ktr"/>
 

Processing events

The following example illustrates how event and job/transformation processing is handled:
<kettle-jobs>
<job filename="qmatic/visit-transaction.ktr" max-events-before-processing="10" batch-limit="1+" max-time-before-processing="10" >
<event name="VISIT_END" />
<event name="VISIT_END_TRANSACTION" />
</job>
<job filename="custom/myJob.ktr" max-events-before-processing="1" max-time-before-processing="10" >
<event name="VISIT_NOSHOW" />
</job>
<job filename="custom/myOtherJob.ktr" max-time-before-processing="10" />
</kettle-jobs>
 
In this example, every time a visit is completed (VISIT_END) or a multi-service visit has a service completed (VISIT_END_TRANSACTION), the counter for visit-transaction.ktr is incremented. When this counter reaches 10 (configured by max-events-before-processing), the job is executed.
If fewer than 10 events (or none) are processed in a 10-minute interval (configured by max-time-before-processing), the job is executed anyway. Each time the job executes, the counter and the timer are reset.
Here is a timing example:
00:00 – system initializes – timer is started for job to run at 00:10.
00:08 – 10 events have been processed, job is started. Counter is reset to 0 and timer is reset to start at 00:18.
00:18 – 5 events have been processed. Job is started by timer, counter is reset to 0 and timer is reset to start at 00:28.
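
The counter and timer behaviour in this timing example can be modelled in a few lines of Python. This is only an illustrative sketch, not Stat's implementation: the class JobTrigger and its method names are hypothetical, and time is represented as whole minutes since system start.

```python
class JobTrigger:
    """Illustrative model of the event counter and timer for one job."""

    def __init__(self, max_events, max_minutes, now=0):
        self.max_events = max_events      # max-events-before-processing
        self.max_minutes = max_minutes    # max-time-before-processing
        self.count = 0
        self.deadline = now + max_minutes

    def _run_job(self, now):
        # Every execution resets both the counter and the timer.
        self.count = 0
        self.deadline = now + self.max_minutes

    def on_event(self, now):
        """Count one event; return True if the job runs because the counter is full."""
        self.count += 1
        if self.count >= self.max_events:
            self._run_job(now)
            return True
        return False

    def on_tick(self, now):
        """Return True if the job runs because the timer has expired."""
        if now >= self.deadline:
            self._run_job(now)
            return True
        return False


trigger = JobTrigger(max_events=10, max_minutes=10)  # 00:00, timer set for 00:10
for _ in range(10):
    ran = trigger.on_event(now=8)                    # tenth event arrives at 00:08
print(ran, trigger.deadline)                         # prints: True 18
```

Feeding five more events and then calling on_tick(now=18) runs the job by timer and moves the deadline to 28, matching the last line of the timing example.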
 
The job custom/myJob.ktr in this example has max-events-before-processing set to 1, meaning it will trigger on all events with name VISIT_NOSHOW. It is not recommended to use this setup with an event that is very common, as it could severely impact performance.
The job custom/myOtherJob.ktr does not have any events configured for it, meaning the job will be run on schedule only. In this configuration, the job will run once every 10 minutes. This type of configuration is suitable when no reasonable event exists that could trigger the job, and certain calculations / actions need to be done periodically.
Jobs that fail due to errors in the job/transformation itself are logged but not processed further. They are stored in the failed_jobs table in the database. For the REST endpoint that replays failed aggregations, see Stat REST API in the connector SDK and the section “Failed jobs”.

Filtering on event type

Certain events are common across multiple domains. One example is the CONNECT event, which is sent both when Queue Agents connect and when devices (printers) and device controllers (gw1745) connect.
To tell these events apart, use the event-type attribute to distinguish between the different event domains.
The defined event-types are:
AGENT_DAEMON
AGENT_JIQL
DEVICE_CONTROLLER
DEVICE
STAFF
SERVICE_POINT
VISIT
APPOINTMENT
An example configuration could be:
 
<event name="CONNECT">
<job filename="custom/device-connect.ktr" event-type="DEVICE" />
<job filename="custom/agent-connect.ktr" event-type="AGENT_JIQL" />
</event>
 
The different event types are used in the following scenarios:
DEVICE: Printer connect and disconnect events. Printer status events (e.g. OUT_OF_PAPER)
DEVICE_CONTROLLER: Device controller (gw1745) connect and disconnect events
AGENT_JIQL: Publish events. End-of-day events sent from Queue Agents. Branch events sent from central (at e.g. Branch update or delete). Queue Agent connect / disconnect events.
AGENT_DAEMON: End-of-day events sent from central. Retire events sent from central (when Queue Agent is deleted). Daemon connect / disconnect events.
STAFF: Staff events (session start / end etc).
SERVICE_POINT: Service point events (service point open / close).
VISIT: Visit events (create, call, end etc).
APPOINTMENT: Appointment events sent from central (create, update, cancel).
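
To make the effect of the event-type attribute concrete, the Python sketch below selects the jobs that would apply to a CONNECT event of a given type, using the example configuration from this section. It is an illustration only: the function jobs_for is hypothetical, and treating a <job> without an event-type attribute as matching every type is an assumption that the text above does not confirm.

```python
import xml.etree.ElementTree as ET

CONFIG = """
<event name="CONNECT">
<job filename="custom/device-connect.ktr" event-type="DEVICE" />
<job filename="custom/agent-connect.ktr" event-type="AGENT_JIQL" />
</event>
"""


def jobs_for(config_xml, event_name, event_type):
    """Return the job filenames configured for this event name and event type."""
    event = ET.fromstring(config_xml)
    if event.get("name") != event_name:
        return []
    return [
        job.get("filename")
        for job in event.iter("job")
        # Assumption: a job without event-type is not filtered at all.
        if job.get("event-type") in (None, event_type)
    ]


print(jobs_for(CONFIG, "CONNECT", "DEVICE"))  # prints: ['custom/device-connect.ktr']
```

A printer connecting therefore triggers only device-connect.ktr, while a Queue Agent connecting (AGENT_JIQL) triggers only agent-connect.ktr.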

Transaction validation

When the attribute validate-transaction-count-before-execution is set to true, a transaction count validation is performed before the job/transformation runs. The validation does the following:
1. Count the number of entries in fact_visit_transaction for the Staff/Service Point transaction in question.
2. Count the number of entries in failed_events for the same transaction.
3. If the sum of these two totals is the same as the number of transactions that the Staff/Service Point has served during the session, the validation succeeds.
If the validation fails (e.g. an event belonging to the transaction has not yet arrived at Stat), the following steps will occur:
1. The job will be rescheduled for execution 10 seconds later (default value, can be configured).
2. The job will be rescheduled as long as the validation fails, up to 5 times before giving up (default value, can be configured).
3. If the validation has failed the maximum number of times, an entry will be added to the failed_jobs table.
The following stat-jobs.xml entry shows how to reconfigure the rescheduling values mentioned above:
<event name="USER_SESSION_END">
<job filename="qmatic/test_user_end_session.ktr" validate-transaction-count-before-execution="true" maximum-validation-failure-retry-count="1" validation-failure-retry-interval-seconds="20"/>
</event>
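
The validation and rescheduling rules in this section can be summarised as a short Python sketch. It is not Stat's code: all function and parameter names are hypothetical, the three counts are passed in rather than read from the database, and the reading that "up to 5 times" means one initial attempt followed by at most five rescheduled attempts is an assumption.

```python
import time


def transaction_count_is_valid(fact_count, failed_count, served_count):
    """Entries in fact_visit_transaction plus failed_events must equal the served total."""
    return fact_count + failed_count == served_count


def run_with_validation(validate, run_job, record_failed_job,
                        max_retries=5, retry_interval_seconds=10,
                        sleep=time.sleep):
    """Run the job once validation succeeds; give up after max_retries reschedules.

    Returns True if the job ran, False if it was recorded as a failed job.
    """
    retries = 0
    while True:
        if validate():
            run_job()
            return True
        if retries >= max_retries:
            # Corresponds to adding an entry to the failed_jobs table.
            record_failed_job()
            return False
        retries += 1
        sleep(retry_interval_seconds)  # reschedule the job for later
```

With the example entry above (retry count 1, interval 20 seconds), this model would validate at most twice, waiting 20 seconds in between.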
 

Failed events

To list events that have failed for some reason:
GET <host:port>/stat/rest/events/listFailedEvents
To process failed events (using the result of the above REST call as input):
POST <host:port>/stat/rest/events/processFailedEvents

Failed jobs

To return a list of failed jobs:
GET <host:port>/stat/rest/kettleJobs/failed
To replay failed jobs, post a list of failed job IDs as input. This call is preferably combined with the REST call above:
POST <host:port>/stat/rest/kettleJobs/replay
 
Example of how this can be used:
GET localhost:8080/stat/rest/kettleJobs/failed
Answer: [1,3,4]
POST localhost:8080/stat/rest/kettleJobs/replay
Body: [1,3,4]
Outcome: Will try to replay jobs 1, 3 and 4.
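
The two calls can be chained in a small script. The Python sketch below uses only the endpoints listed above. The function names, the http:// scheme, the JSON Content-Type header and the absence of authentication are all assumptions, so adjust them to your installation.

```python
import json
import urllib.request


def failed_jobs_request(base_url):
    """Build the GET request that lists failed job IDs."""
    return urllib.request.Request(
        base_url + "/stat/rest/kettleJobs/failed", method="GET")


def replay_request(base_url, job_ids):
    """Build the POST request that replays the given job IDs."""
    return urllib.request.Request(
        base_url + "/stat/rest/kettleJobs/replay",
        data=json.dumps(job_ids).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption
        method="POST",
    )


def replay_all_failed(base_url, opener=urllib.request.urlopen):
    """Fetch the failed job IDs and post them straight back for replay."""
    with opener(failed_jobs_request(base_url)) as response:
        job_ids = json.loads(response.read().decode("utf-8"))
    if job_ids:
        with opener(replay_request(base_url, job_ids)) as response:
            response.read()
    return job_ids
```

For the example above, replay_all_failed("http://localhost:8080") would fetch [1, 3, 4] and post the same list to the replay endpoint.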

Failover Behaviour

If a message cannot be processed, it is saved to the failed_events table together with the information needed to replay the message later.
In the case of a serious system error, e.g. the database cannot be reached, the message cannot be saved to the failed_events table.
In this scenario, the following will happen:
The message listener will stop processing messages for a while (default 5 minutes).
The failed message will be put back on the queue and will be processed later.
If the event(s) in the message have been successfully saved to the database and only the aggregation is missing, the message will be re-sent at most 10 times.
After 10 attempts, the message ends up in a "dead letter" queue, e.g. statVisitEventQueueDeadLetterQueue for visit events.
Messages whose event(s) have not been saved to the relevant event table(s) are not put on the dead letter queue. However, if there are multiple failures where a new instance of the message cannot be re-sent to the queue, the message might eventually end up in the dead letter queue.
Settings that control this behaviour are defined in stat.conf, except for the limit of 10 consecutive retries, which is set in standalone-full.xml (as it is a HornetQ setting).