Event Correlation

Event Correlation is a sub-module that helps users save time when investigating potential downtime and failures in the application.

The correlation module analyzes alert streams from many different sources, correlates them using factors such as data and domain, and reduces noise.

This minimizes false positives and reduces event/alert fatigue, which helps operators and their teams improve MTTR.

Add Workspace

Click vuCoreML in the Toggle Menu and navigate to the RCA Workspace section.

The Workspaces page lists previously configured Workspaces. Click the + icon to create a new Workspace.

You can now configure the Workspace, which comprises three major sections:

  1. Basic Details
  2. Event Sources
  3. Settings

Basic Details

Enter the Workspace Name and Description, and select Event Correlation as the Category.

Click on Save to create the Workspace.

Event Sources

Once the Workspace is created, you are directed to the Event Sources page, where you can add events by selecting an Event Data Model.

  • Select Event Data Model: Choose a Data Model from the drop-down.
  • Enter Description (Optional): Provide an optional description.
  • + Add Events: Add multiple events by clicking the + Add Events button.
  • Delete: Click the Delete button to delete an Event Data Model.

Click on Save to move to the next step.

Settings

After you configure Event Sources, you are directed to the Settings page.

It has four major sections:

  • General Configuration
  • Hyperparameter Configuration
  • First Time Training
  • Schedule

General Configuration

This first section lets you configure notifications. It supports Email and WhatsApp notifications.

Email: Enter the recipient's email address. Separate multiple addresses with commas. You can also add an email group to notify a set of people.

WhatsApp: Enter the recipient's mobile number. You can also add a WhatsApp group to notify a set of people.

Hyperparameter Configuration

This is the second section of the Settings page. It has two main segments: Training and Inference.

  • Training: The training phase learns from data to create and adapt the rules that determine which events/alerts are correlated. The hyperparameters listed here can be tuned and directly affect the rules the algorithm creates.

    • Window Length: The length, in days, of the window within which events are considered for learning clusters. Defaults to 1 day. Training runs on a schedule.
    • Overlap Length: The length, in days, of the overlap between the event data of two consecutive days. Defaults to 0.5 days. Overlap helps reduce end-of-day cut-off effects.
    • Filter Noisy Nodes: If enabled, events from nodes that frequently generate non-meaningful events are filtered out before clustering and marked as noisy.
    • Scale Affinity: If enabled, a 0-1 scaling is applied to the affinity matrix that the correlation engine estimates internally. Scaling favors the formation of larger clusters at the cost of some information about how close graph nodes are to each other. Enable this if you often see small, non-meaningful correlated events.

  • Inference: The inference phase uses the rules created during training to correlate events in real time. The hyperparameters listed here can be tuned and directly affect the correlated events/alerts that are created.

    • Cluster Confidence Threshold: Clustering rules with confidence below this threshold are deprioritized when generating correlated events. Defaults to 40%, which is a good default. Higher confidence is reached only when the correlation engine is enhanced with feedback, so setting a high value here may result in few or no correlated events being created.
    • Detect Noisy Nodes: Select this option to detect nodes that frequently generate non-meaningful events.
    • Cluster Noisy Nodes: Select this option to cluster events from nodes that frequently generate non-meaningful events.
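To make the interaction between Window Length and Overlap Length concrete, here is a minimal Python sketch that splits a training range into overlapping windows. The `training_windows` helper is hypothetical: it only illustrates the arithmetic under the documented defaults (1-day window, 0.5-day overlap), it is not the correlation engine's actual code, and the dates are illustrative.

```python
from datetime import datetime, timedelta


def training_windows(start, end, window_days=1.0, overlap_days=0.5):
    """Split the range [start, end] into windows of `window_days` that
    overlap by `overlap_days`.

    Hypothetical helper for illustration only; not part of vuCoreML.
    """
    if not 0 <= overlap_days < window_days:
        raise ValueError("overlap must be non-negative and shorter than the window")
    window = timedelta(days=window_days)
    # Each window starts (window - overlap) after the previous one.
    step = timedelta(days=window_days - overlap_days)
    windows = []
    current = start
    while current < end:
        windows.append((current, min(current + window, end)))
        if current + window >= end:
            break  # this window already reaches the end of the range
        current += step
    return windows


# Two days of data with the documented defaults (1-day window, 0.5-day overlap).
for win_start, win_end in training_windows(datetime(2024, 1, 1), datetime(2024, 1, 3)):
    print(win_start, "->", win_end)
```

With these defaults the sketch produces three windows, and the middle one runs from noon to noon, so events shortly before and after midnight land in the same window instead of being split at the day boundary.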

Note: New users can leave the default settings unchanged.
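The Scale Affinity and Cluster Confidence Threshold options can be pictured with the following Python sketch. It assumes that the 0-1 scaling is a min-max normalization over all entries of the affinity matrix and that each clustering rule carries a confidence value between 0 and 1. The helper names `scale_affinity` and `split_by_confidence`, and the sample rules, are hypothetical and do not reflect the correlation engine's internal implementation.

```python
def scale_affinity(matrix):
    """Min-max scale every entry of an affinity matrix into [0, 1].

    Assumption: the engine's "0-1 scaling" behaves like min-max normalization.
    """
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant matrix: no spread to scale, so map everything to 0.0.
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


def split_by_confidence(rules, threshold=0.40):
    """Separate rules at or above the threshold from those below it.

    Rules below the threshold are deprioritized (kept, but ranked last),
    mirroring the Cluster Confidence Threshold description. Hypothetical helper.
    """
    preferred = [r for r in rules if r["confidence"] >= threshold]
    deprioritized = [r for r in rules if r["confidence"] < threshold]
    return preferred, deprioritized


print(scale_affinity([[2, 4], [4, 6]]))  # [[0.0, 0.5], [0.5, 1.0]]

# Hypothetical rules with illustrative confidence values.
rules = [
    {"name": "db-latency", "confidence": 0.72},
    {"name": "disk-full", "confidence": 0.35},
]
preferred, deprioritized = split_by_confidence(rules)
print([r["name"] for r in preferred])      # ['db-latency']
print([r["name"] for r in deprioritized])  # ['disk-full']
```

With the default 40% threshold, the 0.35-confidence rule is deprioritized rather than deleted; raising the threshold moves more rules into the deprioritized list, which is why a high value can leave few or no correlated events.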

First Time Training

This is the third section of the Settings page. Choose the start and end time of the data to be used for training the algorithm.

Select a larger data range for the first run so that the algorithm has enough data to learn the rules.

Note: The larger the training data range, the longer the algorithm takes to learn the rules.

Schedule

This is the last section of the Settings page. The event correlation algorithm runs on a schedule; use this section to adjust how frequently the training and inference jobs run.

Save

Click on the Save button to complete the Event Correlation configuration.

Alert Console

The resulting correlated and suppressed alerts/events are visible on the vuSmartMaps Alerts Console.

Further Reading:

  1. RCA Bot
  2. Offline RCA Bot
  3. Time Series Analysis