RCA Workspace

Introduction

vuRCA Bot is a comprehensive solution designed to simplify incident detection and root cause analysis. It comprises four distinct layers, each contributing to its functionality.

Overview of vuRCA Bot configuration workflow

vuRCA Bot consists of four main layers, loosely coupled, that together complete the setup.



  1. Data Store: In this layer, the user specifies the configuration details needed to reach the actual data (time series data/metrics) collected from the various touch points, which is inserted and stored here. The data store could be an internal database, a data lake, or in some cases a cloud service.

    Because the system needs to access the data in real time, it requires specific permissions to read the data stores. The access details typically include host information, user credentials, and, if the data is in the cloud, keys and tokens. These details are stored internally in an encrypted file.
  2. Data Model: A data model is an abstraction of each golden signal with its dimensions, transformations, and associated properties. Users define the data model, which may also include domain-level information such as critical thresholds and the upper and lower bounds of the signal, as per business requirements.

  3. Workspace: At a high level, the Workspace provides a logical separation between the various journey graphs. This lets vuRCA Bot provide root cause analysis according to the information specified at the workspace level. Some of the key specifics that can be defined in the workspace are as follows.
    • The role of each signal: leading indicator or user experience signal, operational metric, influencer, etc.
    • Encapsulation of multiple signals into components and applications.
    • Relationships and dependencies among signals. The dependencies capture the strength and direction of the relationship between signals.

  4. Incidents: This layer is the home page for all incidents detected by vuRCA Bot. Incidents across all active workspaces, for the various applications at the org level, are displayed here.
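
The first three layers can be pictured as plain data structures. The following is a minimal sketch for illustration only, not the vuSmartMaps configuration format: every class, field, and value in it (DataStore, DataModel, Dependency, Workspace, credentials_ref, the signal names, and so on) is a hypothetical name chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataStore:
    # Where the metrics live (hypothetical fields).
    name: str
    host: str
    credentials_ref: str  # pointer to an encrypted secret, not the secret itself


@dataclass
class DataModel:
    # One golden signal plus the domain-level limits attached to it.
    signal: str
    store: DataStore
    dimensions: list = field(default_factory=list)
    critical_threshold: Optional[float] = None
    lower_bound: Optional[float] = None
    upper_bound: Optional[float] = None

    def breaches_threshold(self, value: float) -> bool:
        # True when a critical threshold is set and the value reaches it.
        return self.critical_threshold is not None and value >= self.critical_threshold


@dataclass
class Dependency:
    # Directed edge: `source` influences `target` with the given strength.
    source: str
    target: str
    strength: float


@dataclass
class Workspace:
    name: str
    roles: dict = field(default_factory=dict)        # signal -> role
    components: dict = field(default_factory=dict)   # component -> list of signals
    dependencies: list = field(default_factory=list)

    def influencers_of(self, signal: str) -> list:
        # Signals with an edge pointing into `signal`, strongest first.
        edges = [d for d in self.dependencies if d.target == signal]
        edges.sort(key=lambda d: d.strength, reverse=True)
        return [d.source for d in edges]


# Example values are invented for illustration.
store = DataStore("metrics-db", "db.example.internal", "secrets/metrics-db")
latency = DataModel("checkout_latency_ms", store, ["region"],
                    critical_threshold=800.0, lower_bound=0.0, upper_bound=5000.0)
ws = Workspace(
    name="payments",
    roles={"checkout_latency_ms": "user experience signal",
           "db_cpu_pct": "operational metric",
           "queue_depth": "influencer"},
    components={"checkout": ["checkout_latency_ms", "queue_depth"]},
    dependencies=[Dependency("queue_depth", "checkout_latency_ms", 0.4),
                  Dependency("db_cpu_pct", "checkout_latency_ms", 0.9)],
)
```

Keeping dependencies as directed, weighted edges mirrors the "strength and direction" wording above; `ws.influencers_of("checkout_latency_ms")` then lists the upstream signals to inspect first.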

Prerequisites

The user must have configured Data Models and Data Store on vuSmartMaps.

Workspaces

This step reads the information provided by the Data Models and maps it to an application or business journey.

vuRCA Bot will use this information to detect incidents and provide the probable root cause details for particular outages and incidents.
We have three main features here:

  1. RCA Bot: It analyzes your data to find the probable root cause of an incident.
  2. Time Series Analysis: It analyzes your data to detect anomalies, provides insights on those anomalies, and helps with auto baselining and finding incident clusters.
  3. Event Correlation: A sub-module that helps customers save time when investigating potential downtimes and failures inside the application.

FAQs

The RCA Workspace enables proactive monitoring and analysis of system metrics, allowing Operations Managers to detect anomalies and potential issues before they escalate into outages.

Data Analysts can leverage the Schema to define and categorize metrics based on their significance, such as lead indicators for business impact or operational indicators for system performance.

You can leverage the RCA Workspace to configure schemas, categorize metrics, and create visual representations of business journeys. By analyzing data from multiple sources and exploring relationships between metrics, you can gain deeper insights into system performance and make data-driven decisions.

Incident Response Teams can benefit from the RCA Workspace by gaining timely access to actionable insights through the Incidents module. By correlating data from various sources and applying advanced algorithms, vuRCA Bot helps Incident Response Teams detect and resolve system incidents promptly, fostering informed decision-making and proactive issue resolution.

Time Series Analysis employs forecasting techniques to identify anomalies in data, enabling proactive issue detection and resolution.
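
As a rough illustration of forecast-based anomaly detection in general, the sketch below forecasts each point as the mean of the preceding window and flags points that deviate from that forecast by more than k standard deviations. This is a deliberately simple stand-in, not the technique vuRCA Bot uses; the function name, the window size, and the multiplier k are all assumptions made for this example.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=5, k=3.0):
    """Return indices that deviate from a rolling-mean forecast by more than k standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        forecast = mean(history)   # naive forecast: average of the last `window` points
        spread = stdev(history)    # variability of the same window
        # With a perfectly flat history the spread is zero, so any change is flagged.
        tolerance = k * spread if spread > 0 else 1e-9
        if abs(values[i] - forecast) > tolerance:
            anomalies.append(i)
    return anomalies


# Invented sample series with one obvious spike at index 6.
series = [100, 102, 99, 101, 100, 103, 250, 101, 100, 102]
flagged = detect_anomalies(series)
```

Note that the spike inflates the spread of the windows that follow it, so nearby points are judged leniently for a while; more robust baselining would account for this.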

The Schema allows you to define metrics and categorize them based on their significance, providing insights into system performance and potential issues.

Storyboards offer insights into anomaly scoring, masking, and text insights on anomalies, empowering you to make informed decisions and prioritize actions.

ML Alert Correlation helps operations teams optimize their time by reducing noise and false positives in alert streams. By correlating events from various sources, it streamlines the investigation process, enabling faster identification and resolution of potential downtimes and failures.

By analyzing multiple alert streams and correlating them based on various factors, ML Alert Correlation reduces alert fatigue and enables engineers to focus on critical issues, thereby enhancing overall system reliability and performance.
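
To make the idea of correlating alert streams concrete, here is a minimal sketch that groups alerts from different sources when they concern the same service and arrive close together in time. It is a simple rule-based illustration, not the ML method the product applies; the Alert fields, the correlate function, and the 120-second window are hypothetical choices for this example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float  # seconds, any consistent clock
    source: str       # which alert stream produced it
    service: str      # what the alert is about
    message: str


def correlate(alerts, window_s=120.0):
    """Group alerts for the same service that arrive within window_s of the previous one."""
    groups = []
    latest_group = {}  # service -> index of its most recent group
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        idx = latest_group.get(alert.service)
        if idx is not None and alert.timestamp - groups[idx][-1].timestamp <= window_s:
            groups[idx].append(alert)       # continues an existing burst
        else:
            groups.append([alert])          # starts a new correlated group
            latest_group[alert.service] = len(groups) - 1
    return groups


# Invented alerts from three streams.
alerts = [
    Alert(0, "apm", "checkout", "latency high"),
    Alert(30, "infra", "checkout", "cpu high"),
    Alert(45, "logs", "search", "error spike"),
    Alert(100, "apm", "checkout", "error rate high"),
    Alert(600, "infra", "checkout", "disk pressure"),
]
groups = correlate(alerts)
```

Five raw alerts collapse into three groups here, which is the noise reduction described above: an engineer reviews one checkout burst instead of three separate notifications.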

ML Alert Correlation empowers system administrators to maintain system uptime by providing a more efficient way to identify and address potential downtimes and failures. By correlating alerts and reducing noise, it enables administrators to proactively manage system health and prevent disruptions, ensuring uninterrupted service delivery.

ML Alert Correlation provides data analysts with valuable insights by correlating alerts from different sources and identifying patterns or trends in alert data. By analyzing correlated alerts, data analysts can uncover hidden insights, identify recurring issues, and optimize system performance, contributing to data-driven decision-making and continuous improvement.

ML Alert Correlation provides business stakeholders with visibility into the impact of incidents on business operations by correlating alerts and identifying critical issues. By analyzing correlated alerts, business stakeholders can assess the severity of incidents, prioritize response efforts, and minimize the impact on business continuity, thereby safeguarding revenue and reputation.
