
Getting Started with ContextStreams

Introduction

ContextStreams, also sometimes referred to as Data Streams, are an integral part of the data flow process within the vuSmartMaps™ platform. ContextStreams is a comprehensive data processing engine that receives data, then processes, enriches, and transforms it before writing it into a data store for the end user’s consumption. It is highly scalable, capable of handling millions of records, and engineered to ensure there are no single points of failure and no data loss.

Data pipelines are constructed from multiple data transformation plugins, such as the Session, Data Manipulate, and ISO 8583 plugins. These pipelines are highly versatile, enabling users to transform and enrich data in various ways to suit their specific needs. The primary function of a data pipeline is to read data from an input stream, process it, and then send it to another output stream or database.

Prerequisites

Ensure data acquisition is functioning correctly and data is being ingested into vuSmartMaps from the target system. This may require verifying that the respective O11ySource is configured correctly.

Understanding ContextStreams

ContextStreams: Transforming Data for Comprehensive Observability

ContextStreams within vuSmartMaps serve as sophisticated orchestrators that not only transport raw data but also transform it with rich context, making it meaningful for the observability journey. The process of contextualization involves the enhancement of raw data by adding metadata, categorizing information, and ensuring it aligns with the specific requirements of observability.

Why Contextualization Matters

Raw data, in its unstructured form, often lacks the necessary context for effective analysis. Contextualization bridges this gap by adding layers of information that provide insights into the who, what, when, where, and why of each data point; for example, associating a transaction with a specific user, timestamp, and geographic location. By contextualizing data, organizations can unlock a deeper understanding of their systems, transactions, and user interactions.

The Journey of Raw Data through ContextStreams

  1. Ingestion: The journey begins with the ingestion of raw data from diverse sources, such as network devices, servers, applications, and more. This data may vary widely in formats, structures, and levels of granularity.
  2. Transformation: As data enters the ContextStream, parsing mechanisms break down complex structures, and standardization processes ensure uniformity. This step is crucial for harmonizing diverse data formats.
  3. Enrichment: Enrichment techniques come into play, adding contextual information to the data. This can include augmenting data with details about the source, applying geolocation information, or correlating data with external databases.
  4. Categorization: ContextStreams categorize data based on predefined rules or machine learning algorithms. This step helps in organizing data into logical groups, simplifying subsequent analysis.
  5. Routing: Once enriched and categorized, data is intelligently routed to the appropriate storage indices/tables, ensuring that it is accessible for specific use cases, whether it be performance observability, security analysis, or business intelligence.
  6. Storage Indices: The processed and enriched data is stored in data storage indices through data connectors. It enables seamless retrieval and analysis for actionable insights on the consumption layer with visualizations, storyboards, alerts, and reports.
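
Taken together, these stages form a simple, ordered pipeline: parse and standardize, add context, categorize, then route to a store. The following minimal Python sketch is purely illustrative; the field names, metadata, and routing rule are invented for the example and do not represent the platform’s actual configuration or code.

```python
# Illustrative only: a conceptual walk-through of the ContextStream stages.
# All field names and rules below are hypothetical examples.
import json
from datetime import datetime, timezone

RAW_EVENT = '{"src_ip": "10.1.2.3", "msg": "TXN_OK", "amount": "2500"}'

def transform(raw: str) -> dict:
    """Parse the raw payload and standardize field types."""
    event = json.loads(raw)
    event["amount"] = float(event["amount"])
    event["@timestamp"] = datetime.now(timezone.utc).isoformat()
    return event

def enrich(event: dict) -> dict:
    """Add contextual metadata about the source system."""
    event["source"] = {"system": "core-banking", "environment": "prod"}
    return event

def categorize(event: dict) -> dict:
    """Tag the event with a logical category based on a simple rule."""
    event["category"] = "transaction" if event["msg"].startswith("TXN") else "other"
    return event

def route(event: dict) -> str:
    """Pick a destination index/table based on the category."""
    return f"observability-{event['category']}"

event = categorize(enrich(transform(RAW_EVENT)))
print(route(event), event)
```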

Examples of Contextualization

  1. Transaction Context: For a banking application, ContextStreams may enrich transaction logs with details such as user identities, transaction types, and timestamps, providing a holistic view of each transaction’s journey.
  2. Geospatial Context: In a network observability scenario, IP addresses can be enriched with geospatial information, enabling organizations to visualize network traffic patterns across regions.
  3. Service Correlation: For a microservices architecture, ContextStreams may correlate logs from different services, providing a cohesive narrative of complex transactions spanning multiple components.
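
As a concrete illustration of the geospatial example above, the short Python sketch below enriches an event with location details from a static lookup table. The table, IP addresses, and field names are invented for the example; in the platform this enrichment is performed by the GeoIP Plugin backed by an IP-to-location database.

```python
# Hypothetical IP-to-location table; a real deployment would use a GeoIP database.
GEO_LOOKUP = {
    "203.0.113.7": {"country": "IN", "city": "Mumbai"},
    "198.51.100.9": {"country": "SG", "city": "Singapore"},
}

def add_geo_context(event: dict) -> dict:
    """Attach geolocation fields when the client IP is known."""
    geo = GEO_LOOKUP.get(event.get("client_ip"))
    if geo:
        event["geo"] = geo
    return event

print(add_geo_context({"client_ip": "203.0.113.7", "bytes": 512}))
```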

ContextStreams Features and Capabilities

Components of ContextStreams

The data is collected from the target system through Observability Sources. It then undergoes a significant transformation during the data processing phase, thanks to ContextStreams. This phase spans three distinct sections: I/O Streams, Data Pipeline, and DataStore Connectors, each with a unique role in processing data. Additionally, the Flows tab displays the data processing journey.

  1. I/O Streams: The data ingested from the customer environment enters the vuSmartMaps platform through an input stream within the I/O Streams. This initial step is vital for preparing the data and setting the foundation for subsequent transformations. After processing within the I/O Streams and data pipeline, the structured data seamlessly flows into the output stream, ready for further utilization in the platform.
  2. Data Pipeline: Subsequently, the processed data from the Input Stream flows through a series of transformations within the Data Pipeline. This phase is pivotal for data processing, refining raw data into a structured and meaningful format. After processing in the Data Pipeline, the data is directed to an output stream for further utilization.
  3. DataStore Connectors: The role of DataStore Connectors is to facilitate the transfer of processed data to designated storage destinations. These connectors play a crucial role in directing data to various storage solutions such as Elasticsearch or the Hyperscale Data Store.
  4. Flows: The Flows section visually represents the entire data flow journey, acting as a valuable reference to understand how data moves from the Input Stream through the Data Pipeline and DataStore Connectors to its ultimate storage destination. This visual representation is an essential feature for system observability, providing insights into the flow and transformation of data within the platform.

To visualize the journey of data through ContextStreams, consider the following diagrammatic overview:

This diagram provides a visual narrative of how ContextStreams orchestrate the entire journey of raw data, transforming it into a valuable asset for comprehensive observability. Now let’s look into a live data pipeline, showcasing a real-world example of the contextualization of raw logs from a typical Internet and mobile banking application through a sophisticated ensemble of data adapters and parsers.

Debugging Feature: I/O Streams offer a “Capture” option, enabling users to verify and download streamed data, facilitating quick debugging. Data pipelines provide three debug options (Published pipeline, Draft pipeline, and Block level), allowing users to inspect the output at different stages, aiding in effective debugging and diagnosis of data processing workflows.

In the subsequent sections of this user guide, we will explore ContextStreams in more detail, including their configuration, key functionalities, and how they contribute to an enhanced data flow experience.

Supported Data Transformation Plugins

In the data transformation process within vuSmartMaps, a diverse set of plugins empowers users to shape and enrich the data with ContextStreams. These plugins play a pivotal role in tailoring the data flow to specific needs. Explore the following supported plugins to understand their functionalities and discover how they can enhance your data processing experience.

  1. Aggregation Plugin: Aggregates real-time streaming data based on a set of bucketing fields, enabling dynamic data summarization.
  2. Arithmetic Plugin: Used to find the difference between values of successive events.
  3. C24 Plugin: Parses logs from specific formats (RuPay, IMPS formatter, SHC, L7LB) and extracts required information for further processing.
  4. Correlate Plugin: Correlates two independent sources based on specified criteria, facilitating the correlation of related data.
  5. Data Enrichment Plugin: Enriches the incoming stream using static values, enhancing the data with additional context.
  6. Data Manipulate Plugin: Enables manipulation of the incoming stream using Java/JS functions, allowing for dynamic transformations.
  7. Date Time Plugin: Casts, converts, or updates date-time values based on ISO or custom formats and timezones, ensuring consistent date-time handling.
  8. Elapsed Plugin: Tracks the duration of an event, providing insights into the time taken for specific processes.
  9. Event Clone Plugin: Clones events, providing duplication for specific use cases such as parallel processing or backup scenarios.
  10. Event Split Plugin: Splits an event into multiple events based on a field present in the configuration, facilitating event segmentation.
  11. Filter Plugin: Filters the incoming stream based on a valid Java-based condition, allowing users to focus on specific data subsets.
  12. GeoIP Plugin: Adds information about the geographical location of IP addresses, enriching data with location-based insights.
  13. GSub Plugin: Replaces all occurrences of a string in another string based on the input Java regex pattern, aiding in string manipulation.
  14. Grok Plugin: Parses unstructured log data into a structured and queryable format, facilitating efficient log analysis.
  15. ISO 8583 Plugin: Parses an ISO message into different formats, supporting interoperability with ISO 8583-based systems.
  16. KV Plugin: Extracts key-value pairs from a message (or mentioned field) based on a specified delimiter, aiding in data extraction.
  17. Session Plugin: Tracks the progress and status of transactions from logs, providing insights into transactional workflows.
  18. Stream Key Plugin: Changes the Kafka message key of an incoming event, allowing for key-based event processing.
  19. Stream Timestamp Plugin: Parses dates from fields and utilizes them as Kafka timestamps, aiding in time-sensitive event processing.
  20. Truncate Plugin: Allows the truncation of fields longer than a given length, operating on byte values for efficient field length control.
  21. User Agent Plugin: Adds information about the user agent, including name, version, operating system, and device, enhancing user-related data.
  22. XML Plugin: Takes a field containing XML and expands it into an actual data structure, enabling efficient handling of XML data.
  23. Flattening JSON Plugin: Flattens specified or all fields within a JSON message using a customizable delimiter, simplifying nested data structures.

These plugins collectively offer a versatile toolkit for users to shape and transform their ContextStreams according to specific needs and scenarios, enhancing the flexibility and efficiency of data processing within the vuSmartMaps platform. For more specific details on each plugin and its usage, refer to the vuSmartMaps Plugin Documentation.
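
To make the behavior of one of these plugins more tangible, here is a short, self-contained Python sketch of the idea behind the Flattening JSON Plugin: collapsing nested keys into delimiter-joined paths. This is a conceptual illustration only; the plugin’s actual options and output field names may differ.

```python
# Conceptual illustration of JSON flattening with a configurable delimiter.
# The function name and default delimiter are examples, not the plugin's API.
def flatten(obj: dict, delimiter: str = ".", parent_key: str = "") -> dict:
    """Collapse nested dicts into a single level of delimiter-joined keys."""
    flat = {}
    for key, value in obj.items():
        full_key = f"{parent_key}{delimiter}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, delimiter, full_key))
        else:
            flat[full_key] = value
    return flat

nested = {"request": {"method": "POST", "headers": {"host": "bank.example"}}, "status": 200}
print(flatten(nested))
# {'request.method': 'POST', 'request.headers.host': 'bank.example', 'status': 200}
```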

ContextStreams Workflows

vuSmartMaps provides two distinct methods for data processing to cater to diverse user requirements. The first method involves seamless data contextualization using standard O11ySources. The second approach allows users to create ContextStreams based on their unique requirements.

Automated Data Management with Standard O11ySources

Opting for data ingestion through standard O11ySources provides a seamless and automated data contextualization experience. The system takes care of creating the essential components — I/O streams, data pipelines, and DataStore Connectors — specific to the chosen O11ySource. Enabling the O11ySource triggers the automatic creation of corresponding I/O Streams, initialization of the necessary Data Pipeline, and establishment of required DataStore Connectors. This automated workflow ensures efficient organization, transformation, and secure storage of data, eliminating the need for manual intervention. If required, you can enhance the data processing and contextualization logic by modifying the data pipeline configuration.

Refer to the O11ySources user guide to understand the specific workflow for O11ySources. Detailed instructions for managing ContextStreams can be found in subsequent sections of this user guide.

Hyper-configurable ContextStreams Creation

The second approach involves creating a configurable ContextStream tailored to specific requirements, requiring users to meticulously analyze the entire data flow journey and consider all touchpoints. The creation process involves defining the input stream, designing the data pipeline with necessary intermediate blocks for processing (each containing multiple plugins), and establishing an output stream. Additionally, DataStore Connectors are created to forward processed data to storage. Before initiating this creation process, a thorough analysis is crucial to accurately identify the required blocks. Depending on processing requirements, the design of the pipeline, blocks, and plugins should be ready before moving on to creating the ContextStream on the platform. The steps for creating, debugging, and managing the ContextStreams are discussed in detail in subsequent sections of this user guide.
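
Before creating such a ContextStream on the platform, it can help to write the design down. The outline below is a hypothetical sketch expressed as a Python dictionary; the stream, block, and table names are invented, and this is not the platform’s configuration format. It simply captures the artifacts described above: an input stream, a pipeline of blocks with their plugins, an output stream, and a DataStore Connector.

```python
# Hypothetical design outline for a custom ContextStream; names are examples,
# not vuSmartMaps configuration syntax.
contextstream_design = {
    "input_stream": "netbanking-raw-logs",
    "pipeline": {
        "name": "netbanking-transactions",
        "blocks": [
            {"name": "parse", "plugins": ["Grok", "Date Time"]},
            {"name": "enrich", "plugins": ["GeoIP", "Data Enrichment"]},
            {"name": "shape", "plugins": ["Filter", "Flattening JSON"]},
        ],
    },
    "output_stream": "netbanking-enriched",
    "datastore_connector": {"target": "Hyperscale Data Store", "table": "nb_transactions"},
}

for block in contextstream_design["pipeline"]["blocks"]:
    print(block["name"], "->", ", ".join(block["plugins"]))
```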

Accessing ContextStreams UI

The ContextStreams page can be accessed from the platform left navigation menu by navigating to Data Ingestion > ContextStreams.

On the ContextStreams landing page, you can create and configure I/O Streams, data pipelines, and DataStore Connectors using the available options.

The user interface of the ContextStream section is composed of four primary tabs (I/O Streams, Data Pipeline, DataStore Connectors, Flows), each designed to facilitate specific actions and configurations, enhancing your ability to harness the full potential of ContextStream management.

Please refer to the subsequent sections for creating, debugging, and managing the ContextStreams.

FAQs

ContextStreams is a powerful data processing engine within the vuSmartMaps platform that ingests, enriches, and transforms raw data for comprehensive observability. It adds metadata, categorizes information, and enriches raw data, making it more meaningful for analysis, while ensuring scalable, reliable data handling without loss.

ContextStreams offers several debugging options, including previewing I/O Streams and debugging pipelines at the block, draft, and published pipeline levels.

If you’re not seeing any data flow in your I/O Stream:

  • Check Input Source Configuration: Ensure that your input source is correctly configured with the right parameters and endpoint settings. Verify that the data source is actively sending data to the specified endpoint.
  • Debug Pipeline Blocks: If the above steps don’t resolve the issue, use the debugging features to inspect each block in your pipeline. This can help identify if data is being processed but not passing through a specific block.

Investigate potential bottlenecks in your pipeline configuration. Adjust the number of instances or threads per instance to optimize processing. Check for any heavy processing blocks that may need reconfiguration.

Use multiple I/O Streams to ingest data from different sources and configure a data pipeline to merge and transform this data into a unified output stream.

Maintain a logical order of transformations and validate intermediate results using the Preview feature. Ensure each transformation plugin is correctly configured and tested.

Start by checking the configurations of each stage in your data pipeline. Use the Preview feature to inspect the output at each stage, helping you identify where the unexpected data is introduced.

Verify the configuration of each transformation plugin. Use the Debug feature to test and validate the transformations. Ensure that the input data format matches the expected format for the transformations.

The Aggregation Plugin allows you to aggregate real-time streaming data based on a set of bucketing fields such as transaction type, customer ID, or transaction date. This enables dynamic data summarization and helps you create comprehensive daily reports. By configuring the aggregation plugin to bucket transactions by day, you can easily obtain daily summaries and visualize the aggregated data using Storyboards.
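
A minimal Python sketch of the daily bucketing idea (the field names and bucket key are illustrative; the Aggregation Plugin itself is configured in the pipeline, not written as code):

```python
# Illustrative daily aggregation by transaction type; not the plugin's implementation.
from collections import defaultdict

transactions = [
    {"ts": "2024-05-01T09:15:00", "type": "IMPS", "amount": 1200.0},
    {"ts": "2024-05-01T17:40:00", "type": "IMPS", "amount": 800.0},
    {"ts": "2024-05-02T10:05:00", "type": "UPI", "amount": 300.0},
]

daily = defaultdict(lambda: {"count": 0, "total": 0.0})
for txn in transactions:
    bucket = (txn["ts"][:10], txn["type"])  # bucket by day and transaction type
    daily[bucket]["count"] += 1
    daily[bucket]["total"] += txn["amount"]

for (day, txn_type), summary in daily.items():
    print(day, txn_type, summary)
```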

The Elapsed Plugin tracks the duration of events, allowing you to measure the time taken for each transaction from initiation to completion. By setting up the elapsed plugin, you can monitor transaction times in real-time and identify any transactions that exceed the SLA limits. Use Storyboards to visualize the transaction durations and set up alerts for transactions that are taking too long.
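
Conceptually, this pairs a transaction’s start and end events and computes the difference. The Python sketch below illustrates that idea; the event fields and SLA threshold are invented for the example, and the plugin itself is configured in the pipeline rather than written as code.

```python
# Illustrative duration tracking per transaction ID; not the plugin's implementation.
from datetime import datetime

SLA_SECONDS = 5.0  # hypothetical SLA threshold

events = [
    {"txn_id": "T1", "stage": "start", "ts": "2024-05-01T10:00:00"},
    {"txn_id": "T1", "stage": "end", "ts": "2024-05-01T10:00:07"},
]

starts = {}
for ev in events:
    ts = datetime.fromisoformat(ev["ts"])
    if ev["stage"] == "start":
        starts[ev["txn_id"]] = ts
    elif ev["txn_id"] in starts:
        elapsed = (ts - starts[ev["txn_id"]]).total_seconds()
        status = "SLA breached" if elapsed > SLA_SECONDS else "within SLA"
        print(ev["txn_id"], elapsed, "seconds,", status)
```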

The Correlate Plugin allows you to correlate two independent data sources based on specified criteria. By correlating login attempt logs with transaction logs, you can detect patterns such as multiple login attempts followed by large transactions, which may indicate fraud. Configure the correlate plugin to match login attempts with transactions using common fields like user ID or session ID.
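
At its core, this kind of correlation is a join of two event streams on a shared key. The Python sketch below pairs hypothetical login and transaction events on a session ID; the field names and the “multiple failed logins followed by a large transaction” rule are invented for the example, not the Correlate Plugin’s actual configuration.

```python
# Illustrative correlation of login attempts with transactions on session_id;
# not the Correlate Plugin's implementation.
logins = [
    {"session_id": "s42", "user": "alice", "failed_attempts": 4},
    {"session_id": "s43", "user": "bob", "failed_attempts": 0},
]
transactions = [
    {"session_id": "s42", "amount": 95000.0},
    {"session_id": "s43", "amount": 1200.0},
]

logins_by_session = {login["session_id"]: login for login in logins}
for txn in transactions:
    login = logins_by_session.get(txn["session_id"])
    if login and login["failed_attempts"] >= 3 and txn["amount"] > 50000:
        print("Possible fraud:", login["user"], txn)
```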
