Data Enrichment Techniques

Introduction

Data enrichment adds relevant contextual information to your raw data, giving it more context and depth and making it more valuable for analysis.

Data enrichment requires a unique key that exists in both the incoming raw data and the enrichment table, which is usually created manually. This key is the link that connects the two. For example, enrichment can map IP addresses to branch names or postal codes to geographical information; in these cases, the IP address or postal code acts as the unique key. If the unique key in the raw data is also present in the enrichment table, the raw data is enriched with the values stored against that key. If there is no match, the data remains unchanged and is not enriched, unless default values are defined in the enrichment plugin configuration. Having the unique key in both places is therefore essential.

Data enrichment is versatile. It can, for instance, turn cryptic codes into easily understandable names. The enriched information provides additional context that improves the understanding and analysis of the data, helping users gain deeper insights and make better use of it.

Prerequisites

Ensure that ContextStream is configured so that data enrichment can be applied through plugins in the data pipeline.

Data Enrichment Example

A real-world example of data enrichment is enhancing geolocation data using an Enrichment Table. By linking a pincode (the unique key) with values such as address, city name, and geo-IP, users can add geographic context to their data. When you provide a key (i.e., a pincode), the enrichment process fetches all the associated contextual data (i.e., address, city name, and geo-IP). This contextual data can then be used to drive a dynamic geographic map. A sample of geolocation data demonstrating the relationship between pincodes and geographical information is shown in the image below:

Here’s how data enrichment is done:

  1. Data Setup: The pincode, along with its corresponding data (like address, city name, and geo-IP), is stored in an Enrichment Table. Think of it as a lookup table where the pincode is the key, and the other details are the values.
  2. Easy Enrichment: When a pincode arrives in the input stream, the enrichment step automatically fetches the associated information, such as address, city name, and geo-IP (latitude and longitude). This enriches your data with meaningful context.
  3. Map Magic: This enriched data can be used to create dynamic geographic maps in the user interface. Imagine pincode inputs coming from a data source, and the map showing the exact locations.

In essence, the key-value pair system enriches your data by adding contextual details based on the provided key (in this case, the pincode). It’s like giving your data a power-up, and it opens up exciting possibilities, like creating interactive maps with ease.
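The key-value lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how vuSmartMaps implements it: the table contents, the field names (`pincode`, `city`, `lat`, `lon`), and the `enrich` function are all hypothetical placeholders.

```python
from typing import Any, Dict

# Hypothetical enrichment table: pincode (key) -> contextual values.
# All entries are made-up placeholders, not real geographic data.
ENRICHMENT_TABLE: Dict[str, Dict[str, Any]] = {
    "100001": {"address": "1 Example Street", "city": "City A", "lat": 10.0, "lon": 20.0},
    "100002": {"address": "2 Sample Road", "city": "City B", "lat": 11.5, "lon": 21.5},
}


def enrich(record: Dict[str, Any],
           table: Dict[str, Dict[str, Any]],
           key_field: str = "pincode") -> Dict[str, Any]:
    """Return a copy of the record with the table values merged in.

    If the key is missing from the table, the record is returned unchanged.
    """
    match = table.get(str(record.get(key_field)))
    if match is None:
        return dict(record)
    return {**record, **match}


if __name__ == "__main__":
    print(enrich({"pincode": "100001", "order_id": 7}, ENRICHMENT_TABLE))
    print(enrich({"pincode": "999999", "order_id": 8}, ENRICHMENT_TABLE))
```

The enriched record carries the latitude and longitude that a map view would need, while the unmatched record passes through as it arrived.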

Workflow for Data Enrichment

Performing data enrichment in vuSmartMaps™ is made simple with a clear step-by-step process. Let’s break it down:

  1. Enrichment Configuration:
    Enrichment configuration is the process of creating a lookup table that stores the keys together with their corresponding values.
    • Creating Enrichment Table
      Begin by establishing a new enrichment table. This is where you’ll store the key-value pairs that will enhance your data. Clearly specify the keys (identifiers) and their corresponding values (information) that are relevant to your data. These keys and values act like a schema for your data enrichment.
    • Updating Data in Enrichment Table
      You can update data in the enrichment table either by manual entry or by uploading a spreadsheet. Spreadsheet upload is the more convenient option for larger updates or when the data already exists in a spreadsheet.
  2. Using Enrichment in ContextStream:
    After the enrichment configuration, you can incorporate the enrichment feature into your data pipeline. This ensures that the data flowing through the pipeline undergoes the enrichment process.
    • Creation of Input-Output Streams:
      To enable the enrichment pipeline, create the data streams that act as channels for data flow, specifying an input stream and an output stream for the enrichment process.
    • Creation of Pipeline:
      The next step involves configuring the pipeline within the application. This includes defining the transformation details from the input data to the output data and incorporating the enrichment step into the pipeline flow.

The visual representation below illustrates the enrichment pipeline. Input data is transformed using configured enrichment settings, and the enriched output data is stored in an output stream. This allows users to enhance their data with additional context and insights for improved analysis and decision-making.
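As a rough mental model of this flow, the sketch below treats the input stream as an iterable of records, the enrichment step as a generator, and the output stream as a list. The names `enrichment_step` and `run_pipeline` and the sample IP-to-branch table are assumptions made for illustration; actual streams and pipelines are configured in the product UI rather than written as code like this.

```python
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]
Table = Dict[Any, Dict[str, Any]]


def enrichment_step(records: Iterable[Record],
                    table: Table,
                    key_field: str) -> Iterator[Record]:
    """Enrich each record as it flows through; unmatched records pass unchanged."""
    for record in records:
        extra = table.get(record.get(key_field), {})
        yield {**record, **extra}


def run_pipeline(input_stream: Iterable[Record],
                 table: Table,
                 key_field: str) -> List[Record]:
    """Read from the input stream, apply enrichment, collect the output stream."""
    output_stream: List[Record] = []
    for enriched in enrichment_step(input_stream, table, key_field):
        output_stream.append(enriched)
    return output_stream


if __name__ == "__main__":
    # Hypothetical table mapping IP addresses to branch names.
    branches = {"10.0.0.1": {"branch": "Branch A"}}
    events = [{"ip": "10.0.0.1"}, {"ip": "10.0.0.9"}]
    for row in run_pipeline(events, branches, "ip"):
        print(row)
```

Because the enrichment step is a generator, each record is processed as it arrives, which mirrors how a streaming pipeline handles data one event at a time.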

This simplified workflow guide ensures that you can easily enrich your data, making it more valuable and insightful for your analyses and decision-making processes.

FAQs

To set up an enrichment table, navigate to Data Ingestion > Data Enrichment. Click the ‘+’ icon to add a new table, specify the keys and values, and then save it.

Yes, you can update the enrichment table data manually by entering records one at a time through the Enrichment Configuration Details icon. For larger datasets, the spreadsheet upload feature is recommended.

Prepare a spreadsheet with the required keys and values, ensuring the sheet name matches the enrichment table name. Navigate to the Data Enrichment page and use the Import button to upload your spreadsheet.

If the unique key in your raw data is not found in the enrichment table, the handling depends on your enrichment plugin configuration:

  • Default Values: If you have defined default values in the enrichment plugin configuration, these values will be used to enrich the data when the unique key is not found.
  • Unchanged Data: If no default values are specified, the raw data will remain unchanged and will not be enriched.
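The two outcomes above can be illustrated with a small Python sketch. The `enrich_with_defaults` function and its `defaults` parameter are hypothetical names used only to show the fallback logic; they are not the enrichment plugin's actual configuration options.

```python
from typing import Any, Dict, Optional


def enrich_with_defaults(record: Dict[str, Any],
                         table: Dict[Any, Dict[str, Any]],
                         key_field: str,
                         defaults: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Enrich from the table; on a miss, apply defaults if any were given."""
    match = table.get(record.get(key_field))
    if match is not None:
        return {**record, **match}      # key found: normal enrichment
    if defaults:
        return {**record, **defaults}   # key missing: fall back to default values
    return dict(record)                 # key missing, no defaults: unchanged


if __name__ == "__main__":
    table = {"100001": {"city": "City A"}}  # placeholder data
    print(enrich_with_defaults({"pincode": "100001"}, table, "pincode", {"city": "Unknown"}))
    print(enrich_with_defaults({"pincode": "1"}, table, "pincode", {"city": "Unknown"}))
    print(enrich_with_defaults({"pincode": "1"}, table, "pincode"))
```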

Yes, vuSmartMaps supports multi-key enrichment. The data will be enriched only when both keys match the entries in the enrichment table.
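One way to picture multi-key enrichment is a lookup on a composite key, where a match requires every key field to agree. The sketch below assumes a table keyed by tuples and uses hypothetical field names (`branch`, `channel`, `owner`); it illustrates the matching rule only and does not describe the product's internal storage.

```python
from typing import Any, Dict, Sequence, Tuple


def enrich_multi_key(record: Dict[str, Any],
                     table: Dict[Tuple[Any, ...], Dict[str, Any]],
                     key_fields: Sequence[str]) -> Dict[str, Any]:
    """Enrich only when all key fields are present and match one table entry."""
    if any(field not in record for field in key_fields):
        return dict(record)
    composite_key = tuple(record[field] for field in key_fields)
    match = table.get(composite_key)
    if match is None:
        return dict(record)
    return {**record, **match}


if __name__ == "__main__":
    # Placeholder table keyed on (branch, channel).
    table = {("branch-1", "atm"): {"owner": "Team A"}}
    keys = ("branch", "channel")
    print(enrich_multi_key({"branch": "branch-1", "channel": "atm"}, table, keys))
    print(enrich_multi_key({"branch": "branch-1", "channel": "web"}, table, keys))
```

In the second call only one of the two keys matches, so the record is left as it was.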

If your spreadsheet exceeds the 5 MB limit, try splitting the data into multiple smaller spreadsheets and upload them separately.

The types of fields include Enum, IP Address, Numeric, and String. Each field type has specific constraints and requirements which can be set during the table creation.

You can edit the keys and values using the Edit icon in the Actions column. However, certain parameters like “Type” and “Field Name” cannot be changed after initial setup.

Ensure that the unique key in your raw data matches the key in the enrichment table. Check the pipeline configuration for correct setup and verify that the enrichment table is correctly referenced.

Check the following:

  • Ensure the unique keys in the raw data and enrichment table match correctly.
  • Verify that the enrichment table is correctly referenced in the pipeline configuration.
  • Review the data format and constraints in the enrichment table.
  • Inspect the logs for any errors during the enrichment process.

Yes, data enrichment can be applied to streaming data in real time by incorporating the enrichment process into ContextStream. This ensures that incoming data is enriched on the fly, providing immediate contextual information.

For enhancing customer experience, raw data fields might include Transaction ID, Transaction Amount, and Merchant Code. In the enrichment table:

  • Key: Transaction ID
  • Values: Merchant Details, Transaction Category, Geographical Location of Transaction.

Implementing this, the online banking platform can:

  • Provide detailed transaction histories with contextual information, making it easier for users to understand their spending.
  • Offer insights and trends based on transaction categories, helping users manage their finances better.
  • Alert users about potential fraudulent transactions by providing clear context around each transaction.

For monitoring network performance, raw data fields might include Device ID, Timestamp, and CPU Utilization. In the enrichment table:

  • Key: Device ID
  • Values: Device Type, Location, Manufacturer, Device Specifications

Implementing this, the organization can:

  • Easily identify which devices are experiencing high CPU utilization.
  • Correlate performance issues with specific device types or locations.
  • Optimize network resources by redistributing workloads based on device capabilities and performance data.
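The correlation described above can be sketched as enrich-then-group: each high-CPU reading is joined to its device details by Device ID and then grouped by location. The field names, the 90% threshold, and the `high_cpu_by_location` function are assumptions chosen for this example.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def high_cpu_by_location(readings: Iterable[Dict[str, Any]],
                         device_table: Dict[str, Dict[str, Any]],
                         threshold: float = 90.0) -> Dict[str, List[str]]:
    """Group the IDs of devices at or above the CPU threshold by location."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for reading in readings:
        if reading["cpu_utilization"] < threshold:
            continue
        details = device_table.get(reading["device_id"])
        # Devices missing from the enrichment table are grouped as "unknown".
        location = details["location"] if details else "unknown"
        grouped[location].append(reading["device_id"])
    return dict(grouped)


if __name__ == "__main__":
    # Placeholder device table and readings.
    devices = {
        "dev-1": {"device_type": "router", "location": "Site A"},
        "dev-2": {"device_type": "switch", "location": "Site A"},
    }
    readings = [
        {"device_id": "dev-1", "cpu_utilization": 95.0},
        {"device_id": "dev-2", "cpu_utilization": 40.0},
        {"device_id": "dev-3", "cpu_utilization": 99.0},
    ]
    print(high_cpu_by_location(readings, devices))
```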

The network operations team can diagnose and resolve error codes from various network devices quickly. For diagnosing error codes, raw data fields might include Error Code, Device ID, and Timestamp. In the enrichment table:

  • Key: Error Code
  • Values: Error Description, Possible Causes, Recommended Actions, Severity Level

Implementing this, the network operations team can:

  • Quickly understand what each error code signifies by referencing the error description.
  • Identify common causes and recommend actions to resolve the error based on historical data.
  • Prioritize responses based on the severity level of the error code.
  • Correlate errors with specific devices to determine if certain device types or models are prone to particular issues.

Example: Error Code: ‘404’ can be enriched with the following values:

  • Error Description: Not Found
  • Possible Causes: Invalid URL, Resource Removed
  • Recommended Actions: Check URL, Verify Resource Availability
  • Severity Level: Medium
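The error-code example can be sketched as a lookup followed by a sort on severity. The `404` entry below uses the values listed above; the `X-DEMO` code, the severity ordering (High before Medium before Low), and the `enrich_and_prioritize` function are hypothetical additions for illustration only.

```python
from typing import Any, Dict, List

# Assumed ordering: lower rank means handle first.
SEVERITY_RANK = {"High": 0, "Medium": 1, "Low": 2}

ERROR_TABLE: Dict[str, Dict[str, Any]] = {
    "404": {
        "error_description": "Not Found",
        "possible_causes": ["Invalid URL", "Resource Removed"],
        "recommended_actions": ["Check URL", "Verify Resource Availability"],
        "severity_level": "Medium",
    },
    # Hypothetical entry, included only to demonstrate prioritization.
    "X-DEMO": {
        "error_description": "Demo failure",
        "possible_causes": ["Placeholder cause"],
        "recommended_actions": ["Placeholder action"],
        "severity_level": "High",
    },
}


def enrich_and_prioritize(events: List[Dict[str, Any]],
                          table: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Attach error details to each event, then sort by severity.

    Events whose code is not in the table stay unenriched and sort last.
    """
    enriched = [{**event, **table.get(event["error_code"], {})} for event in events]
    return sorted(
        enriched,
        key=lambda e: SEVERITY_RANK.get(e.get("severity_level"), len(SEVERITY_RANK)),
    )


if __name__ == "__main__":
    events = [
        {"error_code": "404", "device_id": "dev-1"},
        {"error_code": "X-DEMO", "device_id": "dev-2"},
        {"error_code": "unlisted", "device_id": "dev-3"},
    ]
    for event in enrich_and_prioritize(events, ERROR_TABLE):
        print(event["error_code"], event.get("severity_level"))
```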
