Data Enrichment
Introduction
Data enrichment takes your raw data and makes it more valuable by adding relevant contextual information, giving each record more context and depth.
To make data enrichment work, you need a unique key that exists in both your incoming raw data and the enrichment table, which is usually created manually. This key is the link that connects them. For example, enrichment can translate IP addresses into branch names or postal codes into geographical information. In these cases, the IP address or postal code acts as the unique key. If the unique key in the raw data is also present in the enrichment table, the raw data is enriched with the values stored for that key. If there is no match, the data remains unchanged and is not enriched (unless default values are configured in the enrichment plugin). So, having that unique key in both places is essential for success.
Data enrichment is versatile. It can turn cryptic codes into easily understandable names, and the extra context it provides makes the data easier to understand and analyze. By applying data enrichment, users can gain deeper insights and make better use of their data.
Use Case
A real-world example of data enrichment is upgrading geolocation data using an Enrichment Table. By linking a pincode (unique key) with values such as address, city name, and geo-IP, users can supercharge their geolocation data. When you provide a key (i.e. pincode), the enrichment process fetches all the contextual data (i.e. address, city name, and geo-IP). This contextual data further helps in enabling a dynamic geographic map. A sample of geolocation data demonstrating the relationship between pincodes and geographical information is shown in the image below:
Here's how data enrichment is done:
- Data Setup: The pincode, along with its corresponding data (like address, city name, and geo-IP), is stored in an Enrichment Table. Think of it as a lookup table where the pincode is the key, and the other details are the values.
- Easy Enrichment: When the pincode comes into the input stream, the magic of data enrichment happens. It automatically fetches the associated information like address, city name, and GeoIP (latitude and longitude). This enriches your data with meaningful context.
- Map Magic: This enriched data can be used to create dynamic geographic maps in the user interface. Imagine pincode inputs coming from a data source, and the map showing the exact locations.
In essence, the key-value pair system enriches your data by adding contextual details based on the provided key (in this case, the pincode). It's like giving your data a power-up, and it opens up exciting possibilities, like creating interactive maps with ease.
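The key-value lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not vuSmartMaps code: the table contents are made-up sample values, and the field names (`pincode`, `address`, `city`, `latitude`, `longitude`) and the `enrich` function are hypothetical.

```python
# Hypothetical enrichment table: pincode (key) -> contextual values.
# All entries are made-up sample data for illustration only.
ENRICHMENT_TABLE = {
    "560001": {
        "address": "Sample Address 1",
        "city": "Sample City A",
        "latitude": 12.0,
        "longitude": 77.0,
    },
    "110001": {
        "address": "Sample Address 2",
        "city": "Sample City B",
        "latitude": 28.0,
        "longitude": 77.0,
    },
}


def enrich(record, table, key_field="pincode"):
    """Return a copy of `record` with the values stored for its key merged in.

    If the key is not in the table, the record is returned unchanged.
    """
    values = table.get(record.get(key_field))
    if values is None:
        return dict(record)
    return {**record, **values}


incoming = {"pincode": "560001", "amount": 10}
enriched = enrich(incoming, ENRICHMENT_TABLE)
# `enriched` now carries address, city, latitude, and longitude alongside
# the original fields, which is what a map view would need.
```

A record whose pincode is not in the table comes back with only its original fields, matching the "no match, no enrichment" behavior described earlier.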
Workflow for Data Enrichment
Performing data enrichment in vuSmartMaps is made simple with a clear step-by-step process. Let's break it down:
- Enrichment Configuration: Enrichment configuration is the process of creating a lookup table that stores the keys and their corresponding values.
  - Creating Enrichment Table: Begin by establishing a new enrichment table. This is where you'll store the key-value pairs that will enhance your data. Clearly specify the keys (identifiers) and their corresponding values (information) that are relevant to your data. These keys and values act like a schema for your data enrichment.
  - Updating Data in Enrichment Table: You can update data in the enrichment table either by manual entry or by uploading a spreadsheet. Uploading is convenient for larger updates or when the data is already stored in a spreadsheet.
- Using Enrichment in ContextStream: After the enrichment configuration, you can incorporate the enrichment feature into the ContextStream. This ensures that the data flowing through the pipeline undergoes the enrichment process.
  - Creation of Input-Output Streams: To enable the enrichment pipeline, create the I/O streams that act as channels for data flow. These are the input and output streams of the enrichment process.
  - Creation of Pipeline: The next step involves configuring the pipeline within the ContextStream. This includes defining the transformation details from the input data to the output data and incorporating the enrichment step into the pipeline flow.
The visual representation below illustrates the enrichment pipeline. Input data is transformed using configured enrichment settings, and the enriched output data is stored in an output stream. This allows users to enhance their data with additional context and insights for improved analysis and decision-making.
This simplified workflow ensures that you can easily enrich your data, making it more valuable and insightful for your analyses and decision-making processes.
FAQs
How do I set up an enrichment table in vuSmartMaps?
To set up an enrichment table, navigate to Data Ingestion > Data Enrichment. Click the '+' icon to add a new table, specify the keys and values, including their labels, types, and field names, then save the configuration.
Can I update the enrichment table data manually?
Yes. Data in an enrichment table can be updated either by manual entry or by uploading a spreadsheet.
How can I import a large dataset into an enrichment table?
Prepare a spreadsheet (Excel file) where:
- Sheet name matches the enrichment table name
- Include an action column with values like upsert, delete, or leave blank
- File size ≤ 5 MB
Click the Import button from the Data Enrichment page to upload it.
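Before uploading, the requirements above can be checked with a small script. The sketch below is a hypothetical pre-upload check, not part of vuSmartMaps. It assumes the spreadsheet rows have already been read into a list of dictionaries, that the action column is literally named `action`, that upsert, delete, and blank are the only accepted values, and that 5 MB means 5 * 1024 * 1024 bytes.

```python
# Assumption: 1 MB = 1024 * 1024 bytes; the product may count differently.
MAX_FILE_SIZE_BYTES = 5 * 1024 * 1024
# Assumption: these are the only accepted action values ("" means blank).
ALLOWED_ACTIONS = {"upsert", "delete", ""}


def check_import(sheet_name, table_name, rows, file_size_bytes):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sheet_name != table_name:
        problems.append(
            f"sheet name {sheet_name!r} does not match table name {table_name!r}"
        )
    if file_size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file is larger than 5 MB")
    for index, row in enumerate(rows, start=1):
        # A missing or None action is treated the same as a blank cell.
        action = str(row.get("action") or "").strip()
        if action not in ALLOWED_ACTIONS:
            problems.append(f"row {index}: unexpected action {action!r}")
    return problems


sample_rows = [
    {"pincode": "560001", "action": "upsert"},
    {"pincode": "110001", "action": ""},
    {"pincode": "400001", "action": "remove"},
]
issues = check_import("geo_lookup", "geo_lookup", sample_rows, 1024)
```

Here `issues` holds one entry, for the third row, because `remove` is not in the assumed set of actions.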
What happens if the unique key in my raw data is not found in the enrichment table?
If the unique key in your raw data is not found in the enrichment table, the handling depends on your enrichment plugin configuration:
- Default Values: If you have defined default values in the enrichment plugin configuration, these values will be used to enrich the data when the unique key is not found.
- Unchanged Data: If no default values are specified, the raw data will remain unchanged and will not be enriched.
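The two outcomes can be illustrated with a short Python sketch. It is not the enrichment plugin's actual implementation. The function name, the `defaults` parameter, and the sample IP-to-branch table are all hypothetical.

```python
def enrich_with_defaults(record, table, key_field, defaults=None):
    """Enrich `record` from `table`, falling back to `defaults` on a miss.

    - Key found: merge the stored values into the record.
    - Key not found, defaults given: merge the default values instead.
    - Key not found, no defaults: return the record unchanged.
    """
    values = table.get(record.get(key_field))
    if values is not None:
        return {**record, **values}
    if defaults:
        return {**record, **defaults}
    return dict(record)


# Made-up sample table mapping an IP address to a branch name.
BRANCH_TABLE = {"10.0.0.1": {"branch": "Branch A"}}

hit = enrich_with_defaults({"ip": "10.0.0.1"}, BRANCH_TABLE, "ip")
miss_with_default = enrich_with_defaults(
    {"ip": "10.0.0.9"}, BRANCH_TABLE, "ip", defaults={"branch": "Unknown"}
)
miss_unchanged = enrich_with_defaults({"ip": "10.0.0.9"}, BRANCH_TABLE, "ip")
```

Only `miss_unchanged` comes back without a `branch` field, which mirrors the "Unchanged Data" case.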
Can I use multiple keys for data enrichment?
Yes, vuSmartMaps supports multi-key enrichment. The data will be enriched only when all key values match the entries in the enrichment table.
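One way to picture the "all keys must match" rule is a lookup on a composite key. The following is a minimal sketch under that assumption. The field names (`site`, `device_type`, `owner_team`) and the table are hypothetical, and vuSmartMaps may implement multi-key matching differently.

```python
# Made-up sample table keyed by a (site, device_type) pair.
MULTI_KEY_TABLE = {
    ("DC1", "router"): {"owner_team": "Network Ops"},
    ("DC1", "server"): {"owner_team": "Platform"},
}


def enrich_multi_key(record, table, key_fields):
    """Enrich only when every key field is present and the full combination matches."""
    try:
        composite_key = tuple(record[field] for field in key_fields)
    except KeyError:
        # At least one key field is missing from the record: no enrichment.
        return dict(record)
    values = table.get(composite_key)
    if values is None:
        # A partial match (for example, the right site but another device type)
        # does not count.
        return dict(record)
    return {**record, **values}


full_match = enrich_multi_key(
    {"site": "DC1", "device_type": "router", "cpu": 91},
    MULTI_KEY_TABLE,
    ["site", "device_type"],
)
partial_match = enrich_multi_key(
    {"site": "DC1", "device_type": "switch"},
    MULTI_KEY_TABLE,
    ["site", "device_type"],
)
```

In this sketch `full_match` gains `owner_team`, while `partial_match` is returned as it came in.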
What should I do if my spreadsheet exceeds the 5 MB limit for upload?
If your spreadsheet exceeds the 5 MB limit, try splitting the data into multiple smaller spreadsheets and upload them separately.
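If the rows are available in a script, the split can be automated. The helper below is a hypothetical sketch that chunks rows by count, which is only a proxy for file size. Choose `rows_per_file` so that each resulting file stays under 5 MB, and keep the sheet name equal to the enrichment table name in every file.

```python
def split_rows(rows, rows_per_file):
    """Split `rows` into consecutive chunks of at most `rows_per_file` rows."""
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be a positive integer")
    return [
        rows[start:start + rows_per_file]
        for start in range(0, len(rows), rows_per_file)
    ]


# Five sample rows split into files of at most two rows each.
chunks = split_rows(
    [{"pincode": str(n), "action": "upsert"} for n in range(5)],
    rows_per_file=2,
)
```

Each chunk can then be written to its own spreadsheet and uploaded separately.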
What are the types of fields that can be used in enrichment tables?
The types of fields include Enum, IP Address, Numeric, and String. Each field type has specific constraints and requirements which can be set during the table creation.
Can I edit the keys and values in an enrichment table after creating it?
You can edit the keys and values using the Edit icon in the Actions column. However, Field Name, Type, Required, and Uppercase cannot be changed once saved.
How do I incorporate data enrichment into my data pipeline?
After configuring the enrichment table, create the input and output streams and a data pipeline in ContextStream, then incorporate the enrichment plugin into the pipeline configuration.
What should I do if the enriched data does not appear as expected in the output stream?
Ensure that the unique key in your raw data matches the key in the enrichment table. Check the pipeline configuration for the correct setup of the enrichment plugin and verify that the enrichment table is correctly referenced.
How do I troubleshoot issues where the enriched data is incorrect or incomplete?
Check the following:
- Ensure the unique keys in the raw data and enrichment table match correctly.
- Verify that the enrichment table is correctly referenced in the pipeline configuration.
- Review the data format and constraints in the enrichment table.
- Inspect the logs for any errors during the enrichment process.
Can data enrichment be applied to streaming data in real-time?
Yes, data enrichment can be applied to streaming data in real time by incorporating the enrichment process into ContextStreams. Incoming data is then enriched on the fly, providing immediate contextual information.
How can data enrichment enhance the customer experience in online banking platforms using raw data fields: transaction ID, transaction amount, and merchant code?
For enhancing customer experience, raw data fields might include Transaction ID, Transaction Amount, and Merchant Code. In the enrichment table:
- Key: Transaction ID
- Values: Merchant Details, Transaction Category, Geographical Location of Transaction.
Implementing this, the online banking platform can:
- Provide detailed transaction histories with contextual information, making it easier for users to understand their spending.
- Offer insights and trends based on transaction categories, helping users manage their finances better.
- Alert users about potential fraudulent transactions by providing clear context around each transaction.
How can data enrichment help in monitoring network performance using raw data fields: device ID, timestamp, and CPU utilization?
For monitoring network performance, raw data fields might include Device ID, Timestamp, and CPU Utilization. In the enrichment table:
- Key: Device ID
- Values: Device Type, Location, Manufacturer, Device Specifications
Implementing this, the organization can:
- Easily identify which devices are experiencing high CPU utilization.
- Correlate performance issues with specific device types or locations.
- Optimize network resources by redistributing workloads based on device capabilities and performance data.
How can data enrichment help in diagnosing error codes from network devices using raw data fields: error code, device ID, and timestamp?
Data enrichment helps the network operations team diagnose and resolve error codes from various network devices quickly. For diagnosing error codes, raw data fields might include Error Code, Device ID, and Timestamp. In the enrichment table:
- Key: Error Code
- Values: Error Description, Possible Causes, Recommended Actions, Severity Level
Implementing this, the network operations team can:
- Quickly understand what each error code signifies by referencing the error description.
- Identify common causes and recommend actions to resolve the error based on historical data.
- Prioritize responses based on the severity level of the error code.
- Correlate errors with specific devices to determine if certain device types or models are prone to particular issues.
Example: Error code ‘404’ can be enriched with the following values:
- Error Description: Not Found
- Possible Causes: Invalid URL, Resource Removed
- Recommended Actions: Check URL, Verify Resource Availability
- Severity Level: Medium
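The error-code scenario can be sketched in Python as follows. This is an illustration only. The `404` entry uses the values from the example above, the `503` entry and its severity are hypothetical additions, and the severity ordering (High before Medium before Low) and the function name are assumptions.

```python
# Assumed priority order: lower rank is handled first.
SEVERITY_RANK = {"High": 0, "Medium": 1, "Low": 2}

ERROR_TABLE = {
    "404": {
        "error_description": "Not Found",
        "possible_causes": "Invalid URL, Resource Removed",
        "recommended_actions": "Check URL, Verify Resource Availability",
        "severity_level": "Medium",
    },
    # Hypothetical entry added only to show prioritization.
    "503": {"error_description": "Service Unavailable", "severity_level": "High"},
}


def enrich_and_prioritize(events, table):
    """Enrich each event by error code, then sort by severity.

    Events whose error code is not in the table are left unchanged
    and sorted last.
    """
    enriched = [
        {**event, **table.get(event.get("error_code"), {})} for event in events
    ]
    return sorted(
        enriched,
        key=lambda e: SEVERITY_RANK.get(e.get("severity_level"), len(SEVERITY_RANK)),
    )


events = [
    {"error_code": "404", "device_id": "device-1"},
    {"error_code": "999", "device_id": "device-2"},
    {"error_code": "503", "device_id": "device-3"},
]
prioritized = enrich_and_prioritize(events, ERROR_TABLE)
```

Because the device ID stays on every enriched event, the same output can be grouped by device to spot devices that are prone to particular errors.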