Working with Data Model
To add a new Data Model, click the + icon on the Data Models listing page.
Data Model creation is a four-step process. These steps are as follows:
You’ll land on the Data Model’s configuration page, where you provide basic information about the data model.
This is a helpful feature that gives you a quick view of the information stored in a Data Store: the tables or indices it contains and their schema. It helps you select and configure the correct field in the data model and workspace stages. You can manage your Data Stores directly from the listing page; check it out here for more details.
This step is where you can prepare your queries to fetch the required data.
The syntax for building and writing queries may differ depending on the selected Data Store. For each Data Store, you can either build your queries with a query builder or write raw queries. Here are the tasks you can perform in this section:
A model query can be built in two ways:
To start configuring the query, you must first select the Data Store. For more details on managing Data Stores, please see the Data Store section.
💡Note: In the example below, an Elasticsearch Data Store is used.
A data model can have multiple queries based on your needs. For example, if you want to use data from different Data Stores, you can create more than one query and use transformations in the subsequent steps to combine them.
Example: If you use a Hyperscale DB for Query 1 and an Elasticsearch Data Store for Query 2, use a Join transform in step 3 to merge them according to your needs.
Now, you can create a database query using the Query Builder. This tool helps you build basic select queries for various supported databases using Vunet’s Query Language.
In the “Select Table” section, choose the table or index from which you want to retrieve data.
If your data is time-series data, you typically want the Data Model to return data for the time range selected in your dashboard. In that case, select the field containing the timestamp of your time-series data here. The Global Time Filter selection (Last 15 min, 24 hours, 7 days, and so on) is then passed dynamically as an input to the query. If no field is selected, the query retrieves all data from the Data Store, regardless of the chosen time range.
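If you later switch to raw queries (see the Write Query option below), the same time-range behavior is expressed with the $__timeFilter macro, for example WHERE $__timeFilter("@timestamp"); the field name "@timestamp" here is only an illustration, and you should use the timestamp field of your own table or index.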
You can add a filter to restrict the data fetched from the Data Store. The filter must follow proper Elasticsearch syntax. Please note that this is an optional field and is only visible when an Elasticsearch Data Store is used.
It uses a key:value syntax and the following is an example:
(sev:error AND (state:Prog OR state:Comp))
If the selected Data Store is Elasticsearch, a few additional fields are available.
💡Note:
1. If you choose a Hyperscale Data Store, the filter parameters must follow proper SQL syntax.
Example: CPU usage >= 70 AND memory usage >= 70
2. If you choose a Hyperscale Data Store, you can optionally use VQL (VuNet Query Language) filters. VQL is an easy-to-use, text-based query language for filtering data. Use VQL to filter documents based on substring matches, value matches, the existence of a value for a field, and similar simple and compound conditions.
Before using the VQL filter:
Output:
Click on Run Query to view the result.
After using the VQL filter:
Output:
Click on Run Query to view the result.
This is an option to limit the number of rows displayed in the preview.
Click the Menu button next to the Enable toggle, then click the Limit the Preview Size toggle to activate it. Drag the horizontal bar to set the limit, or enter the number directly.
Use this option when the Aggregation Query is turned on, because the 'Number of Rows' section is disabled in that case.
Otherwise, you can type a number directly in the 'Number of Rows' section. The data model applies whichever of the two limits is smaller.
The Parameters section in the Data Modelling Workspace helps users define and filter data effectively based on various field types.
By using string, number, and date-time fields, along with macros for dynamic values, users can create precise filters and customize data views according to specific requirements.
Aggregation in Elasticsearch
Aggregations
Choose an Aggregation from the dropdown based on your requirements.
To add a new Aggregation, click on the + button.
There are different aggregations available, and you can select one from the dropdown. Please note that all aggregations fetch data based on the time selected in the Global Time Selector. Here is the list of supported aggregations (a rough SQL analogy is sketched after the list):
a. Min: This allows you to fetch the minimum value of the selected field.
b. Average: Calculates the average value for a given field.
c. Count: Provides the total number of documents in that index.
d. Max: The max aggregation returns the maximum value of a numeric field. Example: Max aggregate for the ‘duration’ field will return the highest value of the ‘duration’ field in all the records.
e. Percentiles: Computes a specific percentile for all the values in a given field. For example, the 10th percentile for ‘timetodetect’ is the value that is greater than the ‘timetodetect’ value of 10% of the transactions.
💡Note: If you choose HyperScale DB, you can also select a "Calculation Type" for Percentiles, which is otherwise only possible in Write Query.
f. Sum: The sum aggregation returns the total sum of a numeric field.
g. Top Hits: Top Hits allows you to get the most recent data for the selected field. For example: Top Hits for a status field of a service will always return the most recent status of the service.
h. Unique Count: Provides a distinct count for selected fields. For example: A unique count of the host field will provide the number of unique hosts available in the selected table/index.
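For readers more comfortable with SQL, the aggregations above map roughly to standard SQL aggregate functions. The sketch below is illustrative only; the table and field names (apm_span, duration, host) are hypothetical, and the Query Builder generates the actual Data Store query for you. Percentiles and Top Hits use Data Store-specific functions and are not shown.
SELECT
  MIN(duration)        AS min_duration,    -- Min
  AVG(duration)        AS avg_duration,    -- Average
  COUNT(*)             AS doc_count,       -- Count
  MAX(duration)        AS max_duration,    -- Max
  SUM(duration)        AS total_duration,  -- Sum
  COUNT(DISTINCT host) AS unique_hosts     -- Unique Count
FROM apm_span;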
Group By (Optional)
You can group the aggregation into different buckets by using methods like Date Histogram, Histogram, and Terms. Click on 'Add New Group By' to configure a bucket.
Click on Add Group By to include the buckets.
The following buckets are supported by Group By:
a. Date Histogram: You can use the “Date Histogram” to group time series data into continuous intervals:
Within the Date Histogram, there are three fields that need to be filled in for this aggregation to work:
b. Histogram: This is used to group numerical data into continuous intervals so you can see the frequency of occurrence in each bucket. The "Interval" represents the size of these intervals in seconds.
c. Terms: Terms are used to group data based on the value of the selected field.
Under “Terms,” there are four fields you need to fill in to make the term aggregation work. These fields include:
Advanced Settings: You can select a specific time zone by enabling Advanced Settings.
For example, if the time timezone configured at the Data Store level is UTC, but for some reason, one particular table/index in that Data Store follows an Asia/Kolkata timezone, then enable Advanced Settings and override the timezone to Asia/Kolkata for that particular query on that table.
💡Note: The display timezone (i.e. the timezone in which data will be displayed in previews, etc.) will always be the local timezone.
Check out how to configure this here.
You can specify a raw database query for any data source if you want to use more advanced source-based queries. To do this, click on Write Query and enter the raw query. For example, a sample SQL query can be used to calculate the top failed transactions.
Here we are selecting span_attributes_http_target from the table click.apm_span. The query counts the failed transactions. The $__timeFilter macro restricts the results to the dashboard's selected time range, and the $__dynamicFilter macro applies a dashboard variable as a filter (see the Dynamic Filters section below).
Script: SELECT span_attributes_http_target AS Transaction, count() AS Failed FROM click.apm_span WHERE $__timeFilter("@timestamp") AND span_attributes_http_target != '' AND status_code > 0 AND $__dynamicFilter('resource_attributes_application', 'Application') GROUP BY span_attributes_http_target ORDER BY Failed DESC
💡Note:
After completing this query, you can preview, validate, and move on to the next step.
You can now preview any query that is created. There are two ways to preview:
Run Query: Click on the run query option to get a preview of the selected query.
The results will be displayed as shown below.
To run all the queries, click on the Run all queries option at a model level to preview all the queries in a data model.
The results will be displayed as shown below.
Once the preview looks accurate, click on Validate and Next to move to the third step.
During the creation of a data model where data is currently unavailable in the data store, queries cannot be validated. In such instances, clicking the ‘Validate and Next’ button will result in an error. To bypass data checks and proceed to the next step, you can enable ‘Skip data checks’ before clicking on ‘Validate and Next’.
You can skip the validation by clicking Skip Data Checks as shown below.
Such skipped Data Models will have a warning tag before the name on the landing page as shown below.
You can use other operations on a query: Copy, Saved Search, Disable, Delete, and the collapsible arrow.
Copy: This allows you to copy an existing query to create a new query within the same data model.
Saved Search: This allows you to populate a query from a Saved Search.
Click on Proceed to auto-populate the query.
Disable: You can disable a query in a data model. When a data model is used somewhere, a disabled query won’t be executed.
Delete: You can delete a query both at the local and global levels. This means you can remove a query entirely from your data model.
Collapsible arrow: Click this button to hide a query's content. Collapse all but one query to focus on the one that remains expanded.
After successfully validating the query section, you’ll enter the transformation section. The transformations are used for data manipulation. You can combine data from two tables or data stores by using a join transformation when required.
On this page, you’ll see a list of supported transformations. Each transformation here comes with a description of its highlights. Clicking on a transformation will take you to the respective transformation screen, where you can create and configure the transformation.
Single Query Transform:
Multiple Query Transform: In addition to incorporating all the single query transformations, two additional transformations are available for handling multiple queries.
The “Organize Fields” transformation allows you to rename fields or columns in a query. Here’s how to use it:
The “Add Field” transformation allows you to add a new field to the output of a query. You can do this using arithmetic operations, custom values, or expressions.
Select a query in the Query dropdown. Choose a mode:
With the “Arithmetic Operation” mode, you can create a new field based on arithmetic operations between existing fields. Here’s how to do it:
The existing fields will stay the same, and a new field will be added based on your specifications.
Output
In the “Custom” mode, you can create a new field with a static value in the Query. The value can be a number (integer or float) or a string. Here’s how to do it:
Example: You can give a field a static value of 100 and name it New Field 1. You can then add another "Add Field" transform to find the ratio of Average Duration to Count using Arithmetic Operation mode; the resulting ratio is stored under the alias name "Divide".
Now the resulting "Divide" column is multiplied by the static value "New Field 1", and the result is stored under the alias name "Percentage".
Output:
Once you’ve entered this information, click the Run Transform button to see the changes reflected in the preview.
In the output, you’ll discover columns such as “New Field 1” with a static value, the “Divide” column representing the result of avg_duration divided by count, and finally, the “Percentage” column showcasing the multiplication of the Divide column and New Field 1.
In the “Expressions” mode, you can create a new field based on a user-defined expression that involves operations like addition, subtraction, multiplication, or division between two or more existing fields or static floats/integers in the same Query. Here’s how to do it:
Example: Here we are calculating the percentage of average duration to the count using the expression {Avg_Duration}/{count}*100, and the result is stored under the alias name New 1.
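For instance, with illustrative values Avg_Duration = 0.45 and count = 9, the expression evaluates to 0.45 / 9 * 100 = 5, and that value appears under the alias New 1.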
💡Note:
Once you’ve entered your expressions and aliases, click the Run Transformation button to see the changes reflected in the preview. The existing fields will remain unaffected, and new fields will be added according to the specifications of your expressions.
The “Filter By” transformation allows you to filter or remove specific portions of the data in a frame.
Here’s how to use it:
This transformation helps you refine your data by including specific records based on the defined conditions and matching values.
Example: Here a filter is applied on the condition that Avg_Duration is greater than 0.5.
Table before filtering
Filter-By Transformation
Table after Filtering
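Conceptually, the Filter By step behaves like a SQL WHERE clause applied to the output of the previous stage. A minimal sketch of the example above (the name query_output is hypothetical):
SELECT * FROM query_output WHERE Avg_Duration > 0.5;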
The “Aggregate” transformation is used to summarize or aggregate the output of a query. It also offers time-bucketing capabilities.
Here’s how to use it:
The “Aggregate” transformation allows you to condense and analyze your data, providing you with insights and summaries based on your chosen fields and grouping criteria. Time bucketing helps in organizing data based on time intervals, which can be particularly useful for time-series data.
a) Aggregate without Time Bucketing
This process allows you to aggregate data based on the chosen source field, perform various calculations, and create new fields with summarized information. You can add and remove multiple aggregations as needed.
Example: We are calculating the Unique Count for the source field event_id and the result is stored under the alias name Unique which is grouped by Severity.
Output:
After clicking on Run Transform you will find the Unique Counts of the Event ID grouped by the severity.
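For intuition, this aggregation corresponds roughly to the following SQL over the query's output (illustrative only; query_output is a hypothetical name, while event_id and Severity come from the example above):
SELECT Severity, COUNT(DISTINCT event_id) AS Unique
FROM query_output
GROUP BY Severity;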
b) Aggregate with Time Bucketing
This process allows you to perform time-based aggregation on data, summarizing information within specified time intervals. The other aggregation settings, like source field and calculation methods, are similar to aggregating without time bucketing.
Example: You are calculating the Unique Count for the source field event_id and the result is stored under the alias name Unique which is grouped by Severity.
You are bucketing the start_time with intervals of 10mins and the result is stored under Bucket Label.
Output:
After clicking on Run Transform you will find the Unique Counts of the Event ID grouped by the severity with the start_time interval of 10 minutes.
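This is roughly equivalent to adding a time bucket to the grouping. The sketch below is illustrative only; it uses TimescaleDB's time_bucket function to express the 10-minute interval, query_output is a hypothetical name, and start_time, event_id, and Severity come from the example above. The transformation performs the bucketing for you regardless of the underlying Data Store.
SELECT time_bucket('10 minutes', start_time) AS bucket_label,
       Severity,
       COUNT(DISTINCT event_id) AS Unique
FROM query_output
GROUP BY bucket_label, Severity
ORDER BY bucket_label;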
Sort By transformation is used to sort a query output by the selected field. When reverse is enabled, the values are sorted in descending order.
To use the Sort By transformation, select a query and then choose the field for sorting.
Ascending Sort-By: Select a query and choose the field. The Reverse button is disabled by default, which sorts the values in ascending order.
Descending Sort-By: Click the Reverse button to sort them in descending order.
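Conceptually, this matches SQL ordering (query_output and Avg_Duration are illustrative names):
SELECT * FROM query_output ORDER BY Avg_Duration;        -- Reverse disabled (ascending)
SELECT * FROM query_output ORDER BY Avg_Duration DESC;   -- Reverse enabled (descending)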
This transformation helps you choose the top N rows from a data frame, allowing for further operations.
This transformation allows you to join the outputs of two queries into a single frame, functioning like a typical SQL join.
💡Note: Please find the individual outputs of Query 1 and Query 2.
Here, before using the Join transformation, you configure two queries in the Prepare Data Query section. Query 1 gives you the count of each Respcode.
The output of Query 1.
Query 2 displays data on the “number of respcode” per transaction category.
The output of Query 2
Once done, choose the type of Join you want to perform. Available options are:
Left join: It returns all records from the left table and the matched records from the right table.
Right join: It returns all records from the right table and the matched records from the left table.
Inner join: It returns only the records that have matching values in both tables.
Outer join: It returns all records from both tables, matching them where possible.
Click on the Run Transform button to see the changes in the preview.
💡Note: You can join multiple fields across multiple queries using this transformation.
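As a rough SQL analogy, joining Query 1 and Query 2 on a common field looks like the sketch below; query_1_output, query_2_output, and the respcode join key are illustrative names based on the example, and the transformation builds the join for you:
SELECT q1.*, q2.*
FROM query_1_output AS q1
LEFT JOIN query_2_output AS q2
  ON q1.respcode = q2.respcode;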
Concatenate combines the results of two Queries into one frame.
Choose two Queries in any order from the dropdown.
Pick either Horizontal (side by side) or Vertical (stacked) concatenation.
Click Run Transform to see the merged results.
a) Horizontal Concatenate
You are Horizontally concatenating Query 1 and Query 2 to create a new table.
As seen above, this is aggregated data from two or more sources having a similar number of rows in them.
Output:
b) Vertical Concatenate
You are Vertically concatenating Query 1 and Query 2 to create a new table.
As seen above, this is time series data with a similar schema coming from two different queries.
To concatenate these into a single stage/frame, use Vertical Concatenate.
Output:
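As a rough analogy, vertical concatenation behaves like a SQL UNION ALL of the two query outputs, while horizontal concatenation places the columns of both outputs side by side for row-aligned results. The column names time and value below are illustrative; both queries must expose a similar schema for vertical concatenation:
SELECT time, value FROM query_1_output
UNION ALL
SELECT time, value FROM query_2_output;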
💡Note: Preview is restricted to a max of 20 records
You can use the ‘Set Properties’ step to select the metrics and buckets that should be exposed for this data model. Additionally, you can set properties for ML modeling and threshold for alerting purposes.
Click Add Metric Attribute to add the Metric Name, Severity Rules, and other attributes. These definitions are used by the Visualizations, Alerting, Reporting, and Machine Learning modules to handle the metrics and their monitoring/reporting.
Select Metrics: Pick a field representing the Metric or key data point. You have the following 2 sections to make the required configurations.
1. Severity Rules: A severity rule is a value range at which a metric might require monitoring or alerting. For example, a CPU usage metric could be marked Warning between 80 and 90 and Critical above 90 (illustrative thresholds).
2. Other Attributes (Optional): The attributes presented in this section are tailored for Data Models utilized within the RCA Workspace.
Each metric can now be given information on the following properties.
💡Note: All these settings are very useful when the Data Model is used in the RCA Workspace.
Define Time Fields in Data Model
This lets you enable the timestamp for the given Data Model. Time-series data is data recorded over consistent intervals of time. Enable this whenever the final data is time-series data, which can happen when bucketed data is fetched or when raw time-series data is queried.
Define unique keys for which the metrics from this Data Model are to be tracked
Unique Keys are fields in the data model output that help in distinguishing different categories of data in the data model. Alerting and Machine Learning Modules use this to track the metrics separately for each value of unique keys.
Run Set Properties: Once you’ve finished configuring your metrics, simply click the Run Set Properties button at the top right. This will validate the metric properties, and the Data Model will perform a background check based on the new information you’ve provided.
Click on the blue eye icon to preview the results.
Once validation is complete, the save button gets activated at the bottom right. Click on it to save the Data model.
With this, the Data Model creation is now complete.
Dynamic filters allow you to set up custom filters within your queries using the “$” symbol with a variable.
For Hyperscale DB using Build Query:
Enable the Toggle button to configure the Dynamic Filters in the Build Query section of the Data Model.
Select the field name on which you want to filter the data and provide the Variable Name. The same variable name will be used to configure a 'variable' in the dashboard and use it as a filter. (Variable names are case-sensitive.) Click on the + New Filter button to add multiple filters.
This is what the whole Build Query configuration looks like for a sample ibmb_transactions table.
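If you write raw queries instead of using Build Query, the same effect can be achieved with the $__dynamicFilter macro shown in the Write Query example earlier. The field and variable names below are illustrative:
SELECT transaction_type, count(*) AS txn_count
FROM ibmb_transactions
WHERE $__timeFilter("@timestamp")
  AND $__dynamicFilter('transaction_type', 'TransactionType')
GROUP BY transaction_type;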
For Timescale Datastore (PSQL):
💡Note: