Programmable Alerts Conditions and Use Cases

vuSmartMaps lets you use Python scripts to create programmable alerts. Using an evaluation script, you can generate alerts for breaching any business logic. Below is a typical alert engine execution workflow and where the evaluation script is used.

The evaluation script runs after metrics are checked and thresholds are applied, allowing you to customize alert behavior. Apart from implementing business logic to generate the alert, you can also adjust the alert notification content and channel settings, such as who gets notified.

Input Parameters for the Script

vuSmartMaps makes various parameters, such as the list of metrics, grouping values, and metadata, available in the evaluation script for you to use in your programming logic. The following table lists the input parameters available to the evaluation script.

Parameter | Type | Description | How to Access
R<n> | Boolean | True or False, indicating whether the Rule-n thresholds matched | The result of Rule-1 is available as variable R1, Rule-2 as R2, etc.
D | List | The Data Model results list | D[0] contains Data Model values for Rule-1, D[1] for Rule-2, etc. Note the zero-based, array-like indexing.
grouping_values | List | Grouping values associated with the notification. These correspond to the buckets configured in the Data Model. | grouping_values[0] for the first bucketing value, grouping_values[1] for the second, etc.
META_DATA | Dictionary | Metadata associated with this alarm |
META_DATA['duration'] | Seconds | Duration for which this alarm has been active | META_DATA.get('duration', 0)
META_DATA['history'] | Dictionary | History of this alarm | See the example in the "Accessing the History of Alarm" section

Output Parameters available for the script

The following table lists the output parameters available to the evaluation script for controlling the behavior of the alarm.

Parameter | Type | Description | How to Access
RESULT | Boolean | Setting this to True generates the alert; setting it to False does not generate the alert. | E.g.: RESULT = True or RESULT = False
R<n> | Boolean | True or False, indicating whether the Rule-n thresholds matched | The result of Rule-1 is available as variable R1, Rule-2 as R2, etc.
D | List | The Data Model results list | D[0] contains Data Model values for Rule-1, D[1] for Rule-2, etc.
META_DATA | Dictionary | Metadata associated with this alarm |
META_DATA['force_update'] | Boolean | True or False, indicating whether to send an update notification for this alert |


Changing Evaluation Formula

By default, vuSmartMaps generates an alert only when all conditions in an alert rule are true. So if there are two rules R1 and R2, an alert will be generated if both R1 and R2 are True. 

In the evaluation script, each rule's result is available as a Boolean variable (R1, R2, etc.); for example, R1 is True when the Rule-1 thresholds matched. You can combine these variables with logical operators. The default evaluation of the two rules above can therefore be written in the evaluation script as:

if R1 and R2:
    RESULT = True

You can use “and” and “or” logic to create complex conditions, like “R1 and (R2 or R3) and not R4.” So for instance, the evaluation script will look like the following if either R1 or R2 condition is to be met:

if R1 or R2:
    RESULT = True

This gives you control over when alerts are generated based on your specific criteria and combinations of conditions.
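As an illustration of the compound expression mentioned above, the sketch below evaluates "R1 and (R2 or R3) and not R4". The helper name combine_rules and the sample rule values are assumptions made so the example can run on its own; in an actual evaluation script R1 to R4 are supplied by vuSmartMaps and the expression would be assigned to RESULT directly.

```python
def combine_rules(R1, R2, R3, R4):
    """Alert when Rule-1 matched, Rule-2 or Rule-3 matched, and Rule-4 did not."""
    return bool(R1 and (R2 or R3) and not R4)


# Sample rule results standing in for the values the platform would supply
RESULT = combine_rules(R1=True, R2=False, R3=True, R4=False)
print(RESULT)
```

Wrapping the expression in bool() keeps RESULT a strict True or False even if one of the inputs is some other truthy value.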

Accessing Metric Values

In the evaluation script, you can access the current values of the metrics used in your alert conditions. Each metric can be accessed through a multi-level Python data structure.

To simplify this process, an accessor function is available to help the evaluation script retrieve the values of specific metric columns from the data model used in your alert rule. This allows you to make informed decisions in your script about whether or not to trigger an alert.

The accessor function is called as get_DM_value(D, rule_identifier, metric_column), where:

  D               – Data Model results (the D input parameter)

  rule_identifier – Identifier of the rule (1, 2, etc.)

  metric_column   – Metric column

Example: get_DM_value(D, 1, 'success_rate')

Internally, the accessor function validates the presence of the metric and returns the value from the multi-level dictionary hierarchy.
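To make the idea of a validating accessor concrete, here is a rough sketch of how such a function could walk a nested result structure. This is not the vuSmartMaps implementation: the function name get_dm_value_sketch, the nested layout (Data Model name, then "metrics", then the metric column) and in particular the "value" key are assumptions chosen for illustration only.

```python
def get_dm_value_sketch(D, rule_identifier, metric_column):
    """Return the metric value for a rule, or None if any level is missing."""
    index = int(rule_identifier) - 1            # Rule-1 lives at D[0]
    if not isinstance(D, list) or not 0 <= index < len(D):
        return None
    rule_result = D[index]
    if not isinstance(rule_result, dict):
        return None
    # Assumed layout: {data_model_name: {"metrics": {column: {"value": ...}}}}
    for data_model in rule_result.values():
        if not isinstance(data_model, dict):
            continue
        metrics = data_model.get("metrics")
        if not isinstance(metrics, dict):
            continue
        metric = metrics.get(metric_column)
        if isinstance(metric, dict) and "value" in metric:
            return metric["value"]
    return None


# Hypothetical sample data in the assumed layout
sample_D = [{"Success Rates": {"metrics": {"success_rate": {"value": 97.5}}}}]
print(get_dm_value_sketch(sample_D, 1, "success_rate"))
print(get_dm_value_sketch(sample_D, 1, "missing_metric"))
```

Returning None for anything that is missing is what lets calling scripts guard their logic with checks such as "is not None" before using the value.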

 

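current_success = get_DM_value(D, 1, 'success_rate')
daily_avg = get_DM_value(D, 1, 'daily_average')
RESULT = False
# Compare only when both values are present and the average is non-zero
if current_success is not None and daily_avg:
    ratio = current_success / daily_avg
    # Alert when the current rate is more than 25% below the daily average
    if ratio < 0.75:
        RESULT = True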

The metric values can be used to decide on whether to generate an alert or not.

Suppose we want a warning alert when the value is between 80 and 90, and a critical alert, sent to a few extra people, when the value is 90 or above. We create a Data Model with a threshold of 80, and then write Python code that performs this check and sets the severity accordingly; the recipients can also be extended for critical cases.

bw = get_DM_value(D, 1, 'input_bandwidth')
if bw and bw >= 80 and bw < 90:
    RESULT = True
    DYNAMIC_FIELDS['severity'] = 'warning'
elif bw and bw >= 90:
    RESULT = True
    DYNAMIC_FIELDS['severity'] = 'critical'
else:
    RESULT = False
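The text above mentions sending critical alerts to additional people. A minimal sketch of one way to combine the severity decision with a wider recipient list follows. The helper classify_bandwidth, the example.com addresses, and the local DYNAMIC_FIELDS dictionary are placeholders introduced for this example; in a real script DYNAMIC_FIELDS and EMAIL_ID_LIST are the platform variables described elsewhere on this page.

```python
def classify_bandwidth(bw, base_recipients, critical_recipients):
    """Return (result, severity, recipients) for a bandwidth reading."""
    if bw is None or bw < 80:
        return False, None, []
    if bw < 90:
        return True, "warning", list(base_recipients)
    # Critical: notify the usual recipients plus the extra ones
    return True, "critical", list(base_recipients) + list(critical_recipients)


# Placeholder addresses on the reserved example.com domain
base = ["noc@example.com"]
extra = ["manager@example.com"]

RESULT, severity, recipients = classify_bandwidth(93, base, extra)
DYNAMIC_FIELDS = {}                  # stand-in for the platform dictionary
if severity:
    DYNAMIC_FIELDS["severity"] = severity
EMAIL_ID_LIST = recipients
print(RESULT, DYNAMIC_FIELDS, EMAIL_ID_LIST)
```

Keeping the decision in one function makes the thresholds easy to check in isolation before the script is attached to an alert rule.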

Controlling Alert Generation

Suppose you want to generate an alert only when the transaction success rate is 25% lower than the daily average. You can achieve this by creating a custom evaluation script.

For example, if you have a Data Model named “Success Rates” with metrics “Current Rate” and “Daily Average,” the following script can be used:

current_success = get_DM_value(D, 1, 'Current Rate')
daily_avg = get_DM_value(D, 1, 'Daily Average')
RESULT = False
# Compare only when both values are present and the average is non-zero
# (this also avoids division by zero)
if current_success is not None and daily_avg:
    ratio = current_success / daily_avg
    # Alert only when the current rate is more than 25% below the daily average
    if ratio < 0.75:
        RESULT = True

💡Note: In the script provided earlier, the variable RESULT plays a crucial role in determining whether an alert should be generated. If you set RESULT to True, an alert will be triggered for the specific situation. If RESULT is set to False, no alert will be generated. If the script doesn’t modify the value of RESULT, no alert will be generated by default.

In summary, RESULT serves as the output variable that allows the evaluation script to control when alert notifications are generated based on the conditions and logic you define.

Accessing Grouping Values

You can also access grouping values for a specific alert being evaluated in your scripts. This is useful if your notifications involve multiple grouping levels. For instance, if you’re grouping alerts by hostname and interface name, you can access these values within your script as demonstrated in the example provided.

if grouping_values[0] == 'AppServer' and grouping_values[1] == 'serial-1-1':
    RESULT = False
else:
    RESULT = True

In the example above, we use grouping values to avoid generating alerts for the serial interface on the host “AppServer.” The script accesses grouping values through the “grouping_values” list, which contains the values for each level of grouping. You can access these values using the Python syntax, such as “grouping_values[0]” and “grouping_values[1].” This allows you to customize alert generation based on specific grouping criteria.
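When more than one host and interface pair needs to be excluded, the same check can be driven by a set of pairs rather than a chain of comparisons. The following is a sketch under that assumption; the SUPPRESSED set, the second pair in it, and the helper should_alert are hypothetical names used only for this example.

```python
# Hypothetical (hostname, interface) pairs for which alerts are suppressed
SUPPRESSED = {("AppServer", "serial-1-1"), ("DbServer", "serial-0-0")}


def should_alert(grouping_values):
    """Return False when the first two grouping levels are on the suppression list."""
    if len(grouping_values) < 2:
        return True                       # not enough levels to match an entry
    return (grouping_values[0], grouping_values[1]) not in SUPPRESSED


grouping_values = ["AppServer", "serial-1-1"]    # sample values for this run
RESULT = should_alert(grouping_values)
print(RESULT)
```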

Adding New Fields

You can add new fields to the notifications generated by the system using the evaluation script. 

For instance, if you need to include a new field or category with values based on the transaction success rate metric, you can achieve this with the following script snippet.

success_rate = get_DM_value(D, 1, 'Success Rate')
if success_rate and success_rate > 90:
    DYNAMIC_FIELDS['category'] = 'Normal'
else:
    DYNAMIC_FIELDS['category'] = 'Need Investigation'
RESULT = True

As can be seen, any field to be added to the notification generated can be specified in the DYNAMIC_FIELDS dictionary with the corresponding key and value.

Controlling Notification Channels and Recipients

You can control the notification channels and recipients in the evaluation script using the ALERT_CHANNELS list. You can add or remove items from this list to include or exclude specific channels for a particular alert. Here are the keywords for different channels:

  • alertByEmail
  • alertByReport
  • alertByRunBook
  • alertByWhatsapp
  • alertByTicket

For example, the following code snippet adds Email or the Ticketing system as a channel, depending on the success rate.

success_rate = get_DM_value(D, 1, 'Success Rate')
if success_rate and success_rate > 90:
    ALERT_CHANNELS.append('alertByEmail')
    EMAIL_ID_LIST = ['[email protected]', '[email protected]']
# Guard against a missing value before comparing
elif success_rate is not None and success_rate < 80:
    ALERT_CHANNELS.append('alertByTicket')
RESULT = True

Within each channel, a similar facility is available to control the recipients. In the above, two email addresses are configured as recipients.

The list of controls available for different channels is shown below:

Field | Channel | Description
EMAIL_ID_LIST | Email | List of email addresses. E.g.: [email protected]
EMAIL_GROUP_LIST | Email | List of email group names. E.g.: Support
REPORT_LIST | Report | List of reports. E.g.: "CXO Report"
PHONE_NUMBER_LIST | WhatsApp | List of phone numbers. E.g.: 9881 234 567
RUNBOOK_SCRIPT | Runbook | Runbook script name. E.g.: service_restart
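Because ALERT_CHANNELS is described as a list that items can be added to or removed from, a script that removes a channel should first check that the channel is present, since list.remove raises ValueError otherwise. The sketch below shows that pattern. The helper adjust_channels, the thresholds, and the starting channel list are assumptions for illustration; only the channel keywords come from the list above.

```python
def adjust_channels(alert_channels, success_rate):
    """Add or remove channels in place based on the success rate."""
    if success_rate is None:
        return alert_channels                      # no data: leave channels untouched
    if success_rate < 80 and "alertByWhatsapp" not in alert_channels:
        alert_channels.append("alertByWhatsapp")   # add a channel only once
    if success_rate > 90 and "alertByTicket" in alert_channels:
        alert_channels.remove("alertByTicket")     # safe: presence checked first
    return alert_channels


ALERT_CHANNELS = ["alertByEmail", "alertByTicket"]  # sample starting state
adjust_channels(ALERT_CHANNELS, 95)
print(ALERT_CHANNELS)
```

The list is modified in place, mirroring the append calls used in the examples on this page.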

Controlling Severity

The severity of the alert can be modified using an evaluation script. In the example below, the severity is raised to critical when the metric value exceeds 90.

success_rate = get_DM_value(D, 1, 'Success Rate')
if success_rate and success_rate > 90:
    DYNAMIC_FIELDS['severity'] = 'critical'
RESULT = True

Modifying Summary and Description

Similar to severity, summary and description fields can be modified using an evaluation script.

success_rate = get_DM_value(D, 1, 'Success Rate')
if success_rate and success_rate > 90:
    DYNAMIC_FIELDS['summary'] = 'Resource Usage High for %g'
    DYNAMIC_FIELDS['description'] = 'Investigation of this server …..'
RESULT = True

As can be seen in the above example, the summary and description created by the script can make use of the format specifiers supported by the system. Please refer to Step 1 of alert creation in this manual for more details on this.

Accessing Duration

The duration for which the current alarm condition has been active is available in the META_DATA dictionary. This can be used for escalating alerts based on the active duration.

In the example below, the script escalates the alarm by sending a notification to a larger group if the alarm condition has persisted for more than 6 hours.

# META_DATA['duration'] is in seconds, so 6 hours is 6 * 60 * 60
if META_DATA.get('duration', 0) > 6 * 60 * 60:
    ALERT_CHANNELS.append('alertByEmail')
    EMAIL_ID_LIST = ['[email protected]', '[email protected]']
RESULT = True
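Since the duration is reported in seconds, it can help to convert it once and map it to escalation tiers. The sketch below assumes two tiers; the helper escalation_level and the 24-hour tier are hypothetical additions, and only the 6-hour boundary comes from the example above.

```python
def escalation_level(duration_seconds):
    """Map an active duration in seconds to an illustrative escalation tier."""
    hours = duration_seconds / 3600
    if hours > 24:
        return 2          # hypothetical second tier
    if hours > 6:
        return 1          # the 6-hour escalation from the example above
    return 0


META_DATA = {"duration": 7 * 3600}       # stand-in: alarm active for 7 hours
level = escalation_level(META_DATA.get("duration", 0))
print(level)
```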

Accessing the History of Alarm

An evaluation script can make use of the history of this alarm condition to decide on the alert behavior. The example below sends an escalation notification if the condition has been activated more than 10 times in the last week.

if len(META_DATA['history']['list']) > 10:
    ALERT_CHANNELS.append('alertByEmail')
    EMAIL_ID_LIST = ['[email protected]', '[email protected]']
RESULT = True

Using Enrichments in Alerts

We can use the enrichments available in vuSmartMaps to enrich the alert document or to use those enrichments in evaluation scripts for any other purpose.

A lookup function named get_value_from_lookup_file is available for this. It takes the following arguments:

  • tenant_id – Tenant Id
  • bu_id – BU Id
  • lookup_file_name – Name of the lookup file
  • key – Key used for the lookup. This can be a single key or, for a multi-level lookup, a list of keys.

For example, key = ["circle1", "region2", "code"] applied to a lookup file containing

    circle1:
        region1:
            code: '255'
        region2:
            code: '254'

returns '254'.
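The way a list of keys resolves against nested lookup content can be sketched with plain dictionaries. This is not the get_value_from_lookup_file implementation: the helper name lookup and the choice to return None for a missing key are assumptions, and the lookup content is written as a Python dict rather than read from a file.

```python
def lookup(data, key):
    """Resolve a single key or a list of keys against nested dictionaries."""
    keys = key if isinstance(key, (list, tuple)) else [key]
    current = data
    for part in keys:
        if not isinstance(current, dict) or part not in current:
            return None                  # assumed behaviour for a missing key
        current = current[part]
    return current


# The lookup content from the example above, written as a Python dict
lookup_data = {
    "circle1": {
        "region1": {"code": "255"},
        "region2": {"code": "254"},
    }
}
print(lookup(lookup_data, ["circle1", "region2", "code"]))
```

Each key in the list descends one level, so the order of the keys must match the nesting order in the lookup file.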

RESULT = False
if R1:
    add_fields = D[0]['Alert for WAN Link Down – BGP State Change SNMP – North']['metrics']['BGP Peer State']['includes']
    M1 = get_DM_value(D, 2, "LinkUsage")
    M2 = get_DM_value(D, 2, "bgp_peer_remote_address")
    M3 = get_DM_value(D, 2, "Circle")
    M4 = get_DM_value(D, 2, "BranchName")
    M5 = get_DM_value(D, 2, "BranchCode")
    M6 = get_DM_value(D, 2, "DeviceIP")
    M7 = get_DM_value(D, 2, "ISP")
    # Multi-level lookup: Circle, then ISP, then the AssignmentGroup entry
    AssignmentGroup = get_value_from_lookup_file("1", "1", "Assignment-Grp.yml", [M3, M7, "AssignmentGroup"])
    DYNAMIC_FIELDS["Assignment_Group"] = AssignmentGroup
    DYNAMIC_FIELDS["Assigned_Organization"] = get_value_from_lookup_file("1", "1", "AssignedOrg.yml", [AssignmentGroup, "Organization"])
    # Single-key lookup
    DYNAMIC_FIELDS["code"] = get_value_from_lookup_file("1", "1", "code.yml", "Nexus")

 

Using Time of Alert

In certain cases, the decision on alert may have to be made based on the time at which the alert is being generated.

The time of alert is available in the OBSERVATION_TIME variable.

For example, if different thresholds are to be used for business hours and non-business hours, the following logic can be used.

success_rate = get_DM_value(D, 1, 'success_rate')
# Time in the local time zone at which this alert is being generated.
# OBSERVATION_TIME is a Python datetime object, and all operations and
# functions supported on datetime objects can be used on it.
# Hour of the day on the 24-hour clock (0-23)
hour = OBSERVATION_TIME.hour
if hour >= 9 and hour <= 17:
    threshold = 80
else:
    threshold = 60
if success_rate and success_rate > threshold:
    RESULT = True
else:
    RESULT = False

If different thresholds are to be used for weekdays and weekends, the following logic can be used.

success_rate = get_DM_value(D, 1, 'Success Rate')
# Time in the local time zone at which this alert is being generated.
# OBSERVATION_TIME is a Python datetime object, and all operations and
# functions supported on datetime objects can be used on it.
# Day of the week
day = OBSERVATION_TIME.strftime("%A")
if day == 'Sunday' or day == 'Saturday':
    threshold = 50
else:
    threshold = 70
if success_rate and success_rate > threshold:
    RESULT = True
else:
    RESULT = False

Internally, OBSERVATION_TIME is the last alert execution time, converted to the local time zone:

OBSERVATION_TIME = self.last_execution_end_time.astimezone(tz.tzlocal())
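The two time-based checks above can also be combined using datetime attributes directly, which avoids string formatting and locale-dependent day names. The sketch below is one possible combination; the helper pick_threshold and the decision to let the weekend rule take precedence are assumptions, and the fixed datetime stands in for the OBSERVATION_TIME value supplied by the platform.

```python
from datetime import datetime


def pick_threshold(observation_time):
    """Choose a threshold from the weekday and hour of the observation time."""
    is_weekend = observation_time.weekday() >= 5       # Monday is 0, Sunday is 6
    is_business_hour = 9 <= observation_time.hour <= 17
    if is_weekend:
        return 50
    return 80 if is_business_hour else 60


OBSERVATION_TIME = datetime(2024, 1, 15, 10, 30)   # a Monday, 10:30
print(pick_threshold(OBSERVATION_TIME))
```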

Controls for Notifications Behavior

In this step, you can set up how alert notifications work for this rule. You can configure notification channels and recipients, enable or disable alarm mode, and control the intervals for active alert rule notifications.

Enable Alarm Mode

Enable Alarm Mode: When activated, the system monitors the alarm state, sending notifications when the alert condition is met and when it’s cleared. No additional notifications are sent while the condition remains active. For example, when the queue size exceeds 80%, an active alarm notification is sent, but no more are sent while the queue size is above the threshold. When it drops below 80%, a clear notification is sent.

Disable Alarm Mode: If this mode is turned off, notifications are sent at regular intervals as long as the alert condition is active. In this case, the system doesn’t track the alarm state, and no clear notifications are generated. For example, if the queue size stays above 80%, vuSmartMaps will send a notification every 5 minutes.

Non-alarm mode notifications are beneficial when operators want to receive regular updates about the status of a monitored metric or component.
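The difference between the two modes can be illustrated with a small simulation over a sequence of evaluation cycles. This is only a model of the behaviour described above, not the alert engine's code; the function name notifications and the labels "active", "clear", and "alert" are made up for the example.

```python
def notifications(breaches, alarm_mode):
    """Return the notification (or None) produced by each evaluation cycle."""
    active = False
    out = []
    for breached in breaches:
        if alarm_mode:
            if breached and not active:
                out.append("active")      # condition just started
            elif not breached and active:
                out.append("clear")       # condition just ended
            else:
                out.append(None)          # no state change, stay silent
            active = breached
        else:
            # No state tracking: notify on every breached cycle, never clear
            out.append("alert" if breached else None)
    return out


cycles = [False, True, True, True, False]   # queue above 80% for three cycles
print(notifications(cycles, alarm_mode=True))
print(notifications(cycles, alarm_mode=False))
```

With alarm mode on, the three breached cycles produce a single active notification followed by one clear; with it off, each breached cycle produces its own notification.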

Throttling

Throttling is only active when alarm mode is turned off. With throttling enabled, the system will not send repeated notifications for the same condition until the configured interval has passed. 

For instance, if you set the throttling interval to 2 hours, you will only receive a second notification about high CPU usage for a specific server 2 hours after the initial notification. This is handy to prevent constant notifications when alarm mode is disabled.

Let’s say you want notifications about the number of user logins to an application every 2 hours. To do this, set up an alert rule in non-alarm mode with a metric for the login count and a threshold greater than 0. Configure a throttling duration of 2 hours. 

This way, you’ll receive these notifications every 2 hours.
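The effect of a throttling interval can be sketched as a filter over the times at which the condition is found to be breached. The function throttle and the 5-minute evaluation cadence below are assumptions used for illustration; the real engine may measure the interval differently.

```python
def throttle(breach_times, interval_seconds):
    """Return the breach timestamps (in seconds) that produce a notification."""
    sent = []
    last_sent = None
    for t in breach_times:
        # Notify on the first breach, then only once the interval has elapsed
        if last_sent is None or t - last_sent >= interval_seconds:
            sent.append(t)
            last_sent = t
    return sent


# Condition breached on every 5-minute evaluation for 4 hours
breach_times = list(range(0, 4 * 3600 + 1, 300))
print(throttle(breach_times, 2 * 3600))
```

With a 2-hour interval, the 49 breached evaluations in this sample result in three notifications: at the start, at 2 hours, and at 4 hours.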

Enable Alert Notification During

This configuration is handy for avoiding notifications during periods of lower activity, such as weekends or non-business hours, ensuring that you only receive notifications when it’s most relevant.

You can specify an alert’s active period to prevent it from triggering alerts during planned maintenance activities. During this period, the rule won’t generate alert notifications, but you can still access them in the Events section for reference.

Advanced Configuration

You can use the Advanced Configuration to add new functionalities to alert rules through a YAML interface.

It’s a way to configure features that may not be available through the regular menu options. 

For instance, you can configure the alert notification level using this interface by specifying it in the advanced configuration text area.

notification_level: 0


For more details on the notification level, check out the examples section.

Related Dashboards:

To configure related dashboards for an alert, use the YAML configuration as shown below.

Tags:

To add tags during Alert configuration, enable the Advanced Configuration section and use a YAML script like the one shown in the attached snapshot below.
