
Driving Better Decisions Through Effective Logging and Improved Observability

Introduction

Application architecture is undergoing transformational change. Distributed deployments, multiple interconnected systems, and chains of API calls have made it challenging to derive IT insights and link them to business performance.

In this context, data, and logs in particular, play a pivotal role: not just in delivering performance insights, but in telling a story. That narrative, which aligns technology with business goals, is crucial for IT and operations leaders as well as industry analysts. The key lies in improved telemetry and advanced observability.

The Value of Observability

Observability is more than a technical requirement; it is a strategic asset. It provides intelligence that significantly improves customer experience and aligns business, technology, and operations teams. But it requires comprehensive telemetry.

The Challenge of Enterprise Data

Enterprises struggle with data quality gaps and data dispersion, which make it hard to extract actionable insights. Observability platforms, with their ability to ingest diverse logs, tackle this through correlation, contextualization, and insight generation, offering advanced root cause analysis capabilities.

Effective Logging Practices

Effective logging is about capturing the right information at the right levels. The fundamental requirements for effective logging practice are:

  • Be more contextual than descriptive. For instance, a log line should comply with the following:
    • When: A timestamp showing when the log event occurred
    • Where: The application name and the part of the application that recorded the event (e.g., module name, method name)
    • What: A crisp, meaningful message that lets a user understand what just happened in the application
    • Severity: A log level indicating the severity of the message. (For instance, error logs should detail specific failure points with error codes, while debug logs provide insights into variable states or decision paths.)
    • Identifiers: An identifier pointing to a unique ID relevant to the user or application scope, as applicable (session IDs, trace IDs, transaction IDs)
    • Contextual information: Depending on the type of event for which the log is generated, it can contain specific contextual information that provides additional detail (e.g., error codes, response codes, stack traces, the query executed)
  • Use a standard format for logging, or at a minimum ensure log lines can be split easily on fixed delimiters. (Logging libraries like log4j ensure basic structural compliance; message-level compliance, i.e. including the relevant context, is left to the implementor.) A minimal sketch of these requirements is shown after this list.
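
To make the checklist above concrete, here is a minimal sketch using Python's standard logging module (the same idea applies to log4j or any other logging library). The UserService name, the get_user function, and the txn_id field are illustrative assumptions, not taken from any particular codebase.

import logging

# Sketch: a format string covering the "when / where / what / severity /
# identifiers" checklist. The txn_id identifier is supplied per log call
# via the `extra` argument.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s [%(name)s.%(funcName)s] "
           "TxnId:%(txn_id)s %(message)s",
)

logger = logging.getLogger("UserService")

def get_user(user_id: str, txn_id: str) -> None:
    # Every log call carries the transaction identifier as structured context.
    logger.info("Fetching user details for userId:%s", user_id,
                extra={"txn_id": txn_id})

get_user("12345", "abc1235")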

Examples

Let's look at some examples.

Unstructured Logging

An example of a good log line for unstructured logging is:

2024-02-19T13:45:32.123Z ERROR [UserService] TxnId:abc1235,
userId:12345, Message: Failed to retrieve user details,
Error: DatabaseTimeoutException for Query 
SELECT * FROM users WHERE id=12345.

The above log line contains all the required information in a single line: the timestamp, log level, executing module, specific context such as the transaction ID and user ID, and the query that failed along with the failure reason. Though unstructured, it is good from a completeness standpoint, and breaking it into meaningful fields for analytics generally involves simple grok techniques.
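
As a rough illustration of how simple that extraction can be, the following Python regular expression (standing in for an equivalent grok pattern) breaks the sample line above into named fields. The field names are my own choice, not a prescribed schema.

import re

line = ("2024-02-19T13:45:32.123Z ERROR [UserService] TxnId:abc1235, "
        "userId:12345, Message: Failed to retrieve user details, "
        "Error: DatabaseTimeoutException for Query "
        "SELECT * FROM users WHERE id=12345.")

# Rough equivalent of a grok pattern: named groups pull out the fields
# described in the checklist above.
pattern = re.compile(
    r"(?P<timestamp>\S+) (?P<level>\w+) \[(?P<service>[^\]]+)\] "
    r"TxnId:(?P<txn_id>[^,]+), userId:(?P<user_id>[^,]+), "
    r"Message: (?P<message>[^,]+), Error: (?P<error>.+)"
)

match = pattern.match(line)
if match:
    print(match.groupdict())
    # {'timestamp': '2024-02-19T13:45:32.123Z', 'level': 'ERROR', ...}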

Structured Logging

Structured logging uses standard formats such as JSON or XML, which makes consuming information from the logs for analytics a simple task with standard parsers.

{
  "timestamp": "2024-02-19T12:35:00Z",
  "level": "INFO",
  "message": "HTTP Request Received",
  "request_details": {
    "method": "POST",
    "uri": "/api/v1/orders",
    "ip_address": "192.168.1.25",
    "user_agent": "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1;
 Trident/6.0)",
    "response_status": 200,
    "response_time_ms": 250
  },
  "user_id": "12345"
}
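
As an illustration of how such output could be produced, below is a minimal sketch of a JSON formatter built on Python's standard logging module. Production systems typically use a dedicated library (for example, python-json-logger), and the field names here simply mirror the sample record above.

import json
import logging

class JsonFormatter(logging.Formatter):
    # Minimal sketch: serialize each record, plus any extra fields, as JSON.
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "message": record.getMessage(),
            # Fields passed via `extra=` land on the record object.
            "request_details": getattr(record, "request_details", None),
            "user_id": getattr(record, "user_id", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("OrderService")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("HTTP Request Received", extra={
    "request_details": {"method": "POST", "uri": "/api/v1/orders",
                        "response_status": 200, "response_time_ms": 250},
    "user_id": "12345",
})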

Error Logging

An example error log that captures all the required information about the error in JSON format is below:

{
  "timestamp": "2024-02-19T12:34:56Z",
  "level": "ERROR",
  "message": "Failed to process payment",
  "error_details": {
    "exception_type": "PaymentProcessingException",
    "error_message": "Card expired",
    "error_code": "RC10",
    "stack_trace": "Traceback (most recent call last): ..."
  },
  "user_id": "12345",
  "transaction_id": "abcde12345",
  "service": "PaymentService",
  "additional_info": {
    "payment_amount": "100.00",
    "currency": "USD"
  }
}
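
A sketch of how such an error record could be emitted from application code is shown below. The PaymentProcessingException class, the RC10 code, and the process_payment function are hypothetical, taken only from the sample log above, and the extra fields assume a JSON formatter like the one sketched earlier.

import logging
import traceback

logger = logging.getLogger("PaymentService")

# Hypothetical exception type, named after the sample log above.
class PaymentProcessingException(Exception):
    def __init__(self, message: str, error_code: str):
        super().__init__(message)
        self.error_code = error_code

def process_payment(user_id: str, transaction_id: str,
                    amount: str, currency: str) -> None:
    try:
        # Stand-in for a real payment call that fails.
        raise PaymentProcessingException("Card expired", error_code="RC10")
    except PaymentProcessingException as exc:
        # Capture the details the sample record calls for: exception type,
        # message, application error code, and stack trace.
        logger.error("Failed to process payment", extra={
            "error_details": {
                "exception_type": type(exc).__name__,
                "error_message": str(exc),
                "error_code": exc.error_code,
                "stack_trace": traceback.format_exc(),
            },
            "user_id": user_id,
            "transaction_id": transaction_id,
            "additional_info": {"payment_amount": amount, "currency": currency},
        })

process_payment("12345", "abcde12345", "100.00", "USD")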


Observability in Action

Incorporating unique identifiers in logs enhances the traceability of transactions. This is vital for pinpointing issues and optimizing system performance.
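
One way to achieve this, sketched below with Python's contextvars and a logging filter, is to set the identifier once at the entry point of a request so that every subsequent log record is stamped with it. The OrderService name and the handle_request function are illustrative assumptions.

import contextvars
import logging
import uuid

# Holds the trace identifier for the current request context.
trace_id_var = contextvars.ContextVar("trace_id", default="-")

class TraceIdFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Stamp every record with the identifier of the active request.
        record.trace_id = trace_id_var.get()
        return True

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s [%(name)s] TraceId:%(trace_id)s %(message)s",
)
logging.getLogger().handlers[0].addFilter(TraceIdFilter())
logger = logging.getLogger("OrderService")

def handle_request() -> None:
    # Set once at the entry point; downstream code logs normally and the
    # filter ensures every line carries the same identifier.
    trace_id_var.set(str(uuid.uuid4()))
    logger.info("Order received")
    logger.info("Order persisted")

handle_request()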

Conclusion

The journey towards enhanced telemetry and observability, though challenging, is substantially rewarding. By adopting these practices, enterprises can ensure their data tells a compelling, insightful, and actionable story that aligns with their business objectives.

Raja Madhavan

About Author

Raja Madhavan – Director of Platforms at VuNet Systems. He oversees packaged integrations, sandbox environments, and the seamless deployment and stability of the platform. He is an industry expert in Observability, AIOps, and technologies associated with big data analytics, with decades of experience in building and validating software systems.
