Driving Better Decisions Through Effective Logging and Improved Observability
- Feb 22, 2024
- Blogs
- 5 min read
Introduction
Application architecture is undergoing transformational change. Distributed deployments, multiple interconnected systems, and proliferating API calls have made it challenging to derive IT insights and link them to business performance.
In this context, data, and specifically logs, play a pivotal role, not just in delivering performance insights but in storytelling. This narrative, which aligns technology with business goals, is crucial for IT and Operations leaders and for industry analysts. The secret lies in improved telemetry and advanced observability.
The Value of Observability
Observability is more than a technical requirement; it is a strategic asset. It provides intelligence that significantly impacts customer experiences and aligns business, technology, and operations teams. But it requires comprehensive telemetry.
The Challenge of Enterprise Data
Enterprises struggle with data quality gaps and data dispersion, making it challenging to harness actionable insights. Observability platforms, with their ability to ingest diverse logs, tackle this through correlation, contextualization, and insight generation, offering advanced root cause analysis capabilities.
Effective Logging Practices
Effective logging is about capturing the right information at the right levels. The fundamental requirements of an effective logging practice are:
- Be contextual rather than merely descriptive. For instance, a log line should comply with the following:
  - When: A timestamp to show when the log event occurred
  - Where: The application name and the part of the application that recorded the event (e.g., module name, method name)
  - What: A crisp, meaningful message that lets a user understand what just happened in the application
  - Severity: A log level indicating the severity of the message. (For instance, error logs should detail specific failure points with error codes, while debug logs provide insight into variable states or decision paths.)
  - Identifiers: A unique ID relevant to the user or application scope, as applicable (session IDs, trace IDs, transaction IDs)
  - Contextual Information: Depending on the type of event being logged, specific details that provide additional context (e.g., error codes, response codes, stack traces, the query executed)
- Use a standard format for logging, or at a minimum ensure logs can be split reliably on fixed delimiters (logging libraries like Log4j handle the basic layout, but including relevant context at the message level is left to the implementer). A sketch of these practices in code follows this list.
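To make these requirements concrete, below is a minimal Java sketch using SLF4J with an MDC-backed transaction ID. The class, method, and identifiers are illustrative assumptions; the When (timestamp) and Where (logger name) are typically added by the logging backend's pattern layout rather than by application code.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class UserService {
    private static final Logger logger = LoggerFactory.getLogger(UserService.class);

    public void getUserDetails(String userId, String txnId) {
        // Identifiers: place the transaction ID in the MDC so every log line on
        // this thread carries it (requires %X{txnId} in the backend's pattern layout).
        MDC.put("txnId", txnId);
        try {
            logger.info("Fetching user details for userId={}", userId);
            // ... database call would go here ...
        } catch (Exception e) {
            // Severity, What, and Contextual Information: level, crisp message,
            // and the exception (with stack trace) in a single event.
            logger.error("Failed to retrieve user details for userId={}", userId, e);
        } finally {
            MDC.remove("txnId"); // threads are pooled; always clean up context
        }
    }
}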
Examples
Let's look at some examples.
Unstructured Logging
An example of a good log line for unstructured logging is:
2024-02-19T13:45:32.123Z ERROR [UserService] TxnId:abc1235, userId:12345, Message: Failed to retrieve user details, Error: DatabaseTimeoutException for Query SELECT * FROM users WHERE id=12345.
The above log line contains all the required information in a single line: the timestamp, log level, executing module, and specific context such as the transaction ID, the user ID, and the failed query with its failure reason. Though unstructured, it is complete, and breaking it into meaningful fields for analytics generally involves simple grok techniques, as the sketch below illustrates.
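Grok patterns are, at heart, named regular expressions. As a rough illustration of how little work the extraction takes, the following Java sketch breaks the log line above into named fields with a plain regex; the pattern is an assumption tailored to this one example format.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // Named groups mirror what grok fields like %{TIMESTAMP_ISO8601} would capture.
    private static final Pattern LOG_PATTERN = Pattern.compile(
        "(?<timestamp>\\S+) (?<level>\\w+) \\[(?<module>\\w+)\\] "
        + "TxnId:(?<txnId>\\w+), userId:(?<userId>\\w+), "
        + "Message: (?<message>[^,]+), Error: (?<error>.+)");

    public static void main(String[] args) {
        String line = "2024-02-19T13:45:32.123Z ERROR [UserService] "
            + "TxnId:abc1235, userId:12345, Message: Failed to retrieve user details, "
            + "Error: DatabaseTimeoutException for Query SELECT * FROM users WHERE id=12345.";
        Matcher m = LOG_PATTERN.matcher(line);
        if (m.matches()) {
            System.out.println("timestamp = " + m.group("timestamp"));
            System.out.println("level     = " + m.group("level"));
            System.out.println("txnId     = " + m.group("txnId"));
            System.out.println("error     = " + m.group("error"));
        }
    }
}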
Structured Logging
Structured logging uses standard formats such as JSON or XML, which makes consuming information from the logs for analytics a simple task with standard parsers. An example HTTP request log in JSON:
{ "timestamp": "2024-02-19T12:35:00Z", "level": "INFO", "message": "HTTP Request Received", "request_details": { "method": "POST", "uri": "/api/v1/orders", "ip_address": "192.168.1.25", "user_agent": "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1;
Trident/6.0)", "response_status": 200, "response_time_ms": 250 }, "user_id": "12345" }
Error Logging
An example error log that captures all the required information about the error in JSON format is shown below:
{ "timestamp": "2024-02-19T12:34:56Z", "level": "ERROR", "message": "Failed to process payment", "error_details": { "exception_type": "PaymentProcessingException", "error_message": "Card expired", "error_code": "RC10", "stack_trace": "Traceback (most recent call last): ..." }, "user_id": "12345",
"transaction_id": "abcde12345", "service": "PaymentService", "additional_info": { "payment_amount": "100.00", "currency": "USD" } }
Observability in Action
The incorporation of unique identifiers in logs, as described above, enhances the traceability of transactions across distributed components. This is vital for pinpointing issues and optimizing system performance; the filter sketch below shows one common way to put it into practice.
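A common way to guarantee that such identifiers appear on every log line is to attach them at the edge of the service. The sketch below assumes a Jakarta Servlet environment and SLF4J's MDC; the X-Request-Id header name is an assumption, and any inbound ID is reused so traces remain continuous across services.

import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;
import java.io.IOException;
import java.util.UUID;
import org.slf4j.MDC;

public class TraceIdFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        // Reuse an upstream ID if present; otherwise mint a fresh one.
        String traceId = ((HttpServletRequest) req).getHeader("X-Request-Id");
        if (traceId == null || traceId.isEmpty()) {
            traceId = UUID.randomUUID().toString();
        }
        MDC.put("traceId", traceId);
        try {
            chain.doFilter(req, res); // every log line downstream now carries traceId
        } finally {
            MDC.remove("traceId"); // threads are pooled; always clean up
        }
    }
}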
Conclusion
The journey towards enhanced telemetry and observability, though challenging, is substantially rewarding. By adopting these practices, enterprises can ensure their data tells not just a story but one that is compelling, insightful, and actionable, and that aligns with their business objectives.