#63: Logging libraries: auditing and troubleshooting your application
You can’t look at your application all the time. Instead, it should leave some sort of persistent trace. Such an audit log can be examined later on. However, it’s the responsibility of the application itself to log appropriately. But more importantly, the data it logs for later must be well-structured. Simply printing random words to a console is no longer sufficient.
If you start doing logging by hand, you’ll quickly build some home-grown abstraction. For example, every time you log something, the timestamp is a must. Otherwise, when looking at the logs later on, it’s impossible to tell when things have happened. Timestamps also allow measuring the time elapsed between events. Another feature is some sort of persistence. Most likely, you’ll redirect your logs to a file.
Logging is pervasive and solves the same problems in pretty much all applications. So, small libraries flourished all over. You only need to think about the message you want to store for later. The logging library takes care of:
- adding metadata, like timestamps, current thread ID, logged user
- delegating to some destination, like console, file or
syslog
- choosing which log messages are important based on their level
Especially the last point is interesting. Developers can choose whether a certain log message is something critical or just some fine-grained debugging. A logging library can, sometimes dynamically, choose to drop less important stuff.
On a busy server, multiple processes and threads are logging at the same time. Imagine you are handling hundreds of transactions per second. And each transaction produces hundreds of logs. Sadly, logs from different transactions interleave, making it impossible to track an individual transaction. So we must get clever. Fortunately, logging libraries help.
These libraries attach thread ID by default, which may be sufficient. But more complex systems tend to create their tracing ID. Such ID works across threads, and even systems! What does it mean?
Well, let’s say your transaction spans multiple applications. You can quickly discover all logs from all applications related to a single transaction. This is called distributed logging. You’ll probably need some external tool to collect logs from applications and aggregate them in some searchable form. A popular ELK stack may be used for that. ELK stands for Elasticsearch, Logstash, and Kibana. This common setup helps collect logs from several services in one place. It’s a must for any distributed system.
Another cool feature is called structured logging.
Basically, rather than logging dumb strings, we also log arbitrary key-value pairs.
It makes searching easier.
For example, we can look for a string 123
being userId
, but not productId
.
Logging is essential when troubleshooting issues in production. However, too much logging can become a problem:
- it may be expensive, especially when logs are saved to disk synchronously
- it may reveal sensitive information, for example, logging e-mails or passwords accidentally
- and finally, it’s yet another dependency that may be vulnerable
The last point applies to any library we depend on. However, recently an insanely dangerous 0-day vulnerability was found in Log4J. One of the most popular logging libraries in the wild.
That’s it, thanks for listening, bye!
More materials
syslog
- #48: Distributed tracing: find bottlenecks in complex systems
- What is the ELK Stack?
- Log4Shell: RCE 0-day exploit found in log4j 2, a popular Java logging package
- Logging libraries:
- Logging in .NET
- Logging facility for Python
- Apache Log4j 2 (Java)
- Logback Project (Java)
- uber-go/zap - structured logging in Go