Logging

Taking some notes on how to build logs for applications. I don't have a good logging system yet, and logging gets hard once you have distributed microservices.


References


  • The Log
    • Everything you ever wanted to know about structured logs, and how to build distributed systems on top of them. - Greg Brockman
  • Logging best practices
    • A short guide for how to think about logging. I wish more software followed this article's advice. - Greg Brockman
    • This link now returns 404
  • Logging Cheat Sheet / Logging Vocabulary
    • OWASP Cheat Sheet Series pages on logging


The Log


Sometimes called write-ahead logs, commit logs, or transaction logs, logs have been around almost as long as computers and are at the heart of many distributed data systems and real-time application architectures. You can't fully understand databases, NoSQL stores, key-value stores, replication, Paxos, Hadoop, version control, or almost any software system without understanding logs; and yet, most software engineers are not familiar with them.


What is a Log?


A log is an append-only, totally ordered sequence of records ordered by time.


Records are appended to the end of the log, and reads proceed left-to-right. Each entry is assigned a unique sequential log entry number. The ordering of records denotes a notion of "time" since entries to the left are defined to be older than entries to the right. Logs have a specific purpose: they record what happened and when.
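To make the definition concrete, here is a minimal Python sketch of an in-memory append-only log. The class and method names (`Log`, `append`, `read`) are made up for illustration; the offset returned by `append` plays the role of the unique sequential log entry number.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class Log:
    """Minimal in-memory append-only log (illustrative sketch only)."""

    _records: list[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Add a record at the end and return its sequential entry number (offset)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, start: int = 0) -> Iterator[tuple[int, Any]]:
        """Yield (offset, record) pairs left to right, oldest first, from `start`."""
        for offset in range(start, len(self._records)):
            yield offset, self._records[offset]


log = Log()
log.append({"event": "user_created", "user": "alice"})
log.append({"event": "user_renamed", "user": "alice", "to": "alicia"})
for offset, record in log.read():
    print(offset, record)
```

There is no way to update or delete a record here; appending is the only write, which is what gives every entry a fixed position in the total order.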


Logs in Databases


Databases use logs to keep their various data structures and indexes in sync in the presence of crashes. To make this atomic and durable, a database writes information about the records it will modify to the log before applying the changes to the data structures it maintains. The log is the authoritative source for restoring all other persistent structures after a crash.
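A toy illustration of the write-ahead idea, assuming a single-process key-value store whose log is a JSON-lines file (`WalStore` and the file format are hypothetical). Each change is appended and `fsync`ed before it is applied, and on startup the table is rebuilt purely by replaying the log.

```python
import json
import os
import tempfile


class WalStore:
    """Toy key-value store that logs each change before applying it.

    Hypothetical sketch: the log is a file with one JSON object per line, and the
    in-memory dict stands in for the database's tables and indexes.
    """

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.table: dict[str, str] = {}
        self._recover()

    def put(self, key: str, value: str) -> None:
        entry = json.dumps({"op": "put", "key": key, "value": value})
        # 1. Write the intended change to the log and force it to disk...
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(entry + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. ...and only then apply it to the other data structures.
        self.table[key] = value

    def _recover(self) -> None:
        """Rebuild the table by replaying the log, the authoritative source."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # treat an unparsable line as a torn final write and stop
                if entry["op"] == "put":
                    self.table[entry["key"]] = entry["value"]


path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = WalStore(path)
store.put("a", "1")
store.put("a", "2")

recovered = WalStore(path)  # simulates restarting after a crash
print(recovered.table)  # {'a': '2'}
```

Real databases also checkpoint their state so recovery doesn't replay the entire log, and they detect torn writes with checksums rather than by failing to parse JSON; this sketch skips both.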


Logs in Distributed Systems


State Machine Replication Principle:

If two identical, deterministic processes begin in the same state and get the same inputs in the same order, they will produce the same output and end in the same state.

Deterministic means that the processing isn't timing-dependent and doesn't let any other out-of-band input influence its results. The state of the process is whatever data remains on the machine, either in memory or on disk, at the end of processing.
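A small Python sketch of the principle, using a hypothetical `(operation, key, amount)` command format: two replicas that replay the same log through the same deterministic `apply` function end in the same state, while reordering the inputs can change the result.

```python
State = dict[str, int]
Command = tuple[str, str, int]  # (operation, key, amount): hypothetical format


def apply(state: State, command: Command) -> State:
    """Deterministic transition: depends only on the current state and the command."""
    op, key, amount = command
    new_state = dict(state)
    if op == "add":
        new_state[key] = new_state.get(key, 0) + amount
    elif op == "set":
        new_state[key] = amount
    return new_state


def replay(log: list[Command]) -> State:
    """Start from the same empty state and apply every command in log order."""
    state: State = {}
    for command in log:
        state = apply(state, command)
    return state


log = [("set", "x", 10), ("add", "x", 5), ("add", "y", 1)]
replica_a = replay(log)
replica_b = replay(log)
print(replica_a == replica_b, replica_a)  # True {'x': 15, 'y': 1}

# The same inputs in a different order can end in a different state.
print(replay([("add", "x", 5), ("set", "x", 10)]))  # {'x': 10}
```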

Different groups of people seem to describe the uses of logs differently. Database people generally differentiate between physical and logical logging. Physical logging means logging the contents of each row that is changed. Logical logging means logging not the changed rows but the SQL commands that lead to the row changes (the insert, update, and delete statements).
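To contrast the two styles, this sketch uses Python's built-in `sqlite3` module with a made-up `accounts` table: the logical log stores the SQL statements, the physical log stores the resulting row contents, and replaying either one into a fresh database yields the same rows.

```python
import sqlite3

# Logical log: the statements that caused the changes (hypothetical schema).
logical_log = [
    "INSERT INTO accounts (id, balance) VALUES (1, 100)",
    "UPDATE accounts SET balance = balance * 2 WHERE id = 1",
]

# Physical log: the full contents of each changed row after every change.
physical_log = [
    {"id": 1, "balance": 100},
    {"id": 1, "balance": 200},
]


def fresh_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    return conn


# Replaying the logical log re-executes the statements.
logical_db = fresh_db()
for stmt in logical_log:
    logical_db.execute(stmt)

# Replaying the physical log writes each row image back as-is.
physical_db = fresh_db()
for row in physical_log:
    physical_db.execute(
        "INSERT OR REPLACE INTO accounts (id, balance) VALUES (:id, :balance)", row
    )

query = "SELECT id, balance FROM accounts"
print(logical_db.execute(query).fetchall())   # [(1, 200)]
print(physical_db.execute(query).fetchall())  # [(1, 200)]
```

The logical log is more compact, but replaying it is only safe if the statements are deterministic: a statement that calls something like `random()` or reads the current time could produce different rows on replay, which is the determinism requirement from the state machine replication principle.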

The distributed log can be seen as the data structure which models the problem of consensus.

What the Log is Good For

  1. Data Integration - making all of an organization's data easily available in all its storage and processing systems
  2. Real-time data processing - computing derived data streams
  3. Distributed system design - how practical systems can be simplified with a log-centric design


Data Integration


Data integration is making all the data that an organization owns available in all its services and systems.

Event data records things that happen rather than things that are.

Unified Log

The data warehouse is meant to be a repository of the clean, integrated data structured to support analysis. The data warehousing methodology involves periodically extracting data from source databases, munging it into some kind of understandable form, and loading it into a central data warehouse. Having this central location that contains a clean copy of all your data is a hugely valuable asset for data-intensive analysis and processing.

In the article's LinkedIn case study, event data handling is built in a log-centric fashion, with Kafka as the central, multi-subscriber event log. Several hundred event types are defined, each capturing the unique attributes of a particular type of action. This covers everything from page views, ad impressions, and searches to service invocations and application exceptions.
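The multi-subscriber part is the key idea: every consumer reads the same log but keeps its own position in it. A toy Python model of that follows; it is not the Kafka client API, and the subscriber names `warehouse` and `search_index` are hypothetical.

```python
from collections import defaultdict
from typing import Any


class EventLog:
    """Toy multi-subscriber log; NOT the Kafka API, just the idea behind it."""

    def __init__(self) -> None:
        self.records: list[dict[str, Any]] = []
        # Each subscriber remembers its own read position (offset), starting at 0.
        self.offsets: dict[str, int] = defaultdict(int)

    def publish(self, event_type: str, **attrs: Any) -> None:
        self.records.append({"type": event_type, **attrs})

    def poll(self, subscriber: str) -> list[dict[str, Any]]:
        """Return records this subscriber has not seen yet and advance its offset."""
        start = self.offsets[subscriber]
        batch = self.records[start:]
        self.offsets[subscriber] = len(self.records)
        return batch


log = EventLog()
log.publish("page_view", url="/jobs")
log.publish("search", query="kafka")

warehouse_batch = log.poll("warehouse")        # both events so far
log.publish("ad_impression", ad_id=42)
search_index_batch = log.poll("search_index")  # all three events
warehouse_next = log.poll("warehouse")         # only the new ad_impression
print(len(warehouse_batch), len(search_index_batch), len(warehouse_next))  # 2 3 1
```

Because consumers only track an offset, a periodic batch consumer (like a warehouse load) and a near-real-time one can read the same events independently, and a new consumer starting at offset 0 replays the full history.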

Stream processing is just processing that includes a notion of time in the underlying data being processed. It does not require a static snapshot of the data, so it can produce output at a user-controlled frequency instead of waiting for the "end" of the data set to be reached.
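A minimal sketch of that idea in Python, assuming events arrive as `(timestamp_seconds, payload)` pairs already in timestamp order: counts are computed per fixed (tumbling) window, and each window's result is emitted as soon as the stream moves past it. The window length is the user-controlled output frequency.

```python
from typing import Iterable, Iterator


def windowed_counts(
    events: Iterable[tuple[int, str]], window_secs: int
) -> Iterator[tuple[int, int]]:
    """Yield (window_start, count) per tumbling window, as each window closes.

    Assumes events are ordered by timestamp; late or out-of-order events are
    not handled in this sketch.
    """
    current_window = None
    count = 0
    for timestamp, _payload in events:
        window = timestamp - timestamp % window_secs  # start of this event's window
        if current_window is not None and window != current_window:
            yield current_window, count  # window closed: emit without waiting for the end
            count = 0
        current_window = window
        count += 1
    if current_window is not None:
        yield current_window, count  # flush the last open window


stream = [(0, "view"), (12, "view"), (61, "search"), (65, "view"), (130, "view")]
for window_start, n in windowed_counts(stream, window_secs=60):
    print(window_start, n)
# 0 2
# 60 2
# 120 1
```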


OWASP Recommendations


This is a guide to application logging mechanisms, especially related to security logging. Application event logging often provides much greater insight than infrastructure (e.g., database) logging alone. Application logging should be consistent within the application, consistent across an organization's application portfolio and use industry standards where relevant, so the logged event data can be consumed, correlated, analyzed, and managed by a wide variety of systems.


Purpose


Application logs should be used for:

  • Identifying security incidents
  • Monitoring policy violations
  • Establishing baselines
  • Assisting non-repudiation controls
  • Providing information about problems and unusual conditions
  • Contributing additional application-specific data for incident investigation which is lacking in other log sources
  • Helping defend against vulnerability identification and exploitation through attack detection


Design, Implementation and Testing


Sources of data

  • Security events
  • Business process monitoring
  • Anti-automation monitoring
  • Audit trails
  • Performance monitoring
  • Data for subsequent requests for information

The degree of confidence in the event information has to be considered when including event data from systems in a different trust zone.

Where to Record Data

Applications commonly write event log data to the file system or a database (SQL or NoSQL). When using a database to keep logs, it is preferable to use a separate database account that is only used for writing log data and that has very restrictive database, table, function, and command permissions.

Which Events to Log

  • Always Log
    • Input validation failures
    • Output validation failures
    • Authentication successes and failures
    • Session management failures
    • Application errors and system events
    • Application and related system start-ups and shut-downs, and logging initialization
    • Use of higher-risk functionality including:
      • User administrator actions
      • Use of systems administration privileges
      • Use of default or shared accounts
      • Access to sensitive data
      • Encryption activities such as use or rotation of cryptographic keys
      • Creation or deletion of system-level objects
      • Data import and export
      • Submission and processing of user generated content - especially file uploads
  • Optionally Log
    • Sequencing failure
    • Excessive use
    • Data changes
    • Fraud
    • Suspicious, unacceptable, or unexpected behavior
    • Modifications to configuration
    • Application code file and/or memory changes

Event Attributes

The application logs must record when, where, who, and what for each event.

  • When
    • Log date and time
    • Event date and time
  • Where
    • Application Identifier (name and version)
    • Application Address
    • Service
    • Geolocation
    • Window/form/page
    • Code location
  • Who (human or machine user)
    • Source address (user's IP address, user's device / machine identifier)
  • What
    • Type of Event
    • Severity of Event
    • Security Flag
    • Description
  • Other Info:
    • Include other information that you think might be useful

Data to Exclude

  • Application source code
  • Session ID
  • Access tokens
  • Sensitive personal data and some forms of personally identifiable information (PII)
  • Authentication passwords
  • DB connection strings
  • Encryption keys
  • Bank account info and payment card holder data
  • Other sensitive information
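One way to enforce this list in code is to scrub records before any handler writes them. Below is a sketch using Python's standard `logging.Filter`; the set of sensitive key names, the `fields` convention passed through `extra`, and the `key=value` regex for free text are all assumptions. Regex scrubbing is best-effort, so it should back up, not replace, simply not logging the data in the first place.

```python
import io
import logging
import re

# Hypothetical set of field names treated as sensitive; adapt to your own data.
SENSITIVE_KEYS = {"password", "session_id", "access_token", "db_connection_string", "card_number"}

# Matches key=value pairs in free-text messages for any sensitive key (best-effort).
_KV_PATTERN = re.compile(
    r"(?i)\b(" + "|".join(map(re.escape, sorted(SENSITIVE_KEYS))) + r")=\S+"
)


class RedactingFilter(logging.Filter):
    """Scrub sensitive values from a record before a handler formats it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Structured fields passed as extra={"fields": {...}}: mask by key name.
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            record.fields = {
                k: "[REDACTED]" if k in SENSITIVE_KEYS else v for k, v in fields.items()
            }
        else:
            record.fields = {}
        # Free text: merge args into the message first so secrets in args are scrubbed too.
        record.msg = _KV_PATTERN.sub(r"\1=[REDACTED]", record.getMessage())
        record.args = None
        return True  # keep the record; only its contents change


buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.addFilter(RedactingFilter())  # handler-level, so propagated records are covered too
handler.setFormatter(logging.Formatter("%(message)s %(fields)s"))
logger = logging.getLogger("auth_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info(
    "login attempt user=%s password=%s",
    "joebob1",
    "hunter2",
    extra={"fields": {"user": "joebob1", "session_id": "abc123"}},
)
print(buffer.getvalue())
# login attempt user=joebob1 password=[REDACTED] {'user': 'joebob1', 'session_id': '[REDACTED]'}
```

The filter is attached to the handler rather than the logger because filters on a logger are not applied to records that propagate up from child loggers.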

Example of Logged Event:

{
  "datetime": "2021-01-01T01:01:01-0700",
  "appid": "foobar.netportal_auth",
  "event": "AUTHN_login_success:joebob1",
  "level": "INFO",
  "description": "User joebob1 login successfully",
  "useragent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
  "source_ip": "165.225.50.94",
  "host_ip": "10.12.7.9",
  "hostname": "portalauth.foobar.com",
  "protocol": "https",
  "port": "440",
  "request_uri": "/api/v2/auth/",
  "request_method": "POST",
  "region": "AWS-US-WEST-2",
  "geo": "USA"
}
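A sketch of emitting events in roughly this shape with Python's standard `logging` module, via a custom `Formatter` that writes one JSON object per line. The field layout only loosely follows the example above, and passing `event` and a `context` dict through `extra` is a convention assumed for this sketch, not an OWASP or library requirement.

```python
import io
import json
import logging
import socket
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical field layout)."""

    def __init__(self, appid: str) -> None:
        super().__init__()
        self.appid = appid
        self.hostname = socket.gethostname()

    def format(self, record: logging.LogRecord) -> str:
        event = {
            # When: time the record was created, ISO 8601 with a UTC offset.
            "datetime": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            # Where: application, host, and code location.
            "appid": self.appid,
            "hostname": self.hostname,
            "code_location": f"{record.pathname}:{record.lineno}",
            # What: event type, severity, and description.
            "event": getattr(record, "event", record.name),
            "level": record.levelname,
            "description": record.getMessage(),
        }
        # Who and other request context (source_ip, useragent, ...) come from `extra`.
        event.update(getattr(record, "context", {}))
        return json.dumps(event)


buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(JsonFormatter(appid="foobar.netportal_auth"))
logger = logging.getLogger("netportal_auth")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info(
    "User joebob1 logged in successfully",
    extra={
        "event": "AUTHN_login_success:joebob1",
        "context": {"source_ip": "165.225.50.94", "request_method": "POST"},
    },
)
print(buffer.getvalue())
```

The timestamp here is UTC (`+00:00`) rather than the local offset used in the example; keeping every service in one zone makes it easier to correlate events across services.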

