Scenario #1
Context:
You’ve been paged for an alert due to a sudden spike in HTTP 5xx errors in production. Your task is to quickly identify and summarize the spike from web server logs.
Objective:
Write a Python script that parses the web server log lines given in log_lines below and identifies the time windows with a spike in HTTP 5xx errors.
Logs
log_lines = [
"2025-05-04 09:34:01,818 - INFO - 500 Service error",
"2025-05-04 09:39:20,912 - INFO - 500 Timeout",
"2025-05-04 09:55:45,999 - INFO - 503 Backend unavailable",
"2025-05-04 09:57:01,101 - INFO - 200 OK...",
"2025-05-04 09:58:20,912 - INFO - 500 Timeout",
"2025-05-04 09:58:45,999 - INFO - 503 Backend unavailable",
"2025-05-04 09:58:50,912 - INFO - 500 Timeout",
"2025-05-04 10:01:01,600 - INFO - 500 Service error",
"2025-05-04 10:01:20,710 - INFO - 500 Timeout"
]
Requirements:
- Read the log file and extract:
  - Timestamp (minute granularity)
  - HTTP status code
- Detect spike windows:
  - A “spike” is defined as 2 or more requests with 5xx status codes in any one-minute window.
Output:
- Print all time windows (minute-resolution) that qualify as a spike.
- For each window, print the timestamp and number of 5xx errors.
- Example output:
Python Script
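A minimal sketch of such a script, assuming the approach described under "How it works": a regex with named groups `ts` and `code`, a `Counter` keyed by minute, and a threshold of 2. The function name `find_spikes`, the constant names, and the exact wording of the printed lines are illustrative assumptions, not part of the problem statement.

```python
import re
from collections import Counter

# Sample data copied from the scenario.
log_lines = [
    "2025-05-04 09:34:01,818 - INFO - 500 Service error",
    "2025-05-04 09:39:20,912 - INFO - 500 Timeout",
    "2025-05-04 09:55:45,999 - INFO - 503 Backend unavailable",
    "2025-05-04 09:57:01,101 - INFO - 200 OK...",
    "2025-05-04 09:58:20,912 - INFO - 500 Timeout",
    "2025-05-04 09:58:45,999 - INFO - 503 Backend unavailable",
    "2025-05-04 09:58:50,912 - INFO - 500 Timeout",
    "2025-05-04 10:01:01,600 - INFO - 500 Service error",
    "2025-05-04 10:01:20,710 - INFO - 500 Timeout",
]

# Timestamp, then " - <anything> - ", then a three-digit status code.
LOG_PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - .* - (?P<code>\d{3})"
)
SPIKE_THRESHOLD = 2  # "2 or more" 5xx responses within one minute


def find_spikes(lines, threshold=SPIKE_THRESHOLD):
    """Return {minute: count} for minutes with at least `threshold` 5xx errors."""
    errors_per_minute = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the expected format
        if 500 <= int(match.group("code")) <= 599:
            minute = match.group("ts")[:16]  # keep "YYYY-MM-DD HH:MM"
            errors_per_minute[minute] += 1
    # Sort by minute so the windows come out in chronological order.
    return {m: n for m, n in sorted(errors_per_minute.items()) if n >= threshold}


if __name__ == "__main__":
    print("5xx spike windows (minute resolution):")
    for minute, count in find_spikes(log_lines).items():
        print(f"Spike detected! {minute} - {count} errors")
```

For the sample data, `find_spikes(log_lines)` flags two windows: 09:58 with 3 errors and 10:01 with 2. The minutes 09:34, 09:39 and 09:55 each contain a single 5xx error, so they stay below the threshold.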
Note
To read a real log file instead of the hard‑coded list, just replace the log_lines loop with a file read:
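One way to do that, as a sketch: a file object yields one line per iteration, so it can stand in for the list directly. The function name `find_spikes_in_file` and the UTF-8 encoding are assumptions, and the file is assumed to use the same line format as the sample.

```python
import re
from collections import Counter

# Same line format as the sample: timestamp, " - <anything> - ", status code.
LOG_PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - .* - (?P<code>\d{3})"
)


def find_spikes_in_file(path, threshold=2):
    """Return {minute: count} for minutes in the file with >= threshold 5xx errors."""
    errors_per_minute = Counter()
    # Iterating over the file handle reads it line by line,
    # so large logs are never loaded into memory all at once.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            match = LOG_PATTERN.match(line)
            if match and match.group("code").startswith("5"):
                errors_per_minute[match.group("ts")[:16]] += 1
    return {m: n for m, n in sorted(errors_per_minute.items()) if n >= threshold}
```

You would call it as `find_spikes_in_file("access.log")`, where `access.log` is a placeholder path.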
How it works
| Step | What it does | Why it’s useful |
|---|---|---|
| Parse the log | Uses a regex to capture the timestamp and status code from each line. | Keeps the code simple – we only care about those two pieces of data. |
| Normalize to minutes | Converts each full timestamp to a string that contains only the year‑month‑day and hour‑minute. | A minute‑resolution key lets us aggregate counts efficiently. |
| Count 5xx errors | For each 5xx status code, increment a Counter entry keyed by the minute. | We only want to flag minutes that hit the threshold. |
| Detect spikes | Scan the counter; any minute with 2+ errors is a spike. | The problem statement’s definition of a spike. |
| Print the results | Prints a small header followed by a friendly “Spike detected!” line for each minute. | Matches the example output you provided. |
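The "Normalize to minutes" step can be sketched in two ways. Slicing the first 16 characters of the timestamp is the shortest; parsing it with `datetime.strptime` additionally rejects malformed timestamps. The helper name `to_minute_key` is hypothetical, and either variant yields the same key.

```python
from datetime import datetime


def to_minute_key(timestamp):
    """Turn "YYYY-MM-DD HH:MM:SS,mmm" into the minute key "YYYY-MM-DD HH:MM"."""
    # %f accepts the three millisecond digits that follow the comma.
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S,%f")
    return parsed.strftime("%Y-%m-%d %H:%M")


full = "2025-05-04 09:58:45,999"
print(to_minute_key(full))  # 2025-05-04 09:58
print(full[:16])            # 2025-05-04 09:58 (plain slicing gives the same key)
```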
The regular expression in plain English
| Piece | What it matches | Why we need it |
|---|---|---|
| `(?P<ts> … )` | Named group called `ts`: the part of the string that will be retrieved as `match.group('ts')`. | Lets us pull out the timestamp later without having to remember its position in the regex. |
| `\d{4}` | Exactly four digits (e.g., 2025). | The year part of the timestamp. |
| `-` | A literal hyphen. | Separates year from month. |
| `\d{2}` | Exactly two digits (e.g., 05). | The month. |
| `-` | Literal hyphen. | Separates month from day. |
| `\d{2}` | Exactly two digits (e.g., 04). | The day. |
| `" "` (a space) | A single space. | Separates date from time. |
| `\d{2}` | Two digits for the hour (00‑23). | |
| `:` | Literal colon. | Separates hour from minute. |
| `\d{2}` | Two digits for the minute (00‑59). | |
| `:` | Literal colon. | Separates minute from second. |
| `\d{2}` | Two digits for the second (00‑59). | |
| `,` | Literal comma. | In this log format the millisecond separator is a comma, not a period. |
| `\d{3}` | Exactly three digits: the milliseconds. | |
| `)` | End of the named group `ts`. | |
| `" - .* - "` | Matches the literal string `" - "` followed by any characters (as many as possible) and then another `" - "`. | In the log line this section looks like `" - INFO - "`. We don’t care about the log level or any other text, so we just consume it with `.*`. The surrounding dashes act as clear boundaries. |
| `(?P<code>\d{3})` | Named group called `code` that contains exactly three digits. | This is the HTTP status code (e.g., 500, 503, 200). We’ll later cast it to an integer to test whether it’s a 5xx error. |
Full‑line example
- `(?P<ts> … )` captures `2025-05-04 09:34:01,818`
- `" - .* - "` consumes `" - INFO - "` (the `.*` gobbles the `INFO`)
- `(?P<code>\d{3})` captures `500`
- Everything after the code (`Service error`) is ignored because the regex ends there.
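To see the breakdown in action, here is a small sketch that runs the pattern against the first sample line and prints what each named group captured. The pattern is assembled from the pieces in the table; the variable names are illustrative.

```python
import re

LOG_PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"  # timestamp with milliseconds
    r" - .* - "                                            # log level between separators
    r"(?P<code>\d{3})"                                     # HTTP status code
)

line = "2025-05-04 09:34:01,818 - INFO - 500 Service error"
match = LOG_PATTERN.match(line)

ts = match.group("ts")
code = int(match.group("code"))
leftover = line[match.end():]  # text the regex never looked at

print(ts)              # 2025-05-04 09:34:01,818
print(code)            # 500
print(repr(leftover))  # ' Service error'
```

Because `re.match` only anchors at the start of the string, the trailing message does not need to be matched at all.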