Abstract
How I built a system that monitors over 100 GA4 properties every day without manual intervention, at near-zero cost. Three-signal anomaly detection (same-day-of-week z-score, week-over-week trend with time-decaying weights, Italian holiday calendar with dynamic Easter), an AI layer over 90 days of historical data, and Telegram notifications. What you miss manually, this catches.
This paper is also available in Italian.
1. How it started
I found out about the drop three weeks after it had begun. The site had lost 35% of its organic traffic. The client didn't know. I didn't know. Three weeks of lost traffic, three weeks in which earlier intervention would have been more effective — three weeks you don't get back.
I opened GA4 by chance that day, while checking something else. The graph was unmistakable. I checked the dates, ran the numbers, read the history. The problem had started twenty-one days earlier. I hadn't received any notification because I had no system monitoring that property. And at that point I was already managing dozens of sites.
The problem isn't a lack of data. GA4 produces it in industrial quantities — sessions, users, events, channels, conversions, for every day of every property. The problem is that nobody can open 100 properties every morning, compare yesterday's numbers against the prior week, assess whether a drop is physiological or alarming, and determine whether it comes from the calendar, a technical issue, or a real organic traffic decline. It's not a matter of discipline or laziness. It's arithmetic: the time required exceeds the time available by an order of magnitude. I chose to automate instead of ignore.
2. The right question
I started by building something simple: a notification whenever a property varied by more than 20% compared to the previous day. Simple to implement, intuitive as a concept. Within three days I had received enough Telegram messages to mute the notifications entirely.
The problem wasn't technical. It was conceptual. I had framed the wrong question: I had asked "how do I detect changes?" when I should have been asking "how do I distinguish significant changes from expected ones?"
A Monday structurally carries less traffic than a Friday. August has less traffic than October. August 15th is not comparable to a Tuesday in March. An e-commerce site has weekend spikes that a B2B site never sees. A paid campaign launched the day before inflates numbers in a way that has nothing to do with the organic health of the site. A fixed threshold applied to this data produces continuous noise. And a system that generates continuous noise becomes invisible — worse than having no system at all, because it creates the habit of ignoring.
The false positive problem
A monitoring system that notifies too often is worse than no system. The habit of ignoring notifications is more dangerous than the absence of notifications.
3. Three-signal detection
The solution wasn't a more refined threshold. It was a fundamentally different approach: instead of comparing today against yesterday, cross three independent signals that capture different dimensions of the anomaly. Only when multiple signals converge in the same direction, with sufficient intensity, does the system classify an anomaly.
3.1 Same-day-of-week comparison
Today's traffic is measured against the same day of the week across prior weeks. A Monday is compared to historical Mondays, not to the adjacent Sunday or Friday. This eliminates the structural variance between days of the week, which for many properties is the primary source of noise in a fixed-threshold system.
The comparison doesn't use a simple average. It applies a z-score that accounts for the standard deviation intrinsic to that specific day for that specific property. A property that is naturally volatile on Thursdays — perhaps because of a weekly newsletter that drives traffic spikes — has a reference baseline that incorporates that volatility. The system doesn't alert for variations that fall within its normal oscillation range. If a historically stable property experiences an anomalous drop relative to its own history, the signal emerges precisely.
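As a sketch, the same-day-of-week signal might look like this in Python. The function name, the minimum-history guard, and the floor for perfectly flat histories are illustrative assumptions, not the production code:

```python
from statistics import mean, stdev

def same_weekday_zscore(today_sessions, history):
    """Compare today's sessions against the same weekday in prior weeks.

    `history` holds session counts for the same day of the week
    (e.g. the last several Mondays). Names and the minimum sample
    size are illustrative, not the real system's values.
    """
    if len(history) < 3:
        return 0.0  # not enough history: stay silent rather than guess
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        sigma = max(mu * 0.05, 1.0)  # floor for perfectly flat histories
    return (today_sessions - mu) / sigma
```

A volatile Thursday history produces a large `sigma`, so the same absolute drop yields a smaller z-score there than on a historically stable property, which is exactly the behavior described above.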
3.2 Week-over-week trend
The first signal answers the question "is this Monday normal compared to other Mondays?". The second answers a different question: "but were those other Mondays already declining?".
A property might appear "within normal range" when compared to prior Mondays — but if those Mondays were themselves declining against the ones before them, the aggregate signal tells a different story. The system computes a week-over-week trend across up to three prior weeks with time-decaying weights: the most recent week carries 50% of the signal, two weeks ago 30%, three weeks ago 20%. More recent data counts more because it better reflects the current situation.
This second signal catches the gradual declines the first one misses: a property in slow but steady decline that stays "within normal range" compared to its immediate history, but is accumulating deterioration that only becomes visible when you look at the overall trend across the past few weeks.
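A minimal sketch of the weighted trend, assuming weekly totals ordered oldest to newest. The function name and input shape are mine; the 50/30/20 weights come from the text:

```python
def weighted_wow_trend(weekly_totals):
    """Weighted week-over-week trend over up to three comparisons.

    `weekly_totals` holds total sessions for [3 weeks ago, 2 weeks ago,
    last week, current week]. The most recent comparison carries 50%
    of the signal, the next 30%, the oldest 20%.
    """
    weights = [0.5, 0.3, 0.2]
    changes = []
    # walk backwards so the most recent change is paired with 0.5
    for i in range(len(weekly_totals) - 1, 0, -1):
        prev, curr = weekly_totals[i - 1], weekly_totals[i]
        if prev > 0:
            changes.append((curr - prev) / prev)
    return sum(w * c for w, c in zip(weights, changes))
```

A slow, steady decline (each week a few percent below the last) accumulates across all three weighted terms, so it scores worse than a single recent dip of the same size.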
3.3 Italian holiday calendar awareness
The third signal comes from a practical observation: many of the false positives in the initial system coincided with public holidays. Ferragosto. Easter. The November bridge days. The system was reading as anomalies what were simply physiological drops tied to the Italian calendar.
I implemented full calendar awareness. Not just fixed public holidays — New Year's Day, Epiphany, Liberation Day, Ferragosto, Immaculate Conception, Christmas — but also Easter computed dynamically year by year using the Meeus/Jones/Butcher algorithm (valid for any year in the Gregorian calendar), bridge days that form automatically when a holiday falls on a Tuesday or Thursday, and the extended Christmas period from December 24th through January 3rd.
When a drop coincides with a public holiday or bridge day, the weight of the negative signal is reduced significantly. When the previous day was a holiday, a traffic spike — recovery after the prior day's inactivity — is dampened to avoid generating a false positive in the opposite direction. The system knows that December 26th is not just another Wednesday. It knows that the Monday after Easter is not just another Monday. It knows that November 2nd, if it falls on a Monday after All Saints' Day, is almost certainly a bridge day.
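The dynamic Easter computation is the standard Meeus/Jones/Butcher algorithm; the bridge-day check below is a simplified sketch of the rule described above (a real implementation would draw `holidays` from the full Italian calendar):

```python
from datetime import date, timedelta

def easter(year):
    """Gregorian Easter Sunday via the Meeus/Jones/Butcher algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)

def is_bridge_day(d, holidays):
    """A working day squeezed against a holiday and a weekend:
    the Monday before a Tuesday holiday, or the Friday after a
    Thursday holiday. `holidays` is a set of `date` objects."""
    if d.weekday() == 0 and d + timedelta(days=1) in holidays:
        return True
    if d.weekday() == 4 and d - timedelta(days=1) in holidays:
        return True
    return False
```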
3.4 Composite score and severity levels
The three signals are combined into a composite score on a -3/+3 scale. Positive indicates anomalous growth, negative indicates anomalous decline. Only when the score exceeds a configurable threshold and the percentage variation simultaneously exceeds a configurable minimum is an anomaly classified. Below that threshold, the system stays silent.
The system distinguishes two levels: warning (moderate variation, worth watching) and anomaly (significant variation, requires investigation). There is also an "empty" state for properties that are historically inactive or have negligible traffic: they are not treated as broken sites, but simply as dormant ones. Across a portfolio of 100+ properties this distinction is essential to avoid drowning in irrelevant notifications — there are always properties on pause, under construction, or simply sleeping.
| Level | Score | Condition | Action |
|---|---|---|---|
| Normal | < 1.0 | Variation within expected range | None |
| Warning | ≥ 1.0 | Moderate variation | Monitor |
| Anomaly | ≥ 1.5 | Significant variation | Investigate |
| Empty | — | Historically no traffic | Ignore |
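Putting the table into code, a hedged sketch of the classification step. The threshold defaults here are illustrative; in the real system they are configurable per account or per property:

```python
def classify(score, pct_change, history_total,
             warn_at=1.0, anomaly_at=1.5, min_pct=10.0, min_sessions=5):
    """Map a composite score to a status level.

    An anomaly requires BOTH the score threshold and the minimum
    percentage variation; below both, the system stays silent.
    """
    if history_total < min_sessions:
        return "empty"      # a dormant property, not a broken one
    magnitude = abs(score)  # sign only encodes direction (growth/decline)
    if magnitude >= anomaly_at and abs(pct_change) >= min_pct:
        return "anomaly"
    if magnitude >= warn_at and abs(pct_change) >= min_pct:
        return "warning"
    return "normal"
```

Requiring both conditions is what keeps a statistically "loud" but practically tiny variation (a high z-score on a low-traffic day) from generating noise.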
4. The AI layer: the patterns statistical signals cannot see
Statistical three-signal detection is excellent at catching daily shocks. But there is an entire category of problems that manifests gradually, below the threshold of any daily variance metric.
At some point I realized the system was seeing earthquakes but not the fault lines shifting slowly underneath. A site losing 4% of organic traffic every week for three consecutive months never triggers a daily alert — each individual day the decline falls within normal variance. Yet by the end of the quarter, that site has lost over 40% of its organic traffic. Another property might have had a severe anomaly six weeks ago, partially recovered, but never returned to its prior baseline — a regression invisible to variance-based metrics.
I tried to encode these patterns into deterministic rules. I failed. They were too contextual, too dependent on each property's specific history, too sensitive to combinations of factors that vary from site to site. Every time I thought I'd captured a pattern, I found a case that contradicted it. Eventually I stopped looking for the rule and passed the question directly to a language model, feeding it 90 days of historical data for every property.
Each property is analyzed with 90 days of daily sessions plus the traffic channel mix of the last 30 days compared against the preceding 60 — organic, direct, paid, social, referral, each as a percentage share and as an absolute value. The model receives precise instructions on categories of patterns to look for: gradual week-over-week declines that never reach the daily anomaly threshold; failed recoveries after past anomalies (the site "recovered" in numbers but never returned to its prior baseline); correlations between properties of the same account that might indicate a tracking configuration issue; channel mix shifts — in particular an organic decline masked by an increase in paid share; traffic sources that were active in the prior 60 days and have disappeared in the last 30; excessive dependency on a single channel that represents a structural risk.
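The data handed to the model can be sketched as a payload builder. All field names here are my assumptions for illustration, not the actual schema:

```python
def build_analysis_payload(prop, daily_sessions, channel_history):
    """Assemble the context passed to the language model.

    `daily_sessions` is 90 days of per-day sessions; `channel_history`
    maps channel name -> list of daily sessions, most recent last.
    """
    last_30 = {ch: sum(vals[-30:]) for ch, vals in channel_history.items()}
    prior_60 = {ch: sum(vals[-90:-30]) for ch, vals in channel_history.items()}

    def shares(totals):
        grand = sum(totals.values()) or 1
        return {ch: round(100 * v / grand, 1) for ch, v in totals.items()}

    return {
        "property": prop,
        "daily_sessions_90d": daily_sessions,
        "channel_mix_last_30d": {"absolute": last_30, "share_pct": shares(last_30)},
        "channel_mix_prior_60d": {"absolute": prior_60, "share_pct": shares(prior_60)},
    }
```

Giving the model both absolute values and percentage shares is what makes the "organic decline masked by a paid increase" pattern detectable: the share can stay flat while the absolute number collapses, or vice versa.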
The report follows a fixed structure: critical situations (require immediate action), elements to monitor (weak signals that could develop), positive trends (growth worth noting), and five prioritized recommendations with real numbers. Not "organic traffic has declined" but "organic traffic dropped from 65% to 40% of total in the last 30 days, with a loss of 340 organic sessions per week compared to the prior period". The report can be generated from the dashboard for any property, or sent directly via Telegram with one click.
5. How it works daily
Every day, automatically, without me opening any tool, the system runs three operations in sequence.
- Data retrieval: for every property in the database, the GA4 Data API is queried for the previous day's sessions broken down by traffic channel. The three signals are computed, the composite score updated, the property status classified, and the data point archived to the historical record.
- Anomaly notifications: all properties in warning or anomaly status are grouped by account and sent via Telegram in a structured message. Each line shows the property name, session count, percentage variation, and trend direction. Normal properties generate no notification.
- Channel update: the traffic channel breakdown is updated to feed the AI longitudinal analysis layer.
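The three daily operations can be sketched as a single cron entry point. Every callable here is an injected placeholder for the real edge functions, not an actual API:

```python
def run_daily(properties, fetch, detect, archive, notify, update_channels):
    """Sketch of the daily pipeline: fetch, detect, archive, notify.

    `fetch` queries the GA4 Data API for yesterday's sessions by channel,
    `detect` returns a status level, `archive` appends to the historical
    record, `notify` sends one grouped Telegram message.
    """
    alerts = []
    for prop in properties:
        data = fetch(prop)
        status = detect(prop, data)
        archive(prop, data, status)
        if status in ("warning", "anomaly"):
            alerts.append((prop, status))
    if alerts:
        notify(alerts)            # normal properties generate no message
    update_channels(properties)   # refresh channel data for the AI layer
    return alerts
```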
Now I sit down at the computer in the morning already knowing whether something needs attention. If no notification arrived overnight, the 100 sites are fine. If there is a notification, I already know which property, which account, what the scale of the problem is, and I already have enough context to decide whether to act immediately or monitor. The time between an anomaly occurring and my awareness of it has gone from weeks to hours.
6. The dashboard
Telegram notifications handle urgent situations. The dashboard handles everything else: the portfolio overview, single-property analysis, AI report generation, threshold configuration.
The main screen shows all properties grouped by account with their current status — a single glance across the entire portfolio. Properties in anomaly are highlighted. Clicking a property opens the detail view: the value of all three signals, the percentage variation, the last two weeks of history, and — when present — the specific reason for the anomaly. Not just "this site is anomalous", but "28% decline compared to prior Mondays, week-over-week trend negative for three consecutive weeks, no relevant holidays in the period".
From here it's possible to generate the 90-day AI report for any property, or send it directly via Telegram. The dashboard also shows the history of past anomalies for each site, useful for identifying recurring patterns and verifying whether a recovery was complete or only partial.
7. Architecture and cost
The entire detection logic runs on serverless edge functions. There is no persistent server to maintain, no process to manage, no fixed infrastructure cost. The functions are invoked by a scheduled cron job once per day, do their work in a few minutes, and return to idle for the next 23+ hours. Cost is proportional to actual usage, and actual usage is minimal.
Adding a new property takes under a minute. You provide the service account, the initialization function independently collects the prior 30 days of historical data and calibrates the baseline parameters for that specific site, and the property is monitored from the following day. There is no manual threshold configuration — the system calibrates itself to each property's historical variability.
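The self-calibration idea can be sketched as deriving per-weekday baseline statistics from the initial 30 days of history. The returned fields are illustrative of the idea, not the production schema:

```python
from statistics import mean, stdev

def calibrate_baseline(daily_sessions):
    """Derive per-property baseline parameters from ~30 days of history,
    so no threshold has to be set by hand.

    `daily_sessions` is a list of (date, sessions) pairs; the result
    maps weekday (0=Monday) -> that day's own mean and spread.
    """
    by_weekday = {}
    for day, sessions in daily_sessions:
        by_weekday.setdefault(day.weekday(), []).append(sessions)
    return {
        wd: {"mean": mean(vals), "stdev": stdev(vals) if len(vals) > 1 else 0.0}
        for wd, vals in by_weekday.items()
    }
```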
| Metric | Value | Note |
|---|---|---|
| Properties monitored | 100+ | automatic calibration for each one |
| Monthly operational cost | < €1 | for 100+ properties, all included |
| New property onboarding | < 1 min | automatic historical initialization |
| Time to detection | < 24h | from anomaly onset to notification |
8. Data model
The database is structured around six main tables, each with a precise responsibility.
- ga4_credentials — the service account JSON for each property, encrypted at rest.
- ga4_properties — the current status of each property: status, daily sessions, percentage variation, and a JSONB anomaly_details field with the full anomaly detail when present.
- historical_sessions — the daily time series for each property, used to compute the three signals. Each record stores date, sessions, and intermediate signal values for debugging.
- telegram_settings — Telegram bot configuration per account: token, chat ID, and notification preferences.
- anomaly_settings — configurable thresholds per account or per individual property: threshold, min_sessions (below which no notification is sent), min_percentage_change, and time windows.
- traffic_sources_history — daily per-channel breakdown (organic, direct, paid, social, referral), feeds the AI longitudinal layer.
The anomaly_details JSONB field in ga4_properties is what makes the dashboard useful rather than just a traffic light. It stores the anomaly level, the values of all three signals with their relative contribution to the composite score, the last two weeks of historical context, and the detection timestamp. This makes it possible to answer "why is this site anomalous?" without recomputing anything — all the context is already there.
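A hedged sketch of what that payload might look like. The field names and values are my assumption based on the description above, not the actual schema:

```python
# Illustrative shape of the anomaly_details JSONB payload in ga4_properties.
anomaly_details = {
    "level": "anomaly",
    "composite_score": -1.9,
    "signals": {
        "same_weekday_zscore": {"value": -2.3, "contribution": -1.1},
        "wow_trend": {"value": -0.08, "contribution": -0.8},
        "calendar_adjustment": {"holiday": None, "contribution": 0.0},
    },
    "context_last_14_days": [412, 398],  # daily sessions, truncated here
    "detected_at": "2025-01-07T06:00:00Z",
}
```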
9. What didn't work first
The current system is the third architecture. The first two failed for different reasons, and from each failure I extracted something that informed the next version.
The first solution was a spreadsheet with conditional formulas that compared each day against a seven-day moving average and colored cells red when the variation exceeded 15%. It required manual updating every morning — export data from GA4, paste, wait for formulas to recalculate. I abandoned it after two weeks. Not out of laziness, but because it required exactly the daily attention I was trying to eliminate. Partial automation tends to break at the manual part.
The second solution was a Google Apps Script that automated the import and sent an email every morning with the report. It worked for a few weeks, then GA4 changed something in the reporting API and the script stopped working silently. No more emails, no visible errors. I discovered it was broken when I opened GA4 to check manually after a few days of suspicious silence. The problem wasn't the code — it was that nobody was checking whether the system itself was still running.
The third solution — the current one — was built from the analysis of those two failures. From the spreadsheet: automation must be complete, not partial, otherwise the failure point is always the human part. From Apps Script: the system must be observable — if it stops working I need to know immediately, not three weeks later. From the first fixed-threshold version: the signal must be contextual, not absolute, because context — day of week, season, calendar — is half the information.
The most important lesson
The value of a monitoring system is measured not by how many anomalies it detects, but by how many notifications you don't ignore. A system that generates too much noise is as useless as one that doesn't work — but more insidious, because it gives the illusion of being covered.
10. A tool I use every day
It is not an elegant tool in the aesthetic sense. The interface is functional, not beautiful. The architecture is pragmatic, not optimized to look impressive in a technical presentation. It is a tool that works, that I use every day, and that has already saved me from at least a dozen uncomfortable conversations with clients.
The kind of conversation it prevents is specific: the one where the client asks "did you see that traffic dropped 30% last week?" and I have to say no. That conversation costs more in trust than building an automatic monitoring system ever costs. Since this system has been running, I haven't had that conversation.
What surprised me most, looking back, is how close to zero the cost actually is. Not figuratively: literally. Serverless APIs, the marginal cost of edge functions, affordable AI models for longitudinal analysis — the combination makes accessible today something that two years ago would have required infrastructure with significant fixed monthly costs. Timing mattered.
If you manage many sites and recognize this problem, get in touch — I'm happy to discuss it.