Anomaly Detection with Prophet

How do we detect the unexpected? Is the behaviour of a measured value normal, or did something unexpected happen?

To answer these questions we need to detect anomalous behaviour in a time series. In this article I want to show you how we can do this with Prophet.

Prophet is a time series forecasting library by Facebook, available for Python and R.

So for anomaly detection we train our model on all known values except the last n. Then we predict those last n values and compare the predictions with the actual values. If they differ significantly, we call that an anomaly.

Here’s an example with some data of website usage.

The data

The data consists of weekly session counts for a website. The date of each data point is the Monday of the following week.
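For the examples below I assume a data frame sessions with a column week (the Monday, as a date) and a column sessions (the weekly session count); the file name sessions.csv and the column names are placeholders:

```r
library(readr)
library(dplyr)

# Assumed input: one row per week, 'week' is the Monday, 'sessions' the count.
sessions <- read_csv(
  "sessions.csv",
  col_types = cols(week = col_date(), sessions = col_double())
)

glimpse(sessions)
```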

You can clearly see the drop in website usage at the end of the year. During the summer there’s also less usage. This spring there’s an all-time high. (Guess Covid-19 started to spread then.)

Predictions with Prophet

So we have to compute the last date (end_date) we can use for training.
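For example, assuming we hold out the last number_of_weeks weeks (the variable names here are just illustrative):

```r
library(lubridate)

number_of_weeks <- 4  # how many weeks we hold out and then check for anomalies

# Last Monday we are allowed to use for training.
end_date <- max(sessions$week) - weeks(number_of_weeks)
```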

Prophet expects a data.frame with a column named ds containing the date and a column y containing the dependent value. So let’s transform our data:
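Something along these lines, keeping only the training period up to end_date:

```r
df <- sessions %>%
  rename(ds = week, y = sessions)

# Training data: everything up to and including end_date.
df_train <- df %>%
  filter(ds <= end_date)
```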

So let’s train our model.
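With default settings this is a one-liner (tuning options are left out here):

```r
library(prophet)

# Fit the model on the training period only.
m <- prophet(df_train)
```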

Now that our model is fitted let’s use it to do some predictions.

First we need to generate a data.frame containing the dates we want to predict. Prophet provides a handy function make_future_dataframe.

We also only want to predict Mondays because our training data only consists of Mondays.
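Roughly like this; with freq = "week" the generated dates stay a week apart anyway, so the Monday filter is just a safety net:

```r
# Future data frame covering the history plus the held-out weeks.
future <- make_future_dataframe(m, periods = number_of_weeks, freq = "week")

# Keep only Mondays (wday() == 2 with the default week start on Sunday).
future <- future %>%
  filter(wday(ds) == 2)
```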

The predict function returns the predictions for each row.

The result forecast contains the forecast (yhat) and an uncertainty interval (yhat_upper and yhat_lower).
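For example:

```r
forecast <- predict(m, future)

# The columns we care about: point forecast and uncertainty interval.
forecast %>%
  select(ds, yhat, yhat_lower, yhat_upper) %>%
  tail(number_of_weeks)
```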

Visualization

Prophet comes with a built-in plot method, but we can use ggplot2 as well:
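A minimal version (the styling of the original plot is not reproduced here):

```r
library(ggplot2)

# Built-in Prophet plot:
plot(m, forecast)

# Roughly the same picture with ggplot2: forecast as a line,
# uncertainty interval as a ribbon, actual values as points.
ggplot() +
  geom_ribbon(
    data = forecast,
    aes(x = as.Date(ds), ymin = yhat_lower, ymax = yhat_upper),
    fill = "grey80"
  ) +
  geom_line(data = forecast, aes(x = as.Date(ds), y = yhat), colour = "steelblue") +
  geom_point(data = df, aes(x = ds, y = y), size = 0.8) +
  labs(x = "Week", y = "Sessions")
```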

We can even build a function to highlight good and bad predictions:
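A sketch of such a helper: join the actual values with the forecast and flag every week whose value falls outside the uncertainty interval:

```r
plot_anomalies <- function(df, forecast) {
  joined <- forecast %>%
    mutate(ds = as.Date(ds)) %>%
    inner_join(df, by = "ds") %>%
    mutate(status = case_when(
      y > yhat_upper ~ "better than predicted",
      y < yhat_lower ~ "worse than predicted",
      TRUE           ~ "as predicted"
    ))

  ggplot(joined, aes(x = ds)) +
    geom_ribbon(aes(ymin = yhat_lower, ymax = yhat_upper), fill = "grey80") +
    geom_line(aes(y = yhat), colour = "steelblue") +
    geom_point(aes(y = y, colour = status)) +
    labs(x = "Week", y = "Sessions", colour = NULL)
}

plot_anomalies(df, forecast)
```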

We can also get a visualization of the components:
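That is a single call in R:

```r
# Trend and yearly seasonality as fitted by the model.
prophet_plot_components(m, forecast)
```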

Holidays

Prophet can also take holidays into account. But in our example we need to normalize (or mondify, as I call it) the holiday dates because our time series only consists of Mondays.

So here’s a lengthy function defining holidays in Germany:
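In shortened form it looks roughly like this, with a small mondify helper that shifts every holiday to the Monday of the following week, matching how the sessions are dated (the real list covers many more holidays and years):

```r
# Map a date to the Monday of the following week.
mondify <- function(d) {
  floor_date(d, unit = "week", week_start = 1) + weeks(1)
}

# A heavily shortened set of German holidays.
holidays <- bind_rows(
  tibble(holiday = "new_year",     ds = as.Date(c("2018-01-01", "2019-01-01", "2020-01-01"))),
  tibble(holiday = "christmas",    ds = as.Date(c("2018-12-25", "2019-12-25", "2020-12-25"))),
  tibble(holiday = "german_unity", ds = as.Date(c("2018-10-03", "2019-10-03", "2020-10-03")))
) %>%
  mutate(ds = mondify(ds))

# Refit with holidays and predict again.
m <- prophet(df_train, holidays = holidays)
forecast <- predict(m, future)
```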

As we can see, the drops at New Year are now predicted slightly better.

Consent Layer or What happened in 2020?

The answer is simple: because of the GDPR a consent layer was implemented, asking the user whether she accepts tracking via Google Analytics or declines it.

When she declined, she could still access the website, but she wasn’t tracked anymore. So it looked as if there were fewer sessions.

So how can we adjust the model?

We can add an additional regressor which indicates whether the consent layer was active or not.
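For example, with a made-up launch date for the consent layer (the real date would come from the deployment history). The indicator also has to be added to the future data frame, because Prophet needs the regressor values for the dates it predicts:

```r
# Hypothetical launch date of the consent layer.
consent_layer_start <- as.Date("2020-05-18")

# 1 from the launch date on, 0 before.
df <- df %>%
  mutate(consent_layer = as.integer(ds >= consent_layer_start))

df_train <- df %>%
  filter(ds <= end_date)

future <- future %>%
  mutate(consent_layer = as.integer(as.Date(ds) >= consent_layer_start))
```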

There are two ways to add the additional regressor:

  • additive and
  • multiplicative

The difference is whether the effect of this regressor is added to or multiplied with the rest of the forecast. In our use case I think multiplicative is the better choice, because in reality a certain fraction of all users will decline the tracking.

Additive Additional Regressor
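A sketch, keeping the German holidays from above; the regressor has to be registered before fitting:

```r
m_add <- prophet(holidays = holidays)
m_add <- add_regressor(m_add, "consent_layer", mode = "additive")
m_add <- fit.prophet(m_add, df_train)

forecast_add <- predict(m_add, future)
```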

Multiplicative Additional Regressor
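The multiplicative variant only differs in the mode argument:

```r
m_mult <- prophet(holidays = holidays)
m_mult <- add_regressor(m_mult, "consent_layer", mode = "multiplicative")
m_mult <- fit.prophet(m_mult, df_train)

forecast_mult <- predict(m_mult, future)
```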

The Anomaly Detection

Let’s pimp our plotting function:
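Extending the earlier sketch, the held-out weeks (everything after end_date) can be drawn as triangles, here using the multiplicative model’s forecast:

```r
plot_anomalies <- function(df, forecast, end_date) {
  joined <- forecast %>%
    mutate(ds = as.Date(ds)) %>%
    inner_join(df, by = "ds") %>%
    mutate(
      held_out = ds > end_date,  # the weeks we did not train on
      status = case_when(
        y > yhat_upper ~ "better than predicted",
        y < yhat_lower ~ "worse than predicted",
        TRUE           ~ "as predicted"
      )
    )

  ggplot(joined, aes(x = ds)) +
    geom_ribbon(aes(ymin = yhat_lower, ymax = yhat_upper), fill = "grey80") +
    geom_line(aes(y = yhat), colour = "steelblue") +
    # Held-out weeks as triangles, training weeks as circles.
    geom_point(aes(y = y, colour = status, shape = held_out)) +
    scale_shape_manual(values = c(`FALSE` = 16, `TRUE` = 17), guide = "none") +
    labs(x = "Week", y = "Sessions", colour = NULL)
}

plot_anomalies(df, forecast_mult, end_date)
```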

So we’ve trained our model without the last number_of_weeks weeks. Now we predict these weeks. The predictions are shown as triangles.

As we can see, two weeks were slightly better than predicted and the other two fall within the prediction corridor.

So there was no big anomaly within the last four weeks.