Time Series Anomaly Detection: A Field Study

By: Benjamin Ye

This is our third and final blog post on anomaly detection in time series. If you’ve been following along with us: thank you for your patience! 🎉 If this is the first time you’ve stumbled across this series, fear not! You can find the first two posts here (the why & use cases) and here (technical deep dive).

In the last post, we left you with a cliffhanger (sorry!) after briefly mentioning our time series anomaly detection package, pyoats (docs). The package allows you to quickly experiment with different scoring algorithms and thresholding methods.

In this post, we’ll guide you through how the package works using real-world data. Put your detective hats on, we’re going to do some field work! 🕵️

Preparation

You can install the package from PyPI with `pip install pyoats`. If you're running this in your own environment, we recommend installing inside a conda environment or building the Docker image. You can find more detailed installation instructions in our GitHub repo and package documentation.
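
If you're following along in a notebook such as Colab, the quickest route is pip; the leading `!` below is only needed inside a notebook cell:

# Install pyoats from PyPI
!pip install pyoats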

The Data

The data we’re working with is NYC Taxi ridership data from 2014 to 2015. The data is aggregated in 30-minute windows. Within the data, we’ll encounter sudden spikes in ridership due to New Year celebrations and a prolonged lull in volume due to snow storms — it’s up to us to find these with our anomaly detection system!

You can download the data from the NAB repo with these commands:

# NYC Taxi Data
!wget https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/nyc_taxi.csv
# Labels
!wget https://raw.githubusercontent.com/numenta/NAB/master/labels/raw/known_labels_v1.0.json

Data Exploration

Let’s take a look at the data by plotting it with matplotlib.

import json

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (20, 3)

# Loading Labels
with open("known_labels_v1.0.json") as f:
    labels = json.load(f)
labels = labels["realKnownCause/nyc_taxi.csv"]

# Loading Data
df = pd.read_csv("nyc_taxi.csv", index_col=0, parse_dates=True)

# Plotting Data with Anomalies
plt.plot(df.index, df["value"])
for label in labels:
    date = np.datetime64(label)
    plt.plot(date, df.loc[date], "ro")
plt.show()

We can see that ridership is highly periodic, and the anomalous effects of holidays and natural events are clearly visible.
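
If you'd like to quantify that periodicity rather than eyeball it, here is a quick check (our addition, not part of the original walkthrough): the autocorrelation at the daily and weekly lags, assuming the 30-minute aggregation described above (48 points per day, 336 per week).

# Autocorrelation at the daily and weekly lags; values close to 1 indicate strong seasonality
series = df["value"]
print(f"Daily-lag autocorrelation:  {series.autocorr(lag=48):.2f}")
print(f"Weekly-lag autocorrelation: {series.autocorr(lag=336):.2f}")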

Let’s also take a look at the average week of a taxi driver in New York.


# An Average Week (dayofweek: Monday=0 ... Sunday=6)
df_week = df.copy()
df_week["day_of_week"] = df.index.dayofweek
week_groups = df_week.groupby("day_of_week")["value"]

df_week_grouped = week_groups.mean().to_frame("average")
df_week_grouped["25th Percentile"] = week_groups.quantile(0.25)
df_week_grouped["75th Percentile"] = week_groups.quantile(0.75)
df_week_grouped["min"] = week_groups.min()
df_week_grouped["max"] = week_groups.max()
df_week_grouped.plot.line()

We can see that ridership generally dips on Sundays. But on one Sunday there was an unusually high number of rides; let's see if we can find it.
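
As a quick side check (again our addition, not from the original notebook), we can ask pandas which Sunday had the most rides; the daily resampling assumes the 30-minute aggregation described earlier.

# Sum rides per calendar day, keep only Sundays (dayofweek == 6), and find the busiest one
daily = df["value"].resample("D").sum()
sundays = daily[daily.index.dayofweek == 6]
print(f"Busiest Sunday: {sundays.idxmax().date()} ({sundays.max():,.0f} rides)")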

We can also plot the average day in the life of a taxi driver.

# An Average Day (hour of day, 0–23)
df_hour = df.copy()
df_hour["hour"] = df.index.hour
hour_groups = df_hour.groupby("hour")["value"]

df_hour_grouped = hour_groups.mean().to_frame("average")
df_hour_grouped["25th Percentile"] = hour_groups.quantile(0.25)
df_hour_grouped["75th Percentile"] = hour_groups.quantile(0.75)
df_hour_grouped["min"] = hour_groups.min()
df_hour_grouped["max"] = hour_groups.max()
df_hour_grouped.plot.line()

We can see that it's generally quiet from midnight until about 5 a.m., when people start heading to work. There's a dip around dinnertime, followed by an increase as people go out for the evening and return home. Interestingly, though, the busiest day shows a spike just after midnight. Could that be people heading home after the ball drop at Times Square on New Year's Day?
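
To check that hunch, we can slice the raw counts around midnight on January 1, 2015 (a small addition of ours; the exact timestamps are an assumption based on the dataset's 2014 to 2015 coverage):

# Ridership in the first few hours of New Year's Day 2015
print(df.loc["2015-01-01 00:00":"2015-01-01 04:00"])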

Anomaly Scores

In general, predictive methods work well for point anomalies, while reconstruction- and distance-based methods work well for pattern anomalies. Since our data contains both kinds, let's use Matrix Profile, a distance-based method, to calculate the anomaly scores.

from oats.models import *

TRAIN_SIZE = 5000

# Try changing the model or the window size (set use_gpu=False if you don't have a GPU)
mp_model = MatrixProfileModel(window=48, use_gpu=True)

# Matrix Profile doesn't require fitting, so we can get scores on the whole series at once
mp_scores = mp_model.get_scores(df.values.astype(np.double))

# Zero out the scores before TRAIN_SIZE to emulate the fact that we only care about points after it
mp_scores[:TRAIN_SIZE] = 0

Let’s see the result:

# Top panel: raw data; bottom panel: anomaly scores
fig, (data, anom_scores) = plt.subplots(2)
data.plot(df.index, df["value"])
anom_scores.plot(df.index, mp_scores)
plt.show()

It looks like Matrix Profile is able to catch both point anomalies and pattern anomalies in our data.

Thresholding

Now onto the thresholding step. Let’s take the top 5% of anomaly scores and label them as anomalies.


from oats.threshold import *

# Try changing the thresholding method
threshold = QuantileThreshold()
threshold_values = threshold.get_threshold(mp_scores[TRAIN_SIZE:], percentile=0.95)

# Extend the threshold leftward so it spans the whole chart (purely cosmetic)
threshold_values = np.pad(threshold_values, (TRAIN_SIZE, 0), mode="edge")

# A point is flagged as anomalous when its score exceeds the threshold
anomalies = mp_scores > threshold_values
anomaly_idx = np.where(anomalies)[0]

We can now graph the results:


# Top panel: raw data with known anomalies; bottom panel: scores, threshold, and detections
fig, (data, anom_scores) = plt.subplots(2)

data.plot(df.index, df["value"])
for label in labels:
    date = np.datetime64(label)
    data.plot(date, df.loc[date], "ro")

anom_scores.plot(df.index, mp_scores)
anom_scores.plot(df.index, threshold_values, "--")
anom_scores.plot(df.index[anomaly_idx], mp_scores[anomaly_idx], "ro")
plt.show()

It seems like we are able to catch all the known anomalies provided by the NAB repo.

Evaluation

We can now use the results from above to calculate quantitative metrics such as recall, precision, and F1 using pyoats' built-in SupervisedScorer, which supports point adjustment out of the box.

from oats.scorer import SupervisedScorer

# The scorer expects a flat binary label array, e.g. [0, 0, 0, 1, 0, ...] where 1 marks an anomaly
flat_labels = np.zeros(len(df))
label_idx = np.intersect1d(df.index, [np.datetime64(l) for l in labels], return_indices=True)[1]
flat_labels[label_idx] = 1

# Scoring
scorer = SupervisedScorer()
scorer.process(anomalies, flat_labels)

print(f"Recall: {scorer.recall:.1%}\nPrecision: {scorer.precision:.1%}\nF1: {scorer.f1:.1%}")

Recall: 80.0%
Precision: 1.5%
F1: 3.0%

An F1 of 0.03 seems artificially low. This is because many of our anomalies are pattern anomalies that persist for more than one period, while the labels mark only a single timestamp each. We can make the metric more representative by extending each label with a 24-hour grace period (48 timesteps) for detection.


# Extend each label forward in time by a 24-hour grace period (48 half-hour timesteps)
grace = 24 * 2
flat_labels_grace = flat_labels.copy()
for i in range(1, grace):
    flat_labels_grace += np.roll(flat_labels, i)
flat_labels_grace = np.clip(flat_labels_grace, 0, 1)  # keep the labels binary in case grace windows overlap

scorer = SupervisedScorer()
scorer.process(anomalies, flat_labels_grace)
print(f"Recall: {scorer.recall:.1%}\nPrecision: {scorer.precision:.1%}\nF1: {scorer.f1:.1%}")

Recall: 100.0%
Precision: 65.0%
F1: 78.8%

Now we have metrics that are closer to the actual performance of our detector!

Closing Words

This wraps up our blog series on time series anomaly detection. We hope these posts have helped you get familiar with both the theory and the practical tools for detecting anomalous events in time series data.

Lastly, we encourage you to play around with the code — try different detection and thresholding methods and see if you can improve upon our baseline. You can access the Colab notebook here and our docs as well as API reference here.
