Unsupervised anomaly detection with Temporian and scikit-learn¶

In this tutorial, we use Temporian and Scikit-Learn to detect anomalies in a multivariate time series dataset.

Anomaly detection in time series, time sequences and other temporal formats is critical in a variety of domains. For instance, it is used in manufacturing to detect equipment failure and production lines, in computer systems and by financial institutions to detect fraudulent activities, and in energy management to detect outages.

We will use the Server Machine Dataset (SMD) dataset, published as part of the OmniAnomaly paper, which is available in CSV files in that same repository.

This dataset contains aggregated resource usage metrics (e.g., CPU and RAM utilization, network traffic) from 28 computers in 3 data centers over a period of 5 weeks. The timestamps and feature names have been anonymized and normalized. Therefore, calendar feature engineering or the use of expert knowledge is not possible.

The dataset will be loaded, feature engineered and converted into a tabular dataset using Temporian. Then, we will use an isolation forest model on this tabular data to detect anomalies. Finally, we will evaluate the quality of our detection on the ground truth anomalies, available as part of the testing dataset.

We will first evaluate our detections both with AUC-ROCs (Area Under the Receiver operating Characteristic curve) and with AMOCs (Activity Monitoring Operating Characteristic). Unlike an ROC, an AMOC takes into account the time to detection or forecasting horizon which is often critical in temporal problems like ours. If you don't know about AMOCs yet, don't worry. We will explain it.

Check out the Supervised anomaly detection tutorial for a version of this notebook that trains a supervised model using the ground truth labels, which is less common in an anomaly detection setting, but can yield better performance if those are available.

Installation and imports¶

In [26]:

Copied!

%pip install temporian -q
%pip install temporian -q

[notice] A new release of pip is available: 23.1.2 -> 23.2.1
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.

In [27]:

Copied!





import os
from pathlib import Path
from typing import List

import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
import urllib.request

import temporian as tp
import os
from pathlib import Path
from typing import List

import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
import urllib.request

import temporian as tp

Download dataset¶

The dataset is made of 3 groups of 8, 9, and 11 machines respectively, with names "machine-1-1", ..., "machine-3-11". Let's list the machine names, and download records locally.

In [28]:

Copied!

# machines_per_group = [8, 9, 11]  # Full dataset
machines_per_group = [3, 3, 3]  # Subset of data

machines = [f"machine-{group}-{id}" for group, machine in zip(range(1, 4), machines_per_group) for id in range(1, machine + 1)]
print("{len(machines)} machines")
# machines_per_group = [8, 9, 11]  # Full dataset
machines_per_group = [3, 3, 3]  # Subset of data

machines = [f"machine-{group}-{id}" for group, machine in zip(range(1, 4), machines_per_group) for id in range(1, machine + 1)]
print("{len(machines)} machines")

{len(machines)} machines

In [29]:

Copied!





data_dir = Path("tmp/temporian_server_machine_dataset")
dataset_url = "https://raw.githubusercontent.com/NetManAIOps/OmniAnomaly/master/ServerMachineDataset"

data_dir.mkdir(parents=True, exist_ok=True)

# Download the data and labels for each machine to its own folder
for machine in machines:
    print(f"Download data of {machine}")

    machine_dir = data_dir / machine
    machine_dir.mkdir(exist_ok=True)

    data_path = machine_dir / "data.csv"
    if not data_path.exists():
        urllib.request.urlretrieve(f"{dataset_url}/test/{machine}.txt", data_path)

    labels_path = machine_dir / "labels.csv"
    if not labels_path.exists():
         urllib.request.urlretrieve(f"{dataset_url}/test_label/{machine}.txt", labels_path)
data_dir = Path("tmp/temporian_server_machine_dataset")
dataset_url = "https://raw.githubusercontent.com/NetManAIOps/OmniAnomaly/master/ServerMachineDataset"

data_dir.mkdir(parents=True, exist_ok=True)

# Download the data and labels for each machine to its own folder
for machine in machines:
    print(f"Download data of {machine}")

    machine_dir = data_dir / machine
    machine_dir.mkdir(exist_ok=True)

    data_path = machine_dir / "data.csv"
    if not data_path.exists():
        urllib.request.urlretrieve(f"{dataset_url}/test/{machine}.txt", data_path)

    labels_path = machine_dir / "labels.csv"
    if not labels_path.exists():
         urllib.request.urlretrieve(f"{dataset_url}/test_label/{machine}.txt", labels_path)

Download data of machine-1-1
Download data of machine-1-2
Download data of machine-1-3
Download data of machine-2-1
Download data of machine-2-2
Download data of machine-2-3
Download data of machine-3-1
Download data of machine-3-2
Download data of machine-3-3

Load data¶

We use Pandas to load the data into a DataFrame and perform tabular manipulation before importing it into a Temporian as an EventSet.

We index the data by the machine column in Temporian. This was, we can apply Temporian transformations on each machine individually as well as across all machines.

In [30]:

Copied!





dataframes = []

for machine in machines:
    machine_dir = data_dir / machine

    # Read the data and labels
    print(f"Load data of {machine}...", end="")
    df = pd.read_csv(machine_dir / "data.csv", header=None).add_prefix("f")
    labels = pd.read_csv(machine_dir/ "labels.csv", header=None)
    df = df.assign(label=labels)
    df["machine"] = machine
    df["timestamp"] = range(df.shape[0])
    print(f"found {df.shape[0]} events")

    dataframes.append(df)

dataframes[0].head(3)
dataframes = []

for machine in machines:
    machine_dir = data_dir / machine

    # Read the data and labels
    print(f"Load data of {machine}...", end="")
    df = pd.read_csv(machine_dir / "data.csv", header=None).add_prefix("f")
    labels = pd.read_csv(machine_dir/ "labels.csv", header=None)
    df = df.assign(label=labels)
    df["machine"] = machine
    df["timestamp"] = range(df.shape[0])
    print(f"found {df.shape[0]} events")

    dataframes.append(df)

dataframes[0].head(3)

Load data of machine-1-1...found 28479 events
Load data of machine-1-2...found 23694 events
Load data of machine-1-3...found 23703 events
Load data of machine-2-1...found 23694 events
Load data of machine-2-2...found 23700 events
Load data of machine-2-3...found 23689 events
Load data of machine-3-1...found 28700 events
Load data of machine-3-2...found 23703 events
Load data of machine-3-3...found 23703 events

Out[30]:

	f0	f1	f2	f3	f5	f6	f8	f9	...	f31	f32	f33	f34	f35	machine	timestamp
0	0.075269	0.065678	0.070234	0.074332	0.933333	0.274011	0.031081	0.000000	...	0.048893	0.000386	0.000034	0.064432	0.064500	machine-1-1	0
1	0.086022	0.080508	0.075808	0.076655	0.930769	0.274953	0.031081	0.000122	...	0.050437	0.000386	0.000022	0.065228	0.065224	machine-1-1	1
2	0.075269	0.064619	0.071349	0.074332	0.928205	0.274953	0.030940	0.000366	...	0.055069	0.000386	0.000045	0.067111	0.067178	machine-1-1	2

3 rows × 41 columns

In [31]:

Copied!

# Convert the dataframes into a single Temporian EventSet

evset = tp.combine(*map(tp.from_pandas, dataframes))

# Index the EventSet according the the machine name.
evset = evset.set_index("machine")

# Cast the feature and label to a smaller dtypes to same one memory.
evset = evset.cast(tp.float32).cast({"label": tp.int32})

evset
# Convert the dataframes into a single Temporian EventSet

evset = tp.combine(*map(tp.from_pandas, dataframes))

# Index the EventSet according the the machine name.
evset = evset.set_index("machine")

# Cast the feature and label to a smaller dtypes to same one memory.
evset = evset.cast(tp.float32).cast({"label": tp.int32})

evset

WARNING:root:Feature "machine" is an array of numpy.object_ and will be casted to numpy.string_ (Note: numpy.string_ is equivalent to numpy.bytes_).
WARNING:root:Feature "machine" is an array of numpy.object_ and will be casted to numpy.string_ (Note: numpy.string_ is equivalent to numpy.bytes_).
WARNING:root:Feature "machine" is an array of numpy.object_ and will be casted to numpy.string_ (Note: numpy.string_ is equivalent to numpy.bytes_).
WARNING:root:Feature "machine" is an array of numpy.object_ and will be casted to numpy.string_ (Note: numpy.string_ is equivalent to numpy.bytes_).
WARNING:root:Feature "machine" is an array of numpy.object_ and will be casted to numpy.string_ (Note: numpy.string_ is equivalent to numpy.bytes_).
WARNING:root:Feature "machine" is an array of numpy.object_ and will be casted to numpy.string_ (Note: numpy.string_ is equivalent to numpy.bytes_).
WARNING:root:Feature "machine" is an array of numpy.object_ and will be casted to numpy.string_ (Note: numpy.string_ is equivalent to numpy.bytes_).
WARNING:root:Feature "machine" is an array of numpy.object_ and will be casted to numpy.string_ (Note: numpy.string_ is equivalent to numpy.bytes_).
WARNING:root:Feature "machine" is an array of numpy.object_ and will be casted to numpy.string_ (Note: numpy.string_ is equivalent to numpy.bytes_).

Out[31]:

features [39]: f0 (float32) , f1 (float32) , f2 (float32) , f3 (float32) , f4 (float32) , f5 (float32) , f6 (float32) , f7 (float32) , f8 (float32) , f9 (float32) , f10 (float32) , f11 (float32) , f12 (float32) , f13 (float32) , f14 (float32) , f15 (float32) , f16 (float32) , f17 (float32) , f18 (float32) , f19 (float32) , f20 (float32) , f21 (float32) , f22 (float32) , f23 (float32) , f24 (float32) , f25 (float32) , f26 (float32) , f27 (float32) , f28 (float32) , f29 (float32) , f30 (float32) , f31 (float32) , f32 (float32) , f33 (float32) , f34 (float32) , f35 (float32) , f36 (float32) , f37 (float32) , label (int32)

indexes [1]: machine (str_)

events: 223065

index values: 9

memory usage: 36.6 MB

index ( machine: machine-1-1 ) with 28479 events

timestamp	f0	f1	f2	f3	f4	f5	f6	f7	f8	f9	f10	f11	f12	f13	f14	f15	…
0	0.07527	0.06568	0.07023	0.07433	0	0.9333	0.274	0	0.03108	0	0.1341	0.08108	0.0274	0.06781	0.1258	0.1506	…
1	0.08602	0.08051	0.07581	0.07666	0	0.9308	0.275	0	0.03108	0.000122	0.1488	0.1622	0.0548	0.0714	0.1231	0.1645	…
2	0.07527	0.06462	0.07135	0.07433	0	0.9282	0.275	0	0.03094	0.000366	0.1348	0.0946	0.0274	0.06328	0.129	0.1515	…
3	0.08602	0.04873	0.06355	0.07085	0	0.9282	0.2731	0	0.02725	0.000244	0.1313	0.08108	0.0274	0.06784	0.1104	0.1456	…
4	0.08602	0.05191	0.06243	0.07085	0	0.9333	0.274	0	0.03094	0.000244	0.1027	0.1081	0.0411	0.07565	0.1191	0.1184	…
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…

index ( machine: machine-1-2 ) with 23694 events

timestamp	f0	f1	f2	f3	f4	f5	f6	f7	f8	f9	f10	f11	f12	f13	f14	f15	…
0	0.23	0.1095	0.1266	0.1338	0.2264	0.621	0.4093	0	9.3e-05	1.5e-05	0	0.00722	0	0.03695	0.003551	0.02929	…
1	0.25	0.1089	0.1271	0.134	0.2264	0.5473	0.337	0	9.3e-05	1.5e-05	0	0.009025	0	0.03926	0.004129	0.03305	…
2	0.24	0.1228	0.1308	0.1353	0.2264	0.4785	0.2648	0	0.000653	0.000134	8e-06	0.01805	0	0.09858	0.009166	0.09242	…
3	0.23	0.1093	0.1286	0.1348	0.2264	0.4841	0.2648	0	0.001026	0.000134	8e-06	0.01444	0	0.102	0.01292	0.09136	…
4	0.24	0.09617	0.1254	0.1338	0.2264	0.485	0.2649	0	0.000187	1.5e-05	0	0.02347	0	0.02632	0.009579	0.02257	…
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…

index ( machine: machine-1-3 ) with 23703 events

timestamp	f0	f1	f2	f3	f4	f5	f6	f7	f8	f9	f10	f11	f12	f13	f14	f15	…
0	0.2424	0.01391	0.01491	0.01739	0.9286	0.4351	0.374	0	0.002196	0.000394	0	0.03985	0	0.01917	0.01931	0.01111	…
1	0.2424	0.01385	0.01479	0.01739	0.9286	0.4373	0.3764	0	0.000119	0	0	0.01414	0	0.008432	0.002625	0.00104	…
2	0.2222	0.01391	0.01498	0.01755	0.9286	0.436	0.3749	0	0.000178	0	0	0.01671	0	0.007965	0.003243	0.000964	…
3	0.2323	0.009957	0.01373	0.01708	0.9286	0.4347	0.3734	0	0.000178	0	0	0.01028	0	0.007965	0.002789	0.000832	…
4	0.2323	0.009734	0.01324	0.01692	0.9286	0.4353	0.374	0	0.000297	0	0	0.01671	0	0.008803	0.004606	0.001191	…
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…

index ( machine: machine-2-1 ) with 23694 events

timestamp	f0	f1	f2	f3	f4	f5	f6	f7	f8	f9	f10	f11	f12	f13	f14	f15	…
0	0.1111	0.009689	0.0171	0.04285	0.1579	0.8356	0.4777	0	0	0	0	0.03304	0	0.01214	0.005469	0.01656	…
1	0.101	0.006856	0.01541	0.04118	0.1579	0.8359	0.4779	0	0	0	0	0.02261	0	0.0238	0.003045	0.02151	…
2	0.1212	0.01429	0.01815	0.04248	0.1579	0.839	0.4815	0	0.001073	0	0	0.01391	0	0.06879	0.01759	0.04456	…
3	0.1313	0.01273	0.01872	0.04267	0.1579	0.8518	0.4956	0	0.001877	0	0	0.01565	0	0.06783	0.03362	0.1281	…
4	0.1111	0.009114	0.01751	0.04155	0.1579	0.8665	0.5114	0	0	0	0	0.01913	0	0.02304	0.001367	0.01609	…
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…

… (5 more indexes not shown)

Awesome! Seems like each machine has more than 20.000 events and 39 features (counting the "label" one).

As stated previously, all metrics seem to be anonymized and normalized to [0, 1], so we won't need to take care of that ourselves.

Data visualization¶

Let's take a look at some of the first machine's features.

In [32]:

Copied!

# Plot the first 3 features
evset.plot(indexes="machine-1-1", max_num_plots=3)

# Plot the labels
evset["label"].plot(indexes="machine-1-1")
# Plot the first 3 features
evset.plot(indexes="machine-1-1", max_num_plots=3)

# Plot the labels
evset["label"].plot(indexes="machine-1-1")

The number of plots (39) is larger than "options.max_num_plots=3". Only the first plots will be printed.

No description has been provided for this image

Great! A lot to unpack here:

It seems to be easy to understand when an anomaly occurs (label takes value of 1) by looking at the other plots. Features 11 to 14, for example, seem to be very correlated to the label.
The data seems to have some periodicity to it.
Some features seem empty, and we could evaluate dropping them if needed.

Data preparation¶

To prepare our data to train a model on it, let's start off by separating the features from the labels.

In [33]:

Copied!





feature_names = evset.schema.feature_names()
feature_names.remove("label")

raw_features = evset[feature_names]
labels = evset["label"]

print("Raw features:", raw_features.schema)
print("Labels:", labels.schema)
feature_names = evset.schema.feature_names()
feature_names.remove("label")

raw_features = evset[feature_names]
labels = evset["label"]

print("Raw features:", raw_features.schema)
print("Labels:", labels.schema)

Raw features: features: [('f0', float32), ('f1', float32), ('f2', float32), ('f3', float32), ('f4', float32), ('f5', float32), ('f6', float32), ('f7', float32), ('f8', float32), ('f9', float32), ('f10', float32), ('f11', float32), ('f12', float32), ('f13', float32), ('f14', float32), ('f15', float32), ('f16', float32), ('f17', float32), ('f18', float32), ('f19', float32), ('f20', float32), ('f21', float32), ('f22', float32), ('f23', float32), ('f24', float32), ('f25', float32), ('f26', float32), ('f27', float32), ('f28', float32), ('f29', float32), ('f30', float32), ('f31', float32), ('f32', float32), ('f33', float32), ('f34', float32), ('f35', float32), ('f36', float32), ('f37', float32)]
indexes: [('machine', str_)]
is_unix_timestamp: False

Labels: features: [('label', int32)]
indexes: [('machine', str_)]
is_unix_timestamp: False

Next, we'll need to split our dataset into train and testing sets, which we'll use an 80/20 split for.

Note that the train labels will only be used for evaluation purposes.

We'll be creating reusable functions for each step, since we'll do some iteration over the feature engineering -> training -> evaluation cycle.

In [34]:

Copied!





DROP_COLS = ["machine", "timestamp"]

def split_event_set(evset: tp.EventSet, test_frac: float = 0.2):

    # Average length of the records
    average_duration = np.mean([len(machine_data.timestamps) for _, machine_data in evset.data.items()])
    print("average_duration:", average_duration)

    # Select the train/test cutoff.
    # Note: All the machines are cut at the same time. This way, we can apply pre-processing that
    # exchange data between the machines without risk of label leakage!
    train_cutoff = average_duration * (1 - test_frac)
    print("train_cutoff:", train_cutoff)

    # Compute masks and split data based on cutoff
    train_mask = evset.timestamps() <= int(train_cutoff)
    test_mask = ~train_mask

    # Split EventSets
    train_evset = evset.filter(train_mask)
    test_evset = evset.filter(test_mask)

    print(f"Train events: {train_evset.num_events()}")
    print(f"Test events: {test_evset.num_events()}")

    return train_evset, test_evset

def evsets_to_tabular(*evsets: tp.EventSet) -> List[np.ndarray]:
    datasets = []
    for evset in evsets:
        # Fill missing values with -1 (raw data doesn't have any, but we will create some in our feature engineering)
        df = tp.to_pandas(evset).fillna(-1)

        # Remove timestamp and machine columns
        df = df.drop(columns=DROP_COLS)

        # Convert to numpy (and convert from 2D to 1D array if it has a single feature, in the case of the labels)
        arr = df.to_numpy().squeeze()

        datasets.append(arr)

    return datasets

features_train, features_test = split_event_set(raw_features)
labels_train, labels_test = split_event_set(labels)  # Note that we won't be using the train labels for training in this unsupervised setting

X_train, X_test, y_train, y_test = evsets_to_tabular(features_train, features_test, labels_train, labels_test)
DROP_COLS = ["machine", "timestamp"]

def split_event_set(evset: tp.EventSet, test_frac: float = 0.2):

    # Average length of the records
    average_duration = np.mean([len(machine_data.timestamps) for _, machine_data in evset.data.items()])
    print("average_duration:", average_duration)

    # Select the train/test cutoff.
    # Note: All the machines are cut at the same time. This way, we can apply pre-processing that
    # exchange data between the machines without risk of label leakage!
    train_cutoff = average_duration * (1 - test_frac)
    print("train_cutoff:", train_cutoff)

    # Compute masks and split data based on cutoff
    train_mask = evset.timestamps() <= int(train_cutoff)
    test_mask = ~train_mask

    # Split EventSets
    train_evset = evset.filter(train_mask)
    test_evset = evset.filter(test_mask)

    print(f"Train events: {train_evset.num_events()}")
    print(f"Test events: {test_evset.num_events()}")

    return train_evset, test_evset

def evsets_to_tabular(*evsets: tp.EventSet) -> List[np.ndarray]:
    datasets = []
    for evset in evsets:
        # Fill missing values with -1 (raw data doesn't have any, but we will create some in our feature engineering)
        df = tp.to_pandas(evset).fillna(-1)

        # Remove timestamp and machine columns
        df = df.drop(columns=DROP_COLS)

        # Convert to numpy (and convert from 2D to 1D array if it has a single feature, in the case of the labels)
        arr = df.to_numpy().squeeze()

        datasets.append(arr)

    return datasets

features_train, features_test = split_event_set(raw_features)
labels_train, labels_test = split_event_set(labels)  # Note that we won't be using the train labels for training in this unsupervised setting

X_train, X_test, y_train, y_test = evsets_to_tabular(features_train, features_test, labels_train, labels_test)

average_duration: 24785.0
train_cutoff: 19828.0
Train events: 178461
Test events: 44604
average_duration: 24785.0
train_cutoff: 19828.0
Train events: 178461
Test events: 44604

Training¶

Having done all that work to prepare our data, all that remains is to train our model.

In [35]:

Copied!





contamination = y_train.sum() / len(y_train)
print(f"{contamination=}")

def train(X_train):
    model = IsolationForest(
        contamination=contamination,
        n_jobs=-1,
        verbose=1,
        random_state=0,
    )
    model.fit(X_train)
    return model
contamination = y_train.sum() / len(y_train)
print(f"{contamination=}")

def train(X_train):
    model = IsolationForest(
        contamination=contamination,
        n_jobs=-1,
        verbose=1,
        random_state=0,
    )
    model.fit(X_train)
    return model

contamination=0.04820661096822275

In [36]:

Copied!

model = train(X_train)
model = train(X_train)

[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done   2 out of   8 | elapsed:    0.4s remaining:    1.3s
[Parallel(n_jobs=8)]: Done   8 out of   8 | elapsed:    0.5s finished

Evaluation using AUC-ROC¶

We'll be reporting the model's ROC AUC score, which provides an aggregate measure of performance across all possible classification thresholds (since our model outputs an anomaly score for each sample, and in a real-world scenario it would be up to us to define the thershold from which we consider an event to be marked as anomalous).

In [37]:

Copied!





figsize=(20,3)

results = {}

def evaluate(model, X_test, y_test, name):
    """Evaluates a model on its training data and unseen test data, computing accuracy score and plotting ground truth vs predictions."""
    # Compute and print scores
    roc_auc = roc_auc_score(y_test, -model.score_samples(X_test))

    results[name] = {"ROC AUC": roc_auc}

    print("Results:")
    print(pd.DataFrame(results))

    # Plot predictions
    preds = model.predict(X_test)
    preds[preds == 1] = 0
    preds[preds == -1] = 1
    tp.event_set(timestamps=list(range(len(y_test))), features={"labels": y_test, "predictions": preds}).plot(style="vline")
figsize=(20,3)

results = {}

def evaluate(model, X_test, y_test, name):
    """Evaluates a model on its training data and unseen test data, computing accuracy score and plotting ground truth vs predictions."""
    # Compute and print scores
    roc_auc = roc_auc_score(y_test, -model.score_samples(X_test))

    results[name] = {"ROC AUC": roc_auc}

    print("Results:")
    print(pd.DataFrame(results))

    # Plot predictions
    preds = model.predict(X_test)
    preds[preds == 1] = 0
    preds[preds == -1] = 1
    tp.event_set(timestamps=list(range(len(y_test))), features={"labels": y_test, "predictions": preds}).plot(style="vline")

In [38]:

Copied!

evaluate(model, X_test, y_test, "raw features")
evaluate(model, X_test, y_test, "raw features")

Results:
         raw features
ROC AUC      0.675865

That's pretty decent for a first try! Our model seems to be learning.

The plotted predictions seem to be all over the place though, with the model sporadically predicting anomaly events throughout in several non-anomalous periods.

There's plenty of room for improvement, so let's kick off the feature engineering!

Evaluation using Detection AMOC¶

A detection AMOC shows the relation between false alerts (or equivalently, the average time between false alerts) and the average time it takes to detect anomalies.

Using Temporian, we compute an AMOC as follows.

The objective is to detect anomalous events called "anomalies". An anomaly is created each time the "label" (or "target") is true. Anomalies are detected using a numerical "signal" and a "threshold". An "alert" is created each time the signal is above the threshold.

In this dataset, two anomalies that happen close in time can be considered the same anomaly. Therefor, we apply a "shutoff" on the anomalies: Two anomalies cannot happen within "missed_target_shutoff" of each other. A "shutoff" can also be applied on the "alert". Instead, we apply the shutoff on the "false alerts" (see next definition).

A "false alert" is an alert not detecting an anomaly event, i.e. an alert at time t without any anomaly in [t-anomaly_window, t]. A "missed anomaly" is an anomaly not detected by any alert, i.e. an anomaly at time t without any alert in [t, t+anomaly_window]. The "time to detection" of an anomaly at time t is the time between the anomaly and the first following alert in [t, t+abnomaly_window]`. If not detected by an alert, the "time to detection" of an anomaly is related by the "anomaly_window" (this makes the plots easier to interpret).

Here is a summary:

We can compute an AMOC on a model or any other numerical signal. For instance, we can compute the AMOC on a feature. We can find which one of the features f1and f2 alone can best detect anomalies.

In [39]:

Copied!





# @title Define "compute_detection_amoc" and "plot_detection_amoc" to compute and plot amocs.

from dataclasses import dataclass
from typing import Optional, Dict, List
import matplotlib.pyplot as plt

@dataclass
class ThresholdEval:
    """Model metric for each threshold value applied on the score."""

    threshold: float
    num_false_alerts: int
    num_missed_anomalies: int
    average_time_to_detection: float
    total_record_duration: float
    num_machines: int

def compute_detection_amoc(target : tp.EventSet,
                   signal : tp.EventSet,
                   detection_thresholds: List[float] = np.linspace(0,1,100, dtype=np.float32).tolist(),
                   anomaly_window : float = 1000.,
                   anomaly_shutoff : float = 200.,
                   false_alert_shutoff : float = 200.,
                   plot_first: bool = True,
                  ) -> List[ThresholdEval]:
    """Computes a detection AMOC.

    Args:
        target: A boolean feature defining an anomaly to detect.
        signal: A numerical feature detecting recent anomalies. A signal above a threshold becomes an alert.
        detection_thresholds: List of evaluated detection thresholds applied on the "signal".
        anomaly_window: For how long an anomaly can be detected.
        anomaly_shutoff: Minimum distance between two anomalies.
        false_alert_shutoff: Minimum distance between two missed alerts.
        plot_first: Plot the alerts, anomalies of the first index and the mid-threshold.
    """

    # List the anomalies
    anomalies = target.cast(tp.bool_).filter()
    anomalies = anomalies.filter_moving_count(anomaly_shutoff)

    # Final timestamp used to collect all the metrics.
    end_of_records = signal.end()
    end_of_records_global = end_of_records.drop_index(keep=False).end()

    def sum_by_index(evset) -> int:
        """Sums of the last value of a feature in all the indexes."""
        assert len(evset.schema.features) == 1 # There should be only one feature

        return float(evset.drop_index(keep=False).cumsum(sampling=end_of_records_global).get_arbitrary_index_data().features[0][0])

    # The duration of record in each index is the time distance between the first and last event.
    record_duration = signal.begin().since_last(sampling=signal.end())
    total_record_duration = sum_by_index(record_duration)
    print("total_record_duration:",total_record_duration)

    # There is one index per machine
    num_machines = len(target.data)
    print("num_machines:",num_machines)

    num_anomalies = int(anomalies.drop_index(keep=False).moving_count(np.inf, sampling=end_of_records_global).get_arbitrary_index_data().features[0][0])
    print("num_anomalies:",num_anomalies)

    # Can also be computed this way:
    assert num_anomalies == np.sum([len(machine.timestamps) for key, machine in anomalies.data.items()])

    amoc = []

    for threshold_idx, threshold in enumerate(detection_thresholds):
        is_mid_threshold = threshold_idx == (len(detection_thresholds)//2)

        alerts = (signal >= threshold).filter()[[]]

        # False alerts
        false_alerts = anomalies.moving_count(anomaly_window, sampling=alerts).equal(0).filter()
        false_alerts = false_alerts.filter_moving_count(false_alert_shutoff)
        num_false_alerts = false_alerts.moving_count(np.inf, sampling=end_of_records)

        # Missed targets
        missed_anomalies = alerts.moving_count(anomaly_window, sampling=anomalies.lag(anomaly_window)).equal(0).filter().leak(anomaly_window)
        missed_anomalies = missed_anomalies.filter_moving_count(false_alert_shutoff)
        num_missed_anomalies = missed_anomalies.moving_count(np.inf, sampling=end_of_records)

        # Time to detection
        time_to_detection = anomalies.until_next(sampling=alerts, timeout=anomaly_window)
        # Set the time to detection of non detected anomalies with the maximum anomaly_window.
        time_to_detection = time_to_detection.isnan().where(anomaly_window, time_to_detection)
        # Note: Some machines don't have alerts
        sum_time_to_detection = time_to_detection.cumsum(sampling=end_of_records)

        if plot_first and is_mid_threshold:
            anomalies.name = "anomalies"
            alerts.name = "alerts"
            missed_anomalies.name = "missed_anomalies"
            false_alerts.name = "false_alerts"
            tp.plot([
                target,
                signal,
                anomalies,
                alerts,
                missed_anomalies,
                false_alerts
                ],
                    indexes="machine-1-1",
                   height_per_plot_px=100)

        amoc.append(ThresholdEval(
            threshold=threshold,
            num_false_alerts=sum_by_index(num_false_alerts) / num_machines,
            num_missed_anomalies=sum_by_index(num_missed_anomalies) / num_machines,
            average_time_to_detection=sum_by_index(sum_time_to_detection) / num_anomalies,
            total_record_duration=total_record_duration / num_machines,
            num_machines=num_machines,
        ))

    return amoc

def plot_detection_amoc(amoc_dict: Dict[str, List[ThresholdEval]]):

    # Plot the AMOC and other related plots
    fig, axs = plt.subplots(1, 2, figsize=(5*2, 3), squeeze=False)

    for name, amoc in amoc_dict.items():
        ax = axs[0,0]
        ax.plot([e.average_time_to_detection for e in amoc],
                [e.num_false_alerts for e in amoc],
               label=name)

        ax = axs[0,1]
        ax.plot([e.average_time_to_detection for e in amoc if e.num_false_alerts != 0],
                [e.total_record_duration / e.num_false_alerts for e in amoc if e.num_false_alerts != 0],
               label=name)

    ax = axs[0,0]
    ax.set_xlabel("Avg. time to detection")
    ax.set_ylabel("Num. false alerts")
    ax.legend()
    ax.grid(True)

    ax = axs[0,1]
    ax.set_xlabel("Avg. time to detection")
    ax.set_ylabel("Avg. time between false alerts")
    ax.set_yscale("log")
    ax.invert_xaxis()
    ax.legend()
    ax.grid(True)

amoc_f1 = compute_detection_amoc(evset["label"], evset["f1"])
amoc_f2 = compute_detection_amoc(evset["label"], evset["f2"], plot_first=False)

plot_detection_amoc({
    "f1":amoc_f1,
    "f2":amoc_f2,
})
# @title Define "compute_detection_amoc" and "plot_detection_amoc" to compute and plot amocs.

from dataclasses import dataclass
from typing import Optional, Dict, List
import matplotlib.pyplot as plt

@dataclass
class ThresholdEval:
    """Model metric for each threshold value applied on the score."""

    threshold: float
    num_false_alerts: int
    num_missed_anomalies: int
    average_time_to_detection: float
    total_record_duration: float
    num_machines: int

def compute_detection_amoc(target : tp.EventSet,
                   signal : tp.EventSet,
                   detection_thresholds: List[float] = np.linspace(0,1,100, dtype=np.float32).tolist(),
                   anomaly_window : float = 1000.,
                   anomaly_shutoff : float = 200.,
                   false_alert_shutoff : float = 200.,
                   plot_first: bool = True,
                  ) -> List[ThresholdEval]:
    """Computes a detection AMOC.

    Args:
        target: A boolean feature defining an anomaly to detect.
        signal: A numerical feature detecting recent anomalies. A signal above a threshold becomes an alert.
        detection_thresholds: List of evaluated detection thresholds applied on the "signal".
        anomaly_window: For how long an anomaly can be detected.
        anomaly_shutoff: Minimum distance between two anomalies.
        false_alert_shutoff: Minimum distance between two missed alerts.
        plot_first: Plot the alerts, anomalies of the first index and the mid-threshold.
    """

    # List the anomalies
    anomalies = target.cast(tp.bool_).filter()
    anomalies = anomalies.filter_moving_count(anomaly_shutoff)

    # Final timestamp used to collect all the metrics.
    end_of_records = signal.end()
    end_of_records_global = end_of_records.drop_index(keep=False).end()

    def sum_by_index(evset) -> int:
        """Sums of the last value of a feature in all the indexes."""
        assert len(evset.schema.features) == 1 # There should be only one feature

        return float(evset.drop_index(keep=False).cumsum(sampling=end_of_records_global).get_arbitrary_index_data().features[0][0])

    # The duration of record in each index is the time distance between the first and last event.
    record_duration = signal.begin().since_last(sampling=signal.end())
    total_record_duration = sum_by_index(record_duration)
    print("total_record_duration:",total_record_duration)

    # There is one index per machine
    num_machines = len(target.data)
    print("num_machines:",num_machines)

    num_anomalies = int(anomalies.drop_index(keep=False).moving_count(np.inf, sampling=end_of_records_global).get_arbitrary_index_data().features[0][0])
    print("num_anomalies:",num_anomalies)

    # Can also be computed this way:
    assert num_anomalies == np.sum([len(machine.timestamps) for key, machine in anomalies.data.items()])

    amoc = []

    for threshold_idx, threshold in enumerate(detection_thresholds):
        is_mid_threshold = threshold_idx == (len(detection_thresholds)//2)

        alerts = (signal >= threshold).filter()[[]]

        # False alerts
        false_alerts = anomalies.moving_count(anomaly_window, sampling=alerts).equal(0).filter()
        false_alerts = false_alerts.filter_moving_count(false_alert_shutoff)
        num_false_alerts = false_alerts.moving_count(np.inf, sampling=end_of_records)

        # Missed targets
        missed_anomalies = alerts.moving_count(anomaly_window, sampling=anomalies.lag(anomaly_window)).equal(0).filter().leak(anomaly_window)
        missed_anomalies = missed_anomalies.filter_moving_count(false_alert_shutoff)
        num_missed_anomalies = missed_anomalies.moving_count(np.inf, sampling=end_of_records)

        # Time to detection
        time_to_detection = anomalies.until_next(sampling=alerts, timeout=anomaly_window)
        # Set the time to detection of non detected anomalies with the maximum anomaly_window.
        time_to_detection = time_to_detection.isnan().where(anomaly_window, time_to_detection)
        # Note: Some machines don't have alerts
        sum_time_to_detection = time_to_detection.cumsum(sampling=end_of_records)

        if plot_first and is_mid_threshold:
            anomalies.name = "anomalies"
            alerts.name = "alerts"
            missed_anomalies.name = "missed_anomalies"
            false_alerts.name = "false_alerts"
            tp.plot([
                target,
                signal,
                anomalies,
                alerts,
                missed_anomalies,
                false_alerts
                ],
                    indexes="machine-1-1",
                   height_per_plot_px=100)

        amoc.append(ThresholdEval(
            threshold=threshold,
            num_false_alerts=sum_by_index(num_false_alerts) / num_machines,
            num_missed_anomalies=sum_by_index(num_missed_anomalies) / num_machines,
            average_time_to_detection=sum_by_index(sum_time_to_detection) / num_anomalies,
            total_record_duration=total_record_duration / num_machines,
            num_machines=num_machines,
        ))

    return amoc

def plot_detection_amoc(amoc_dict: Dict[str, List[ThresholdEval]]):

    # Plot the AMOC and other related plots
    fig, axs = plt.subplots(1, 2, figsize=(5*2, 3), squeeze=False)

    for name, amoc in amoc_dict.items():
        ax = axs[0,0]
        ax.plot([e.average_time_to_detection for e in amoc],
                [e.num_false_alerts for e in amoc],
               label=name)

        ax = axs[0,1]
        ax.plot([e.average_time_to_detection for e in amoc if e.num_false_alerts != 0],
                [e.total_record_duration / e.num_false_alerts for e in amoc if e.num_false_alerts != 0],
               label=name)

    ax = axs[0,0]
    ax.set_xlabel("Avg. time to detection")
    ax.set_ylabel("Num. false alerts")
    ax.legend()
    ax.grid(True)

    ax = axs[0,1]
    ax.set_xlabel("Avg. time to detection")
    ax.set_ylabel("Avg. time between false alerts")
    ax.set_yscale("log")
    ax.invert_xaxis()
    ax.legend()
    ax.grid(True)

amoc_f1 = compute_detection_amoc(evset["label"], evset["f1"])
amoc_f2 = compute_detection_amoc(evset["label"], evset["f2"], plot_first=False)

plot_detection_amoc({
    "f1":amoc_f1,
    "f2":amoc_f2,
})

total_record_duration: 223056.0
num_machines: 9
num_anomalies: 132
total_record_duration: 223056.0
num_machines: 9
num_anomalies: 132

We can see that both features performs relatively equally. For instance, with an average time to detection of 400 time-units, both features generate approximately ~30 false alerts for each machine.

What about our detection model now?

In [40]:

Copied!





def model_prediction_to_evset(model):
    df = tp.to_pandas(features_test)[["machine", "timestamp"]]
    df["detection"] = -model.score_samples(X_test)
    df["label"] = y_test
    evset = tp.from_pandas(df).add_index("machine")
    return evset

pred_evset = model_prediction_to_evset(model)
amoc_model_raw_features = compute_detection_amoc(pred_evset["label"], pred_evset["detection"])

plot_detection_amoc({
    "f1":amoc_f1,
    "f2":amoc_f2,
    "model": amoc_model_raw_features,
})
def model_prediction_to_evset(model):
    df = tp.to_pandas(features_test)[["machine", "timestamp"]]
    df["detection"] = -model.score_samples(X_test)
    df["label"] = y_test
    evset = tp.from_pandas(df).add_index("machine")
    return evset

pred_evset = model_prediction_to_evset(model)
amoc_model_raw_features = compute_detection_amoc(pred_evset["label"], pred_evset["detection"])

plot_detection_amoc({
    "f1":amoc_f1,
    "f2":amoc_f2,
    "model": amoc_model_raw_features,
})

WARNING:root:Feature "machine" is an array of numpy.object_ and will be casted to numpy.string_ (Note: numpy.string_ is equivalent to numpy.bytes_).

total_record_duration: 44595.0
num_machines: 9
num_anomalies: 31

The model is much better than the features alone. This is expected, but it is a good check to do :).

For example, the model is able to detect the anomalies within 200 time-units while raising ~8 false alerts. By comparison, using f1or f2alone will generate ~55 false alerts for the same time to detection

Wrapping up¶

In this notebook we learned how to perform feature engineering and visualization using Temporian, applying it to a real-world anomaly detection use case.