Aggregate events at a fixed interval¶
This recipe aggregates possibly non-uniformly sampled events into fixed-length intervals (e.g., seconds, hours, days, or weeks). In other words, it converts the event features into time series.
For example, suppose we have the sales log from a store, where each sold item is represented by an event. Each sale event has a date-time, the sale price, and the unit cost of the product. We want to calculate total daily sales, producing a single event at 00:00 each day.
Example data¶
Let's create some sale events with non-uniform sampling and the features mentioned above.
```python
import pandas as pd
import temporian as tp

sales_data = pd.DataFrame(
    data=[
        # sale timestamp, price, cost
        ["2020-01-01 13:04", 3.0, 1.0],
        ["2020-01-01 13:04", 5.0, 2.0],  # duplicated timestamp
        ["2020-01-02 15:24", 7.0, 3.0],
        ["2020-01-03 13:45", 3.0, 1.0],
        ["2020-01-03 16:10", 7.0, 3.0],
        ["2020-01-03 17:30", 10.0, 5.0],
        ["2020-01-06 10:10", 4.0, 2.0],
        ["2020-01-06 19:35", 3.0, 1.0],
    ],
    columns=[
        "timestamp",
        "unit_price",
        "unit_cost",
    ],
)

sales_evset = tp.from_pandas(sales_data)
sales_evset.plot()
```
Solution¶
We want to calculate total daily sales. This is what we can do:

1. Create a uniform sampling with one tick per day (it could be any other interval), at time `00:00:00`.
2. Add up all sales that happened between `00:00:01` on the previous day and the current tick at `00:00:00`.
1. Create uniform sampling¶
```python
# Define the time span to cover: one week
time_span = tp.event_set(timestamps=["2020-01-01 00:00", "2020-01-07 00:00"])

# Create daily ticks at 00:00
interval = tp.duration.days(1)
ticks = time_span.tick(interval)
ticks
```
| timestamp |
| --- |
| 2020-01-01 00:00:00+00:00 |
| 2020-01-02 00:00:00+00:00 |
| 2020-01-03 00:00:00+00:00 |
| 2020-01-04 00:00:00+00:00 |
| 2020-01-05 00:00:00+00:00 |
| 2020-01-06 00:00:00+00:00 |
| 2020-01-07 00:00:00+00:00 |
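For intuition, these seven midnight ticks line up with what a daily `pandas.date_range` would produce (a pandas-only illustration, ignoring time-zone details; it is not how Temporian generates ticks internally):

```python
import pandas as pd

# Seven daily timestamps covering the same one-week span as the ticks above.
tick_times = pd.date_range("2020-01-01", "2020-01-07", freq="D")
print(tick_times)
```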
2. Aggregate the events¶
Now we can aggregate the events between ticks, in this case by running a moving sum over the specified `sampling=ticks`, with the `window_length` equal to the interval between ticks.

Note that all moving window operators support the `sampling` argument, so any other kind of aggregation could be used depending on the use case (e.g., moving average, max, min).
```python
# Provide uniform ticks as sampling
moving_sum = sales_evset.moving_sum(window_length=interval, sampling=ticks)
moving_sum
```
| timestamp | unit_price | unit_cost |
| --- | --- | --- |
| 2020-01-01 00:00:00+00:00 | 0 | 0 |
| 2020-01-02 00:00:00+00:00 | 8 | 3 |
| 2020-01-03 00:00:00+00:00 | 7 | 3 |
| 2020-01-04 00:00:00+00:00 | 20 | 9 |
| 2020-01-05 00:00:00+00:00 | 0 | 0 |
| 2020-01-06 00:00:00+00:00 | 0 | 0 |
| 2020-01-07 00:00:00+00:00 | 7 | 3 |
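As a sanity check, the same daily totals can be reproduced with plain pandas by resampling into right-closed, right-labeled daily bins (a pandas-only sketch using the example data from above; Temporian is not needed for this check):

```python
import pandas as pd

# Re-create the example sales log (same values as in the "Example data" section).
sales_data = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2020-01-01 13:04", "2020-01-01 13:04", "2020-01-02 15:24",
                "2020-01-03 13:45", "2020-01-03 16:10", "2020-01-03 17:30",
                "2020-01-06 10:10", "2020-01-06 19:35",
            ]
        ),
        "unit_price": [3.0, 5.0, 7.0, 3.0, 7.0, 10.0, 4.0, 3.0],
        "unit_cost": [1.0, 2.0, 3.0, 1.0, 3.0, 5.0, 2.0, 1.0],
    }
)

# Right-closed, right-labeled daily bins mimic a moving sum sampled at
# midnight ticks: each sale is attributed to the next 00:00 tick.
daily = (
    sales_data.set_index("timestamp")
    .resample("D", closed="right", label="right")
    .sum()
)
print(daily)
```

Unlike the Temporian output, the resampled frame has no 2020-01-01 row, since pandas only creates bins spanning the observed data; the remaining rows match the table above.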
(Optional) Rename and plot¶
Finally, we can rename features to match their actual meaning after aggregation.
In this case we also calculate and plot the daily profit.
```python
# Rename aggregated features
daily_sales = moving_sum.rename(
    {"unit_price": "daily_revenue", "unit_cost": "daily_cost"}
)

# Profit = revenue - cost
daily_profit = (
    daily_sales["daily_revenue"] - daily_sales["daily_cost"]
).rename("daily_profit")
daily_profit.plot()
```
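For reference, the rename-and-subtract step can also be sketched with plain pandas, using the aggregated daily values from the moving-sum table above (values copied by hand; the column names follow the renaming in this section):

```python
import pandas as pd

# Aggregated daily values from the moving-sum table above.
daily_sales = pd.DataFrame(
    {
        "daily_revenue": [0.0, 8.0, 7.0, 20.0, 0.0, 0.0, 7.0],
        "daily_cost": [0.0, 3.0, 3.0, 9.0, 0.0, 0.0, 3.0],
    },
    index=pd.date_range("2020-01-01", "2020-01-07", freq="D"),
)

# Profit = revenue - cost, as in the Temporian version.
daily_profit = daily_sales["daily_revenue"] - daily_sales["daily_cost"]
print(daily_profit)
```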