Unify events with identical timestamps¶
This recipe shows how to avoid duplicated timestamps in an EventSet. Events with identical timestamps are aggregated with a moving window operation (e.g., sum, average, max, min), preserving the original timestamp values (which may be non-uniform).
For example, assume we have asynchronous sensor measurements, potentially from different sources. If two measurements share the exact same timestamp, we want to unify them and take their average value.
Example data¶
Let's define some events with non-uniform timestamps to illustrate the use case. Some of the timestamps are repeated; those are the ones that we'll unify.
However, we have to be careful: some events are very close in time but not actually duplicated, and we don't want to interfere with those.
import temporian as tp

sensor_evset = tp.event_set(
    timestamps=[1.1, 2.01, 2.02, 2.02, 3.5, 3.51, 3.51, 4.5, 5.0],
    features={
        "y": [1., 2., 3., 4., 5., 6., 7., 8., 9.],
        "z": [10., 20., 30., 40., 50., 60., 70., 80., 90.],
    },
)
sensor_evset.plot()
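As a quick sanity check on the example data, the exact duplicates can be told apart from the near misses without Temporian at all. The following is only a plain-Python sketch; the names `timestamps`, `counts` and `duplicated` are illustrative and not part of the recipe.

```python
from collections import Counter

# Same timestamps as in sensor_evset above.
timestamps = [1.1, 2.01, 2.02, 2.02, 3.5, 3.51, 3.51, 4.5, 5.0]

# Counter groups exactly equal float values, so close-but-different
# timestamps such as 2.01 and 2.02 are counted separately.
counts = Counter(timestamps)

# Keep only the timestamps that appear more than once.
duplicated = sorted(t for t, n in counts.items() if n > 1)
print(duplicated)
```

Only 2.02 and 3.51 are reported; 2.01 and 3.5 are close to them but are distinct values, so they must be left alone.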
Solution¶
In order to unify only the events with the exact same timestamp, we need to:
- Get the list of unique timestamps.
- Aggregate events at the exact same timestamp, making sure the moving window doesn't overlap with nearby measurements.
1. Get unique timestamps¶
The first step is to create a new sampling that removes the duplicated timestamps at 2.02 and 3.51:
# Remove duplicated timestamps
unique_t = sensor_evset.unique_timestamps()
unique_t
timestamp |
---|
1.1 |
2.01 |
2.02 |
3.5 |
3.51 |
4.5 |
5 |
2. Moving window with shortest length¶
To create a moving window that never covers two different timestamps, its length must be smaller than the smallest possible gap between timestamps. But we want a solution that works for any resolution, from daily sales to nanosecond sensor measurements.
In tp.duration.shortest, we've defined the shortest possible interval that can be represented with a float64 timestamp at maximum resolution:
shortest_length = tp.duration.shortest
shortest_length
5e-324
Pretty small, right? Since zero-length durations are not allowed, this is as close to zero as we can get. With this window length, a window is guaranteed never to span two different timestamps.
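To make the size of this value more concrete, here is a small standard-library sketch. It hardcodes the 5e-324 shown above rather than importing Temporian, and it assumes Python 3.9 or later for `math.nextafter`. The variable names are illustrative.

```python
import math

# Value printed above for tp.duration.shortest (hardcoded here).
shortest = 5e-324

# It is the smallest positive float64: the next float after 0.0.
is_smallest_positive = shortest == math.nextafter(0.0, math.inf)

# Anything smaller cannot be represented: halving it rounds to zero.
half_is_zero = (shortest / 2) == 0.0

# Gap between 2.02 and the next representable float64 above it.
gap_near_2_02 = math.nextafter(2.02, math.inf) - 2.02

print(is_smallest_positive, half_is_zero)
print(gap_near_2_02 > shortest)
```

For timestamps of ordinary magnitude, such as the ones in this example, the spacing between neighbouring float64 values is many orders of magnitude larger than this window length.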
Now we just need to run the aggregation we want, passing this small number as window_length and the unique timestamps as sampling:
unified_evset = sensor_evset.simple_moving_average(window_length=shortest_length, sampling=unique_t)
unified_evset
timestamp | y | z |
---|---|---|
1.1 | 1 | 10 |
2.01 | 2 | 20 |
2.02 | 3.5 | 35 |
3.5 | 5 | 50 |
3.51 | 6.5 | 65 |
4.5 | 8 | 80 |
5 | 9 | 90 |
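The table above can be cross-checked with a plain-Python group-by average. This is not how Temporian computes the result; it is only an independent sketch of the same idea (group by exact timestamp, then average each feature), using illustrative names.

```python
from collections import defaultdict
from statistics import mean

# Same data as sensor_evset above.
timestamps = [1.1, 2.01, 2.02, 2.02, 3.5, 3.51, 3.51, 4.5, 5.0]
y = [1., 2., 3., 4., 5., 6., 7., 8., 9.]
z = [10., 20., 30., 40., 50., 60., 70., 80., 90.]

# Collect the (y, z) rows that share exactly the same timestamp.
groups = defaultdict(list)
for t, yi, zi in zip(timestamps, y, z):
    groups[t].append((yi, zi))

# Average each feature within every group, keeping timestamps sorted.
unified = {
    t: (mean(row[0] for row in rows), mean(row[1] for row in rows))
    for t, rows in sorted(groups.items())
}

for t, (y_avg, z_avg) in unified.items():
    print(t, y_avg, z_avg)
```

Only the rows at 2.02 and 3.51 change; every other timestamp keeps its single original value, which matches the unified EventSet.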
Of course, instead of the average value, other moving window operations like moving_min or moving_max could make more sense depending on the use case. If multiple measurements are expected at each timestamp, you might also want the moving standard deviation to get a confidence interval.
Also, keep in mind that this exact procedure would work well in an EventSet with multiple indexes, removing the duplicated timestamps in each index separately.
But let's keep the example simple for now 🙂