temporian.io.format.GroupedOrSingleEventSetFormat

Format choices for converting dictionaries to and from EventSets.

The GROUPED_BY_INDEX value is generally recommended as it is more efficient than SINGLE_EVENTS.

GROUPED_BY_INDEX class-attribute instance-attribute

GROUPED_BY_INDEX = 'grouped_by_index'

Events that share the same index value are grouped together in a single dictionary that maps index names, feature names and the timestamp key to their values.

In this dictionary, the feature and timestamp keys are mapped to numpy arrays containing one value per event, and the index keys are mapped to single-value Python primitives (e.g., int, float, bytes).

The dtype of each numpy array matches the Temporian dtype. For instance, a Temporian feature with dtype=tp.int32 is stored as a numpy array with dtype=np.int32.

For example, an EventSet with 3 events and the following Schema:

features=[("f1", tp.int64), ("f2", tp.str_)]
indexes=[("i1", tp.int64), ("i2", tp.str_)]

would be represented as the following dictionary:

{
    "timestamp": np.array([100.0, 101.0, 102.0], np.float64),
    "f1": np.array([1, 2, 3], np.int64),
    "f2": np.array([b"a", b"b", b"c"], np.bytes_),
    "i1": 10,
    "i2": b"x",
}
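To make the layout concrete, the sketch below builds the dictionary above with numpy alone and checks the two structural rules described earlier: one array value per event for the timestamp and feature keys, and a single scalar for each index key. The helper `check_grouped` is hypothetical and is not part of Temporian.

```python
import numpy as np

# One index group (i1=10, i2=b"x") holding three events.
grouped = {
    "timestamp": np.array([100.0, 101.0, 102.0], np.float64),
    "f1": np.array([1, 2, 3], np.int64),
    "f2": np.array([b"a", b"b", b"c"], np.bytes_),
    "i1": 10,
    "i2": b"x",
}


def check_grouped(group, feature_names, index_names):
    """Hypothetical helper (not a Temporian API): sanity-check one group."""
    num_events = len(group["timestamp"])
    for name in feature_names:
        values = group[name]
        # Features are numpy arrays with one value per event.
        assert isinstance(values, np.ndarray), f"{name} should be an array"
        assert len(values) == num_events, f"{name} needs one value per event"
    for name in index_names:
        # Index values are single Python primitives, not arrays.
        assert not isinstance(group[name], np.ndarray), f"{name} should be a scalar"
    return num_events


num_events = check_grouped(grouped, ["f1", "f2"], ["i1", "i2"])
```

Because the index values are stored once per group rather than once per event, the dictionary stays compact even when a group holds many events.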

SINGLE_EVENTS class-attribute instance-attribute

SINGLE_EVENTS = 'single_events'

Each event is represented as its own dictionary that maps every key (timestamp, feature names and index names) to a single value. As a result, the index values are repeated in the dictionary of every event.

For example, the same EventSet with 3 events and the following Schema:

features=[("f1", tp.int64), ("f2", tp.str_)]
indexes=[("i1", tp.int64), ("i2", tp.str_)]

would be represented as the following dictionaries:

{"timestamp": 100.0, "f1": 1, "f2": b"a", "i1": 10, "i2": b"x"}
{"timestamp": 101.0, "f1": 2, "f2": b"b", "i1": 10, "i2": b"x"}
{"timestamp": 102.0, "f1": 3, "f2": b"c", "i1": 10, "i2": b"x"}