harmonization

DataHarmonizer

DataHarmonizer(
    dataframe: DataFrame,
    time_column: str = "systime",
    uuid_column: str = "uuid",
    value_column: str = "value_double",
)

Bases: Base

Data harmonization for multi-signal time series.

Provides utilities to pivot, resample, align, and fill gaps across multiple UUID-keyed signals stored in long (stacked) format.

Methods:

- pivot_to_wide: Pivot long-format to wide-format (one column per UUID).
- resample_to_uniform: Resample to a uniform time grid with interpolation.
- detect_gaps: Identify time gaps per UUID exceeding a threshold.
- fill_gaps: Fill detected gaps using various strategies.
- align_asof: Align two UUID signals using merge_asof.
- merge_multi_signals: End-to-end harmonization pipeline.
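As a point of reference for the long (stacked) format described above, here is a minimal sketch of an input frame that uses the default column names from the constructor signature. The sample UUIDs and values are hypothetical, and the commented-out constructor call assumes `DataHarmonizer` has already been imported from wherever your installation exposes it.

```python
import pandas as pd

# Hypothetical sample data: two signals in long format, one row per
# (timestamp, uuid) observation, using the default column names.
long_df = pd.DataFrame(
    {
        "systime": pd.to_datetime(
            [
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:02",
                "2024-01-01 00:00:03",
            ]
        ),
        "uuid": ["temp", "pressure", "temp", "pressure"],
        "value_double": [20.0, 1.01, 20.4, 1.03],
    }
)
print(long_df)

# With the class imported, the defaults would match these column names:
# harmonizer = DataHarmonizer(long_df)
```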

pivot_to_wide

pivot_to_wide(aggfunc: str = 'first') -> pd.DataFrame

Pivot long-format DataFrame to wide-format with one column per UUID.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| aggfunc | str | Aggregation function for duplicate timestamps ('first', 'mean', 'last'). | 'first' |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with systime as index and one column per UUID. |
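To illustrate how `aggfunc` affects duplicate timestamps, the following sketch reproduces the pivot with plain pandas. It is not the library's implementation; `pivot_table` is assumed here as a stand-in, and the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical data: signal "a" has two readings at the same timestamp.
long_df = pd.DataFrame(
    {
        "systime": pd.to_datetime(
            [
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:01",
            ]
        ),
        "uuid": ["a", "a", "b", "b"],
        "value_double": [1.0, 3.0, 10.0, 11.0],
    }
)

# 'first' keeps the first duplicate; 'mean' averages the duplicates.
wide_first = long_df.pivot_table(
    index="systime", columns="uuid", values="value_double", aggfunc="first"
)
wide_mean = long_df.pivot_table(
    index="systime", columns="uuid", values="value_double", aggfunc="mean"
)

print(wide_first)  # "a" at 00:00:00 is 1.0
print(wide_mean)   # "a" at 00:00:00 is 2.0
```

Timestamps at which a signal has no reading appear as NaN in that signal's column.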

resample_to_uniform

resample_to_uniform(
    freq: str,
    method: str = "linear",
    fill_limit: Optional[int] = None,
) -> pd.DataFrame

Resample to a uniform time grid with interpolation.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| freq | str | Pandas frequency string (e.g. '1s', '100ms', '1min'). | required |
| method | str | Interpolation method ('linear', 'time', 'nearest', 'quadratic', 'cubic'). | 'linear' |
| fill_limit | Optional[int] | Maximum number of consecutive NaNs to fill. | None |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with uniform DatetimeIndex. |
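The interplay of `freq`, `method`, and `fill_limit` can be sketched with plain pandas. This assumes the method behaves like `resample` followed by `interpolate`, which the documentation above suggests but does not confirm; the data is hypothetical.

```python
import pandas as pd

# Hypothetical wide-format signal with a 4-second hole between readings.
wide = pd.DataFrame(
    {"a": [0.0, 4.0]},
    index=pd.to_datetime(["2024-01-01 00:00:00", "2024-01-01 00:00:04"]),
)

# Build a uniform 1-second grid; empty bins become NaN.
grid = wide.resample("1s").mean()

# Unlimited linear interpolation fills every NaN on the grid.
uniform = grid.interpolate(method="linear")

# limit=1 plays the role of fill_limit: at most one consecutive NaN is filled.
limited = grid.interpolate(method="linear", limit=1)

print(uniform["a"].tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0]
print(limited["a"].tolist())  # [0.0, 1.0, nan, nan, 4.0]
```

In pandas, the 'quadratic' and 'cubic' interpolation methods require SciPy to be installed, while 'linear' and 'time' do not.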

detect_gaps

detect_gaps(threshold: str = '10s') -> pd.DataFrame

Identify time gaps per UUID exceeding the threshold.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| threshold | str | Minimum gap duration as a pandas Timedelta string. | '10s' |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with columns: uuid, gap_start, gap_end, gap_duration. |
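One way to picture the gap table is a per-UUID difference between consecutive timestamps. The sketch below uses plain pandas and hypothetical data; it assumes a gap is reported when the difference is strictly greater than the threshold, which may differ from the library's exact comparison.

```python
import pandas as pd

# Hypothetical long-format data: signal "a" has a 25-second hole.
long_df = pd.DataFrame(
    {
        "systime": pd.to_datetime(
            [
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:05",
                "2024-01-01 00:00:30",
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:08",
            ]
        ),
        "uuid": ["a", "a", "a", "b", "b"],
        "value_double": [1.0, 2.0, 3.0, 5.0, 6.0],
    }
)

threshold = pd.Timedelta("10s")

# Sort so that each UUID's samples are in time order, then compare each
# timestamp with the previous one from the same UUID.
ordered = long_df.sort_values(["uuid", "systime"])
prev_time = ordered.groupby("uuid")["systime"].shift(1)
duration = ordered["systime"] - prev_time
mask = duration > threshold  # NaT (first sample per UUID) compares False

gaps = pd.DataFrame(
    {
        "uuid": ordered.loc[mask, "uuid"],
        "gap_start": prev_time[mask],
        "gap_end": ordered.loc[mask, "systime"],
        "gap_duration": duration[mask],
    }
).reset_index(drop=True)

print(gaps)  # one row: uuid "a", 00:00:05 to 00:00:30, 25 seconds
```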

fill_gaps

fill_gaps(
    strategy: str = "interpolate",
    max_gap: Optional[str] = None,
    fill_value: Optional[float] = None,
) -> pd.DataFrame

Fill gaps in the wide-format data using the specified strategy.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| strategy | str | One of 'interpolate', 'ffill', 'bfill', 'constant'. | 'interpolate' |
| max_gap | Optional[str] | Maximum gap size to fill (pandas Timedelta string). Larger gaps remain NaN. | None |
| fill_value | Optional[float] | Value used when strategy='constant'. | None |

Returns:

| Type | Description |
|---|---|
| DataFrame | Wide-format DataFrame with gaps filled. |
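The four strategies correspond naturally to standard pandas operations, sketched below on hypothetical data. The `max_gap` part is an assumption: it approximates gap size as the number of consecutive missing rows times the grid step, which may not match how the library measures a gap.

```python
import pandas as pd

nan = float("nan")
index = pd.date_range("2024-01-01", periods=6, freq="1s")
wide = pd.DataFrame({"a": [1.0, nan, 3.0, nan, nan, 6.0]}, index=index)

# Plain-pandas counterparts of the four strategies.
filled = {
    "interpolate": wide.interpolate(method="time"),
    "ffill": wide.ffill(),
    "bfill": wide.bfill(),
    "constant": wide.fillna(0.0),  # 0.0 stands in for fill_value
}

# Assumed max_gap behaviour: only fill NaN runs spanning at most max_gap.
max_gap = pd.Timedelta("1s")
step = index[1] - index[0]
max_rows = int(max_gap / step)

is_nan = wide["a"].isna()
# Label each run of consecutive equal is_nan values with its own id.
run_id = is_nan.astype(int).diff().ne(0).cumsum()
# Number of NaNs in the run each row belongs to (0 for non-NaN runs).
run_len = is_nan.groupby(run_id).transform("sum")

short_only = wide["a"].interpolate(method="time").where(
    ~is_nan | (run_len <= max_rows)
)

print(filled["ffill"]["a"].tolist())  # [1.0, 1.0, 3.0, 3.0, 3.0, 6.0]
print(short_only.tolist())            # [1.0, 2.0, 3.0, nan, nan, 6.0]
```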

align_asof

align_asof(
    left_uuid: str,
    right_uuid: str,
    tolerance: str = "1s",
    direction: str = "nearest",
) -> pd.DataFrame

Align two UUID signals using merge_asof with configurable tolerance.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| left_uuid | str | UUID of the left (reference) signal. | required |
| right_uuid | str | UUID of the right signal. | required |
| tolerance | str | Maximum time difference for matching. | '1s' |
| direction | str | One of 'nearest', 'backward', 'forward'. | 'nearest' |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with systime, value_left, value_right. |
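Since the method is described as a wrapper around `merge_asof`, the effect of `tolerance` and `direction` can be shown directly with `pandas.merge_asof`. The two frames below are hypothetical stand-ins for the left and right signals after they have been split out by UUID.

```python
import pandas as pd

base = pd.Timestamp("2024-01-01 00:00:00")

# Reference signal sampled every 10 seconds (offsets in milliseconds).
left = pd.DataFrame(
    {
        "systime": base + pd.to_timedelta([0, 10_000, 20_000], unit="ms"),
        "value_left": [1.0, 2.0, 3.0],
    }
)

# Second signal with slightly shifted timestamps.
right = pd.DataFrame(
    {
        "systime": base + pd.to_timedelta([400, 9_500, 25_000], unit="ms"),
        "value_right": [10.0, 20.0, 30.0],
    }
)

# Both inputs must be sorted by the key column, which they are here.
aligned = pd.merge_asof(
    left,
    right,
    on="systime",
    tolerance=pd.Timedelta("1s"),
    direction="nearest",
)

print(aligned)
# Rows at 0 s and 10 s match (0.4 s and 0.5 s away); the row at 20 s has
# no right-hand sample within 1 s, so value_right is NaN there.
```

The output keeps the timestamps of the left signal, which is why it acts as the reference.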

merge_multi_signals

merge_multi_signals(
    uuids: Optional[List[str]] = None,
    freq: Optional[str] = None,
    method: str = "linear",
) -> pd.DataFrame

End-to-end harmonization: pivot, filter, resample, interpolate.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| uuids | Optional[List[str]] | Optional list of UUIDs to include. None means all. | None |
| freq | Optional[str] | Optional resample frequency. None means no resampling. | None |
| method | str | Interpolation method for resampling. | 'linear' |

Returns:

| Type | Description |
|---|---|
| DataFrame | Wide-format DataFrame ready for cross-signal analytics. |
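The pivot, filter, resample, interpolate sequence can be sketched as a single function in plain pandas. `harmonize_sketch` is a hypothetical name used only for illustration, and the steps are an assumed reading of the description above rather than the library's code.

```python
from typing import List, Optional

import pandas as pd


def harmonize_sketch(
    df: pd.DataFrame,
    uuids: Optional[List[str]] = None,
    freq: Optional[str] = None,
    method: str = "linear",
) -> pd.DataFrame:
    """Hypothetical stand-in for the pipeline: pivot, filter, resample."""
    # 1. Pivot long format to one column per UUID.
    wide = df.pivot_table(
        index="systime", columns="uuid", values="value_double", aggfunc="first"
    )
    # 2. Keep only the requested UUIDs (None means all).
    if uuids is not None:
        wide = wide[[u for u in uuids if u in wide.columns]]
    # 3. Resample to a uniform grid and interpolate (None means skip).
    if freq is not None:
        wide = wide.resample(freq).mean().interpolate(method=method)
    return wide


# Hypothetical data: "a" is missing a reading at 00:00:01.
long_df = pd.DataFrame(
    {
        "systime": pd.to_datetime(
            [
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:02",
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:01",
                "2024-01-01 00:00:02",
                "2024-01-01 00:00:01",
            ]
        ),
        "uuid": ["a", "a", "b", "b", "b", "c"],
        "value_double": [0.0, 2.0, 10.0, 11.0, 12.0, 5.0],
    }
)

result = harmonize_sketch(long_df, uuids=["a", "b"], freq="1s")
print(result)  # "a" becomes 0.0, 1.0, 2.0; "c" is filtered out
```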