harmonization
DataHarmonizer
DataHarmonizer(
    dataframe: DataFrame,
    time_column: str = "systime",
    uuid_column: str = "uuid",
    value_column: str = "value_double",
)
Bases: Base
Data harmonization for multi-signal time series.
Provides utilities to pivot, resample, align, and fill gaps across multiple UUID-keyed signals stored in long (stacked) format.
Methods:
- pivot_to_wide: Pivot long-format to wide-format (one column per UUID).
- resample_to_uniform: Resample to a uniform time grid with interpolation.
- detect_gaps: Identify time gaps per UUID exceeding a threshold.
- fill_gaps: Fill detected gaps using various strategies.
- align_asof: Align two UUID signals using merge_asof.
- merge_multi_signals: End-to-end harmonization pipeline.
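To make the expected input concrete, here is a minimal sketch of a long (stacked) frame that uses the default column names from the constructor signature. The signal names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format input: one row per (timestamp, signal) observation.
# Column names follow the constructor defaults: systime, uuid, value_double.
long_df = pd.DataFrame(
    {
        "systime": pd.to_datetime(
            [
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:01",
            ]
        ),
        "uuid": ["temp", "pressure", "temp"],  # illustrative signal identifiers
        "value_double": [20.0, 1.01, 20.5],
    }
)
```

With the defaults shown in the signature, a frame shaped like this would presumably be passed as `DataHarmonizer(long_df)`; other column names would go through `time_column`, `uuid_column`, and `value_column`.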
pivot_to_wide
pivot_to_wide(aggfunc: str = 'first') -> pd.DataFrame
Pivot long-format DataFrame to wide-format with one column per UUID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `aggfunc` | `str` | Aggregation function for duplicate timestamps ('first', 'mean', 'last'). | `'first'` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with systime as index and one column per UUID. |
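The effect of `aggfunc` on duplicate timestamps can be illustrated with plain pandas. This is a sketch of the general technique using `DataFrame.pivot_table`, not necessarily how the method is implemented; the sample data is invented.

```python
import pandas as pd

long_df = pd.DataFrame(
    {
        "systime": pd.to_datetime(
            [
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:01",
                "2024-01-01 00:00:01",
            ]
        ),
        # Signal "a" has two readings at 00:00:01 (a duplicate timestamp).
        "uuid": ["a", "b", "a", "a"],
        "value_double": [1.0, 10.0, 2.0, 4.0],
    }
)

# Keep the first reading for each (timestamp, uuid) pair.
wide_first = long_df.pivot_table(
    index="systime", columns="uuid", values="value_double", aggfunc="first"
)

# Average duplicate readings instead.
wide_mean = long_df.pivot_table(
    index="systime", columns="uuid", values="value_double", aggfunc="mean"
)
```

Timestamps at which a signal has no reading (here, "b" at 00:00:01) come out as NaN in the wide frame.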
resample_to_uniform
resample_to_uniform(
    freq: str,
    method: str = "linear",
    fill_limit: Optional[int] = None,
) -> pd.DataFrame
Resample to a uniform time grid with interpolation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `freq` | `str` | Pandas frequency string (e.g. '1s', '100ms', '1min'). | required |
| `method` | `str` | Interpolation method ('linear', 'time', 'nearest', 'quadratic', 'cubic'). | `'linear'` |
| `fill_limit` | `Optional[int]` | Maximum number of consecutive NaNs to fill. | `None` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with uniform DatetimeIndex. |
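A rough pandas equivalent of resampling to a uniform grid and interpolating is sketched below. It assumes that `fill_limit` behaves like the `limit` argument of `DataFrame.interpolate`; that mapping is an assumption, and the data is illustrative.

```python
import pandas as pd

# Irregularly sampled wide-format signal "a".
wide = pd.DataFrame(
    {"a": [0.0, 2.0, 6.0]},
    index=pd.to_datetime(
        [
            "2024-01-01 00:00:00",
            "2024-01-01 00:00:02",
            "2024-01-01 00:00:06",
        ]
    ),
)

# Build a uniform 1-second grid; empty bins become NaN.
on_grid = wide.resample("1s").mean()

# Linearly interpolate, filling at most one consecutive NaN per gap
# (assumed to correspond to fill_limit=1).
uniform = on_grid.interpolate(method="linear", limit=1)
```

With `limit=1`, the single missing sample at 00:00:01 is filled, while only the first sample of the longer gap after 00:00:02 is filled and the rest stay NaN.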
detect_gaps
detect_gaps(threshold: str = '10s') -> pd.DataFrame
Identify time gaps per UUID exceeding the threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `threshold` | `str` | Minimum gap duration as a pandas Timedelta string. | `'10s'` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with columns: uuid, gap_start, gap_end, gap_duration. |
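Per-UUID gap detection can be sketched in pandas by comparing each timestamp with the previous one from the same signal. The strict `>` comparison is an assumption about what "exceeding the threshold" means; the output columns mirror the ones listed above.

```python
import pandas as pd

long_df = pd.DataFrame(
    {
        "systime": pd.to_datetime(
            [
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:05",
                "2024-01-01 00:00:30",  # 25 s after the previous "a" sample
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:08",
            ]
        ),
        "uuid": ["a", "a", "a", "b", "b"],
        "value_double": [1.0, 2.0, 3.0, 5.0, 6.0],
    }
)

threshold = pd.Timedelta("10s")

# Sort so that consecutive rows within a UUID are consecutive in time.
ordered = long_df.sort_values(["uuid", "systime"])

# Previous timestamp of the same UUID (NaT for each signal's first sample).
prev_time = ordered.groupby("uuid")["systime"].shift(1)
duration = ordered["systime"] - prev_time

# NaT compares as False, so first samples never count as gaps.
mask = duration > threshold

gaps = pd.DataFrame(
    {
        "uuid": ordered.loc[mask, "uuid"],
        "gap_start": prev_time[mask],
        "gap_end": ordered.loc[mask, "systime"],
        "gap_duration": duration[mask],
    }
).reset_index(drop=True)
```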
fill_gaps
fill_gaps(
    strategy: str = "interpolate",
    max_gap: Optional[str] = None,
    fill_value: Optional[float] = None,
) -> pd.DataFrame
Fill gaps in the wide-format data using the specified strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `strategy` | `str` | One of 'interpolate', 'ffill', 'bfill', 'constant'. | `'interpolate'` |
| `max_gap` | `Optional[str]` | Maximum gap size to fill (pandas Timedelta string). Larger gaps remain NaN. | `None` |
| `fill_value` | `Optional[float]` | Value used when strategy='constant'. | `None` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | Wide-format DataFrame with gaps filled. |
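One way to fill short gaps while leaving long ones as NaN is sketched below for the 'interpolate' strategy. It assumes a gap's size is the time between the valid samples on either side of it; the method's actual definition of `max_gap` may differ.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=8, freq="1s")
s = pd.Series([1.0, None, 3.0, None, None, None, None, 8.0], index=idx, name="a")

max_gap = pd.Timedelta("2s")

# For every row, find the nearest valid timestamps before and after it.
valid_times = s.index.to_series().where(s.notna())
gap_size = valid_times.bfill() - valid_times.ffill()

# Interpolate everything, then keep filled values only inside short gaps.
filled = s.interpolate(method="time")
filled = filled.where(s.notna() | (gap_size <= max_gap))

# Assumed pandas counterparts of the other strategies:
#   'ffill'    -> s.ffill()
#   'bfill'    -> s.bfill()
#   'constant' -> s.fillna(fill_value)
```

Here the one-sample gap at 00:00:01 spans 2 s between valid neighbours and is filled, while the 5 s gap between 00:00:02 and 00:00:07 is left as NaN.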
align_asof
align_asof(
    left_uuid: str,
    right_uuid: str,
    tolerance: str = "1s",
    direction: str = "nearest",
) -> pd.DataFrame
Align two UUID signals using merge_asof with configurable tolerance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `left_uuid` | `str` | UUID of the left (reference) signal. | required |
| `right_uuid` | `str` | UUID of the right signal. | required |
| `tolerance` | `str` | Maximum time difference for matching. | `'1s'` |
| `direction` | `str` | One of 'nearest', 'backward', 'forward'. | `'nearest'` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with systime, value_left, value_right. |
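The underlying pandas operation, `pd.merge_asof`, can be tried directly to see how `tolerance` and `direction` interact. The two frames below are illustrative stand-ins for the left and right signals; how the method extracts them from the long-format data is not shown here.

```python
import pandas as pd

# Left (reference) signal; merge_asof requires both frames sorted by the key.
left = pd.DataFrame(
    {
        "systime": pd.to_datetime(
            [
                "2024-01-01 00:00:00.0",
                "2024-01-01 00:00:02.0",
                "2024-01-01 00:00:10.0",
            ]
        ),
        "value_left": [1.0, 2.0, 3.0],
    }
)

# Right signal, sampled at slightly different times.
right = pd.DataFrame(
    {
        "systime": pd.to_datetime(
            ["2024-01-01 00:00:00.4", "2024-01-01 00:00:01.8"]
        ),
        "value_right": [10.0, 20.0],
    }
)

# Match each left row to the nearest right row within 1 second.
aligned = pd.merge_asof(
    left,
    right,
    on="systime",
    tolerance=pd.Timedelta("1s"),
    direction="nearest",
)
```

The left row at 00:00:10 has no right sample within the tolerance, so its `value_right` is NaN rather than a stale match.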
merge_multi_signals
merge_multi_signals(
    uuids: Optional[List[str]] = None,
    freq: Optional[str] = None,
    method: str = "linear",
) -> pd.DataFrame
End-to-end harmonization: pivot, filter, resample, interpolate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `uuids` | `Optional[List[str]]` | Optional list of UUIDs to include. None means all. | `None` |
| `freq` | `Optional[str]` | Optional resample frequency. None means no resampling. | `None` |
| `method` | `str` | Interpolation method for resampling. | `'linear'` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | Wide-format DataFrame ready for cross-signal analytics. |
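The four stages named above (pivot, filter, resample, interpolate) can be chained in plain pandas as follows. This is a sketch under the assumption that the stages run in that order with default settings; the sample data and the local names `uuids` and `freq` are illustrative.

```python
import pandas as pd

long_df = pd.DataFrame(
    {
        "systime": pd.to_datetime(
            [
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:02",
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:01",
                "2024-01-01 00:00:02",
                "2024-01-01 00:00:00",
            ]
        ),
        "uuid": ["a", "a", "b", "b", "b", "c"],
        "value_double": [0.0, 2.0, 10.0, 11.0, 12.0, 99.0],
    }
)

uuids = ["a", "b"]  # signals to keep; "c" is dropped
freq = "1s"         # target uniform grid

# 1. Pivot: one column per UUID, indexed by time.
wide = long_df.pivot_table(
    index="systime", columns="uuid", values="value_double", aggfunc="first"
)

# 2. Filter: keep only the requested signals.
wide = wide[uuids]

# 3 + 4. Resample to the uniform grid, then interpolate missing samples.
wide = wide.resample(freq).mean().interpolate(method="linear")
```

After these steps, signal "a" has an interpolated value at 00:00:01 and both columns share the same 1-second index.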