data_gap_analysis
DataGapAnalysisEvents ¤
DataGapAnalysisEvents(
dataframe: DataFrame,
signal_uuid: str,
*,
event_uuid: str = "quality:data_gap",
value_column: str = "value_double",
time_column: str = "systime"
)
Bases: Base
Quality: Data Gap Analysis
Answer the question "where are the holes in my data?" by analysing
gaps in a numeric signal's timestamps. Complements
SignalQualityEvents (which detects individual missing-data
events) with pattern-level analysis: gap summaries, per-period
coverage, and interpolation-candidate identification.
Methods:
- find_gaps: Locate all gaps longer than a threshold.
- gap_summary: Aggregate statistics across all gaps.
- coverage_by_period: Data coverage percentage per time window.
- interpolation_candidates: Gaps small enough to interpolate safely.
find_gaps ¤
find_gaps(min_gap: str = '5s') -> pd.DataFrame
Locate all gaps longer than min_gap.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| min_gap | str | Minimum gap duration to report, e.g. the default '5s'. | '5s' |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns: gap_start, gap_end, gap_duration_seconds, samples_before_gap, samples_after_gap. |
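The gap detection described above can be approximated with plain pandas. This is a minimal sketch, not the library's implementation: `find_gaps_sketch` is a hypothetical helper, it assumes a gap is any interval between consecutive timestamps that is strictly longer than `min_gap`, and it leaves out `samples_before_gap` and `samples_after_gap` because their exact definition is not given here.

```python
import pandas as pd


def find_gaps_sketch(times: pd.Series, min_gap: str = "5s") -> pd.DataFrame:
    """Hypothetical helper: one row per inter-sample interval longer than min_gap."""
    ts = pd.to_datetime(times).sort_values().reset_index(drop=True)
    deltas = ts.diff()  # time since the previous sample; NaT for the first row
    is_gap = deltas > pd.Timedelta(min_gap)  # NaT compares as False
    return pd.DataFrame(
        {
            "gap_start": ts.shift(1)[is_gap],  # last sample before the gap
            "gap_end": ts[is_gap],  # first sample after the gap
            "gap_duration_seconds": deltas[is_gap].dt.total_seconds(),
        }
    ).reset_index(drop=True)


systime = pd.Series(
    pd.to_datetime(
        [
            "2024-01-01 00:00:00",
            "2024-01-01 00:00:01",
            "2024-01-01 00:00:02",
            "2024-01-01 00:00:12",  # 10 s after the previous sample
            "2024-01-01 00:00:13",
        ]
    )
)
gaps = find_gaps_sketch(systime, min_gap="5s")
print(gaps)
```

With the sample timestamps above, only the 10-second interval exceeds the 5-second threshold, so the result has a single row.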
gap_summary ¤
gap_summary(min_gap: str = '5s') -> pd.DataFrame
Aggregate statistics across all gaps.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| min_gap | str | Minimum gap duration to include, e.g. the default '5s'. | '5s' |
Returns:
| Type | Description |
|---|---|
| DataFrame | Single-row DataFrame with columns: total_gaps, total_missing_seconds, longest_gap_seconds, shortest_gap_seconds, mean_gap_seconds, data_span_seconds, gap_fraction. |
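To make the summary columns concrete, the following sketch computes them from raw timestamps with plain pandas. `gap_summary_sketch` is a hypothetical helper, not the library's code. It assumes that `data_span_seconds` is the time between the first and last sample and that `gap_fraction` is `total_missing_seconds / data_span_seconds`; neither definition is confirmed by the reference above.

```python
import pandas as pd


def gap_summary_sketch(times: pd.Series, min_gap: str = "5s") -> pd.DataFrame:
    """Hypothetical helper: single-row summary of all gaps longer than min_gap."""
    ts = pd.to_datetime(times).sort_values()
    deltas = ts.diff().dropna().dt.total_seconds()
    gaps = deltas[deltas > pd.Timedelta(min_gap).total_seconds()]
    span = (ts.iloc[-1] - ts.iloc[0]).total_seconds()
    missing = float(gaps.sum())
    return pd.DataFrame(
        [
            {
                "total_gaps": int(len(gaps)),
                "total_missing_seconds": missing,
                "longest_gap_seconds": gaps.max(),
                "shortest_gap_seconds": gaps.min(),
                "mean_gap_seconds": gaps.mean(),
                # Assumption: span runs from the first to the last sample.
                "data_span_seconds": span,
                # Assumption: fraction of the span that falls inside gaps.
                "gap_fraction": missing / span if span > 0 else 0.0,
            }
        ]
    )


# Samples at 0, 1, 2, 12, 13 and 33 seconds: gaps of 10 s and 20 s.
offsets = pd.to_timedelta([0, 1, 2, 12, 13, 33], unit="s")
systime = pd.Series(pd.Timestamp("2024-01-01") + offsets)
summary = gap_summary_sketch(systime, min_gap="5s")
print(summary.T)
```

When no interval exceeds `min_gap`, the longest, shortest and mean columns in this sketch come out as NaN; the real method may handle that case differently.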
coverage_by_period ¤
coverage_by_period(
freq: str = "1h", min_gap: str | None = None
) -> pd.DataFrame
Data coverage percentage per time window.
For each window, reports how much of the window actually contains data (based on first and last sample timestamps minus any internal gaps).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| freq | str | Resample frequency, e.g. the default '1h'. | '1h' |
| min_gap | str \| None | Minimum inter-sample interval to count as a gap. Defaults to 2x the median sampling interval (auto-detected). | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns: period_start, sample_count, coverage_pct, gap_count, gap_seconds. |
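The coverage rule described above (first-to-last sample span within a window, minus internal gaps, with a default threshold of 2x the median sampling interval) can be sketched as follows. `coverage_sketch` is a hypothetical helper written for illustration. It assumes that coverage is expressed relative to the full window length and that an interval counts as an internal gap only when both of its samples fall in the same window; the library may define either differently.

```python
from __future__ import annotations

import pandas as pd


def coverage_sketch(
    times: pd.Series, freq: str = "1h", min_gap: str | None = None
) -> pd.DataFrame:
    """Hypothetical helper: per-window coverage from sample timestamps."""
    ts = pd.to_datetime(times).sort_values().reset_index(drop=True)
    deltas = ts.diff()
    # Default threshold: 2x the median sampling interval.
    threshold = pd.Timedelta(min_gap) if min_gap is not None else 2 * deltas.median()

    frame = pd.DataFrame({"time": ts, "delta_s": deltas.dt.total_seconds()})
    frame["period_start"] = frame["time"].dt.floor(freq)
    # Assumption: only intervals fully inside one window are internal gaps.
    same_window = frame["period_start"] == frame["period_start"].shift(1)
    frame["gap_s"] = frame["delta_s"].where((deltas > threshold) & same_window, 0.0)

    out = (
        frame.groupby("period_start")
        .agg(
            sample_count=("time", "count"),
            first_ts=("time", "min"),
            last_ts=("time", "max"),
            gap_count=("gap_s", lambda s: int((s > 0).sum())),
            gap_seconds=("gap_s", "sum"),
        )
        .reset_index()
    )
    covered = (out["last_ts"] - out["first_ts"]).dt.total_seconds() - out["gap_seconds"]
    # Assumption: percentage is relative to the full window length.
    out["coverage_pct"] = 100.0 * covered / pd.Timedelta(freq).total_seconds()
    columns = ["period_start", "sample_count", "coverage_pct", "gap_count", "gap_seconds"]
    return out[columns]


# Nominal 10 s sampling with a 30 s hole in the first minute.
seconds = [0, 10, 20, 50, 60, 70, 80, 90, 100, 110]
systime = pd.Series(pd.Timestamp("2024-01-01") + pd.to_timedelta(seconds, unit="s"))
coverage = coverage_sketch(systime, freq="1min")
print(coverage)
```

In this example the median interval is 10 seconds, so the auto-detected threshold is 20 seconds and only the 30-second hole in the first window is counted as a gap.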
interpolation_candidates ¤
interpolation_candidates(
max_gap: str = "10s", min_gap: str = "0s"
) -> pd.DataFrame
Identify gaps small enough to safely interpolate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_gap | str | Maximum gap duration to consider for interpolation. | '10s' |
| min_gap | str | Minimum gap duration (ignore trivially small gaps). | '0s' |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns: gap_start, gap_end, gap_duration_seconds, value_before, value_after, value_jump, safe_to_interpolate. safe_to_interpolate is True when the value jump across the gap is within 2 standard deviations of the signal. |
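The candidate selection can be illustrated with a small pandas sketch. `interpolation_candidates_sketch` is a hypothetical helper, not the library's implementation. It assumes that `value_jump` is the absolute difference between the samples on either side of the gap, and that "2 standard deviations of the signal" refers to the standard deviation of the whole value column. The column names `systime` and `value_double` follow the constructor defaults shown at the top of this page.

```python
import pandas as pd


def interpolation_candidates_sketch(
    df: pd.DataFrame,
    max_gap: str,
    min_gap: str,
    value_column: str = "value_double",
    time_column: str = "systime",
) -> pd.DataFrame:
    """Hypothetical helper: gaps with min_gap < duration <= max_gap and a small value jump."""
    data = df.sort_values(time_column).reset_index(drop=True)
    deltas = data[time_column].diff()
    in_range = (deltas > pd.Timedelta(min_gap)) & (deltas <= pd.Timedelta(max_gap))

    out = pd.DataFrame(
        {
            "gap_start": data[time_column].shift(1)[in_range],
            "gap_end": data[time_column][in_range],
            "gap_duration_seconds": deltas[in_range].dt.total_seconds(),
            "value_before": data[value_column].shift(1)[in_range],
            "value_after": data[value_column][in_range],
        }
    )
    # Assumption: the jump is the absolute change across the gap.
    out["value_jump"] = (out["value_after"] - out["value_before"]).abs()
    # Assumption: compare against 2x the std of the whole value column.
    out["safe_to_interpolate"] = out["value_jump"] <= 2 * data[value_column].std()
    return out.reset_index(drop=True)


seconds = [0, 1, 2, 3, 8, 9, 10, 11, 16, 17]
values = [1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 0.9, 1.0, 5.0, 5.1]
signal = pd.DataFrame(
    {
        "systime": pd.Timestamp("2024-01-01") + pd.to_timedelta(seconds, unit="s"),
        "value_double": values,
    }
)
candidates = interpolation_candidates_sketch(signal, max_gap="10s", min_gap="2s")
print(candidates)
```

Both 5-second gaps qualify by duration, but only the first is flagged as safe: the signal barely moves across it, whereas the second gap coincides with a step from 1.0 to 5.0 that exceeds twice the signal's standard deviation.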