data_gap_analysis

DataGapAnalysisEvents

DataGapAnalysisEvents(
    dataframe: DataFrame,
    signal_uuid: str,
    *,
    event_uuid: str = "quality:data_gap",
    value_column: str = "value_double",
    time_column: str = "systime"
)

Bases: Base

Quality: Data Gap Analysis

Answer the question "where are the holes in my data?" by analysing gaps in a numeric signal's timestamps. Complements SignalQualityEvents (which detects individual missing-data events) with pattern-level analysis: gap summaries, per-period coverage, and interpolation-candidate identification.

Methods:

- find_gaps: Locate all gaps longer than a threshold.
- gap_summary: Aggregate statistics across all gaps.
- coverage_by_period: Data coverage percentage per time window.
- interpolation_candidates: Gaps small enough to interpolate safely.
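As a rough illustration of the input this class works on, the sketch below builds a small frame using the default column names from the signature (systime and value_double). The uuid column, the signal id "sensor:temp", and the filtering step are assumptions made for illustration only; the documentation above does not state how signal_uuid is matched against the frame.

```python
import pandas as pd

# Hypothetical sample: a 1 Hz signal with one 10 s hole after the fourth sample.
offsets_s = [0, 1, 2, 3, 13, 14]
df = pd.DataFrame({
    "systime": pd.Timestamp("2024-01-01") + pd.to_timedelta(offsets_s, unit="s"),
    "uuid": "sensor:temp",  # assumed id column, not confirmed by the source
    "value_double": [20.0, 20.1, 20.0, 20.2, 20.3, 20.2],
})

# Assumed pre-filtering: keep only the rows of the signal under analysis.
signal = df[df["uuid"] == "sensor:temp"]
print(signal[["systime", "value_double"]])
```

A frame shaped like this would be passed as dataframe, with signal_uuid set to the id of interest.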

find_gaps

find_gaps(min_gap: str = '5s') -> pd.DataFrame

Locate all gaps longer than min_gap.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| min_gap | str | Minimum gap duration to report (e.g. '5s', '1min', '1h'). | '5s' |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | DataFrame with columns: gap_start, gap_end, gap_duration_seconds, samples_before_gap, samples_after_gap. |
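The idea behind this method can be sketched with plain pandas: sort the timestamps, difference them, and keep the intervals longer than min_gap. This is a minimal illustration and not the library's implementation. find_gaps_sketch is a hypothetical helper, and the samples_before_gap and samples_after_gap columns are left out because the documentation does not define how they are counted.

```python
import pandas as pd


def find_gaps_sketch(df, min_gap="5s", time_column="systime"):
    """Return one row per inter-sample interval longer than min_gap."""
    ts = pd.to_datetime(df[time_column]).sort_values().reset_index(drop=True)
    delta = ts.diff()                        # time elapsed since the previous sample
    is_gap = delta > pd.Timedelta(min_gap)   # the leading NaT compares as False
    gaps = pd.DataFrame({
        "gap_start": ts.shift(1)[is_gap],    # last sample before the hole
        "gap_end": ts[is_gap],               # first sample after the hole
        "gap_duration_seconds": delta[is_gap].dt.total_seconds(),
    })
    return gaps.reset_index(drop=True)


# Hypothetical 1 Hz signal with a 10 s hole and a 60 s hole.
offsets_s = [0, 1, 2, 3, 13, 14, 74, 75]
df = pd.DataFrame({
    "systime": pd.Timestamp("2024-01-01") + pd.to_timedelta(offsets_s, unit="s"),
    "value_double": [float(i) for i in range(len(offsets_s))],
})

gaps = find_gaps_sketch(df, min_gap="5s")
print(gaps)
```

With this sample, raising min_gap to '30s' would keep only the 60 s hole.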

gap_summary

gap_summary(min_gap: str = '5s') -> pd.DataFrame

Aggregate statistics across all gaps.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| min_gap | str | Minimum gap duration to include (e.g. '5s'). | '5s' |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | Single-row DataFrame with columns: total_gaps, total_missing_seconds, longest_gap_seconds, shortest_gap_seconds, mean_gap_seconds, data_span_seconds, gap_fraction. |
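A minimal sketch of how such a summary row could be derived from the timestamp differences follows. Two points are assumptions, since the documentation does not spell them out: data_span_seconds is taken as last timestamp minus first timestamp, and gap_fraction as total_missing_seconds divided by data_span_seconds. gap_summary_sketch is a hypothetical helper.

```python
import pandas as pd


def gap_summary_sketch(df, min_gap="5s", time_column="systime"):
    """Aggregate gap statistics into a single-row DataFrame."""
    ts = pd.to_datetime(df[time_column]).sort_values().reset_index(drop=True)
    delta_s = ts.diff().dt.total_seconds()
    gap_s = delta_s[delta_s > pd.Timedelta(min_gap).total_seconds()]
    span_s = (ts.iloc[-1] - ts.iloc[0]).total_seconds()  # assumed definition
    missing_s = float(gap_s.sum())
    return pd.DataFrame([{
        "total_gaps": int(len(gap_s)),
        "total_missing_seconds": missing_s,
        "longest_gap_seconds": gap_s.max(),
        "shortest_gap_seconds": gap_s.min(),
        "mean_gap_seconds": gap_s.mean(),
        "data_span_seconds": span_s,
        # Assumed definition: share of the observed span spent inside gaps.
        "gap_fraction": missing_s / span_s if span_s > 0 else 0.0,
    }])


# Hypothetical 1 Hz signal with a 10 s hole and a 60 s hole.
offsets_s = [0, 1, 2, 3, 13, 14, 74, 75]
df = pd.DataFrame({
    "systime": pd.Timestamp("2024-01-01") + pd.to_timedelta(offsets_s, unit="s"),
    "value_double": [float(i) for i in range(len(offsets_s))],
})

summary = gap_summary_sketch(df, min_gap="5s")
print(summary.T)
```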

coverage_by_period

coverage_by_period(
    freq: str = "1h", min_gap: str | None = None
) -> pd.DataFrame

Data coverage percentage per time window.

For each window, reports how much of the window actually contains data (based on first and last sample timestamps minus any internal gaps).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| freq | str | Resample frequency (e.g. '1h', '1D', '15min'). | '1h' |
| min_gap | str \| None | Minimum inter-sample interval to count as a gap. Defaults to 2x the median sampling interval (auto-detected). | None |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | DataFrame with columns: period_start, sample_count, coverage_pct, gap_count, gap_seconds. |
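The per-window calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: windows are aligned to the Unix epoch, coverage_pct is expressed relative to the full window length, and intervals that straddle a window boundary are not counted as gaps. coverage_by_period_sketch is a hypothetical helper.

```python
import pandas as pd


def coverage_by_period_sketch(df, freq="1h", min_gap=None, time_column="systime"):
    """Per-window coverage: (last - first sample) minus internal gaps."""
    ts = pd.to_datetime(df[time_column]).sort_values().reset_index(drop=True)
    window = pd.Timedelta(freq)
    # Auto-detect the gap threshold as 2x the median sampling interval.
    threshold = pd.Timedelta(min_gap) if min_gap is not None else 2 * ts.diff().median()
    epoch = pd.Timestamp(0)
    # Assumed alignment: floor each timestamp to a multiple of the window since the epoch.
    period_start = epoch + ((ts - epoch) // window) * window
    rows = []
    for start, samples in ts.groupby(period_start):
        delta = samples.diff()                  # intervals inside this window only
        gaps = delta[delta > threshold]
        gap_seconds = gaps.dt.total_seconds().sum()
        observed = (samples.iloc[-1] - samples.iloc[0]).total_seconds()
        rows.append({
            "period_start": start,
            "sample_count": len(samples),
            "coverage_pct": 100.0 * (observed - gap_seconds) / window.total_seconds(),
            "gap_count": len(gaps),
            "gap_seconds": gap_seconds,
        })
    return pd.DataFrame(rows)


# Hypothetical signal sampled every 10 minutes, with a 40-minute hole in the second hour.
offsets_min = [0, 10, 20, 30, 40, 50, 60, 70, 110]
df = pd.DataFrame({
    "systime": pd.Timestamp("2024-01-01") + pd.to_timedelta(offsets_min, unit="min"),
    "value_double": [float(i) for i in range(len(offsets_min))],
})

coverage = coverage_by_period_sketch(df, freq="1h", min_gap="15min")
print(coverage)
```

Because coverage here is measured from the first to the last sample in each window, even a gap-free window reports less than 100% in this sketch. The real method may treat window edges differently.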

interpolation_candidates

interpolation_candidates(
    max_gap: str = "10s", min_gap: str = "0s"
) -> pd.DataFrame

Identify gaps small enough to safely interpolate.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| max_gap | str | Maximum gap duration to consider for interpolation. | '10s' |
| min_gap | str | Minimum gap duration (ignore trivially small gaps). | '0s' |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | DataFrame with columns: gap_start, gap_end, gap_duration_seconds, value_before, value_after, value_jump, safe_to_interpolate. safe_to_interpolate is True when the value jump across the gap is within 2 standard deviations of the signal. |
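The 2-standard-deviation rule can be illustrated with a short pandas sketch. Several details are assumptions: value_jump is taken as value_after minus value_before, the standard deviation is computed over all values of the signal, and the check compares the absolute jump against twice that deviation. interpolation_candidates_sketch is a hypothetical helper, and the example passes min_gap='2s' so that ordinary 1 s sampling intervals are not reported in this simplified version.

```python
import pandas as pd


def interpolation_candidates_sketch(df, max_gap="10s", min_gap="0s",
                                    value_column="value_double",
                                    time_column="systime"):
    """Flag gaps between min_gap and max_gap whose value jump is small."""
    data = df.sort_values(time_column).reset_index(drop=True)
    ts = pd.to_datetime(data[time_column])
    values = data[value_column]
    delta = ts.diff()
    in_range = (delta > pd.Timedelta(min_gap)) & (delta <= pd.Timedelta(max_gap))
    before = values.shift(1)                 # value just before each interval
    out = pd.DataFrame({
        "gap_start": ts.shift(1)[in_range],
        "gap_end": ts[in_range],
        "gap_duration_seconds": delta[in_range].dt.total_seconds(),
        "value_before": before[in_range],
        "value_after": values[in_range],
        "value_jump": (values - before)[in_range],
    })
    # Assumed rule: |jump| no larger than 2 sample standard deviations of the signal.
    out["safe_to_interpolate"] = out["value_jump"].abs() <= 2 * values.std()
    return out.reset_index(drop=True)


# Hypothetical 1 Hz signal: a 5 s hole with a flat signal, a 6 s hole followed
# by a spike, and a 30 s hole that exceeds max_gap.
offsets_s = [0, 1, 2, 3, 8, 9, 10, 16, 17, 47, 48]
df = pd.DataFrame({
    "systime": pd.Timestamp("2024-01-01") + pd.to_timedelta(offsets_s, unit="s"),
    "value_double": [20.0, 20.1, 20.0, 20.2, 20.3, 20.2, 20.1, 30.0, 20.2, 20.1, 20.0],
})

candidates = interpolation_candidates_sketch(df, max_gap="10s", min_gap="2s")
print(candidates)
```

In this sample the 5 s hole is flagged as safe, the 6 s hole is not because the signal jumps by almost 10 units across it, and the 30 s hole is excluded by max_gap.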