pattern_recognition

PatternRecognition

PatternRecognition(
    dataframe: DataFrame, column_name: str = "systime"
)

Bases: Base

Pattern recognition for univariate time series.

Discover motifs, discords, and template matches using Matrix Profile and Dynamic Time Warping approaches.

Methods:

- discover_motifs: Find the top-k recurring subsequence patterns.
- discover_discords: Find the top-k anomalous subsequences.
- similarity_search: Find the subsequences most similar to a query (DTW).
- template_match: Find all occurrences of a reference template.
- compute_distance_profile: Compute the distance from a query to every subsequence.
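Every method compares fixed-length subsequences, and several accept a normalize flag that z-normalizes them first. As a minimal illustration of that idea (not the class's own code), the pure-Python sketch below defines hypothetical helpers znorm and znorm_euclidean. It assumes the population standard deviation and maps constant subsequences to all zeros; both are choices of this sketch.

```python
import math


def znorm(x):
    """Z-normalize a sequence (population standard deviation)."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sd == 0.0:  # constant subsequence: avoid division by zero
        return [0.0] * n
    return [(v - mu) / sd for v in x]


def znorm_euclidean(a, b):
    """Euclidean distance between the z-normalized versions of a and b."""
    if len(a) != len(b):
        raise ValueError("subsequences must have the same length")
    za, zb = znorm(a), znorm(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(za, zb)))


# Same shape at a different offset and scale: distance is ~0 once normalized.
d_same_shape = znorm_euclidean([1, 2, 3, 2, 1], [10, 20, 30, 20, 10])
# Mirrored shape: clearly non-zero.
d_diff_shape = znorm_euclidean([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
```

Normalization makes the comparison insensitive to offset and amplitude, which is usually what "the same pattern" means for sensor-style data; with normalize=False the raw values would be compared instead.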

discover_motifs classmethod

discover_motifs(
    dataframe: DataFrame,
    value_column: str = "value_double",
    window_size: int = 50,
    top_k: int = 5,
    exclusion_zone: Optional[int] = None,
    time_column: str = "systime",
) -> pd.DataFrame

Find the top-k recurring subsequence patterns (motifs).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dataframe | DataFrame | Input DataFrame. | required |
| value_column | str | Column containing numeric values. | 'value_double' |
| window_size | int | Length of the subsequences to compare. | 50 |
| top_k | int | Number of motif pairs to return. | 5 |
| exclusion_zone | Optional[int] | Number of indices around each match that are excluded, so overlapping (trivial) matches are not reported. Defaults to window_size // 2. | None |
| time_column | str | Column containing timestamps. | 'systime' |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with columns motif_rank, index_a, index_b, distance, time_a, time_b. |
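discover_motifs is described as Matrix Profile based. Assuming the textbook definition, a motif pair is two non-overlapping subsequences with the smallest z-normalized distance between them. The brute-force sketch below computes a matrix profile and picks the top-k pairs, with the exclusion zone defaulting to window_size // 2 as documented. The function names are hypothetical, it returns indices rather than a DataFrame with time columns, and a real implementation would use a faster algorithm (for example STOMP, as provided by the stumpy library).

```python
import math
import random


def znorm(x):
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [0.0] * n if sd == 0.0 else [(v - mu) / sd for v in x]


def matrix_profile(series, m, excl):
    """Brute-force O(n^2 * m) matrix profile: for every subsequence, the
    z-normalized distance to (and index of) its nearest neighbour outside
    the exclusion zone |i - j| <= excl."""
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    mp, mpi = [math.inf] * len(subs), [-1] * len(subs)
    for i, a in enumerate(subs):
        for j, b in enumerate(subs):
            if abs(i - j) <= excl:
                continue  # trivial match: overlaps subsequence i
            d = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
            if d < mp[i]:
                mp[i], mpi[i] = d, j
    return mp, mpi


def top_k_motifs(series, window_size=50, top_k=5, exclusion_zone=None):
    excl = window_size // 2 if exclusion_zone is None else exclusion_zone
    mp, mpi = matrix_profile(series, window_size, excl)
    motifs, used = [], []
    for i in sorted(range(len(mp)), key=mp.__getitem__):  # smallest distance first
        j = mpi[i]
        if j < 0 or any(abs(i - u) <= excl or abs(j - u) <= excl for u in used):
            continue  # skip pairs overlapping an already reported motif
        motifs.append((len(motifs) + 1, min(i, j), max(i, j), mp[i]))
        used += [i, j]
        if len(motifs) == top_k:
            break
    return motifs  # rows: (motif_rank, index_a, index_b, distance)


# Usage: plant the same 6-point pattern at indices 10 and 36 in random noise.
rng = random.Random(0)
noise = [rng.random() for _ in range(40)]
pattern = [0, 3, 1, 4, 1, 5]
series = noise[:10] + pattern + noise[10:30] + pattern + noise[30:]
motifs = top_k_motifs(series, window_size=6, top_k=3)
```

The planted pair is reported first with distance 0, since both windows are identical after z-normalization.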

discover_discords classmethod

discover_discords(
    dataframe: DataFrame,
    value_column: str = "value_double",
    window_size: int = 50,
    top_k: int = 5,
    exclusion_zone: Optional[int] = None,
    time_column: str = "systime",
) -> pd.DataFrame

Find the top-k anomalous subsequences (discords).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dataframe | DataFrame | Input DataFrame. | required |
| value_column | str | Column containing numeric values. | 'value_double' |
| window_size | int | Length of the subsequences to compare. | 50 |
| top_k | int | Number of discords to return. | 5 |
| exclusion_zone | Optional[int] | Number of indices around each match that are excluded, so overlapping (trivial) matches are not reported. Defaults to window_size // 2. | None |
| time_column | str | Column containing timestamps. | 'systime' |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with columns discord_rank, start_index, distance, start_time. |
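Under the same Matrix Profile assumption, a discord is the subsequence whose nearest non-trivial neighbour is farthest away. This self-contained sketch (hypothetical names, brute force, indices only) ranks subsequences by nearest-neighbour distance in descending order and skips any candidate inside the exclusion zone of one already reported.

```python
import math


def znorm(x):
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [0.0] * n if sd == 0.0 else [(v - mu) / sd for v in x]


def matrix_profile(series, m, excl):
    """Brute-force nearest-neighbour distance for every subsequence."""
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    mp = [math.inf] * len(subs)
    for i, a in enumerate(subs):
        for j, b in enumerate(subs):
            if abs(i - j) > excl:  # ignore trivial, overlapping matches
                d = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
                mp[i] = min(mp[i], d)
    return mp


def top_k_discords(series, window_size=50, top_k=5, exclusion_zone=None):
    excl = window_size // 2 if exclusion_zone is None else exclusion_zone
    mp = matrix_profile(series, window_size, excl)
    discords, used = [], []
    # Largest nearest-neighbour distance first (stable for ties).
    for i in sorted(range(len(mp)), key=mp.__getitem__, reverse=True):
        if math.isinf(mp[i]) or any(abs(i - u) <= excl for u in used):
            continue
        discords.append((len(discords) + 1, i, mp[i]))
        used.append(i)
        if len(discords) == top_k:
            break
    return discords  # rows: (discord_rank, start_index, distance)


# Usage: a clean periodic signal with one spike injected at index 33.
series = [0, 1, 2, 3, 2, 1] * 10
series[33] = 10
discords = top_k_discords(series, window_size=6, top_k=2)
```

Every clean window has an exact copy one period away, so only windows covering the spike (start indices 28 to 33) have a non-zero nearest-neighbour distance.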

similarity_search

similarity_search(
    dataframe: DataFrame,
    query: ndarray,
    value_column: str = "value_double",
    top_k: int = 5,
    normalize: bool = True,
    time_column: str = "systime",
) -> pd.DataFrame

Find the top-k subsequences most similar to a query, using Dynamic Time Warping (DTW).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dataframe | DataFrame | Input DataFrame. | required |
| query | ndarray | Query pattern as a NumPy array. | required |
| value_column | str | Column containing numeric values. | 'value_double' |
| top_k | int | Number of matches to return. | 5 |
| normalize | bool | Whether to z-normalize before comparison. | True |
| time_column | str | Column containing timestamps. | 'systime' |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with columns rank, start_index, dtw_distance, start_time. |
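similarity_search takes no window_size, so this sketch assumes the candidate window length equals len(query). It scores every window with a classic dynamic-programming DTW distance and keeps the top-k. The squared point-wise cost, the rule that suppresses hits starting within len(query) // 2 of a better hit, and the helper names are assumptions of this sketch, not documented behaviour.

```python
import math
import random


def znorm(x):
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [0.0] * n if sd == 0.0 else [(v - mu) / sd for v in x]


def dtw(a, b):
    """Classic O(len(a) * len(b)) DTW with squared point-wise cost."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])


def similarity_search(series, query, top_k=5, normalize=True):
    m = len(query)
    q = znorm(query) if normalize else list(query)
    scored = []
    for i in range(len(series) - m + 1):
        w = series[i:i + m]
        scored.append((dtw(q, znorm(w) if normalize else w), i))
    scored.sort()  # best (smallest) DTW distance first
    results, used = [], []
    for dist, i in scored:
        if any(abs(i - u) < m // 2 for u in used):
            continue  # assumption: suppress near-duplicate, overlapping hits
        results.append((len(results) + 1, i, dist))
        used.append(i)
        if len(results) == top_k:
            break
    return results  # rows: (rank, start_index, dtw_distance)


# Usage: embed the query at index 15 inside uniform noise.
rng = random.Random(1)
query = [0, 2, 4, 2, 0, -2, -4, -2]
series = ([rng.uniform(-1, 1) for _ in range(15)] + query
          + [rng.uniform(-1, 1) for _ in range(25)])
matches = similarity_search(series, query, top_k=3)
```

DTW lets the alignment stretch or compress in time, so a query still matches a slightly slower or faster occurrence, at the cost of quadratic work per window.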

template_match classmethod

template_match(
    dataframe: DataFrame,
    template: ndarray,
    value_column: str = "value_double",
    threshold: Optional[float] = None,
    normalize: bool = True,
    time_column: str = "systime",
) -> pd.DataFrame

Find all occurrences of a template pattern in the time series.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dataframe | DataFrame | Input DataFrame. | required |
| template | ndarray | Reference pattern as a NumPy array. | required |
| value_column | str | Column containing numeric values. | 'value_double' |
| threshold | Optional[float] | Maximum distance for a subsequence to count as a match. If None, an adaptive threshold is used. | None |
| normalize | bool | Whether to z-normalize before comparison. | True |
| time_column | str | Column containing timestamps. | 'systime' |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with columns start_index, distance, start_time, end_time. |
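The documentation does not say which distance template_match uses or how the adaptive threshold is chosen. The sketch below assumes z-normalized Euclidean distance and, for threshold=None, a hypothetical rule of mean minus two standard deviations of all window distances. It returns (start_index, distance, end_index) tuples, where end_index stands in for end_time.

```python
import math
import random
import statistics


def znorm(x):
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [0.0] * n if sd == 0.0 else [(v - mu) / sd for v in x]


def template_match(series, template, threshold=None, normalize=True):
    m = len(template)
    t = znorm(template) if normalize else list(template)
    dists = []
    for i in range(len(series) - m + 1):
        w = series[i:i + m]
        if normalize:
            w = znorm(w)
        dists.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(t, w))))
    if threshold is None:
        # Hypothetical adaptive rule: mean minus two standard deviations.
        threshold = max(0.0, statistics.mean(dists) - 2 * statistics.pstdev(dists))
    # Rows: (start_index, distance, end_index), ordered by start_index.
    return [(i, d, i + m - 1) for i, d in enumerate(dists) if d <= threshold]


# Usage: the template occurs at indices 8 and 25.
rng = random.Random(2)
template = [1, 5, 2, 8, 3]
series = ([rng.random() for _ in range(8)] + template
          + [rng.random() for _ in range(12)] + template
          + [rng.random() for _ in range(5)])
hits = template_match(series, template, threshold=1e-6)
```

Unlike similarity_search, which returns a fixed number of best hits, template matching returns every window under the threshold, so the result can be empty or long depending on that value.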

compute_distance_profile classmethod

compute_distance_profile(
    dataframe: DataFrame,
    query: ndarray,
    value_column: str = "value_double",
    metric: str = "euclidean",
    normalize: bool = True,
) -> np.ndarray

Compute the distance from the query to every subsequence of the same length.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dataframe | DataFrame | Input DataFrame. | required |
| query | ndarray | Query subsequence. | required |
| value_column | str | Column containing numeric values. | 'value_double' |
| metric | str | Distance metric: 'euclidean' (FFT-based) or 'dtw'. | 'euclidean' |
| normalize | bool | Whether to z-normalize before comparison. | True |

Returns:

| Type | Description |
|---|---|
| ndarray | 1D NumPy array of distances, one per subsequence. |
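For metric='euclidean' the documentation mentions an FFT-based computation. The usual approach (as in the MASS algorithm) rewrites the z-normalized distance in terms of the sliding dot product QT_i between the query and window i: d_i = sqrt(2m(1 - (QT_i - m·μ_q·μ_i) / (m·σ_q·σ_i))). The FFT only serves to compute all QT_i at once; this pure-Python sketch computes them directly, so it runs in O(nm) rather than O(n log n). The function name is hypothetical, only the Euclidean case is shown, and it assumes neither the query nor any window is constant.

```python
import math
import random


def distance_profile(series, query):
    """Z-normalized Euclidean distance from `query` to every same-length window
    of `series`, using the dot-product identity behind FFT-based methods.
    Assumes the query and every window are non-constant (std > 0)."""
    m = len(query)
    mu_q = sum(query) / m
    sd_q = math.sqrt(sum((v - mu_q) ** 2 for v in query) / m)
    profile = []
    for i in range(len(series) - m + 1):
        w = series[i:i + m]
        mu_t = sum(w) / m
        sd_t = math.sqrt(sum((v - mu_t) ** 2 for v in w) / m)
        # Sliding dot product QT_i; an FFT-based version computes all of these at once.
        qt = sum(a * b for a, b in zip(query, w))
        corr = (qt - m * mu_q * mu_t) / (m * sd_q * sd_t)  # Pearson correlation
        # Clamp tiny negative values caused by floating-point rounding.
        profile.append(math.sqrt(max(0.0, 2 * m * (1 - corr))))
    return profile


# Usage: embed the query at index 7 inside Gaussian noise.
rng = random.Random(3)
query = [0.0, 1.0, 0.5, -1.0, 2.0]
series = ([rng.gauss(0, 1) for _ in range(7)] + query
          + [rng.gauss(0, 1) for _ in range(20)])
profile = distance_profile(series, query)
best = min(range(len(profile)), key=profile.__getitem__)
```

The identity shows why z-normalized Euclidean distance and Pearson correlation rank windows identically: the distance is a monotone function of the correlation.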