pattern_recognition
PatternRecognition
PatternRecognition(
dataframe: DataFrame, column_name: str = "systime"
)
Bases: Base
Pattern Recognition for univariate timeseries.
Discover motifs, discords, and template matches using Matrix Profile and Dynamic Time Warping approaches.
Methods:

- `discover_motifs`: Find the top-k recurring subsequence patterns.
- `discover_discords`: Find the top-k anomalous subsequences.
- `similarity_search`: Find the subsequences most similar to a query (DTW).
- `template_match`: Find all occurrences of a reference template.
- `compute_distance_profile`: Compute the distance from a query to every subsequence.
discover_motifs
classmethod
discover_motifs(
dataframe: DataFrame,
value_column: str = "value_double",
window_size: int = 50,
top_k: int = 5,
exclusion_zone: Optional[int] = None,
time_column: str = "systime",
) -> pd.DataFrame
Find the top-k recurring subsequence patterns (motifs).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataframe` | `DataFrame` | Input DataFrame. | *required* |
| `value_column` | `str` | Column containing numeric values. | `'value_double'` |
| `window_size` | `int` | Length of the subsequences to compare. | `50` |
| `top_k` | `int` | Number of motif pairs to return. | `5` |
| `exclusion_zone` | `Optional[int]` | Number of indices around each match to exclude from further matches. Defaults to `window_size // 2`. | `None` |
| `time_column` | `str` | Column containing timestamps. | `'systime'` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with columns `motif_rank`, `index_a`, `index_b`, `distance`, `time_a`, `time_b`. |
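The algorithm behind `discover_motifs` is not spelled out here. As a rough sketch of the Matrix Profile idea the class description names, the NumPy code below computes a brute-force matrix profile (each window's z-normalized Euclidean distance to its nearest neighbour outside the exclusion zone) and reads the lowest-distance pairs off as motifs. The helper names `matrix_profile` and `top_k_motifs` are hypothetical, the full pairwise matrix uses O(n²) memory so it only suits short series, and the real method may use a faster algorithm.

```python
import numpy as np


def matrix_profile(values, window_size, exclusion_zone):
    """Hypothetical helper: brute-force matrix profile and nearest-neighbour index."""
    values = np.asarray(values, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(values, window_size)
    mu = windows.mean(axis=1, keepdims=True)
    sd = windows.std(axis=1, keepdims=True)
    z = (windows - mu) / np.where(sd < 1e-12, 1.0, sd)
    # For unit-variance z-normalized windows: d^2 = 2m - 2 * (z_i . z_j)
    d2 = np.maximum(2 * window_size - 2 * (z @ z.T), 0.0)
    idx = np.arange(len(z))
    # Trivial-match exclusion: ignore neighbours within exclusion_zone indices
    d2[np.abs(idx[:, None] - idx[None, :]) <= exclusion_zone] = np.inf
    nn = d2.argmin(axis=1)
    return np.sqrt(d2[idx, nn]), nn


def top_k_motifs(values, window_size=50, top_k=5, exclusion_zone=None):
    """Return (motif_rank, index_a, index_b, distance) tuples, best pair first."""
    if exclusion_zone is None:
        exclusion_zone = window_size // 2  # same default as the documented parameter
    profile, nn = matrix_profile(values, window_size, exclusion_zone)
    motifs = []
    for rank in range(1, top_k + 1):
        i = int(np.argmin(profile))
        if not np.isfinite(profile[i]):
            break  # no admissible pairs left
        j = int(nn[i])
        motifs.append((rank, min(i, j), max(i, j), float(profile[i])))
        # Suppress both members of the pair so the next motif is a different one
        for k in (i, j):
            profile[max(0, k - exclusion_zone): k + exclusion_zone + 1] = np.inf
    return motifs
```

Turning the indices into the `time_a`/`time_b` values would then be a positional lookup such as `dataframe[time_column].iloc[index_a]`.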
discover_discords
classmethod
discover_discords(
dataframe: DataFrame,
value_column: str = "value_double",
window_size: int = 50,
top_k: int = 5,
exclusion_zone: Optional[int] = None,
time_column: str = "systime",
) -> pd.DataFrame
Find the top-k anomalous subsequences (discords).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataframe` | `DataFrame` | Input DataFrame. | *required* |
| `value_column` | `str` | Column containing numeric values. | `'value_double'` |
| `window_size` | `int` | Length of the subsequences to compare. | `50` |
| `top_k` | `int` | Number of discords to return. | `5` |
| `exclusion_zone` | `Optional[int]` | Number of indices around each match to exclude from further matches. Defaults to `window_size // 2`. | `None` |
| `time_column` | `str` | Column containing timestamps. | `'systime'` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with columns `discord_rank`, `start_index`, `distance`, `start_time`. |
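In Matrix Profile terms, a discord is the subsequence whose nearest non-trivial neighbour is farthest away. The sketch below illustrates that definition with a brute-force matrix profile and picks the largest entries, suppressing each pick's neighbourhood. It is an assumption-laden illustration, not the library's implementation: `matrix_profile` and `top_k_discords` are hypothetical names, and memory grows quadratically with series length.

```python
import numpy as np


def matrix_profile(values, window_size, exclusion_zone):
    """Hypothetical helper: nearest-neighbour z-normalized distance per window."""
    values = np.asarray(values, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(values, window_size)
    mu = windows.mean(axis=1, keepdims=True)
    sd = windows.std(axis=1, keepdims=True)
    z = (windows - mu) / np.where(sd < 1e-12, 1.0, sd)
    # For unit-variance z-normalized windows: d^2 = 2m - 2 * (z_i . z_j)
    d2 = np.maximum(2 * window_size - 2 * (z @ z.T), 0.0)
    idx = np.arange(len(z))
    d2[np.abs(idx[:, None] - idx[None, :]) <= exclusion_zone] = np.inf
    return np.sqrt(d2.min(axis=1))


def top_k_discords(values, window_size=50, top_k=5, exclusion_zone=None):
    """Return (discord_rank, start_index, distance) tuples, most anomalous first."""
    if exclusion_zone is None:
        exclusion_zone = window_size // 2
    profile = matrix_profile(values, window_size, exclusion_zone)
    # Windows with no admissible neighbour (inf) cannot be ranked
    profile = np.where(np.isfinite(profile), profile, -np.inf)
    discords = []
    for rank in range(1, top_k + 1):
        i = int(np.argmax(profile))
        if profile[i] == -np.inf:
            break
        discords.append((rank, i, float(profile[i])))
        # Suppress the neighbourhood so the next discord is a different event
        profile[max(0, i - exclusion_zone): i + exclusion_zone + 1] = -np.inf
    return discords
```

For a perfectly periodic signal every clean window has an almost exact repeat one period away, so only windows touching a glitch end up with a large nearest-neighbour distance.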
similarity_search
classmethod
similarity_search(
dataframe: DataFrame,
query: ndarray,
value_column: str = "value_double",
top_k: int = 5,
normalize: bool = True,
time_column: str = "systime",
) -> pd.DataFrame
Find the top-k most similar subsequences to a query using DTW.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataframe` | `DataFrame` | Input DataFrame. | *required* |
| `query` | `ndarray` | Query pattern as a NumPy array. | *required* |
| `value_column` | `str` | Column containing numeric values. | `'value_double'` |
| `top_k` | `int` | Number of matches to return. | `5` |
| `normalize` | `bool` | Whether to z-normalize before comparison. | `True` |
| `time_column` | `str` | Column containing timestamps. | `'systime'` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with columns `rank`, `start_index`, `dtw_distance`, `start_time`. |
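The DTW variant used by `similarity_search` (band constraint, step pattern, candidate window length) is not documented here. The sketch below assumes the unconstrained textbook DTW with squared point costs, candidate windows of the same length as the query, and a hypothetical exclusion of `len(query) // 2` indices around each hit so that the top-k results are distinct occurrences. The function names are hypothetical, and the pure-Python recurrence costs O(n·m²), so it only suits small inputs.

```python
import numpy as np


def znorm(x):
    """Z-normalize; a constant series is only mean-centred."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 1e-12 else x - x.mean()


def dtw_distance(a, b):
    """Classic DTW: cumulative squared cost over the best warping path."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def dtw_similarity_search(values, query, top_k=5, normalize=True):
    """Return (rank, start_index, dtw_distance) tuples for the best matches."""
    values = np.asarray(values, dtype=float)
    query = np.asarray(query, dtype=float)
    m = len(query)
    q = znorm(query) if normalize else query
    # Assumption: every candidate window has the query's length
    dists = np.array([
        dtw_distance(q, znorm(w) if normalize else w)
        for w in np.lib.stride_tricks.sliding_window_view(values, m)
    ])
    results = []
    for rank in range(1, top_k + 1):
        i = int(np.argmin(dists))
        if not np.isfinite(dists[i]):
            break
        results.append((rank, i, float(dists[i])))
        # Hypothetical exclusion so near-identical shifted windows are not repeated
        dists[max(0, i - m // 2): i + m // 2 + 1] = np.inf
    return results
```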
template_match
classmethod
template_match(
dataframe: DataFrame,
template: ndarray,
value_column: str = "value_double",
threshold: Optional[float] = None,
normalize: bool = True,
time_column: str = "systime",
) -> pd.DataFrame
Find all occurrences of a template pattern in the time series.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataframe` | `DataFrame` | Input DataFrame. | *required* |
| `template` | `ndarray` | Reference pattern as a NumPy array. | *required* |
| `value_column` | `str` | Column containing numeric values. | `'value_double'` |
| `threshold` | `Optional[float]` | Maximum distance for a subsequence to count as a match. If `None`, an adaptive threshold is used. | `None` |
| `normalize` | `bool` | Whether to z-normalize before comparison. | `True` |
| `time_column` | `str` | Column containing timestamps. | `'systime'` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with columns `start_index`, `distance`, `start_time`, `end_time`. |
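Neither the distance measure nor the adaptive threshold used by `template_match` is specified here. This sketch assumes z-normalized Euclidean distance and a hypothetical adaptive rule (mean minus three standard deviations of all window distances); both are illustrative choices, not the library's. Overlapping hits are suppressed greedily, best first, so each occurrence is reported once, and `end_index` stands in for the `end_time` column.

```python
import numpy as np


def znorm_rows(x):
    """Z-normalize along the last axis; constant rows are only mean-centred."""
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / np.where(sd < 1e-12, 1.0, sd)


def template_match(values, template, threshold=None, normalize=True):
    """Return sorted (start_index, distance, end_index) tuples for every match."""
    values = np.asarray(values, dtype=float)
    template = np.asarray(template, dtype=float)
    m = len(template)
    windows = np.lib.stride_tricks.sliding_window_view(values, m)
    if normalize:
        windows, template = znorm_rows(windows), znorm_rows(template)
    dist = np.linalg.norm(windows - template, axis=1)
    if threshold is None:
        # Hypothetical adaptive rule: far below the typical window distance
        threshold = dist.mean() - 3 * dist.std()
    taken = np.zeros(len(dist), dtype=bool)
    matches = []
    for i in np.argsort(dist):  # best candidates first
        if dist[i] > threshold:
            break
        if taken[i]:
            continue
        matches.append((int(i), float(dist[i]), int(i + m - 1)))
        # Block every window that overlaps this match
        taken[max(0, i - m + 1): i + m] = True
    return sorted(matches)
```

Passing an explicit `threshold` is usually more predictable than any adaptive rule, since the spread of distances depends heavily on the series.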
compute_distance_profile
classmethod
compute_distance_profile(
dataframe: DataFrame,
query: ndarray,
value_column: str = "value_double",
metric: str = "euclidean",
normalize: bool = True,
) -> np.ndarray
Compute the distance from the query to every subsequence of the same length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataframe` | `DataFrame` | Input DataFrame. | *required* |
| `query` | `ndarray` | Query subsequence. | *required* |
| `value_column` | `str` | Column containing numeric values. | `'value_double'` |
| `metric` | `str` | Distance metric: `'euclidean'` (FFT-based) or `'dtw'`. | `'euclidean'` |
| `normalize` | `bool` | Whether to z-normalize. | `True` |
Returns:
| Type | Description |
|---|---|
| `ndarray` | 1D NumPy array of distances, one per subsequence. |
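One standard way to build an FFT-based Euclidean distance profile is to obtain all sliding dot products between the query and the series with a single FFT convolution, then convert them to distances. The sketch below does this for both the z-normalized and the raw case. `distance_profile_fft` is a hypothetical name, the `'dtw'` metric is not covered, and with `normalize=True` it assumes the series contains no constant windows (their zero standard deviation would divide by zero).

```python
import numpy as np


def distance_profile_fft(values, query, normalize=True):
    """Euclidean distance from `query` to every same-length window of `values`."""
    t = np.asarray(values, dtype=float)
    q = np.asarray(query, dtype=float)
    n, m = len(t), len(q)
    size = n + m - 1  # full linear-convolution length, so no circular wrap-around
    conv = np.fft.irfft(np.fft.rfft(t, size) * np.fft.rfft(q[::-1], size), size)
    qt = conv[m - 1:n]  # qt[i] = sum_k q[k] * t[i + k]
    windows = np.lib.stride_tricks.sliding_window_view(t, m)
    if not normalize:
        # ||w - q||^2 = ||w||^2 - 2 w.q + ||q||^2
        d2 = (windows ** 2).sum(axis=1) - 2 * qt + (q ** 2).sum()
    else:
        # Pearson correlation per window, then d^2 = 2m(1 - corr)
        mu_t, sd_t = windows.mean(axis=1), windows.std(axis=1)
        corr = (qt - m * q.mean() * mu_t) / (m * q.std() * sd_t)
        d2 = 2 * m * (1 - corr)
    # Clip tiny negative values caused by floating-point rounding
    return np.sqrt(np.clip(d2, 0.0, None))
```

The convolution makes all n - m + 1 dot products cost O(n log n) instead of O(n·m), which is the main reason to prefer this route over a direct sliding loop.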