profile_comparison
ProfileComparison
ProfileComparison(
    dataframe: DataFrame, column_name: str = "systime"
)
Bases: Base
Distance, clustering, similarity, and anomaly detection on metric profiles.
Operates on the output of SegmentProcessor.compute_metric_profiles. All methods are classmethods working on DataFrames.
Methods:
- compute_distance_matrix: Pairwise distance matrix between groups.
- cluster: Hierarchical clustering of items by metric similarity.
- find_similar: Top-K most similar items to a target.
- detect_anomalous: Flag items with unusual metric profiles.
- detect_changes: Track metric shifts across consecutive segments per UUID.
- find_similar_pairs: Find most similar (UUID, segment) pairs across all data.
compute_distance_matrix
classmethod
compute_distance_matrix(
    metric_profiles: DataFrame,
    group_column: str = "uuid",
    metric_columns: Optional[List[str]] = None,
    distance_metric: str = "euclidean",
    normalize: bool = True,
) -> pd.DataFrame
Compute pairwise distance matrix between metric profile vectors.
Can compare UUIDs (group_column='uuid') or segments (group_column='segment_value'). When multiple rows exist per group, metrics are averaged.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metric_profiles | DataFrame | Output from SegmentProcessor.compute_metric_profiles. | required |
| group_column | str | Column to group by ('uuid' or 'segment_value'). | 'uuid' |
| metric_columns | Optional[List[str]] | Which metric columns to use. None auto-detects. | None |
| distance_metric | str | 'euclidean', 'cosine', or 'manhattan'. | 'euclidean' |
| normalize | bool | Z-normalize metrics before computing distances. | True |
Returns:
| Type | Description |
|---|---|
| DataFrame | Square DataFrame indexed by group values with pairwise distances. |
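The steps described for compute_distance_matrix (average per group, z-normalize, pairwise distance) can be sketched with plain pandas and NumPy. This is an illustration only, not the library's implementation: the function name distance_matrix_sketch, the column names mean_value and std_value, the numeric-column auto-detection, and the order of averaging before normalizing are all assumptions, and only the Euclidean case is shown.

```python
import numpy as np
import pandas as pd


def distance_matrix_sketch(profiles, group_column="uuid", metric_columns=None, normalize=True):
    """Hypothetical stand-in for the Euclidean case of compute_distance_matrix."""
    if metric_columns is None:
        # Assumption: every numeric column except the group column is a metric.
        numeric = profiles.select_dtypes(include="number").columns
        metric_columns = [c for c in numeric if c != group_column]
    # Average the metrics when a group has several rows.
    grouped = profiles.groupby(group_column)[metric_columns].mean()
    if normalize:
        # Z-normalize each metric; constant columns are left centered at zero.
        std = grouped.std(ddof=0).replace(0, 1.0)
        grouped = (grouped - grouped.mean()) / std
    values = grouped.to_numpy()
    # Broadcast to all pairwise differences, then take the Euclidean norm.
    diff = values[:, None, :] - values[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return pd.DataFrame(dist, index=grouped.index, columns=grouped.index)


# Made-up profile rows; "a" has two rows that get averaged.
profiles = pd.DataFrame({
    "uuid": ["a", "a", "b", "c"],
    "mean_value": [1.0, 3.0, 2.0, 10.0],
    "std_value": [0.5, 0.5, 0.5, 2.0],
})
dm = distance_matrix_sketch(profiles)
print(dm.round(3))
```

After averaging, groups "a" and "b" have identical metric vectors, so their distance is zero, while "c" sits far from both.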
cluster
classmethod
cluster(
    distance_matrix: DataFrame,
    n_clusters: int = 3,
    distance_threshold: Optional[float] = None,
    linkage_method: str = "average",
) -> pd.DataFrame
Group items by metric similarity using hierarchical clustering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| distance_matrix | DataFrame | Square distance matrix from compute_distance_matrix. | required |
| n_clusters | int | Number of clusters. Ignored if distance_threshold is set. | 3 |
| distance_threshold | Optional[float] | Cut dendrogram at this distance. Overrides n_clusters. | None |
| linkage_method | str | 'average', 'complete', 'single', or 'ward'. | 'average' |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns [label, cluster]. |
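To make the interplay of n_clusters and distance_threshold concrete, here is a minimal sketch of hierarchical clustering on a precomputed distance matrix using SciPy. It assumes SciPy is available and that the method behaves roughly like linkage followed by fcluster; the function name cluster_sketch and the toy data are hypothetical, and the library may be implemented differently.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_sketch(distance_matrix, n_clusters=3, distance_threshold=None, linkage_method="average"):
    """Hypothetical stand-in for cluster, built on SciPy's hierarchy tools."""
    # linkage expects the condensed (upper-triangle) form of the square matrix.
    condensed = squareform(distance_matrix.to_numpy(), checks=False)
    tree = linkage(condensed, method=linkage_method)
    if distance_threshold is not None:
        # Cut the dendrogram at a fixed height; n_clusters is ignored.
        labels = fcluster(tree, t=distance_threshold, criterion="distance")
    else:
        # Ask for at most n_clusters flat clusters.
        labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return pd.DataFrame({"label": distance_matrix.index, "cluster": labels})


# Toy matrix: four items on a line at positions 0, 1, 10, 11.
names = ["a", "b", "c", "d"]
positions = np.array([0.0, 1.0, 10.0, 11.0])
dm = pd.DataFrame(np.abs(np.subtract.outer(positions, positions)), index=names, columns=names)

by_count = cluster_sketch(dm, n_clusters=2)
by_height = cluster_sketch(dm, distance_threshold=0.5)
print(by_count)
print(by_height)
```

With n_clusters=2 the items split into {a, b} and {c, d}. Cutting at 0.5, below every pairwise distance, leaves each item in its own cluster.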
find_similar
classmethod
find_similar(
    distance_matrix: DataFrame, target: str, top_k: int = 5
) -> pd.DataFrame
Find items most similar to a target based on metric profiles.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| distance_matrix | DataFrame | Square distance matrix from compute_distance_matrix. | required |
| target | str | Item label to find similarities for. | required |
| top_k | int | Number of similar items to return. | 5 |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns [label, distance, rank] sorted by distance. |
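A top-K lookup of this kind amounts to reading one row of the distance matrix and sorting it. The sketch below shows that idea under a few assumptions: the name find_similar_sketch is hypothetical, the target itself is excluded from its own results, and ranks start at 1. The library may handle these details differently.

```python
import numpy as np
import pandas as pd


def find_similar_sketch(distance_matrix, target, top_k=5):
    """Hypothetical stand-in for find_similar."""
    # Take the target's row and drop its zero distance to itself (assumption).
    distances = distance_matrix.loc[target].drop(labels=target)
    nearest = distances.sort_values().head(top_k)
    return pd.DataFrame({
        "label": nearest.index,
        "distance": nearest.to_numpy(),
        "rank": range(1, len(nearest) + 1),
    })


# Toy matrix: four items on a line at positions 0, 1, 10, 11.
names = ["a", "b", "c", "d"]
positions = np.array([0.0, 1.0, 10.0, 11.0])
dm = pd.DataFrame(np.abs(np.subtract.outer(positions, positions)), index=names, columns=names)

similar = find_similar_sketch(dm, target="a", top_k=2)
print(similar)
```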
detect_anomalous
classmethod
detect_anomalous(
    distance_matrix: DataFrame, threshold: float = 2.0
) -> pd.DataFrame
Detect items with unusual metric profiles.
Computes the mean distance from each item to all others and converts those means to z-scores. Items whose z-score exceeds the threshold are flagged as anomalous.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| distance_matrix | DataFrame | Square distance matrix from compute_distance_matrix. | required |
| threshold | float | Z-score threshold for anomaly detection. | 2.0 |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns [label, anomaly_score, z_score, is_anomalous]. |
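The mean-distance z-score rule can be sketched in a few lines. This version assumes that anomaly_score is the mean distance to the other items, that the z-score uses the population standard deviation, and that only positive deviations are flagged; the name detect_anomalous_sketch and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def detect_anomalous_sketch(distance_matrix, threshold=2.0):
    """Hypothetical stand-in for detect_anomalous."""
    n = len(distance_matrix)
    # Mean distance to the other items; the zero diagonal is excluded by dividing by n - 1.
    score = distance_matrix.sum(axis=1) / (n - 1)
    std = score.std(ddof=0)
    # Guard against a zero spread, where no item can stand out.
    z = (score - score.mean()) / std if std > 0 else score * 0.0
    return pd.DataFrame({
        "label": score.index,
        "anomaly_score": score.to_numpy(),
        "z_score": z.to_numpy(),
        "is_anomalous": (z > threshold).to_numpy(),
    })


# Toy matrix: six items close together and one far away.
names = ["s0", "s1", "s2", "s3", "s4", "s5", "outlier"]
positions = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
dm = pd.DataFrame(np.abs(np.subtract.outer(positions, positions)), index=names, columns=names)

report = detect_anomalous_sketch(dm)
flagged = report.loc[report["is_anomalous"], "label"].tolist()
print(report.round(2))
```

One caveat follows from the arithmetic: with very few items a single outlier cannot reach a large z-score, so a threshold of 2.0 needs a reasonably sized matrix to flag anything.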
detect_changes
classmethod
detect_changes(
    metric_profiles: DataFrame,
    uuid_column: str = "uuid",
    group_column: str = "segment_index",
    metric_columns: Optional[List[str]] = None,
    normalize: bool = True,
) -> pd.DataFrame
Track how each UUID's metrics change across consecutive segments.
Computes Euclidean distance between consecutive segment metric vectors for each UUID. Large change scores indicate process shifts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metric_profiles | DataFrame | Output from SegmentProcessor.compute_metric_profiles. | required |
| uuid_column | str | Column identifying each timeseries. | 'uuid' |
| group_column | str | Column ordering the segments (e.g. 'segment_index'). | 'segment_index' |
| metric_columns | Optional[List[str]] | Which metric columns to use. None auto-detects. | None |
| normalize | bool | Z-normalize metrics before computing change scores. | True |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns [uuid, |
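The consecutive-segment comparison can be illustrated as a grouped difference followed by a Euclidean norm. In this sketch the function name detect_changes_sketch, the metric column names, and the output column name change_score are hypothetical, and dropping each UUID's first segment (which has no predecessor) is an assumption rather than documented behavior.

```python
import numpy as np
import pandas as pd


def detect_changes_sketch(profiles, uuid_column="uuid", group_column="segment_index",
                          metric_columns=None, normalize=True):
    """Hypothetical stand-in for detect_changes."""
    if metric_columns is None:
        # Assumption: numeric columns other than the two key columns are metrics.
        numeric = profiles.select_dtypes(include="number").columns
        metric_columns = [c for c in numeric if c not in (uuid_column, group_column)]
    data = profiles.sort_values([uuid_column, group_column]).reset_index(drop=True)
    metrics = data[metric_columns].astype(float)
    if normalize:
        std = metrics.std(ddof=0).replace(0, 1.0)
        metrics = (metrics - metrics.mean()) / std
    # Difference to the previous segment of the same UUID; the first segment yields NaN.
    step = metrics.groupby(data[uuid_column]).diff()
    out = data[[uuid_column, group_column]].copy()
    # Euclidean length of the step; NaN rows stay NaN and are dropped below.
    out["change_score"] = np.sqrt((step ** 2).sum(axis=1, skipna=False))
    return out.dropna(subset=["change_score"]).reset_index(drop=True)


profiles = pd.DataFrame({
    "uuid": ["a", "a", "a", "b", "b"],
    "segment_index": [0, 1, 2, 0, 1],
    "mean_value": [1.0, 1.0, 5.0, 2.0, 2.0],
    "std_value": [0.0, 0.0, 3.0, 1.0, 1.0],
})
# normalize=False keeps the numbers easy to check by hand.
changes = detect_changes_sketch(profiles, normalize=False)
print(changes)
```

Here UUID "a" is stable from segment 0 to 1 and then jumps by (4, 3) in metric space, giving a change score of 5 at segment 2.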
find_similar_pairs
classmethod
find_similar_pairs(
    metric_profiles: DataFrame,
    uuid_column: str = "uuid",
    group_column: str = "segment_value",
    metric_columns: Optional[List[str]] = None,
    normalize: bool = True,
    top_k: int = 10,
) -> pd.DataFrame
Find the most similar (UUID, segment) pairs across all data.
Useful for finding which process parameters behave similarly across different orders or part numbers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metric_profiles | DataFrame | Output from SegmentProcessor.compute_metric_profiles. | required |
| uuid_column | str | Column identifying each timeseries. | 'uuid' |
| group_column | str | Column identifying each segment. | 'segment_value' |
| metric_columns | Optional[List[str]] | Which metric columns to use. None auto-detects. | None |
| normalize | bool | Z-normalize metrics before computing distances. | True |
| top_k | int | Number of closest pairs to return. | 10 |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns [uuid_a, group_a, uuid_b, group_b, distance, rank]. |
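A closest-pairs search over (UUID, segment) combinations can be sketched by enumerating all pairs and sorting by distance. Several points here are assumptions: the name find_similar_pairs_sketch is hypothetical, Euclidean distance is used because the source does not name the metric, duplicate (UUID, segment) rows are averaged, and pairs within the same UUID are not excluded.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def find_similar_pairs_sketch(profiles, uuid_column="uuid", group_column="segment_value",
                              metric_columns=None, normalize=True, top_k=10):
    """Hypothetical stand-in for find_similar_pairs."""
    if metric_columns is None:
        # Assumption: numeric columns other than the two key columns are metrics.
        numeric = profiles.select_dtypes(include="number").columns
        metric_columns = [c for c in numeric if c not in (uuid_column, group_column)]
    # One metric vector per (uuid, segment) combination.
    grouped = profiles.groupby([uuid_column, group_column])[metric_columns].mean()
    if normalize:
        std = grouped.std(ddof=0).replace(0, 1.0)
        grouped = (grouped - grouped.mean()) / std
    rows = []
    # Every unordered pair of (uuid, segment) vectors, with its Euclidean distance.
    for (key_a, vec_a), (key_b, vec_b) in combinations(zip(grouped.index, grouped.to_numpy()), 2):
        distance = float(np.linalg.norm(vec_a - vec_b))
        rows.append((key_a[0], key_a[1], key_b[0], key_b[1], distance))
    pairs = pd.DataFrame(rows, columns=["uuid_a", "group_a", "uuid_b", "group_b", "distance"])
    pairs = pairs.sort_values("distance").head(top_k).reset_index(drop=True)
    pairs["rank"] = pairs.index + 1
    return pairs


profiles = pd.DataFrame({
    "uuid": ["temp", "temp", "pressure", "pressure"],
    "segment_value": ["P1", "P2", "P1", "P2"],
    "mean_value": [1.0, 6.0, 1.1, 9.0],
})
pairs = find_similar_pairs_sketch(profiles, normalize=False, top_k=3)
print(pairs)
```

Enumerating all pairs grows quadratically with the number of (UUID, segment) combinations, so this brute-force form is only reasonable for modest inputs.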