profile_comparison

ProfileComparison

ProfileComparison(
    dataframe: DataFrame, column_name: str = "systime"
)

Bases: Base

Distance, clustering, similarity, and anomaly detection on metric profiles.

Operates on the output of SegmentProcessor.compute_metric_profiles. All methods are classmethods working on DataFrames.

Methods:

- compute_distance_matrix: Pairwise distance matrix between groups.
- cluster: Hierarchical clustering of items by metric similarity.
- find_similar: Top-K most similar items to a target.
- detect_anomalous: Flag items with unusual metric profiles.
- detect_changes: Track metric shifts across consecutive segments per UUID.
- find_similar_pairs: Find most similar (UUID, segment) pairs across all data.

compute_distance_matrix classmethod

compute_distance_matrix(
    metric_profiles: DataFrame,
    group_column: str = "uuid",
    metric_columns: Optional[List[str]] = None,
    distance_metric: str = "euclidean",
    normalize: bool = True,
) -> pd.DataFrame

Compute pairwise distance matrix between metric profile vectors.

Can compare UUIDs (group_column='uuid') or segments (group_column='segment_value'). When multiple rows exist per group, metrics are averaged.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| metric_profiles | DataFrame | Output from SegmentProcessor.compute_metric_profiles. | required |
| group_column | str | Column to group by ('uuid' or 'segment_value'). | 'uuid' |
| metric_columns | Optional[List[str]] | Which metric columns to use. None auto-detects. | None |
| distance_metric | str | 'euclidean', 'cosine', or 'manhattan'. | 'euclidean' |
| normalize | bool | Z-normalize metrics before computing distances. | True |

Returns:

| Type | Description |
|---|---|
| DataFrame | Square DataFrame indexed by group values with pairwise distances. |
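
The steps described for compute_distance_matrix (average rows per group, z-normalize, compute pairwise distances) can be sketched with plain pandas and NumPy. This is an illustration under stated assumptions, not the library's implementation: the function name `sketch_distance_matrix`, the metric column names `mean_value` and `std_value`, and the rule "auto-detect means every numeric column except the group column" are all hypothetical, and only the Euclidean case is shown.

```python
import numpy as np
import pandas as pd


def sketch_distance_matrix(profiles, group_column="uuid", metric_columns=None, normalize=True):
    """Illustrative only: per-group mean, optional z-normalization, Euclidean distances."""
    if metric_columns is None:
        # Assumption: auto-detection is approximated as "all numeric columns
        # except the group column".
        numeric = profiles.select_dtypes(include="number").columns
        metric_columns = [c for c in numeric if c != group_column]
    # Average the metrics when a group has several rows.
    grouped = profiles.groupby(group_column)[metric_columns].mean()
    if normalize:
        # Z-normalize each metric; constant columns are mapped to 0 instead of NaN.
        std = grouped.std(ddof=0).replace(0, 1.0)
        grouped = (grouped - grouped.mean()) / std
    values = grouped.to_numpy(dtype=float)
    # Broadcast to get all pairwise differences, then take the Euclidean norm.
    diff = values[:, None, :] - values[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return pd.DataFrame(dist, index=grouped.index, columns=grouped.index)


# Hypothetical metric profiles: uuid "a" has two rows that get averaged.
profiles = pd.DataFrame(
    {
        "uuid": ["a", "a", "b", "c"],
        "segment_value": ["s1", "s2", "s1", "s1"],
        "mean_value": [1.0, 3.0, 2.0, 10.0],
        "std_value": [0.5, 0.5, 0.5, 2.0],
    }
)
matrix = sketch_distance_matrix(profiles)
print(matrix.round(3))
```

In this toy data the averaged profile of "a" equals the profile of "b", so their distance is zero, while "c" is far from both.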

cluster classmethod

cluster(
    distance_matrix: DataFrame,
    n_clusters: int = 3,
    distance_threshold: Optional[float] = None,
    linkage_method: str = "average",
) -> pd.DataFrame

Group items by metric similarity using hierarchical clustering.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| distance_matrix | DataFrame | Square distance matrix from compute_distance_matrix. | required |
| n_clusters | int | Number of clusters. Ignored if distance_threshold is set. | 3 |
| distance_threshold | Optional[float] | Cut dendrogram at this distance. Overrides n_clusters. | None |
| linkage_method | str | 'average', 'complete', 'single', or 'ward'. | 'average' |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with columns [label, cluster]. |
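
To make the interplay of n_clusters and distance_threshold concrete, here is a minimal sketch of hierarchical clustering on a precomputed distance matrix using SciPy. Whether the library uses SciPy internally is an assumption; `sketch_cluster` and the labels `p0` to `p3` are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def sketch_cluster(distance_matrix, n_clusters=3, distance_threshold=None, linkage_method="average"):
    """Illustrative only: agglomerative clustering from a square distance matrix."""
    # scipy's linkage expects the condensed (upper-triangle) form of the matrix.
    condensed = squareform(distance_matrix.to_numpy(dtype=float), checks=False)
    tree = linkage(condensed, method=linkage_method)
    if distance_threshold is not None:
        # Cut the dendrogram at a fixed height; n_clusters is ignored.
        labels = fcluster(tree, t=distance_threshold, criterion="distance")
    else:
        # Ask for at most n_clusters flat clusters.
        labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return pd.DataFrame({"label": list(distance_matrix.index), "cluster": labels})


# Four hypothetical items on a line: two tight pairs far apart from each other.
positions = np.array([0.0, 1.0, 10.0, 11.0])
names = ["p0", "p1", "p2", "p3"]
dist = pd.DataFrame(np.abs(positions[:, None] - positions[None, :]), index=names, columns=names)

by_count = sketch_cluster(dist, n_clusters=2)
by_threshold = sketch_cluster(dist, distance_threshold=2.0)
print(by_count)
```

Both calls separate the two pairs. Note that in SciPy, 'ward' linkage is only well defined when the input distances are Euclidean, which matters if the matrix was built with 'cosine' or 'manhattan'.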

find_similar classmethod

find_similar(
    distance_matrix: DataFrame, target: str, top_k: int = 5
) -> pd.DataFrame

Find items most similar to a target based on metric profiles.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| distance_matrix | DataFrame | Square distance matrix from compute_distance_matrix. | required |
| target | str | Item label to find similarities for. | required |
| top_k | int | Number of similar items to return. | 5 |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with columns [label, distance, rank] sorted by distance. |
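
A top-K lookup of this kind amounts to reading one row of the distance matrix and sorting it. The sketch below assumes that the target itself is excluded from the result and that ranks start at 1; neither detail is confirmed by the text above, and `sketch_find_similar` is a hypothetical name.

```python
import pandas as pd


def sketch_find_similar(distance_matrix, target, top_k=5):
    """Illustrative only: nearest neighbours of one label in a distance matrix."""
    # Take the target's row and drop the target itself (assumption).
    row = distance_matrix.loc[target].drop(labels=target)
    nearest = row.sort_values().head(top_k)
    return pd.DataFrame(
        {
            "label": list(nearest.index),
            "distance": nearest.to_numpy(),
            # Assumption: rank 1 is the closest item.
            "rank": list(range(1, len(nearest) + 1)),
        }
    )


labels = ["a", "b", "c", "d"]
dist = pd.DataFrame(
    [
        [0.0, 1.0, 5.0, 2.0],
        [1.0, 0.0, 4.0, 3.0],
        [5.0, 4.0, 0.0, 6.0],
        [2.0, 3.0, 6.0, 0.0],
    ],
    index=labels,
    columns=labels,
)
similar = sketch_find_similar(dist, "a", top_k=2)
print(similar)
```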

detect_anomalous classmethod

detect_anomalous(
    distance_matrix: DataFrame, threshold: float = 2.0
) -> pd.DataFrame

Detect items with unusual metric profiles.

Computes each item's mean distance to all other items (the anomaly score), then converts those scores to z-scores. Items whose z-score exceeds the threshold are flagged as anomalous.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| distance_matrix | DataFrame | Square distance matrix from compute_distance_matrix. | required |
| threshold | float | Z-score threshold for anomaly detection. | 2.0 |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with columns [label, anomaly_score, z_score, is_anomalous]. |
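
The mean-distance z-score rule can be sketched as follows. This is a minimal illustration, assuming that anomaly_score is the mean distance to the other items, that the zero diagonal is excluded from that mean, and that the z-score uses the population standard deviation; the library may differ on each point. `sketch_detect_anomalous` and the labels are hypothetical.

```python
import numpy as np
import pandas as pd


def sketch_detect_anomalous(distance_matrix, threshold=2.0):
    """Illustrative only: flag items whose mean distance to the rest is unusually large."""
    values = distance_matrix.to_numpy(dtype=float)
    n = values.shape[0]
    # Mean distance to all *other* items: the diagonal is zero, so divide by n - 1.
    mean_dist = values.sum(axis=1) / (n - 1)
    spread = mean_dist.std()
    # Guard against identical scores, which would otherwise divide by zero.
    z = (mean_dist - mean_dist.mean()) / spread if spread > 0 else np.zeros(n)
    return pd.DataFrame(
        {
            "label": list(distance_matrix.index),
            "anomaly_score": mean_dist,
            "z_score": z,
            "is_anomalous": z > threshold,
        }
    )


# Nine hypothetical items close together on a line and one far away.
positions = np.array([0.1 * i for i in range(9)] + [100.0])
names = [f"p{i}" for i in range(10)]
dist = pd.DataFrame(np.abs(positions[:, None] - positions[None, :]), index=names, columns=names)

report = sketch_detect_anomalous(dist, threshold=2.0)
print(report[report["is_anomalous"]])
```

With very few items a z-score threshold of 2.0 can be unreachable, because a single outlier among n items has a z-score of at most (n - 1) / sqrt(n); the example therefore uses ten items.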

detect_changes classmethod

detect_changes(
    metric_profiles: DataFrame,
    uuid_column: str = "uuid",
    group_column: str = "segment_index",
    metric_columns: Optional[List[str]] = None,
    normalize: bool = True,
) -> pd.DataFrame

Track how each UUID's metrics change across consecutive segments.

Computes the Euclidean distance between the metric vectors of consecutive segments for each UUID. Large change scores indicate process shifts.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| metric_profiles | DataFrame | Output from SegmentProcessor.compute_metric_profiles. | required |
| uuid_column | str | Column identifying each timeseries. | 'uuid' |
| group_column | str | Column ordering the segments (e.g. 'segment_index'). | 'segment_index' |
| metric_columns | Optional[List[str]] | Which metric columns to use. None auto-detects. | None |
| normalize | bool | Z-normalize metrics before computing change scores. | True |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with columns [uuid, , change_score]. |
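
The consecutive-segment comparison can be sketched with a grouped row difference. Several details here are assumptions rather than documented behavior: normalization is applied across all rows at once, the first segment of each UUID (which has no predecessor) is dropped, and the output keeps the ordering column under its original name. `sketch_detect_changes` and the metric columns `m1` and `m2` are hypothetical.

```python
import numpy as np
import pandas as pd


def sketch_detect_changes(profiles, uuid_column="uuid", group_column="segment_index",
                          metric_columns=None, normalize=True):
    """Illustrative only: Euclidean distance between consecutive segments per UUID."""
    if metric_columns is None:
        # Assumption: auto-detection means "numeric columns other than the id/order columns".
        numeric = profiles.select_dtypes(include="number").columns
        metric_columns = [c for c in numeric if c not in (uuid_column, group_column)]
    # Order segments within each UUID so that "previous row" means "previous segment".
    data = profiles.sort_values([uuid_column, group_column]).reset_index(drop=True)
    metrics = data[metric_columns].astype(float)
    if normalize:
        std = metrics.std(ddof=0).replace(0, 1.0)
        metrics = (metrics - metrics.mean()) / std
    # Row-to-row difference inside each UUID; the first row of each UUID becomes NaN.
    step = metrics.groupby(data[uuid_column]).diff()
    out = data[[uuid_column, group_column]].copy()
    out["change_score"] = np.sqrt((step ** 2).sum(axis=1, skipna=False))
    # Drop the first segment of each UUID, which has nothing to compare against.
    return out.dropna(subset=["change_score"]).reset_index(drop=True)


profiles = pd.DataFrame(
    {
        "uuid": ["a", "a", "a", "b", "b"],
        "segment_index": [0, 1, 2, 0, 1],
        "m1": [1.0, 1.0, 4.0, 2.0, 2.0],
        "m2": [0.0, 0.0, 4.0, 1.0, 1.0],
    }
)
changes = sketch_detect_changes(profiles, normalize=False)
print(changes)
```

With normalization switched off, the only non-zero score is for UUID "a" between segments 1 and 2, where the metric vector moves by (3, 4).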

find_similar_pairs classmethod

find_similar_pairs(
    metric_profiles: DataFrame,
    uuid_column: str = "uuid",
    group_column: str = "segment_value",
    metric_columns: Optional[List[str]] = None,
    normalize: bool = True,
    top_k: int = 10,
) -> pd.DataFrame

Find the most similar (UUID, segment) pairs across all data.

Useful for finding which process parameters behave similarly across different orders or part numbers.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| metric_profiles | DataFrame | Output from SegmentProcessor.compute_metric_profiles. | required |
| uuid_column | str | Column identifying each timeseries. | 'uuid' |
| group_column | str | Column identifying each segment. | 'segment_value' |
| metric_columns | Optional[List[str]] | Which metric columns to use. None auto-detects. | None |
| normalize | bool | Z-normalize metrics before computing distances. | True |
| top_k | int | Number of closest pairs to return. | 10 |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with columns [uuid_a, group_a, uuid_b, group_b, distance, rank]. |
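
One way to picture the pair search is to treat every row as a (UUID, segment) profile, compute all row-to-row Euclidean distances, and keep the smallest ones. The sketch below makes assumptions the text does not confirm: pairs within the same UUID are allowed, each unordered pair is reported once, and ranks start at 1. `sketch_find_similar_pairs` and the column `mean_value` are hypothetical.

```python
import numpy as np
import pandas as pd


def sketch_find_similar_pairs(profiles, uuid_column="uuid", group_column="segment_value",
                              metric_columns=None, normalize=True, top_k=10):
    """Illustrative only: closest (uuid, segment) row pairs by Euclidean distance."""
    if metric_columns is None:
        numeric = profiles.select_dtypes(include="number").columns
        metric_columns = [c for c in numeric if c not in (uuid_column, group_column)]
    metrics = profiles[metric_columns].astype(float)
    if normalize:
        std = metrics.std(ddof=0).replace(0, 1.0)
        metrics = (metrics - metrics.mean()) / std
    values = metrics.to_numpy()
    diff = values[:, None, :] - values[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Upper triangle only: each unordered pair once, no self-pairs.
    i, j = np.triu_indices(len(values), k=1)
    pair_dist = dist[i, j]
    order = np.argsort(pair_dist, kind="stable")[:top_k]
    uuids = profiles[uuid_column].to_numpy()
    groups = profiles[group_column].to_numpy()
    return pd.DataFrame(
        {
            "uuid_a": uuids[i[order]],
            "group_a": groups[i[order]],
            "uuid_b": uuids[j[order]],
            "group_b": groups[j[order]],
            "distance": pair_dist[order],
            "rank": np.arange(1, len(order) + 1),
        }
    )


profiles = pd.DataFrame(
    {
        "uuid": ["a", "a", "b", "b"],
        "segment_value": ["s1", "s2", "s1", "s2"],
        "mean_value": [1.0, 5.0, 1.1, 9.0],
    }
)
pairs = sketch_find_similar_pairs(profiles, normalize=False, top_k=2)
print(pairs)
```

The all-pairs matrix grows quadratically with the number of rows, so for large profile tables a nearest-neighbour index would be a more practical choice than this brute-force sketch.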