
ts_shape.events.quality.outlier_detection

Classes:

  • OutlierDetectionEvents

    Detects outliers in time series data using the interquartile range (IQR) or Z-score method.

OutlierDetectionEvents

OutlierDetectionEvents(dataframe: DataFrame, value_column: str, event_uuid: str = 'outlier_event', time_threshold: str = '5min')

Bases: Base

Detects outliers in a time series DataFrame using the interquartile range (IQR) or Z-score method and groups outliers that occur close together in time into events.

Parameters:

  • dataframe

    (DataFrame) –

    The input time series DataFrame. Both detection methods read timestamps from a 'systime' column, so the DataFrame must contain one.

  • value_column

    (str) –

    The name of the column containing the values for outlier detection.

  • event_uuid

    (str, default: 'outlier_event' ) –

    A UUID or identifier for detected outlier events.

  • time_threshold

    (str, default: '5min' ) –

    The time threshold, given as a string such as '5min', used to group outliers that occur close together into one event.
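
The grouping itself is done by a private helper, _group_outliers, whose source is not shown on this page. As a rough illustration of what a threshold such as '5min' could mean, the sketch below assumes that an outlier belongs to the same event as the previous one when the gap between them is at most time_threshold, and summarizes each event by start time, end time and outlier count. The function name group_by_time_gap and its output columns are hypothetical and are not part of the ts_shape API.

```python
import pandas as pd


def group_by_time_gap(outliers: pd.DataFrame, time_threshold: str = "5min") -> pd.DataFrame:
    """Hypothetical sketch (not ts_shape's _group_outliers): merge outliers
    whose 'systime' lies within `time_threshold` of the previous outlier."""
    if outliers.empty:
        return pd.DataFrame(columns=["event_id", "start", "end", "count"])
    # Sort ascending here for simplicity; the library sorts descending first.
    times = pd.to_datetime(outliers["systime"]).sort_values().reset_index(drop=True)
    gap = pd.Timedelta(time_threshold)
    # A new event starts wherever the gap to the previous outlier exceeds the threshold.
    event_id = (times.diff() > gap).cumsum().rename("event_id")
    events = times.groupby(event_id).agg(["min", "max", "count"])
    events.columns = ["start", "end", "count"]
    return events.reset_index()


# Gaps of 3 min (same event) and 17 min (new event) with a 5 min threshold.
sample = pd.DataFrame({"systime": ["2024-01-01 00:00", "2024-01-01 00:03", "2024-01-01 00:20"]})
print(group_by_time_gap(sample, "5min"))
```

Under this assumption the first two outliers form one event and the third starts a new one. Whether the library measures gaps between consecutive outliers or from the start of an event is not stated here.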

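Methods:

  • detect_outliers_iqr – Detects outliers using the IQR method.

  • detect_outliers_zscore – Detects outliers using the Z-score method.

  • get_dataframe – Returns the processed DataFrame.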

Source code in src/ts_shape/events/quality/outlier_detection.py
def __init__(self, dataframe: pd.DataFrame, value_column: str, event_uuid: str = 'outlier_event', 
             time_threshold: str = '5min') -> None:
    """
    Initializes the OutlierDetectionEvents with specific attributes for outlier detection.

    Args:
        dataframe (pd.DataFrame): The input time series DataFrame.
        value_column (str): The name of the column containing the values for outlier detection.
        event_uuid (str): A UUID or identifier for detected outlier events.
        time_threshold (str): The time threshold to group close events together.
    """
    super().__init__(dataframe)
    self.value_column = value_column
    self.event_uuid = event_uuid
    self.time_threshold = time_threshold

detect_outliers_iqr

detect_outliers_iqr(threshold: tuple = (1.5, 1.5)) -> DataFrame

Detects outliers using the IQR method.

Parameters:

  • threshold

    (tuple, default: (1.5, 1.5) ) –

    A (lower, upper) pair of multipliers applied to the IQR. Values below Q1 - lower * IQR or above Q3 + upper * IQR are flagged as outliers.

Returns:

  • DataFrame

    pd.DataFrame: A DataFrame of detected outliers and grouped events.

Source code in src/ts_shape/events/quality/outlier_detection.py
def detect_outliers_iqr(self, threshold: tuple = (1.5, 1.5)) -> pd.DataFrame:
    """
    Detects outliers using the IQR method.

    Args:
        threshold (tuple): The multipliers for the IQR range for detecting outliers (lower, upper).

    Returns:
        pd.DataFrame: A DataFrame of detected outliers and grouped events.
    """
    df = self.dataframe.copy()

    # Convert 'systime' to datetime and sort the DataFrame by 'systime' in descending order
    df['systime'] = pd.to_datetime(df['systime'])
    df = df.sort_values(by='systime', ascending=False)

    # Detect outliers using the IQR method
    Q1 = df[self.value_column].quantile(0.25)
    Q3 = df[self.value_column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - threshold[0] * IQR
    upper_bound = Q3 + threshold[1] * IQR
    df['outlier'] = (df[self.value_column] < lower_bound) | (df[self.value_column] > upper_bound)

    # Filter to keep only outliers
    outliers_df = df[df['outlier']]

    # Group and return the outliers
    return self._group_outliers(outliers_df)
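
The bound computation above can be tried on a small sample without constructing the class. This sketch assumes only pandas and reproduces the quantile and bound logic from detect_outliers_iqr, leaving out the 'systime' handling and the grouping step. The helper name iqr_outlier_mask is hypothetical.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, threshold: tuple = (1.5, 1.5)) -> pd.Series:
    """Flag values outside [Q1 - lower*IQR, Q3 + upper*IQR] (hypothetical helper)."""
    q1 = values.quantile(0.25)  # pandas default: linear interpolation
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - threshold[0] * iqr
    upper_bound = q3 + threshold[1] * iqr
    return (values < lower_bound) | (values > upper_bound)


values = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 30.0, -5.0])
# Q1 = 10.5, Q3 = 13.5, IQR = 3.0, so the default bounds are [6.0, 18.0].
print(iqr_outlier_mask(values).tolist())
# [False, False, False, False, False, True, True]
# A looser lower multiplier moves the lower bound to -7.5, so -5.0 is kept.
print(iqr_outlier_mask(values, threshold=(6.0, 1.5)).tolist())
# [False, False, False, False, False, True, False]
```

Because the two multipliers are independent, one tail can be made more tolerant than the other, which is useful when only high (or only low) excursions are of interest.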

detect_outliers_zscore

detect_outliers_zscore(threshold: float = 3.0) -> DataFrame

Detects outliers using the Z-score method.

Parameters:

  • threshold

    (float, default: 3.0 ) –

    The Z-score threshold. Values whose absolute Z-score exceeds it are flagged as outliers.

Returns:

  • DataFrame

    pd.DataFrame: A DataFrame of detected outliers and grouped events.

Source code in src/ts_shape/events/quality/outlier_detection.py
def detect_outliers_zscore(self, threshold: float = 3.0) -> pd.DataFrame:
    """
    Detects outliers using the Z-score method.

    Args:
        threshold (float): The Z-score threshold for detecting outliers.

    Returns:
        pd.DataFrame: A DataFrame of detected outliers and grouped events.
    """
    df = self.dataframe.copy()

    # Convert 'systime' to datetime and sort the DataFrame by 'systime' in descending order
    df['systime'] = pd.to_datetime(df['systime'])
    df = df.sort_values(by='systime', ascending=False)

    # Detect outliers using the Z-score method
    df['outlier'] = np.abs(zscore(df[self.value_column])) > threshold

    # Filter to keep only outliers
    outliers_df = df[df['outlier']]

    # Group and return the outliers
    return self._group_outliers(outliers_df)
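
The Z-score rule has a sample-size limit worth knowing. scipy.stats.zscore uses the population standard deviation (ddof=0) by default, and with that choice no value in a series of n points can have an absolute Z-score above sqrt(n - 1). A series of 10 or fewer values therefore can never exceed the default threshold of 3.0. The sketch below is a pandas-only stand-in for the scipy call (the helper name zscore_outlier_mask is hypothetical) that demonstrates the effect.

```python
import pandas as pd


def zscore_outlier_mask(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Flag values whose absolute Z-score exceeds `threshold` (hypothetical helper).

    Uses the population standard deviation (ddof=0), matching the default
    of scipy.stats.zscore.
    """
    z = (values - values.mean()) / values.std(ddof=0)
    return z.abs() > threshold


# Eight values with one huge spike: its Z-score is sqrt(7), about 2.65.
short = pd.Series([0.0] * 7 + [100.0])
print(zscore_outlier_mask(short).any())          # False

# With twenty values the same spike reaches sqrt(19), about 4.36.
longer = pd.Series([0.0] * 19 + [100.0])
print(zscore_outlier_mask(longer).tolist()[-1])  # True
```

One behavioral difference: pandas' mean and std skip NaN values, whereas scipy.stats.zscore with its default nan_policy='propagate' returns all-NaN scores when any input is NaN, and NaN > threshold is False, so in that case no outliers are flagged at all.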

get_dataframe

get_dataframe() -> DataFrame

Returns the processed DataFrame.

Source code in src/ts_shape/utils/base.py
def get_dataframe(self) -> pd.DataFrame:
    """Returns the processed DataFrame."""
    return self.dataframe