outlier_detection ¤

OutlierDetectionEvents ¤

OutlierDetectionEvents(
    dataframe: DataFrame,
    value_column: str,
    event_uuid: str = "outlier_event",
    time_threshold: str = "5min",
)

Bases: Base

Detects outliers in time series data using one of several methods (Z-score, IQR, MAD, or Isolation Forest) and groups outliers that occur close together in time into events.

The constructor stores the input DataFrame and the settings shared by all detection methods.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataframe | DataFrame | The input time series DataFrame. | required |
| value_column | str | The name of the column containing the values for outlier detection. | required |
| event_uuid | str | A UUID or identifier for detected outlier events. | 'outlier_event' |
| time_threshold | str | The time threshold to group close events together. | '5min' |
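
To make the role of time_threshold more concrete, here is a minimal sketch of gap-based grouping using only the standard library. It is not the library's implementation: the function name `group_by_gap` is hypothetical, and the rule that a new group starts when the gap to the previous outlier exceeds the threshold is an assumption about how "close events" are defined.

```python
from datetime import datetime, timedelta


def group_by_gap(timestamps, max_gap):
    """Group timestamps so that consecutive members are at most max_gap apart.

    Hypothetical helper for illustration; the grouping rule is an assumption.
    """
    groups = []
    for ts in sorted(timestamps):
        if groups and ts - groups[-1][-1] <= max_gap:
            # Close enough to the previous outlier: extend the current group.
            groups[-1].append(ts)
        else:
            # Gap too large (or first timestamp): start a new group.
            groups.append([ts])
    return groups


outlier_times = [
    datetime(2024, 1, 1, 12, 0),
    datetime(2024, 1, 1, 12, 3),
    datetime(2024, 1, 1, 12, 7),
    datetime(2024, 1, 1, 13, 0),
]
events = group_by_gap(outlier_times, timedelta(minutes=5))
print([len(group) for group in events])
```

With a five-minute gap, the first three timestamps form one group and the last one stands alone. A group of size one corresponds to what the method parameters below call a single outlier.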

detect_outliers_zscore ¤

detect_outliers_zscore(
    threshold: float = 3.0, include_singles: bool = True
) -> pd.DataFrame

Detects outliers using the Z-score method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| threshold | float | The Z-score threshold for detecting outliers. | 3.0 |
| include_singles | bool | Whether to include single outliers in the output. | True |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | A DataFrame of detected outliers and grouped events. |
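
As a rough illustration of the Z-score rule, the following standard-library sketch flags values whose absolute Z-score exceeds the threshold. The function `zscore_outliers` is hypothetical, and both the use of the population standard deviation and the strict `>` comparison are assumptions; the library may compute these differently.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose |z| exceeds threshold (illustrative only)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return []  # all values identical: nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


readings = [10.0] * 19 + [100.0]
print(zscore_outliers(readings))
```

Note that a single extreme value also inflates the mean and standard deviation it is measured against, which is why small samples rarely produce large Z-scores.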

detect_outliers_iqr ¤

detect_outliers_iqr(
    threshold: tuple = (1.5, 1.5),
    include_singles: bool = True,
) -> pd.DataFrame

Detects outliers using the IQR method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| threshold | tuple | The (lower, upper) multipliers applied to the IQR to set the outlier bounds. | (1.5, 1.5) |
| include_singles | bool | Whether to include single outliers in the output. | True |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | A DataFrame of detected outliers and grouped events. |
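
The (lower, upper) multipliers can be read as separate fence widths below the first quartile and above the third. The sketch below shows that reading with the standard library; `iqr_outliers` is a hypothetical name, and the quartile interpolation method is an assumption that may differ from what the library uses.

```python
from statistics import quantiles


def iqr_outliers(values, multipliers=(1.5, 1.5)):
    """Return indices of values outside the IQR fences (illustrative only)."""
    # Quartiles with linear interpolation between data points (assumption).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - multipliers[0] * iqr
    upper = q3 + multipliers[1] * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(iqr_outliers(readings))               # symmetric fences
print(iqr_outliers(readings, (1.5, 10.0)))  # much wider upper fence
```

Passing different multipliers lets the two tails be treated differently, for example when high spikes are expected but low readings are not.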

detect_outliers_mad ¤

detect_outliers_mad(
    threshold: float = 3.5, include_singles: bool = True
) -> pd.DataFrame

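Detects outliers using the Median Absolute Deviation (MAD) method. Because the median and the MAD are less affected by extreme values than the mean and standard deviation, this method is more robust than the Z-score method.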

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| threshold | float | The MAD threshold for detecting outliers. | 3.5 |
| include_singles | bool | Whether to include single outliers in the output. | True |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | A DataFrame of detected outliers and grouped events. |
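
One common MAD-based rule is the modified Z-score, which scales the deviation from the median by 0.6745 / MAD and is often paired with a cutoff of 3.5. The sketch below assumes that formulation; the source does not state which scaling the library applies, and `mad_outliers` is a hypothetical name.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified Z-score exceeds threshold (illustrative only)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # more than half the values equal the median; score undefined
    # 0.6745 makes the MAD comparable to a standard deviation for normal data
    # (assumed scaling, not confirmed by the source).
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 40]
print(mad_outliers(readings))
```

Here the extreme value barely moves the median or the MAD, so it stands out clearly.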

detect_outliers_isolation_forest ¤

detect_outliers_isolation_forest(
    contamination: float = 0.1,
    include_singles: bool = True,
    random_state: Optional[int] = 42,
) -> pd.DataFrame

Detects outliers using scikit-learn's IsolationForest algorithm. scikit-learn is an optional dependency; if it is not installed, an ImportError is raised.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| contamination | float | The expected proportion of outliers in the dataset. | 0.1 |
| include_singles | bool | Whether to include single outliers in the output. | True |
| random_state | Optional[int] | Random state for reproducibility. | 42 |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | A DataFrame of detected outliers and grouped events. |

Raises:

| Type | Description |
| --- | --- |
| ImportError | If sklearn is not installed. |
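
An optional dependency of this kind is typically guarded by importing it lazily and re-raising a more descriptive ImportError. The following is a generic sketch of that pattern, not the library's code; `require_optional` and its message text are hypothetical.

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional module, or raise ImportError naming the feature.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'; "
            "install it to use this method."
        ) from exc


try:
    require_optional("no_such_module_for_outlier_docs", "Isolation Forest detection")
except ImportError as err:
    print(err)
```

Callers that want to keep working without scikit-learn can catch the ImportError and switch to one of the other detection methods.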