outlier_detection
OutlierDetectionEvents
OutlierDetectionEvents(
dataframe: DataFrame,
value_column: str,
event_uuid: str = "outlier_event",
time_threshold: str = "5min",
)
Bases: Base
Processes time series data to detect outliers with one of several methods (Z-score, IQR, MAD, or Isolation Forest) and groups nearby outliers into events.
Initializes `OutlierDetectionEvents` with the input data and the settings shared by the detection methods.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataframe` | `DataFrame` | The input time series DataFrame. | *required* |
| `value_column` | `str` | The name of the column containing the values for outlier detection. | *required* |
| `event_uuid` | `str` | A UUID or identifier for detected outlier events. | `'outlier_event'` |
| `time_threshold` | `str` | The time threshold used to group close events together. | `'5min'` |
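The `time_threshold`, `event_uuid`, and `include_singles` settings suggest that detected outliers are merged into events when they lie close together in time. The following is a minimal pandas sketch of one way that grouping could work; it is not the library's implementation. The helper `group_outlier_events`, the `timestamp` column name, and the output columns are hypothetical, and reading `time_threshold` as a pandas `Timedelta` string such as `'5min'` is an assumption.

```python
import pandas as pd


def group_outlier_events(
    outliers: pd.DataFrame,
    time_column: str = "timestamp",  # hypothetical column name
    time_threshold: str = "5min",
    event_uuid: str = "outlier_event",
    include_singles: bool = True,
) -> pd.DataFrame:
    """Merge outlier rows closer than `time_threshold` into events (illustrative sketch)."""
    if outliers.empty:
        return pd.DataFrame(columns=["uuid", "start", "end", "n_points"])
    ordered = outliers.sort_values(time_column)
    gaps = ordered[time_column].diff()
    # A new event starts at the first outlier and whenever the gap exceeds the threshold
    new_event = gaps.isna() | (gaps > pd.Timedelta(time_threshold))
    event_id = new_event.cumsum()
    events = (
        ordered.groupby(event_id)[time_column]
        .agg(start="min", end="max", n_points="count")
        .reset_index(drop=True)
    )
    if not include_singles:
        # Drop events that consist of a single isolated outlier
        events = events[events["n_points"] > 1].reset_index(drop=True)
    events.insert(0, "uuid", event_uuid)
    return events
```

With outliers at 00:00, 00:02, and 01:00 and the default `'5min'` threshold, this yields two events: one spanning 00:00 to 00:02 and a single-point event at 01:00, which `include_singles=False` would drop.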
detect_outliers_zscore
detect_outliers_zscore(
threshold: float = 3.0, include_singles: bool = True
) -> pd.DataFrame
Detects outliers using the Z-score method, which flags values whose distance from the mean exceeds `threshold` standard deviations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `threshold` | `float` | The Z-score threshold for detecting outliers. | `3.0` |
| `include_singles` | `bool` | Whether to include single outliers in the output. | `True` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A DataFrame of detected outliers and grouped events. |
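The Z-score rule can be sketched in a few lines of pandas. This is an illustrative version of the standard technique, not the library's code: the helper `zscore_outlier_mask` is hypothetical, and it assumes pandas' default sample standard deviation (`ddof=1`), which the actual implementation may not use.

```python
import pandas as pd


def zscore_outlier_mask(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Return True where |(x - mean) / std| exceeds `threshold` (illustrative sketch)."""
    std = values.std()  # sample standard deviation (ddof=1), pandas' default
    if pd.isna(std) or std == 0:
        # Constant or too-short series: nothing can be flagged
        return pd.Series(False, index=values.index)
    z = (values - values.mean()) / std
    return z.abs() > threshold
```

For `pd.Series([10.0] * 20 + [100.0])`, only the last value is flagged (its Z-score is about 4.36). One design caveat: with the sample standard deviation, no |Z| can exceed (n - 1)/√n, so with the default threshold of 3.0 a series of 10 or fewer points can never produce a Z-score outlier.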
detect_outliers_iqr
detect_outliers_iqr(
threshold: tuple = (1.5, 1.5),
include_singles: bool = True,
) -> pd.DataFrame
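Detects outliers using the interquartile range (IQR) method: values below Q1 - lower × IQR or above Q3 + upper × IQR are flagged, where (lower, upper) is `threshold`.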
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `threshold` | `tuple` | The multipliers for the IQR range for detecting outliers (lower, upper). | `(1.5, 1.5)` |
| `include_singles` | `bool` | Whether to include single outliers in the output. | `True` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A DataFrame of detected outliers and grouped events. |
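A minimal sketch of the IQR fences with separate lower and upper multipliers, mirroring the `(lower, upper)` shape of `threshold`. The helper `iqr_outlier_mask` is hypothetical, and the use of pandas' default linear quantile interpolation is an assumption about how Q1 and Q3 are computed.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, threshold: tuple = (1.5, 1.5)) -> pd.Series:
    """Flag values outside [Q1 - lower*IQR, Q3 + upper*IQR] (illustrative sketch)."""
    lower_mult, upper_mult = threshold
    q1 = values.quantile(0.25)  # linear interpolation by default
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - lower_mult * iqr
    upper_bound = q3 + upper_mult * iqr
    return (values < lower_bound) | (values > upper_bound)
```

With the default `(1.5, 1.5)`, only `100.0` is flagged in `[1, 2, ..., 9, 100]`; asymmetric multipliers such as `(1.5, 0.1)` tighten just the upper fence, so `9.0` is flagged as well.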
detect_outliers_mad
detect_outliers_mad(
threshold: float = 3.5, include_singles: bool = True
) -> pd.DataFrame
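Detects outliers using the Median Absolute Deviation (MAD) method. This method is more robust than the Z-score method because the median and MAD are far less influenced by extreme values than the mean and standard deviation.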
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `threshold` | `float` | The MAD threshold for detecting outliers. | `3.5` |
| `include_singles` | `bool` | Whether to include single outliers in the output. | `True` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A DataFrame of detected outliers and grouped events. |
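The default of 3.5 matches the cutoff commonly recommended for the modified Z-score of Iglewicz and Hoaglin, so the sketch below assumes that formulation; the library may scale MAD differently. The helper `mad_outlier_mask` is hypothetical.

```python
import pandas as pd


def mad_outlier_mask(values: pd.Series, threshold: float = 3.5) -> pd.Series:
    """Flag values whose modified Z-score exceeds `threshold` (illustrative sketch)."""
    median = values.median()
    mad = (values - median).abs().median()
    if pd.isna(mad) or mad == 0:
        # MAD is zero when more than half the values equal the median; the score is undefined
        return pd.Series(False, index=values.index)
    # 0.6745 is approximately the 0.75 quantile of the standard normal distribution,
    # which makes the score comparable to a Z-score for normally distributed data
    modified_z = 0.6745 * (values - median) / mad
    return modified_z.abs() > threshold
```

For `[10, 11, 9, 10, 12, 8, 10, 50]` the median is 10 and the MAD is 1, so `50` scores about 27 and is flagged while `12` and `8` score about 1.35. Because `50` barely moves the median or the MAD, the other points keep small scores, which is the robustness the description refers to.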
detect_outliers_isolation_forest
detect_outliers_isolation_forest(
contamination: float = 0.1,
include_singles: bool = True,
random_state: Optional[int] = 42,
) -> pd.DataFrame
Detects outliers using scikit-learn's `IsolationForest` algorithm. scikit-learn is an optional dependency: if it is not installed, this method raises `ImportError`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `contamination` | `float` | The expected proportion of outliers in the dataset. | `0.1` |
| `include_singles` | `bool` | Whether to include single outliers in the output. | `True` |
| `random_state` | `Optional[int]` | Random state for reproducibility. | `42` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A DataFrame of detected outliers and grouped events. |
Raises:
| Type | Description |
|---|---|
| `ImportError` | If scikit-learn is not installed. |
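A minimal sketch of Isolation Forest detection on a single value column, with scikit-learn imported lazily so a missing installation surfaces as the `ImportError` listed above. The helper `isolation_forest_outlier_mask` is hypothetical, and fitting the model on the raw values as a single feature is an assumption about what the method does.

```python
from typing import Optional

import pandas as pd


def isolation_forest_outlier_mask(
    values: pd.Series,
    contamination: float = 0.1,
    random_state: Optional[int] = 42,
) -> pd.Series:
    """Flag values that IsolationForest labels as anomalies (illustrative sketch)."""
    try:
        # Imported lazily so the rest of the module works without scikit-learn
        from sklearn.ensemble import IsolationForest
    except ImportError as exc:
        raise ImportError(
            "scikit-learn is required for Isolation Forest outlier detection"
        ) from exc
    model = IsolationForest(contamination=contamination, random_state=random_state)
    # scikit-learn expects a 2-D matrix: one row per sample, one feature column
    labels = model.fit_predict(values.to_numpy().reshape(-1, 1))
    # fit_predict returns -1 for anomalies and 1 for inliers
    return pd.Series(labels == -1, index=values.index)
```

Unlike the threshold-based methods, a float `contamination` sets the decision cutoff at that percentile of the training scores, so roughly that fraction of points is flagged even in data without genuine anomalies.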