cycle_processor
CycleDataProcessor
CycleDataProcessor(
cycles_df: DataFrame,
values_df: DataFrame,
cycle_uuid_col: str = "cycle_uuid",
systime_col: str = "systime",
)
Bases: Base
A class to process cycle-based data and values with optimized performance. Uses pandas IntervalIndex for efficient cycle assignment instead of nested loops.
Initializes the CycleDataProcessor with cycles and values DataFrames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cycles_df` | `DataFrame` | DataFrame containing columns 'cycle_start', 'cycle_end', and 'cycle_uuid'. | required |
| `values_df` | `DataFrame` | DataFrame containing the values and timestamps in the 'systime' column. | required |
| `cycle_uuid_col` | `str` | Name of the column representing cycle UUIDs. | `'cycle_uuid'` |
| `systime_col` | `str` | Name of the column representing the timestamps for the values. | `'systime'` |
split_by_cycle
split_by_cycle() -> Dict[str, pd.DataFrame]
Splits the values DataFrame by cycles defined in the cycles DataFrame. Uses optimized interval-based assignment.
Returns:
Dictionary where keys are cycle_uuids and values are DataFrames with the corresponding cycle data.
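The shape of the returned dictionary can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the sample frame is hypothetical and is assumed to already carry a 'cycle_uuid' label on every row.

```python
import pandas as pd

# Hypothetical values that already carry a cycle label.
labeled = pd.DataFrame({
    "systime": pd.to_datetime([
        "2024-01-01 00:01:00",
        "2024-01-01 00:02:00",
        "2024-01-01 00:12:00",
    ]),
    "value_double": [1.0, 1.5, 3.0],
    "cycle_uuid": ["c1", "c1", "c2"],
})

# One DataFrame per cycle, keyed by its UUID.
by_cycle = {uuid: frame for uuid, frame in labeled.groupby("cycle_uuid")}
```

Each value keeps the original columns, so `by_cycle["c1"]` holds the two rows recorded during cycle `c1`.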
merge_dataframes_by_cycle
merge_dataframes_by_cycle() -> pd.DataFrame
Merges the values DataFrame with the cycles DataFrame based on the cycle time intervals. Uses optimized interval-based assignment instead of nested loops.
Returns:
DataFrame with an added 'cycle_uuid' column.
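Interval-based assignment can be sketched with `pandas.IntervalIndex` as follows. The sample data is hypothetical, and two details are assumptions rather than documented behavior: that cycle intervals are closed on both ends, and that rows falling outside every cycle are dropped.

```python
import pandas as pd

# Hypothetical sample data; column names follow the documented defaults.
cycles_df = pd.DataFrame({
    "cycle_start": pd.to_datetime(["2024-01-01 00:00:00", "2024-01-01 00:10:00"]),
    "cycle_end": pd.to_datetime(["2024-01-01 00:05:00", "2024-01-01 00:15:00"]),
    "cycle_uuid": ["c1", "c2"],
})
values_df = pd.DataFrame({
    "systime": pd.to_datetime([
        "2024-01-01 00:01:00",  # inside c1
        "2024-01-01 00:07:00",  # between cycles
        "2024-01-01 00:12:00",  # inside c2
    ]),
    "value_double": [1.0, 2.0, 3.0],
})

# One interval per cycle; get_indexer requires non-overlapping intervals.
intervals = pd.IntervalIndex.from_arrays(
    cycles_df["cycle_start"], cycles_df["cycle_end"], closed="both"
)

# Position of the containing cycle for each timestamp, or -1 if none matches.
positions = intervals.get_indexer(pd.Index(values_df["systime"]))

# Assumption: rows outside every cycle are dropped rather than kept unlabeled.
matched = positions >= 0
merged = values_df.loc[matched].copy()
merged["cycle_uuid"] = cycles_df["cycle_uuid"].to_numpy()[positions[matched]]
```

A single vectorized lookup replaces the per-row, per-cycle comparison that a nested loop would perform.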
group_by_cycle_uuid
group_by_cycle_uuid(
data: Optional[DataFrame] = None,
) -> List[pd.DataFrame]
Group the DataFrame by the cycle_uuid column, resulting in a list of DataFrames, each containing data for one cycle.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Optional[DataFrame]` | DataFrame containing the data to be grouped by cycle_uuid. If None, uses the internal values_df. | `None` |
Returns:
List of DataFrames, each containing data for a unique cycle_uuid.
split_dataframes_by_group
split_dataframes_by_group(
dfs: List[DataFrame], column: str
) -> List[pd.DataFrame]
Splits a list of DataFrames by groups based on a specified column. This function performs a groupby operation on each DataFrame in the list and then flattens the result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dfs` | `List[DataFrame]` | List of DataFrames to be split. | required |
| `column` | `str` | Column name to group by. | required |
Returns:
List of DataFrames, each corresponding to a group in the original DataFrames.
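The group-then-flatten behavior described above can be sketched as below. `split_by_column` is a hypothetical stand-in written for illustration, not the method itself, and the 'uuid' column in the sample frames is an assumed name.

```python
from typing import List

import pandas as pd


def split_by_column(dfs: List[pd.DataFrame], column: str) -> List[pd.DataFrame]:
    """Hypothetical stand-in: group each frame by `column`, then flatten."""
    return [group for df in dfs for _, group in df.groupby(column)]


# Two per-cycle frames, each holding rows from one or more signals.
cycle_a = pd.DataFrame({"uuid": ["s1", "s1", "s2"], "value_double": [1.0, 2.0, 3.0]})
cycle_b = pd.DataFrame({"uuid": ["s1"], "value_double": [4.0]})

parts = split_by_column([cycle_a, cycle_b], "uuid")
```

The result is a flat list: the groups of the first frame come first, followed by the groups of the second.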
compute_cycle_statistics
compute_cycle_statistics() -> pd.DataFrame
Compute statistics for each cycle.
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with cycle-level statistics including duration, value counts, etc. |
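Cycle-level statistics of this kind can be approximated with a grouped aggregation. The sketch below is illustrative only: the output column names (`start`, `end`, `value_count`, `mean_value`, `duration_s`) are assumptions, and the real method may compute a different set of statistics.

```python
import pandas as pd

# Hypothetical values that already carry a cycle label.
labeled = pd.DataFrame({
    "systime": pd.to_datetime([
        "2024-01-01 00:01:00",
        "2024-01-01 00:03:00",
        "2024-01-01 00:12:00",
    ]),
    "value_double": [1.0, 3.0, 5.0],
    "cycle_uuid": ["c1", "c1", "c2"],
})

# One row per cycle, using named aggregation.
stats = labeled.groupby("cycle_uuid").agg(
    start=("systime", "min"),
    end=("systime", "max"),
    value_count=("value_double", "count"),
    mean_value=("value_double", "mean"),
)

# Duration here spans the first to the last observed sample in the cycle.
stats["duration_s"] = (stats["end"] - stats["start"]).dt.total_seconds()
```

Note that this duration is derived from sample timestamps; a duration based on 'cycle_start' and 'cycle_end' from the cycles DataFrame would differ.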
compare_cycles
compare_cycles(
reference_cycle_uuid: str, metric: str = "value_double"
) -> pd.DataFrame
Compare all cycles against a reference cycle.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `reference_cycle_uuid` | `str` | UUID of the reference cycle. | required |
| `metric` | `str` | Column name to use for comparison. | `'value_double'` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with comparison metrics for each cycle. |
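One way to picture a reference-based comparison is to summarize each cycle and subtract the reference cycle's summary. The comparison metrics actually returned are not listed here, so the `mean_diff` and `max_diff` columns below are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical values that already carry a cycle label.
labeled = pd.DataFrame({
    "value_double": [1.0, 3.0, 4.0, 6.0],
    "cycle_uuid": ["c1", "c1", "c2", "c2"],
})

reference_cycle_uuid = "c1"
metric = "value_double"

# Per-cycle summary of the chosen metric.
per_cycle = labeled.groupby("cycle_uuid")[metric].agg(["mean", "max"])
reference = per_cycle.loc[reference_cycle_uuid]

# Difference of each cycle's summary from the reference cycle's summary.
comparison = per_cycle.copy()
comparison["mean_diff"] = per_cycle["mean"] - reference["mean"]
comparison["max_diff"] = per_cycle["max"] - reference["max"]
```

The reference cycle compares to itself with a difference of zero, which makes it easy to spot in the output.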
identify_golden_cycles
identify_golden_cycles(
metric: str = "value_double",
method: str = "low_variability",
top_n: int = 5,
) -> List[str]
Identify the best performing cycles (golden cycles).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `metric` | `str` | Column name to evaluate. | `'value_double'` |
| `method` | `str` | Method for identification ('low_variability', 'high_mean', 'target_value'). | `'low_variability'` |
| `top_n` | `int` | Number of golden cycles to identify. | `5` |
Returns:
| Type | Description |
|---|---|
| `List[str]` | List of cycle UUIDs identified as golden cycles. |
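The ranking idea behind two of the methods can be sketched as follows. `rank_cycles` is a hypothetical helper, and the scoring rules are assumptions: 'low_variability' is taken to mean the smallest per-cycle standard deviation and 'high_mean' the largest per-cycle mean. 'target_value' is left out because the signature above does not show how a target is supplied.

```python
from typing import List

import pandas as pd


def rank_cycles(
    df: pd.DataFrame,
    metric: str = "value_double",
    method: str = "low_variability",
    top_n: int = 5,
) -> List[str]:
    """Hypothetical helper: rank cycles by a per-cycle score of `metric`."""
    grouped = df.groupby("cycle_uuid")[metric]
    if method == "low_variability":
        # Assumed rule: the most stable cycles have the smallest std.
        return grouped.std().nsmallest(top_n).index.tolist()
    if method == "high_mean":
        # Assumed rule: the best cycles have the largest mean.
        return grouped.mean().nlargest(top_n).index.tolist()
    raise ValueError(f"Unsupported method in this sketch: {method}")


# Hypothetical values that already carry a cycle label.
labeled = pd.DataFrame({
    "value_double": [1.0, 1.1, 2.0, 4.0, 5.0, 5.5],
    "cycle_uuid": ["c1", "c1", "c2", "c2", "c3", "c3"],
})

most_stable = rank_cycles(labeled, method="low_variability", top_n=2)
highest = rank_cycles(labeled, method="high_mean", top_n=1)
```

With this sample, `c1` and `c3` vary least, while `c3` has the highest mean.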