ts_shape.features.cycles.cycle_processor

Classes:

  • CycleDataProcessor

    Processes cycle-based data and values: splits, merges, and groups DataFrames by cycle, and supports grouping and transformations by cycle UUID.

CycleDataProcessor

CycleDataProcessor(cycles_df: DataFrame, values_df: DataFrame, cycle_uuid_col: str = 'cycle_uuid', systime_col: str = 'systime')

Bases: Base

Processes cycle-based data and values. It splits, merges, and groups DataFrames by cycle, and supports grouping and transformations by cycle UUID. The constructor works on copies of both input DataFrames and converts 'cycle_start', 'cycle_end', and the timestamp column to datetime.

Parameters:

  • cycles_df

    (DataFrame) –

    DataFrame containing the columns 'cycle_start' and 'cycle_end', plus the cycle UUID column named by cycle_uuid_col ('cycle_uuid' by default).

  • values_df

    (DataFrame) –

    DataFrame containing the values, with timestamps in the column named by systime_col ('systime' by default).

  • cycle_uuid_col

    (str, default: 'cycle_uuid' ) –

    Name of the column representing cycle UUIDs.

  • systime_col

    (str, default: 'systime' ) –

    Name of the column representing the timestamps for the values.

Methods:

  • get_dataframe

    Returns the processed DataFrame.

  • group_by_cycle_uuid

    Groups the DataFrame by the cycle_uuid column, resulting in a list of DataFrames, each containing data for one cycle.

  • merge_dataframes_by_cycle

    Merges the values DataFrame with the cycles DataFrame based on the cycle time intervals.

  • split_by_cycle

    Splits the values DataFrame by cycles defined in the cycles DataFrame.

  • split_dataframes_by_group

    Splits a list of DataFrames by groups based on a specified column.

Source code in src/ts_shape/features/cycles/cycle_processor.py
def __init__(self, cycles_df: pd.DataFrame, values_df: pd.DataFrame, cycle_uuid_col: str = "cycle_uuid", systime_col: str = "systime"):
    """
    Initializes the CycleDataProcessor with cycles and values DataFrames.

    Args:
        cycles_df: DataFrame containing columns 'cycle_start', 'cycle_end', and 'cycle_uuid'.
        values_df: DataFrame containing the values and timestamps in the 'systime' column.
        cycle_uuid_col: Name of the column representing cycle UUIDs.
        systime_col: Name of the column representing the timestamps for the values.
    """
    super().__init__(values_df)  # Call the parent constructor
    self.values_df = values_df.copy()  # Initialize self.values_df explicitly
    self.cycles_df = cycles_df.copy()
    self.cycle_uuid_col = cycle_uuid_col
    self.systime_col = systime_col

    # Ensure proper datetime format
    self.cycles_df['cycle_start'] = pd.to_datetime(self.cycles_df['cycle_start'])
    self.cycles_df['cycle_end'] = pd.to_datetime(self.cycles_df['cycle_end'])
    self.values_df[systime_col] = pd.to_datetime(self.values_df[systime_col])

    logging.info("CycleDataProcessor initialized with cycles and values DataFrames.")
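
The input shapes the constructor expects can be sketched with plain pandas. This is a minimal illustration that mirrors the conversions in the listing above; it does not import ts_shape, and the 'value_double' column name and sample timestamps are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample inputs; only the cycle_* and systime column names
# come from the documented defaults.
cycles_df = pd.DataFrame({
    "cycle_start": ["2024-01-01 00:00:00", "2024-01-01 00:10:00"],
    "cycle_end": ["2024-01-01 00:05:00", "2024-01-01 00:15:00"],
    "cycle_uuid": ["cycle-a", "cycle-b"],
})
values_df = pd.DataFrame({
    "systime": ["2024-01-01 00:01:00", "2024-01-01 00:07:00", "2024-01-01 00:12:00"],
    "value_double": [1.0, 2.0, 3.0],  # assumed value column name
})

# Same steps as the constructor: copy first, then convert to datetime,
# so the caller's DataFrames are left untouched.
cycles = cycles_df.copy()
values = values_df.copy()
cycles["cycle_start"] = pd.to_datetime(cycles["cycle_start"])
cycles["cycle_end"] = pd.to_datetime(cycles["cycle_end"])
values["systime"] = pd.to_datetime(values["systime"])
```

Because the conversion happens on copies, string timestamps in the caller's DataFrames remain strings after the processor is constructed.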

get_dataframe

get_dataframe() -> DataFrame

Returns the processed DataFrame.

Source code in src/ts_shape/utils/base.py
def get_dataframe(self) -> pd.DataFrame:
    """Returns the processed DataFrame."""
    return self.dataframe

group_by_cycle_uuid

group_by_cycle_uuid(data: Optional[DataFrame] = None) -> List[DataFrame]

Groups the DataFrame by the cycle UUID column (cycle_uuid_col), resulting in a list of DataFrames, each containing data for one cycle.

Parameters:

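  • data

    (Optional[DataFrame], default: None ) –

    DataFrame containing the data to be grouped by cycle_uuid. If None, uses the internal values_df, which must already contain the cycle UUID column (for example, after calling merge_dataframes_by_cycle).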

Returns:

List of DataFrames, each containing data for a unique cycle_uuid.

Source code in src/ts_shape/features/cycles/cycle_processor.py
def group_by_cycle_uuid(self, data: Optional[pd.DataFrame] = None) -> List[pd.DataFrame]:
    """
    Group the DataFrame by the cycle_uuid column, resulting in a list of DataFrames, each containing data for one cycle.

    Args:
        data: DataFrame containing the data to be grouped by cycle_uuid. If None, uses the internal values_df.

    Return:
        List of DataFrames, each containing data for a unique cycle_uuid.
    """
    if data is None:
        data = self.values_df

    grouped_dataframes = [group for _, group in data.groupby(self.cycle_uuid_col)]
    logging.info(f"Grouped data into {len(grouped_dataframes)} cycle UUID groups.")
    return grouped_dataframes
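
The grouping step in the listing above can be tried on its own with plain pandas. This is a sketch of the same list comprehension, not a call into ts_shape; the sample data and the 'value' column are assumptions.

```python
import pandas as pd

# Hypothetical data: three rows with a cycle UUID, one row without.
data = pd.DataFrame({
    "cycle_uuid": ["b", "a", "b", None],
    "value": [1, 2, 3, 4],
})

# Same comprehension as in group_by_cycle_uuid: one DataFrame per key.
groups = [group for _, group in data.groupby("cycle_uuid")]

for group in groups:
    print(group["cycle_uuid"].iloc[0], group["value"].tolist())
```

With pandas defaults, `groupby` sorts the group keys and skips rows whose key is missing, so the list is ordered by cycle UUID rather than by first appearance, and the row without a UUID ends up in no group.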

merge_dataframes_by_cycle

merge_dataframes_by_cycle() -> DataFrame

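Merges the values DataFrame with the cycles DataFrame based on the cycle time intervals. Each value row whose timestamp lies between 'cycle_start' and 'cycle_end' (both inclusive) is labeled with that cycle's UUID; rows that fall in no cycle are dropped from the result.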

Returns:

DataFrame with an added 'cycle_uuid' column.

Source code in src/ts_shape/features/cycles/cycle_processor.py
def merge_dataframes_by_cycle(self) -> pd.DataFrame:
    """
    Merges the values DataFrame with the cycles DataFrame based on the cycle time intervals. 
    Appends the 'cycle_uuid' to the values DataFrame.

    Return:
        DataFrame with an added 'cycle_uuid' column.
    """
    # Merge based on systime falling within cycle_start and cycle_end
    self.values_df[self.cycle_uuid_col] = None

    for _, row in self.cycles_df.iterrows():
        mask = (self.values_df[self.systime_col] >= row['cycle_start']) & (self.values_df[self.systime_col] <= row['cycle_end'])
        self.values_df.loc[mask, self.cycle_uuid_col] = row[self.cycle_uuid_col]

    merged_df = self.values_df.dropna(subset=[self.cycle_uuid_col])
    logging.info(f"Merged DataFrame contains {len(merged_df)} records.")
    return merged_df
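
The interval labeling in the listing above can be sketched with plain pandas to make two edge cases visible: overlapping cycles and values outside every cycle. The code reproduces the loop from the listing without importing ts_shape; the sample timestamps and the 'value' column are assumptions.

```python
import pandas as pd

# Hypothetical cycles that overlap between 00:04 and 00:05.
cycles = pd.DataFrame({
    "cycle_start": pd.to_datetime(["2024-01-01 00:00:00", "2024-01-01 00:04:00"]),
    "cycle_end": pd.to_datetime(["2024-01-01 00:05:00", "2024-01-01 00:10:00"]),
    "cycle_uuid": ["a", "b"],
})
values = pd.DataFrame({
    "systime": pd.to_datetime([
        "2024-01-01 00:01:00",  # only inside cycle a
        "2024-01-01 00:04:30",  # inside both a and b
        "2024-01-01 00:20:00",  # inside no cycle
    ]),
    "value": [1.0, 2.0, 3.0],
})

# Same loop as in merge_dataframes_by_cycle: label rows cycle by cycle.
values["cycle_uuid"] = None
for _, row in cycles.iterrows():
    mask = (values["systime"] >= row["cycle_start"]) & (values["systime"] <= row["cycle_end"])
    values.loc[mask, "cycle_uuid"] = row["cycle_uuid"]

# Rows that matched no cycle still hold None and are dropped here.
merged = values.dropna(subset=["cycle_uuid"])
```

Since later cycles overwrite earlier labels, a timestamp covered by several cycles keeps the UUID of the last matching row in cycles_df. Note also that the method in the listing writes the UUID column into the processor's internal values_df, so a later group_by_cycle_uuid() call without arguments sees that column.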

split_by_cycle

split_by_cycle() -> Dict[str, DataFrame]

Splits the values DataFrame by cycles defined in the cycles DataFrame. Each cycle is defined by a start and end time, and the values whose timestamps lie within that range (both bounds inclusive) are copied into that cycle's DataFrame.

Returns:

Dictionary where keys are cycle_uuids and values are DataFrames with the corresponding cycle data.

Source code in src/ts_shape/features/cycles/cycle_processor.py
def split_by_cycle(self) -> Dict[str, pd.DataFrame]:
    """
    Splits the values DataFrame by cycles defined in the cycles DataFrame. 
    Each cycle is defined by a start and end time, and the corresponding values are filtered accordingly.

    Return:
        Dictionary where keys are cycle_uuids and values are DataFrames with the corresponding cycle data.
    """
    result = {}
    for _, row in self.cycles_df.iterrows():
        mask = (self.values_df[self.systime_col] >= row['cycle_start']) & (self.values_df[self.systime_col] <= row['cycle_end'])
        result[row[self.cycle_uuid_col]] = self.values_df[mask].copy()

    logging.info(f"Split {len(result)} cycles.")
    return result
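
The inclusive bounds in the listing above matter when one cycle ends exactly where the next begins. The following sketch repeats the same loop with plain pandas, without importing ts_shape, on made-up back-to-back cycles; the sample data and the 'value' column are assumptions.

```python
import pandas as pd

# Hypothetical back-to-back cycles sharing the boundary 00:05:00.
cycles = pd.DataFrame({
    "cycle_start": pd.to_datetime(["2024-01-01 00:00:00", "2024-01-01 00:05:00"]),
    "cycle_end": pd.to_datetime(["2024-01-01 00:05:00", "2024-01-01 00:10:00"]),
    "cycle_uuid": ["a", "b"],
})
values = pd.DataFrame({
    "systime": pd.to_datetime([
        "2024-01-01 00:02:00",
        "2024-01-01 00:05:00",  # exactly on the shared boundary
        "2024-01-01 00:08:00",
    ]),
    "value": [1, 2, 3],
})

# Same loop as in split_by_cycle: one filtered copy per cycle UUID.
result = {}
for _, row in cycles.iterrows():
    mask = (values["systime"] >= row["cycle_start"]) & (values["systime"] <= row["cycle_end"])
    result[row["cycle_uuid"]] = values[mask].copy()
```

Here the boundary row appears in both result["a"] and result["b"]. If each value should belong to exactly one cycle, the cycle intervals passed in would need to be non-touching, or the result deduplicated afterwards.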

split_dataframes_by_group

split_dataframes_by_group(dfs: List[DataFrame], column: str) -> List[DataFrame]

Splits a list of DataFrames by groups based on a specified column. This function performs a groupby operation on each DataFrame in the list and then flattens the result.

Parameters:

  • dfs

    (List[DataFrame]) –

    List of DataFrames to be split.

  • column

    (str) –

    Column name to group by.

Returns:

List of DataFrames, each corresponding to a group in the original DataFrames.

Source code in src/ts_shape/features/cycles/cycle_processor.py
def split_dataframes_by_group(self, dfs: List[pd.DataFrame], column: str) -> List[pd.DataFrame]:
    """
    Splits a list of DataFrames by groups based on a specified column. 
    This function performs a groupby operation on each DataFrame in the list and then flattens the result.

    Args:
        dfs: List of DataFrames to be split.
        column: Column name to group by.

    Return:
        List of DataFrames, each corresponding to a group in the original DataFrames.
    """
    split_dfs = []
    for df in dfs:
        groups = df.groupby(column)
        for _, group in groups:
            split_dfs.append(group)

    logging.info(f"Split data into {len(split_dfs)} groups based on column '{column}'.")
    return split_dfs
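
The group-then-flatten behaviour in the listing above can be sketched with plain pandas. The loop below is the same as in the method and does not import ts_shape; the 'part' and 'value' columns and the sample data are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical per-cycle DataFrames, e.g. as produced by group_by_cycle_uuid.
dfs = [
    pd.DataFrame({"cycle_uuid": ["a", "a"], "part": ["x", "y"], "value": [1, 2]}),
    pd.DataFrame({"cycle_uuid": ["b", "b", "b"], "part": ["y", "y", "x"], "value": [3, 4, 5]}),
]

# Same nested loop as in split_dataframes_by_group: group each DataFrame
# by the column, then append every group to one flat list.
split_dfs = []
for df in dfs:
    for _, group in df.groupby("part"):
        split_dfs.append(group)

print([g["value"].tolist() for g in split_dfs])
```

The flat list keeps the order of the input DataFrames, with groups inside each one ordered by the sorted group key. The group key itself is not returned, so it has to be read back from the column of each resulting DataFrame.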