s3proxy_parquet_loader

S3ProxyDataAccess

S3ProxyDataAccess(
    start_timestamp: str,
    end_timestamp: str,
    uuids: List[str],
    s3_config: Dict[str, str],
)

A class to access timeseries data via an S3 proxy. This class retrieves data for specified UUIDs within a defined time range, with the option to output data as Parquet files or as a single combined DataFrame.

Initialize the S3ProxyDataAccess object.

:param start_timestamp: Start timestamp in "Year-Month-Day Hour:Minute:Second" format.
:param end_timestamp: End timestamp in "Year-Month-Day Hour:Minute:Second" format.
:param uuids: List of UUIDs to retrieve data for.
:param s3_config: Configuration dictionary for S3 connection.
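As a rough illustration of the timestamp parameters, the sketch below parses the two strings and lists the hourly slots they span. It is not the class's actual implementation: the `strptime` pattern `%Y-%m-%d %H:%M:%S` is an assumed reading of the documented format, and `hourly_slots` is a hypothetical helper introduced only for this example.

```python
from datetime import datetime, timedelta
from typing import List

# Assumed strptime equivalent of "Year-Month-Day Hour:Minute:Second".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def hourly_slots(start_timestamp: str, end_timestamp: str) -> List[datetime]:
    """Return the start of every hour touched by the range (hypothetical helper)."""
    start = datetime.strptime(start_timestamp, TIMESTAMP_FORMAT)
    end = datetime.strptime(end_timestamp, TIMESTAMP_FORMAT)
    if end < start:
        raise ValueError("end_timestamp must not precede start_timestamp")

    # Truncate the start to the top of its hour, then step one hour at a time.
    slot = start.replace(minute=0, second=0, microsecond=0)
    slots = []
    while slot <= end:
        slots.append(slot)
        slot += timedelta(hours=1)
    return slots


slots = hourly_slots("2024-01-01 22:30:00", "2024-01-02 01:15:00")
print([s.strftime("%Y-%m-%d %H") for s in slots])
```

Malformed timestamps raise `ValueError` from `strptime`, so a check like this can fail early, before any request is made.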

fetch_data_as_parquet

fetch_data_as_parquet(output_dir: str)

Retrieves timeseries data from S3 and saves it as Parquet files. Each file is saved in a directory structure of UUID/year/month/day/hour.

:param output_dir: Base directory to save the Parquet files.
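The UUID/year/month/day/hour layout can be pictured with a small path-building sketch. The function name `slot_directory` is hypothetical, and the zero-padding of month, day, and hour is an assumption; the documentation states only the order of the path components.

```python
from datetime import datetime
from pathlib import Path


def slot_directory(output_dir: str, uuid: str, slot: datetime) -> Path:
    """Build output_dir/UUID/year/month/day/hour for one time slot (hypothetical helper)."""
    # Component order follows the documented layout; zero-padding is assumed.
    return (
        Path(output_dir)
        / uuid
        / f"{slot:%Y}"
        / f"{slot:%m}"
        / f"{slot:%d}"
        / f"{slot:%H}"
    )


path = slot_directory("out", "1234-abcd", datetime(2024, 3, 5, 7))
print(path.as_posix())  # out/1234-abcd/2024/03/05/07
```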

fetch_data_as_dataframe

fetch_data_as_dataframe() -> pd.DataFrame

Retrieves timeseries data from S3 and returns it as a single DataFrame.

:return: A combined DataFrame with data for all specified UUIDs and time slots.
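One plausible way to combine per-UUID data into a single DataFrame is `pd.concat`, sketched below. The column names (`systime`, `value`, `uuid`) and the sample frames are invented for illustration; the schema that this method actually returns is not documented here.

```python
import pandas as pd

# Hypothetical per-UUID frames standing in for data fetched from S3.
frames = {
    "uuid-a": pd.DataFrame({"systime": [1, 2], "value": [0.1, 0.2]}),
    "uuid-b": pd.DataFrame({"systime": [1], "value": [9.9]}),
}

# Tag each frame with its UUID so rows stay attributable after concatenation.
parts = [df.assign(uuid=uuid) for uuid, df in frames.items()]

# ignore_index=True gives the combined frame a fresh 0..n-1 index.
combined = pd.concat(parts, ignore_index=True)
print(combined)
```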