parquet_loader
ParquetLoader
ParquetLoader(base_path: str)
This class provides class methods to load parquet files from a specified directory structure.
Initialize the ParquetLoader with the base directory path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `base_path` | `str` | The base directory where parquet files are stored. | required |
load_all_files (classmethod)
load_all_files(base_path: str) -> pd.DataFrame
Loads all parquet files in the specified base directory into a single pandas DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `base_path` | `str` | The base directory where parquet files are stored. | required |
Returns:
| Type | Description |
|---|---|
| `pd.DataFrame` | A DataFrame containing all the data from the parquet files. |
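As a rough illustration of what loading every file involves, the sketch below separates file discovery from reading. It is not the library's implementation: `find_parquet_files` and `load_all` are hypothetical names, and the recursive search is an assumption based on the nested directory layout described elsewhere on this page. The pandas calls (`pd.read_parquet`, `pd.concat`) need pandas plus a Parquet engine such as pyarrow.

```python
from pathlib import Path
from typing import List


def find_parquet_files(base_path: str) -> List[Path]:
    """Return every *.parquet file under base_path, in sorted order."""
    # Assumption: the search is recursive, so nested folders are included.
    return sorted(Path(base_path).rglob("*.parquet"))


def load_all(base_path: str):
    """Hypothetical stand-in for load_all_files: read each file, then concatenate."""
    import pandas as pd  # imported here so file discovery needs only the stdlib

    frames = [pd.read_parquet(path) for path in find_parquet_files(base_path)]
    # pd.concat raises on an empty list, so return an empty frame in that case.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Keeping discovery separate makes it easy to inspect which files would be read before paying the cost of loading them.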
load_by_time_range (classmethod)
load_by_time_range(
base_path: str,
start_time: Timestamp,
end_time: Timestamp,
) -> pd.DataFrame
Loads parquet files that fall within a specified time range based on the directory structure.
The directory structure is expected to be in the format YYYY/MM/DD/HH.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `base_path` | `str` | The base directory where parquet files are stored. | required |
| `start_time` | `Timestamp` | The start timestamp. | required |
| `end_time` | `Timestamp` | The end timestamp. | required |
Returns:
| Type | Description |
|---|---|
| `pd.DataFrame` | A DataFrame containing the data from the parquet files within the time range. |
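To make the YYYY/MM/DD/HH convention concrete, here is a minimal sketch of how a file's folder path can be turned into an hour and compared against the range. The helper names `folder_hour` and `files_in_time_range` are hypothetical. Two points are assumptions, since this page does not state them: both bounds are treated as inclusive, and the start is floored to the hour so the folder containing it is kept. The sketch uses `datetime`; a timezone-naive `pd.Timestamp` should behave the same way because it subclasses `datetime`.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def folder_hour(file_path: Path, base: Path) -> Optional[datetime]:
    """Parse the YYYY/MM/DD/HH folders above a file into the hour they represent."""
    parts = file_path.relative_to(base).parts
    if len(parts) < 5:  # four folder levels plus the file name
        return None
    try:
        year, month, day, hour = (int(part) for part in parts[:4])
        return datetime(year, month, day, hour)
    except ValueError:  # non-numeric folder name or impossible date
        return None


def files_in_time_range(
    base_path: str, start_time: datetime, end_time: datetime
) -> List[Path]:
    """Select parquet files whose hour folder lies between start_time and end_time."""
    base = Path(base_path)
    # Assumption: floor the start to the hour so the folder containing it is kept.
    first_hour = start_time.replace(minute=0, second=0, microsecond=0)
    selected = []
    for path in sorted(base.rglob("*.parquet")):
        hour = folder_hour(path, base)
        if hour is not None and first_hour <= hour <= end_time:
            selected.append(path)
    return selected
```

Files that do not sit under a full four-level date folder are skipped rather than raising, which is one possible design choice among several.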
load_by_uuid_list (classmethod)
load_by_uuid_list(
base_path: str, uuid_list: list
) -> pd.DataFrame
Loads parquet files that match any UUID in the specified list.
The UUIDs are expected to be part of the file names.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `base_path` | `str` | The base directory where parquet files are stored. | required |
| `uuid_list` | `list` | A list of UUIDs to filter the files. | required |
Returns:
| Type | Description |
|---|---|
| `pd.DataFrame` | A DataFrame containing the data from the parquet files with matching UUIDs. |
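The file-name matching can be pictured with the short sketch below. It assumes a plain substring test against each file name and a recursive search of the base directory; neither detail is confirmed by this page, and `files_matching_uuids` is a hypothetical helper rather than part of `ParquetLoader`.

```python
from pathlib import Path
from typing import Iterable, List


def files_matching_uuids(base_path: str, uuid_list: Iterable) -> List[Path]:
    """Return parquet files whose file name contains any of the given UUIDs."""
    # Normalise to str so uuid.UUID objects and plain strings both work.
    wanted = [str(u) for u in uuid_list]
    return [
        path
        for path in sorted(Path(base_path).rglob("*.parquet"))
        # Assumption: a UUID "matches" when it appears anywhere in the file name.
        if any(u in path.name for u in wanted)
    ]
```

With this approach an empty `uuid_list` selects no files at all.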
load_files_by_time_range_and_uuids (classmethod)
load_files_by_time_range_and_uuids(
base_path: str,
start_time: Timestamp,
end_time: Timestamp,
uuid_list: list,
) -> pd.DataFrame
Loads parquet files that fall within a specified time range and match any UUID in the list.
The directory structure is expected to be in the format YYYY/MM/DD/HH, and UUIDs are part of the file names.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `base_path` | `str` | The base directory where parquet files are stored. | required |
| `start_time` | `Timestamp` | The start timestamp. | required |
| `end_time` | `Timestamp` | The end timestamp. | required |
| `uuid_list` | `list` | A list of UUIDs to filter the files. | required |
Returns:
| Type | Description |
|---|---|
| `pd.DataFrame` | A DataFrame containing the data from the parquet files that meet both criteria. |
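A file has to pass both checks to be loaded, which the following sketch expresses as a single pass over the directory tree. As before, this is an illustration under stated assumptions, not the library's code: `select_files` is a hypothetical name, the time bounds are treated as inclusive with the start floored to the hour, and a UUID matches when it appears anywhere in the file name.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable, List


def select_files(
    base_path: str, start_time: datetime, end_time: datetime, uuid_list: Iterable
) -> List[Path]:
    """Return parquet files that are inside the time range AND match a UUID."""
    base = Path(base_path)
    wanted = [str(u) for u in uuid_list]
    # Assumption: floor the start to the hour so its own folder is included.
    first_hour = start_time.replace(minute=0, second=0, microsecond=0)
    selected = []
    for path in sorted(base.rglob("*.parquet")):
        parts = path.relative_to(base).parts
        if len(parts) < 5:  # expects YYYY/MM/DD/HH/<file>
            continue
        try:
            hour = datetime(*(int(part) for part in parts[:4]))
        except ValueError:  # folder names that do not form a valid date
            continue
        in_range = first_hour <= hour <= end_time
        has_uuid = any(u in path.name for u in wanted)
        if in_range and has_uuid:
            selected.append(path)
    return selected
```

Checking the folder hour first is cheap and prunes most files before any file name comparison takes place.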