# Concept
ts-shape is a lightweight, composable toolkit for shaping time series data into analysis-ready DataFrames. It focuses on three pillars: loading, transforming, and extracting higher-level features/events — with a consistent, Pandas-first interface.
## Architecture Overview

```mermaid
flowchart TD
    A[Loaders: Timeseries + Metadata] --> B[Combine: join on uuid]
    B --> C[Transform: Filters / Functions / Time Functions / Calculator]
    C --> D[Features: Stats / Time Stats / Cycles]
    D --> E[Events: Quality / Maintenance / Production / Engineering]
```
Core ideas:

- DataFrame-in, DataFrame-out: Every stage accepts and returns Pandas DataFrames for easy composition.
- Simple schema: Timeseries frames use a compact set of typed columns; metadata/enrichment joins on a stable `uuid` key.
- Modular blocks: Use only what you need; loaders, transforms, features, and events are decoupled.
## Data Model

Timeseries DataFrame (typical columns):

- `uuid`: string identifier for a signal/series
- `systime`: timestamp (tz-aware recommended)
- `value_double`, `value_integer`, `value_string`, `value_bool`: value channels (one or more may be present)
- `is_delta`: boolean flag indicating delta semantics (optional)

Metadata DataFrame:

- Indexed by `uuid` or has a `uuid` column
- Arbitrary columns describing the signal (label, unit, config.*)

Conventions:

- Join key is `uuid` by default.
- Keep values narrow: prefer one type-specific value column where possible.
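A minimal sketch of both frames, built directly from the columns described above (values and dtypes are illustrative):

```python
import pandas as pd

# Minimal timeseries frame: one row per observation, typed value channels
ts_df = pd.DataFrame({
    "uuid": ["id-1", "id-1", "id-2"],
    "systime": pd.to_datetime([
        "2024-01-01T00:00:00Z",
        "2024-01-01T00:01:00Z",
        "2024-01-01T00:00:30Z",
    ]),
    "value_double": [21.5, 21.7, None],
    "value_string": [None, None, "OK"],
})

# Metadata frame: uuid-indexed, with flattened config.* columns
meta_df = pd.DataFrame({
    "uuid": ["id-1", "id-2"],
    "label": ["temperature", "machine_state"],
    "unit": ["degC", None],
    "config.sampling_rate": [60.0, None],
}).set_index("uuid")
```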
## Loaders
Timeseries:
- Parquet folder loader: Recursively reads parquet files from local/remote mounts.
- S3 proxy parquet loader: Streams parquet via S3-compatible endpoints.
- Azure Blob parquet loader: Loads parquet files from containers; supports time-based folder structure (parquet/YYYY/MM/DD/HH) and UUID filters.
- TimescaleDB loader: Streams rows by UUID and time range; can emit parquet partitioned by hour.
Metadata:
- JSON metadata loader: Robustly ingests JSON in multiple shapes (list-of-records, dicts of lists/dicts), flattens `config` into columns, and indexes by `uuid`.

All loaders expose either a DataFrame-returning method (e.g., `fetch_data_as_dataframe`, `to_df`) or a parquet materialization method when desired.
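The exact import paths aren't listed in this overview, so as a plain-pandas illustration of what the parquet folder loader does, the sketch below recursively reads a parquet folder into one frame (the path is a placeholder):

```python
from pathlib import Path

import pandas as pd

def load_parquet_folder(root: str) -> pd.DataFrame:
    # Recursively collect parquet files, then concatenate into one frame.
    # Stand-in for the parquet folder loader; the real loader additionally
    # supports remote mounts and UUID filtering.
    files = sorted(Path(root).rglob("*.parquet"))
    return pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)

ts_df = load_parquet_folder("data/parquet")  # placeholder path
```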
## Combination Layer

Use `DataIntegratorHybrid.combine_data(...)` to merge timeseries and metadata sources into one frame:

- Accepts DataFrames or source objects (with `fetch_data_as_dataframe` / `fetch_metadata`).
- Merges on `uuid` (configurable), supporting different join strategies (`left`, `inner`, ...).
Example:

```python
from ts_shape.loader.combine.integrator import DataIntegratorHybrid

combined = DataIntegratorHybrid.combine_data(
    timeseries_sources=[ts_df_or_loader],
    metadata_sources=[meta_df_or_loader],
    uuids=["id-1", "id-2"],
    join_key="uuid",
    merge_how="left",
)
```
## Transform
Reusable blocks to reshape and clean data:
- Filters: datatype-specific predicates (numeric/string/boolean/datetime) to subset rows or fix values.
- Functions: arbitrary lambda-like transformations for columns.
- Time Functions: timestamp operations (timezone shift, conversion, resampling helpers).
- Calculator: numeric calculators to derive engineered columns.
All transformations accept and return DataFrames, so pipelines compose from small, testable steps, as sketched below.
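An illustration of that contract using plain-pandas stand-ins (the transform classes' import paths aren't listed in this overview; `ts_df` is the sketch frame from the data model section):

```python
import pandas as pd

def filter_numeric_range(df: pd.DataFrame, lo: float, hi: float) -> pd.DataFrame:
    # Filter-style step: keep rows whose numeric value falls inside [lo, hi]
    return df[df["value_double"].between(lo, hi)]

def to_utc(df: pd.DataFrame) -> pd.DataFrame:
    # Time-function-style step: normalize systime to tz-aware UTC
    return df.assign(systime=pd.to_datetime(df["systime"], utc=True))

def add_fahrenheit(df: pd.DataFrame) -> pd.DataFrame:
    # Calculator-style step: derive an engineered column
    return df.assign(value_fahrenheit=df["value_double"] * 9 / 5 + 32)

shaped = ts_df.pipe(to_utc).pipe(filter_numeric_range, 0.0, 100.0).pipe(add_fahrenheit)
```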
## Features
Feature extractors summarize series into compact descriptors:
- Stats: per-type descriptive stats (min/max/mean/std for numeric, frequency for strings, etc.).
- Time Stats: timestamp-specific stats (first/last timestamp, counts per window, coverage).
- Cycles: utilities to identify and process cycles in signals.
`DescriptiveFeatures.compute(...)` can emit a nested dict or a flat DataFrame for easy downstream analysis.
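As a plain-pandas sketch of what the Stats extractor summarizes per signal (the exact parameters of `DescriptiveFeatures.compute(...)` aren't spelled out in this overview; `ts_df` is the earlier sketch frame):

```python
# Per-uuid descriptive stats for the numeric channel
stats = (
    ts_df.dropna(subset=["value_double"])
    .groupby("uuid")["value_double"]
    .agg(["min", "max", "mean", "std"])
    .reset_index()
)
```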
## Events
Event detectors derive categorical flags and ranges from raw signals:
- Quality: outlier detection, SPC rules, tolerance deviations.
- Maintenance: downtime and other operational events.
- Production/Engineering: domain patterns extractable from the shaped series.
Each detector takes a DataFrame and returns either annotated frames or event tables.
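For instance, a tolerance-deviation check of the kind the Quality detectors perform can be sketched in plain pandas (column names follow the data model above; the tolerance band is illustrative):

```python
# Flag rows whose numeric value leaves a tolerance band, as an event table
TOL_LO, TOL_HI = 20.0, 25.0  # illustrative tolerance band

events = ts_df.loc[
    ~ts_df["value_double"].between(TOL_LO, TOL_HI) & ts_df["value_double"].notna(),
    ["uuid", "systime", "value_double"],
].assign(event_type="tolerance_deviation")
```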
## Typical Pipeline

- Load
    - Read timeseries (e.g., parquet or DB) into a DataFrame with `uuid`, `systime`, and values.
    - Load metadata JSON and convert to a `uuid`-indexed DataFrame.
- Combine
    - Join timeseries with metadata on `uuid` to enrich context.
- Transform
    - Apply filters/functions/time operations; compute engineered columns.
- Features & Events
    - Compute stats and time stats; identify domain events.
- Output
    - Keep as a DataFrame, write parquet/CSV, or feed to a model/BI tool.
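An end-to-end sketch of this flow. Only `DataIntegratorHybrid.combine_data` is a confirmed import path; the file paths, UUID filter, and the plain-pandas transform and feature steps are placeholders standing in for the library's own blocks:

```python
import pandas as pd

from ts_shape.loader.combine.integrator import DataIntegratorHybrid

# 1. Load (placeholder paths; any loader's DataFrame output works here)
ts_df = pd.read_parquet("data/timeseries/")                     # uuid, systime, value_*
meta_df = pd.read_json("data/metadata.json").set_index("uuid")  # assumes list-of-records JSON

# 2. Combine timeseries and metadata on uuid
combined = DataIntegratorHybrid.combine_data(
    timeseries_sources=[ts_df],
    metadata_sources=[meta_df],
    uuids=["id-1", "id-2"],  # illustrative filter
    join_key="uuid",
    merge_how="left",
)

# 3. Transform: normalize timestamps and derive an engineered column
combined = combined.assign(
    systime=pd.to_datetime(combined["systime"], utc=True),
    value_scaled=combined["value_double"] * 10,
)

# 4. Features: per-signal summary stats
summary = combined.groupby("uuid")["value_double"].agg(["min", "max", "mean"])

# 5. Output
summary.to_parquet("out/summary.parquet")
```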
## Design Principles
- Minimal assumptions: Works with partial columns; you choose the value channel(s) in play.
- Composability: Small building blocks; pure DataFrame IO.
- Performance-aware: Vectorized Pandas ops; chunked DB reads; concurrent IO for remote storage.
- Extensible: Add new loaders, transforms, features, or events with simple, documented interfaces.
## Extending ts-shape

- New loader: implement a class with `fetch_data_as_dataframe()` or an explicit `to_parquet()` flow (see the sketch after this list).
- New transform: write a function that takes and returns a DataFrame; place it under `transform/*`.
- New feature/event: follow existing patterns; accept a DataFrame and return a summary/event frame.
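A minimal custom loader following the first convention. Only the `fetch_data_as_dataframe()` method name comes from the conventions above; the CSV source and column handling are illustrative:

```python
import pandas as pd

class CsvTimeseriesLoader:
    """Illustrative loader: maps a CSV export onto the ts-shape timeseries schema."""

    def __init__(self, path: str):
        self.path = path

    def fetch_data_as_dataframe(self) -> pd.DataFrame:
        df = pd.read_csv(self.path, parse_dates=["systime"])
        # Keep only the schema columns that the file actually provides
        cols = [c for c in ("uuid", "systime", "value_double", "value_string")
                if c in df.columns]
        return df[cols]
```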
## When to Use ts-shape
- You need a quick, pythonic path from raw timeseries + context to analysis-ready tables.
- You want modular building blocks instead of a monolithic framework.
- You operate across storage backends (parquet, S3/Azure, SQL) and prefer a unified DataFrame API.