Concept
ts-shape is a lightweight toolkit for shaping timeseries data into analysis-ready DataFrames.
Architecture
```mermaid
flowchart LR
    subgraph Load
        L1[Parquet]
        L2[S3/Azure]
        L3[TimescaleDB]
        L4[Metadata]
    end
    L1 --> C[Combine]
    L2 --> C
    L3 --> C
    L4 --> C
    C --> T[Transform]
    T --> F[Features]
    F --> E[Events]
    E --> O[Output]
```
Core Principles
| Principle | Description |
|---|---|
| DataFrame-First | Every operation accepts and returns pandas DataFrames |
| Modular | Use only what you need; components are decoupled |
| Composable | Chain operations together like building blocks |
| Consistent Schema | Simple, predictable column layout |
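Because every step takes and returns a DataFrame, steps can be chained with pandas' built-in `DataFrame.pipe`. The sketch below assumes two hypothetical helpers, `drop_missing` and `keep_range`, written in that style; they are illustrations, not ts-shape functions.

```python
import pandas as pd


# Hypothetical helpers (not part of ts-shape): DataFrame in, DataFrame out.
def drop_missing(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Keep only rows where `column` has a value."""
    return df[df[column].notna()]


def keep_range(df: pd.DataFrame, column: str, low: float, high: float) -> pd.DataFrame:
    """Keep rows whose `column` value lies in [low, high]."""
    return df[df[column].between(low, high)]


df = pd.DataFrame({
    "uuid": ["temp_1"] * 4,
    "systime": pd.date_range("2024-01-01", periods=4, freq="min", tz="UTC"),
    "value_double": [20.5, None, 99.0, 21.0],
})

# Each step returns a new DataFrame, so the steps chain like building blocks.
result = (
    df.pipe(drop_missing, "value_double")
      .pipe(keep_range, "value_double", 0.0, 50.0)
)
print(result["value_double"].tolist())  # [20.5, 21.0]
```

The input frame is left unchanged; each step returns a filtered copy.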
Data Model
Timeseries DataFrame
| Column | Type | Description |
|---|---|---|
| uuid | string | Signal/sensor identifier |
| systime | datetime | Timestamp (tz-aware recommended) |
| value_double | float | Numeric measurements |
| value_integer | int | Counter/integer values |
| value_string | string | Categorical data |
| value_bool | bool | Binary states |
| is_delta | bool | Whether the value is a delta rather than an absolute reading (optional) |
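A frame in this layout might look like the following sketch. The signal names and values are made up, and the nullable pandas dtypes (`string`, `Float64`, `Int64`, `boolean`) are one reasonable choice rather than something ts-shape is known to require; they keep integers and booleans intact when most cells are null.

```python
import pandas as pd

# Long format: one row per sample. Each row fills only the value_* column
# matching its signal type; the other value_* cells stay null.
ts_df = pd.DataFrame({
    "uuid": pd.array(["temp_1", "count_1", "mode_1", "door_1"], dtype="string"),
    "systime": pd.to_datetime(
        ["2024-01-01 08:00", "2024-01-01 08:00", "2024-01-01 08:01", "2024-01-01 08:02"]
    ).tz_localize("UTC"),  # tz-aware timestamps
    "value_double": pd.array([21.4, None, None, None], dtype="Float64"),
    "value_integer": pd.array([None, 1520, None, None], dtype="Int64"),
    "value_string": pd.array([None, None, "AUTO", None], dtype="string"),
    "value_bool": pd.array([None, None, None, True], dtype="boolean"),
    "is_delta": pd.array([False, True, False, False], dtype="boolean"),
})
print(ts_df.dtypes)
```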
Metadata DataFrame
| Column | Type | Description |
|---|---|---|
| uuid | string | Signal identifier (join key) |
| label | string | Human-readable name |
| unit | string | Measurement unit |
| `config.*` | any | Additional configuration |
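The relationship between the two frames is a many-to-one join on `uuid`. In plain pandas it could be sketched as below; this only illustrates the idea and is not how `DataIntegratorHybrid` is implemented. The sample signals, labels, and units are invented.

```python
import pandas as pd

ts_df = pd.DataFrame({
    "uuid": ["temp_1", "temp_1", "press_1"],
    "systime": pd.to_datetime(
        ["2024-01-01 08:00", "2024-01-01 08:01", "2024-01-01 08:00"], utc=True
    ),
    "value_double": [21.4, 21.6, 3.2],
})
meta_df = pd.DataFrame({
    "uuid": ["temp_1", "press_1"],
    "label": ["Oven temperature", "Line pressure"],
    "unit": ["degC", "bar"],
})

# Left join keeps every measurement; signals without metadata get NaN labels.
# validate="many_to_one" raises if a uuid appears more than once in the metadata.
enriched = ts_df.merge(meta_df, on="uuid", how="left", validate="many_to_one")
print(enriched[["uuid", "label", "unit", "value_double"]])
```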
Module Reference
Loaders
| Module | Source | Method |
|---|---|---|
| ParquetLoader | Local/remote Parquet files | `load_all_files()` |
| S3ProxyParquetLoader | S3-compatible storage | `fetch_data_as_dataframe()` |
| AzureBlobLoader | Azure Blob containers | `fetch_data_as_dataframe()` |
| TimescaleLoader | TimescaleDB | `fetch_data_as_dataframe()` |
| MetadataLoader | JSON files | `to_df()` |
Transforms
| Module | Purpose |
|---|---|
| NumericFilter | Filter by numeric range, null handling |
| StringFilter | Pattern matching, contains, regex |
| DateTimeFilter | Time range, weekday, hour filters |
| BooleanFilter | Flag-based row filtering |
| CustomFilter | Filtering with pandas query syntax |
| NumericCalc | Derived columns, calculations |
| TimezoneShift | Convert between timezones |
| TimestampConverter | Parse/format timestamps |
| LambdaProcessor | Apply custom functions to columns |
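Most of these transforms have direct pandas counterparts. The sketch below approximates a time-range filter, a numeric range filter, a timezone shift, and an hour-of-day filter with plain pandas; the thresholds, the `Europe/Berlin` timezone, and the sample data are assumptions for illustration, not ts-shape defaults.

```python
import pandas as pd

df = pd.DataFrame({
    "uuid": ["temp_1"] * 5,
    # 06:00, 09:00, 12:00, 15:00, 18:00 UTC on a winter day
    "systime": pd.date_range("2024-01-05 06:00", periods=5, freq=pd.Timedelta(hours=3), tz="UTC"),
    "value_double": [20.0, None, 85.0, 22.5, 23.0],
})

# Time-range filter: compare against a tz-aware timestamp to match the column.
start = pd.Timestamp("2024-01-05 08:00", tz="UTC")
filtered = df[df["systime"] >= start]

# Numeric range filter; between() is False for NaN, so nulls are dropped as well.
filtered = filtered[filtered["value_double"].between(0.0, 50.0)]

# Timezone shift, then an hour-of-day filter evaluated in local time.
filtered = filtered.assign(systime=filtered["systime"].dt.tz_convert("Europe/Berlin"))
filtered = filtered[filtered["systime"].dt.hour.between(8, 16)]
print(filtered)
```

Only the 15:00 UTC reading survives: it becomes 16:00 in Berlin (UTC+1 in January), while the 18:00 UTC reading becomes 19:00 and falls outside the hour window.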
Features
| Module | Output |
|---|---|
| NumericStatistics | min, max, mean, std, percentiles |
| TimestampStats | first, last, count, coverage |
| StringStatistics | value counts, cardinality |
| CycleExtractor | Cycle detection, validation, method suggestion |
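As a rough idea of what these features compute, here is a pandas sketch of a few numeric statistics and timestamp statistics. The definition of coverage used here (share of expected 1-minute slots that contain at least one sample) is an assumption; `TimestampStats` may define it differently.

```python
import pandas as pd

df = pd.DataFrame({
    "systime": pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-01 00:01", "2024-01-01 00:02", "2024-01-01 00:05"],
        utc=True,
    ),
    "value_double": [10.0, 12.0, 11.0, 15.0],
})

values = df["value_double"].dropna()
numeric_stats = {
    "min": values.min(),
    "max": values.max(),
    "mean": values.mean(),
    "std": values.std(),            # sample std (ddof=1), the pandas default
    "p95": values.quantile(0.95),
}

# Coverage (assumed definition): occupied 1-minute slots / expected 1-minute slots.
ts = df["systime"]
expected_slots = int((ts.max() - ts.min()) / pd.Timedelta("1min")) + 1
timestamp_stats = {
    "first": ts.min(),
    "last": ts.max(),
    "count": int(ts.count()),
    "coverage": ts.dt.floor("1min").nunique() / expected_slots,
}
print(numeric_stats)
print(timestamp_stats)
```

Here the gap between 00:02 and 00:05 leaves two empty slots, so coverage is 4 of 6.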
Events - Quality
| Module | Detection |
|---|---|
| OutlierDetection | Z-score, IQR, MAD, IsolationForest |
| StatisticalProcessControl | Western Electric Rules, CUSUM shifts |
| ToleranceDeviation | Specification violations, Cp/Cpk indices |
Events - Production (Traceability)
| Module | Purpose |
|---|---|
| PartProductionTracking | Production by part, daily summaries, totals |
| QualityTracking | NOK/scrap analysis, FPY, defect reasons |
| CycleTimeTracking | Cycle times, slow cycles, trends |
| DowntimeTracking | Downtime by shift/reason, availability |
| ShiftReporting | Shift production, targets, comparisons |
| MachineStateEvents | Run/idle intervals, transitions |
| ChangeoverEvents | Product changeover detection, windows |
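Run/idle intervals of the kind `MachineStateEvents` reports can be derived from a boolean state signal by grouping consecutive equal values. This is a minimal sketch under the assumption that the state is held until the next sample; the signal and timestamps are invented.

```python
import pandas as pd

# Boolean "running" signal; True = run, False = idle.
state = pd.DataFrame({
    "systime": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 08:05", "2024-01-01 08:10",
        "2024-01-01 08:20", "2024-01-01 08:25",
    ], utc=True),
    "value_bool": [True, True, False, True, True],
})

# A new segment starts whenever the state differs from the previous row.
segment = state["value_bool"].ne(state["value_bool"].shift()).cumsum().rename("segment")

intervals = (
    state.groupby(segment)
    .agg(state=("value_bool", "first"), start=("systime", "first"))
    .reset_index(drop=True)
)
# Each interval ends where the next one starts; the last one ends at the final sample.
intervals["end"] = intervals["start"].shift(-1).fillna(state["systime"].iloc[-1])
intervals["duration"] = intervals["end"] - intervals["start"]
print(intervals)
```

Transitions are then simply the boundaries between consecutive rows of `intervals`.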
Events - Engineering
| Module | Purpose |
|---|---|
| SetpointChangeEvents | Step/ramp detection, settling, overshoot |
| StartupEvents | Startup detection |
Advanced Capabilities
Quality & SPC
| Feature | Module | Method |
|---|---|---|
| CUSUM Shift Detection | StatisticalProcessControl | `detect_cusum_shifts()` |
| Western Electric Rules | StatisticalProcessControl | `apply_rules_vectorized()` |
| Rule Interpretations | StatisticalProcessControl | `interpret_violations()` |
| Dynamic Control Limits | StatisticalProcessControl | `calculate_dynamic_control_limits()` |
| Process Capability (Cp/Cpk) | ToleranceDeviation | `compute_capability_indices()` |
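For orientation, here is a standard-library sketch of a standardized tabular CUSUM and of Cp/Cpk. The CUSUM parameters k = 0.5σ and h = 5σ are common textbook choices and may differ from what `detect_cusum_shifts()` uses. The capability function uses the overall sample standard deviation; strictly, that yields Pp/Ppk, whereas textbook Cp/Cpk use a within-subgroup estimate of σ.

```python
import statistics


def cusum_alarms(values, target, sigma, k=0.5, h=5.0):
    """Tabular CUSUM on standardised data; k and h are in units of sigma.

    Returns the indices where the upper or lower cumulative sum exceeds h.
    Both sums restart from zero after each alarm.
    """
    upper = lower = 0.0
    alarms = []
    for i, x in enumerate(values):
        z = (x - target) / sigma
        upper = max(0.0, upper + z - k)  # accumulates upward drift
        lower = max(0.0, lower - z - k)  # accumulates downward drift
        if upper > h or lower > h:
            alarms.append(i)
            upper = lower = 0.0
    return alarms


def capability_indices(values, lsl, usl):
    """Cp and Cpk from the overall sample standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk


readings = [10.0] * 10 + [12.0] * 10  # mean shifts by +2 sigma at index 10
print(cusum_alarms(readings, target=10.0, sigma=1.0))            # [13, 17]
print(capability_indices([9.0, 10.0, 11.0], lsl=7.0, usl=16.0))  # (1.5, 1.0)
```

Cp compares the specification width with the process spread; Cpk additionally penalises an off-centre mean, which is why it drops to 1.0 here while Cp is 1.5.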
Outlier Detection Methods
| Method | Description | Best For |
|---|---|---|
| Z-score | Distance from the mean in standard deviations | Normal distributions |
| IQR | Distance outside the interquartile range | Skewed distributions |
| MAD | Median Absolute Deviation | Robust to extremes |
| IsolationForest | ML-based anomaly detection | Complex patterns |
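The first three methods are simple enough to sketch directly in pandas (IsolationForest needs scikit-learn and is omitted). The thresholds below (3.0 for z-score, 1.5 × IQR, and 3.5 for the MAD-based modified z-score with the 0.6745 constant) are conventional values, not necessarily the defaults in `OutlierDetection`.

```python
import pandas as pd


def zscore_outliers(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    # Distance from the mean in sample standard deviations.
    z = (s - s.mean()) / s.std()
    return z.abs() > threshold


def iqr_outliers(s: pd.Series, factor: float = 1.5) -> pd.Series:
    # Points beyond factor * IQR outside the first/third quartiles.
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - factor * iqr) | (s > q3 + factor * iqr)


def mad_outliers(s: pd.Series, threshold: float = 3.5) -> pd.Series:
    # Modified z-score based on the median absolute deviation.
    # Note: divides by zero if more than half the values are identical.
    median = s.median()
    mad = (s - median).abs().median()
    modified_z = 0.6745 * (s - median) / mad
    return modified_z.abs() > threshold


s = pd.Series([10, 11, 10, 12, 11, 10, 50], dtype=float)
print(zscore_outliers(s).tolist())  # all False
print(iqr_outliers(s).tolist())     # only the last value is True
print(mad_outliers(s).tolist())     # only the last value is True
```

In this example the single extreme reading inflates the mean and standard deviation enough that the z-score rule misses it, while the median- and quartile-based rules flag it. This is why the table recommends z-scores mainly for roughly normal data.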
Cycle Analysis
| Feature | Method | Description |
|---|---|---|
| Method Suggestion | `suggest_method()` | Auto-detect best extraction method |
| Cycle Validation | `validate_cycles()` | Validate duration constraints |
| Overlap Detection | `detect_overlapping_cycles()` | Find and resolve overlaps |
| Extraction Stats | `get_extraction_stats()` | Track success rate |
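Validation and overlap checks reduce to simple interval arithmetic once cycles are available as start/end pairs. The sketch below assumes hypothetical `cycle_start`/`cycle_end` columns and example duration limits of 30 s to 5 min. It compares each cycle only with its immediate predecessor after sorting, which is enough to detect overlaps but not to resolve them.

```python
import pandas as pd

# Hypothetical extracted cycles (column names are illustrative).
cycles = pd.DataFrame({
    "cycle_start": pd.to_datetime([
        "2024-01-01 08:00:00", "2024-01-01 08:01:00",
        "2024-01-01 08:01:50", "2024-01-01 08:10:00",
    ]),
    "cycle_end": pd.to_datetime([
        "2024-01-01 08:00:58", "2024-01-01 08:02:00",
        "2024-01-01 08:02:40", "2024-01-01 08:10:05",
    ]),
})

cycles = cycles.sort_values("cycle_start").reset_index(drop=True)
cycles["duration"] = cycles["cycle_end"] - cycles["cycle_start"]

# Duration constraints (example limits).
cycles["valid_duration"] = cycles["duration"].between(pd.Timedelta("30s"), pd.Timedelta("5min"))

# Overlap: the cycle starts before the previous cycle has ended.
cycles["overlaps_previous"] = cycles["cycle_start"] < cycles["cycle_end"].shift()

# A simple success rate: share of cycles that pass validation.
success_rate = cycles["valid_duration"].mean()
print(cycles)
print(success_rate)
```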
Production Traceability
| Feature | Module | Key Methods |
|---|---|---|
| Part Tracking | PartProductionTracking | `production_by_part()`, `daily_production_summary()` |
| Quality/NOK | QualityTracking | `nok_by_shift()`, `quality_by_part()`, `nok_by_reason()` |
| Cycle Times | CycleTimeTracking | `cycle_time_statistics()`, `detect_slow_cycles()`, `cycle_time_trend()` |
| Downtime | DowntimeTracking | `downtime_by_shift()`, `downtime_by_reason()`, `availability_trend()` |
| Shift Reports | ShiftReporting | `shift_production()`, `shift_targets()`, `shift_comparison()` |
| Machine State | MachineStateEvents | `detect_run_idle()`, `transition_events()`, `state_quality_metrics()` |
| Changeovers | ChangeoverEvents | `detect_changeover()`, `changeover_window()` |
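Shift-level quality figures are essentially group-by aggregations. This sketch assumes a hypothetical three-shift plan (06:00 to 14:00, 14:00 to 22:00, 22:00 to 06:00), one row per part with an `is_ok` flag, and treats every row as a first-pass result, so FPY is simply OK parts divided by total parts. The date is ignored, so all night-shift parts land in one group regardless of which night they belong to.

```python
import pandas as pd

parts = pd.DataFrame({
    "systime": pd.to_datetime([
        "2024-01-01 07:30", "2024-01-01 09:00", "2024-01-01 15:10",
        "2024-01-01 16:45", "2024-01-01 23:30", "2024-01-02 02:15",
    ]),
    "part_number": ["A", "A", "B", "A", "B", "B"],
    "is_ok": [True, False, True, True, True, False],
})


def shift_of(hour: int) -> str:
    # Hypothetical shift plan; adjust to the real shift calendar.
    if 6 <= hour < 14:
        return "early"
    if 14 <= hour < 22:
        return "late"
    return "night"


parts["shift"] = parts["systime"].dt.hour.map(shift_of)

summary = parts.groupby("shift").agg(total=("is_ok", "size"), ok=("is_ok", "sum"))
summary["nok"] = summary["total"] - summary["ok"]
summary["fpy"] = summary["ok"] / summary["total"]  # simplified first-pass yield
print(summary)
```

Grouping by `["shift", "part_number"]` instead gives the per-part breakdown.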
Control Quality KPIs
| Feature | Module | Method |
|---|---|---|
| Time to Settle | SetpointChangeEvents | `time_to_settle()` |
| Rise Time | SetpointChangeEvents | `rise_time()` |
| Overshoot/Undershoot | SetpointChangeEvents | `overshoot_metrics()` |
| Oscillation Analysis | SetpointChangeEvents | `oscillation_frequency()` |
| Decay Rate | SetpointChangeEvents | `decay_rate()` |
| Comprehensive Metrics | SetpointChangeEvents | `control_quality_metrics()` |
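Control quality KPIs for a single setpoint step can be sketched from the sampled response. This version assumes the step happens at the first sample and uses sample-based crossings without interpolation, a 10% to 90% rise time, and a ±2% settling band; these are common conventions, not confirmed `SetpointChangeEvents` defaults.

```python
import pandas as pd


def step_response_metrics(t, y, initial, setpoint, band=0.02):
    """Overshoot, 10-90 % rise time and settling time for one setpoint step."""
    y = pd.Series(y, index=t, dtype=float)
    step_time = y.index[0]  # assumption: the step happens at the first sample
    frac = (y - initial) / (setpoint - initial)  # 0 at the old value, 1 at the setpoint

    overshoot_pct = max(0.0, (frac.max() - 1.0) * 100.0)

    # First samples at or above 10 % and 90 % of the step (IndexError if never reached).
    t10 = frac[frac >= 0.1].index[0]
    t90 = frac[frac >= 0.9].index[0]

    # Settled from the first sample after the last excursion outside the band.
    outside = (frac - 1.0).abs() > band
    if not outside.any():
        settling_time = 0
    else:
        last_out = outside[outside].index[-1]
        later = y.index[y.index > last_out]
        settling_time = later[0] - step_time if len(later) else None  # None: never settled

    return {"overshoot_pct": overshoot_pct, "rise_time": t90 - t10, "settling_time": settling_time}


t = list(range(11))  # seconds
y = [0, 30, 70, 95, 108, 104, 99, 100.5, 100, 100, 100]
metrics = step_response_metrics(t, y, initial=0.0, setpoint=100.0)
print(metrics)  # overshoot about 8 %, rise time 2 s, settling time 6 s
```

Normalising the response to the step size makes the same code work for downward steps.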
Pipeline Pattern
```python
# 1. LOAD
from ts_shape.loader.timeseries.parquet_loader import ParquetLoader
from ts_shape.loader.metadata.metadata_json_loader import MetadataLoader

ts_df = ParquetLoader.load_all_files("data/")
meta_df = MetadataLoader("config/signals.json").to_df()

# 2. COMBINE
from ts_shape.loader.combine.integrator import DataIntegratorHybrid

df = DataIntegratorHybrid.combine_data(
    timeseries_sources=[ts_df],
    metadata_sources=[meta_df],
    join_key="uuid",
)

# 3. TRANSFORM
from ts_shape.transform.filter.datetime_filter import DateTimeFilter
from ts_shape.transform.filter.numeric_filter import NumericFilter

df = DateTimeFilter.filter_after(df, "systime", "2024-01-01")
df = NumericFilter.filter_not_null(df, "value_double")

# 4. ANALYZE
from ts_shape.features.stats.numeric_stats import NumericStatistics
from ts_shape.events.quality.outlier_detection import OutlierDetection

stats = NumericStatistics(df, "value_double")
outliers = OutlierDetection.detect_zscore_outliers(df, "value_double", threshold=3.0)
```
Design Decisions
Why DataFrames?
- Universal: Familiar to most Python data practitioners
- Ecosystem: Works with matplotlib, scikit-learn, and other libraries that accept DataFrames
- Debuggable: Intermediate results are easy to inspect
- Exportable: Save to CSV, Parquet, or a database
Why Modular?
- Lightweight: Import only what you need
- Testable: Each component works independently
- Extensible: Add custom modules alongside the built-in ones
- Maintainable: Clear separation of concerns
Why This Schema?
- Flexible: Not every column is required
- Multi-type: Handles numeric, string, and boolean values
- Joinable: `uuid` enables metadata enrichment
- Sparse-friendly: Unused value columns can simply be null
Extending ts-shape
Custom Loader
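```python
import pandas as pd

class MyDatabaseLoader:
    def __init__(self, connection: str):
        self.conn = connection

    def fetch_data_as_dataframe(self, start: str, end: str) -> pd.DataFrame:
        # Replace this placeholder with a real query for rows between `start` and `end`
        # (e.g. via pd.read_sql), returning uuid, systime and value_* columns.
        df = pd.DataFrame(columns=["uuid", "systime", "value_double"])
        return df
```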
Custom Filter
```python
import pandas as pd

class MyFilter:
    @staticmethod
    def filter_business_hours(df: pd.DataFrame, column: str) -> pd.DataFrame:
        # Keep rows whose timestamp hour is 9 through 16 (in the column's own timezone).
        hours = pd.to_datetime(df[column]).dt.hour
        return df[(hours >= 9) & (hours < 17)]
```
Custom Feature
```python
import pandas as pd

class MyMetrics:
    def __init__(self, df: pd.DataFrame, column: str):
        self.data = df[column].dropna()

    def coefficient_of_variation(self) -> float:
        # Standard deviation relative to the mean; undefined when the mean is zero.
        return self.data.std() / self.data.mean()
```
When to Use ts-shape
| Use Case | ts-shape? |
|---|---|
| Load Parquet/S3/Azure/DB into DataFrames | Yes |
| Filter and transform timeseries | Yes |
| Compute statistics on signals | Yes |
| Detect outliers and events | Yes |
| Real-time streaming | No (use Kafka/Flink) |
| Sub-millisecond latency | No (use specialized libraries) |
| GPU acceleration | No (use cuDF/RAPIDS) |