CatWing Pipeline Design

Quick Reference

Key Entity Tables

DomainCore TablesERD Section
Products products, brands, product_categories, product_attributes, product_images §9a
Suppliers suppliers, x_supplier_brands, x_supplier_products, supplier_available_products §9a
Locations locations, location_relationships, location_settings §9a
Pricing pricing_lists, price_tiers, currencies §9a + §9d
Customers customers, customer_types §9a
Sales sales_orders, sales_order_lines, sales_order_line_updates §9b
Purchasing purchase_orders, purchase_order_lines §9b
Transfers transfer_orders, transfer_order_lines, store_orders §9b
Behaviors product_behaviors, supplier_behaviors, location_behaviors, category_behaviors §9c
Demand demand_forecasts, demand_forecast_products §9c
Inventory inventory_quantities, inventory_strategies, inventory_targets §9c
Recommendations po_recommendations, to_recommendations, assortment_recommendations §9c
Data Quality data_quality_runs, data_quality_checks, data_quality_check_reports, data_health_reports §9c
Statistics daily_*, weekly_*, monthly_*, quarterly_* (16 tables) §10
Global countries, currencies, client_config, users, roles §9d

13-Step Pipeline Overview

StepNameInputOutput
1Data Ingestion ERP APIs, Excel, Market Data S3 raw-data/ (JSON, CSV, XLSX)
2DataQualityGateway S3 raw JSON Staging DB checks → gate decision
3Transform & Load S3 raw JSON / Excel DB core tables (§9a, §9b)
4DataHealthMonitor Freshly loaded DB tables Warning logs + data_health_reports
5BehaviorCalculator Core tables *_behaviors tables (§9c)
6Assortment Optimizer Behaviors + settings assortment_recommendations
7Demand Estimation Behaviors + sales history demand_forecasts, demand_forecast_products
8Aggregation Core + behaviors _stats schema (16 tables)
9AI/ML Preprocessor DB behaviors + forecasts S3 features.parquet
10AI/ML Engine S3 feature Parquet S3 models (.pkl/.json) + config
11Inventory Strategy (MEIO) Forecasts + settings inventory_targets
12AI/ML Inference S3 models + DB targets + inventory forecasts + recommendations → DB
13UpliftMonitor Recommendations + actuals *_recommendation_impacts

Design Principles (abridged)

#Principle
1Data recoverability & reproducibility (S3 = source of truth)
2Idempotent transforms (INSERT ... ON CONFLICT DO UPDATE)
3Multi-tenancy & client isolation (separate SQL Database schemas)
4Per-client customization (subclass overrides)
5Different input ERP data structures (client-specific transforms)
6Strong cohesion, loose coupling
7Maintainability & easy extensibility
8Data scientists work with Parquet
9DB-level domain constraints (lookup tables + FKs)
10Audit-proof configuration (append-only, versioned)
11Explicit naming (no abbreviations, x_ junctions)
12No cyclical data dependencies (strict DAG)
13Minimal nullable FKs (5 approved exceptions)
14No polymorphic tables (per-entity typed tables)
154NF: No bare string enums (FK to lookup tables)
16Cached columns with documented authority + staleness detection
17Structured over schemaless (JSONB only for dynamic data)
18Aggressive DB-level constraints (UNIQUE, NOT NULL, CHECK)
19Serialized pipeline execution (one run at a time per schema)
20No hard deletes on ERP-sourced entities (soft-delete only)
21Atomic pipeline steps (per-step transactions, rollback on failure)
22Date partitioning for time-series tables (monthly/quarterly)
23Dates client-local, timestamps UTC
24Data quality gating (hard gate before DB load)

Glossary

TermDefinition
Echelon A level in the supply chain hierarchy (Store → DC → Warehouse). Each echelon has its own inventory strategies.
MEIO Multi-Echelon Inventory Optimization — coordinated inventory targets across the full supply chain.
Newsvendor Single-period stochastic inventory model. Balances overage cost (Co) vs. underage cost (Cu) using critical ratio Cu/(Cu+Co).
FMR Fast / Medium / Rare — segmentation by % days with sales.
ABC Pareto classification by cumulative revenue — A (top 80%), B (80–95%), C (95–100%).
XYZ Demand variability — X (CV ≤ 0.5), Y (CV ≤ 1.0), Z (CV > 1.0).
Hotness Score Composite 0–100 product ranking: velocity (0.25) + trend (0.15) + shelf (0.15) + margin (0.15) + volume (0.30).
Phase-out Score Algorithmic dead / declining product detection based on sales recency, velocity, and supplier availability.
Upsert INSERT ... ON CONFLICT DO UPDATE — idempotent write ensuring re-running a transform produces identical DB state.
UK UNIQUE constraint (single-column, marked in ERD column definitions).
Junction table Many-to-many cross table, prefixed with x_ (e.g. x_supplier_products).
CASCADE Impact severity: change propagates through 4+ pipeline steps, affects forecasts AND recommendations.
GATE Impact severity: binary on/off — includes or excludes an entity from the entire pipeline.
_stats schema Derivative statistics schema (e.g. cw_200_stats). Droppable and rebuildable from the core schema.
cw_global Shared reference schema for cross-client data: geography, currencies, users, client configuration.