1. Motivation
Performance - full-rebuild bottleneck
The pipeline rebuilds the entire master file from scratch on every run, so a single product metadata correction triggers reprocessing of all products × locations × time periods. Phase 3 computes 14 lag columns, 16 EWM features, trend slopes, and cumulative metrics in one Lambda invocation. No incremental path exists.
RAM limitations
Phase 1 pre-loads all purchase orders, receiving orders, store transactions, and invoice lines into memory simultaneously (CW200_Transformer.preload_data()). Phase 3 merges all products × locations into a single DataFrame with 80+ columns. Both steps scale with catalog size, so runs hit Lambda's 10 GB memory limit as client catalogs grow.
Data consistency - dual-source conflict
Two independent data streams can drift out of sync: (1) the ETL pipeline writes Parquet from ERP APIs, (2) the UI stores user edits to products, assortments, and supplier settings in a separate database. For example, a user marks a product discontinued in the UI, but the next ETL run overwrites that edit with the value from ERP. Using a single SQL database as the authoritative store for both ETL output and UI edits eliminates this conflict.
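One way to picture the rule "ETL must not overwrite UI edits" is a field-level ownership check applied when an ERP refresh is merged into a stored row. The sketch below is illustrative only: UI_MANAGED_FIELDS and merge_product_record are hypothetical names, the field list is an assumption, and the rule that a UI-managed field is never replaced once it is present in the stored row is a simplification.

```python
from typing import Any, Mapping

# Hypothetical list of fields owned by the UI; the real list is not
# specified in this document.
UI_MANAGED_FIELDS = frozenset({"discontinued", "assortment", "supplier_settings"})


def merge_product_record(
    stored: Mapping[str, Any], erp: Mapping[str, Any]
) -> dict[str, Any]:
    """Apply an ERP refresh to a stored row without touching UI-managed fields."""
    merged = dict(stored)
    for field, value in erp.items():
        if field in UI_MANAGED_FIELDS and field in stored:
            # The stored value may be a user edit, so the ERP value is ignored.
            continue
        merged[field] = value
    return merged


stored_row = {"sku": "A1", "name": "Old name", "discontinued": True}
erp_row = {"sku": "A1", "name": "New name", "discontinued": False}
merged_row = merge_product_record(stored_row, erp_row)
# "name" follows ERP, "discontinued" keeps the user's edit.
```

With both writers going through one store, this check can live in a single place instead of being reconciled across Parquet files and a separate UI database.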
Multi-tenancy - client isolation
Client data shares one S3 bucket, separated only by prefix paths (CW-000200/, CW-000151/). A run with a misconfigured client_id could read or write another client's data. The new architecture uses a separate database schema per client and allows no cross-schema queries.
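As a rough sketch of per-client schemas, the schema name could be derived from client_id through one validated function, so that a malformed ID fails loudly instead of resolving to some other path. The ID pattern below is inferred from the two prefixes shown above, and schema_for_client, qualified_table, and the cw_ naming scheme are hypothetical.

```python
import re

# Assumed ID format, inferred from the prefixes CW-000200 and CW-000151.
_CLIENT_ID = re.compile(r"CW-(\d{6})")


def schema_for_client(client_id: str) -> str:
    """Map a client ID such as 'CW-000200' to a schema name such as 'cw_000200'."""
    match = _CLIENT_ID.fullmatch(client_id)
    if match is None:
        raise ValueError(f"malformed client_id: {client_id!r}")
    return f"cw_{match.group(1)}"


def qualified_table(client_id: str, table: str) -> str:
    """Build a schema-qualified table name for exactly one client."""
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table!r}")
    return f"{schema_for_client(client_id)}.{table}"
```

Validation only rejects malformed IDs. A well-formed but wrong client_id would still resolve to a real schema, so the isolation itself would presumably have to come from per-schema database permissions rather than from naming alone.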
Data cyclicity
UI actions feed back into the pipeline as inputs. For example: a user disables a location in the UI → the change is stored in the DB → an event triggers a new master file generation → the pipeline re-computes predictions and recommendations → the results are written back to the DB, updating the same tables the UI reads from. This circular flow (UI → DB → pipeline → DB → UI) must be managed explicitly: each step in the cycle must be idempotent, the pipeline must read the latest UI state as input, and pipeline writes must not overwrite UI-managed fields with stale ERP data (see "Data consistency" above). The new architecture makes this cyclicity a first-class concern rather than an emergent bug.
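The idempotency and field-ownership requirements can be illustrated with an upsert that touches only pipeline-owned columns. This is a minimal sketch, not the target design: it uses Python's built-in sqlite3 as a stand-in for the unspecified SQL database, and the recommendations table, its columns, and write_prediction are hypothetical.

```python
import sqlite3

# Hypothetical table: "enabled" is UI-managed, "prediction" is pipeline-managed.
SCHEMA = """
CREATE TABLE recommendations (
    product_id  TEXT NOT NULL,
    location_id TEXT NOT NULL,
    enabled     INTEGER NOT NULL DEFAULT 1,
    prediction  REAL,
    PRIMARY KEY (product_id, location_id)
)
"""


def write_prediction(
    conn: sqlite3.Connection, product_id: str, location_id: str, prediction: float
) -> None:
    """Upsert a pipeline result; re-running with the same input changes nothing."""
    conn.execute(
        """
        INSERT INTO recommendations (product_id, location_id, prediction)
        VALUES (?, ?, ?)
        ON CONFLICT (product_id, location_id)
        DO UPDATE SET prediction = excluded.prediction
        """,
        (product_id, location_id, prediction),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# The UI disables the product at this location before the pipeline runs.
conn.execute(
    "INSERT INTO recommendations (product_id, location_id, enabled) VALUES (?, ?, 0)",
    ("P1", "L1"),
)
# The same pipeline result is delivered twice, e.g. after a retried event.
write_prediction(conn, "P1", "L1", 12.5)
write_prediction(conn, "P1", "L1", 12.5)
rows = conn.execute(
    "SELECT product_id, location_id, enabled, prediction FROM recommendations"
).fetchall()
```

Because the UPDATE clause names only the pipeline-owned column, a retried or duplicated run leaves a single row per key, and the UI-managed enabled flag keeps whatever the user last set.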