CatWing Pipeline Design

7. S3 Storage Layout

flowchart LR
    subgraph S3["S3: catwing-masters"]
        direction LR
        subgraph RAW["raw-data/  (immutable, append-only)"]
            R1["CW-000200/<br/>2026-02-24/<br/>products.json<br/>sales_orders.json<br/>purchase_orders.json<br/>receiving_orders.json<br/>store_orders.json<br/>transfer_orders.json<br/>inventory.json<br/>pricing_lists.json<br/>locations.json<br/>suppliers.json"]
            R2["CW-000151/<br/>2026-02-24/<br/>..."]
        end
        subgraph IMAGES["product-images/"]
            I1["CW-000200/<br/>{product_id}.jpg<br/>..."]
        end
        subgraph CLIENT["client-data/ (manual uploads)"]
            C1["CW-000200/<br/>supplier-availability/<br/>static_suppliers_list.xlsx<br/>locations.xlsx<br/>supply_network_config.xlsx"]
        end
        subgraph ML["ml-output/ (ML artifacts)"]
            M1["CW-000200/<br/>{run_date}/<br/>features.parquet<br/>targets.parquet<br/>predictions.parquet<br/>models/<br/>classifier.pkl<br/>regressor.pkl<br/>warehouse_model.pkl<br/>config.json"]
        end
        subgraph BACKUP["db-backups/"]
            B1["daily/<br/>2026-02-24.sql.gz<br/>weekly/<br/>2026-W09.sql.gz"]
        end
        R1 ~~~ I1 ~~~ C1 ~~~ M1 ~~~ B1
    end
    style RAW fill:#E8F5E9,stroke:#388E3C
    style IMAGES fill:#E3F2FD,stroke:#1976D2
    style ML fill:#FFF3E0,stroke:#F57C00
    style BACKUP fill:#FCE4EC,stroke:#C62828
    style CLIENT fill:#F3E5F5,stroke:#7B1FA2
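The key layout in the diagram can be captured in a few small helpers. The following is a minimal sketch, not the pipeline's actual code: the function names and the `RAW_ENTITIES` tuple are hypothetical, and the weekly backup name is assumed to use the ISO year and week number (which is consistent with 2026-02-24 falling in `2026-W09`).

```python
from datetime import date

# Entity files listed under raw-data/{client_id}/{date}/ in the diagram.
RAW_ENTITIES = (
    "products", "sales_orders", "purchase_orders", "receiving_orders",
    "store_orders", "transfer_orders", "inventory", "pricing_lists",
    "locations", "suppliers",
)


def raw_key(client_id: str, day: date, entity: str) -> str:
    """Key of one raw ERP export: raw-data/{client_id}/{YYYY-MM-DD}/{entity}.json."""
    if entity not in RAW_ENTITIES:
        raise ValueError(f"unknown entity: {entity}")
    return f"raw-data/{client_id}/{day.isoformat()}/{entity}.json"


def daily_backup_key(day: date) -> str:
    """Key of the nightly dump: db-backups/daily/{YYYY-MM-DD}.sql.gz."""
    return f"db-backups/daily/{day.isoformat()}.sql.gz"


def weekly_backup_key(day: date) -> str:
    """Key of the weekly dump. Assumption: named by ISO year and ISO week."""
    iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return f"db-backups/weekly/{iso[0]}-W{iso[1]:02d}.sql.gz"
```

For example, `raw_key("CW-000200", date(2026, 2, 24), "products")` yields `raw-data/CW-000200/2026-02-24/products.json`. Keeping key construction in one place makes it harder for a writer to drift from the layout that the recovery procedures rely on.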

8. Disaster Recovery & Reproducibility

flowchart TB
    subgraph NORMAL["Normal Operation"]
        direction LR
        ERP["ERP APIs"] -->|"daily fetch<br/>(Phase 0)"| S3[("S3: raw-data/<br/>(immutable)")]
        S3 -->|"Step 3<br/>transform"| DB[("SQL Database")]
        DB -->|"nightly<br/>pg_dump"| BK[("S3: db-backups/<br/>daily .sql.gz")]
    end
    FAIL{"DB failure"}
    subgraph FAST["Option A: Fast Recovery"]
        direction TB
        A1["1. Restore latest<br/>pg_dump backup"]
        A2["2. Replay raw JSON<br/>files dated AFTER<br/>backup timestamp"]
        A3["3. Recompute stats<br/>schema from<br/>general schema"]
        A1 --> A2 --> A3
    end
    subgraph FULL["Option B: Full Rebuild"]
        direction TB
        B1["1. Create empty DB<br/>with schema migrations"]
        B2["2. Replay ALL raw JSON<br/>files chronologically<br/>(earliest → latest)"]
        B3["3. Run enrichment<br/>pipeline"]
        B4["4. Recompute all<br/>statistics"]
        B1 --> B2 --> B3 --> B4
    end
    NORMAL --> FAIL
    FAIL -->|"partial loss<br/>or corruption"| FAST
    FAIL -->|"total loss or<br/>schema migration"| FULL
    S3 -.->|"source of<br/>truth"| A2
    S3 -.->|"source of<br/>truth"| B2
    BK -.->|"latest<br/>snapshot"| A1
    style S3 fill:#E8F5E9,stroke:#388E3C
    style BK fill:#FCE4EC,stroke:#C62828
    style FAST fill:#E3F2FD,stroke:#1976D2
    style FULL fill:#FFF3E0,stroke:#EF6C00
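Both recovery options come down to choosing which raw files to replay and in what order. The sketch below is illustrative only: `keys_to_replay` is a hypothetical helper, it assumes keys follow the `raw-data/{client_id}/{YYYY-MM-DD}/{file}.json` layout, and it assumes "dated AFTER backup timestamp" means a folder date strictly later than the backup date. If the nightly dump runs before that day's load, the comparison would need to be inclusive instead.

```python
import re
from datetime import date
from typing import Iterable, List, Optional

# Matches raw-data/{client_id}/{YYYY-MM-DD}/{file}.json
_RAW_KEY = re.compile(r"^raw-data/([^/]+)/(\d{4}-\d{2}-\d{2})/([^/]+\.json)$")


def keys_to_replay(keys: Iterable[str],
                   backup_day: Optional[date] = None) -> List[str]:
    """Return raw-data keys in chronological replay order.

    backup_day=None -> Option B: every raw file is replayed.
    backup_day=date -> Option A: only files dated strictly after the backup.
    """
    dated = []
    for key in keys:
        match = _RAW_KEY.match(key)
        if match is None:
            continue  # not a raw export (images, backups, client uploads, ...)
        day = date.fromisoformat(match.group(2))
        if backup_day is None or day > backup_day:
            dated.append((day, key))
    # Sort by date first; ties are broken by key so the order is deterministic.
    return [key for _, key in sorted(dated)]
```

Passing the date of the restored dump gives the Option A replay set; calling it with no `backup_day` gives the full Option B sequence. The order of entity files within a single day is not specified in the diagram, so the alphabetical tie-break here is only a placeholder for whatever dependency order the transform step requires.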

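Key guarantee: The raw JSON in S3 (raw-data/) is the source of truth. Because it is immutable and append-only, the entire database can be regenerated from scratch by replaying every raw JSON file in chronological order, then re-running the enrichment pipeline and recomputing statistics (Option B).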
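To illustrate why chronological order matters for this guarantee, here is a toy replay. It rests on assumptions not stated in this document: each daily file is treated as a set of records keyed by id, applying a file is an upsert in which later files win, and there is one snapshot per date. The `replay` function and the sample records are hypothetical.

```python
import random
from typing import Dict, Iterable, Tuple

# (ISO date string, {record_id: record}); one snapshot per date is assumed.
Snapshot = Tuple[str, Dict[str, dict]]


def replay(snapshots: Iterable[Snapshot]) -> Dict[str, dict]:
    """Rebuild state by applying snapshots from earliest to latest."""
    state: Dict[str, dict] = {}
    # ISO dates (YYYY-MM-DD) sort lexicographically in chronological order.
    for _, records in sorted(snapshots, key=lambda s: s[0]):
        state.update(records)  # assumed upsert: later files overwrite earlier
    return state


snapshots = [
    ("2026-02-23", {"P1": {"price": 10}, "P2": {"price": 5}}),
    ("2026-02-24", {"P1": {"price": 12}}),
]

# The listing order of files does not matter once replay sorts by date.
shuffled = random.sample(snapshots, k=len(snapshots))
assert replay(shuffled) == replay(snapshots)
```

Under these assumptions the rebuilt state depends only on the contents of the raw files, not on the order in which S3 happens to list them, which is what makes a full rebuild reproducible.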