7. S3 Storage Layout
flowchart LR
subgraph S3["S3: catwing-masters"]
direction LR
subgraph RAW["raw-data/ (immutable, append-only)"]
R1["CW-000200/
2026-02-24/
products.json
sales_orders.json
purchase_orders.json
receiving_orders.json
store_orders.json
transfer_orders.json
inventory.json
pricing_lists.json
locations.json
suppliers.json"]
R2["CW-000151/
2026-02-24/
..."]
end
subgraph IMAGES["product-images/"]
I1["CW-000200/
{product_id}.jpg
..."]
end
subgraph CLIENT["client-data/ (manual uploads)"]
C1["CW-000200/
supplier-availability/
static_suppliers_list.xlsx
locations.xlsx
supply_network_config.xlsx"]
end
subgraph ML["ml-output/ (ML artifacts)"]
M1["CW-000200/
{run_date}/
features.parquet
targets.parquet
predictions.parquet
models/
classifier.pkl
regressor.pkl
warehouse_model.pkl
config.json"]
end
subgraph BACKUP["db-backups/"]
B1["daily/
2026-02-24.sql.gz
weekly/
2026-W09.sql.gz"]
end
R1 ~~~ I1 ~~~ C1 ~~~ M1 ~~~ B1
end
style RAW fill:#E8F5E9,stroke:#388E3C
style IMAGES fill:#E3F2FD,stroke:#1976D2
style ML fill:#FFF3E0,stroke:#F57C00
style BACKUP fill:#FCE4EC,stroke:#C62828
style CLIENT fill:#F3E5F5,stroke:#7B1FA2
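As a rough illustration of the layout above, the object keys could be built with small helpers like the ones below. This is only a sketch: the function names are hypothetical, the list of entities is copied from the raw-data/ node in the diagram, and treating the weekly backup label (2026-W09) as an ISO week number is an assumption that the diagram does not confirm.

```python
from datetime import date

# Entity files listed under raw-data/{client_id}/{date}/ in the diagram.
RAW_ENTITIES = (
    "products", "sales_orders", "purchase_orders", "receiving_orders",
    "store_orders", "transfer_orders", "inventory", "pricing_lists",
    "locations", "suppliers",
)


def raw_data_key(client_id: str, fetch_date: date, entity: str) -> str:
    """Key of one raw JSON file (hypothetical helper)."""
    if entity not in RAW_ENTITIES:
        raise ValueError(f"unknown raw entity: {entity!r}")
    return f"raw-data/{client_id}/{fetch_date.isoformat()}/{entity}.json"


def daily_backup_key(backup_date: date) -> str:
    """Key of a daily dump under db-backups/daily/ (hypothetical helper)."""
    return f"db-backups/daily/{backup_date.isoformat()}.sql.gz"


def weekly_backup_key(backup_date: date) -> str:
    """Key of a weekly dump; ASSUMES the W09-style label is an ISO week."""
    iso_year, iso_week, _ = backup_date.isocalendar()
    return f"db-backups/weekly/{iso_year}-W{iso_week:02d}.sql.gz"
```

For example, raw_data_key("CW-000200", date(2026, 2, 24), "products") yields raw-data/CW-000200/2026-02-24/products.json, matching the first entry in the diagram.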
8. Disaster Recovery & Reproducibility
flowchart TB
subgraph NORMAL["Normal Operation"]
direction LR
ERP["ERP APIs"] -->|"daily fetch
(Phase 0)"| S3[("S3: raw-data/
(immutable)")]
S3 -->|"Step 3
transform"| DB[("SQL Database")]
DB -->|"nightly
pg_dump"| BK[("S3: db-backups/
daily .sql.gz")]
end
FAIL{"DB failure"}
subgraph FAST["Option A: Fast Recovery"]
direction TB
A1["1. Restore latest
pg_dump backup"]
A2["2. Replay raw JSON
files dated AFTER
backup timestamp"]
A3["3. Recompute stats
schema from
general schema"]
A1 --> A2 --> A3
end
subgraph FULL["Option B: Full Rebuild"]
direction TB
B1["1. Create empty DB
with schema migrations"]
B2["2. Replay ALL raw JSON
files chronologically
(earliest → latest)"]
B3["3. Run enrichment
pipeline"]
B4["4. Recompute all
statistics"]
B1 --> B2 --> B3 --> B4
end
NORMAL --> FAIL
FAIL -->|"partial loss
or corruption"| FAST
FAIL -->|"total loss or
schema migration"| FULL
S3 -.->|"source of
truth"| A2
S3 -.->|"source of
truth"| B2
BK -.->|"latest
snapshot"| A1
style S3 fill:#E8F5E9,stroke:#388E3C
style BK fill:#FCE4EC,stroke:#C62828
style FAST fill:#E3F2FD,stroke:#1976D2
style FULL fill:#FFF3E0,stroke:#EF6C00
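Key guarantee: the raw JSON under raw-data/ in S3 is the source of truth. Because those files are immutable and append-only, the entire database can be regenerated from scratch by replaying them in chronological order (Option B). The nightly pg_dump backups only shorten recovery (Option A); they are not required for correctness.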
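The file selection behind both recovery options can be sketched as follows. This is a minimal sketch, not the actual pipeline: the function names are hypothetical, keys are assumed to follow the raw-data/{client_id}/{YYYY-MM-DD}/{entity}.json layout from section 7, and the backup timestamp is reduced to a calendar date so that "after the backup" means a strictly later fetch date. If a nightly dump can run before that day's fetch is loaded, the same-day files would also need replaying, which is only safe when the transform step is idempotent.

```python
from datetime import date
from typing import Iterable, List, Optional


def fetch_date_of(key: str) -> date:
    """Parse the fetch date out of a raw-data key (assumed layout)."""
    parts = key.split("/")
    if len(parts) != 4 or parts[0] != "raw-data":
        raise ValueError(f"not a raw-data key: {key!r}")
    return date.fromisoformat(parts[2])


def keys_to_replay(
    raw_keys: Iterable[str],
    backup_date: Optional[date] = None,
) -> List[str]:
    """Return raw keys in replay order, earliest first.

    backup_date=None -> Option B: replay everything.
    backup_date=d    -> Option A: replay only files dated after d.
    """
    dated = [(fetch_date_of(k), k) for k in raw_keys]
    if backup_date is not None:
        # ASSUMPTION: strictly-after comparison at day granularity.
        dated = [(d, k) for d, k in dated if d > backup_date]
    # Sort by date first, then by key for a deterministic order within a day.
    return [k for _, k in sorted(dated)]
```

Calling keys_to_replay(all_keys) gives the full chronological list for a rebuild, while keys_to_replay(all_keys, backup_date=date(2026, 2, 24)) gives only the files that postdate the restored dump.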