CatWing Pipeline Design

9a. ERD — Core Entities

Products & Brands

erDiagram
    products {
        int id PK
        string external_id UK
        string product_sku UK
        string product_vendor_sku
        string product_name
        int brand_id FK "nullable, unknown brands"
        int product_category_id FK
        int supplier_id FK
        float cost "cached — authoritative: x_supplier_products.unit_cost"
        float price "cached — authoritative: pricing_lists by priority + date"
        int currency_id FK "cw_global"
        date birth_date
        int gender_id FK
        int product_status_id FK
        int price_tier_id FK
        boolean is_active
    }

    brands {
        int id PK
        string name UK
        string country_of_origin "nullable"
    }

    product_attributes {
        int id PK
        int product_id FK
        string attribute_name
        string attribute_value
    }

    product_images {
        int id PK
        int product_id FK
        string s3_key
        int image_type_id FK
    }

    image_types {
        int id PK
        string name UK "main | gallery | thumbnail | etc."
    }

    brands ||--o{ products : ""
    product_categories ||--o{ products : ""
    genders ||--o{ products : ""
    product_statuses ||--o{ products : ""
    price_tiers ||--o{ products : ""
    suppliers ||--o{ products : "default"
    products ||--o{ product_attributes : ""
    products ||--o{ product_images : ""
    image_types ||--o{ product_images : ""

Convention — cached pricing: products.cost and products.price are pipeline-updated cached snapshots for fast queries. The authoritative cost source is x_supplier_products.unit_cost (per supplier); the authoritative price source is pricing_lists (the highest-rank active price_kind within valid from_date/thru_date range). The pipeline refreshes these cached values during Step 3 (Transform & Load).

Product Lookups

erDiagram
    genders {
        int id PK
        string name UK "Male | Female | Unisex"
    }

    product_statuses {
        int id PK
        string name UK "active | discontinued | draft | etc."
    }

    price_tiers {
        int id PK
        string name UK "per-client: e.g. Entry | Mid | Premium | Luxury"
    }

Suppliers

erDiagram
    suppliers {
        int id PK
        string external_id UK
        string supplier_name
        int city_id FK "cw_global.cities"
        boolean direct_shipment
        int supplier_status_id FK
        boolean is_active
    }

    x_supplier_products {
        int id PK
        int supplier_id FK
        int product_id FK
        boolean is_primary
        string supplier_sku
        float unit_cost
        int currency_id FK "cw_global"
    }

    x_supplier_brands {
        int id PK
        int supplier_id FK
        int brand_id FK
    }

    supplier_statuses {
        int id PK
        string name UK "active | inactive | suspended | etc."
    }

    suppliers ||--o{ x_supplier_products : ""
    products ||--o{ x_supplier_products : ""
    suppliers ||--o{ x_supplier_brands : ""
    brands ||--o{ x_supplier_brands : ""
    supplier_statuses ||--o{ suppliers : ""

Convention — supplier city: If only the supplier's country is known (no city), attach to the country's capital city. The geography hierarchy (cities → municipalities → regions → countries) always provides the country via JOINs.

Pricing

erDiagram
    pricing_lists {
        int id PK
        string external_id UK
        int product_id FK
        int price_kind_id FK
        float price_value
        int currency_id FK "cw_global"
        date from_date
        date thru_date
    }

    product_discount_authorizations {
        int id PK
        int product_id FK
        float discount_pct "supplier-authorized discount ceiling"
        float markdown_pct "permanent markdown (UnitPrice vs StandardUnitPrice)"
        int authorization_source_id FK
        date effective_from
        date effective_to "nullable — open-ended if still active"
    }

    price_kinds {
        int id PK
        string name UK "Базова цена | Промоционална | etc."
        int rank UK "priority ordering: Phase-out(2) > Monthly Discount(1) > Base(0)"
    }

    authorization_sources {
        int id PK
        string name UK "supplier | internal | clearance"
    }

    products ||--o{ pricing_lists : ""
    products ||--o{ product_discount_authorizations : ""
    price_kinds ||--o{ pricing_lists : ""
    authorization_sources ||--o{ product_discount_authorizations : ""

Design note — Pricing & Discount Data Model

The ERP carries multiple price signals per product that must be preserved separately:

ERP Field DB Column Meaning
StandardUnitPrice sales_order_lines.standard_unit_price Manufacturer's MSRP/RRP — the catalog list price
UnitPrice sales_order_lines.unit_price Actual selling price at POS (may be permanently marked down)
LineCustomDiscountPercent sales_order_lines.line_discount_pct Transaction-level discount (POS-applied, on top of any markdown)
LineAmount sales_order_lines.revenue Final revenue = unit_price × qty × (1 - line_discount_pct)
UnitCost (from Store Transaction) sales_order_lines.unit_cost Actual wholesale cost (from ST receipt or PO invoice)
attributes_ErpDiscountPct product_discount_authorizations.discount_pct Supplier-authorized discount ceiling per product

Why this matters: In the current pipeline, StandardUnitPrice is extracted in Phase 1 but lost at Phase 2 aggregation. This causes negative margins for ~20% of watch products where UnitPrice < cost due to clearance/markdown. The new pipeline must carry all price signals through to the analytics layer so downstream models can choose the appropriate price basis (MSRP for margin calculation, actual price for revenue forecasting). See juliani-business.md "Pricing Structure" section for the full investigation.

Locations & Topology

erDiagram
    location_types {
        int id PK
        string name UK "warehouse | dc | store | store_buffer | online | external"
    }

    locations {
        int id PK
        string external_id
        string location_name
        string address
        int location_type_id FK
        int city_id FK "cw_global"
        float latitude
        float longitude
        boolean is_central
        boolean is_buffer
        boolean is_ephemeral
        boolean is_active
    }

    location_relationships {
        int id PK
        int parent_location_id FK
        int child_location_id FK
        int location_relationship_type_id FK
        int lead_time_days
        float transfer_cost "nullable"
        int currency_id FK "nullable, cw_global"
        date effective_from
        date effective_to "nullable"
    }

    location_relationship_types {
        int id PK
        string name UK
    }

    inventory_quantities {
        int id PK
        int product_id FK
        int location_id FK
        date snapshot_date
        float qty_available
        float qty_blocked
        float qty_reserved
        float qty_total
        string batch_id "nullable"
        date batch_expiry_date "nullable"
    }

    location_pos_terminals {
        int id PK
        int location_id FK
        string pos_id UK
    }

    location_types ||--o{ locations : ""
    locations ||--o{ location_relationships : "parent"
    locations ||--o{ location_relationships : "child"
    location_relationship_types ||--o{ location_relationships : ""
    locations ||--o{ location_pos_terminals : ""
    products ||--o{ inventory_quantities : ""
    locations ||--o{ inventory_quantities : ""

Categorization

erDiagram
    product_category_types {
        int id PK
        string type_name UK
        boolean is_hierarchical
        int product_category_type_source_id FK
    }

    product_category_type_sources {
        int id PK
        string name UK "erp | config | computed"
    }

    product_categories {
        int id PK
        int product_category_type_id FK
        int parent_product_category_id FK "nullable, self-ref"
        string name
        string code "nullable"
        int sort_order
    }

    product_category_type_sources ||--o{ product_category_types : ""
    product_category_types ||--o{ product_categories : ""
    product_categories ||--o{ product_categories : "parent"

Settings (UI-managed)

erDiagram
    product_settings {
        int id PK
        int product_id FK
        boolean enabled_for_processing
        float first_order_qty "nullable"
        float package_qty "nullable"
        float minimum_order_qty "nullable"
        float optimal_order_qty "nullable"
    }

    product_location_settings {
        int id PK
        int product_id FK
        int location_id FK
        boolean enabled_for_location
    }

    supplier_settings {
        int id PK
        int supplier_id FK
        boolean operate
        int reorder_time_days "nullable"
        int lead_time_min_days "nullable"
        int lead_time_max_days "nullable"
        float min_order_amount "nullable"
        int currency_id FK "cw_global"
        float free_limit_transport "nullable"
        float transport_cost_min "nullable"
        float transport_cost_max "nullable"
        int backorder_expiry_days "nullable"
        string payment_terms "nullable"
    }

    location_settings {
        int id PK
        int location_id FK
        int office_id "nullable"
        boolean is_main_warehouse
        boolean monitor
        boolean recommend_orders
    }

    products ||--o{ product_settings : ""
    products ||--o{ product_location_settings : ""
    locations ||--o{ product_location_settings : ""
    suppliers ||--o{ supplier_settings : ""
    locations ||--o{ location_settings : ""

External Data — Market & Web

erDiagram
    product_market_observations {
        int id PK
        int product_id FK
        int market_observation_source_id FK
        date observation_date
        float market_share "nullable"
        float competitor_price "nullable"
        int currency_id FK "cw_global"
        jsonb metadata
    }

    category_market_observations {
        int id PK
        int product_category_id FK
        int market_observation_source_id FK
        date observation_date
        float market_share "nullable"
        float category_growth_rate "nullable"
        int currency_id FK "cw_global"
        jsonb metadata
    }

    product_web_analytics_daily {
        int id PK
        int product_id FK
        date observation_date
        int sessions
        int page_views
        int add_to_cart
        int transactions
        float revenue
        int currency_id FK "cw_global"
        float bounce_rate
        float conversion_rate
    }

    location_web_analytics_daily {
        int id PK
        int location_id FK
        date observation_date
        int sessions
        int page_views
        int add_to_cart
        int transactions
        float revenue
        int currency_id FK "cw_global"
        float bounce_rate
        float conversion_rate
    }

    market_observation_sources {
        int id PK
        string name UK "nielsen_jfk | kolkostruva"
    }

    products ||--o{ product_market_observations : ""
    product_categories ||--o{ category_market_observations : ""
    market_observation_sources ||--o{ product_market_observations : ""
    market_observation_sources ||--o{ category_market_observations : ""
    products ||--o{ product_web_analytics_daily : ""
    locations ||--o{ location_web_analytics_daily : ""

External Data — Social

erDiagram
    product_social_ad_daily {
        int id PK
        int product_id FK
        int social_platform_id FK
        date observation_date
        float spend
        int impressions
        int clicks
        int conversions
        float revenue
        int currency_id FK "cw_global"
        float ctr
        float roas
    }

    category_social_ad_daily {
        int id PK
        int product_category_id FK
        int social_platform_id FK
        date observation_date
        float spend
        int impressions
        int clicks
        int conversions
        float revenue
        int currency_id FK "cw_global"
        float ctr
        float roas
    }

    social_platforms {
        int id PK
        string name UK "facebook | instagram"
    }

    products ||--o{ product_social_ad_daily : ""
    product_categories ||--o{ category_social_ad_daily : ""
    social_platforms ||--o{ product_social_ad_daily : ""
    social_platforms ||--o{ category_social_ad_daily : ""

Customers

erDiagram
    customers {
        int id PK
        string external_id UK "ERP party/customer number"
        int customer_type_id FK
        string name "nullable"
    }

    customer_types {
        int id PK
        string name UK "Individual | Business | Wholesale | etc."
    }

    customer_types ||--o{ customers : ""

users table: Defined in §9d (cw_global schema). All client-schema tables reference it via audit columns (see convention below).

Convention — audit columns: All non-stats tables include created_at, updated_at, deleted_at (timestamps) and created_by, updated_by (FK → users). These columns are omitted from the ERD diagrams for readability.

Convention — is_active vs deleted_at: is_active reflects the ERP source system status (e.g., the ERP says this product is discontinued). deleted_at is our internal soft-delete (we removed this record from our platform). A product can be is_active = false (discontinued in ERP) but deleted_at = null (still in our system for historical analysis).

Product attributes: The product_attributes EAV table stores domain-specific fields (e.g., color, dial_color, strap_color for watches) without coupling the core schema to any product domain.

Categorization: The product_category_typesproduct_categories pattern supports arbitrary per-client category trees. Products always have exactly one category via direct product_category_id FK (NOT NULL), validated at application level to ensure the assigned category is of the ERP type. Example types: erp_category (hierarchical, from ERP), collection / subcollection (brand-specific). The product_category_type_source_id FK distinguishes ERP-sourced vs. config-driven vs. computed categories. Clients define only the category types they need.

Convention — currency: All monetary values in a row share the row's currency_id (FK → cw_global.currencies, NOT NULL). Stats-schema tables always use the client's base currency (defined in cw_global.client_config). Step 3 always resolves the currency; when the source omits it, client_config.base_currency_id is written.

Convention — translations: User-facing text fields (product_name, supplier_name, location_name, product_categories.name, etc.) store the primary-language value directly. Additional translations live in per-entity *_translations tables (product_translations, product_category_translations, location_translations, supplier_translations) with full FK enforcement. See §9d.

Convention — external_id sharing: A single ERP external_id may map to multiple location_id records with different location_type_id values — e.g. one ERP code split into a STORE and a DC within the same physical premises.

Convention — settings vs. ERP data: products, suppliers, and locations hold ERP-sourced identity/reference data. The *_settings tables hold UI-managed operational configuration (lead times, order quantities, monitoring flags). Settings override ERP defaults when both exist.

Convention — audit-proof settings: All *_settings tables and location_relationships are append-only — rows are never updated or deleted. Each insert creates a new version. created_at serves as the effective-from timestamp; created_by records who made the change. The current configuration is the row with the latest created_at per parent entity. To determine what changed: compare consecutive rows for the same entity.

Convention — product catalog: Both RawDataEndpoint.PRODUCT_CATALOG and RawDataEndpoint.PRODUCTS API endpoints feed into the same products table after Step 3 transformation.

Convention — external data: External data sources (market data, web analytics, social ads) store raw observations in scope-specific tables — one per entity scope (e.g., product_market_observations for product-level, category_market_observations for category-level). Each table has a NOT NULL FK to its scope entity; no polymorphic nullable FKs. Derived features (e.g., market share trend, ad-driven demand signal) are computed during Step 5 (BehaviorCalculator) and written to the corresponding behavior tables. Raw observations are never modified after ingestion.

Convention — compound UNIQUE constraints: Single-column UNIQUEs are marked UK in ERDs. Compound constraints (not expressible in Mermaid) are listed here. These also serve as INSERT ... ON CONFLICT upsert targets.

Table UNIQUE Constraint
locations (external_id, location_type_id)
x_supplier_products (supplier_id, product_id)
x_supplier_brands (supplier_id, brand_id)
x_user_roles (user_id, role_id)
product_attributes (product_id, attribute_name)
inventory_target_params (inventory_target_id, param_name)
embeddings (product_id, embedding_type_id, model_version)
assortment_products (assortment_id, product_id)
product_translations (product_id, field_name, language_id)
product_category_translations (product_category_id, field_name, language_id)
location_translations (location_id, field_name, language_id)
supplier_translations (supplier_id, field_name, language_id)
inventory_quantities (product_id, location_id, snapshot_date)
product_behaviors (product_id, snapshot_date)
location_behaviors (location_id, snapshot_date)
supplier_behaviors (supplier_id, snapshot_date)
category_behaviors (product_category_id, snapshot_date)
product_location_behaviors (product_id, location_id, snapshot_date)
weather_observations (city_id, observation_date)
product_web_analytics_daily (product_id, observation_date)
location_web_analytics_daily (location_id, observation_date)
product_social_ad_daily (product_id, social_platform_id, observation_date)
category_social_ad_daily (product_category_id, social_platform_id, observation_date)
product_market_observations (product_id, market_observation_source_id, observation_date)
category_market_observations (product_category_id, market_observation_source_id, observation_date)
inventory_targets (product_id, location_id, snapshot_date, inventory_strategy_id)
currency_rates (from_currency_id, to_currency_id, effective_date)
demand_forecast_products (demand_forecast_id, product_id, location_id, demand_source_type_id, horizon_days)
po_recommendations (product_id, supplier_id, location_id, recommendation_date)
to_recommendations (product_id, from_location_id, to_location_id, recommendation_date)
pricing_lists (product_id, price_kind_id, from_date)
supplier_available_products (product_id, supplier_id, file_date)
assortments (assortment_type_id, brand_id, location_id, effective_date)
product_settings (product_id)
supplier_settings (supplier_id)
location_settings (location_id)
product_location_settings (product_id, location_id)