HWHeat Waves
    DashboardUtforsk
    Analyse
    Data Kilder
        • README
        • v2 Rationale
        • v2-001 Migration
        • v2-002 Filter Codes
        • Crosstab
        • Brask Ingest
        • Patches & RPC
      • Tables & Grain
      • Postgres Contract
    • Design Rationale
    • Doc Map
    DocsSettings
    DashboardAtlasUtforsk
    Analyse
    Data Kilder
    1. Documentation
    2. Fire Data Schema
    3. Schema history index
    4. V2 Design Rationale Overview

    Loading documentation...

    HeatWaves v2 schema — design notes

    Source of truth role: high-level design rationale for v2 schema decisions. Do not treat this file as migration/contract authority; use docs/fire-data-schema/postgres-contract-rls-and-keys.md (and generated types) for executable shapes. v2-001-initial-migration-narrative.md / v2-002-filter-codes-migration-narrative.md remain migration-history narratives that may lag the linked database. Conflict-resolution note: for runtime filter-axis/RPC semantics and physical column types, docs/fire-data-schema/postgres-contract-rls-and-keys.md + generated types win when narrative docs disagree with earlier migrations.

    Companion to supabase/migrations/<ts>_v2_initial_schema.sql. This file explains why the schema looks the way it does, so future contributors do not have to reverse-engineer the rationale from SQL alone.

    At a glance

    fire_data schema (analytical, exposed)        ingest schema (operational, hidden)
    ─────────────────────────────────             ────────────────────────────────────
    sources         ← source registry             extraction_state ← scheduling/refresh
    dimensions      ← analytical axes             raw_payloads     ← optional audit log
    categories      ← source-scoped values
    fact_daily      ← DATE-grain incident facts (sources that support sub-year reads; see sources.supports_subyear)
    fact_yearly     ← YEAR-grain aggregates ingested at year cadence (SSB, BRIS yearly-only paths, etc.)
    fact_buildings_yearly ← SSB building stock (non-fire, for normalisation)
    

    Why these specific decisions

    1. Two schemas, not one

    fire_data holds everything a user query touches. ingest holds operational state. Supabase doesn't auto-expose custom schemas via PostgREST, so this yields the security boundary for free: fire_data is added to the dashboard's "Exposed schemas" list; ingest is not. Service-role keys bypass RLS to write into either.

    2. Fact tables split by grain

    fact_daily and fact_yearly are separate tables. A query for "monthly chart of BRASK + POLICE for 2023" hits only fact_daily (via get_fire_data with sub-year p_granularity). get_fire_data does not merge both tables in one aggregation: it branches on p_granularity — year reads fact_yearly only; day · week · month · quarter read fact_daily only, intersected with sources.supports_subyear so yearly-only catalogs never appear in sub-year reads. That keeps semantics aligned with upstream cadence without silent cross-grain mixing.

    Older migration-era prose sometimes described a UNION ALL over both fact tables for a single RPC body; the live contract is branching + supports_subyear (see docs/fire-data-schema/postgres-contract-rls-and-keys.md, docs/fire-data-schema/schema-history/incremental-patches-and-rpc-changelog.md).

    3. Filter axes as columns, not dimensions

    fire_type_code, building_type_code, building_age_code, region_code are nullable INTEGER FK columns → fire_data.filters.id. NULL means “this source doesn't carry that axis on this row”. Cross-source filters pass standard or native strings into get_fire_data; the RPC resolves them to ids (EXISTS predicates). Rows with NULL ids drop out automatically — graceful degradation by construction — no OR logic or per-source branches.

    Contract detail: fact_yearly / fact_daily each use a BEFORE INSERT OR UPDATE trigger (fact_yearly_enforce_filter_axes, fact_daily_enforce_filter_axes) calling fire_data.enforce_fact_filter_code_axes(). Optional filters.applicable_dim_codes restricts which dim_code values may carry a given filters.id. Full detail: docs/fire-data-schema/postgres-contract-rls-and-keys.md.

    Ingest writes atomic rows (one FK per axis per row, or NULL). get_fire_data may still pass several tokens on the same axis; the RPC resolves them to multiple ids combined with OR. See docs/ingestion/pipeline-slices-idempotency-and-triggers.md §5.3–5.4 and docs/fire-data-schema/schema-history/incremental-patches-and-rpc-changelog.md (RPC note).

    4. Source-scoped categories

    BRASK's "Ildsted" and POLICE's "Ildsted" are different rows in categories. They share standard_code = 'FIREPLACE'. This means:

    • Utforsk shows source-native wording (categories.native_name, optional name) per source.
    • Dashbord can compare across sources via standard_code.
    • A source revising a category never breaks another source's data.

    The cost is roughly 2× more rows in categories. At a few hundred rows total, this is invisible.

    5. Surrogate id + unique-index natural key

    Incident fact tables use NULLS NOT DISTINCT on the natural-key unique index (all grain keys plus the four nullable filter FK columns) so NULL slices collide correctly. ON CONFLICT (...) lists the same columns as that index.

    See docs/fire-data-schema/postgres-contract-rls-and-keys.md for the exact index and upsert shape.

    6. Numeric spine; labels from registry tables

    sources.source_id and dimensions.dim_code are INTEGER PKs. Short machine labels (sources.name) and titles (long_name) live on sources; axis titles on dimensions.name.

    EU-aligned analytical axes use fixed dim_code ids from docs/ingestion/adapter-mappings/category-native-to-cat-code.md:

    dim_codeTypical dimensions.name
    1Årsak
    2Tennkilde
    3Objekt
    4Arnested

    Further axes (Bygningsalder, Bygningstype, …) each receive their own integer PK row in dimensions at seed time — resolve labels via get_dimension_catalog or direct SELECT from fire_data.dimensions, not string tokens.

    Filter facts store filters.id; filters.native_code preserves upstream vocabulary and standard_code carries cross-source buckets. get_fire_data still accepts standard/native strings for filters — it resolves them to ids server-side.

    cat_code stays INT per BRASK; (source_id, dim_code, cat_code) keeps categories source-scoped.

    Exception: fact_buildings_yearly keeps TEXT building_type_code (SSB building-stock taxonomy), not the incident filters spine.

    7. value and value_kr as separate columns

    Adding a metric (insurance kr, injuries, fatalities) is a nullable column, not a refactor. Sources that don't track the metric leave it NULL and are silently absent from queries that target it. This pattern scales to ~5 metrics; beyond that, refactor to a metric-pivot table. YAGNI until then.

    8. extraction_state tracks slices, not rows

    The natural-key UPSERT on fact tables handles per-row dedup natively. The extraction_state table answers a different question: should the scheduler refresh this slice today? Diagnostic columns (last_row_count, last_total_value) make it cheap to detect a slice that suddenly returned zero rows or whose total dropped — usually an upstream API hiccup, sometimes a real correction.

    Adding a new dimension (the future-proof test)

    This is the test that decides whether the schema works. A new dimension — say, "Erstatningsutgifter per skadetype" — must require zero schema changes to deploy. Specifically, adding it should require only:

    1. One INSERT into fire_data.dimensions (new integer PK row — assign next code consistent with seeding policy).
    2. N INSERTs into fire_data.categories for the values BRASK reports.
    3. A new method on the BRASK adapter that ingests rows under the new dim_code.
    4. A redeploy of the adapter; the catalog RPC and the chart RPC are unchanged.

    If a new dimension forces you to alter fact_daily, the schema has failed its design goal. (Adding a new metric — like value_kr for kr — does require a column. Metrics ≠ dimensions.)

    Adding a new source

    Same shape. A new source — say, an EU dataset — needs:

    1. One INSERT into fire_data.sources.
    2. N INSERTs into fire_data.categories for that source's vocabulary.
    3. A new adapter file at lib/data-model/adapters/eu.ts.
    4. Wiring it into the ingest scheduler.

    The chart RPC, the catalog RPC, every UI component — all untouched.

    What's NOT in this schema (and where it goes)

    • RPCs (get_fire_data, get_dimension_catalog, etc.) — separate migration. Schema is the noun, RPCs are the verbs. Build nouns first.
    • TypeScript types — generated by supabase gen types typescript after this migration applies. Do not hand-write.
    • Adapters — TypeScript modules in lib/data-model/adapters/. They depend on the generated types. Build last.
    • Historical data migration — separate one-shot script that reads from legacy public.fire_data and writes to fire_data.fact_daily. Run once, archive, never run again.

    Known open questions

    • Region taxonomy: incident region_code on fact_* is an FK to filters.id (axis='region'). Encode ISO / Cresta / native strings as filters.native_code / standard_code before heavy regional ingestion.
    • Erstatningsutgifter precision: value_kr is NUMERIC (arbitrary). BRASK reports whole NOK amounts; decimals are unused. Could narrow to NUMERIC(15, 0) for storage savings. Negligible at current scale; defer.
    • fact_yearly partitioning: not partitioned today. Re-evaluate at ~1M rows. Probably never matters.
    • Hard-fire vs near-miss distinction in BRIS: the taxonomy doc mentions "Brann i bygning" vs "Branntilløp" partly encodes this distinction. Fire type is tagged via filters rows; whether a dedicated severity_code axis is also needed remains open. Defer until the UI surfaces a need.