Loading documentation...
Source of truth role: high-level design rationale for v2 schema decisions.
Do not treat this file as migration/contract authority; use docs/fire-data-schema/postgres-contract-rls-and-keys.md
(and generated types) for executable shapes. v2-001-initial-migration-narrative.md / v2-002-filter-codes-migration-narrative.md
remain migration-history narratives that may lag the linked database.
Conflict-resolution note: for runtime filter-axis/RPC semantics and physical column types,
docs/fire-data-schema/postgres-contract-rls-and-keys.md + generated types win when narrative docs disagree with earlier migrations.
Companion to supabase/migrations/<ts>_v2_initial_schema.sql. This file
explains why the schema looks the way it does, so future contributors do
not have to reverse-engineer the rationale from SQL alone.
fire_data schema (analytical, exposed) ingest schema (operational, hidden)
───────────────────────────────── ────────────────────────────────────
sources ← source registry extraction_state ← scheduling/refresh
dimensions ← analytical axes raw_payloads ← optional audit log
categories ← source-scoped values
fact_daily ← DATE-grain incident facts (sources that support sub-year reads; see sources.supports_subyear)
fact_yearly ← YEAR-grain aggregates ingested at year cadence (SSB, BRIS yearly-only paths, etc.)
fact_buildings_yearly ← SSB building stock (non-fire, for normalisation)
fire_data holds everything a user query touches. ingest holds operational
state. Supabase doesn't auto-expose custom schemas via PostgREST, so this
yields the security boundary for free: fire_data is added to the
dashboard's "Exposed schemas" list; ingest is not. Service-role keys
bypass RLS to write into either.
fact_daily and fact_yearly are separate tables. A query for "monthly chart
of BRASK + POLICE for 2023" hits only fact_daily (via get_fire_data with
sub-year p_granularity). get_fire_data does not merge both tables
in one aggregation: it branches on p_granularity — year reads
fact_yearly only; day · week · month · quarter read
fact_daily only, intersected with sources.supports_subyear so
yearly-only catalogs never appear in sub-year reads. That keeps semantics aligned
with upstream cadence without silent cross-grain mixing.
Older migration-era prose sometimes described a UNION ALL over both fact
tables for a single RPC body; the live contract is branching + supports_subyear
(see docs/fire-data-schema/postgres-contract-rls-and-keys.md, docs/fire-data-schema/schema-history/incremental-patches-and-rpc-changelog.md).
fire_type_code, building_type_code, building_age_code, region_code are
nullable INTEGER FK columns → fire_data.filters.id. NULL means
“this source doesn't carry that axis on this row”. Cross-source filters pass
standard or native strings into get_fire_data; the RPC resolves them to
ids (EXISTS predicates). Rows with NULL ids drop out automatically — graceful
degradation by construction — no OR logic or per-source branches.
Contract detail: fact_yearly / fact_daily each use a BEFORE INSERT OR UPDATE
trigger (fact_yearly_enforce_filter_axes, fact_daily_enforce_filter_axes)
calling fire_data.enforce_fact_filter_code_axes(). Optional
filters.applicable_dim_codes restricts which dim_code values may carry a
given filters.id. Full detail: docs/fire-data-schema/postgres-contract-rls-and-keys.md.
Ingest writes atomic rows (one FK per axis per row, or NULL).
get_fire_data may still pass several tokens on the same axis; the RPC
resolves them to multiple ids combined with OR. See
docs/ingestion/pipeline-slices-idempotency-and-triggers.md §5.3–5.4 and docs/fire-data-schema/schema-history/incremental-patches-and-rpc-changelog.md
(RPC note).
BRASK's "Ildsted" and POLICE's "Ildsted" are different rows in categories.
They share standard_code = 'FIREPLACE'. This means:
categories.native_name, optional
name) per source.standard_code.The cost is roughly 2× more rows in categories. At a few hundred rows total,
this is invisible.
id + unique-index natural keyIncident fact tables use NULLS NOT DISTINCT on the natural-key unique index
(all grain keys plus the four nullable filter FK columns) so NULL slices collide
correctly. ON CONFLICT (...) lists the same columns as that index.
See docs/fire-data-schema/postgres-contract-rls-and-keys.md for the exact index and upsert shape.
sources.source_id and dimensions.dim_code are INTEGER PKs. Short
machine labels (sources.name) and titles (long_name) live on sources;
axis titles on dimensions.name.
EU-aligned analytical axes use fixed dim_code ids from docs/ingestion/adapter-mappings/category-native-to-cat-code.md:
dim_code | Typical dimensions.name |
|---|---|
1 | Årsak |
2 | Tennkilde |
3 | Objekt |
4 | Arnested |
Further axes (Bygningsalder, Bygningstype, …) each receive their own integer PK row in dimensions at seed time — resolve labels via get_dimension_catalog or direct SELECT from fire_data.dimensions, not string tokens.
Filter facts store filters.id; filters.native_code preserves
upstream vocabulary and standard_code carries cross-source buckets.
get_fire_data still accepts standard/native strings for filters — it
resolves them to ids server-side.
cat_code stays INT per BRASK; (source_id, dim_code, cat_code) keeps
categories source-scoped.
Exception: fact_buildings_yearly keeps TEXT building_type_code
(SSB building-stock taxonomy), not the incident filters spine.
value and value_kr as separate columnsAdding a metric (insurance kr, injuries, fatalities) is a nullable column, not a refactor. Sources that don't track the metric leave it NULL and are silently absent from queries that target it. This pattern scales to ~5 metrics; beyond that, refactor to a metric-pivot table. YAGNI until then.
extraction_state tracks slices, not rowsThe natural-key UPSERT on fact tables handles per-row dedup natively. The
extraction_state table answers a different question: should the scheduler
refresh this slice today? Diagnostic columns (last_row_count,
last_total_value) make it cheap to detect a slice that suddenly returned
zero rows or whose total dropped — usually an upstream API hiccup, sometimes
a real correction.
This is the test that decides whether the schema works. A new dimension — say, "Erstatningsutgifter per skadetype" — must require zero schema changes to deploy. Specifically, adding it should require only:
fire_data.dimensions (new integer PK row — assign next code consistent with seeding policy).fire_data.categories for the values BRASK reports.If a new dimension forces you to alter fact_daily, the schema has failed
its design goal. (Adding a new metric — like value_kr for kr — does
require a column. Metrics ≠ dimensions.)
Same shape. A new source — say, an EU dataset — needs:
fire_data.sources.fire_data.categories for that source's vocabulary.lib/data-model/adapters/eu.ts.The chart RPC, the catalog RPC, every UI component — all untouched.
get_fire_data, get_dimension_catalog, etc.) — separate migration.
Schema is the noun, RPCs are the verbs. Build nouns first.supabase gen types typescript after
this migration applies. Do not hand-write.lib/data-model/adapters/. They depend
on the generated types. Build last.public.fire_data and writes to fire_data.fact_daily. Run once,
archive, never run again.region_code on fact_* is an FK to
filters.id (axis='region'). Encode ISO / Cresta / native strings
as filters.native_code / standard_code before heavy regional
ingestion.value_kr is NUMERIC (arbitrary). BRASK
reports whole NOK amounts; decimals are unused. Could narrow to
NUMERIC(15, 0) for storage savings. Negligible at current scale; defer.filters rows; whether a dedicated severity_code axis is
also needed remains open. Defer until the UI surfaces a need.