HeatWaves v2 schema — design notes

Source of truth role: high-level design rationale for v2 schema decisions. Do not treat this file as migration/contract authority; use docs/fire-data-schema/postgres-contract-rls-and-keys.md (and generated types) for executable shapes. v2-001-initial-migration-narrative.md / v2-002-filter-codes-migration-narrative.md remain migration-history narratives that may lag the linked database. Conflict-resolution note: for runtime filter-axis/RPC semantics and physical column types, docs/fire-data-schema/postgres-contract-rls-and-keys.md + generated types win when narrative docs disagree with earlier migrations.

Companion to supabase/migrations/<ts>_v2_initial_schema.sql. This file explains why the schema looks the way it does, so future contributors do not have to reverse-engineer the rationale from SQL alone.

At a glance

fire_data schema (analytical, exposed)        ingest schema (operational, hidden)
─────────────────────────────────             ────────────────────────────────────
sources         ← source registry             extraction_state ← scheduling/refresh
dimensions      ← analytical axes             raw_payloads     ← optional audit log
categories      ← source-scoped values
fact_daily      ← DATE-grain incident facts (sources that support sub-year reads; see sources.supports_subyear)
fact_yearly     ← YEAR-grain aggregates ingested at year cadence (SSB, BRIS yearly-only paths, etc.)
fact_buildings_yearly ← SSB building stock (non-fire, for normalisation)

Why these specific decisions

1. Two schemas, not one

fire_data holds everything a user query touches. ingest holds operational state. Supabase doesn't auto-expose custom schemas via PostgREST, so this yields the security boundary for free: fire_data is added to the dashboard's "Exposed schemas" list; ingest is not. Service-role keys bypass RLS to write into either.

2. Fact tables split by grain

fact_daily and fact_yearly are separate tables. A query for "monthly chart of BRASK + POLICE for 2023" hits only fact_daily (via get_fire_data with sub-year p_granularity). get_fire_data does not merge both tables in one aggregation: it branches on p_granularity — year reads fact_yearly only; day · week · month · quarter read fact_daily only, intersected with sources.supports_subyear so yearly-only catalogs never appear in sub-year reads. That keeps semantics aligned with upstream cadence without silent cross-grain mixing.

Older migration-era prose sometimes described a UNION ALL over both fact tables for a single RPC body; the live contract is branching + supports_subyear (see docs/fire-data-schema/postgres-contract-rls-and-keys.md, docs/fire-data-schema/schema-history/incremental-patches-and-rpc-changelog.md).

3. Filter axes as columns, not dimensions

fire_type_code, building_type_code, building_age_code, region_code are nullable INTEGER FK columns → fire_data.filters.id. NULL means “this source doesn't carry that axis on this row”. Cross-source filters pass standard or native strings into get_fire_data; the RPC resolves them to ids (EXISTS predicates). Rows with NULL ids drop out automatically — graceful degradation by construction — no OR logic or per-source branches.

Contract detail: fact_yearly / fact_daily each use a BEFORE INSERT OR UPDATE trigger (fact_yearly_enforce_filter_axes, fact_daily_enforce_filter_axes) calling fire_data.enforce_fact_filter_code_axes(). Optional filters.applicable_dim_codes restricts which dim_code values may carry a given filters.id. Full detail: docs/fire-data-schema/postgres-contract-rls-and-keys.md.

Ingest writes atomic rows (one FK per axis per row, or NULL). get_fire_data may still pass several tokens on the same axis; the RPC resolves them to multiple ids combined with OR. See docs/ingestion/pipeline-slices-idempotency-and-triggers.md §5.3–5.4 and docs/fire-data-schema/schema-history/incremental-patches-and-rpc-changelog.md (RPC note).

4. Source-scoped categories

BRASK's "Ildsted" and POLICE's "Ildsted" are different rows in categories. They share standard_code = 'FIREPLACE'. This means:

Utforsk shows source-native wording (categories.native_name, optional name) per source.
Dashbord can compare across sources via standard_code.
A source revising a category never breaks another source's data.

The cost is roughly 2× more rows in categories. At a few hundred rows total, this is invisible.

5. Surrogate `id` + unique-index natural key

Incident fact tables use NULLS NOT DISTINCT on the natural-key unique index (all grain keys plus the four nullable filter FK columns) so NULL slices collide correctly. ON CONFLICT (...) lists the same columns as that index.

See docs/fire-data-schema/postgres-contract-rls-and-keys.md for the exact index and upsert shape.

6. Numeric spine; labels from registry tables

sources.source_id and dimensions.dim_code are INTEGER PKs. Short machine labels (sources.name) and titles (long_name) live on sources; axis titles on dimensions.name.

EU-aligned analytical axes use fixed dim_code ids from docs/ingestion/adapter-mappings/category-native-to-cat-code.md:

`dim_code`	Typical `dimensions.name`
`1`	Årsak
`2`	Tennkilde
`3`	Objekt
`4`	Arnested

Further axes (Bygningsalder, Bygningstype, …) each receive their own integer PK row in dimensions at seed time — resolve labels via get_dimension_catalog or direct SELECT from fire_data.dimensions, not string tokens.

Filter facts store filters.id; filters.native_code preserves upstream vocabulary and standard_code carries cross-source buckets. get_fire_data still accepts standard/native strings for filters — it resolves them to ids server-side.

cat_code stays INT per BRASK; (source_id, dim_code, cat_code) keeps categories source-scoped.

Exception: fact_buildings_yearly keeps TEXT building_type_code (SSB building-stock taxonomy), not the incident filters spine.

7. `value` and `value_kr` as separate columns

Adding a metric (insurance kr, injuries, fatalities) is a nullable column, not a refactor. Sources that don't track the metric leave it NULL and are silently absent from queries that target it. This pattern scales to ~5 metrics; beyond that, refactor to a metric-pivot table. YAGNI until then.

8. `extraction_state` tracks slices, not rows

The natural-key UPSERT on fact tables handles per-row dedup natively. The extraction_state table answers a different question: should the scheduler refresh this slice today? Diagnostic columns (last_row_count, last_total_value) make it cheap to detect a slice that suddenly returned zero rows or whose total dropped — usually an upstream API hiccup, sometimes a real correction.

Adding a new dimension (the future-proof test)

This is the test that decides whether the schema works. A new dimension — say, "Erstatningsutgifter per skadetype" — must require zero schema changes to deploy. Specifically, adding it should require only:

One INSERT into fire_data.dimensions (new integer PK row — assign next code consistent with seeding policy).
N INSERTs into fire_data.categories for the values BRASK reports.
A new method on the BRASK adapter that ingests rows under the new dim_code.
A redeploy of the adapter; the catalog RPC and the chart RPC are unchanged.

If a new dimension forces you to alter fact_daily, the schema has failed its design goal. (Adding a new metric — like value_kr for kr — does require a column. Metrics ≠ dimensions.)

Adding a new source

Same shape. A new source — say, an EU dataset — needs:

One INSERT into fire_data.sources.
N INSERTs into fire_data.categories for that source's vocabulary.
A new adapter file at lib/data-model/adapters/eu.ts.
Wiring it into the ingest scheduler.

The chart RPC, the catalog RPC, every UI component — all untouched.

What's NOT in this schema (and where it goes)

RPCs (get_fire_data, get_dimension_catalog, etc.) — separate migration. Schema is the noun, RPCs are the verbs. Build nouns first.
TypeScript types — generated by supabase gen types typescript after this migration applies. Do not hand-write.
Adapters — TypeScript modules in lib/data-model/adapters/. They depend on the generated types. Build last.
Historical data migration — separate one-shot script that reads from legacy public.fire_data and writes to fire_data.fact_daily. Run once, archive, never run again.

Known open questions

Region taxonomy: incident region_code on fact_* is an FK to filters.id (axis='region'). Encode ISO / Cresta / native strings as filters.native_code / standard_code before heavy regional ingestion.
Erstatningsutgifter precision: value_kr is NUMERIC (arbitrary). BRASK reports whole NOK amounts; decimals are unused. Could narrow to NUMERIC(15, 0) for storage savings. Negligible at current scale; defer.
fact_yearly partitioning: not partitioned today. Re-evaluate at ~1M rows. Probably never matters.
Hard-fire vs near-miss distinction in BRIS: the taxonomy doc mentions "Brann i bygning" vs "Branntilløp" partly encodes this distinction. Fire type is tagged via filters rows; whether a dedicated severity_code axis is also needed remains open. Defer until the UI surfaces a need.

Loading documentation...

HeatWaves v2 schema — design notes

At a glance

fire_data schema (analytical, exposed)        ingest schema (operational, hidden)
─────────────────────────────────             ────────────────────────────────────
sources         ← source registry             extraction_state ← scheduling/refresh
dimensions      ← analytical axes             raw_payloads     ← optional audit log
categories      ← source-scoped values
fact_daily      ← DATE-grain incident facts (sources that support sub-year reads; see sources.supports_subyear)
fact_yearly     ← YEAR-grain aggregates ingested at year cadence (SSB, BRIS yearly-only paths, etc.)
fact_buildings_yearly ← SSB building stock (non-fire, for normalisation)

Why these specific decisions

1. Two schemas, not one

2. Fact tables split by grain

3. Filter axes as columns, not dimensions

4. Source-scoped categories

BRASK's "Ildsted" and POLICE's "Ildsted" are different rows in categories. They share standard_code = 'FIREPLACE'. This means:

Utforsk shows source-native wording (categories.native_name, optional name) per source.
Dashbord can compare across sources via standard_code.
A source revising a category never breaks another source's data.

The cost is roughly 2× more rows in categories. At a few hundred rows total, this is invisible.

5. Surrogate `id` + unique-index natural key

See docs/fire-data-schema/postgres-contract-rls-and-keys.md for the exact index and upsert shape.

6. Numeric spine; labels from registry tables

sources.source_id and dimensions.dim_code are INTEGER PKs. Short machine labels (sources.name) and titles (long_name) live on sources; axis titles on dimensions.name.

EU-aligned analytical axes use fixed dim_code ids from docs/ingestion/adapter-mappings/category-native-to-cat-code.md:

`dim_code`	Typical `dimensions.name`
`1`	Årsak
`2`	Tennkilde
`3`	Objekt
`4`	Arnested

cat_code stays INT per BRASK; (source_id, dim_code, cat_code) keeps categories source-scoped.

Exception: fact_buildings_yearly keeps TEXT building_type_code (SSB building-stock taxonomy), not the incident filters spine.

7. `value` and `value_kr` as separate columns

8. `extraction_state` tracks slices, not rows

Adding a new dimension (the future-proof test)

One INSERT into fire_data.dimensions (new integer PK row — assign next code consistent with seeding policy).
N INSERTs into fire_data.categories for the values BRASK reports.
A new method on the BRASK adapter that ingests rows under the new dim_code.
A redeploy of the adapter; the catalog RPC and the chart RPC are unchanged.

If a new dimension forces you to alter fact_daily, the schema has failed its design goal. (Adding a new metric — like value_kr for kr — does require a column. Metrics ≠ dimensions.)

Adding a new source

Same shape. A new source — say, an EU dataset — needs:

One INSERT into fire_data.sources.
N INSERTs into fire_data.categories for that source's vocabulary.
A new adapter file at lib/data-model/adapters/eu.ts.
Wiring it into the ingest scheduler.

The chart RPC, the catalog RPC, every UI component — all untouched.

What's NOT in this schema (and where it goes)

RPCs (get_fire_data, get_dimension_catalog, etc.) — separate migration. Schema is the noun, RPCs are the verbs. Build nouns first.
TypeScript types — generated by supabase gen types typescript after this migration applies. Do not hand-write.
Adapters — TypeScript modules in lib/data-model/adapters/. They depend on the generated types. Build last.
Historical data migration — separate one-shot script that reads from legacy public.fire_data and writes to fire_data.fact_daily. Run once, archive, never run again.

Known open questions

Region taxonomy: incident region_code on fact_* is an FK to filters.id (axis='region'). Encode ISO / Cresta / native strings as filters.native_code / standard_code before heavy regional ingestion.
Erstatningsutgifter precision: value_kr is NUMERIC (arbitrary). BRASK reports whole NOK amounts; decimals are unused. Could narrow to NUMERIC(15, 0) for storage savings. Negligible at current scale; defer.
fact_yearly partitioning: not partitioned today. Re-evaluate at ~1M rows. Probably never matters.
Hard-fire vs near-miss distinction in BRIS: the taxonomy doc mentions "Brann i bygning" vs "Branntilløp" partly encodes this distinction. Fire type is tagged via filters rows; whether a dedicated severity_code axis is also needed remains open. Defer until the UI surfaces a need.

HeatWaves v2 schema — design notes

At a glance

Why these specific decisions

1. Two schemas, not one

2. Fact tables split by grain

3. Filter axes as columns, not dimensions

4. Source-scoped categories

5. Surrogate id + unique-index natural key

6. Numeric spine; labels from registry tables

7. value and value_kr as separate columns

8. extraction_state tracks slices, not rows

Adding a new dimension (the future-proof test)

Adding a new source

What's NOT in this schema (and where it goes)

Known open questions

HeatWaves v2 schema — design notes

At a glance

Why these specific decisions

1. Two schemas, not one

2. Fact tables split by grain

3. Filter axes as columns, not dimensions

4. Source-scoped categories

5. Surrogate id + unique-index natural key

6. Numeric spine; labels from registry tables

7. value and value_kr as separate columns

8. extraction_state tracks slices, not rows

Adding a new dimension (the future-proof test)

Adding a new source

What's NOT in this schema (and where it goes)

Known open questions

5. Surrogate `id` + unique-index natural key

7. `value` and `value_kr` as separate columns

8. `extraction_state` tracks slices, not rows

5. Surrogate `id` + unique-index natural key

7. `value` and `value_kr` as separate columns

8. `extraction_state` tracks slices, not rows