Loading documentation...
Architectural reference for the v2 fire_data schema and lib/data-model — dimension vs filter vs metric, fact grains, and cross-source storage patterns.
This document explains how HeatWaves stores fire statistics in the v2 fire_data schema and how lib/data-model maps to it. Pair with ../semantics-and-sources/taxonomy-and-eu-firestat-alignment.md for upstream source semantics; pair with postgres-contract-rls-and-keys.md for executable DDL and RPC contracts.
This is the central classification rule. It decides where any new piece of data goes. The rule is about UI intent, not data shape — the same property could be a dimension or a filter depending on how the user will interact with it.
| Bucket | UI role | Stored as | Examples |
|---|---|---|---|
| Dimension | "Show bars/series per value of this" | dim_code + cat_code on fact rows | Årsak dim_code=1, Tennkilde 2, Objekt 3, Arnested 4, … — fixed ids for the EU-aligned quartet in docs/ingestion/adapter-mappings/category-native-to-cat-code.md; further axes get integer PKs in dimensions |
| Filter | "Apply uniformly across all bars" | nullable column on fact rows | fire_type_code, building_type_code, region_code |
| Metric | "Sum/average/count this number" | nullable NUMERIC column | value (count), value_kr (NOK) |
"Does the UI show bars per value of this thing?"
dim_code + cat_code.The schema enforces no answer — you could store BYGNINGSTYPE as either a dimension or a filter and Postgres wouldn't care. The decision is about how you want to query it:
get_fire_data resolves RESIDENTIAL / native tokens to
filters.id values (WHERE EXISTS …), not raw strings on facts.
Both are useful; pick based on the chart you want to draw.BYGNINGSTYPE is the canonical example. It coexists in two forms:
dim_code
row in dimensions, e.g. illustrative 6 — confirm seed) — for the granular
"fires by Enebolig vs Tomannsbolig vs Rekkehus" breakdown.building_type_code (3 values: RESIDENTIAL, COMMERCIAL,
OTHER) — for cross-source filtering when the user just wants "dwelling
fires across BRASK and BRIS".
This is intentional. Don't unify them — the granular dimension serves
single-source drill-down; the coarse filter serves cross-source comparison.A property that exists in multiple sources usually has different native values per source. HeatWaves stores both the native value (truth-preserving) and a standardized code (cross-source-comparable). This applies to dimensions AND filters.
The categories table has (source_id, dim_code, cat_code) as its primary
key. BRASK's "Ildsted" and POLICE's "Ildsted" are separate rows:
source_id dim_code cat_code native_name standard_code
───────── ──────── ──────── ──────────────────── ─────────────
1 2 7 Ildsted (…) FIREPLACE
2 2 14 Ildsted FIREPLACE
4 2 3 Tennkilde peis FIREPLACE
dim_code 2 = Tennkilde (heat source / Kilde), per docs/ingestion/adapter-mappings/category-native-to-cat-code.md. cat_code values here are illustrative per-source ids.
(Illustrative source_id / cat_code — confirm source_id from sources;
dimension numbering 1–4 for the EU quartet matches category-native-to-cat-code.md;
additional axes 5+ come from fire_data.dimensions seed — see postgres-contract-rls-and-keys.md.)
native_name preserves the source's native vocabulary (verbatim); optional
cleaned name when needed.standard_code is the heatwaves-unified equivalence key.standard_code.(source_id, dim_code, cat_code) directly
with no join.
Why source-scoped: if BRASK revises a category definition, only BRASK's
row changes. POLICE and DSB are unaffected. Sharing a cat_code across
sources would couple them in ways that break this isolation.Same principle, applied to filter axes. BRIS exposes mission-type selection
(Oppdragstype, request field revisedMissionTypes.ids). In HeatWaves that
vocabulary is not a separate dimension — it is the fire_type filter
axis: one row per native id in filters with axis = 'fire_type'
(mission labels such as Branntilløp komfyr, Brann i bygning, Brann i skorstein),
cross-walk to CONFINED / DEVELOPED on filters.standard_code. Canonical
native↔standard table: docs/ingestion/adapter-mappings/filter-axes-native-to-filter-id.md §1.
Facts store fire_type_code → filters.id (INTEGER FK). That is how
ingest tags each BRIS slice without inventing a separate Oppdragstype dimension row.
This means a single BRIS fact row is filtered and labeled two ways without contradiction:
get_fire_data using standard_code strings (CONFINED /
DEVELOPED) or native ids — the RPC resolves to filters.id — or join
filters on fact.fire_type_code = filters.id and filter by standard_code.filters row for display_name /
native_code — still distinguishes chimney vs komfyr vs full building fire inside
Politi data because those remain distinct filters natives under fire_type.The query layer chooses standard_code vs filters native columns based on the user's question.
Ingestion stores native vocabulary in categories / filters.
Fact filter columns store filters.id. Cross-source comparison uses
standard_code (via RPC resolution or joins).
Whenever the situation is "N sources with different vocabularies", apply this pattern. Don't pick one source's vocabulary as canonical; don't invent a new vocabulary that ignores native values; don't normalize at ingest time. Preserve native; map to standard for comparison.
fire_data — analytical surface, exposed via PostgREST. Tables here are
read by the application via RPCs and direct PostgREST queries. RLS-enabled
with anonymous read policies.
ingest — operational state, hidden. Tracks ingestion bookkeeping
(extraction timestamps, raw API payloads). Service-role-only, never exposed
to clients. RLS-enabled with no policies (default-deny).
The split is a security boundary as much as a logical one. Adding fire_data
to PostgREST's exposed-schemas list is the gate; ingest simply isn't on
the list.
sources ← 5 rows: BRASK, POLICE, BRIGADE, DSB, SSB
dimensions ← analytic axes (`dim_code` INTEGER PK, `name` human label).
EU-aligned quartet (fixed ids): 1 Årsak, 2 Tennkilde,
3 Objekt, 4 Arnested — **`category-native-to-cat-code.md`**.
Typical extras (authoritative PKs — see
[`category-native-to-cat-code.md`](../ingestion/adapter-mappings/category-native-to-cat-code.md)):
**5** Bygningstype, **6** Bygningsalder → **6 rows** (BRIS
Oppdragstype is **`fire_type`** / **`filters`**, not a dimension — **`filter-axes-native-to-filter-id.md`**).
categories ← source-scoped (source_id, dim_code, cat_code)
fact_daily ← DATE-grain facts (sub-year `get_fire_data` reads; sources with supports_subyear=true)
fact_yearly ← YEAR-grain facts from ingest (SSB, BRIGADE, DSB yearly endpoints; yearly slices elsewhere)
fact_buildings_yearly ← SSB building-stock totals (non-fire)
extraction_state ← per-slice "when did we last fetch this"
raw_payloads ← optional API response audit log
fact_daily has a date DATE NOT NULL column. fact_yearly has a
year SMALLINT NOT NULL column. get_fire_data chooses one fact table per call:
p_granularity = 'year' reads fact_yearly only (rows produced only by ingest at yearly grain; never summarized from fact_daily in SQL).p_granularity ∈ day · week · month · quarter reads fact_daily only, restricted to sources where sources.supports_subyear is true (taxonomy yearly-only endpoints such as SSB, BRIGADE, DSB do not participate).Mixing grains in one physical table forces every query to remember upstream cadence. Keeping two tables plus explicit branching keeps semantics aligned with taxonomy-and-eu-firestat-alignment.md.
Cross-source yearly dashboards still combine sources at year grain via fact_yearly. Sub-year charts omit yearly-only sources until they have compatible daily facts.
Ingest contract: fact_yearly rows come only from ingestion (upstream yearly aggregates or API yearly buckets). fact_daily rows come only from ingestion at day grain. get_fire_data never derives yearly totals by scanning fact_daily. Sources that expose both yearly summaries and daily incidents write whichever slices belong in each table; adapters decide — no Postgres rollup bridges daily→yearly.
fire_type_code, building_type_code, building_age_code, region_code are nullable columns
on the fact tables. NULL means "this source doesn't carry this filter info".
When the user activates a cross-source fire-type filter, get_fire_data
resolves CONFINED / native tokens (including BRIS revisedMissionTypes ids when passed as natives) to filters.id values and applies
WHERE EXISTS (… filter_id = fu.fire_type_code). Rows with fire_type_code IS NULL drop out automatically — no per-source OR logic in the RPC. This is
graceful degradation by construction — the schema's most important property
after the three-bucket rule.
This is why these axes are not dimensions. A dimension forces a value; a filter axis allows NULL, and NULL means "not applicable to this row".
Filter-axis domains are not duplicated as CHECK (... IN (...)) on facts.
Instead:
filters (native_code,
standard_code, display_name).filters.id; triggers fact_yearly_enforce_filter_axes /
fact_daily_enforce_filter_axes invoke fire_data.enforce_fact_filter_code_axes()
so each non-null id matches filters.axis and filters.applicable_dim_codes
(when set) for fact.dim_code. Details: postgres-contract-rls-and-keys.md.When adding a new native value, INSERT into filters (and migrate
existing facts if needed). lib/data-model/types.ts FireTypeCode /
BuildingTypeCode describe standard_code vocabulary for UI and RPC string
filters — they mirror taxonomy intent, not Postgres CHECK clauses on facts.
Full checklist: section Adding a new filter axis below.
sources.source_id and dimensions.dim_code are INTEGER primary keys.
Stable short labels (sources.name ≈ BRASK) and human titles (long_name) live
on sources; dimension titles on dimensions.name.
Canonical dim_code ↔ Norwegian axis names for the EU-aligned breakdown dimensions:
dim_code | Typical dimensions.name |
|---|---|
1 | Årsak |
2 | Tennkilde |
3 | Objekt |
4 | Arnested |
Source: docs/ingestion/adapter-mappings/category-native-to-cat-code.md. Other seeded dimensions
(Bygningsalder, Bygningstype, …) each get their own integer dim_code;
resolve labels via dimensions / get_dimension_catalog, not string tokens like cause on facts.
BRIS mission types (Oppdragstype) live under filters (axis = 'fire_type'), not dimensions — filter-axes-native-to-filter-id.md.
Filter axes on facts store filters.id (INTEGER FK). Cross-source
and URL-facing tokens such as CONFINED / RESIDENTIAL are
filters.standard_code (TEXT UPPER_SNAKE_CASE) and/or native_code;
get_fire_data accepts those strings and resolves them to ids internally.
cat_code stays INT (BRASK-native numeric ids). The compound key
(source_id, dim_code, cat_code) prevents collisions across sources even though
the integer space is shared.
Why ints for the spine: FK integrity, smaller indexes, stable joins from facts
to registry rows. Labels are always one join away (sources, dimensions,
filters) — which is what charts should use for legends anyway.
Each fact table has both:
id BIGSERIAL PRIMARY KEY -- for FK references and admin operations CREATE UNIQUE INDEX fact_daily_natural_key ON fire_data.fact_daily ( source_id, dim_code, cat_code, date, fire_type_code, building_type_code, region_code, building_age_code ) NULLS NOT DISTINCT;
NULLS NOT DISTINCT makes all-null filter slices collide the way equality
would for plain values. Two rows with the same grain keys but different filter
ids remain distinct. Exact definitions: docs/fire-data-schema/postgres-contract-rls-and-keys.md.
The natural-key index is the upsert target:
INSERT INTO fire_data.fact_daily (...) VALUES (...) ON CONFLICT ( source_id, dim_code, cat_code, date, fire_type_code, building_type_code, region_code, building_age_code ) DO UPDATE SET value = EXCLUDED.value, value_kr = EXCLUDED.value_kr, ingested_at = now();
This means re-running ingestion is safe. If the upstream API returns a revised count for an existing slice, the row updates. If the slice is new, it inserts. No diff logic needed.
Application code never sends SQL strings to Postgres. All analytical queries
go through parameterized RPCs in the fire_data schema.
CREATE OR REPLACE FUNCTION fire_data.<name>( p_<param1> <type>, ... ) RETURNS TABLE(...) -- typed return LANGUAGE sql -- or plpgsql when needed STABLE -- so Postgres can inline / parallelize PARALLEL SAFE -- when the query allows SET search_path = '' -- prevents search_path attacks; fully qualify all -- table references inside the function AS $$ -- fully-qualified table names: fire_data.fact_daily, not fact_daily $$; GRANT EXECUTE ON FUNCTION fire_data.<name> TO anon, authenticated;
SET search_path = ''Without it, an attacker who could create a temp table/function in the
session could shadow fact_daily and run arbitrary code. With it, every
table reference must be fully qualified, so there's no shadow risk.
Supabase's security advisor will warn if you forget this.
The get_fire_data RPC treats NULL on a filter-axis parameter as: only
facts where that axis FK is NULL (upstream had no filter on that axis).
Passing standard_code tokens (e.g. RESIDENTIAL) resolves to all
matching filters.id rows for the requested source(s).
-- Live pattern (per axis; breakdown-aware variant in postgres-contract) AND (p_fire_type IS NULL AND f.fire_type_code IS NULL OR p_fire_type IS NOT NULL AND f.fire_type_code IN (SELECT filter_id FROM resolved_fire_type))
When the axis is in p_breakdown_axes the NULL-param branch instead keeps leaf
rows (fire_type_code IS NOT NULL) and groups them — see
postgres-contract-rls-and-keys.md.
p_sources / p_dim_codes still use the older “NULL = no restriction”
pattern — only the four filter axes use the IS NULL semantics above.
The application accesses the schema through a single module: lib/data-model/.
Application code imports only from @/lib/data-model:
// ✓ Right import { getCatalog, getFireData, type FactDaily } from '@/lib/data-model' // ✗ Wrong — bypasses the public surface import { getFireData } from '@/lib/data-model/queries' import { createFireDataClient } from '@/lib/data-model/client' import type { FactDaily } from '@/lib/data-model/types'
The barrel index.ts re-exports query helpers and types from internal files.
This lets internals be refactored without touching call sites.
createFireDataClient is intentionally not re-exported from the barrel.
Application code asks "give me chart data" (high-level), not "give me a
database connection" (low-level). Centralizing client construction means
future cross-cutting concerns (telemetry, retry, auth swap) live in one place.
The single legitimate exception is adapters inside
lib/data-model/adapters/. They live inside the module's boundary and
need direct Supabase access for service-role writes. They import the client
directly from ./client. The "external code uses the barrel" rule doesn't
apply to them because they're not external.
lib/data-model/
├── index.ts ← Public surface — re-exports queries.ts and types.ts
├── types.ts ← Type aliases (Source, Dimension, Category, FactDaily, ...)
├── client.ts ← INTERNAL: createFireDataClient (server + browser)
├── queries.ts ← INTERNAL: getFireData, getCatalog, lookups
└── adapters/ ← Adapters — import client directly (legitimate exception)
The canonical types entrypoint is types/database.types.ts. It re-exports
the fire_data types under an alias so migration day is a one-line change:
// types/database.types.ts (canonical entrypoint) // Today: legacy public schema is THE Database export type { Database } from './public.database.types' // And fire_data is aliased export type { Database as FireDataDatabase } from './fire_data.database.types' // Migration day: swap which one is THE Database. lib/data-model untouched.
lib/data-model/* imports FireDataDatabase from this file. Don't import
from fire_data.database.types directly.
This is the schema's design test. Adding a new dimension must require zero schema changes. Specifically:
fire_data.dimensions (one row).fire_data.categories for the values the source reports.fact_daily, the schema has failed
its design goal. Investigate — usually the property you're trying to add is
actually a metric or a filter axis, not a dimension.Same shape:
fire_data.sources.fire_data.categories for that source's vocabulary.lib/data-model/adapters/<source>.ts.A metric (insurance kr, injuries, fatalities, response time) is a nullable
NUMERIC column on fact_daily and fact_yearly. Sources that don't track
it leave it NULL.
get_fire_data's p_metric parameter to dispatch on the new
metric name.Metric literal type.
This pattern scales to ~5 metrics. Beyond that, refactor to a metric-pivot
table. YAGNI until then.filters rows for the new axis (per source / native / standard).ALTER TABLE nullable INTEGER FK on fact_daily and fact_yearly
enforce_fact_filter_code_axes.get_fire_data resolved CTE + parameter + breakdown projection.fact_*_natural_key to include the new column (NULLS NOT DISTINCT).lib/data-model/types.ts literals if new
standard_code values are part of the public UI contract.Use these to resolve future design questions without long debates:
Is X a dimension, filter, or metric? Apply the three-bucket framework. "Does the UI show bars per value?" → dimension. "Apply uniformly?" → filter. "Number to aggregate?" → metric.
Where does X live in the schema?
categories rows (native_name, standard_code) keyed by (source_id, dim_code, cat_code).type, BRIS Oppdragstype / revisedMissionTypes, næring, age bracket, …) →
filters with the right axis (fire_type, building_type, building_age, region) plus
standard_code where cross-source comparison applies — facts hold INTEGER NULL FKs
→ filters.id. Fire-type mapping for BRIS: filter-axes-native-to-filter-id.md §1.fire_data or ingest?fire_data.ingest.
Source-scoped or shared? Always source-scoped for categories. Cross-source
equivalence via standard_code. Never share cat_code between sources.Does this change break Utforsk's source-fidelity? If yes, redesign.
Utforsk shows native names without translation. Cross-source pages go
through standard_code.
Add a column or a table? A new metric is a column. A new entity
(fundamentally different ontological thing — e.g. building stock totals vs.
fire incidents) is a table. The fact_buildings_yearly split was deliberately
separate from fact_yearly for this reason.
Applying schema changes: use versioned files under supabase/migrations/ and the Supabase CLI (npm run db:migration:new, db:migration:list). Regenerate types with npm run types:gen:fire-data after DDL changes. The Supabase Dashboard SQL editor is for one-off exploration only — not the source of truth for production schema.
For exact CHECK definitions, FK declarations, natural-key indexes (NULLS NOT DISTINCT), upsert SQL, RLS policies, and the RPC template, read
postgres-contract-rls-and-keys.md beside this file. Use it when you need
strings to copy.
Short summary: source_id and dim_code are INTEGER FKs. Fact filter columns
(fire_type_code, …) store filters.id only — never native_code or
standard_code. Those strings live on filters / categories and enter
read paths via get_fire_data text-array parameters.
These are anti-patterns the v2 schema was built to eliminate. Don't reintroduce:
execute_sql RPC. Every query goes through a parameterized RPC.
Raw SQL strings from the application is the original sin this schema
exists to eliminate.filters.id;
do not write raw standard strings into fact filter columns.cat_code across sources. Source-scoped is mandatory.dim_code rows in categories. Every category belongs to
exactly one dimension. The CHECK constraint enforces this.WHERE filter = ANY(...) that drops sources without warning.
The graceful-degradation property uses NULLs, not OR-logic. Sources that
don't carry the filter return NULL rows, which the WHERE excludes
automatically. This is the right behaviour but it should be explicit
in the UI ("DSB excluded — does not carry fire_type info").bygg_type=alle, this would double-count rows that also exist as
bygg_type=boliger. Ingest the granular slices, never the rollup.