HWHeat Waves
    DashboardUtforsk
    Analyse
    Data Kilder
      • Tables & Grain
      • Postgres Contract
    • Design Rationale
    • Doc Map
    DocsSettings
    DashboardAtlasUtforsk
    Analyse
    Data Kilder
    1. Documentation
    2. Fire Data Schema
    3. Conceptual tables and grain

    Loading documentation...

    Conceptual tables and grain

    Architectural reference for the v2 fire_data schema and lib/data-model — dimension vs filter vs metric, fact grains, and cross-source storage patterns.

    Conceptual tables and grain

    This document explains how HeatWaves stores fire statistics in the v2 fire_data schema and how lib/data-model maps to it. Pair with ../semantics-and-sources/taxonomy-and-eu-firestat-alignment.md for upstream source semantics; pair with postgres-contract-rls-and-keys.md for executable DDL and RPC contracts.


    The three-bucket framework — dimension vs filter vs metric

    This is the central classification rule. It decides where any new piece of data goes. The rule is about UI intent, not data shape — the same property could be a dimension or a filter depending on how the user will interact with it.

    BucketUI roleStored asExamples
    Dimension"Show bars/series per value of this"dim_code + cat_code on fact rowsÅrsak dim_code=1, Tennkilde 2, Objekt 3, Arnested 4, … — fixed ids for the EU-aligned quartet in docs/ingestion/adapter-mappings/category-native-to-cat-code.md; further axes get integer PKs in dimensions
    Filter"Apply uniformly across all bars"nullable column on fact rowsfire_type_code, building_type_code, region_code
    Metric"Sum/average/count this number"nullable NUMERIC columnvalue (count), value_kr (NOK)

    The deciding question

    "Does the UI show bars per value of this thing?"

    • Yes → dimension. Stored as dim_code + cat_code.
    • No, but it filters all series uniformly → filter. Nullable column.
    • It's a number to aggregate → metric. Numeric column. That single question resolves 95% of cases.

    Why this rule works

    The schema enforces no answer — you could store BYGNINGSTYPE as either a dimension or a filter and Postgres wouldn't care. The decision is about how you want to query it:

    • A dimension lets you produce N series at once: "fires by Kilde" → one bar per Kilde value.
    • A filter narrows what gets aggregated: "fires (filtered to RESIDENTIAL)" → one number; get_fire_data resolves RESIDENTIAL / native tokens to filters.id values (WHERE EXISTS …), not raw strings on facts. Both are useful; pick based on the chart you want to draw.

    The same property can be both

    BYGNINGSTYPE is the canonical example. It coexists in two forms:

    • Dimension Bygningstype — granular Matrikkel breakdown (INTEGER dim_code row in dimensions, e.g. illustrative 6 — confirm seed) — for the granular "fires by Enebolig vs Tomannsbolig vs Rekkehus" breakdown.
    • Filter column building_type_code (3 values: RESIDENTIAL, COMMERCIAL, OTHER) — for cross-source filtering when the user just wants "dwelling fires across BRASK and BRIS". This is intentional. Don't unify them — the granular dimension serves single-source drill-down; the coarse filter serves cross-source comparison.

    The preserve-native + standardize pattern

    A property that exists in multiple sources usually has different native values per source. HeatWaves stores both the native value (truth-preserving) and a standardized code (cross-source-comparable). This applies to dimensions AND filters.

    For dimensions: source-scoped categories

    The categories table has (source_id, dim_code, cat_code) as its primary key. BRASK's "Ildsted" and POLICE's "Ildsted" are separate rows:

    source_id  dim_code  cat_code  native_name           standard_code
    ─────────  ────────  ────────  ────────────────────  ─────────────
    1          2         7         Ildsted (…)           FIREPLACE
    2          2         14        Ildsted               FIREPLACE
    4          2         3         Tennkilde peis        FIREPLACE
    

    dim_code 2 = Tennkilde (heat source / Kilde), per docs/ingestion/adapter-mappings/category-native-to-cat-code.md. cat_code values here are illustrative per-source ids.

    (Illustrative source_id / cat_code — confirm source_id from sources; dimension numbering 1–4 for the EU quartet matches category-native-to-cat-code.md; additional axes 5+ come from fire_data.dimensions seed — see postgres-contract-rls-and-keys.md.)

    • native_name preserves the source's native vocabulary (verbatim); optional cleaned name when needed.
    • standard_code is the heatwaves-unified equivalence key.
    • Cross-source query joins on standard_code.
    • Native drill-down queries on (source_id, dim_code, cat_code) directly with no join. Why source-scoped: if BRASK revises a category definition, only BRASK's row changes. POLICE and DSB are unaffected. Sharing a cat_code across sources would couple them in ways that break this isolation.

    For filters: standardized values + native values preserved

    Same principle, applied to filter axes. BRIS exposes mission-type selection (Oppdragstype, request field revisedMissionTypes.ids). In HeatWaves that vocabulary is not a separate dimension — it is the fire_type filter axis: one row per native id in filters with axis = 'fire_type' (mission labels such as Branntilløp komfyr, Brann i bygning, Brann i skorstein), cross-walk to CONFINED / DEVELOPED on filters.standard_code. Canonical native↔standard table: docs/ingestion/adapter-mappings/filter-axes-native-to-filter-id.md §1.

    Facts store fire_type_code → filters.id (INTEGER FK). That is how ingest tags each BRIS slice without inventing a separate Oppdragstype dimension row.

    This means a single BRIS fact row is filtered and labeled two ways without contradiction:

    • Cross-source: get_fire_data using standard_code strings (CONFINED / DEVELOPED) or native ids — the RPC resolves to filters.id — or join filters on fact.fire_type_code = filters.id and filter by standard_code.
    • Native drill-down: join the same filters row for display_name / native_code — still distinguishes chimney vs komfyr vs full building fire inside Politi data because those remain distinct filters natives under fire_type.

    The query layer chooses standard_code vs filters native columns based on the user's question.

    The general rule

    Ingestion stores native vocabulary in categories / filters. Fact filter columns store filters.id. Cross-source comparison uses standard_code (via RPC resolution or joins).

    Whenever the situation is "N sources with different vocabularies", apply this pattern. Don't pick one source's vocabulary as canonical; don't invent a new vocabulary that ignores native values; don't normalize at ingest time. Preserve native; map to standard for comparison.


    Schema layout

    Two schemas

    fire_data — analytical surface, exposed via PostgREST. Tables here are read by the application via RPCs and direct PostgREST queries. RLS-enabled with anonymous read policies.

    ingest — operational state, hidden. Tracks ingestion bookkeeping (extraction timestamps, raw API payloads). Service-role-only, never exposed to clients. RLS-enabled with no policies (default-deny).

    The split is a security boundary as much as a logical one. Adding fire_data to PostgREST's exposed-schemas list is the gate; ingest simply isn't on the list.

    Tables (fire_data)

    sources                  ← 5 rows: BRASK, POLICE, BRIGADE, DSB, SSB
    dimensions               ← analytic axes (`dim_code` INTEGER PK, `name` human label).
                                EU-aligned quartet (fixed ids): 1 Årsak, 2 Tennkilde,
                                3 Objekt, 4 Arnested — **`category-native-to-cat-code.md`**.
                                Typical extras (authoritative PKs — see
                                [`category-native-to-cat-code.md`](../ingestion/adapter-mappings/category-native-to-cat-code.md)):
                                **5** Bygningstype, **6** Bygningsalder → **6 rows** (BRIS
                                Oppdragstype is **`fire_type`** / **`filters`**, not a dimension — **`filter-axes-native-to-filter-id.md`**).
    categories               ← source-scoped (source_id, dim_code, cat_code)
    fact_daily               ← DATE-grain facts (sub-year `get_fire_data` reads; sources with supports_subyear=true)
    fact_yearly              ← YEAR-grain facts from ingest (SSB, BRIGADE, DSB yearly endpoints; yearly slices elsewhere)
    fact_buildings_yearly    ← SSB building-stock totals (non-fire)
    

    Tables (ingest)

    extraction_state         ← per-slice "when did we last fetch this"
    raw_payloads             ← optional API response audit log
    

    Why fact tables are split by grain

    fact_daily has a date DATE NOT NULL column. fact_yearly has a year SMALLINT NOT NULL column. get_fire_data chooses one fact table per call:

    • p_granularity = 'year' reads fact_yearly only (rows produced only by ingest at yearly grain; never summarized from fact_daily in SQL).
    • p_granularity ∈ day · week · month · quarter reads fact_daily only, restricted to sources where sources.supports_subyear is true (taxonomy yearly-only endpoints such as SSB, BRIGADE, DSB do not participate).

    Mixing grains in one physical table forces every query to remember upstream cadence. Keeping two tables plus explicit branching keeps semantics aligned with taxonomy-and-eu-firestat-alignment.md.

    Cross-source yearly dashboards still combine sources at year grain via fact_yearly. Sub-year charts omit yearly-only sources until they have compatible daily facts.

    Ingest contract: fact_yearly rows come only from ingestion (upstream yearly aggregates or API yearly buckets). fact_daily rows come only from ingestion at day grain. get_fire_data never derives yearly totals by scanning fact_daily. Sources that expose both yearly summaries and daily incidents write whichever slices belong in each table; adapters decide — no Postgres rollup bridges daily→yearly.


    Filter axes and graceful degradation

    fire_type_code, building_type_code, building_age_code, region_code are nullable columns on the fact tables. NULL means "this source doesn't carry this filter info".

    When the user activates a cross-source fire-type filter, get_fire_data resolves CONFINED / native tokens (including BRIS revisedMissionTypes ids when passed as natives) to filters.id values and applies WHERE EXISTS (… filter_id = fu.fire_type_code). Rows with fire_type_code IS NULL drop out automatically — no per-source OR logic in the RPC. This is graceful degradation by construction — the schema's most important property after the three-bucket rule.

    This is why these axes are not dimensions. A dimension forces a value; a filter axis allows NULL, and NULL means "not applicable to this row".

    Integrity for filter columns

    Filter-axis domains are not duplicated as CHECK (... IN (...)) on facts. Instead:

    • Allowed natives per source live in filters (native_code, standard_code, display_name).
    • Facts reference filters.id; triggers fact_yearly_enforce_filter_axes / fact_daily_enforce_filter_axes invoke fire_data.enforce_fact_filter_code_axes() so each non-null id matches filters.axis and filters.applicable_dim_codes (when set) for fact.dim_code. Details: postgres-contract-rls-and-keys.md.

    When adding a new native value, INSERT into filters (and migrate existing facts if needed). lib/data-model/types.ts FireTypeCode / BuildingTypeCode describe standard_code vocabulary for UI and RPC string filters — they mirror taxonomy intent, not Postgres CHECK clauses on facts. Full checklist: section Adding a new filter axis below.


    Numeric spine + label tables

    sources.source_id and dimensions.dim_code are INTEGER primary keys. Stable short labels (sources.name ≈ BRASK) and human titles (long_name) live on sources; dimension titles on dimensions.name.

    Canonical dim_code ↔ Norwegian axis names for the EU-aligned breakdown dimensions:

    dim_codeTypical dimensions.name
    1Årsak
    2Tennkilde
    3Objekt
    4Arnested

    Source: docs/ingestion/adapter-mappings/category-native-to-cat-code.md. Other seeded dimensions (Bygningsalder, Bygningstype, …) each get their own integer dim_code; resolve labels via dimensions / get_dimension_catalog, not string tokens like cause on facts. BRIS mission types (Oppdragstype) live under filters (axis = 'fire_type'), not dimensions — filter-axes-native-to-filter-id.md.

    Filter axes on facts store filters.id (INTEGER FK). Cross-source and URL-facing tokens such as CONFINED / RESIDENTIAL are filters.standard_code (TEXT UPPER_SNAKE_CASE) and/or native_code; get_fire_data accepts those strings and resolves them to ids internally.

    cat_code stays INT (BRASK-native numeric ids). The compound key (source_id, dim_code, cat_code) prevents collisions across sources even though the integer space is shared.

    Why ints for the spine: FK integrity, smaller indexes, stable joins from facts to registry rows. Labels are always one join away (sources, dimensions, filters) — which is what charts should use for legends anyway.


    Surrogate id + natural-key unique index

    Each fact table has both:

    id BIGSERIAL PRIMARY KEY -- for FK references and admin operations CREATE UNIQUE INDEX fact_daily_natural_key ON fire_data.fact_daily ( source_id, dim_code, cat_code, date, fire_type_code, building_type_code, region_code, building_age_code ) NULLS NOT DISTINCT;

    NULLS NOT DISTINCT makes all-null filter slices collide the way equality would for plain values. Two rows with the same grain keys but different filter ids remain distinct. Exact definitions: docs/fire-data-schema/postgres-contract-rls-and-keys.md.

    Upsert pattern for adapters

    The natural-key index is the upsert target:

    INSERT INTO fire_data.fact_daily (...) VALUES (...) ON CONFLICT ( source_id, dim_code, cat_code, date, fire_type_code, building_type_code, region_code, building_age_code ) DO UPDATE SET value = EXCLUDED.value, value_kr = EXCLUDED.value_kr, ingested_at = now();

    This means re-running ingestion is safe. If the upstream API returns a revised count for an existing slice, the row updates. If the slice is new, it inserts. No diff logic needed.


    RPCs, never raw SQL

    Application code never sends SQL strings to Postgres. All analytical queries go through parameterized RPCs in the fire_data schema.

    Properties every RPC must have

    CREATE OR REPLACE FUNCTION fire_data.<name>( p_<param1> <type>, ... ) RETURNS TABLE(...) -- typed return LANGUAGE sql -- or plpgsql when needed STABLE -- so Postgres can inline / parallelize PARALLEL SAFE -- when the query allows SET search_path = '' -- prevents search_path attacks; fully qualify all -- table references inside the function AS $$ -- fully-qualified table names: fire_data.fact_daily, not fact_daily $$; GRANT EXECUTE ON FUNCTION fire_data.<name> TO anon, authenticated;

    Why SET search_path = ''

    Without it, an attacker who could create a temp table/function in the session could shadow fact_daily and run arbitrary code. With it, every table reference must be fully qualified, so there's no shadow risk. Supabase's security advisor will warn if you forget this.

    NULL RPC parameters mean “unfiltered fact slices”

    The get_fire_data RPC treats NULL on a filter-axis parameter as: only facts where that axis FK is NULL (upstream had no filter on that axis). Passing standard_code tokens (e.g. RESIDENTIAL) resolves to all matching filters.id rows for the requested source(s).

    -- Live pattern (per axis; breakdown-aware variant in postgres-contract) AND (p_fire_type IS NULL AND f.fire_type_code IS NULL OR p_fire_type IS NOT NULL AND f.fire_type_code IN (SELECT filter_id FROM resolved_fire_type))

    When the axis is in p_breakdown_axes the NULL-param branch instead keeps leaf rows (fire_type_code IS NOT NULL) and groups them — see postgres-contract-rls-and-keys.md.

    p_sources / p_dim_codes still use the older “NULL = no restriction” pattern — only the four filter axes use the IS NULL semantics above.


    The TypeScript data layer (lib/data-model)

    The application accesses the schema through a single module: lib/data-model/.

    Public surface (the barrel)

    Application code imports only from @/lib/data-model:

    // ✓ Right import { getCatalog, getFireData, type FactDaily } from '@/lib/data-model' // ✗ Wrong — bypasses the public surface import { getFireData } from '@/lib/data-model/queries' import { createFireDataClient } from '@/lib/data-model/client' import type { FactDaily } from '@/lib/data-model/types'

    The barrel index.ts re-exports query helpers and types from internal files. This lets internals be refactored without touching call sites.

    Internal boundary (the exception)

    createFireDataClient is intentionally not re-exported from the barrel. Application code asks "give me chart data" (high-level), not "give me a database connection" (low-level). Centralizing client construction means future cross-cutting concerns (telemetry, retry, auth swap) live in one place.

    The single legitimate exception is adapters inside lib/data-model/adapters/. They live inside the module's boundary and need direct Supabase access for service-role writes. They import the client directly from ./client. The "external code uses the barrel" rule doesn't apply to them because they're not external.

    Files

    lib/data-model/
    ├── index.ts        ← Public surface — re-exports queries.ts and types.ts
    ├── types.ts        ← Type aliases (Source, Dimension, Category, FactDaily, ...)
    ├── client.ts       ← INTERNAL: createFireDataClient (server + browser)
    ├── queries.ts      ← INTERNAL: getFireData, getCatalog, lookups
    └── adapters/       ← Adapters — import client directly (legitimate exception)
    

    Type entrypoint

    The canonical types entrypoint is types/database.types.ts. It re-exports the fire_data types under an alias so migration day is a one-line change:

    // types/database.types.ts (canonical entrypoint) // Today: legacy public schema is THE Database export type { Database } from './public.database.types' // And fire_data is aliased export type { Database as FireDataDatabase } from './fire_data.database.types' // Migration day: swap which one is THE Database. lib/data-model untouched.

    lib/data-model/* imports FireDataDatabase from this file. Don't import from fire_data.database.types directly.


    Adding a new dimension (the future-proof test)

    This is the schema's design test. Adding a new dimension must require zero schema changes. Specifically:

    1. INSERT into fire_data.dimensions (one row).
    2. INSERTs into fire_data.categories for the values the source reports.
    3. New method on the relevant adapter that ingests rows under the new dim_code.
    4. Redeploy the adapter. If a new dimension forces ALTER TABLE on fact_daily, the schema has failed its design goal. Investigate — usually the property you're trying to add is actually a metric or a filter axis, not a dimension.

    Adding a new source

    Same shape:

    1. INSERT into fire_data.sources.
    2. INSERTs into fire_data.categories for that source's vocabulary.
    3. New adapter file at lib/data-model/adapters/<source>.ts.
    4. Wire into the ingest scheduler. The chart RPC, the catalog RPC, every UI component — untouched.

    Adding a new metric

    A metric (insurance kr, injuries, fatalities, response time) is a nullable NUMERIC column on fact_daily and fact_yearly. Sources that don't track it leave it NULL.

    1. ALTER TABLE add the column.
    2. Update CHECK constraints if the metric has a domain (e.g. non-negative).
    3. Update get_fire_data's p_metric parameter to dispatch on the new metric name.
    4. Update TypeScript Metric literal type. This pattern scales to ~5 metrics. Beyond that, refactor to a metric-pivot table. YAGNI until then.

    Adding a new filter axis

    1. filters rows for the new axis (per source / native / standard).
    2. ALTER TABLE nullable INTEGER FK on fact_daily and fact_yearly
      • extend enforce_fact_filter_code_axes.
    3. get_fire_data resolved CTE + parameter + breakdown projection.
    4. Recreate fact_*_natural_key to include the new column (NULLS NOT DISTINCT).
    5. Regenerate TS types; extend lib/data-model/types.ts literals if new standard_code values are part of the public UI contract.

    Decision frameworks (rules of thumb)

    Use these to resolve future design questions without long debates:

    Is X a dimension, filter, or metric? Apply the three-bucket framework. "Does the UI show bars per value?" → dimension. "Apply uniformly?" → filter. "Number to aggregate?" → metric.

    Where does X live in the schema?

    • Property of a dimension value (Årsak, Tennkilde, Objekt, Arnested, …) → categories rows (native_name, standard_code) keyed by (source_id, dim_code, cat_code).
    • Property of an upstream slice parameter that narrows counts uniformly (BRASK type, BRIS Oppdragstype / revisedMissionTypes, næring, age bracket, …) → filters with the right axis (fire_type, building_type, building_age, region) plus standard_code where cross-source comparison applies — facts hold INTEGER NULL FKs → filters.id. Fire-type mapping for BRIS: filter-axes-native-to-filter-id.md §1.
    • Property of the data itself (count, kr) → numeric column. Should X be in fire_data or ingest?
    • Read by the application → fire_data.
    • Operational metadata about ingestion → ingest. Source-scoped or shared? Always source-scoped for categories. Cross-source equivalence via standard_code. Never share cat_code between sources.

    Does this change break Utforsk's source-fidelity? If yes, redesign. Utforsk shows native names without translation. Cross-source pages go through standard_code.

    Add a column or a table? A new metric is a column. A new entity (fundamentally different ontological thing — e.g. building stock totals vs. fire incidents) is a table. The fact_buildings_yearly split was deliberately separate from fact_yearly for this reason.

    Applying schema changes: use versioned files under supabase/migrations/ and the Supabase CLI (npm run db:migration:new, db:migration:list). Regenerate types with npm run types:gen:fire-data after DDL changes. The Supabase Dashboard SQL editor is for one-off exploration only — not the source of truth for production schema.


    Constraints and conventions reference

    For exact CHECK definitions, FK declarations, natural-key indexes (NULLS NOT DISTINCT), upsert SQL, RLS policies, and the RPC template, read postgres-contract-rls-and-keys.md beside this file. Use it when you need strings to copy.

    Short summary: source_id and dim_code are INTEGER FKs. Fact filter columns (fire_type_code, …) store filters.id only — never native_code or standard_code. Those strings live on filters / categories and enter read paths via get_fire_data text-array parameters.


    Don't-do list

    These are anti-patterns the v2 schema was built to eliminate. Don't reintroduce:

    • No execute_sql RPC. Every query goes through a parameterized RPC. Raw SQL strings from the application is the original sin this schema exists to eliminate.
    • No client-side filtering of large result sets. Filters become RPC parameters. The browser receives only what it needs to render.
    • No business logic in API routes. Routes are thin pipes — call RPC, return JSON, handle error. Aggregation lives in Postgres.
    • No ambiguous filter values on facts. Facts hold filters.id; do not write raw standard strings into fact filter columns.
    • No shared cat_code across sources. Source-scoped is mandatory.
    • No NULL dim_code rows in categories. Every category belongs to exactly one dimension. The CHECK constraint enforces this.
    • No mixing daily and yearly in one fact table. Split by grain.
    • No silent WHERE filter = ANY(...) that drops sources without warning. The graceful-degradation property uses NULLs, not OR-logic. Sources that don't carry the filter return NULL rows, which the WHERE excludes automatically. This is the right behaviour but it should be explicit in the UI ("DSB excluded — does not carry fire_type info").
    • No raw "ALL" rollup rows in fact tables. When ingesting BRASK with bygg_type=alle, this would double-count rows that also exist as bygg_type=boliger. Ingest the granular slices, never the rollup.