HWHeat Waves
    DashboardUtforsk
    Analyse
    Data Kilder
        • README
        • v2 Rationale
        • v2-001 Migration
        • v2-002 Filter Codes
        • Crosstab
        • Brask Ingest
        • Patches & RPC
      • Tables & Grain
      • Postgres Contract
    • Design Rationale
    • Doc Map
    DocsSettings
    DashboardAtlasUtforsk
    Analyse
    Data Kilder
    1. Documentation
    2. Fire Data Schema
    3. Schema history index
    4. Crosstab And Bucket Modeling

    Loading documentation...

    HeatWaves schema design — analytical depth via filter axes

    Source of truth: Canonical schema-rationale document for v2 data modeling decisions.

    Status: Decision document. Authored 2026-04-29, aligned to deployed v2-002-filter-codes-migration-narrative.md on 2026-04-29. Scope: Pre-ingest schema decisions for the v2 fire_data schema. Related: ../../semantics-and-sources/taxonomy-and-eu-firestat-alignment.md, ../conceptual-tables-and-grain.md.

    This document captures the design decisions made for storing fire statistics data from BRASK, BRIS (Police, Brigade, DSB), and SSB into the HeatWaves Supabase schema. It exists because some of these decisions are expensive to reverse (require full re-ingestion of historical data), so they need to be made deliberately, before the first ingest run, and documented for future reference.


    1. Goal

    Enable complete analytical freedom over Norwegian fire statistics through a schema that is:

    • Future-proof — adding new sources, dimensions, filter axes, or metrics requires no breaking schema changes.
    • Flexible — the same fact data supports single-dimension breakdowns, cross-tabulations, time series, and multi-source comparisons without re-ingestion.
    • Expandable — additive operations only. New analytical capabilities never require rewriting existing fact rows.
    • Precise — preserves native source vocabulary while enabling cross-source comparison via standardized codes. No information lost at ingest.

    The driving analytical use cases are:

    1. Single-dimension breakdowns over time (e.g., "fires by cause, monthly, 2010-2024").
    2. Cross-source comparison on standardized axes (e.g., "BRASK + BRIS dwelling fires by cause").
    3. Cross-tabulation matrices — N tables of (dimension × time), one per filter slice (e.g., one cause-over-time matrix per building age bracket).
    4. Native source drill-down preserving original vocabulary (Utforsk page).

    Use case #3 is the design driver for several decisions below.


    2. The three-bucket framework (the decision rule)

    Every property of a fact row falls into exactly one of three buckets. The bucket determines storage shape.

    BucketUI roleStored asExamples
    DimensionShow bars/series per valuedim_code + cat_code rowsKILDE, ÅRSAK, ARNESTED, OBJEKT, BYGNINGSALDER, BYGNINGSTYPE
    Filter axisApply uniformly across all barsNullable column on fact rowsfire_type_code, building_type_code, building_age_code, region_code
    MetricSum/average/count this numberNullable NUMERIC columnvalue (count), value_kr (NOK)

    The deciding question

    "Does the UI show bars per value of this thing?"

    • Yes → dimension. Stored as dim_code + cat_code rows.
    • No, but it narrows what gets aggregated uniformly → filter axis. Nullable column.
    • It's a number to aggregate → metric. NUMERIC column.

    Why this matters

    • A dimension lets you produce N series from one query: "fires by cause" → one bar per cause.
    • A filter narrows what gets counted before the breakdown: pass CONFINED (standard) or a native token into get_fire_data — the RPC resolves filters.id — then aggregate over the active dimension.
    • A metric is the number being summed/averaged: count of fires, kr of damage.

    The schema does not enforce which bucket you pick — Postgres stores either shape happily. The decision is about how you want to query.

    Numeric dim_code (facts and categories)

    Facts reference dimensions.dim_code as an INTEGER PK. EU-aligned analytical axes use the fixed ids from docs/ingestion/adapter-mappings/category-native-to-cat-code.md:

    dim_codedimensions.name (typical)
    1Årsak
    2Tennkilde
    3Objekt
    4Arnested

    Additional axes (e.g. Bygningsalder, Bygningstype) get their own integer rows in fire_data.dimensions. Authoritative dim_code values: 5 = Bygningstype, 6 = Bygningsalder — see category-native-to-cat-code.md §6–§7.


    3. The hard constraint: dimension ↔ filter is expensive to reverse

    This is the rule that makes pre-ingest decision-making critical.

    Adding a new column later is cheap — ALTER TABLE ADD COLUMN, run adapters again, populate. No re-fetch from source APIs needed if the data is already in fact rows in some form.

    Demoting a dimension to a filter (or vice versa) is expensive — these are different storage shapes. Dimension data lives as separate rows (one per category value); filter data lives as a column on existing rows. Reshaping requires re-fetching from source APIs and re-writing fact tables, because the rows themselves don't carry the information in the right arrangement to reconstruct the other shape.

    Easy to change later (additive)

    ChangeCost
    Add a new dimensionINSERT INTO dimensions + INSERT INTO categories + new adapter method. Zero schema changes.
    Add a new filter axisALTER TABLE ADD COLUMN (nullable) + update natural-key unique index + add/update filters mappings + update RPC + run adapters.
    Add a new metricALTER TABLE ADD COLUMN (nullable NUMERIC) + update RPC dispatch.
    Add a new sourceINSERT INTO sources + new adapter file.
    Add new categories to existing dimensionINSERT INTO categories.
    Update standard_code mappingsPlain UPDATE on categories.

    Hard to change later (requires re-ingest)

    ChangeCost
    Demote a dimension to a filterFull re-ingest of all data from sources providing it.
    Promote a filter to a dimensionFull re-ingest of all data carrying it.
    Add a filter axis but only ingest combined slicesFull re-fetch with split parameters from source APIs.
    Change fact-table grain (daily ↔ yearly)Full re-ingest. Already split correctly.

    Rule: at ingest time, always fetch the most granular slice the source API offers, even if today's UI doesn't use that granularity. Granular ingest is a few extra API calls; coarse ingest plus future regret is a full re-fetch.


    4. Property-by-property decisions

    Dimensions (shows bars per value)

    Propertydim_codeNative termSourcesRationale
    Årsak1Årsak, BrannårsakBRASK, Police, BrigadePrimary causal factor (EU Firestat #4). Cross-source via standard_code.
    Tennkilde2Tennkilde, KildeBRASK, DSBHeat source breakdown is the core EU Firestat #11 analysis.
    Objekt3Brannen startet i, Hva startetPolice (Sub-A), DSBItem first ignited (EU Firestat #12).
    Arnested4Antatt arnestedPolice (Sub-A), DSBArea of origin (EU Firestat #10).
    BYGNINGSTYPE5Bygningstype, Matrikkel codesBRASK, Police (Sub-A), SSBGranular building sub-type (46 Matrikkel codes).
    BYGNINGSALDER6BygningsalderBRASK onlyBuilding age bracket. Single-source dimension; native-vocab drill-down.

    Filter axes (narrow uniformly before aggregating)

    ColumnDomainSources carrying itRationale
    fire_type_codefilters.id (axis='fire_type') | NULLIncident sources carrying typeNative codes live in filters; standard_code exposes CONFINED / DEVELOPED for cross-source RPC filters.
    building_type_codefilters.id (axis='building_type') | NULLBRASK, BRIS Police Sub-ACoarse RESIDENTIAL / commercial / other via standard_code.
    building_age_codefilters.id (axis='building_age') | NULLBRASK — see §5Cross-tab driver; natives (e.g. age brackets) live in filters.
    region_codefilters.id (axis='region') | NULLTBDPopulate filters before regional ingest.

    Metrics

    ColumnTypeSourcesRationale
    valueBIGINT NOT NULLAllFire count (the universal metric).
    value_krNUMERIC NULLBRASK onlyInsurance damage in NOK. NULL on non-BRASK rows. NULL metrics ≠ NULL filters: rows are still included in queries; the metric just contributes nothing. Single-source metrics are fine.

    The "both" pattern — explicit dual treatment

    Two properties are stored as both a dimension and a filter axis. This is intentional and serves different analytical use cases at different granularities.

    BYGNINGSTYPE dimension + building_type_code filter

    • Dimension form (46 Matrikkel codes): native granular drill-down — "fires by Enebolig vs Tomannsbolig vs Rekkehus". Source-specific (BRASK).
    • Filter form (3 values: RESIDENTIAL / COMMERCIAL / OTHER): cross-source filter — "BRASK + BRIS for RESIDENTIAL only".
    • Adapter responsibility: ingest writes both. The granular dim row carries the Matrikkel code; the same fact row carries the standardized 3-value building_type_code. Two ingestion paths, two analytical capabilities, one fact row.

    BYGNINGSALDER dimension + building_age_code filter

    • Dimension form: standalone "fires by building age" breakdown.
    • Filter form: enables the cross-tabulation matrix use case (§5).
    • Adapter responsibility: for every BRASK characteristic (KILDE, cause), iterate over the Bygningsalder native parameter values and tag each resulting fact row with building_age_code. This produces the joint distribution needed for cross-tab analysis.

    5. The cross-tabulation use case (the design driver)

    The analytical goal

    For a given dimension (e.g., cause), produce N matrices — one per building age bracket — where each matrix is cause × year:

    Matrix 1 — Building age 0-10:    cause × year (count of fires)
    Matrix 2 — Building age 10-20:   cause × year
    Matrix 3 — Building age 20-50:   cause × year
    Matrix 4 — Building age 50+:     cause × year
    

    This is a joint distribution — each value answers "fires of cause X in age bracket Y in year Z". This is fundamentally different from marginal distributions (cause-only or age-only).

    Why building_age_code must be a filter, not just a dimension

    If Bygningsalder were stored only as a dimension (dim_code = 6 for Bygningsalder + cat_code), the cause rows would have no age tagging. The schema could only answer "fires by cause" OR "fires by age", never "fires by cause AND age". Marginal data cannot be used to compute joint statistics — the information was never recorded together.

    Storing building_age_code as a filter FK tagging joint slices — same dim_code / cat_code identification plus building_age_code pointing at the bracket's filters row.

    Why this is feasible (BRASK API capability)

    BRASK's API permits cross-filtering: every category within a dimension can be selected as a filter, and queries can request a different dimension as the row breakdown while filtering by another. Concretely:

    Fetch cause breakdown WHERE Bygningsalder = "51-75" AND Type = "Kald" AND BygningType = "Boliger"
    

    This is what makes the joint data extractable from the source. The schema's job is to preserve that joint structure.

    Ingest cross-product

    For each BRASK characteristic (Kilde, Årsak), the adapter iterates the cross-product. HeatWaves has one building-type filter: building_type_code (plus filters.standard_code for cross-source buckets like RESIDENTIAL vs other). On BRASK, building-type semantics are supplied as native næring (naering in the Utforsk JSON) — not a second HeatWaves axis.

    fire_type      × 2   (BRASK `type`: Varm/Kald — maps to fire_type_code / filters)
    building_type  × 23  (BRASK `næring` ids '0'..'21','99' — canonical list in [`BRASK_EXPLORER_DIMENSIONS_JSON`](../../../lib/data-model/adapters/_braskWebForms.constants.ts) `dimensionValues.naering`)
    building_age   × N   (Bygningsalder brackets)
    

    = 2 × 23 × N fetches per characteristic when iterating all næring natives. With building_type fixed to one sector (e.g. naering='0' / Beboelse only), collapse the middle factor to 1 → 2 × N. Each native id maps to its own filters.id and standard_code (see seed_brask_naering_building_type_categories.sql).

    This raises ingest API call volume but is the price of analytical depth. Per the rule in §3: fetch granular at ingest, never coarse-and-regret.

    One-engine RPC contract (get_fire_data)

    The deployed schema uses one RPC engine for standard charts and cross-tabs: fire_data.get_fire_data.

    • No separate get_cross_tab RPC should be introduced.
    • Cross-tab is a query shape, not a second backend function.
    • Drift prevention rule: all filter and breakdown behavior evolves only in get_fire_data.

    Next.js consumption pattern (single engine)

    // BRASK source id matches lib/data-model/source-ids.ts (FIRE_DATA_SOURCE_ID.BRASK). const BRASK_SOURCE_ID = 1 as const // Årsak → dim_code 1 per docs/ingestion/adapter-mappings/category-native-to-cat-code.md (confirm in dimensions catalog). const DIM_CAUSE = 1 as const // Årsak export async function getCrossTabByAge( client: SupabaseClient, opts: { dimCode: typeof DIM_CAUSE fireType?: 'CONFINED' | 'DEVELOPED' startYear: number endYear: number } ) { const ageBrackets = ['Ny', '1-5', '6-10', '11-15', '16-20', '21-25', '26-30', '31-40', '41-50', '51-75', '76-100', 'Over 100', 'Ukjent'] as const const matrices = await Promise.all( ageBrackets.map(async (age) => { const { data, error } = await client.rpc('get_fire_data', { p_start_date: `${opts.startYear}-01-01`, p_end_date: `${opts.endYear}-12-31`, p_sources: [BRASK_SOURCE_ID], p_dim_codes: [opts.dimCode], p_fire_type: opts.fireType ? [opts.fireType] : null, p_building_age: [age], p_breakdown_axes: [], p_granularity: 'year', p_metric: 'count', }) if (error) throw error return { ageBracket: age, rows: data ?? [] } }) ) return matrices }

    Each matrix is rendered as one table or one heatmap in the UI. Parallelizing the calls keeps the page responsive.

    If you want all age brackets in one call, pass:

    p_building_age: ['Ny', '1-5', '6-10', '...'], p_breakdown_axes: ['building_age']

    Then split results by building_age_code (filters.id) in application code — join filters for legends.


    6. Deployed contract (runtime database)

    Authoritative SQL narratives live in docs/fire-data-schema/schema-history/v2-002-filter-codes-migration-narrative.md (history + rationale). Live shapes (PK types, FK targets, natural keys, RPC) are summarized in docs/fire-data-schema/postgres-contract-rls-and-keys.md and generated types.

    6.1 Filter axes → filters.id on facts

    • Incident fact_daily / fact_yearly store fire_type_code / building_type_code / building_age_code / region_code as INTEGER NULL FK → filters.id.
    • filters rows carry native_code, standard_code, and display fields per (source_id, axis, native_code). Every row has a defined standard_code.
    • RPC get_fire_data accepts native or standard strings in p_fire_type etc.; it resolves them to filters.id before filtering facts. Omitted axis args should match facts where that FK IS NULL (see postgres-contract-rls-and-keys.md).
    • BRASK næring='0' (Beboelse) is a filters catalog row → its filters.id on facts. BRASK næring='' (Alle næringer) is the unfiltered request: it is not a filter row but the building_type_code IS NULL total slice. Overlap is avoided because the NULL total is read via the omitted axis param and never summed with Beboelse.

    6.2 Natural-key upsert (NULLS NOT DISTINCT)

    Unique indexes include all four nullable filter FK columns as plain integers.

    CREATE UNIQUE INDEX fact_daily_natural_key ON fire_data.fact_daily ( source_id, dim_code, cat_code, date, fire_type_code, building_type_code, region_code, building_age_code ) NULLS NOT DISTINCT;
    INSERT INTO fire_data.fact_daily (...) VALUES (...) ON CONFLICT ( source_id, dim_code, cat_code, date, fire_type_code, building_type_code, region_code, building_age_code ) DO UPDATE SET value = EXCLUDED.value, value_kr = EXCLUDED.value_kr, ingested_at = now();

    6.3 TypeScript types

    Generated FactDaily / FactYearly use number | null for filter FK columns. Prefer types/fire_data.database.types.ts after every migration.

    6.4 BYGNINGSALDER dimension (still ingested as dimension)

    The adapter writes both marginal (dim_code = 6 for Bygningsalder) and joint (another dimension row + building_age_code FK) slices — same analytical separation as before; only storage types changed.

    6.5 BRASK adapter behaviour (conceptual)

    Resolve each native tuple to filters.id (upsert filters first), then write facts:

    // lib/data-model/adapters/brask.ts (sketch) await upsertFactRows(client, data.map(row => ({ source_id: BRASK_SOURCE_ID, dim_code: DIM_CAUSE, cat_code: row.causeCode, date: row.date, value: row.count, value_kr: row.kr ?? null, fire_type_code: await lookupFilterId(client, { axis: 'fire_type', native: ft.native }), building_type_code: await lookupFilterId(client, { axis: 'building_type', native: bt.native }), building_age_code: await lookupFilterId(client, { axis: 'building_age', native: age.native }), region_code: null, })))

    7. Graceful degradation properties preserved

    When the user activates a building_age_code filter:

    • BRASK rows with that age tagging → included.
    • BRASK rows without tagging (e.g. older ingest before this change, or characteristics that don't use age brackets) → building_age_code IS NULL → excluded.
    • All BRIS rows → building_age_code IS NULL → excluded silently.
    • SSB rows → building_age_code IS NULL → excluded silently.

    The UI must signal this exclusion when it activates the filter — "Building age available from BRASK only. Other sources excluded." This matches the filter-axis graceful-degradation pattern in conceptual-tables-and-grain.md and applies identically here.

    The cross-tab use case is single-source (BRASK only) by design, so silent exclusion is the desired behaviour.


    8. What this design buys analytically

    With building_age_code as a filter axis, you can now compute (without re-ingest):

    • Cross-tabs: cause × year, one matrix per age bracket. KILDE × year, one matrix per age bracket. Same for any future BRASK dimension.
    • Triple cross-tabs: cause × year × age (if rendered as a 3D structure or stacked heatmaps).
    • Filtered breakdowns: "fires by cause in pre-1950 buildings" — one chart, age bracket as filter rather than dimension.
    • Composite filters: "confined fires in RESIDENTIAL built before 1950, by cause" — compose fire_type_code + building_type_code + building_age_code filters together with any dimension.
    • Marginal-from-joint sanity checks: SUM across age brackets should equal the BYGNINGSALDER-NULL fact row total. A useful integrity check at ingest QA.

    All of this is enabled by storing the joint distribution at ingest. None of it would be possible from marginal-only storage.


    9. Future expansion checklist

    When adding a new BRASK dimension or filter axis, follow this order:

    1. Decide the bucket using §2's deciding question. Does the UI show bars per value? → dimension. Apply uniformly? → filter. Number to aggregate? → metric.
    2. Check API capability: does BRASK allow this property to be filtered while requesting a different dimension as the breakdown? If yes, joint data is extractable — proceed. If no, only marginal data exists — store as dimension only, accept the analytical limitation.
    3. For dimensions: INSERT into dimensions and categories, write adapter, no schema migration.
    4. For filter axes: migration per §6.1 pattern — column, natural-key index update, filters mapping, RPC parameter support. If the property has both forms (like BYGNINGSTYPE / building_type_code), wire the adapter to write both.
    5. Document the cross-tab capability: which dimensions can be cross-tabbed against which filter axes — this is determined by which (dim × filter) combinations the adapter actually ingests.

    Adding new analytical capability post-ingest is always cheaper than fixing a missed joint distribution. The only thing this design cannot retrofit cheaply is a new filter axis on data already ingested at coarser granularity. Hence: think before ingesting.


    10. Decisions deferred

    These are flagged for explicit later resolution:

    • Native BRASK Bygningsalder bracket boundaries — confirm the canonical native set from brask.finansnorge.no and keep filters mappings aligned. Do not merge native buckets.
    • region_code value domain — reserved column, value set not yet decided. Fylke-level seems likely.
    • Cross-tab UI rendering — N-matrix display is a UI concern, not a schema concern. The schema enables it; the UI decides whether to render as stacked heatmaps, small multiples, or a tab switcher.
    • Performance budgets — at current data scale (hundreds of thousands of rows) get_fire_data cross-tab-shaped queries should respond in single-digit milliseconds. Revisit if approaching millions of fact rows.

    11. Anti-patterns to avoid

    These would break the design's properties — do not introduce them:

    • Don't store BYGNINGSALDER as a dimension only, then try to compute cross-tabs in the application layer. The marginal data does not contain joint information; you'd be silently fabricating values.
    • Don't unify BYGNINGSTYPE dimension with building_type_code filter into a single representation. They serve different granularities; merging loses one analytical capability.
    • Don't ingest BRASK with Bygningsalder=alle (the rollup) and skip the per-bracket fetches. The combined slice is unrecoverable into per-bracket data.
    • Don't add building_age_code to BRIS or SSB ingest paths without the source actually supplying age data. NULL is the correct value; fabricated values would corrupt cross-tabs.
    • Don't expose raw SQL or the execute_sql pattern in the application. Every query goes through a typed RPC — this is enforced by the v2 schema.
    • Don't bypass the standardize-and-preserve pattern: native age bracket strings stay in categories.name, and building_age_code stores the native filter value on fact rows; cross-source standardization happens through filters.standard_code.