    Catalog governance and ingest reconciliation

    Orchestration view of catalog completeness, ingest routing, and quality signals for fire_data. Complements pipeline and Postgres contract docs.

    Ingest operational policy — HeatWaves fire_data

    Purpose and scope

    This document gives a concise orchestration view of catalog governance, ingest routing, and quality signals for fire_data. It complements, and does not replace, the source-specific and schema-canonical references listed in §7.

    Normative pipeline behaviour, atomic-fact rules, and per-adapter contracts live in pipeline-slices-idempotency-and-triggers.md. Physical constraints and trigger semantics live in postgres-contract-rls-and-keys.md. Grain branching for reads is specified in incremental-patches-and-rpc-changelog.md and v2-design-rationale-overview.md.

    Database and migration references: docs/fire-data-schema/schema-history/README.md.


    1. Category catalog completeness (upstream buckets)

    Upstream APIs expose analytical breakdowns as bucket maps: keys (typically native labels or sentinel tokens such as -1) map to counts. Adapters resolve keys to categories.cat_code via registry lookups (registry.catByNativeName; see pipeline-slices-idempotency-and-triggers.md §4.1).

    Consequences when no matching categories row exists

    • The bucket is omitted from transformed output; the adapter continues and, where implemented, logs a diagnostic warning.
    • Counts from omitted buckets never reach fact_yearly / fact_daily, so summed ingested values may remain below the upstream aggregate total until the catalog is repaired.
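
    The omission behaviour listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the adapter code: resolve_buckets, the plain dict standing in for registry.catByNativeName, and the sample labels and counts are all hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_buckets(
    buckets: Dict[str, int],
    cat_by_native_name: Dict[str, str],
) -> Tuple[Dict[str, int], List[str]]:
    """Map upstream bucket keys to cat_code and report keys with no catalog row.

    `cat_by_native_name` is a hypothetical stand-in for the registry lookup.
    """
    resolved: Dict[str, int] = {}
    skipped: List[str] = []
    for native_key, count in buckets.items():
        cat_code = cat_by_native_name.get(native_key)
        if cat_code is None:
            # No matching categories row: the bucket is dropped, not failed.
            skipped.append(native_key)
            continue
        resolved[cat_code] = resolved.get(cat_code, 0) + count
    return resolved, skipped


# Hypothetical upstream bucket map; "-1" is the missing-answer sentinel.
buckets = {"Bolig": 120, "Industri": 30, "-1": 5}
# Hypothetical catalog that lacks a row for "Industri".
catalog = {"Bolig": "bolig", "-1": "ikke_besvart"}

resolved, skipped = resolve_buckets(buckets, catalog)
shortfall = sum(buckets.values()) - sum(resolved.values())
```

    In this made-up case the skipped list names the one unmapped key, and the shortfall equals the count carried by that bucket, which is the amount by which landed facts would trail the upstream total.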

    Requirement: Each upstream bucket key intended for analytics MUST correspond to a (source_id, dim_code, cat_code) row before slices relying on that vocabulary are treated as production-complete. Missing-answer buckets (-1) MUST map to the source's Ikke besvart ("Not answered") category where the API emits them (category-native-to-cat-code.md §5).

    Contrast with filter axes: Unknown or invalid filters.id values typically fail loudly, through foreign-key checks or enforce_fact_filter_code_axes(). Missing category mappings raise no error; ingested totals simply fall short unless reconciliation catches the gap (postgres-contract-rls-and-keys.md, pipeline-slices-idempotency-and-triggers.md §7).

    Upstream assertion vs landed atoms: Upstream api_total (on fire_data.ingest_runs or in IngestResult) is the native API total for the slice. SUM(fact_yearly.value) over the same slice keys can be lower when bucket keys lack categories rows — a catalog gap. Compare api_total to fact aggregates, or use probe_slice_coverage + fact SQL, to detect missing categories (pipeline-slices-idempotency-and-triggers.md §5.5–5.6, ../fire-data-schema/schema-history/README.md).
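
    The comparison between api_total and the fact aggregate can be expressed as a small check. The sketch below assumes both numbers have already been read (for example from fire_data.ingest_runs and from a SUM over fact_yearly for the same slice keys); the SliceReconciliation class and its status labels are hypothetical and not part of the pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceReconciliation:
    """Compare an upstream total with the landed fact sum for one slice."""

    api_total: int  # native API total recorded for the slice
    fact_sum: int   # SUM(value) over the same slice keys, fetched separately

    @property
    def gap(self) -> int:
        # Positive gap: upstream reported more than was landed.
        return self.api_total - self.fact_sum

    @property
    def status(self) -> str:
        # Status labels are illustrative only.
        if self.gap == 0:
            return "complete"
        if self.gap > 0:
            return "catalog_gap_suspected"
        return "landed_exceeds_upstream"


# Hypothetical numbers for a single slice.
check = SliceReconciliation(api_total=155, fact_sum=125)
```

    A positive gap is only a hint that bucket keys lack categories rows; confirming which keys are missing still requires inspecting the upstream bucket map or the adapter warnings.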


    2. Ingest orchestration (high level)

    flowchart LR
      subgraph sched["Scheduler / operator"]
        cron["Cron or authenticated POST"]
      end
      subgraph app["Application tier"]
        route["POST /api/admin/ingest"]
        dispatch["ingestBySourceId / _dispatch"]
        adapter["Source adapters BRIS · BRASK · SSB"]
      end
      subgraph db["fire_data schema"]
        registry["categories, filters SourceRegistry"]
        facts["fact_yearly, fact_daily"]
        guards["fact_*_enforce_filter_axes"]
      end
      cron --> route --> dispatch --> adapter
      dispatch --> registry
      adapter -->|"HTTP"| upstream["BRIS / BRASK / SSB"]
      upstream --> adapter
      adapter -->|"typed upsert"| facts
      guards -.-> facts

    Security boundary and Utforsk versus materialization paths for BRASK are specified in docs/fire-data-schema/schema-history/brask-ingest-loop-and-invariants.md §§3–4.


    3. Catalog and seed governance

    The following diagram summarizes governance flows; enumerated slice matrices (e.g. BRASK cross-product cardinalities) are defined in brask-ingest-loop-and-invariants.md §4, not restated here.

    flowchart TD
      start["Scope sources and dimensions"]
      start --> src["sources × dim_code inventory"]
      src --> cats["categories seeds native ↔ cat_code"]
      src --> filt["filters seeds native ↔ axis ± applicable_dim_codes"]
      cats --> qual["Quality posture"]
      filt --> qual
      qual -->|"required for production"| strict["BRIS sum assertion passes slice-wide"]
      qual -->|"non-production"| soft["Logged skips acceptable"]

    Governance parameters (environment-specific; not duplicated from companion docs):

    | Parameter | Reference |
    |---|---|
    | In-scope source_id × dim_code | Product roadmap; seed migrations |
    | BRASK joint-distribution breadth vs narrowed bring-up | brask-ingest-loop-and-invariants.md §4 (naering enumeration, fetch counts) |
    | Taxonomy drift ownership | Operational RACI; seed PR workflow |

    Granular-ingest design: crosstab-and-bucket-modeling.md §3.


    4. Adapter selection and temporal grain

    flowchart TD
      Q{"Chart period granularity"}
      Q -->|"year"| Y["RPC reads fact_yearly only"]
      Q -->|"day · week · month · quarter"| D["RPC reads fact_daily supports_subyear = true"]
      Y --> cap["Per-source yearly capability ETL §10"]
      D --> cap2["Sub-year excludes yearly-only sources incremental-patches-and-rpc-changelog.md"]

    Canonical tables: pipeline-slices-idempotency-and-triggers.md §10 (source × grain coverage), v2-design-rationale-overview.md §2 (supports_subyear), incremental-patches-and-rpc-changelog.md (get_fire_data branching).
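
    The grain branching in the diagram can be sketched as a pure function. This is an illustration only: fact_table_for, sources_for, and the source names src_a / src_b are hypothetical, and the real branching lives in get_fire_data as described in the documents cited above.

```python
from typing import Dict, List

# Period values taken from the diagram above.
SUBYEAR_PERIODS = {"day", "week", "month", "quarter"}


def fact_table_for(period: str) -> str:
    """Pick the fact table a read would target for a chart period."""
    if period == "year":
        return "fact_yearly"
    if period in SUBYEAR_PERIODS:
        return "fact_daily"
    raise ValueError(f"unsupported period: {period!r}")


def sources_for(period: str, supports_subyear: Dict[str, bool]) -> List[str]:
    """Return candidate sources for a period.

    Sub-year reads keep only sources flagged supports_subyear = true.
    Yearly reads return every source passed in; per-source yearly
    capability (the section 10 table) is not modelled in this sketch.
    """
    if fact_table_for(period) == "fact_yearly":
        return sorted(supports_subyear)
    return sorted(s for s, ok in supports_subyear.items() if ok)


# Hypothetical capability flags.
flags = {"src_a": True, "src_b": False}
monthly_sources = sources_for("month", flags)
yearly_sources = sources_for("year", flags)
```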

    BRIS slice atomicity: one native mission-type identifier per ingest slice (pipeline-slices-idempotency-and-triggers.md §5.3; filter-axes-native-to-filter-id.md).


    5. Observability and quality gates

    flowchart LR
      run["Ingest run"]
      run --> http["IngestResult JSON"]
      run --> ir["ingest_runs: progress + counters + api_total / per_slice_metrics"]
      run --> ledger["ledger_events optional undo"]
      http --> ops["Alerting thresholds"]
      ir --> probe["probe_slice_coverage RPC"]
      ledger --> ops
      ops --> fix["Seed or adapter correction"]
      probe --> fix
      fix --> rerun["Re-ingest"]

    Signal taxonomy — how operator-facing signals line up with storage:

    | Signal | Where it lives | Role |
    |---|---|---|
    | Per-slice status, errors, row counts | IngestResult (HTTP / logs) | Immediate run feedback |
    | Progress (slices_done / slices_total), write breakdown (rows_added / rows_updated / rows_skipped_identical), rows_upserted, scalar api_total / transformed_row_count (Utforsk) or per_slice_metrics (batch) | fire_data.ingest_runs | Durable dashboard + Explore polling (pipeline-slices-idempotency-and-triggers.md §5.5–5.6) |
    | Mid-slice progress for multi-stage adapters (v, kind, phase, done / total, canonical_progress, cursor, stats) | fire_data.ingest_runs.phase_detail jsonb | Daily adapters (brask_daily_fanout, bris_police_daily_fanout); UI bar reads canonical_progress (pipeline §10.3–10.4) |
    | Slice coverage / fact aggregates vs last complete run | probe_slice_coverage (yearly) / probe_slice_coverage_daily (daily, scoped to a date window) | Covered / partial heal / cold ingest / login prompt on /explore |
    | Row-level revert | ledger_events + undo_ingest_run RPC (routes to fact_yearly or fact_daily per event) | Optional undo path; not required for analytics |

    Hard failures versus soft skips per adapter remain in pipeline-slices-idempotency-and-triggers.md §7.

    Operational parameters (set per deployment):

    | Parameter | Notes |
    |---|---|
    | Severity mapping for IngestResult errors | E.g. paging on any slice error vs BRIS total mismatch only |
    | Acceptance of non-fatal skips | E.g. SSB region resolution gaps (pipeline-slices-idempotency-and-triggers.md §7) |
    | Staging gates using probe_slice_coverage | Optional CI / pre-promotion checks |


    6. Database-enforced invariants

    On insert or update to fact tables:

    • Foreign keys to filters(id) reject unknown identifiers.
    • enforce_fact_filter_code_axes() rejects incorrect filters.axis or dim_code outside applicable_dim_codes when populated (postgres-contract-rls-and-keys.md).

    Not enforced in Postgres: alignment of filters.source_id with fact.source_id; adapters MUST resolve identifiers through SourceRegistry for the active source (postgres-contract-rls-and-keys.md).
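
    To make the split between database guards and adapter responsibility concrete, the sketch below mirrors the three database checks and adds the source alignment check that Postgres does not perform. It is an assumption-laden model, not the trigger source: FilterRow, guard_violations, and the idea that a fact row carries one filter id per axis are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FilterRow:
    """Simplified, hypothetical view of a filters row."""

    id: str
    source_id: str
    axis: str
    # None models "not populated": the dim_code check is then skipped.
    applicable_dim_codes: Optional[Tuple[str, ...]] = None


def guard_violations(
    fact_source_id: str,
    dim_code: str,
    axis_to_filter_id: Dict[str, str],
    filters: Dict[str, FilterRow],
) -> List[str]:
    """List the problems a fact row would hit, tagged by which layer owns the check."""
    problems: List[str] = []
    for axis, filter_id in axis_to_filter_id.items():
        row = filters.get(filter_id)
        if row is None:
            problems.append(f"db:fk unknown filter {filter_id}")
            continue
        if row.axis != axis:
            problems.append(f"db:axis {filter_id} has axis {row.axis}, used as {axis}")
        if row.applicable_dim_codes is not None and dim_code not in row.applicable_dim_codes:
            problems.append(f"db:dim {dim_code} not applicable to {filter_id}")
        if row.source_id != fact_source_id:
            # Not enforced in Postgres: adapters must catch this themselves.
            problems.append(f"adapter:source {filter_id} belongs to {row.source_id}")
    return problems


# Hypothetical filters table and fact row.
filters = {
    "f1": FilterRow("f1", "src_a", "region", ("cause",)),
    "f2": FilterRow("f2", "src_b", "period"),
}
found = guard_violations("src_a", "building", {"region": "f1", "period": "f2", "kind": "f9"}, filters)
```

    Resolving identifiers through a registry scoped to the active source keeps the adapter-tagged case from arising in the first place, which is why the requirement sits with the adapters.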

    BRIS adapters enforce sum(transformed values) = upstream total at slice level (pipeline-slices-idempotency-and-triggers.md §4.1).
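
    A slice-level sum assertion of this kind might look like the following. The exception class, function name, and sample values are hypothetical; the actual BRIS adapter contract is the one in pipeline-slices-idempotency-and-triggers.md §4.1.

```python
from typing import Dict


class SliceTotalMismatch(Exception):
    """Raised when transformed values do not add up to the upstream total."""

    def __init__(self, landed: int, upstream_total: int) -> None:
        self.landed = landed
        self.upstream_total = upstream_total
        self.missing = upstream_total - landed
        super().__init__(
            f"transformed sum {landed} != upstream total {upstream_total}"
        )


def assert_slice_total(transformed: Dict[str, int], upstream_total: int) -> None:
    """Fail the slice unless sum(transformed values) equals the upstream total."""
    landed = sum(transformed.values())
    if landed != upstream_total:
        raise SliceTotalMismatch(landed, upstream_total)


# Passes silently: 2 + 3 == 5 (hypothetical cat_code keys).
assert_slice_total({"cat_a": 2, "cat_b": 3}, 5)
```

    Raising at slice level turns an unmapped bucket into a hard failure for that slice rather than a silent shortfall in the fact tables.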

    Operational hazards outside these guards are summarized in pipeline-slices-idempotency-and-triggers.md §5.4.


    7. Canonical bibliography

    | Topic | Document |
    |---|---|
    | Adapter pipeline, idempotency, BRIS/BRASK/SSB specifics | pipeline-slices-idempotency-and-triggers.md |
    | BRASK loop, cross-product fetches, transactions | docs/fire-data-schema/schema-history/brask-ingest-loop-and-invariants.md |
    | Dimension vs filter design, cross-tabs, granular-ingest rule | docs/fire-data-schema/schema-history/crosstab-and-bucket-modeling.md |
    | Constraints, triggers, natural keys | postgres-contract-rls-and-keys.md |
    | Filter native ↔ standard mapping | filter-axes-native-to-filter-id.md |
    | Category mapping | category-native-to-cat-code.md |
    | get_fire_data parameters | docs/fire-data-schema/schema-history/incremental-patches-and-rpc-changelog.md |
    | Schema rationale | docs/fire-data-schema/schema-history/v2-design-rationale-overview.md |