OASC
Skill · team LESL
MIMathon Porto 2026 · reusable skill

The pivot harmonizer pattern.

One canonical model, written in Dolfin. N adapters that read source datasets and emit canonical entities. M writers that render the canonical entities to different open standards. No source is ever mapped directly to an output format, so the N×M cross-format mappings are replaced by N adapters plus M writers.

Authors: team LESL
Validated on: 3 open-data use cases
Pivot language: Dolfin
Target audience: data engineers, smart-city teams

What this pattern is good for.

Use it when

Multiple sources, one entity.

Several departments or providers describe the same kind of thing with different field names, units, structures, or spelling variants. The pivot deduplicates by construction: every variant is mapped onto one canonical entity.

Use it when

Multiple consumers, no drift allowed.

You must publish to several open standards (Smart Data Models, INSPIRE, DATEX II, schema.org). N×M direct mappings drift; one pivot stays consistent.

Use it when

The data is messy.

Multilingual fields, embedded vCard or iCalendar, Python dict-literals in CSV cells, free-text values that hide structured information. The pivot is where the cleanup converges.
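As an illustration of one such cleanup, here is a minimal sketch of how a CSV cell holding a Python dict-literal could be turned into structured data. The function name parse_cell and the sample cell are invented for this example; the approach (try JSON first, then ast.literal_eval) is one reasonable option, not necessarily what the reference harmonizers do.

```python
import ast
import json


def parse_cell(raw: str):
    """Return a dict or list if the cell holds a JSON or Python literal,
    otherwise return the stripped string unchanged."""
    text = raw.strip()
    if not text or text[0] not in "{[":
        return text
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    try:
        # literal_eval only evaluates literals, it never runs arbitrary code.
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        # Not a valid literal after all: keep the raw text for manual review.
        return text


# Hypothetical multilingual cell, written the way Python prints a dict.
cell = "{'pt': 'Jardim do Palácio', 'en': 'Palace Garden'}"
names = parse_cell(cell)
```

Falling back to the raw string keeps the adapter from crashing on a malformed cell, at the cost of having to check downstream whether a value was actually parsed.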

Don't use it when

One source, one consumer, fixed schemas, no growth expected → just write a script.
An existing standard already fits the source data without translation → use it directly.
Dataset is tiny and one-off → not worth the structure.

N in, M out, one pivot.

        source A       source B       source C
           │              │              │
        adapter A      adapter B      adapter C
           │              │              │
           ▼              ▼              ▼
        ┌──────────────────────────────────────┐
        │ Canonical model (Dolfin pivot)       │
        │ + typed sub-entities                 │
        │ + external IRI references            │
        │ + closed enums                       │
        └──────────────────────────────────────┘
           │              │              │
        writer 1       writer 2       writer 3
           │              │              │
           ▼              ▼              ▼
        SDM JSON-LD    DATEX II      GeoJSON
        consumer       consumer       consumer

Each new source = one new adapter. Each new output = one new writer. The pivot stays small.
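The arithmetic behind that claim can be written down in a few lines. This is only a counting sketch; the function names are made up for the illustration.

```python
def direct_mappings(n_sources: int, m_outputs: int) -> int:
    """Point-to-point mappings needed without a pivot: one per (source, output) pair."""
    return n_sources * m_outputs


def pivot_components(n_sources: int, m_outputs: int) -> int:
    """Components needed with a pivot: one adapter per source, one writer per output."""
    return n_sources + m_outputs


# The diagram above: 3 sources, 3 outputs.
before = direct_mappings(3, 3)    # 9 mappings to keep consistent
after = pivot_components(3, 3)    # 6 components

# Adding a fourth source costs 3 new mappings without the pivot, 1 adapter with it.
extra_without_pivot = direct_mappings(4, 3) - direct_mappings(3, 3)
extra_with_pivot = pivot_components(4, 3) - pivot_components(3, 3)
```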

From a CSV to a multi-format Open Data deliverable.

Audit the data

Count records and columns, list distinct values for candidate enums, spot format quirks (embedded vCard, Python dict literals, spelling variants). The spelling variants are gold: they reveal the shared entities your model should lift.
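A minimal audit sketch using only the standard library follows. The sample rows, the column names, and the threshold of 20 distinct values for an enum candidate are assumptions chosen for the example, not values from the use cases.

```python
import csv
import io
import unicodedata
from collections import Counter, defaultdict


def audit(rows, enum_max_distinct=20):
    """Count records and columns, and list value counts for low-cardinality columns."""
    rows = list(rows)
    columns = list(rows[0].keys()) if rows else []
    report = {"records": len(rows), "columns": len(columns), "candidates": {}}
    for col in columns:
        counts = Counter((row[col] or "").strip() for row in rows)
        if len(counts) <= enum_max_distinct:
            report["candidates"][col] = counts
    return report


def _key(value):
    """Normalize case, accents, and whitespace so spelling variants collide."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


def spelling_variants(values):
    """Return groups of distinct raw values that share one normalized key."""
    groups = defaultdict(set)
    for value in values:
        groups[_key(value)].add(value)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


# Invented sample data standing in for a real source file.
sample = (
    "id,owner,status\n"
    "1,Câmara Municipal do Porto,ok\n"
    "2,CAMARA MUNICIPAL DO PORTO,ok\n"
    "3,Camara  Municipal do Porto,dead\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))
report = audit(rows)
variants = spelling_variants(row["owner"] for row in rows)
```

Here the three spellings of the owner collapse to one normalized key, which is the signal that an Authority-like concept is worth lifting into the model.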

Benchmark open standards

Search Smart Data Models, schema.org, and domain-specific sources (DATEX II, INSPIRE, GBIF, Wikidata). Decide for each concept: align (standard fits), partial (extend with local namespace), or gap (define and propose).

Define the canonical model in Dolfin

Write <domain>.dolfin. Lift every shared real-world entity (Authority, Species, Category) to a separate concept. Use enums for closed sets. Add optional refExt attributes for dereferenceable IRIs. Be generous with optional.
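Dolfin syntax is not reproduced here. As a rough illustration of the same modelling rules, this is how they might look when mirrored in Python dataclasses, for instance in a model.py. All class and field names (Tree, Species, Authority, Condition, ref_ext, height_m) are hypothetical and loosely inspired by the urban trees use case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Condition(Enum):
    """A closed set: any value outside it is rejected at construction."""
    GOOD = "good"
    FAIR = "fair"
    POOR = "poor"


@dataclass(frozen=True)
class Authority:
    """Shared real-world entity, lifted out of the per-record fields."""
    name: str
    ref_ext: Optional[str] = None  # dereferenceable IRI, when one is known


@dataclass(frozen=True)
class Species:
    scientific_name: str
    ref_ext: Optional[str] = None


@dataclass(frozen=True)
class Tree:
    id: str
    # Everything except the identifier is optional, so sparse sources still fit.
    species: Optional[Species] = None
    managed_by: Optional[Authority] = None
    condition: Optional[Condition] = None
    height_m: Optional[float] = None


oak = Species("Quercus robur")
tree = Tree(id="tree-001", species=oak, condition=Condition("good"))
```

Keeping most attributes optional means a new source with fewer columns can still emit valid entities, which is what the "be generous with optional" advice is protecting.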

Scaffold the harmonizer package

Create harmonize_<domain>/ with model.py, transforms.py, one writer per output format, __main__.py, and adapters/_template.py. transforms.py is portable: copy from any reference implementation.

Adapter contract

Every adapter exposes one function: read(path) -> Iterator[CanonicalEntity]. Source parsing, regex extraction, registry dedup live in the adapter. No writer logic, no API logic, no CLI logic.
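A minimal sketch of an adapter that follows this contract is shown below. The CSV column names (id, name, owner) and the two small dataclasses are assumptions made for the example; only the read(path) -> Iterator[CanonicalEntity] signature comes from the contract itself.

```python
import csv
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Optional


@dataclass(frozen=True)
class Authority:
    name: str


@dataclass(frozen=True)
class CanonicalEntity:
    id: str
    name: str
    authority: Optional[Authority] = None


def read(path) -> Iterator[CanonicalEntity]:
    """Parse one source file and yield canonical entities, nothing else."""
    registry = {}  # normalized owner name -> the single shared Authority
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            raw_owner = (row.get("owner") or "").strip()
            authority = None
            if raw_owner:
                key = " ".join(raw_owner.casefold().split())
                # First spelling seen wins; later variants reuse the same object.
                authority = registry.setdefault(key, Authority(name=raw_owner))
            yield CanonicalEntity(
                id=row["id"], name=row["name"].strip(), authority=authority
            )


# Invented sample file, written to a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text(
        "id,name,owner\n1,Oak,City Parks\n2,Elm,CITY  PARKS\n3,Ash,\n",
        encoding="utf-8",
    )
    entities = list(read(sample))
```

Because the registry is local to read, deduplication stays inside the adapter and the writers only ever see one Authority per real-world body.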

Write the writers

One writer per output format. Each writer reads canonical entities, never source data. Two writers cannot drift, because they read the same input. Add a third format = add a third writer.
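To make the writer side concrete, here is a sketch of two writers fed from the same list of canonical entities. The Place class and the example identifier are hypothetical. The JSON-LD writer uses plain schema.org terms (Place, GeoCoordinates) as a stand-in and is not a Smart Data Models payload; the GeoJSON writer follows the standard longitude-first coordinate order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    id: str
    name: str
    lat: float
    lon: float


def to_geojson(entities):
    """Render canonical entities as a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "id": e.id,
                # GeoJSON positions are [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [e.lon, e.lat]},
                "properties": {"name": e.name},
            }
            for e in entities
        ],
    }


def to_jsonld(entities):
    """Render the same entities as schema.org-flavoured JSON-LD nodes."""
    return [
        {
            "@context": "https://schema.org",
            "@type": "Place",
            "@id": e.id,
            "name": e.name,
            "geo": {
                "@type": "GeoCoordinates",
                "latitude": e.lat,
                "longitude": e.lon,
            },
        }
        for e in entities
    ]


# Invented entity; both writers receive exactly the same input.
entities = [Place(id="urn:example:place:1", name="Palace Garden", lat=41.15, lon=-8.61)]
geo = to_geojson(entities)
ld = to_jsonld(entities)
```

Neither function knows which adapter produced the entities, which is the property that keeps the outputs from drifting apart.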

External references

Where a concept has global identity, attach a resolvable IRI. Cache lookups to disk (GBIF, schema.org). Prefer an explicit JSON lookup file (category_map.json) over an API call for small controlled vocabularies.
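One way to cache lookups to disk is sketched below. The DiskCache class, the fake_fetch function, and the example.org IRI are all invented for the illustration; a real fetch function would call whichever service is used, and no real API is invoked here. The small dictionary at the end stands in for the contents of a category_map.json file.

```python
import json
import tempfile
from pathlib import Path


class DiskCache:
    """Remember lookup results in a JSON file so each key is fetched once."""

    def __init__(self, path, fetch):
        self.path = Path(path)
        self.fetch = fetch  # callable: key -> IRI string
        if self.path.exists():
            self.data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.data = {}

    def lookup(self, key):
        if key not in self.data:
            self.data[key] = self.fetch(key)
            self.path.write_text(
                json.dumps(self.data, ensure_ascii=False, indent=2),
                encoding="utf-8",
            )
        return self.data[key]


calls = []


def fake_fetch(name):
    """Placeholder for a remote lookup; records each call for the demo."""
    calls.append(name)
    return "https://example.org/taxon/" + name.replace(" ", "_")


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "lookup_cache.json"
    iri = DiskCache(cache_file, fake_fetch).lookup("Quercus robur")
    # A second run reloads the file and does not call fetch again.
    iri_again = DiskCache(cache_file, fake_fetch).lookup("Quercus robur")

# For a small controlled vocabulary, an explicit map needs no network at all.
category_map = json.loads(
    '{"museu": "https://schema.org/Museum", "jardim": "https://schema.org/Park"}'
)
category_iri = category_map.get("museu")
```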

Wire the CLI

A single CLI orchestrates: python -m harmonize_<domain> --adapter <src> --input ... --output ... --<extra-format> .... Optional output flags enable additional writers without touching the core.
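A possible shape for that CLI, using argparse, is sketched here. The --adapter, --input, and --output flags come from the command above; --geojson and --datex are hypothetical examples of the extra-format flags, and the adapter name trees_csv and the file names are invented.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="harmonize_<domain>")
    parser.add_argument("--adapter", required=True, help="name of the source adapter")
    parser.add_argument("--input", required=True, help="path to the source dataset")
    parser.add_argument("--output", required=True, help="path for the default output")
    # Optional flags: each one switches on one additional writer.
    parser.add_argument("--geojson", metavar="PATH", help="also write GeoJSON here")
    parser.add_argument("--datex", metavar="PATH", help="also write DATEX II XML here")
    return parser


def selected_writers(args):
    """Map each enabled writer name to its output path."""
    writers = {"default": args.output}
    for name in ("geojson", "datex"):
        path = getattr(args, name)
        if path:
            writers[name] = path
    return writers


# Parsing an explicit list keeps the example independent of sys.argv.
args = build_parser().parse_args(
    [
        "--adapter", "trees_csv",
        "--input", "trees.csv",
        "--output", "trees.jsonld",
        "--geojson", "trees.geojson",
    ]
)
writers = selected_writers(args)
```

Adding a new output format then means one new optional flag and one new entry in the tuple, with no change to the adapters.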

Validate

Re-run on a second dataset without changing the core. If you cannot, the canonical model is too source-specific. Check sanity metrics: record counts, distinct entity counts before vs after, spot checks by ID.
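The sanity metrics can be collected with a small helper such as the following sketch. The function name sanity_report, the field names, and the sample records are assumptions made for the example.

```python
from types import SimpleNamespace


def sanity_report(source_rows, entities, source_field, canonical_attr):
    """Compare record counts and distinct entity counts before and after."""
    raw_values = {row[source_field] for row in source_rows if row.get(source_field)}
    canonical_values = {
        getattr(e, canonical_attr)
        for e in entities
        if getattr(e, canonical_attr) is not None
    }
    return {
        "records_in": len(source_rows),
        "records_out": len(entities),
        "distinct_before": len(raw_values),
        "distinct_after": len(canonical_values),
    }


# Invented source rows and the entities a hypothetical adapter produced from them.
rows = [
    {"id": "1", "owner": "City Parks"},
    {"id": "2", "owner": "CITY PARKS"},
    {"id": "3", "owner": ""},
]
entities = [
    SimpleNamespace(id="1", authority="City Parks"),
    SimpleNamespace(id="2", authority="City Parks"),
    SimpleNamespace(id="3", authority=None),
]
report = sanity_report(rows, entities, "owner", "authority")

# Spot check by ID: index the output once, then inspect individual records.
by_id = {e.id: e for e in entities}
spot = by_id["2"].authority
```

Equal record counts with a lower distinct count after harmonization is the expected signature of deduplication; a drop in record count would instead point to rows lost in the adapter.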

Ship

Page per use case, Dolfin file deposit, tarball of the harmonizer, slides in markdown. Make it reproducible: anyone should be able to clone, run, and get the same outputs.

Three concrete worked examples.

The pattern was developed and validated during the MIMathon Porto 2026 against three Porto Open Data sources covering three different standards positions.

Use case | Domain | Standards stance | External backbone | Writers
UC1 | Classified urban trees | Gap (no SDM Tree, proposed) | GBIF Backbone Taxonomy | JSON-LD, GeoJSON
UC2 | Points of interest | Align (SDM PointOfInterest) | schema.org + Wikidata | JSON-LD, GeoJSON
UC4 | City traffic indicators | Partial (SDM per-segment vs city-wide) | SDM Transportation + DATEX II | JSON-LD, DATEX II XML, GeoJSON

The full SKILL document.

The complete write-up, including all 10 steps with copy-paste code, file-tree templates, and external references, is shipped as a single markdown file. Use it as a working brief when you start a new harmonization project.

Suggested usage