MIMathon Porto 2026 — OASC Use Cases

Team LESL · deliverables

What team LESL shipped.

Team LESL (Lea, Eliott, Sattisvar, Louis, across Kereval, Dolfin and Askem) took three of the five tracks. Each one follows the same method: harmonize messy source data into a Dolfin canonical model, then derive every output format from that single pivot. The full method is documented as a reusable skill.

UC01 · Trees: 238 trees, GBIF taxonomy, SDM gap proposed
UC02 · POIs: 58 POIs, schema.org + Wikidata, SDM PointOfInterest
UC04 · Mobility: 2598 records, dual SDM JSON-LD + DATEX II
Team LESL: Who we are, the four members, the three orgs
The skill: The pivot harmonizer method, reusable
Downloads: SKILL.md + harmonizer-template.tar.gz, ready to fork

Dolfin pivot
Smart Data Models
GBIF · schema.org · Wikidata · DATEX II
JSON-LD · GeoJSON · XML

Team GEX · deliverables

What team GEX shipped.

Team GEX (Marwen, Mohamed, Olaf, across GreenEarthXchange and Impact Funding Europe) took the energy track. They built the CityData Harmonizer, a pipeline of nine "rangers" that combines AI with deterministic steps. It was trained on the OASC MIMs, tested on five energy datasets, and then run on water-quality data with zero pipeline change to show that it is domain-agnostic. The team also proposes a new EnergyConsumptionObserved Smart Data Model to the community.

UC03 · Energy: CityData Harmonizer, 9 rangers, 5 + 2 sources, EnergyConsumptionObserved
Team GEX: Who we are, the three members, the two orgs
Findings PDF: The full write-up, method behind the pipeline
Pitch deck: 23 slides on the rangers and the architecture
Glossary & architecture: Terms + 7 architectural planes (L1–L7)

AI + deterministic pipeline
9 rangers / MIM champions
EnergyConsumptionObserved
Domain-agnostic (energy + water)
MCP · Knowledge Base · Quality Gate

About the MIMathon

Working sessions on harmonized city data.

Municipal departments often describe the same reality in different ways: a tree, a museum, a meter reading, a scooter. The MIMathon gathers teams for a short, focused sprint to align schemas, agree on a canonical model, and publish the result as interoperable Open Data.

Each use case ships with a brief, sample data, and a clear objective. Teams can pick any of the five tracks below and build directly on Smart Data Models, INSPIRE, DATEX II, or compatible standards.

What you will produce

A canonical data model for the chosen entity
A mapping from local schemas to the canonical one
A working ETL or transformation prototype
A validated sample dataset, ready for Open Data

Case 01: Characterization of trees
Case 02: Points of Interest
Case 03: Energy consumption
Case 04: Mobility export formats
Case 05: Shared scooter providers

01 Open Data, Metadata, Harmonization

Characterization of trees across city departments

Harmonize the representation of shared entities (such as trees) across municipal departments to ensure a consistent and usable Open Data publication.

→ Team LESL result, UC01

Context

The city publishes Open Data to support internal teams, researchers and the public. Each operational department maintains its own datasets independently, often producing multiple versions of the same entity with inconsistent structures and metadata.

Problem

Urban Planning and Green Spaces record similar entities using different field names, structures and levels of detail.

Tree species vs. tree name
Different attribute descriptions
Different units or granularities
Attributes present in one dataset, missing in another

Objective

Harmonize one shared entity (the tree) across departments by delivering: a common data model, aligned metadata, a reproducible mapping and ETL process, and a standardized Open Data output.

Concrete example

Source A:
tree_name = "Oak"
height = 12

Source B:
tree_species = "Quercus robur"
tree_height_m = 12.0

After harmonization:
species_common_name = "Oak"
species_scientific_name = "Quercus robur"
height_m = 12
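The tree example can be sketched as a small field-mapping step. This is a minimal illustration, not the method any team used: the `FIELD_MAPS` table, the `to_canonical` and `merge` helpers, and the assumption that the two records have already been matched to the same physical tree are all hypothetical.

```python
# Hypothetical per-source mappings: source field name -> canonical field name.
FIELD_MAPS = {
    "A": {"tree_name": "species_common_name", "height": "height_m"},
    "B": {"tree_species": "species_scientific_name", "tree_height_m": "height_m"},
}


def to_canonical(source: str, record: dict) -> dict:
    """Rename the fields of one source record to the canonical names."""
    mapping = FIELD_MAPS[source]
    out = {mapping[k]: v for k, v in record.items() if k in mapping}
    if "height_m" in out:
        # Store heights as floats so 12 and 12.0 compare equal downstream.
        out["height_m"] = float(out["height_m"])
    return out


def merge(*records: dict) -> dict:
    """Combine canonical records for the same tree; the first non-empty value wins."""
    merged: dict = {}
    for rec in records:
        for key, value in rec.items():
            if merged.get(key) in (None, ""):
                merged[key] = value
    return merged


# Assumption: both rows describe the same tree (record matching is out of scope here).
a = to_canonical("A", {"tree_name": "Oak", "height": 12})
b = to_canonical("B", {"tree_species": "Quercus robur", "tree_height_m": 12.0})
tree = merge(a, b)
print(tree)
```

Keeping the mapping in a table rather than in code means a new department can be added by supplying one more dictionary.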

Technical challenge

Define a canonical entity model
Map internal schemas to a unified model
Align and validate metadata
Select and apply an appropriate standard such as Smart Data Models or INSPIRE
Build transformation logic to convert existing datasets into the standardized model

Possible approach

Define a minimal canonical data model for the shared entity
Create a semantic mapping between internal schemas and the canonical model
Adopt an existing open standard such as Smart Data Models (SDM) or INSPIRE
Implement a prototype ETL or transformation logic
Validate the harmonized dataset before Open Data publication
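The last step of this approach, validation before publication, could start as a plain rule check. The sketch below is only an assumption about what such rules might look like: the required fields and the height bound are hypothetical and would need to be agreed with the departments.

```python
from typing import List

# Hypothetical rules for a canonical tree record.
REQUIRED_FIELDS = ("species_scientific_name", "height_m")
MAX_HEIGHT_M = 150.0  # assumed sanity bound, not a value from the brief


def validate_tree(record: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            errors.append(f"missing {field}")
    height = record.get("height_m")
    if height is not None:
        if not isinstance(height, (int, float)):
            errors.append("height_m is not a number")
        elif not 0 < height <= MAX_HEIGHT_M:
            errors.append("height_m out of range")
    return errors


good = {"species_common_name": "Oak", "species_scientific_name": "Quercus robur", "height_m": 12}
bad = {"species_common_name": "Oak", "height_m": -3}
print(validate_tree(good))
print(validate_tree(bad))
```

Returning all problems at once, instead of stopping at the first, makes it easier to report data-quality issues back to the source department.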

CSV · Smart Data Models · INSPIRE · ETL

Resources

CSV

Identification and characterization of classified trees, Porto

Porto Digital open dataset, 316 rows, 47 KB

GEO

Classified trees, Porto, GeoJSON

Same dataset as point geometries (WGS84), 102 KB

02 Data Models, POIs

Points of Interest, one taxonomy across tourism and mobility

Standardize and harmonize Points of Interest data to enable interoperable, searchable and extensible datasets for tourism, navigation and urban mobility.

→ Team LESL result, UC02

Context

The tourism department aims to build a digital tool helping visitors explore the city. The platform should provide routes of interest and highlight restaurants, museums, concert halls, parks and other relevant POIs, and centralize mobility options in a single place.

Problem

POIs are categorized, named and structured differently across departments and external sources.

Naming variations such as "museum", "cultural site", "heritage space"
Different category hierarchies or taxonomies
Different granularity levels, for example "restaurant" vs. "vegan restaurant"
Heterogeneous file formats and metadata structures

Objective

Enable standardized, extensible and interoperable POI datasets across domains by defining a unified taxonomy, metadata structure and data model usable by all city services and external partners.

Concrete example

Source A:
"museum", "restaurant", "park"

Source B:
"cultural site", "dining", "green area"

After harmonization:
category = "museum" (canonical)
category = "restaurant" (canonical)
category = "park" (canonical)
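One way to read the POI example is as an alias table from source labels to canonical categories. The sketch below assumes the pairing implied by the order of the example ("cultural site" to "museum", "dining" to "restaurant", "green area" to "park"); the table and the `harmonize_category` helper are hypothetical.

```python
from typing import Optional

# Hypothetical alias table: lowercased source label -> canonical category.
CANONICAL_CATEGORY = {
    "museum": "museum",
    "cultural site": "museum",
    "restaurant": "restaurant",
    "dining": "restaurant",
    "park": "park",
    "green area": "park",
}


def harmonize_category(label: str) -> Optional[str]:
    """Return the canonical category, or None when the label is not yet mapped."""
    return CANONICAL_CATEGORY.get(label.strip().lower())


for label in ["museum", "Cultural Site", "dining", "green area", "casa de fado"]:
    print(label, "->", harmonize_category(label))
```

Returning `None` for unknown labels, rather than guessing, leaves room for a curation step where new local categories are reviewed before they enter the taxonomy.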

Technical challenge

Align POI categories across datasets
Define a canonical taxonomy linked to established standards
Ensure extensibility for local, culturally specific POI types
Maintain category governance and versioning over time

Possible approach

Adopt or align against existing taxonomies such as OpenStreetMap Tags, Google Places categories, or TomTom Places
Extend Smart Data Models or similar ontologies with local POI categories when needed
Establish a governance model for ongoing curation and updates
Include subcategorization and tagging for finer classification, for example religious sites subdivided into church, mosque, synagogue
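Subcategorization can be sketched as a parent table that lets a search for a broad category also match its finer types. The parent links below come from the examples in this brief (religious sites, vegan restaurant); the `PARENT` table and both helpers are hypothetical names, not part of any standard taxonomy.

```python
from typing import List

# Hypothetical child -> parent links for a small category tree.
PARENT = {
    "church": "religious site",
    "mosque": "religious site",
    "synagogue": "religious site",
    "vegan restaurant": "restaurant",
}


def lineage(category: str) -> List[str]:
    """Return the category followed by its ancestors, most specific first."""
    chain = [category]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def matches(category: str, query: str) -> bool:
    """True when the query equals the category or one of its ancestors."""
    return query in lineage(category)


print(lineage("vegan restaurant"))
print(matches("mosque", "religious site"))
print(matches("restaurant", "vegan restaurant"))
```

A search for "restaurant" then finds vegan restaurants, while a search for "vegan restaurant" does not return every restaurant.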

CSV · JSON · NGSI-LD · Smart Data Models

Resources

CSV

Points of Interest, Casas de Fado, Porto

Tourism, sport and leisure POIs sample, CitySDK schema, 4 records, 24 KB

CSV

Points of Interest, Postos de Abastecimento, Porto

Petrol stations sample, same CitySDK schema, 54 records

03 Energy, Observations

Modeling observed energy consumption

Create a unified, semantically rich and shareable data model for representing observed energy consumption across multiple providers and data sources.

→ Team GEX result, UC03

Context

To achieve carbon neutrality and meet the objectives of the City Climate Contracts, the city needs reliable and comparable tools to monitor energy consumption. Data from different providers and sectors must be integrated to support policy making, forecasting and performance tracking.

Problem

Consumption observations are collected by various providers using different semantics, definitions and file formats. No unified or standard model exists for structuring observed consumption across public and private contributors.

Hard to compare or aggregate data across organizations
Limited ability to build time series, forecasts or dashboards
Barriers to interoperability with smart city platforms and climate tools

Objective

Create a shareable, extensible and semantically rich model for observed energy consumption that enables interoperability across providers, analytics tools and smart city platforms.

Concrete example

Provider A:
consumption = 150, unit = kWh
timestamp = 2026-01-01T12:00:00Z

Provider B:
energy_value = 0.15, unit = MWh
time = 2026-01-01 12:00:00

After harmonization:
consumptionObserved = 150 (kWh)
timeObserved = 2026-01-01T12:00:00Z
unitCode = kWh
energySource = electricity
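The energy example involves two normalizations: converting units to kWh and bringing timestamps to one UTC format. The sketch below illustrates both under stated assumptions: timestamps without a timezone are treated as UTC, `energySource` defaults to "electricity" because neither provider supplies it, and all function names are hypothetical.

```python
from datetime import datetime, timezone

# Conversion factors to kWh for the units seen in the example (plus Wh).
TO_KWH = {"Wh": 0.001, "kWh": 1.0, "MWh": 1000.0}


def to_kwh(value: float, unit: str) -> float:
    """Convert a reading to kWh; rounding removes floating-point noise."""
    return round(value * TO_KWH[unit], 6)


def to_utc_iso(text: str) -> str:
    """Parse an ISO-like timestamp. ASSUMPTION: naive values are already in UTC."""
    dt = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def harmonize(value: float, unit: str, timestamp: str, energy_source: str = "electricity") -> dict:
    """Build the harmonized observation from one provider reading."""
    return {
        "consumptionObserved": to_kwh(value, unit),
        "unitCode": "kWh",
        "timeObserved": to_utc_iso(timestamp),
        "energySource": energy_source,  # assumed default, not present in either source
    }


provider_a = harmonize(150, "kWh", "2026-01-01T12:00:00Z")
provider_b = harmonize(0.15, "MWh", "2026-01-01 12:00:00")
print(provider_a)
print(provider_b)
```

In a real pipeline the timezone of naive timestamps would have to be confirmed with each provider, since a wrong guess shifts every reading.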

Technical challenge

No unified schema exists today for observed consumption across providers
Smart Data Models provide a foundation, but some attributes may be missing
Align semantics, units, timestamps and measurement contexts
Define relationships such as meter, building, location and time series

Possible approach

Leverage or extend the Smart Data Models Energy domain to define a canonical model
Add the missing attributes
Define context relationships, linking observations to meters, buildings and intervals
Validate the model through sample data transformation pipelines
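Linking an observation to its meter and building can be sketched in the NGSI-LD style, where values are `Property` attributes and links are `Relationship` attributes. The entity type name is the one proposed for this track; the attribute names `refMeter` and `refBuilding`, the URN patterns and the helper are hypothetical, and the JSON-LD `@context` is omitted for brevity.

```python
import json


def observation_entity(obs_id: str, kwh: float, observed_at: str,
                       meter_id: str, building_id: str) -> dict:
    """Build an NGSI-LD style entity linking one observation to its context."""
    return {
        "id": f"urn:ngsi-ld:EnergyConsumptionObserved:{obs_id}",
        "type": "EnergyConsumptionObserved",
        "consumptionObserved": {
            "type": "Property",
            "value": kwh,
            "unitCode": "kWh",          # written as in the example of this brief
            "observedAt": observed_at,  # the instant the value refers to
        },
        # Hypothetical relationship names pointing at other entities.
        "refMeter": {"type": "Relationship", "object": f"urn:ngsi-ld:Meter:{meter_id}"},
        "refBuilding": {"type": "Relationship", "object": f"urn:ngsi-ld:Building:{building_id}"},
    }


entity = observation_entity("obs-001", 150.0, "2026-01-01T12:00:00Z", "m-42", "b-7")
print(json.dumps(entity, indent=2))
```

Keeping the meter and building as separate entities, referenced by identifier, avoids repeating their attributes in every observation of a time series.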

CSV · JSON · NGSI-LD · SDM Energy

04 Mobility, Export formats

Mobility data, published in DATEX II and SDM

Enable dual format publication of mobility and traffic data in DATEX II XML and Smart Data Models JSON-LD, while preserving a single canonical representation.

→ Team LESL result, UC04

Context

As part of an urban traffic management initiative, a private GPS company provides traffic data in DATEX II. The city uses Smart Data Models inside its smart city data architecture. To ensure interoperability and regulatory compliance, both formats must be supported for exchange and Open Data.

Problem

DATEX II and SDM use different structures, terminologies and encodings. Mobility datasets must be mapped, transformed and published in dual formats without introducing semantic inconsistencies.

Increased complexity of data pipelines
Potential loss of meaning when transforming between standards
Difficult to provide consistent, machine readable Open Data to different ecosystems

Objective

Provide flexible, standards compliant mobility data access by exporting in DATEX II XML and SDM compliant JSON-LD, backed by a single authoritative canonical model internally.

Concrete example

DATEX II XML (provider):
vehicleFlowRate
averageVehicleSpeed
predefinedLocation

SDM JSON-LD (city):
flow
averageVehicleSpeed
location

Technical challenge

Define a canonical internal representation independent of DATEX II and SDM
Preserve semantic equivalence when mapping between standards
Support XML and JSON-LD while preserving structure and meaning
Handle DATEX II elements with no direct SDM equivalent, via extensions

Possible approach

Define a canonical mobility data model as the master representation
Build export transformers for DATEX II XML and SDM compliant JSON-LD
Maintain a semantic mapping table linking DATEX II elements to SDM attributes
Identify gaps in SDM and extend entities when necessary
Validate outputs using DATEX II schemas and SDM JSON-LD validators
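The canonical-model-plus-exporters idea can be sketched with one internal record and two serializers. This is a deliberately simplified illustration: the element and attribute names are the ones from the example in this brief, the units and the `TrafficMeasurement` class are assumptions, and neither output is a schema-valid DATEX II document or a complete Smart Data Models entity.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class TrafficMeasurement:
    """Hypothetical canonical record, independent of both export formats."""
    location_id: str
    flow_veh_per_hour: float  # assumed unit
    avg_speed_kmh: float      # assumed unit


def to_datex_like_xml(m: TrafficMeasurement) -> str:
    """Serialize with the DATEX II element names used in the example (not schema-valid)."""
    root = ET.Element("measurement")
    ET.SubElement(root, "predefinedLocation").text = m.location_id
    ET.SubElement(root, "vehicleFlowRate").text = str(m.flow_veh_per_hour)
    ET.SubElement(root, "averageVehicleSpeed").text = str(m.avg_speed_kmh)
    return ET.tostring(root, encoding="unicode")


def to_sdm_like_json(m: TrafficMeasurement) -> str:
    """Serialize with the SDM attribute names used in the example (no @context)."""
    doc = {
        "location": m.location_id,
        "flow": m.flow_veh_per_hour,
        "averageVehicleSpeed": m.avg_speed_kmh,
    }
    return json.dumps(doc, sort_keys=True)


m = TrafficMeasurement("loc-1", 420.0, 37.5)
xml_doc = to_datex_like_xml(m)
json_doc = to_sdm_like_json(m)
print(xml_doc)
print(json_doc)
```

Because both exporters read from the same record, a value can only differ between the two outputs if an exporter is wrong, which is what a round-trip check can catch.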

DATEX II · XML · JSON-LD · Smart Data Models

Resources

CSV

TomTom Traffic Data, Porto

Live and historical traffic indicators, 2 598 rows, 245 KB

05 Mobility, Multi provider

Shared scooters, many providers, one data model

Enable unified access to mobility data from multiple electric scooter providers through quality rules, validation workflows and provenance aware metadata.

Context

The mobility department monitors the use of electric scooters across the city, including real time availability and illegal parking. Multiple providers push operational data to a monitoring platform, but incoming datasets vary significantly in structure, quality and semantics.

Problem

Providers send data in different formats (CSV, JSON, APIs), use different attribute names and follow different rules for timestamps, location accuracy and event reporting.

Hard to aggregate scooter data consistently
Errors and inconsistencies reduce trust in analytics and monitoring
Lack of provenance and quality indicators limits regulatory enforcement

Objective

Provide unified access to scooter mobility data with clear quality, provenance and schema consistency indicators, supporting city operations, enforcement and analytics.

Concrete example

Provider A:
lat = 52.123, lon = 21.004
status = "available"
battery = 75%

Provider B:
lat = 52.12, lon = 21.00
state = "active"
battery = 0.75

After harmonization:
latitude = 52.123
longitude = 21.004
status = "available" (from "active")
battery_percentage = 75
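The scooter example combines three fixes per record: renaming fields, mapping status vocabularies, and bringing battery levels to a percentage. The sketch below normalizes each provider record on its own; the `STATUS_MAP` table, the helper names and the rule that numbers up to 1 are fractions are assumptions.

```python
# Hypothetical vocabulary mapping: provider status -> canonical status.
STATUS_MAP = {"available": "available", "active": "available"}


def battery_percentage(raw) -> float:
    """Accept '75%', 75 or a fraction such as 0.75 and return a 0..100 value.

    ASSUMPTION: numeric values up to 1 are fractions, so a true 1 % reading
    would be misread; a provider-specific rule would be safer.
    """
    if isinstance(raw, str):
        return float(raw.strip().rstrip("%"))
    value = float(raw)
    return value * 100 if value <= 1 else value


def normalize(record: dict) -> dict:
    """Map one provider record to the canonical field names."""
    raw_status = record.get("status", record.get("state"))
    return {
        "latitude": float(record["lat"]),
        "longitude": float(record["lon"]),
        "status": STATUS_MAP.get(raw_status, "unknown"),
        "battery_percentage": battery_percentage(record["battery"]),
    }


provider_a = normalize({"lat": 52.123, "lon": 21.004, "status": "available", "battery": "75%"})
provider_b = normalize({"lat": 52.12, "lon": 21.00, "state": "active", "battery": 0.75})
print(provider_a)
print(provider_b)
```

Note that normalization does not restore precision: provider B's coordinates stay at two decimals, which is one reason to record a quality indicator alongside the data.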

Technical challenge

Data quality varies greatly between providers
Events and statuses are reported differently
Validation must handle both structural (schema) and semantic (meaning) issues

Possible approach

Define conformance rules and quality metrics per provider: mandatory fields, coordinate accuracy, timestamp format
Implement validation pipelines checking structural and semantic integrity
Tag datasets with provenance metadata: provider, ingestion time, pipeline version
Assign a quality score based on completeness, freshness and accuracy
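Provenance tagging and a quality score can be sketched as two small functions. Everything specific here is an assumption made for illustration: the mandatory field list, the five-minute freshness window, the 50 m accuracy bound, the equal weighting of the three components and the function names.

```python
from datetime import datetime, timedelta, timezone

MANDATORY = ("latitude", "longitude", "status", "battery_percentage")  # assumed list


def with_provenance(record: dict, provider: str, ingested_at: datetime,
                    pipeline_version: str) -> dict:
    """Return a copy of the record tagged with provider, ingestion time and pipeline version."""
    return {**record, "provenance": {
        "provider": provider,
        "ingestedAt": ingested_at.isoformat(),
        "pipelineVersion": pipeline_version,
    }}


def quality_score(record: dict, observed_at: datetime, now: datetime, accuracy_m: float,
                  max_age: timedelta = timedelta(minutes=5),
                  worst_accuracy_m: float = 50.0) -> float:
    """Average of completeness, freshness and accuracy, each scaled to 0..1."""
    present = sum(1 for field in MANDATORY if record.get(field) is not None)
    completeness = present / len(MANDATORY)
    freshness = min(1.0, max(0.0, 1.0 - (now - observed_at) / max_age))
    accuracy = min(1.0, max(0.0, 1.0 - accuracy_m / worst_accuracy_m))
    return round((completeness + freshness + accuracy) / 3, 3)


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
record = {"latitude": 52.123, "longitude": 21.004, "status": "available", "battery_percentage": 75}
tagged = with_provenance(record, "provider-a", now, "0.1.0")
score = quality_score(record, now - timedelta(minutes=1), now, accuracy_m=10.0)
print(tagged["provenance"], score)
```

Publishing the three components next to the combined score would let an enforcement team see why a record scored low, rather than only that it did.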

CSV · JSON · API · Provenance

Five open use cases, one interoperable city.

Five tracks, five datasets, one interoperability goal.

Sample data and references.
