OASC — Foundational MIMs Joint Working Group

MIM2 — Representing data: mechanisms

A structured inventory of candidate mechanisms for MIM2 v9.0, organised along the data representation pipeline: from data models to validation.

Data models: What are the entities?

Provide consistent and machine-understandable definitions of all entities about which data is being captured in a data ecosystem. This includes syntactic and semantic definitions.

Smart Data Models

standard, NGSI-LD

Library of harmonised data models for smart cities (FIWARE / TM Forum / IUDX). Covers domains such as mobility, environment, buildings, water, tourism, and more. Models are directly linked to NGSI-LD @context definitions, making them both syntactically and semantically interoperable out of the box.

↗ MIM1: models use persistent URIs as entity identifiers ↗ MIM0: models are served via NGSI-LD API endpoints
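
As an illustration of what "linked to NGSI-LD @context definitions" means in practice, the sketch below builds one entity as a plain Python dictionary. The core context URL is the one published by ETSI; the second context URL, the station name, and the helper function are hypothetical, and the entity carries only a small subset of what a published AirQualityObserved model would define.

```python
import json

# Core NGSI-LD @context published by ETSI, followed by a HYPOTHETICAL
# city-specific context that would map "no2" and the entity type to IRIs.
CONTEXT = [
    "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld",
    "https://example.org/contexts/air-quality.jsonld",
]


def make_air_quality_entity(station: str, no2: float, lon: float,
                            lat: float, observed_at: str) -> dict:
    """Build a minimal NGSI-LD entity (illustrative helper, not an official API)."""
    return {
        # MIM1 link: the identifier is a persistent URI (here a URN).
        "id": f"urn:ngsi-ld:AirQualityObserved:{station}",
        "type": "AirQualityObserved",
        "no2": {"type": "Property", "value": no2, "observedAt": observed_at},
        "location": {
            "type": "GeoProperty",
            # GeoJSON geometry: longitude first, then latitude.
            "value": {"type": "Point", "coordinates": [lon, lat]},
        },
        "@context": CONTEXT,
    }


entity = make_air_quality_entity("station-1", 21.5, 4.35, 50.85,
                                 "2024-01-01T12:00:00Z")
print(json.dumps(entity, indent=2))
```

The same dictionary is valid JSON for consumers that ignore semantics and valid JSON-LD for those that resolve the @context.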

SAREF (Smart Appliances REFerence)

ontology — ETSI

ETSI reference ontology for IoT devices and smart buildings. Extensions include SAREF4CITY, SAREF4ENER, SAREF4ENVI, and SAREF4AGRI. Serves as a shared reference ontology for aligning domain-specific vocabularies, acting as a bridge when cities use different terms for the same concepts.

↗ MIM1: enables semantic alignment across domains

Schema.org

vocabulary, W3C community group

General-purpose vocabulary founded by Google, Microsoft, Yahoo and Yandex and maintained through a W3C community group; it is not a W3C standard. Widely adopted on the web. Useful as a common base for generic entities (Organization, Place, Event, etc.) before specialising with SAREF or Smart Data Models for domain-specific use cases.

INSPIRE data models

EU directive

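European data models for geographic information, defined under the INSPIRE Directive. Mandatory for public-sector spatial datasets that fall within the directive's scope in EU Member States. Cover 34 spatial data themes (addresses, cadastral parcels, hydrography, etc.). They are the natural bridge between MIM2 and geospatial data representation.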

↗ MIM7: INSPIRE models are the natural bridge to geospatial

DCAT-AP

profile — W3C/EU

European application profile of DCAT for data catalogues. Describes dataset metadata (description, licence, update frequency, distribution). De facto standard for open data portals across the EU.

↗ MIM0: DCAT-AP metadata exposed via catalogue APIs (CKAN)

Ontology alignment ✨

proposed — research

Automated or semi-automated detection of correspondences between concepts in different ontologies (e.g. SAREF ↔ Smart Data Models ↔ Schema.org ↔ city-specific models). This is a critical gap: cities adopt different data models, but MIM2 currently offers no mechanism to bridge them. Ontology alignment — including emerging LLM-based approaches — could provide the missing link between Objectives 1 and 3, automatically generating mapping rules from model-to-model correspondences.

↗ MIM1: alignment IS semantic interlinking at the schema level ↗ Obj 3: alignment outputs feed directly into transformation rules
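
To make "detection of correspondences" concrete, here is a minimal sketch of the simplest family of alignment techniques, lexical matching on labels. It assumes each model is reduced to a dictionary of term identifier to label; the identifiers, the labels, and the 0.8 threshold are all invented for illustration. Real alignment systems also use structure, definitions, and (as noted above) LLM-based signals.

```python
from difflib import SequenceMatcher


def normalise(label: str) -> str:
    """Lowercase and drop separators so 'air_temperature' matches 'Air temperature'."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def align(source: dict[str, str], target: dict[str, str],
          threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (source_id, target_id, score) for the best target of each source term."""
    matches = []
    for s_id, s_label in source.items():
        best_id, best_score = None, 0.0
        for t_id, t_label in target.items():
            score = SequenceMatcher(None, normalise(s_label),
                                    normalise(t_label)).ratio()
            if score > best_score:
                best_id, best_score = t_id, score
        # Keep only candidates above the (arbitrary) confidence threshold.
        if best_id is not None and best_score >= threshold:
            matches.append((s_id, best_id, round(best_score, 2)))
    return matches


# Hypothetical city model and reference model.
city_model = {"city:airTemp": "Air temperature", "city:noiseLvl": "Noise level"}
reference_model = {
    "ref:temperature": "temperature",
    "ref:airTemperature": "air_temperature",
    "ref:soundLevel": "sound level",
}
candidates = align(city_model, reference_model)
print(candidates)
```

Note that "Noise level" and "sound level" are not matched: they are synonyms rather than similar strings, which is exactly the kind of case where purely lexical alignment falls short and semantic approaches are needed.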

Models are serialised into a format

Serialisation: In what format?

Support the syntactic formats that make data machine-readable and exchangeable across systems and organisations.

JSON-LD

W3C standard

Recommended pivot format. Combines the simplicity of JSON with the semantic richness of RDF via @context. NGSI-LD uses it natively. Supports compaction and expansion: the same data can be exchanged with short, readable terms or with full IRIs, depending on what the consumer needs.

↗ MIM1: @context is the direct bridge to semantic interlinking
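
The idea behind compaction and expansion can be sketched with a deliberately simplified term-to-IRI substitution. This is not the W3C expansion algorithm (which also produces arrays and value objects, and handles nested and remote contexts); it assumes a flat, inline @context, and the vocabulary IRIs under example.org are hypothetical.

```python
def expand(doc: dict) -> dict:
    """Replace each short term by its IRI, using a flat inline @context."""
    context = doc.get("@context", {})
    return {context.get(key, key): value
            for key, value in doc.items() if key != "@context"}


def compact(expanded: dict, context: dict) -> dict:
    """Inverse operation: shorten IRIs back to the terms of a given context."""
    reverse = {iri: term for term, iri in context.items()}
    compacted = {reverse.get(key, key): value for key, value in expanded.items()}
    compacted["@context"] = context
    return compacted


CONTEXT = {
    "name": "https://schema.org/name",
    "temperature": "https://example.org/vocab#temperature",  # hypothetical IRI
}
doc = {
    "@context": CONTEXT,
    "@id": "urn:ngsi-ld:Sensor:1",
    "name": "Main Square",
    "temperature": 18.2,
}

expanded = expand(doc)             # keys are now unambiguous IRIs
round_trip = compact(expanded, CONTEXT)
print(expanded)
```

Two publishers using different short terms for the same IRI produce identical expanded documents, which is what makes @context the bridge to semantic interlinking. A production system would use a full JSON-LD processor rather than this substitution.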

GeoJSON / GeoJSON-LD

IETF RFC 7946

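JSON-based format for geographic features and their geometries (points, lines, polygons), with coordinates in WGS 84 longitude/latitude order. GeoJSON-LD adds semantic context. Commonly supported encoding in geospatial APIs such as OGC API Features.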

↗ MIM7: native format for geospatial exchanges
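
A minimal sketch of producing GeoJSON with the standard library is shown below. The helper names and the property values are illustrative; the structure (Feature, geometry, properties, FeatureCollection) and the longitude-first coordinate order follow RFC 7946.

```python
import json


def point_feature(lon: float, lat: float, properties: dict) -> dict:
    """Build a GeoJSON Feature with a Point geometry (longitude first)."""
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("expected WGS 84 longitude/latitude in degrees")
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def feature_collection(features: list) -> dict:
    """Wrap features in a FeatureCollection, the usual API response shape."""
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sensor location.
collection = feature_collection([
    point_feature(4.35, 50.85, {"name": "Main Square sensor"}),
])
print(json.dumps(collection))
```

Swapping latitude and longitude is the most common GeoJSON error, which is why the range check is included even in a sketch.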

RDF / Turtle / N-Triples

W3C standard

Native Semantic Web formats. Turtle favours human readability; N-Triples is a line-based format (one triple per line) that suits streaming and bulk loading. Used by triplestores and SPARQL endpoints.

↗ MIM1: native language of linked data
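
The "one triple per line" property of N-Triples is easy to see in a small serialiser sketch. It assumes absolute IRIs and plain string literals only (no language tags, datatypes, or blank nodes), and the example IRIs under example.org are hypothetical.

```python
def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples requires."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def nt_line(subject: str, predicate: str, obj: str,
            obj_is_iri: bool = False) -> str:
    """Serialise one triple as one self-contained N-Triples line."""
    o = iri(obj) if obj_is_iri else literal(obj)
    return f"{iri(subject)} {iri(predicate)} {o} ."


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

lines = [
    nt_line("https://example.org/sensor/1", RDF_TYPE,
            "https://example.org/vocab#Sensor", obj_is_iri=True),
    nt_line("https://example.org/sensor/1", RDFS_LABEL, 'Sensor "A"'),
]
print("\n".join(lines))
```

Because every line stands alone, a consumer can process the file incrementally or split it across workers without parsing prefixes or nesting, which Turtle would require.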

CSV-W (CSV on the Web)

W3C standard

Metadata layer for CSV files: column types, identifiers, inter-table links. Makes CSV, the format cities most commonly use for open data, semantically interpretable without changing source systems.

↗ MIM0: enriches existing tabular data without infrastructure changes
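
The sketch below shows how a CSVW metadata document can sit next to an unchanged CSV file and drive typing and identifier generation. The metadata keys (`tableSchema`, `columns`, `titles`, `datatype`, `aboutUrl`, `primaryKey`) come from the CSVW vocabulary; the file name, column names, and the tiny reader are hypothetical and support only a handful of datatypes and simple `{column}` URL templates.

```python
import csv
import io

# CSVW metadata describing a HYPOTHETICAL air-quality.csv file.
METADATA = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "air-quality.csv",
    "tableSchema": {
        "aboutUrl": "https://example.org/station/{station_id}",
        "primaryKey": "station_id",
        "columns": [
            {"name": "station_id", "titles": "Station", "datatype": "string"},
            {"name": "no2", "titles": "NO2", "datatype": "decimal"},
            {"name": "valid", "titles": "Valid", "datatype": "boolean"},
        ],
    },
}

# Simplified casts; float stands in for xsd:decimal in this sketch.
CASTS = {
    "string": str,
    "decimal": float,
    "integer": int,
    "boolean": lambda v: v.strip().lower() in ("true", "1"),
}


def read_typed(csv_text: str, metadata: dict) -> list[dict]:
    """Read CSV rows, rename columns, cast values, and mint row identifiers."""
    schema = metadata["tableSchema"]
    by_title = {col["titles"]: col for col in schema["columns"]}
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for title, value in raw.items():
            col = by_title[title]
            row[col["name"]] = CASTS[col["datatype"]](value)
        # aboutUrl turns the primary key into a URI for the row.
        row["@id"] = schema["aboutUrl"].format(**row)
        rows.append(row)
    return rows


rows = read_typed("Station,NO2,Valid\nS1,21.5,true\n", METADATA)
print(rows)
```

The CSV itself is untouched; only the metadata file is new, which is why this mechanism needs no infrastructure change on the publisher's side.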

CityGML / CityJSON

OGC standard

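Data model and encodings for 3D urban data (buildings, terrain, vegetation). CityJSON is a lighter, JSON-based encoding of a subset of the CityGML data model. Levels of detail range from LoD0 to LoD4 in CityGML 2.0 (LoD0 to LoD3 in CityGML 3.0), depending on the detail required.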

↗ MIM7 / MIM8: key format for urban digital twins

Source data must be transformed

Transformation: How to convert?

Move from proprietary or heterogeneous formats to interoperable formats that conform to the agreed data models.

RML / YARRRML

W3C community group, tool

Declarative mapping language for transforming CSV, JSON and XML sources into RDF, which can then be serialised as JSON-LD. YARRRML is a simplified, YAML-based syntax for writing RML rules. Allows defining reusable transformation rules that can be shared across cities.

★ Strong candidate: standardises data model transformation
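
The value of a declarative mapping is that the rules are data, separate from the engine that executes them. The sketch below imitates that idea in plain Python: the mapping is a dictionary with `$(column)` templates (the placeholder style YARRRML uses), and a tiny generic function applies it to CSV rows. It is a stand-in for illustration, not an RML processor; the mapping layout, vocabulary IRIs under example.org, and sample data are hypothetical.

```python
import csv
import io
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The mapping is pure data: it could be shared between cities as a file.
MAPPING = {
    "subject": "https://example.org/sensor/$(id)",
    "type": "https://example.org/vocab#Sensor",
    "predicate_objects": [
        ("https://schema.org/name", "$(name)"),
        ("https://example.org/vocab#locatedIn",
         "https://example.org/district/$(district)"),
    ],
}

_PLACEHOLDER = re.compile(r"\$\((\w+)\)")


def fill(template: str, row: dict) -> str:
    """Substitute every $(column) placeholder with the row's value."""
    return _PLACEHOLDER.sub(lambda m: row[m.group(1)], template)


def apply_mapping(csv_text: str, mapping: dict) -> list[tuple[str, str, str]]:
    """Generic engine: turn each CSV row into (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = fill(mapping["subject"], row)
        triples.append((subject, RDF_TYPE, mapping["type"]))
        for predicate, object_template in mapping["predicate_objects"]:
            triples.append((subject, predicate, fill(object_template, row)))
    return triples


triples = apply_mapping("id,name,district\n42,Main Square,centre\n", MAPPING)
for triple in triples:
    print(triple)
```

A second city with different column names would only need a different `MAPPING`; `apply_mapping` stays the same. That separation is what RML and YARRRML standardise.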

Ontology alignment → mapping rules

proposed — research

Automated ontology alignment (from Objective 1) can generate transformation rules. The pipeline: detect correspondences between source and target ontologies → produce SSSOM mappings or RML rules → execute transformation. This closes the loop between "knowing which models differ" and "actually converting data". Emerging approaches using LLMs (RAG-based alignment) make this increasingly feasible for complex, cross-domain mappings.

★ Bridges the gap: turns alignment output into executable transforms ↗ Obj 1: consumes alignment results as input
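
The last two steps of that pipeline (mappings in, transformation out) can be sketched as follows. The column names are standard SSSOM slots; the mapping rows, the `city:` and `sdm:` prefixes, and the 0.9 confidence cut-off are invented for illustration. Real SSSOM files usually start with a commented metadata header, which is omitted here, and a real pipeline would more likely emit RML rules than rename keys directly.

```python
import csv
import io

# HYPOTHETICAL alignment output in SSSOM's tab-separated form.
SSSOM_TSV = (
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\tconfidence\n"
    "city:airTemp\tskos:exactMatch\tsdm:temperature\tsemapv:LexicalMatching\t0.95\n"
    "city:noiseLvl\tskos:closeMatch\tsdm:LAeq\tsemapv:LexicalMatching\t0.60\n"
)


def rename_rules(tsv_text: str, min_confidence: float = 0.9) -> dict[str, str]:
    """Keep only exact matches above the threshold as executable rename rules."""
    rules = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        is_exact = row["predicate_id"] == "skos:exactMatch"
        if is_exact and float(row["confidence"]) >= min_confidence:
            rules[row["subject_id"]] = row["object_id"]
    return rules


def transform(record: dict, rules: dict[str, str]) -> dict:
    """Rename mapped properties; leave unmapped ones untouched for review."""
    return {rules.get(key, key): value for key, value in record.items()}


rules = rename_rules(SSSOM_TSV)
converted = transform({"city:airTemp": 18.2, "city:noiseLvl": 55}, rules)
print(converted)
```

Only the high-confidence exact match becomes a rule; the weaker close match is left alone, so uncertain correspondences can go to a human reviewer instead of silently altering data.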

SPARQL CONSTRUCT

W3C standard

SPARQL query that produces a new RDF graph from an existing one. Useful for transforming between ontologies (e.g. SAREF → Smart Data Models) without external ETL.

↗ MIM1: transformation operates on semantic links themselves

LDES (Linked Data Event Streams)

W3C community group

Protocol for publishing versioned linked data streams. Enables incremental synchronisation between systems. Developed by IMEC/Flanders — already an MIM1 resource.

↗ MIM0 / MIM1: combines transport + semantics + structure

JSON-LD framing

W3C standard

Native JSON-LD technique to restructure a graph into a "tree" view adapted to a specific use case. Allows producing different views of the same dataset without duplication or heavy ETL.

FME (Feature Manipulation Engine)

proprietary tool

ETL platform from Safe Software, widely used by European cities, especially for geospatial data. Native support for 450+ formats. Not an open standard, but a de facto tool in municipal practice.

↗ MIM7: reference tool for spatial data transformation

Transformed data must be validated

Validation: Is it conformant?

Automatically verify that produced data conforms to the expected data models and constraints.

SHACL (Shapes Constraint Language)

W3C recommendation

W3C standard for validating RDF graphs against a set of constraints (shapes). Can verify: mandatory types, cardinalities, value ranges, regex patterns, etc. Shapes are themselves in RDF and thus shareable.

★ Strong candidate: makes MIM2 conformance rules executable

JSON Schema

IETF Internet-Draft

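Structural validation for JSON documents, including JSON-LD treated as plain JSON (it checks shape and types, not semantics). OpenAPI uses JSON Schema to describe request and response bodies, with full alignment since OpenAPI 3.1. Smart Data Models already publishes JSON Schemas for each model.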

↗ MIM0: validation embedded in the OpenAPI endpoint spec

SHACL + JSON Schema combined

proposed — combined approach

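Hybrid approach: JSON Schema for fast syntactic validation (structure, types) + SHACL for deep semantic validation (link consistency, business constraints). The two are complementary: the first rejects malformed payloads cheaply at the API boundary, and the second checks what a structural schema cannot express, such as references that must resolve to existing entities.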

★ Proposed integrated mechanism for MIM2 v9.0
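
The two-stage idea can be sketched without either technology, using plain Python as a stand-in: a cheap structural pass (the role JSON Schema would play) followed by a semantic pass that needs knowledge of other entities (the role SHACL would play). The property names, the rules, and the entity identifiers are hypothetical; a real deployment would express stage 1 as a JSON Schema and stage 2 as SHACL shapes.

```python
# Stage 1 rules: required properties and their expected Python types.
REQUIRED = {"id": str, "type": str, "no2": (int, float), "refStation": str}


def check_structure(entity: dict) -> list[str]:
    """Structural pass: presence and type of each property."""
    errors = []
    for key, expected in REQUIRED.items():
        if key not in entity:
            errors.append(f"missing property: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(entity[key], bool) or not isinstance(entity[key], expected):
            errors.append(f"wrong type for {key}")
    return errors


def check_semantics(entity: dict, known_ids: set[str]) -> list[str]:
    """Semantic pass: value constraints and link consistency."""
    errors = []
    if entity["no2"] < 0:
        errors.append("no2 must be non-negative")
    if entity["refStation"] not in known_ids:
        errors.append(f"refStation points to unknown entity: {entity['refStation']}")
    return errors


def validate(entity: dict, known_ids: set[str]) -> list[str]:
    """Run the cheap pass first; only well-formed entities reach the deep pass."""
    errors = check_structure(entity)
    return errors if errors else check_semantics(entity, known_ids)


known = {"urn:ngsi-ld:Station:1"}
good = {"id": "urn:ngsi-ld:AirQualityObserved:1", "type": "AirQualityObserved",
        "no2": 21.5, "refStation": "urn:ngsi-ld:Station:1"}
dangling = dict(good, refStation="urn:ngsi-ld:Station:99")
print(validate(good, known), validate(dangling, known))
```

The `dangling` entity passes the structural pass (every property is present and well typed) and fails only the semantic one, which is the case for combining the two mechanisms.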
