OASC — Foundational MIMs Joint Working Group

MIM2 — Representing data: mechanisms

A structured inventory of candidate mechanisms for MIM2 v9.0, organised along the data representation pipeline: from data models to validation.

1. Data models: What are the entities?

Provide consistent and machine-understandable definitions of all entities about which data is being captured in a data ecosystem. This includes syntactic and semantic definitions.

Smart Data Models
standard, NGSI-LD

Library of harmonised data models for smart cities (FIWARE / TM Forum / IUDX). Covers domains such as mobility, environment, buildings, water, tourism, and more. Models are directly linked to NGSI-LD @context definitions, making them both syntactically and semantically interoperable out of the box.
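
As a rough illustration of what "linked to NGSI-LD @context definitions" means in practice, the sketch below builds one entity as a plain Python dict. The entity id, the attribute values and the second @context URL are assumptions made for the example; the real @context location should be taken from the model's own documentation.

```python
import json

# The first URL is the core NGSI-LD @context published by ETSI.
# The second is a hypothetical placeholder for a domain @context.
CONTEXT = [
    "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld",
    "https://example.org/contexts/environment.jsonld",  # hypothetical
]


def make_observation(station_id, no2, lon, lat, observed_at):
    """Build an NGSI-LD entity (normalised form) as a plain dict."""
    return {
        "id": f"urn:ngsi-ld:AirQualityObserved:{station_id}",
        "type": "AirQualityObserved",
        "no2": {"type": "Property", "value": no2, "observedAt": observed_at},
        "location": {
            "type": "GeoProperty",
            "value": {"type": "Point", "coordinates": [lon, lat]},
        },
        "@context": CONTEXT,
    }


entity = make_observation("station-001", 22.5, 4.3517, 50.8503, "2024-01-01T12:00:00Z")
print(json.dumps(entity, indent=2))
```

The short keys (no2, location) only become globally unambiguous once a consumer resolves them through the @context, which is why the model and the context are published together.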

SAREF (Smart Applications REFerence, formerly Smart Appliances REFerence)
ontology — ETSI

ETSI reference ontology for IoT devices and smart applications, including smart buildings. Extensions include SAREF4City, SAREF4Ener, SAREF4Envi, and SAREF4Agri. It can serve as a shared reference vocabulary that bridges domain-specific vocabularies when cities use different terms for the same concepts.

Schema.org
vocabulary — W3C

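General-purpose vocabulary founded by Google, Microsoft, Yahoo and Yandex and maintained through a W3C Community Group (it is not a W3C Recommendation). Widely adopted on the web. Useful as a common base for generic entities (Organisation, Place, Event, etc.) before specialising with SAREF or Smart Data Models for domain-specific use cases.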

INSPIRE data models
EU directive

European data models for geographic information. Mandatory for public spatial data in Europe. Cover 34 themes (addresses, cadastre, hydrography, etc.). They are the natural bridge between MIM2 and geospatial data representation.

DCAT-AP
profile — W3C/EU

European application profile of DCAT for data catalogues. Describes dataset metadata (description, licence, update frequency, distribution). De facto standard for open data portals across the EU.

Ontology alignment ✨
proposed — research

Automated or semi-automated detection of correspondences between concepts in different ontologies (e.g. SAREF ↔ Smart Data Models ↔ Schema.org ↔ city-specific models). This is a critical gap: cities adopt different data models, but MIM2 currently offers no mechanism to bridge them. Ontology alignment, including emerging LLM-based approaches, could provide the missing link between Objective 1 (data models) and Objective 3 (transformation) by automatically generating mapping rules from model-to-model correspondences.
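
To make "detection of correspondences" concrete, here is a deliberately simple lexical matcher. It is only a sketch: real alignment systems also use structure, definitions and embeddings, and the two term lists and the 0.8 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def normalise(term):
    """Lower-case a term and drop separators so 'power_state' ~ 'powerState'."""
    return "".join(ch for ch in term.lower() if ch.isalnum())


def align(source_terms, target_terms, threshold=0.8):
    """Return (source, target, score) for each source term whose best
    lexical match in the target list reaches the threshold."""
    matches = []
    for s in source_terms:
        best, best_score = None, 0.0
        for t in target_terms:
            score = SequenceMatcher(None, normalise(s), normalise(t)).ratio()
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            matches.append((s, best, round(best_score, 2)))
    return matches


# Hypothetical term lists standing in for two city data models.
city_a = ["streetLight", "power_state", "dateLastSwitchingOn"]
city_b = ["Streetlight", "powerState", "illuminanceLevel"]

matches = align(city_a, city_b)
```

Terms with no candidate above the threshold are left unmatched rather than forced into a pair, which keeps a human reviewer in the loop for the hard cases.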

Models are serialised into a format
2. Serialisation: In what format?

Support the syntactic formats that make data machine-readable and exchangeable across systems and organisations.

JSON-LD
W3C standard

Recommended pivot format. Combines the simplicity of JSON with the semantic richness of RDF via @context. NGSI-LD uses it natively. Supports compaction/expansion to adapt the level of detail depending on the consumer.
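
The idea behind expansion and compaction can be sketched with a toy implementation. This is not the JSON-LD 1.1 processing algorithm (which also handles nested contexts, @id, @type, value objects and arrays); it only rewrites top-level keys against a flat term-to-IRI context, and the sample document is invented for illustration.

```python
def expand(doc, context):
    """Replace short terms with full IRIs using a flat term -> IRI context."""
    return {context.get(key, key): value
            for key, value in doc.items() if key != "@context"}


def compact(expanded, context):
    """Inverse of expand: replace full IRIs with the short terms of a context."""
    iri_to_term = {iri: term for term, iri in context.items()}
    compacted = {iri_to_term.get(key, key): value
                 for key, value in expanded.items()}
    compacted["@context"] = context
    return compacted


publisher_ctx = {
    "name": "https://schema.org/name",
    "address": "https://schema.org/address",
}
doc = {"@context": publisher_ctx, "name": "City Hall", "address": "Main Square 1"}

expanded = expand(doc, publisher_ctx)

# A consumer that prefers its own term for the same IRI.
consumer_ctx = {"label": "https://schema.org/name"}
view = compact(expanded, consumer_ctx)
```

The expanded form is the unambiguous one; each consumer can then compact it with its own context and see only the short names it cares about.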

GeoJSON / GeoJSON-LD
IETF RFC 7946

JSON-based format for geographic features and their geometries (points, lines, polygons). GeoJSON-LD adds semantic context. GeoJSON is the encoding most commonly offered by geospatial APIs such as OGC API Features.
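
A minimal sketch of a GeoJSON FeatureCollection built in Python, with a bounding box computed from its points. The coordinates and property names are made up for the example; the member names and the longitude-first coordinate order follow RFC 7946.

```python
def point_feature(lon, lat, properties):
    """GeoJSON Feature with a Point geometry (coordinates are lon, lat)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def bbox(features):
    """Bounding box [west, south, east, north] of a list of Point features."""
    lons = [f["geometry"]["coordinates"][0] for f in features]
    lats = [f["geometry"]["coordinates"][1] for f in features]
    return [min(lons), min(lats), max(lons), max(lats)]


collection = {
    "type": "FeatureCollection",
    "features": [
        point_feature(4.35, 50.85, {"name": "Sensor A"}),
        point_feature(4.40, 50.80, {"name": "Sensor B"}),
    ],
}
collection["bbox"] = bbox(collection["features"])
```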

RDF / Turtle / N-Triples
W3C standard

Native Semantic Web formats. Turtle for human readability, N-Triples for high-performance streaming. Used by triplestores and SPARQL endpoints.

CSV-W (CSV on the Web)
W3C standard

Metadata layer for CSV files: column types, identifiers, inter-table links. Makes CSV, the format cities most commonly use for open data, semantically aware without changing source systems.
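
The sketch below shows how a CSV-W style metadata description might drive typed parsing of an otherwise untyped CSV file. The file name, column names and sample rows are hypothetical, and only three datatypes are handled; a conformant CSV-W processor does considerably more.

```python
import csv
import io

# Metadata in the shape of a CSV-W table description (subset).
METADATA = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "parking.csv",  # hypothetical file name
    "tableSchema": {
        "columns": [
            {"name": "id", "datatype": "string"},
            {"name": "capacity", "datatype": "integer"},
            {"name": "occupancy", "datatype": "number"},
        ],
        "primaryKey": "id",
    },
}

# Only the datatypes used above are mapped; this is not the full CSV-W list.
CASTS = {"string": str, "integer": int, "number": float}


def read_typed(csv_text, metadata):
    """Parse CSV text and cast each cell according to the column datatypes."""
    casts = {col["name"]: CASTS[col["datatype"]]
             for col in metadata["tableSchema"]["columns"]}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{key: casts[key](value) for key, value in row.items()}
            for row in reader]


rows = read_typed("id,capacity,occupancy\nP1,120,0.75\nP2,80,0.4\n", METADATA)
```

The CSV itself stays exactly as the source system exports it; all typing information lives in the separate metadata document.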

CityGML / CityJSON
OGC standard

Model and format for 3D urban data (buildings, terrain, vegetation). CityJSON is the lighter JSON encoding. Levels of detail run from LoD0 to LoD4 in CityGML 2.0 and from LoD0 to LoD3 in CityGML 3.0, chosen according to the detail required.

Source data must be transformed
3. Transformation: How to convert?

Move from proprietary or heterogeneous formats to interoperable formats that conform to the agreed data models.

RML / YARRRML
W3C community group, tool

Declarative mapping language for transforming CSV, JSON, XML into RDF/JSON-LD. YARRRML is the simplified YAML syntax. Allows defining reusable transformation rules that can be shared across cities.
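
The declarative idea can be sketched in a few lines: the mapping is data, and a generic engine applies it to every row. The mapping structure below is a simplified stand-in loosely inspired by YARRRML (a subject template plus predicate-object pairs), not actual RML or YARRRML syntax, and the example.org IRIs and CSV columns are assumptions.

```python
import csv
import io

# Simplified, hypothetical mapping: a subject template and a list of
# (predicate IRI, object template, kind) entries.
MAPPING = {
    "subject": "https://example.org/sensor/{id}",
    "predicate_objects": [
        ("http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
         "https://saref.etsi.org/core/Sensor", "iri"),
        ("https://schema.org/name", "{name}", "literal"),
    ],
}


def to_ntriples(csv_text, mapping):
    """Apply the mapping to each CSV row and return N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = mapping["subject"].format(**row)
        for predicate, template, kind in mapping["predicate_objects"]:
            value = template.format(**row)
            if kind == "iri":
                obj = f"<{value}>"
            else:
                # Escape backslashes and quotes for a plain string literal.
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                obj = f'"{escaped}"'
            lines.append(f"<{subject}> <{predicate}> {obj} .")
    return lines


triples = to_ntriples("id,name\ns1,Lamp A\n", MAPPING)
```

Because the mapping is a plain data structure, another city with the same CSV layout could reuse it unchanged, which is the reuse argument made above.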

Ontology alignment → mapping rules
proposed — research

Automated ontology alignment (from Objective 1) can generate transformation rules. The pipeline: detect correspondences between source and target ontologies → produce SSSOM mappings or RML rules → execute transformation. This closes the loop between "knowing which models differ" and "actually converting data". Emerging approaches using LLMs (RAG-based alignment) make this increasingly feasible for complex, cross-domain mappings.
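
The middle step of that pipeline, turning mappings into executable rules, might look like the sketch below. The rows use SSSOM column names (subject_id, predicate_id, object_id, confidence), but the identifiers, the confidence values and the 0.8 cut-off are invented for the example.

```python
# Rows shaped like SSSOM mappings; all identifiers are hypothetical.
MAPPINGS = [
    {"subject_id": "cityA:temp", "predicate_id": "skos:exactMatch",
     "object_id": "sdm:temperature", "confidence": 0.95},
    {"subject_id": "cityA:hum", "predicate_id": "skos:closeMatch",
     "object_id": "sdm:relativeHumidity", "confidence": 0.90},
    {"subject_id": "cityA:pm", "predicate_id": "skos:exactMatch",
     "object_id": "sdm:pm10", "confidence": 0.40},
]


def rules_from_mappings(mappings, min_confidence=0.8):
    """Keep only confident exact matches and turn them into rename rules."""
    return {
        m["subject_id"]: m["object_id"]
        for m in mappings
        if m["predicate_id"] == "skos:exactMatch"
        and m["confidence"] >= min_confidence
    }


def transform(record, rules):
    """Rename mapped keys; unmapped keys are kept so nothing is silently lost."""
    return {rules.get(key, key): value for key, value in record.items()}


rules = rules_from_mappings(MAPPINGS)
converted = transform({"cityA:temp": 21.5, "cityA:hum": 60}, rules)
```

Only exact, high-confidence matches become automatic rules here; close matches and low-confidence candidates are left for manual review.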

SPARQL CONSTRUCT
W3C standard

SPARQL query that produces a new RDF graph from an existing one. Useful for transforming between ontologies (e.g. SAREF → Smart Data Models) without external ETL.
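
For illustration, the sketch below pairs a small CONSTRUCT query with a pure-Python stand-in that has the same effect on an in-memory list of triples. A real deployment would run the query on a SPARQL engine; the ex: target vocabulary and the sample data are hypothetical.

```python
# The query the Python function mirrors (ex: is a hypothetical target vocabulary).
QUERY = """
PREFIX saref: <https://saref.etsi.org/core/>
PREFIX ex:    <https://example.org/target#>
CONSTRUCT { ?m ex:value ?v }
WHERE     { ?m saref:hasValue ?v }
"""

SAREF = "https://saref.etsi.org/core/"
EX = "https://example.org/target#"


def construct(triples):
    """Emit one new triple for every match of the WHERE pattern."""
    return [(s, EX + "value", o)
            for s, p, o in triples
            if p == SAREF + "hasValue"]


source = [
    ("urn:m1", SAREF + "hasValue", 21.5),
    ("urn:m1", SAREF + "hasTimestamp", "2024-01-01T12:00:00Z"),
]
target = construct(source)
```

As with a real CONSTRUCT, the source graph is left untouched and triples that do not match the pattern simply do not appear in the result.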

LDES (Linked Data Event Streams)
W3C community group

Protocol for publishing versioned linked data streams. Enables incremental synchronisation between systems. Developed by IMEC/Flanders — already an MIM1 resource.
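
Incremental synchronisation can be sketched as a consumer that remembers which stream members it has already processed. The dict keys (id, isVersionOf, generatedAtTime) and the sample members are simplifications chosen for the example, not the actual LDES vocabulary or client API.

```python
class Consumer:
    """Toy stream consumer that tracks processed members."""

    def __init__(self):
        self.seen = set()   # ids of members already processed
        self.latest = {}    # entity id -> most recent version seen

    def sync(self, members):
        """Process only unseen members and return how many were new."""
        new = [m for m in members if m["id"] not in self.seen]
        for m in sorted(new, key=lambda m: m["generatedAtTime"]):
            self.seen.add(m["id"])
            current = self.latest.get(m["isVersionOf"])
            if current is None or m["generatedAtTime"] >= current["generatedAtTime"]:
                self.latest[m["isVersionOf"]] = m
        return len(new)


page1 = [
    {"id": "obs1#v1", "isVersionOf": "obs1",
     "generatedAtTime": "2024-01-01T10:00:00Z", "value": 20},
    {"id": "obs2#v1", "isVersionOf": "obs2",
     "generatedAtTime": "2024-01-01T10:00:00Z", "value": 7},
]
# The stream only grows: existing members never change, new versions are appended.
page2 = page1 + [
    {"id": "obs1#v2", "isVersionOf": "obs1",
     "generatedAtTime": "2024-01-01T11:00:00Z", "value": 22},
]

consumer = Consumer()
first = consumer.sync(page1)
second = consumer.sync(page2)
```

Because members are immutable versions, a second pass over the stream costs the consumer only the members it has not yet seen.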

JSON-LD framing
W3C standard

Native JSON-LD technique to restructure a graph into a "tree" view adapted to a specific use case. Allows producing different views of the same dataset without duplication or heavy ETL.
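
The core of the technique, turning a flat graph into a tree rooted at one type, can be sketched as follows. This toy version only embeds nodes referenced by {"@id": ...}; the JSON-LD 1.1 Framing specification defines a much richer frame document, and the sample graph is invented.

```python
def frame(graph, root_type):
    """Return nodes of root_type with {'@id': ...} references replaced
    by the referenced node (cycles are left as references)."""
    index = {node["@id"]: node for node in graph}

    def embed(value, path):
        is_ref = isinstance(value, dict) and set(value) == {"@id"}
        if is_ref and value["@id"] in index and value["@id"] not in path:
            return embed_node(index[value["@id"]], path | {value["@id"]})
        if isinstance(value, list):
            return [embed(v, path) for v in value]
        return value

    def embed_node(node, path):
        return {key: embed(value, path) for key, value in node.items()}

    return [embed_node(node, {node["@id"]})
            for node in graph if node.get("@type") == root_type]


graph = [
    {"@id": "urn:b1", "@type": "Building", "name": "Library",
     "address": {"@id": "urn:a1"}},
    {"@id": "urn:a1", "@type": "PostalAddress",
     "streetAddress": "Main Square 1"},
]

tree = frame(graph, "Building")
```

Calling frame(graph, "PostalAddress") on the same data would give an address-centred view instead, without copying or altering the underlying graph.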

FME (Feature Manipulation Engine)
proprietary tool

ETL platform widely used by European cities, especially for geospatial data, with native support for 450+ formats. It is a proprietary product rather than an open standard, but in practice it is a de facto tool.

Transformed data must be validated
4. Validation: Is it conformant?

Automatically verify that produced data conforms to the expected data models and constraints.

SHACL (Shapes Constraint Language)
W3C recommendation

W3C standard for validating RDF graphs against a set of constraints (shapes). Can verify: mandatory types, cardinalities, value ranges, regex patterns, etc. Shapes are themselves in RDF and thus shareable.
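
The kinds of checks listed above can be illustrated with a toy validator. This is not SHACL: a real shape is an RDF graph evaluated by a SHACL engine. The dict below only borrows the names of a few SHACL core constraints (minCount, datatype, minInclusive, pattern), and the attribute names and station code format are assumptions.

```python
import re

# Hypothetical shape, written as a dict instead of RDF.
SHAPE = {
    "targetClass": "AirQualityObserved",
    "properties": {
        "no2": {"minCount": 1, "datatype": float, "minInclusive": 0.0},
        "stationCode": {"minCount": 1, "datatype": str,
                        "pattern": r"^[A-Z]{2}\d{3}$"},
    },
}


def validate(node, shape):
    """Return violation messages; an empty list means the node conforms."""
    violations = []
    if node.get("type") != shape["targetClass"]:
        return violations  # the shape does not target this node
    for name, constraint in shape["properties"].items():
        values = node.get(name, [])
        if not isinstance(values, list):
            values = [values]
        if len(values) < constraint.get("minCount", 0):
            violations.append(f"{name}: fewer than {constraint['minCount']} value(s)")
        for value in values:
            if "datatype" in constraint and not isinstance(value, constraint["datatype"]):
                violations.append(f"{name}: {value!r} has the wrong datatype")
                continue
            if "minInclusive" in constraint and value < constraint["minInclusive"]:
                violations.append(f"{name}: {value!r} is below the minimum")
            if "pattern" in constraint and not re.search(constraint["pattern"], value):
                violations.append(f"{name}: {value!r} does not match the pattern")
    return violations


good = {"type": "AirQualityObserved", "no2": 22.5, "stationCode": "BE001"}
bad = {"type": "AirQualityObserved", "no2": -1.0, "stationCode": "be1"}
```

Here validate(good, SHAPE) returns an empty list, while validate(bad, SHAPE) reports one violation per failed constraint, similar in spirit to a SHACL validation report.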

JSON Schema
IETF Internet-Draft

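Structural validation for JSON and for JSON-LD documents in their serialised form. OpenAPI uses it to describe request and response bodies (a subset before OpenAPI 3.1, full alignment since). Smart Data Models already publishes a JSON Schema for each model.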

SHACL + JSON Schema combined
proposed — combined approach

Hybrid approach: JSON Schema for fast syntactic validation (structure, types) + SHACL for deep semantic validation (link consistency, business constraints). The two are complementary: the first rejects malformed payloads cheaply, the second checks meaning across the graph, so together they cover both structure and semantics.
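
A two-stage check of this kind could be wired together as in the sketch below. Both stages are simplified stand-ins (no JSON Schema or SHACL engine is involved), and the refDevice attribute, the required-key table and the sample entities are hypothetical.

```python
REQUIRED = {"id": str, "type": str, "refDevice": str}


def structural_errors(entity, required):
    """Stage 1 (JSON Schema style): required keys exist with the expected type."""
    errors = []
    for key, expected in required.items():
        if key not in entity:
            errors.append(f"missing {key}")
        elif not isinstance(entity[key], expected):
            errors.append(f"{key} is not {expected.__name__}")
    return errors


def semantic_errors(entity, known_ids):
    """Stage 2 (SHACL style): the reference must point at an existing entity."""
    ref = entity["refDevice"]
    return [] if ref in known_ids else [f"refDevice {ref} does not resolve"]


def validate(entity, required, known_ids):
    """Run the cheap structural stage first; only structurally valid
    entities reach the graph-level stage."""
    errors = structural_errors(entity, required)
    return errors or semantic_errors(entity, known_ids)


known = {"urn:device:1"}
ok = {"id": "urn:obs:1", "type": "Observation", "refDevice": "urn:device:1"}
dangling = {"id": "urn:obs:2", "type": "Observation", "refDevice": "urn:device:9"}
malformed = {"id": "urn:obs:3", "type": "Observation"}
```

Ordering the stages this way means the semantic stage can assume well-formed input, which keeps its rules short and its error messages specific.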

Legend (badge categories):
standard / spec
tool / framework
ontology / vocabulary
proposed / research