A structured inventory of candidate mechanisms for MIM2 v9.0, organised along the data representation pipeline: from data models to validation.
Provide consistent and machine-understandable definitions of all entities about which data is being captured in a data ecosystem. This includes syntactic and semantic definitions.
Library of harmonised data models for smart cities (FIWARE / TM Forum / IUDX). Covers domains such as mobility, environment, buildings, water, tourism, and more. Models are directly linked to NGSI-LD @context definitions, making them both syntactically and semantically interoperable out of the box.
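As a rough illustration of what an entity bound to an NGSI-LD @context can look like, the sketch below builds one as a plain Python dictionary. The entity type `AirQualityObserved`, the attribute `no2`, the helper name `make_air_quality_entity` and the second context URL are assumptions chosen for illustration; check them against the published model before relying on them.

```python
import json

# First URL: the NGSI-LD core context published by ETSI.
# Second URL: assumed location of an aggregate data-model context (verify before use).
CONTEXT = [
    "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld",
    "https://smartdatamodels.org/context.jsonld",
]


def make_air_quality_entity(station_id: str, no2: float, lon: float, lat: float) -> dict:
    """Build an NGSI-LD style entity for an air quality observation (illustrative)."""
    return {
        "id": f"urn:ngsi-ld:AirQualityObserved:{station_id}",
        "type": "AirQualityObserved",
        # A Property wraps a plain value.
        "no2": {"type": "Property", "value": no2},
        # A GeoProperty wraps a GeoJSON geometry (longitude first).
        "location": {
            "type": "GeoProperty",
            "value": {"type": "Point", "coordinates": [lon, lat]},
        },
        "@context": CONTEXT,
    }


def to_json(entity: dict) -> str:
    """Serialise the entity as it would be sent to a context broker."""
    return json.dumps(entity, indent=2)
```

The @context array is what ties the short attribute names to full IRIs, which is where the semantic interoperability comes from.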
ETSI reference ontology for IoT devices and smart buildings. Extensions include SAREF4CITY, SAREF4ENER, SAREF4ENVI, and SAREF4AGRI. Serves as a shared reference ontology to unify domain-specific vocabularies, acting as a bridge when cities use different terminologies for the same concepts.
General-purpose vocabulary founded by Google, Microsoft, Yahoo and Yandex and maintained through a W3C Community Group. Widely adopted on the web. Useful as a common base for generic entities (Organization, Place, Event, etc.) before specialising with SAREF or Smart Data Models for domain-specific use cases.
European data models for geographic information. Mandatory for public spatial data in Europe. Cover 34 themes (addresses, cadastre, hydrography, etc.). They are the natural bridge between MIM2 and geospatial data representation.
European application profile of DCAT for data catalogues. Describes dataset metadata (description, licence, update frequency, distribution). De facto standard for open data portals across the EU.
Automated or semi-automated detection of correspondences between concepts in different ontologies (e.g. SAREF ↔ Smart Data Models ↔ Schema.org ↔ city-specific models). This is a critical gap: cities adopt different data models, but MIM2 currently offers no mechanism to bridge them. Ontology alignment — including emerging LLM-based approaches — could provide the missing link between Objectives 1 and 3, automatically generating mapping rules from model-to-model correspondences.
Support the syntactic formats that make data machine-readable and exchangeable across systems and organisations.
Recommended pivot format. Combines the simplicity of JSON with the semantic richness of RDF via @context. NGSI-LD uses it natively. Supports compaction/expansion to adapt the level of detail depending on the consumer.
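To make the idea of compaction and expansion concrete, here is a deliberately simplified sketch that only rewrites top-level keys using a flat @context. It is not the JSON-LD expansion algorithm (which also handles nested contexts, value objects, arrays and more); the function names and the example IRI are illustrative assumptions.

```python
def expand(doc: dict) -> dict:
    """Replace short terms with the full IRIs declared in the document's @context."""
    context = doc.get("@context", {})
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the context is consumed, not carried over
        expanded[context.get(key, key)] = value
    return expanded


def compact(expanded: dict, context: dict) -> dict:
    """Replace full IRIs with the short terms defined in the given context."""
    reverse = {iri: term for term, iri in context.items()}
    compacted = {"@context": context}
    for key, value in expanded.items():
        compacted[reverse.get(key, key)] = value
    return compacted
```

A consumer that wants unambiguous identifiers works on the expanded form, while one that wants concise JSON works on the compacted form; both describe the same data.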
JSON-based format (RFC 7946) for geographic features and their geometries (points, lines, polygons). GeoJSON-LD adds semantic context. Common encoding for geospatial APIs (OGC API Features).
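A minimal sketch of the structure involved, assuming a point location with free-form properties. The helper names `point_feature` and `feature_collection` are hypothetical; the object layout follows RFC 7946, where coordinates are given as longitude, then latitude.

```python
import json


def point_feature(lon: float, lat: float, properties: dict) -> dict:
    """Return a GeoJSON Feature with a Point geometry (order: longitude, latitude)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def feature_collection(features: list) -> dict:
    """Wrap a list of Features in a FeatureCollection."""
    return {"type": "FeatureCollection", "features": features}


def to_geojson(collection: dict) -> str:
    """Serialise the collection to a GeoJSON string."""
    return json.dumps(collection)
```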
Native Semantic Web formats. Turtle for human readability, N-Triples for high-performance streaming. Used by triplestores and SPARQL endpoints.
Metadata layer for CSV files: column types, identifiers, inter-table links. Makes CSV, the format cities most commonly use for open data, semantically aware without changing source systems.
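The sketch below shows how such a metadata description can sit next to an unchanged CSV file and drive typed parsing. The file name, column names and the tiny `read_typed` helper are assumptions for illustration; only a handful of datatypes are handled, and a real processor would implement far more of the metadata vocabulary.

```python
import csv
import io

# Metadata document describing the CSV; the CSV itself is left untouched.
METADATA = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "bike-stations.csv",  # hypothetical file name
    "tableSchema": {
        "columns": [
            {"name": "id", "datatype": "string"},
            {"name": "capacity", "datatype": "integer"},
            {"name": "latitude", "datatype": "number"},
        ],
        "primaryKey": "id",
    },
}

# Simplified mapping from declared datatypes to Python constructors.
CASTS = {"string": str, "integer": int, "number": float}


def read_typed(csv_text: str, metadata: dict) -> list:
    """Parse CSV text and cast each column according to the metadata."""
    columns = metadata["tableSchema"]["columns"]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        rows.append({c["name"]: CASTS[c["datatype"]](raw[c["name"]]) for c in columns})
    return rows
```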
Model and format for 3D urban data (buildings, terrain, vegetation). CityJSON is a lighter JSON-based encoding. Levels of detail run from LoD0 to LoD4 in CityGML 2.0 (LoD0 to LoD3 in CityGML 3.0), chosen according to the detail a use case requires.
Move from proprietary or heterogeneous formats to interoperable formats that conform to the agreed data models.
Declarative mapping language for transforming CSV, JSON, XML into RDF/JSON-LD. YARRRML is the simplified YAML syntax. Allows defining reusable transformation rules that can be shared across cities.
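The core idea of a declarative mapping, rules kept as data and applied by a generic engine, can be sketched without any mapping library. The structure of `MAPPING`, the `example.org` IRIs and the `apply_mapping` function are invented for this illustration and are not RML or YARRRML syntax; the output is a list of N-Triples lines.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical rule set: a subject template, a class, and predicate-to-column links.
MAPPING = {
    "subject": "https://example.org/station/{id}",
    "type": "https://example.org/vocab#BikeStation",
    "predicates": {
        "https://schema.org/name": "name",
        "https://example.org/vocab#capacity": "capacity",
    },
}


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def apply_mapping(csv_text: str, mapping: dict) -> list:
    """Turn each CSV row into N-Triples lines according to the mapping rules."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = mapping["subject"].format(**row)
        lines.append(f"<{subject}> <{RDF_TYPE}> <{mapping['type']}> .")
        for predicate, column in mapping["predicates"].items():
            lines.append(f'<{subject}> <{predicate}> "{escape_literal(row[column])}" .')
    return lines
```

Because the rules are data rather than code, another city with the same source layout could reuse `MAPPING` unchanged, which is the property that makes shared transformation rules attractive.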
Automated ontology alignment (from Objective 1) can generate transformation rules. The pipeline: detect correspondences between source and target ontologies → produce SSSOM mappings or RML rules → execute transformation. This closes the loop between "knowing which models differ" and "actually converting data". Emerging approaches using LLMs (RAG-based alignment) make this increasingly feasible for complex, cross-domain mappings.
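The first two steps of that pipeline (detect correspondences, emit mappings) might be sketched as follows. Plain string similarity from the standard library stands in for a real matcher or an LLM-based one, the `city:` and `ex:` identifiers are made up, and the output rows merely borrow SSSOM column names (`subject_id`, `predicate_id`, `object_id`, `mapping_justification`, `confidence`) rather than forming a complete SSSOM file.

```python
from difflib import SequenceMatcher


def align(source_terms: dict, target_terms: dict, threshold: float = 0.8) -> list:
    """Propose one best lexical match per source term, if it clears the threshold.

    Both arguments map a term identifier to its human-readable label.
    """
    mappings = []
    for s_id, s_label in source_terms.items():
        best_id, best_score = None, 0.0
        for t_id, t_label in target_terms.items():
            score = SequenceMatcher(None, s_label.lower(), t_label.lower()).ratio()
            if score > best_score:
                best_id, best_score = t_id, score
        if best_id is not None and best_score >= threshold:
            mappings.append({
                "subject_id": s_id,
                "predicate_id": "skos:closeMatch",
                "object_id": best_id,
                "mapping_justification": "semapv:LexicalMatching",
                "confidence": round(best_score, 2),
            })
    return mappings
```

Each resulting row could then be turned into a transformation rule (rename this property to that one), with low-confidence rows routed to a human reviewer.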
SPARQL query that produces a new RDF graph from an existing one. Useful for transforming between ontologies (e.g. SAREF → Smart Data Models) without external ETL.
Protocol for publishing versioned linked data streams. Enables incremental synchronisation between systems. Developed by IMEC/Flanders and already listed as an MIM1 resource.
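Incremental synchronisation over a stream of immutable versions can be illustrated with a small consumer loop. This is a conceptual sketch only: the member fields `id`, `versionOf` and `timestamp` and the `sync` function are assumptions, and fetching pages over HTTP is left out entirely.

```python
def sync(members: list, state: dict, seen: set) -> int:
    """Apply members not processed before; keep the newest version per entity.

    members: version objects with "id", "versionOf" and an ISO 8601 "timestamp"
    state:   latest known version per entity, keyed by "versionOf"
    seen:    identifiers of members already processed
    Returns the number of members applied in this call.
    """
    applied = 0
    for member in members:
        if member["id"] in seen:
            continue  # already processed in an earlier run
        seen.add(member["id"])
        current = state.get(member["versionOf"])
        # ISO 8601 timestamps in the same format and time zone sort lexicographically.
        if current is None or member["timestamp"] > current["timestamp"]:
            state[member["versionOf"]] = member
        applied += 1
    return applied
```

Because members never change once published, a consumer only needs to remember which ones it has seen; re-reading a page costs nothing beyond the skip check.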
Native JSON-LD technique to restructure a graph into a "tree" view adapted to a specific use case. Allows producing different views of the same dataset without duplication or heavy ETL.
ETL platform widely used by European cities, especially for geospatial data. Native support for 450+ formats. Not an open standard but a de facto tool in practice.
Automatically verify that produced data conforms to the expected data models and constraints.
W3C standard for validating RDF graphs against a set of constraints (shapes). Can verify: mandatory types, cardinalities, value ranges, regex patterns, etc. Shapes are themselves in RDF and thus shareable.
Structural validation for JSON/JSON-LD. Natively used by OpenAPI to define API responses. Smart Data Models already publishes JSON Schemas for each model.
Hybrid approach: JSON Schema for fast syntactic validation (structure, types) + SHACL for deep semantic validation (link consistency, business constraints). The two are complementary: JSON Schema rejects malformed payloads cheaply, and SHACL then checks the constraints that structure alone cannot express.
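The two-stage flow can be sketched with hand-written checks standing in for the real validators. Nothing below uses JSON Schema or SHACL; `check_structure` plays the role of the fast structural pass and `check_semantics` the role of the deeper pass, and the property names (`capacity`, `operatedBy`) are hypothetical.

```python
# Stage 1 stand-in: required properties and their expected Python types.
REQUIRED_TYPES = {"id": str, "type": str, "capacity": int, "operatedBy": str}


def check_structure(entity: dict) -> list:
    """Cheap structural pass: presence and type of each required property."""
    errors = []
    for key, expected in REQUIRED_TYPES.items():
        if key not in entity:
            errors.append(f"missing property: {key}")
        elif not isinstance(entity[key], expected):
            errors.append(f"wrong type for {key}")
    return errors


def check_semantics(entity: dict, known_ids: set) -> list:
    """Deeper pass: a value range and a link that must resolve to a known entity."""
    errors = []
    if entity["capacity"] < 0:
        errors.append("capacity must be non-negative")
    if entity["operatedBy"] not in known_ids:
        errors.append("operatedBy points to an unknown entity")
    return errors


def validate(entity: dict, known_ids: set) -> list:
    """Run the structural pass first; run the semantic pass only if it succeeds."""
    errors = check_structure(entity)
    if errors:
        return errors
    return check_semantics(entity, known_ids)
```

Running the cheap pass first means the semantic pass can assume well-formed input, which keeps its rules simpler.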