Input: two ontologies (source + target). Output: a set of candidate correspondences with confidence scores. This is where most of the intelligence lives.
Lexical / string matching
rule-based
Compare concept labels using string similarity (Levenshtein, Jaccard, n-grams), synonym lookup (WordNet), and multilingual dictionaries. Fast but brittle — misses semantically equivalent concepts with different names.
Ex: "Temperature" ↔ "Temperatura" → match via translation table
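A minimal sketch of the lexical step above, combining edit distance, character n-gram Jaccard, and a toy translation table. The labels and the translation entries are illustrative, not drawn from any real ontology or dictionary.

```python
# Toy lexical matcher: edit-distance similarity, trigram Jaccard,
# and a translation lookup for the multilingual case.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over character trigrams."""
    grams = lambda s: {s[i:i + 3] for i in range(len(s) - 2)}
    ga, gb = grams(a.lower()), grams(b.lower())
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

# Toy multilingual lookup: normalise foreign labels before comparing.
TRANSLATIONS = {"temperatura": "temperature"}

def lexical_score(src: str, tgt: str) -> float:
    tgt = TRANSLATIONS.get(tgt.lower(), tgt.lower())
    src = src.lower()
    edit_sim = 1 - levenshtein(src, tgt) / max(len(src), len(tgt))
    return max(edit_sim, jaccard(src, tgt))

print(lexical_score("Temperature", "Temperatura"))  # 1.0 after translation
```

Without the translation table the pair would still score fairly high on trigrams, which is exactly the brittleness noted above: lexical scores say nothing about meaning.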
Structural / graph matching
rule-based
Compare the topology of the ontology graphs: hierarchy position, shared sub-classes, domain/range of properties, cardinality constraints. Two concepts with similar "neighbourhoods" are likely equivalent.
Ex: both have sub-classes {Indoor, Outdoor} and property "hasUnit" → structural match
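A sketch of the neighbourhood comparison described above: score two concepts by the overlap of their sub-class names and property names. The ontology fragments and the equal weighting are made up for illustration.

```python
# Toy structural matcher: Jaccard overlap of two concepts'
# "neighbourhoods" (sub-classes and properties), equally weighted.

def neighbourhood_sim(subs_a, props_a, subs_b, props_b):
    def jacc(x, y):
        x, y = set(x), set(y)
        return len(x & y) / len(x | y) if x | y else 0.0
    return 0.5 * jacc(subs_a, subs_b) + 0.5 * jacc(props_a, props_b)

# Both concepts have {Indoor, Outdoor} children; properties overlap partly.
score = neighbourhood_sim(
    subs_a={"Indoor", "Outdoor"}, props_a={"hasUnit", "hasValue"},
    subs_b={"Indoor", "Outdoor"}, props_b={"hasUnit"},
)
print(score)  # 0.75: identical children, half the properties shared
```

Real systems extend the neighbourhood to super-classes, domain/range pairs, and cardinality constraints, but the aggregation idea is the same.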
Logic-based reasoning
symbolic AI
Use Description Logic reasoners (HermiT, Pellet) to infer correspondences from OWL axioms. Can detect subsumption (A ⊑ B), equivalence (A ≡ B), and disjointness. Precise but requires well-formalised ontologies.
Ex: OWL axioms imply Sensor ⊑ Device → subsumption detected
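To make the inference step concrete, here is a toy version of what a reasoner derives for subsumption: the transitive closure of asserted ⊑ axioms, computed with a fixpoint loop. Real systems delegate this to HermiT or Pellet over full OWL semantics; the axioms below are invented.

```python
# Toy subsumption reasoning: close a set of asserted sub-class
# axioms under transitivity until no new pairs appear.

def subsumption_closure(axioms):
    """axioms: set of (sub, sup) pairs meaning sub ⊑ sup."""
    closed = set(axioms)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closed):
            for (c, d) in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

axioms = {("TemperatureSensor", "Sensor"), ("Sensor", "Device")}
inferred = subsumption_closure(axioms)
print(("TemperatureSensor", "Device") in inferred)  # True: inferred, not asserted
```

The pair (TemperatureSensor, Device) was never asserted; it is entailed, which is the kind of correspondence lexical and structural matchers cannot see.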
Embedding-based (BERT, Sentence-BERT)
machine learning
Encode concept labels + descriptions into vector embeddings, then compute cosine similarity. Captures semantic proximity even when labels are very different. Not generative — it's a retrieval/similarity approach.
Ex: "AirQualityIndex" ↔ "PollutionLevel" → cosine sim 0.87
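A sketch of the similarity step, assuming embeddings are already available. In practice the vectors come from a model such as Sentence-BERT; the 4-dimensional vectors below are invented stand-ins so the cosine computation itself is runnable.

```python
# Toy embedding-based matching: cosine similarity over (hypothetical)
# concept embeddings. Real vectors would be ~384-768 dims from BERT.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Invented embeddings: the first two concepts are semantically close,
# the third is unrelated.
emb = {
    "AirQualityIndex": [0.8, 0.1, 0.5, 0.2],
    "PollutionLevel":  [0.7, 0.2, 0.6, 0.1],
    "InvoiceNumber":   [0.0, 0.9, 0.1, 0.8],
}

for tgt in ("PollutionLevel", "InvoiceNumber"):
    print("AirQualityIndex ->", tgt, round(cosine(emb["AirQualityIndex"], emb[tgt]), 2))
```

The point of the example: the labels share no tokens at all, so a lexical matcher scores near zero, while the embedding similarity for the related pair is high.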
LLM-based alignment (RAG + prompting)
generative AI
Feed concept pairs (with their definitions, axioms, module context) to an LLM and ask it to judge the relationship. RAG retrieves the most relevant candidates first; the LLM then reasons over the hard cases. This is the state of the art for complex (1-to-N) alignments that previously required human experts.
Ex: "Does saref:Measurement subsume or equal schema:QuantitativeValue?" → LLM reasons with context
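A sketch of the prompting step only: assembling a retrieved candidate pair and its context into a judgement prompt. The concept definitions here are invented placeholders, and the actual LLM call is omitted because it is provider-specific.

```python
# Toy prompt builder for LLM-based alignment. The definitions passed
# in would normally come from the RAG retrieval step.

def build_alignment_prompt(src, tgt, src_def, tgt_def):
    return (
        "You are an ontology alignment expert.\n"
        f"Source concept: {src}\nDefinition: {src_def}\n"
        f"Target concept: {tgt}\nDefinition: {tgt_def}\n"
        "Answer with one of: equivalent, subsumes, subsumed-by, unrelated,\n"
        "followed by a one-sentence justification."
    )

prompt = build_alignment_prompt(
    "saref:Measurement",
    "schema:QuantitativeValue",
    "A value of a property, made by a device at a point in time.",  # invented
    "A point value or interval for a quantitative property.",       # invented
)
# response = llm.complete(prompt)  # provider-specific call, omitted
print(prompt)
```

The value over the embedding approach is that the model can justify 1-to-N and conditional correspondences, not just produce a similarity score.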
Hybrid / ensemble systems
combined
Most competitive systems (LogMap, AML, OntoAligner) combine multiple matchers: lexical first (fast filtering), then structural, then ML-based refinement. Outputs are aggregated with weighted voting or learned fusion.
Ex: LogMap uses lexical + structural + logic repair; OntoAligner chains retrieval + LLM
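The fusion step can be sketched as weighted voting over per-matcher scores. The weights, scores, and threshold below are illustrative, not taken from LogMap, AML, or OntoAligner.

```python
# Toy ensemble aggregation: fuse matcher confidences with fixed
# weights and keep correspondences above a threshold.

WEIGHTS = {"lexical": 0.3, "structural": 0.3, "embedding": 0.4}

def aggregate(candidates, threshold=0.6):
    """candidates: {(src, tgt): {matcher_name: score}}"""
    accepted = {}
    for pair, scores in candidates.items():
        fused = sum(WEIGHTS[m] * s for m, s in scores.items())
        if fused >= threshold:
            accepted[pair] = round(fused, 2)
    return accepted

candidates = {
    ("Temperature", "Temperatura"): {"lexical": 0.9, "structural": 0.8, "embedding": 0.85},
    ("Sensor", "Invoice"):          {"lexical": 0.1, "structural": 0.2, "embedding": 0.15},
}
print(aggregate(candidates))  # keeps only the Temperature pair
```

Learned fusion replaces the fixed weights with a classifier trained on reference alignments; the pipeline shape stays the same.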