UC2, Points of Interest, Casas de Fado of Porto, team LESL, MIMathon Porto 2026

Step 1, dataset audit

What the data actually contains.

Two CitySDK CSV exports from Porto Digital: 4 Casas de Fado in the historic centre, and 54 Postos de Abastecimento (petrol stations) across the city. Both use the same schema but cover very different categories, which makes the second set a good test of whether the canonical model generalises.

POIs total

Python dict literals as CSV cell values.

Multilingual fields (category, label, description, others) are exported as single-quoted Python list-of-dict strings, not JSON. Parsed via ast.literal_eval in the adapter.
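A minimal sketch of why the adapter needs ast.literal_eval here rather than a JSON parser. The cell value is the category cell of the O Fado row (shortened to two languages); the helper name parse_cell is hypothetical and only mirrors what the adapter does.

```python
import ast
import json

# Category cell as exported: a single-quoted Python list-of-dict literal.
cell = "[{'lang': 'pt-PT', 'value': 'Casas de Fado'}, {'lang': 'en-GB', 'value': 'Fado houses'}]"


def parse_cell(raw):
    """Parse a dict-literal cell; return [] for empty or malformed input."""
    if not raw:
        return []
    try:
        # literal_eval only accepts Python literals, so it never executes code.
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return []


try:
    json.loads(cell)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False  # single quotes are not valid JSON string delimiters

labels = {entry["lang"]: entry["value"] for entry in parse_cell(cell)}
```

Returning an empty list on malformed input keeps one bad cell from aborting the whole run, at the cost of silently dropping that field.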

Format quirk

vCard 2.1 inside the address column.

The address column is a flattened vCard with newlines collapsed to spaces. Street, locality, postal code, country are positional in ADR;WORK. Phone, website, email come from TEL, URL, EMAIL. The adapter rebuilds field boundaries by recognising the vCard keyword set.
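The boundary recovery can be sketched as below. The vCard string is reassembled from the O Fado values shown on this page, so its exact field order and the TEL/URL/EMAIL type parameters are assumptions; split_vcard is a hypothetical name.

```python
import re

# Flattened vCard: newlines already collapsed to single spaces.
vcard = (
    "BEGIN:VCARD VERSION:2.1 N:O Fado;O Fado;;; FN:O Fado O Fado ORG:O Fado "
    "ADR;WORK:;16-16A;Largo de S. João Novo;Porto;Porto;4050-554;Portugal "
    "TEL;WORK:+351 222026937 URL;WORK:www.ofado.com "
    "EMAIL;INTERNET:info@ofado.com END:VCARD"
)

KEYS = "BEGIN|VERSION|REV|N|FN|ORG|ADR|TEL|URL|EMAIL|END"
# A field starts at a known keyword (optionally with ;PARAMS) followed by ':'
# and runs until the next such keyword or the end of the string.
FIELD = re.compile(
    rf"(?:^|\s)({KEYS})(?:;[^:]+)?:(.*?)(?=\s(?:{KEYS})(?:;[^:]+)?:|$)",
    re.DOTALL,
)


def split_vcard(text):
    """Return {keyword: value}, keeping the first occurrence of each keyword."""
    fields = {}
    for match in FIELD.finditer(text):
        fields.setdefault(match.group(1), match.group(2).strip())
    return fields


fields = split_vcard(vcard)
# ADR is positional: PO box; extended; street; locality; region; postal code; country.
po_box, number, street, locality, region, postal, country = fields["ADR"].split(";")
```

Because field starts are recognised by keyword, a value containing a space followed by a keyword and a colon would be split early; that is a limit of this approach, not something the sample data exercises.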

Observation

Multilingual literals look natural in JSON-LD.

Each name and description maps cleanly to a JSON-LD language-tagged value ({"@language": "pt", "@value": "..."}). The canonical model holds them as a list of LocalizedText, which serializes to RDF correctly out of the box.
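A small sketch of that mapping, assuming regional tags are reduced to their primary subtag and the first value per language wins. The third label entry is a hypothetical duplicate added to show the dedupe; the function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalizedText:
    lang: str
    value: str


def primary_subtag(tag):
    """Reduce a regional tag such as 'pt-PT' to its primary subtag 'pt'."""
    return tag.lower().split("-")[0]


def to_language_tagged(entries):
    """Keep the first value per language and emit JSON-LD value objects."""
    seen = set()
    out = []
    for entry in entries:
        text = LocalizedText(primary_subtag(entry["lang"]), entry["value"])
        if text.lang in seen:
            continue  # e.g. pt-PT and pt-BR both collapse to 'pt'
        seen.add(text.lang)
        out.append({"@language": text.lang, "@value": text.value})
    return out


label = [
    {"lang": "pt-PT", "term": "primary", "value": "O Fado"},
    {"lang": "en-GB", "term": "primary", "value": "O Fado"},
    {"lang": "pt-BR", "term": "primary", "value": "O Fado (hypothetical duplicate)"},
]
names = to_language_tagged(label)
```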

Observation

Category coupling is straightforward.

The Casas de Fado set has 1 category, mapped to schema:Restaurant + schema:MusicVenue + Wikidata Q3338148. The petrol stations set adds one more, schema:GasStation + Wikidata Q205495. Each new category is one entry in category_map.json.
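A sketch of the lookup, with the Casas de Fado entry inlined instead of read from category_map.json. The resolve function is a hypothetical stand-in for the adapter's category resolution and returns a plain dict for brevity.

```python
CATEGORY_MAP = {
    "Casas de Fado": {
        "schemaOrgRefs": [
            "https://schema.org/Restaurant",
            "https://schema.org/MusicVenue",
        ],
        "wikidataRef": "https://www.wikidata.org/entity/Q3338148",
    },
}

FALLBACK_REFS = ["https://schema.org/Place"]


def resolve(label):
    """Map a source label to its IRIs, falling back to schema:Place."""
    mapped = CATEGORY_MAP.get(label, {})
    return {
        "sourceLabel": label,
        "schemaOrgRefs": mapped.get("schemaOrgRefs", FALLBACK_REFS),
        "wikidataRef": mapped.get("wikidataRef"),
    }


known = resolve("Casas de Fado")
unknown = resolve("Unmapped label")  # hypothetical label with no map entry
```

An unmapped label does not fail; it degrades to schema:Place with no Wikidata anchor, so counting fallback hits after a run is a cheap way to spot missing entries.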

Second dataset onboarded, zero adapter change

The petrol stations CSV uses the same CitySDK schema as the Casas de Fado set. We added three lines to category_map.json (one per language label for "Postos de Abastecimento" / "Petrol station" / "Gasolineras") and re-ran the harmonizer. 54 of 54 records harmonized, no new code, no re-test, no schema migration. This is the canonical-model-plus-adapter approach paying off.

Source attribute	Source example	Canonical attribute	Transformation
id	5cd04b43f979e000013bee37	PointOfInterest.localId	direct
label[term=primary]	{lang: pt-PT, value: "O Fado"}	PointOfInterest.names	ast.literal_eval, filter term=primary, dedupe by lang
description	{lang: pt-PT, value: "Situado num edifício..."}	PointOfInterest.descriptions	ast.literal_eval, dedupe by lang, first per lang
category	{lang: pt-PT, value: "Casas de Fado"}	PointOfInterest.category	ast.literal_eval, lookup pt-PT label in category_map.json
latitude, longitude	41.142313, -8.617495	Location.latitude/longitude	parse float, WGS84
address (vCard ADR;WORK)	;16-16A;Largo de S. João Novo;Porto;Porto;4050-554;Portugal	PostalAddress.*	regex split vCard fields, positional ADR parts
address (vCard TEL/URL/EMAIL)	+351 222026937 / www.ofado.com / info@ofado.com	ContactPoint.*	regex extract, take first email before /
others[type=x-citysdk/capacity]	70	PointOfInterest.capacity	parse int
others[type=x-citysdk/cost-rating]	3	PointOfInterest.costRating	parse int
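The last two rows of the mapping can be sketched as follows. The real others cell holds 17 typed pairs; the two-entry cell below is an assumed excerpt built from the capacity and cost-rating values in the table, and whether the source stores them as strings or integers is also an assumption (the code accepts both).

```python
import ast

# Assumed excerpt of the 'others' cell: a list of {type, value} pairs.
others_cell = (
    "[{'type': 'x-citysdk/capacity', 'value': '70'}, "
    "{'type': 'x-citysdk/cost-rating', 'value': 3}]"
)


def typed_values(raw):
    """Group values by their namespaced type key."""
    out = {}
    for item in ast.literal_eval(raw):
        if isinstance(item, dict) and item.get("type") and item.get("value") is not None:
            out.setdefault(item["type"], []).append(str(item["value"]))
    return out


def first_int(values, key):
    """Return the first value for key as an int, or None if absent or non-numeric."""
    try:
        return int(values.get(key, [None])[0])
    except (TypeError, ValueError):
        return None


values = typed_values(others_cell)
capacity = first_int(values, "x-citysdk/capacity")
cost_rating = first_int(values, "x-citysdk/cost-rating")
missing = first_int(values, "x-citysdk/not-present")  # hypothetical key
```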

Step 6, the data

Before and after.

Same record (O Fado, 5cd04b43...ee37), shown first as the raw row from the Porto CitySDK CSV, then harmonized to the canonical model.

Same JSON-LD vs GeoJSON split as UC1: JSON-LD for the data layer (semantic, dereferenceable category IRIs); GeoJSON for map tools with dotted-key flattened properties.
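A reduced sketch of the dotted-key flattening on the GeoJSON side, using values from the O Fado record. The flatten helper here is a simplified illustration that handles nested dicts only, not the multilingual lists.

```python
def flatten(prefix, value, target):
    """Copy nested dict values into target under dotted keys, skipping empties."""
    if value is None or value == []:
        return
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(f"{prefix}.{key}" if prefix else key, child, target)
    else:
        target[prefix] = value


props = {}
flatten("address", {
    "streetName": "Largo de S. João Novo",
    "streetNumber": None,  # dropped: empty values never reach the properties
    "postalCode": "4050-554",
}, props)
flatten("category", {"sourceLabel": "Casas de Fado"}, props)

feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        # GeoJSON positions are [longitude, latitude], the reverse of the CSV order.
        "coordinates": [-8.61749513128281, 41.14231309679815],
    },
    "properties": props,
}
```

Flat keys such as address.postalCode show up as ordinary attribute columns in GIS tools, which generally handle nested property objects poorly.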

Source · CitySDK CSV row · Porto Digital

{
  "active": "True",
  "base": "https://city-api.wearebitmaker.com/CitySDK/pois",
  "category": "[{'lang': 'pt-PT', 'value': 'Casas de Fado'}, {'lang': 'en-GB', 'value': 'Fado houses'}, {'lang': 'es-ES', 'value': 'Casas de Fado'}]",
  "created": "2019-05-06T14:57:07.255000Z",
  "id": "5cd04b43f979e000013bee37",
  "label": "[{'lang': 'pt-PT', 'term': 'primary', 'value': 'O Fado'}, {'lang': 'en-GB', 'term': 'primary', 'value': 'O Fado'}, {'lang': 'es-ES', 'term': 'primary', 'value': 'O Fado'}]",
  "lang": "pt-PT",
  "address": "BEGIN:VCARD VERSION:2.1 REV:20190226T17:27:45Z N:O Fado;O Fado;;; FN:O Fado O Fado ORG:O Fado ADR;WORK:;16-16A;Largo de …",
  "latitude": "41.14231309679815",
  "longitude": "-8.61749513128281",
  "time": "[{'term': 'open', 'type': 'text/icalendar', 'value': 'BEGIN:VCALENDAR VERSION:2.0 BEGIN:VEVENT SUMMARY: DTSTART:20190101T203000Z DTEND:50190115T010000Z DESCRIPTION: LOCATION: RRULE:FREQ=WEEKLY;INTERVAL=1;BYDAY=MO,TU,WE,TH,FR,SA END:VEVENT END:VCALENDAR'}]",
  "updated": "2021-07-14T15:43:36.376000Z",
  "description": "[ … 3 multilingual entries truncated … ]",
  "others": "[ … 17 typed key-value pairs truncated … ]"
}

Canonical · JSON-LD node · http://mimathon.askem.eu/uc2/pois/

{
  "@id": "http://mimathon.askem.eu/uc2/pois/5cd04b43f979e000013bee37",
  "@type": "PointOfInterest",
  "localId": "5cd04b43f979e000013bee37",
  "names": [
    {
      "@language": "pt",
      "@value": "O Fado"
    },
    {
      "@language": "en",
      "@value": "O Fado"
    },
    {
      "@language": "es",
      "@value": "O Fado"
    }
  ],
  "descriptions": [
    {
      "@language": "pt",
      "@value": "Situado num edifício centenário o Restaurante Típico o Fado preserva todo o tipicismo inerente a este tipo de casas, ond…"
    },
    {
      "@language": "en",
      "@value": "Typical restaurant serving regional cuisine, and where you can appreciate the traditional \"fado\".…"
    },
    {
      "@language": "es",
      "@value": "Restaurante típico de cocina regional, donde se puede desfrutar del tradicional fado.…"
    }
  ],
  "category": {
    "@type": "Category",
    "sourceLabel": "Casas de Fado",
    "schemaOrgRefs": [
      "https://schema.org/Restaurant",
      "https://schema.org/MusicVenue"
    ],
    "wikidataRef": "https://www.wikidata.org/entity/Q3338148"
  },
  "location": {
    "@type": "Location",
    "latitude": 41.14231309679815,
    "longitude": -8.61749513128281
  },
  "capacity": 70,
  "costRating": 3,
  "address": {
    "@type": "PostalAddress",
    "streetName": "Largo de S. João Novo",
    "streetNumber": "16-16A",
    "locality": "Porto",
    "postalCode": "4050-554",
    "country": "Portugal"
  },
  "contact": {
    "@type": "ContactPoint",
    "telephone": "+351 222026937",
    "email": "info@ofado.com",
    "website": "www.ofado.com"
  }
}

JSON-LD as a graph

One POI as an RDF graph. Multilingual literals are RDF language-tagged values; the Category node carries both the source label and the canonical schema.org and Wikidata anchors as dereferenceable IRIs.

Figure: JSON-LD instance graph for the POI O Fado, with nodes PointOfInterest, names and descriptions, Category, Location, PostalAddress and ContactPoint, and external links to schema.org and Wikidata.

What changed

Multilingual fields lifted from Python dict-literals to RDF language-tagged values ({"@language": "pt", "@value": "..."})
Category bound to schema:Restaurant + schema:MusicVenue + Wikidata Q3338148, source label preserved
vCard ADR parsed into typed PostalAddress with streetName, locality, postalCode, country
vCard TEL/URL/EMAIL split into a typed ContactPoint with telephone, website, email
x-citysdk/capacity and x-citysdk/cost-rating promoted to first-class POI attributes
Stable @id assigned, every fragment carries an @type
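The last point in the list above is easy to check mechanically. The sketch below walks a harmonized node and reports nested objects that lack an @type; the function name is hypothetical, and the sample node is a cut-down O Fado record with one fragment deliberately left untyped to show a finding.

```python
def untyped_fragments(node, path="$"):
    """Return the paths of nested objects that carry no @type."""
    missing = []
    if isinstance(node, dict):
        if "@value" in node:
            return missing  # language-tagged literal, not a typed node
        if "@type" not in node:
            missing.append(path)
        for key, child in node.items():
            missing += untyped_fragments(child, f"{path}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            missing += untyped_fragments(child, f"{path}[{index}]")
    return missing


node = {
    "@id": "http://mimathon.askem.eu/uc2/pois/5cd04b43f979e000013bee37",
    "@type": "PointOfInterest",
    "names": [{"@language": "pt", "@value": "O Fado"}],
    "location": {
        "@type": "Location",
        "latitude": 41.14231309679815,
        "longitude": -8.61749513128281,
    },
    "contact": {"telephone": "+351 222026937"},  # untyped on purpose
}

problems = untyped_fragments(node)
```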

Step 7, source code

Read it, run it, fork it.

Full source of the POI harmonizer, hosted alongside this page. Each file is a one-click download as raw .py or .json. The whole package is bundled as a tarball at the top of the Data section.

model.py · Canonical PointOfInterest model, mirrors pois.dolfin

"""Canonical POI model, mirrors pois.dolfin v0.1.0."""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


LANGUAGES = {"pt", "en", "es", "fr", "de", "it", "other"}


def normalize_lang(raw: str | None) -> str:
    if not raw:
        return "other"
    raw = raw.lower().split("-")[0]
    return raw if raw in LANGUAGES else "other"


@dataclass(frozen=True)
class LocalizedText:
    lang: str
    value: str


@dataclass(frozen=True)
class Category:
    sourceLabel: str
    schemaOrgRefs: tuple[str, ...]
    wikidataRef: Optional[str] = None


@dataclass(frozen=True)
class PostalAddress:
    streetName: Optional[str] = None
    streetNumber: Optional[str] = None
    locality: Optional[str] = None
    postalCode: Optional[str] = None
    country: Optional[str] = None


@dataclass(frozen=True)
class ContactPoint:
    telephone: Optional[str] = None
    email: Optional[str] = None
    website: Optional[str] = None


@dataclass(frozen=True)
class Location:
    latitude: float
    longitude: float


@dataclass
class PointOfInterest:
    localId: str
    names: list[LocalizedText]
    category: Category
    location: Location
    descriptions: list[LocalizedText] = field(default_factory=list)
    address: Optional[PostalAddress] = None
    contact: Optional[ContactPoint] = None
    capacity: Optional[int] = None
    costRating: Optional[int] = None

transforms.py · Reusable text helpers shared across adapters

"""Reusable text transforms shared across adapters.

Adapters compose these helpers rather than reimplementing them. Helpers
are intentionally minimal: they only do generic text work (cleanup,
regex extraction, keyword routing). Anything dataset-specific belongs
in the adapter itself.
"""
from __future__ import annotations
import re
from typing import Optional


def clean_text(value: Optional[str]) -> Optional[str]:
    """Trim, collapse internal whitespace, return None for empty input."""
    if value is None:
        return None
    txt = re.sub(r"\s+", " ", str(value)).strip()
    return txt or None


def extract_count(value: Optional[str], pattern: str = r"\((\d+)") -> Optional[int]:
    """Pull an integer out of free text, e.g. '... (12 exemplares)' -> 12."""
    if value is None:
        return None
    m = re.search(pattern, value)
    return int(m.group(1)) if m else None


def match_keywords(value: Optional[str], keyword_map: dict[str, str]) -> Optional[str]:
    """Return the first enum value whose regex key matches the input.

    keyword_map: {regex_pattern: enum_value}, e.g.
        {r"conjunto\\s+arb[óo]re[op]": "TreeCluster",
         r"isolad": "IsolatedSpecimen"}
    Patterns are evaluated in insertion order, case-insensitive.
    """
    if not value:
        return None
    for pattern, enum_value in keyword_map.items():
        if re.search(pattern, value, re.IGNORECASE):
            return enum_value
    return None


class Registry:
    """Tiny dedupe registry for value-typed entities like Authority.

    Use when source data has many spelling variants of the same entity:
        reg = Registry({"ICNF": Authority(name="...", acronym="ICNF")})
        a = reg.resolve("ICNF (Instituto da Conservação ...)", needle="ICNF")
    The canonical instance is returned, ensuring downstream graphs share
    one node per real-world entity.
    """

    def __init__(self, known: dict | None = None):
        self._known = dict(known or {})

    def resolve(self, raw, needle: str | None = None, default=None):
        if raw is None:
            return default
        text = str(raw)
        if needle is not None and needle in text and needle in self._known:
            return self._known[needle]
        for key, val in self._known.items():
            if key in text:
                return val
        return default

    def get(self, key: str):
        return self._known.get(key)

jsonld.py · JSON-LD writer, schema.org context for shared terms

"""JSON-LD writer for the canonical POI model.

Aligns with schema.org for shared terms (name, description, address,
category) so the output is directly consumable by schema.org-aware
tools, while keeping a local namespace for the wrapping shape.
"""
from __future__ import annotations
from dataclasses import asdict
from typing import Iterable

from .model import PointOfInterest


NS = "http://mimathon.askem.eu/uc2/pois#"

CONTEXT = {
    "@vocab": NS,
    "schema": "https://schema.org/",
    "wd": "https://www.wikidata.org/entity/",
    "PointOfInterest": NS + "PointOfInterest",
    "Category": NS + "Category",
    "Location": NS + "Location",
    "PostalAddress": "schema:PostalAddress",
    "ContactPoint": "schema:ContactPoint",
    "LocalizedText": NS + "LocalizedText",
    "names": "schema:name",
    "descriptions": "schema:description",
    "category": "schema:category",
    "location": "schema:location",
    "address": "schema:address",
    "contact": "schema:contactPoint",
    "telephone": "schema:telephone",
    "email": "schema:email",
    "website": "schema:url",
    "streetName": "schema:streetAddress",
    "streetNumber": "schema:streetAddress",
    "locality": "schema:addressLocality",
    "postalCode": "schema:postalCode",
    "country": "schema:addressCountry",
    "schemaOrgRefs": {"@id": NS + "schemaOrgRefs", "@type": "@id"},
    "wikidataRef": {"@id": NS + "wikidataRef", "@type": "@id"},
    # The W3C Basic Geo (WGS84) namespace uses the http scheme.
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "latitude": "geo:lat",
    "longitude": "geo:long",
    "lang": "@language",
    "value": "@value",
    "capacity": "schema:maximumAttendeeCapacity",
}


def _strip_none(d):
    if isinstance(d, dict):
        return {k: _strip_none(v) for k, v in d.items() if v is not None and v != []}
    if isinstance(d, list):
        return [_strip_none(x) for x in d]
    return d


def _localized(items) -> list[dict]:
    return [{"@language": t.lang, "@value": t.value} for t in items]


def poi_to_node(poi: PointOfInterest, base_id: str) -> dict:
    node = {
        "@id": f"{base_id}{poi.localId}",
        "@type": "PointOfInterest",
        "localId": poi.localId,
        "names": _localized(poi.names),
        "descriptions": _localized(poi.descriptions),
        "category": {
            "@type": "Category",
            "sourceLabel": poi.category.sourceLabel,
            "schemaOrgRefs": list(poi.category.schemaOrgRefs),
            "wikidataRef": poi.category.wikidataRef,
        },
        "location": {
            "@type": "Location",
            "latitude": poi.location.latitude,
            "longitude": poi.location.longitude,
        },
        "capacity": poi.capacity,
        "costRating": poi.costRating,
    }
    if poi.address is not None:
        node["address"] = {"@type": "PostalAddress", **asdict(poi.address)}
    if poi.contact is not None:
        node["contact"] = {"@type": "ContactPoint", **asdict(poi.contact)}
    return _strip_none(node)


def build_document(pois: Iterable[PointOfInterest], base_id: str) -> dict:
    return {
        "@context": CONTEXT,
        "@graph": [poi_to_node(p, base_id) for p in pois],
    }

geojson_out.py · GeoJSON FeatureCollection writer for GIS tools

"""GeoJSON writer for the canonical POI model.

FeatureCollection in WGS84. Properties are flattened with dotted keys
so non-LD tools (geojson.io, QGIS, Leaflet) can use the data directly.
"""
from __future__ import annotations
from dataclasses import asdict
from typing import Iterable

from .model import PointOfInterest


def _flatten(prefix: str, value, target: dict) -> None:
    if value is None or value == []:
        return
    if isinstance(value, dict):
        for k, v in value.items():
            _flatten(f"{prefix}.{k}" if prefix else k, v, target)
    elif isinstance(value, (list, tuple)):
        if value and isinstance(value[0], dict) and {"lang", "value"} <= set(value[0]):
            for item in value:
                target[f"{prefix}_{item['lang']}"] = item["value"]
        else:
            target[prefix] = list(value)
    else:
        target[prefix] = value


def poi_to_feature(poi: PointOfInterest, base_id: str) -> dict:
    props: dict = {"@id": f"{base_id}{poi.localId}", "@type": "PointOfInterest"}
    props["localId"] = poi.localId
    _flatten("name", [{"lang": t.lang, "value": t.value} for t in poi.names], props)
    _flatten("description", [{"lang": t.lang, "value": t.value} for t in poi.descriptions], props)
    _flatten("category", {
        "sourceLabel": poi.category.sourceLabel,
        "schemaOrgRefs": list(poi.category.schemaOrgRefs),
        "wikidataRef": poi.category.wikidataRef,
    }, props)
    if poi.address is not None:
        _flatten("address", asdict(poi.address), props)
    if poi.contact is not None:
        _flatten("contact", asdict(poi.contact), props)
    if poi.capacity is not None:
        props["capacity"] = poi.capacity
    if poi.costRating is not None:
        props["costRating"] = poi.costRating

    return {
        "type": "Feature",
        "id": poi.localId,
        "geometry": {
            "type": "Point",
            "coordinates": [poi.location.longitude, poi.location.latitude],
        },
        "properties": props,
    }


def build_collection(pois: Iterable[PointOfInterest], base_id: str) -> dict:
    return {
        "type": "FeatureCollection",
        "features": [poi_to_feature(p, base_id) for p in pois],
    }

__main__.py · CLI orchestrating adapter, JSON-LD and GeoJSON output

"""CLI entry point for the POI harmonizer.

Example:
    python -m harmonize_pois \
        --adapter porto_pois \
        --input ../uc2-pois-casas-de-fado.csv \
        --output ../out/pois.jsonld \
        --geojson ../out/pois.geojson \
        --base-id http://mimathon.askem.eu/uc2/pois/
"""
from __future__ import annotations
import argparse
import importlib
import json
import sys
from pathlib import Path

from .geojson_out import build_collection
from .jsonld import build_document


def _load_adapter(name: str):
    mod = importlib.import_module(f"harmonize_pois.adapters.{name}")
    if not hasattr(mod, "read"):
        raise SystemExit(f"adapter {name!r} has no read(path) function")
    return mod


def main(argv=None) -> int:
    p = argparse.ArgumentParser(prog="harmonize_pois", description="Harmonize a POI dataset to the canonical PointOfInterest model and emit JSON-LD plus optional GeoJSON.")
    p.add_argument("--adapter", required=True, help="Adapter module name under harmonize_pois.adapters, e.g. porto_pois")
    p.add_argument("--input", required=True, type=Path, help="Source dataset path")
    p.add_argument("--output", required=True, type=Path, help="Destination JSON-LD file")
    p.add_argument("--base-id", default="http://example.org/pois/", help="IRI prefix for POI @id values")
    p.add_argument("--geojson", type=Path, help="Also emit a GeoJSON FeatureCollection")
    args = p.parse_args(argv)

    adapter = _load_adapter(args.adapter)
    print(f"Reading via adapter '{args.adapter}' from {args.input}...")
    pois = list(adapter.read(args.input))
    print(f"  {len(pois)} POIs read")

    print(f"Writing JSON-LD to {args.output}...")
    doc = build_document(pois, base_id=args.base_id)
    args.output.parent.mkdir(parents=True, exist_ok=True)
    args.output.write_text(json.dumps(doc, ensure_ascii=False, indent=2), encoding="utf-8")
    print(f"  done, {len(doc['@graph'])} entities in @graph")

    if args.geojson:
        print(f"Writing GeoJSON to {args.geojson}...")
        fc = build_collection(pois, base_id=args.base_id)
        args.geojson.parent.mkdir(parents=True, exist_ok=True)
        args.geojson.write_text(json.dumps(fc, ensure_ascii=False, indent=2), encoding="utf-8")
        print(f"  done, {len(fc['features'])} features")

    return 0


if __name__ == "__main__":
    sys.exit(main())

category_map.json · Source-label to schema.org and Wikidata IRI map

{
  "Casas de Fado": {
    "schemaOrgRefs": [
      "https://schema.org/Restaurant",
      "https://schema.org/MusicVenue"
    ],
    "wikidataRef": "https://www.wikidata.org/entity/Q3338148"
  },
  "Fado houses": {
    "schemaOrgRefs": [
      "https://schema.org/Restaurant",
      "https://schema.org/MusicVenue"
    ],
    "wikidataRef": "https://www.wikidata.org/entity/Q3338148"
  },

  "Museum": {
    "schemaOrgRefs": ["https://schema.org/Museum"],
    "wikidataRef": "https://www.wikidata.org/entity/Q33506"
  },
  "Restaurant": {
    "schemaOrgRefs": ["https://schema.org/Restaurant"],
    "wikidataRef": "https://www.wikidata.org/entity/Q11707"
  },
  "Park": {
    "schemaOrgRefs": ["https://schema.org/Park"],
    "wikidataRef": "https://www.wikidata.org/entity/Q22698"
  },
  "Hotel": {
    "schemaOrgRefs": ["https://schema.org/Hotel"],
    "wikidataRef": "https://www.wikidata.org/entity/Q27686"
  },
  "Beach": {
    "schemaOrgRefs": ["https://schema.org/Beach"],
    "wikidataRef": "https://www.wikidata.org/entity/Q40080"
  }
}

adapters/_template.py · Skeleton, copy and rename to add a new dataset

"""Skeleton POI adapter, copy and rename to add a new dataset.

Quick start:
    1. Copy to harmonize_pois/adapters/<your_dataset>.py
    2. Replace the read() body with your own parsing
    3. Run:  python -m harmonize_pois --adapter <your_dataset> --input ...

Contract:
    Expose a single function `read(path) -> Iterator[PointOfInterest]`.

Look at adapters/porto_pois.py for a worked example covering CitySDK
multilingual fields, vCard address parsing, and category lookup.
"""
from __future__ import annotations
from pathlib import Path
from typing import Iterator

from ..model import (
    Category, ContactPoint, Location, LocalizedText, PointOfInterest,
    PostalAddress, normalize_lang,
)
from ..transforms import clean_text


def read(path: str | Path) -> Iterator[PointOfInterest]:
    """Yield canonical PointOfInterest records from `path`.

    Replace the body below with your dataset's parsing logic.
    """
    # Example: a CSV with one POI per row
    # import csv
    # with Path(path).open(encoding="utf-8") as f:
    #     for row in csv.DictReader(f):
    #         yield PointOfInterest(
    #             localId=row["id"],
    #             names=[LocalizedText(lang=normalize_lang("en"), value=row["name"])],
    #             category=Category(
    #                 sourceLabel=row["category"],
    #                 schemaOrgRefs=("https://schema.org/Place",),
    #             ),
    #             location=Location(
    #                 latitude=float(row["lat"]),
    #                 longitude=float(row["lon"]),
    #             ),
    #         )
    raise NotImplementedError("Implement read() for your dataset, see porto_pois.py")

adapters/porto_pois.py · Adapter for Porto CitySDK CSV (multilingual, vCard, dict-literals)

"""Adapter for the Porto Open Data Casas de Fado CSV (CitySDK schema).

Source quirks handled here:
    - The CSV uses Python dict-literal strings (single-quoted) inside
      multilingual fields, parsed via ast.literal_eval.
    - Address is a vCard 2.1 blob, ADR;WORK fields are extracted
      positionally per the spec (P.O. box; ext addr; street; locality;
      region; postal code; country).
    - The 'others' field is a list of {type, value} where type is a
      namespaced key like x-citysdk/capacity, x-citysdk/cost-rating, etc.

Maps to PointOfInterest via category_map.json lookup for schema.org IRIs.
"""
from __future__ import annotations
import ast
import csv
import json
import re
from pathlib import Path
from typing import Iterator

from ..model import (
    Category, ContactPoint, Location, LocalizedText, PointOfInterest,
    PostalAddress, normalize_lang,
)
from ..transforms import clean_text


def _load_category_map() -> dict:
    p = Path(__file__).resolve().parent.parent / "category_map.json"
    return json.loads(p.read_text(encoding="utf-8"))


_CATEGORY_MAP = _load_category_map()


def _safe_literal(raw: str | None):
    if not raw:
        return []
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return []


def _localized_list(raw: str | None, key_lang: str = "lang", key_val: str = "value", filter_term: str | None = None) -> list[LocalizedText]:
    out: list[LocalizedText] = []
    seen_langs: set[str] = set()
    for entry in _safe_literal(raw):
        if not isinstance(entry, dict):
            continue
        if filter_term and entry.get("term") != filter_term:
            continue
        lang = normalize_lang(entry.get(key_lang))
        val = clean_text(entry.get(key_val))
        if not val or lang in seen_langs:
            continue
        seen_langs.add(lang)
        out.append(LocalizedText(lang=lang, value=val))
    return out


def _resolve_category(raw: str | None) -> Category:
    items = _safe_literal(raw)
    pt_label = None
    for item in items:
        if isinstance(item, dict) and item.get("lang") == "pt-PT":
            pt_label = item.get("value")
            break
    label = pt_label or (items[0].get("value") if items and isinstance(items[0], dict) else "Unknown")
    label = clean_text(label) or "Unknown"
    mapped = _CATEGORY_MAP.get(label, {})
    return Category(
        sourceLabel=label,
        schemaOrgRefs=tuple(mapped.get("schemaOrgRefs", ["https://schema.org/Place"])),
        wikidataRef=mapped.get("wikidataRef"),
    )


_VCARD_KEYS = ("BEGIN", "VERSION", "REV", "N", "FN", "ORG", "ADR", "TEL", "URL", "EMAIL", "END")
_VCARD_FIELD = re.compile(r"(?:^|\s)(" + "|".join(_VCARD_KEYS) + r")(?:;[^:]+)?:(.*?)(?=\s(?:" + "|".join(_VCARD_KEYS) + r")(?:;[^:]+)?:|$)", re.DOTALL)


def _parse_vcard(vcard: str) -> dict[str, str]:
    out: dict[str, str] = {}
    for m in _VCARD_FIELD.finditer(vcard or ""):
        key = m.group(1).upper()
        val = m.group(2).strip()
        if key not in out:
            out[key] = val
    return out


def _parse_vcard_address(vcard: str) -> PostalAddress | None:
    fields = _parse_vcard(vcard)
    raw = fields.get("ADR")
    if not raw:
        return None
    parts = raw.split(";")
    while len(parts) < 7:
        parts.append("")
    # The Porto export carries the house number in the extended-address slot.
    _po, number, street, locality, _region, postal, country = parts[:7]
    return PostalAddress(
        streetName=clean_text(street),
        streetNumber=clean_text(number),
        locality=clean_text(locality),
        postalCode=clean_text(postal),
        country=clean_text(country),
    )


def _parse_vcard_contact(vcard: str) -> ContactPoint | None:
    fields = _parse_vcard(vcard)
    if not fields:
        return None
    email_raw = fields.get("EMAIL", "")
    email = email_raw.split("/")[0].split(",")[0].strip() if email_raw else None
    cp = ContactPoint(
        telephone=clean_text(fields.get("TEL")),
        website=clean_text(fields.get("URL")),
        email=clean_text(email),
    )
    return cp if any([cp.telephone, cp.website, cp.email]) else None


def _extract_others(raw: str | None) -> dict[str, list[str]]:
    out: dict[str, list[str]] = {}
    for item in _safe_literal(raw):
        if isinstance(item, dict):
            t = item.get("type")
            v = item.get("value")
            if t and v is not None:
                out.setdefault(t, []).append(str(v))
    return out


def _safe_int(s) -> int | None:
    if s is None:
        return None
    try:
        return int(s)
    except (ValueError, TypeError):
        return None


def read(csv_path: str | Path) -> Iterator[PointOfInterest]:
    """Yield canonical PointOfInterest records from a Porto CitySDK CSV."""
    # newline="" lets the csv module handle line endings itself, as its docs recommend.
    with Path(csv_path).open(encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f, quotechar="'")
        for row in reader:
            others = _extract_others(row.get("others"))
            try:
                lat = float(row["latitude"])
                lon = float(row["longitude"])
            except (KeyError, ValueError, TypeError):
                continue

            names = _localized_list(row.get("label"), key_lang="lang", key_val="value", filter_term="primary")
            if not names:
                continue

            poi = PointOfInterest(
                localId=row.get("id") or "",
                names=names,
                descriptions=_localized_list(row.get("description")),
                category=_resolve_category(row.get("category")),
                location=Location(latitude=lat, longitude=lon),
                address=_parse_vcard_address(row.get("address") or ""),
                contact=_parse_vcard_contact(row.get("address") or ""),
                capacity=_safe_int(others.get("x-citysdk/capacity", [None])[0]),
                costRating=_safe_int(others.get("x-citysdk/cost-rating", [None])[0]),
            )
            yield poi

Points of Interest,
one taxonomy for them all.

What the data actually contains.

Python dict literals as CSV cell values.

vCard 2.1 inside the address column.

Multilingual literals look natural in JSON-LD.

Category coupling is straightforward.

This time, Smart Data Models has it.

Two-level taxonomy

The Dolfin pivot.

Design choices

From CitySDK CSV to the pivot.

One core, many datasets.

CLI

Add a new dataset

Before and after.

Casas de Fado

Postos de Abastecimento

Code & model

JSON-LD as a graph

What changed

Read it, run it, fork it.

Remaining work.

Lower the bar for new adapters

Onboard a third dataset, with a different schema

Ship as NGSI-LD against SDM PointOfInterest

Resolve categories beyond the lookup table

Parse opening hours
