---
theme: slate
transition: fade
showIndex: true
align: left
---

# Classified Trees of Porto

### MIMathon Porto 2026 · Use case 01

team **LESL** · Dolfin & Askem

---

## The problem in one sentence

The same tree, described differently by every department.

- *Tree species* vs *tree name*
- Different attribute labels
- Different units, different granularities
- Some attributes only in one dataset

→ Open Data publication breaks.

---

## The dataset

**238 classified trees** of Porto. GeoJSON, WGS84.

Each tree has 8 attributes:

- scientific name (e.g. *Magnolia grandiflora*)
- common name in Portuguese (e.g. *Magnólia*)
- age range (e.g. *81-100*)
- legal authority (ICNF)
- classification type (cluster vs isolated specimen)
- legal reference (e.g. *D.R. nº 6 II Série de 10/01/2005*)
- classification date
- coordinates

---

## What we are looking for

One **canonical model** every department can map to.

One **mapping** from each local schema to the canonical.

One **transformation** that produces interoperable Open Data.

---

## Standards check: Smart Data Models

We searched the SDM registry. **No Tree entity exists.**

Closest is `dataModel.ParksAndGardens`:

- `Garden` — a garden as a space
- `FlowerBed` — a planted bed (with a `taxon` list)
- `GreenspaceRecord` — sensor observations

None describes an individual tree with species, age, classification.

→ Real gap. We define the model and propose it back to SDM.

---

## Standards check: Plant Ontology

OBO Foundry's Plant Ontology covers plant **anatomy**, **morphology**, and **growth stages**, and is built for **genomics annotation**.

Out of scope for a municipal heritage register.

The right external reference for species names is **GBIF**, not PO.

---

## Our approach

**Dolfin** as the canonical pivot.

- Backend-independent: the same file can target a graph DB, a relational DB, or a document store
- Human-readable: domain experts can read it
- Defines entities, properties, constraints, relations

One file: `trees.dolfin`

From there: JSON-LD, GeoJSON, future SDM JSON-LD, all derived.

---

## The canonical model

```
concept Tree:
  has localId: one string
  has species: one Species
  has ageRange: optional AgeRange
  has location: one Location
  has classification: optional Classification
  has refFlowerBed: optional string
  has refGarden: optional string
```

`Classification` is optional → the model is **reusable beyond the heritage register**.

`refFlowerBed` / `refGarden` → ties to existing SDM entities.
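
For readers who think in Python, the same shape could be mirrored with dataclasses. This is only a sketch: the snake_case field names, the contents of `Species` and `Location`, and the sample coordinates are assumptions for illustration, not the contents of the team's `model.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Species:
    scientific_name: str
    taxon_ref: Optional[str] = None  # GBIF URL; None when unresolved


@dataclass(frozen=True)
class Location:
    longitude: float  # WGS84
    latitude: float


@dataclass
class Tree:
    local_id: str
    species: Species
    location: Location
    age_range: Optional[str] = None
    classification: Optional[dict] = None  # absent for non-heritage trees
    ref_flower_bed: Optional[str] = None
    ref_garden: Optional[str] = None


# A tree with no heritage classification is still a valid Tree.
# The identifier and coordinates below are made up for the example.
street_tree = Tree("t-1", Species("Magnolia grandiflora"), Location(-8.61, 41.15))
```

Because `classification` defaults to `None`, an ordinary street tree fits the model without any heritage data.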

---

## The model as a graph

![graph](sl-uc1-graph.svg)

---

## Cross-reference: GBIF

Each `Species` carries a `taxonRef` URL to the **GBIF Backbone Taxonomy**.

Example:

`Magnolia grandiflora` → `https://www.gbif.org/species/9605163`

The harmonizer calls the GBIF `species/match` API, caches results.

**Result: 237 of 238 trees** carry a resolvable GBIF URL.

One unmatched: *Phoenix sp.*, a genus-only label with no species to resolve, so rejecting it is correct.
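
A minimal sketch of such a resolver, using only the standard library. The acceptance rule (species rank only), the in-memory cache, and the function names are assumptions for illustration, not the team's `gbif.py`. The `usageKey`, `rank`, and `matchType` fields come from the GBIF `species/match` response.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional

GBIF_MATCH = "https://api.gbif.org/v1/species/match"
_cache: Dict[str, Optional[str]] = {}  # name -> GBIF URL or None


def taxon_ref_from_match(match: dict) -> Optional[str]:
    """Turn a species/match response into a GBIF URL, or None if rejected."""
    if match.get("matchType") == "NONE":
        return None
    if match.get("rank") != "SPECIES":
        # Assumed rule: genus-only hits such as "Phoenix sp." are rejected.
        return None
    key = match.get("usageKey")
    return f"https://www.gbif.org/species/{key}" if key else None


def resolve(name: str) -> Optional[str]:
    """Look up a name once, then serve repeats from the cache."""
    if name not in _cache:
        url = GBIF_MATCH + "?" + urllib.parse.urlencode({"name": name})
        with urllib.request.urlopen(url, timeout=10) as resp:
            _cache[name] = taxon_ref_from_match(json.load(resp))
    return _cache[name]
```

Keeping the decision in `taxon_ref_from_match` separate from the HTTP call makes the rejection rule testable offline. Note that this simplified rule would also reject subspecies and varieties.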

---

## Before and after

**Source GeoJSON** (Porto Open Data):

```json
{
  "objectid": 1726,
  "especie": "Magnolia grandiflora",
  "classif_tutela": "ICNF (Instituto da Conservação ...)",
  "classif_tipo": "Conjunto arbóreo (12 exemplares)"
}
```

**Canonical JSON-LD**:

```json
{
  "category": { "kind": "TreeCluster", "specimenCount": 12,
    "authority": { "name": "...", "acronym": "ICNF" } },
  "species": { "scientificName": "Magnolia grandiflora",
    "taxonRef": "https://www.gbif.org/species/9605163" }
}
```

---

## What changed

- `Authority` is now a typed entity, not a free string
- `"Conjunto arbóreo (12 exemplares)"` split into `kind` + `specimenCount`
- `"81-100"` mapped to a controlled enum
- Species linked to GBIF through a resolvable IRI
- Coordinates in a typed `Location`
- Stable `@id`, every fragment carries an `@type`
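
The `kind` + `specimenCount` split could be done with a small parser like the one below. This is a sketch under assumptions: the function name and the exact pattern are hypothetical, and only the cluster label shown in the source record is handled.

```python
import re
from typing import Tuple

# Matches labels such as "Conjunto arbóreo (12 exemplares)".
CLUSTER_RE = re.compile(
    r"^\s*Conjunto arbóreo\s*\((\d+)\s+exemplares\)\s*$", re.IGNORECASE
)


def split_classif_tipo(raw: str) -> Tuple[str, int]:
    """Split a cluster label into (kind, specimenCount)."""
    m = CLUSTER_RE.match(raw)
    if m is None:
        # Fail loudly so unknown labels surface instead of being guessed.
        raise ValueError(f"unrecognised classif_tipo: {raw!r}")
    return "TreeCluster", int(m.group(1))
```

Other labels, such as isolated specimens, would need their own patterns. Raising on anything unrecognised keeps silent misclassification out of the canonical data.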

---

## Across the full dataset

| Layer | Before | After |
|---|---|---|
| Tree records | 238 | 238 |
| Distinct species | 11 strings | **11 Species nodes** + GBIF refs |
| Distinct authorities | 3 spellings of ICNF | **1 Authority node** |
| Distinct legal acts | 10 strings | **10 nodes** (4 are cosmetic variants of the same 2005 act; one more normalization pass would merge them) |

---

## One core, many datasets

```
harmonize/
├── model.py          ← canonical types, never changes
├── transforms.py     ← shared helpers (clean_text, ...)
├── gbif.py           ← GBIF resolver
├── jsonld.py         ← JSON-LD writer
├── geojson_out.py    ← GeoJSON writer
├── __main__.py       ← CLI
└── adapters/
    ├── _template.py  ← skeleton
    └── porto.py      ← Porto adapter
```

A new dataset = one new adapter file. Nothing else.
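
To give a feel for what an adapter does, here is a hedged sketch. The function name, the output keys, and the assumption of Point geometries are all hypothetical; only the source property names (`objectid`, `especie`, `classif_tipo`) come from the Porto record shown earlier.

```python
from typing import Any, Dict


def adapt_feature(feature: Dict[str, Any]) -> Dict[str, Any]:
    """Map one source GeoJSON feature to a canonical-shaped dict."""
    props = feature["properties"]
    # Assumes a Point geometry in [longitude, latitude] order (WGS84).
    lon, lat = feature["geometry"]["coordinates"][:2]
    return {
        "localId": str(props["objectid"]),
        "species": {"scientificName": props["especie"].strip()},
        "location": {"longitude": lon, "latitude": lat},
        # Left raw here; a shared helper would split it further.
        "rawClassification": props.get("classif_tipo"),
    }


# Coordinates below are made up for the example.
sample = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-8.61, 41.15]},
    "properties": {
        "objectid": 1726,
        "especie": "Magnolia grandiflora ",
        "classif_tipo": "Conjunto arbóreo (12 exemplares)",
    },
}
canonical = adapt_feature(sample)
```

Everything dataset-specific (property names, geometry layout) stays inside this one function, which is what lets the rest of the package remain shared.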

---

## CLI

```bash
python -m harmonize \
    --adapter porto \
    --input uc1-trees-porto.geojson \
    --output out/trees.jsonld \
    --geojson out/trees-canonical.geojson \
    --base-id "http://mimathon.askem.eu/uc1/trees/"
```

JSON-LD for data, GeoJSON for maps. Same canonical content, two views.
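
A sketch of how one canonical record could yield both views. The `@id` scheme (base id plus local id), the function names, and the chosen GeoJSON properties are assumptions for illustration, not the behavior of the team's writers.

```python
from typing import Any, Dict

BASE_ID = "http://mimathon.askem.eu/uc1/trees/"


def to_jsonld(tree: Dict[str, Any], base_id: str = BASE_ID) -> Dict[str, Any]:
    """Data view: stable @id and explicit @type."""
    return {
        "@id": base_id + tree["localId"],
        "@type": "Tree",
        "species": tree["species"],
    }


def to_geojson(tree: Dict[str, Any], base_id: str = BASE_ID) -> Dict[str, Any]:
    """Map view: a GeoJSON Feature with a Point geometry."""
    loc = tree["location"]
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [loc["longitude"], loc["latitude"]],
        },
        "properties": {
            "id": base_id + tree["localId"],
            "scientificName": tree["species"]["scientificName"],
        },
    }


# Made-up coordinates; the record shape is hypothetical.
tree = {
    "localId": "1726",
    "species": {"scientificName": "Magnolia grandiflora"},
    "location": {"longitude": -8.61, "latitude": 41.15},
}
```

Both writers read the same record and share the same identifier, so a feature on the map can be traced back to its JSON-LD node.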

---

## Find it all

Page: **askem.eu/mimathon/sl-uc1.html**

- canonical model (`trees.dolfin`)
- the data (source + canonical, JSON-LD + GeoJSON)
- the source code (model, GBIF resolver, writers, adapter)
- a single-tarball download

---

## What's next

1. Lower the bar for new adapters: YAML config, scaffolder, library of shared parsers
2. Draft a Smart Data Model "Tree" proposal for `dataModel.ParksAndGardens`
3. Onboard a second city's tree inventory to harden the model
4. Wire `refFlowerBed` / `refGarden` to live SDM instances

---

# Thank you

**team LESL** · Dolfin & Askem

askem.eu/mimathon
