# DataLab-OpenOP — Federated DataOps for ETSI OpenOP
**Status:** Concept / Design
**Date:** 2026-05-26
**Origin:** 6G-DALI project (SNS-JU)
**Scope:** A reusable OpenOP capability for federated DataOps across operator nodes
---
## 1. Vision
Each telecom operator running an ETSI OpenOP instance hosts a **local mini data catalogue and data lake**. Datasets generated by that operator's 5G/6G testbeds are registered locally. A central **"datalab-openop"** catalogue federates all operator catalogues, giving external systems a unified view of the data assets available across the network.
Each operator node is fully self-contained: users discover datasets and services, compose pipelines, and monitor execution entirely through **the node's own DataOps UI**. The node's local catalogue registers everything produced on that node.
The central **"datalab-openop"** piveau-hub is a pure federation layer and is machine-to-machine only. It harvests metadata from all operator nodes and exposes it to external systems (other data spaces, GAIA-X, SLICES-RI, auditors, consortium reporting). There is no central orchestrator, no user DataOps workflow touches the central catalogue, and raw data never leaves the operator's domain.
### Core Principles
> **Each node is fully autonomous. The central catalogue is federation infrastructure, not a user surface.**
- **Users** interact exclusively with their operator node's DataOps UI for discovery, pipeline composition, execution, and monitoring
- **Datasets** stay in the operator's local lake at all times
- **Dataset metadata** (DCAT-AP RDF) travels from local catalogues to the central catalogue via harvesting
- **Data service metadata** (`dcat:DataService`) is published by each node and harvested centrally; the central catalogue records what services exist but plays no role in running them
- **Derived dataset metadata** is registered in the local catalogue after pipeline execution, then harvested centrally
- The central catalogue is **machine-to-machine only**: GAIA-X federation, SLICES-RI cross-registration, external data space interoperability, consortium reporting
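
To make the metadata flow concrete, here is a minimal sketch of the kinds of records a node might register locally and expose for harvesting. It builds DCAT-style records as JSON-LD dictionaries using only the Python standard library. The `dcat:`, `dct:`, and `prov:` terms are standard vocabulary, but every URL, identifier, and helper name (`make_dataset`, `make_service`, `make_derived_dataset`) is hypothetical, and the exact DCAT-AP profile and serialisation used by piveau-hub may differ.

```python
import json

# Standard namespace IRIs for DCAT, Dublin Core terms, and PROV-O.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
}


def make_dataset(dataset_id: str, title: str, access_url: str) -> dict:
    """Build a dcat:Dataset record with one distribution (metadata only)."""
    return {
        "@context": CONTEXT,
        "@id": dataset_id,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:distribution": [
            {"@type": "dcat:Distribution", "dcat:accessURL": {"@id": access_url}}
        ],
    }


def make_service(service_id: str, title: str, endpoint_url: str) -> dict:
    """Build a dcat:DataService record describing a service offered by a node."""
    return {
        "@context": CONTEXT,
        "@id": service_id,
        "@type": "dcat:DataService",
        "dct:title": title,
        "dcat:endpointURL": {"@id": endpoint_url},
    }


def make_derived_dataset(dataset_id: str, title: str, access_url: str,
                         source_id: str) -> dict:
    """Build a dataset record that links back to the dataset it was derived from."""
    record = make_dataset(dataset_id, title, access_url)
    record["prov:wasDerivedFrom"] = {"@id": source_id}
    return record


# Hypothetical identifiers for illustration only.
raw = make_dataset(
    "https://node-a.example/datasets/ran-traces",
    "RAN traces from testbed A",
    "https://node-a.example/connector/datasets/ran-traces",
)
derived = make_derived_dataset(
    "https://node-a.example/datasets/ran-traces-cleaned",
    "Cleaned RAN traces",
    "https://node-a.example/connector/datasets/ran-traces-cleaned",
    source_id=raw["@id"],
)
print(json.dumps(derived, indent=2))
```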
---
## 3. Components per OpenOP Node
Each operator node is made up of four logical blocks:
### Local Data Space
The metadata catalogue for the node. Registers and exposes all datasets, derived datasets, and data services produced on this node. Feeds the central federation layer via DCAT-AP harvesting.
| Component | Role |
|---|---|
| **piveau-hub** (node instance) | DCAT-AP catalogue; stores dataset and service metadata; exposes REST + SPARQL; is harvested by the central catalogue |
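
As an illustration of the SPARQL interface, the sketch below builds a request URL that lists datasets and their titles from a node's catalogue. It encodes the query as a GET parameter in the style of the SPARQL 1.1 Protocol and does not send anything over the network. The endpoint URL is an assumption; the actual path exposed by a piveau-hub deployment should be taken from its configuration.

```python
from urllib.parse import urlencode

# Hypothetical SPARQL endpoint of a node's piveau-hub instance.
SPARQL_ENDPOINT = "https://node-a.example/sparql"

# List datasets and their titles using standard DCAT / Dublin Core terms.
LIST_DATASETS = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
}
LIMIT 20
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET request URL with a `query` parameter."""
    return f"{endpoint}?{urlencode({'query': query.strip()})}"


url = build_query_url(SPARQL_ENDPOINT, LIST_DATASETS)
print(url)
```

The resulting URL can be fetched with any HTTP client; requesting `application/sparql-results+json` via the `Accept` header is the usual way to get JSON results from a SPARQL endpoint.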
### Local Data Lake
The storage layer. All raw and derived data lives here and never leaves the node.
| Component | Role |
|---|---|
| **Object store** (MinIO / S3-compatible) | Stores raw testbed datasets and derived pipeline outputs |
### Local DataOps
The execution and user-facing layer. Users discover assets, compose pipelines, trigger execution, and monitor jobs entirely here.
| Component | Role |
|---|---|
| **DataOps Orchestrator** (FastAPI) | REST API; bridges the UI and Airflow; manages datasets, services, DAG creation, and pipeline triggers |
| **DataOps UI** (React) | User interface for dataset discovery, service browsing, pipeline composition, and job monitoring |
| **Apache Airflow** | Executes DataOps pipelines locally against the local data lake |
| **Local task library** | DataOps service implementations available as Airflow `PythonOperator` tasks; authored locally or adopted from the consortium service library |
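
The relationship between the task library and pipeline composition can be sketched as follows. This is a simplified stand-in, not the project's implementation: the `TASK_LIBRARY` registry, the `task` decorator, `run_pipeline`, and the two example services are all hypothetical. Tasks are written as plain Python callables so that, in an Airflow deployment, each could be passed as `python_callable` to a `PythonOperator`; that wiring is omitted here so the example runs without Airflow.

```python
from statistics import mean
from typing import Callable

Rows = list[dict]

# Hypothetical registry: service name -> plain Python callable.
TASK_LIBRARY: dict[str, Callable[[Rows], Rows]] = {}


def task(name: str):
    """Register a callable in the local task library under `name`."""
    def register(fn: Callable[[Rows], Rows]) -> Callable[[Rows], Rows]:
        TASK_LIBRARY[name] = fn
        return fn
    return register


@task("drop_missing")
def drop_missing(rows: Rows) -> Rows:
    """Remove rows that contain any missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


@task("add_mean_latency")
def add_mean_latency(rows: Rows) -> Rows:
    """Annotate every row with the mean latency over all rows."""
    avg = mean(r["latency_ms"] for r in rows)
    return [{**r, "mean_latency_ms": avg} for r in rows]


def run_pipeline(steps: list[str], rows: Rows) -> Rows:
    """Run the named tasks in order, feeding each output to the next task."""
    for step in steps:
        rows = TASK_LIBRARY[step](rows)
    return rows


# Illustrative input standing in for a dataset read from the local lake.
rows = [
    {"cell": "a", "latency_ms": 10.0},
    {"cell": "b", "latency_ms": None},
    {"cell": "c", "latency_ms": 20.0},
]
result = run_pipeline(["drop_missing", "add_mean_latency"], rows)
print(result)
```

Keeping tasks as plain callables means the same function can be unit-tested locally and shared through the consortium service library without depending on a specific scheduler.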
### Local Data Connector
The policy enforcement and data transfer layer. Sits in front of the local data lake and governs all data access — both inbound (other nodes requesting data from this node) and outbound (this node's pipelines accessing data from another node in a cross-operator scenario).
| Component | Role |
|---|---|
| **Eclipse Dataspace Connector (EDC)** | Implements the IDSA Dataspace Protocol (DSP); negotiates data contracts based on ODRL policies; proxies data transfers so raw lake credentials are never shared; produces a transfer audit trail |
#### Why the connector matters
Without a data connector, the data lake is just a raw S3 bucket, and the GAIA-X / ODRL policies in the metadata are descriptive only: nothing enforces them at transfer time. The EDC is the component that enforces those policies:
- `dcat:accessURL` in each dataset distribution points to the **connector endpoint**, not the raw lake URL
- Any cross-node data access goes through an EDC-to-EDC contract negotiation before a byte is transferred
- Usage policies (`gax:policy`, ODRL) attached to datasets in piveau-hub are evaluated by the connector at negotiation time
- Every transfer is logged and can be registered back in piveau-hub as a `prov:Activity` record for provenance and audit
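
The negotiation-time policy check and the resulting audit record can be illustrated with a minimal sketch. This is not the EDC API and not a complete ODRL evaluator: it is a plain-Python stand-in that borrows ODRL terms (`permission`, `action`, `constraint`, `leftOperand`, `operator`, `rightOperand`) and PROV-O properties, and supports only an equality constraint. The function names, the `claims` structure, and all identifiers are hypothetical.

```python
from datetime import datetime, timezone

# Simplified ODRL-like policy: "use" is permitted only for research purposes.
POLICY = {
    "permission": [
        {
            "action": "use",
            "constraint": {
                "leftOperand": "purpose",
                "operator": "eq",
                "rightOperand": "research",
            },
        }
    ]
}


def is_permitted(policy: dict, action: str, claims: dict) -> bool:
    """Return True if any permission allows `action` for the given claims."""
    for perm in policy.get("permission", []):
        if perm["action"] != action:
            continue
        constraint = perm.get("constraint")
        if constraint is None:
            return True  # unconstrained permission
        # Only the "eq" operator is handled in this sketch.
        if (constraint["operator"] == "eq"
                and claims.get(constraint["leftOperand"]) == constraint["rightOperand"]):
            return True
    return False


def transfer_activity(dataset_id: str, consumer_id: str, started_at: datetime) -> dict:
    """Build a prov:Activity record describing one completed transfer."""
    return {
        "@type": "prov:Activity",
        "prov:used": {"@id": dataset_id},
        "prov:wasAssociatedWith": {"@id": consumer_id},
        "prov:startedAtTime": started_at.isoformat(),
    }


# Hypothetical request from another operator node.
claims = {"purpose": "research"}
if is_permitted(POLICY, "use", claims):
    activity = transfer_activity(
        "https://node-a.example/datasets/ran-traces",
        "https://node-b.example/participants/operator-b",
        datetime.now(timezone.utc),
    )
    print(activity)
```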
---
## 4. Central "datalab-openop" Components
| Component | Role |
|---|---|
| **Central piveau-hub** | Harvests DCAT-AP metadata (datasets + services) from all operator nodes; exposes it to external systems and data spaces |
| **Node index** | Registry of all participating OpenOP nodes and their catalogue URLs; used by the harvester to know where to pull from |
The central layer has **no user-facing DataOps role** and **no orchestrator**. It exists for federation, interoperability, and reporting only.
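
How the harvester might use the node index can be sketched as below. This is an illustration under stated assumptions rather than piveau's harvesting mechanism: the node index layout, the `harvest` function, the `harvested_from` field, and the injected `fetch` callable are all hypothetical. Fetching is passed in as a function so the example runs without network access; a real deployment would pull DCAT-AP metadata from each node's catalogue URL.

```python
from typing import Callable

# Hypothetical node index: one entry per participating OpenOP node.
NODE_INDEX = [
    {"node": "operator-a", "catalogue_url": "https://node-a.example/catalogue"},
    {"node": "operator-b", "catalogue_url": "https://node-b.example/catalogue"},
]


def harvest(node_index: list[dict],
            fetch: Callable[[str], list[dict]]) -> dict[str, dict]:
    """Pull metadata records from every node and merge them by record @id."""
    central: dict[str, dict] = {}
    for entry in node_index:
        try:
            records = fetch(entry["catalogue_url"])
        except OSError:
            continue  # an unreachable node does not block the others
        for record in records:
            # Only metadata is copied; the data itself stays on the node.
            central[record["@id"]] = {**record, "harvested_from": entry["node"]}
    return central


def fake_fetch(url: str) -> list[dict]:
    """Stand-in for an HTTP call; simulates one node being offline."""
    if "node-b" in url:
        raise ConnectionError("node unreachable")
    return [{"@id": f"{url}/datasets/1", "@type": "dcat:Dataset"}]


central = harvest(NODE_INDEX, fake_fetch)
print(central)
```

Skipping unreachable nodes keeps one operator's outage from stalling federation for the rest, which fits the principle that each node is autonomous.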