Add Data to the Ocean Data Platform (ODP)
This guide explains how to add data to the Ocean Data Platform (ODP). It covers how to create a dataset, upload files and/or ingest tabular data, enrich the dataset with metadata, manage access, and (optionally) publish the dataset to the public catalogue.
What is a Dataset in ODP?
A dataset is the primary container in ODP. It can contain:
- Any number of files of any file type
- One (or no) table, derived from structured files or ingested directly via code
- Rich metadata describing provenance, licensing, spatial/temporal coverage, and attribution
- Access control and publication status
Creating a Dataset
Create a dataset via the web UI:
- Go to: https://app.hubocean.earth/my_data
- Click Add New Dataset
- Provide a title and description
- Save
You can refine all metadata and add data at any time after creation.
Collections
A collection is created in almost exactly the same way as a dataset. It acts as a folder that groups related datasets. You can either:
- Create a collection first, then create datasets within it
- Create a collection, copy its UUID, and then add the collection to an existing dataset from your My Data page
Adding Rich Metadata
Rich metadata improves discoverability, usability, and correct attribution. From the dataset UI, you can add or edit:
- Title, description, and tags
- Geographical coverage (drawn interactively on the globe)
- Temporal coverage (time range represented by the data)
- Provider: the organisation or individual that owns, authored, or should be credited for the data
- License: the terms under which the data may be used
- Citation: a citation statement and (if applicable) DOI
Tip
We can create DOIs via our DataCite.org membership for data that is published for the first time via the ODP.
Adding Data to Your Dataset
ODP supports multiple ways to attach data to a dataset. The available features depend on how structured your data is.
Data types and capabilities
| Data type | Example formats | How to add | Capabilities |
|---|---|---|---|
| Files only | Any: NetCDF, ZIP, TIFF, Zarr, Excel, proprietary formats | UI or code | Storage, download, metadata |
| Files with preview | PDF, images, text files | UI or code | Storage, preview within dataset page, download, metadata |
| Structured files | CSV, GeoJSON, Parquet | UI or code | Can be ingested into a table |
| Table | Ingested tabular data | Via structured files or direct code ingest | Querying, APIs, interactive maps, column statistics |
Note
A table is created either by ingesting a supported structured file or by ingesting tabular data programmatically. The tabular functionality unlocks more advanced capabilities of the ODP, so we encourage using it where possible.
Uploading Files via the Web UI
- Click Select Files or drag and drop files into the dataset
- Click Upload All (or upload files individually)
- For large files, uploading one at a time is recommended
If a file’s MIME type is not detected automatically (e.g. rare or proprietary formats), you may need to specify it manually.
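Before uploading, it can help to check locally which files are likely to need a manually specified type. The sketch below uses Python's standard `mimetypes` module. It is only a local approximation: how ODP itself detects MIME types is not described here, and the helper name `guess_mime`, the fallback value, and the file names are illustrative assumptions.

```python
import mimetypes

def guess_mime(filename, fallback="application/octet-stream"):
    """Return a best-guess MIME type for a local file name.

    Uses the extension-based lookup in Python's standard library.
    ODP's own detection may differ; this only flags files that
    probably need a manually specified type.
    """
    mime, _encoding = mimetypes.guess_type(filename)
    return mime or fallback

# Hypothetical file names, for illustration only
for name in ["cruise_report.pdf", "stations.csv", "sonar_dump.xyzraw"]:
    print(name, "->", guess_mime(name))
```

Files that come back with the fallback value are the ones most likely to need a MIME type entered by hand.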
Ingesting Structured Files into a Table
For supported structured formats (.csv, .geojson, .parquet), the UI displays an “ingest file to table” icon next to the file.
To ingest:
- Upload the structured file
- Click the ingest file to table icon
- Confirm ingestion
This converts the file into ODP’s columnar tabular storage and enables:
- Fast querying
- Map visualisation
- API access
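When preparing a batch of files, a small local check can separate those that can be ingested into a table from those that will be stored as files only. This is a minimal sketch based solely on the three extensions listed above. The function names are hypothetical, and treating extensions as case-insensitive is an assumption rather than documented ODP behaviour.

```python
from pathlib import Path

# Extensions this guide lists as ingestible into a table
INGESTIBLE_EXTENSIONS = {".csv", ".geojson", ".parquet"}

def can_ingest_to_table(filename):
    """True if the file extension is one of the structured formats listed above."""
    return Path(filename).suffix.lower() in INGESTIBLE_EXTENSIONS

def split_by_ingestibility(filenames):
    """Split file names into (ingestible, files_only) lists, preserving order."""
    ingestible = [f for f in filenames if can_ingest_to_table(f)]
    files_only = [f for f in filenames if not can_ingest_to_table(f)]
    return ingestible, files_only

# Hypothetical file names, for illustration only
tables, files = split_by_ingestibility(["stations.csv", "grid.nc", "tracks.parquet"])
print("Ingest to table:", tables)
print("Store as files:", files)
```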
Editing the Table Schema
After ingestion, use Edit table to refine how the data behaves:
- Rename columns
- Add human-readable column descriptions (tooltips)
- Classify geometry columns (WKB/WKT, WGS84) or latitude/longitude pairs
- Choose columns to index (partition) for improved query performance
Tip
Editing the schema alters the way the data is structured. Assigning the correct class (e.g. Geometry or Latitude/Longitude) unlocks ODP's geospatial features, such as map visualisation and spatial queries.
For large datasets, restructuring may take some time. It continues until finished, even if you close the browser.
Adding Data Programmatically
For programmatic workflows, large datasets, or automation, use the Python SDK.
Programmatic uploads also allow:
- Direct ingestion without intermediate files
- More precise schema control
- Adding metadata at upload time (UI support is limited and evolving)
Geospatial Data
Tables in ODP can hold geospatial data. Defining the geometry correctly allows the platform to unlock powerful features such as map visualisation, spatial queries, OGC Features, and (soon) vector tile endpoints.
| Format | Geometry handling |
|---|---|
| GeoJSON | Geometry is detected automatically |
| CSV | Geometry must be provided as WKT or latitude/longitude float columns |
| Parquet | Geometry may be WKT, WKB, or latitude/longitude float columns |
| Direct table ingest | Geometry may be WKT, WKB, or latitude/longitude float columns |
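To make the geometry encodings in the table concrete, the sketch below builds a WKT point, the equivalent WKB bytes, and a small CSV with a WKT geometry column, using only the Python standard library. The station data, column names, and helper names are hypothetical. The coordinates are assumed to be WGS84 longitude/latitude, and WKT orders them as x y (longitude first).

```python
import csv
import io
import struct

def point_wkt(lon, lat):
    """WKT for a point; WKT coordinate order is x y, i.e. longitude latitude."""
    return f"POINT ({lon} {lat})"

def point_wkb(lon, lat):
    """Little-endian WKB for a 2D point: byte-order flag, geometry type 1, x, y."""
    return struct.pack("<BIdd", 1, 1, lon, lat)

# Hypothetical observations: (station, longitude, latitude)
stations = [("A1", 10.75, 59.91), ("B2", 5.32, 60.39)]

# Write a CSV whose geometry column holds WKT strings
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["station", "geometry"])
for name, lon, lat in stations:
    writer.writerow([name, point_wkt(lon, lat)])

csv_text = buffer.getvalue()
print(csv_text)
```

For CSV, the alternative is two plain float columns for latitude and longitude, which are then classified as a Latitude/Longitude pair in the table schema.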
Automatic Table Profiling
When a dataset contains a table, ODP automatically derives column-level statistics, including:
- Data types
- Value ranges
- Spatial and temporal extents
These statistics power search, filtering, and visualisation features. You can view them on the main dataset page, or by clicking Preview from My Data.
Note
Automatic profiling applies to tables, not to files alone.
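As an illustration of what column-level statistics look like, the sketch below computes a data type and value range per column for a few in-memory rows. This is not ODP's profiling implementation, which is not described here. The function `profile_columns` and the sample rows are hypothetical, and the sketch assumes each column holds values of a single comparable type.

```python
from datetime import date

def profile_columns(rows):
    """Compute simple per-column statistics for a list of dict rows.

    Returns {column: {"type": ..., "min": ..., "max": ...}}, ignoring None values.
    """
    profile = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row[col] for row in rows if row.get(col) is not None]
        if not values:
            profile[col] = {"type": None, "min": None, "max": None}
            continue
        profile[col] = {
            "type": type(values[0]).__name__,
            "min": min(values),
            "max": max(values),
        }
    return profile

# Hypothetical rows, for illustration only
rows = [
    {"lat": 59.91, "lon": 10.75, "sampled": date(2023, 5, 1)},
    {"lat": 60.39, "lon": 5.32, "sampled": date(2023, 6, 15)},
]
stats = profile_columns(rows)
for column, summary in stats.items():
    print(column, summary)
```

In this toy version, the min/max of the latitude and longitude columns together give a spatial extent, and the min/max of the date column gives a temporal extent.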
Sharing Data via Access Control
You can control who can view or modify your dataset.
From the dataset UI, open Manage Access and assign roles:
- Admin – full control to add permissions, publish, and edit
- Editor – can modify data and metadata
- Viewer – read-only access
Permissions can be granted to:
- Individual users who are already registered on ODP (via email address)
- Groups (coming soon)
Public access
A dataset can be public without being published. If you enable Public Sharing: Share via link, anyone with the link will be able to access the dataset, but it will not be findable in the catalogue. Publication controls catalogue visibility, not access mechanics.
Publishing Your Dataset
Publishing makes a dataset discoverable in the public ODP catalogue. All functionality (files, tables, APIs) works the same before and after publication.
Warning
Once published, only Admins of the dataset can make edits (Editors no longer have edit rights).
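The interaction between roles and publication status can be summarised as a small decision function. This is a toy model of the rules described in this guide, not ODP's authorisation code. The role names as lowercase strings and the function names are assumptions made for illustration.

```python
ROLES = ("admin", "editor", "viewer")

def can_edit(role, published):
    """Whether a role may edit data and metadata, per the rules in this guide."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if role == "admin":
        return True            # Admins keep edit rights after publication
    if role == "editor":
        return not published   # Editors lose edit rights once published
    return False               # Viewers are read-only

def can_manage_access(role):
    """Only Admins can add permissions or publish."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return role == "admin"

for role in ROLES:
    print(role, "before:", can_edit(role, False), "after:", can_edit(role, True))
```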
What to Check Before Publishing
Before submitting a dataset for publication, ensure that:
- You have the right to publish the data
- The provider (owner/author) is clearly specified
- The license accurately reflects usage rights
- Any required citation is included
Rich metadata (spatial/temporal coverage, descriptions, tags) is strongly encouraged, but the most critical requirements are licensing and provenance.
Publishing Process
- Review the dataset metadata
- Confirm ownership, licensing, and citation
- Submit the dataset for review
- The ODP team will review and contact you if changes are required
- Once approved, the dataset is published to the public catalogue
Licensing Guidance
Choosing the correct license is essential.
- If the data has been published previously, the original license must be reproduced unless explicitly agreed otherwise with the provider
- If the data is derived from one or more sources, the resulting license must be compatible with the original licenses
- If the data is entirely your own, you may choose an appropriate license
If the choice is yours, we recommend a widely adopted open license whenever possible, such as Creative Commons CC BY 4.0 (attribution is the only requirement) or the least restrictive option, CC0 1.0 Universal. Alternatively, you can browse other Creative Commons license options.
Warning
Please do not publish data unless you are confident you have the right to do so under the stated license.
Where possible, it is both polite and safest to check with the original author or provider. This is also our best practice when publishing data through the ODP.