
Add Data to the Ocean Data Platform (ODP)

This guide explains how to add data to the Ocean Data Platform (ODP). It covers how to create a dataset, upload files and/or ingest tabular data, enrich the dataset with metadata, manage access, and (optionally) publish the dataset to the public catalogue.


What is a Dataset in ODP?

A dataset is the primary container in ODP. It can contain:

  • Any number of files of any file type
  • At most one table, derived from structured files or ingested directly via code
  • Rich metadata describing provenance, licensing, spatial/temporal coverage, and attribution
  • Access control and publication status

Creating a Dataset

Create a dataset via the web UI:

  1. Go to: https://app.hubocean.earth/my_data
  2. Click Add New Dataset
  3. Provide a title and description
  4. Save

You can refine all metadata and add data at any time after creation.

Collections

A collection is created in much the same way as a dataset. It acts as a folder that groups related datasets. You can either:

  • Create a collection first, then create datasets within it
  • Create a collection, then link an existing dataset to it from your My Data page by copying the collection UUID

Adding Rich Metadata

Rich metadata improves discoverability, usability, and correct attribution. From the dataset UI, you can add or edit:

  • Title, description, and tags
  • Geographical coverage (drawn interactively on the globe)
  • Temporal coverage (time range represented by the data)
  • Provider
    The organisation or individual that owns, authored, or should be credited for the data
  • License
    The terms under which the data may be used
  • Citation
    A citation statement and (if applicable) DOI

Tip

For data that is published for the first time via the ODP, we can create a DOI through our DataCite.org membership.


Adding Data to Your Dataset

ODP supports multiple ways to attach data to a dataset. The available features depend on how structured your data is.

Data types and capabilities

| Data type | Example formats | How to add | Capabilities |
|---|---|---|---|
| Files only | Any: NetCDF, ZIP, TIFF, Zarr, Excel, proprietary formats | UI or code | Storage, download, metadata |
| Files with preview | PDF, images, text files | UI or code | Storage, preview within dataset page, download, metadata |
| Structured files | CSV, GeoJSON, Parquet | UI or code | Can be ingested into a table |
| Table | Ingested tabular data | Via structured files or direct code ingest | Querying, APIs, interactive maps, column statistics |

Note

A table is created either by ingesting a supported structured file or by ingesting tabular data programmatically. The tabular functionality unlocks more advanced capabilities of the ODP, so we encourage using it where possible.
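
As a rough illustration of the categories above, the following sketch sorts local files by extension before upload. It is only a local helper, not part of ODP: the function name `classify` and the list of previewable extensions are assumptions made for this example, while the structured extensions are the ones this guide names.

```python
from pathlib import Path

# Extensions this guide lists as ingestible into a table.
STRUCTURED = {".csv", ".geojson", ".parquet"}
# Assumption: a small illustrative set of previewable extensions. The
# platform's actual list of previewable types is not given in this guide.
PREVIEWABLE = {".pdf", ".png", ".jpg", ".jpeg", ".txt"}


def classify(filename: str) -> str:
    """Return the data-type category a file most likely falls under."""
    suffix = Path(filename).suffix.lower()
    if suffix in STRUCTURED:
        return "structured file"
    if suffix in PREVIEWABLE:
        return "file with preview"
    return "file only"


for name in ["stations.csv", "report.pdf", "model_output.nc"]:
    print(f"{name}: {classify(name)}")
```

A check like this can help decide up front which files are worth ingesting into a table and which will simply be stored.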


Uploading Files via the Web UI

  1. Click Select Files or drag and drop files into the dataset
  2. Click Upload All (or upload files individually)

For large files, we recommend uploading one at a time.

If a file’s MIME type is not detected automatically (e.g. rare or proprietary formats), you may need to specify it manually.
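
If you want to know in advance which files are likely to need a manually specified MIME type, Python's standard `mimetypes` module gives a reasonable first guess. This is a minimal local sketch: the helper `guess_mime`, the `.sonarx` extension, and its MIME type are hypothetical, and ODP's own detection may behave differently.

```python
import mimetypes


def guess_mime(filename: str, fallback: str = "application/octet-stream") -> str:
    """Guess a MIME type from the file extension, or return the fallback."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime or fallback


# A common format is recognised from its extension.
print(guess_mime("report.pdf"))

# A rare or proprietary extension is not, so the generic fallback is returned.
# Files like this are the ones that may need a MIME type set by hand.
print(guess_mime("survey.sonarx"))

# Hypothetical: register a type for a proprietary extension so that later
# guesses in this Python process resolve it.
mimetypes.add_type("application/x-example-sonar", ".sonarx")
print(guess_mime("survey.sonarx"))
```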


Ingesting Structured Files into a Table

For supported structured formats (.csv, .geojson, .parquet), the UI displays an “ingest file to table” icon next to the file.

To ingest:

  1. Upload the structured file
  2. Click the ingest file to table icon
  3. Confirm ingestion

This converts the file into ODP’s columnar tabular storage and enables:

  • Fast querying
  • Map visualisation
  • API access

Editing the Table Schema

After ingestion, use Edit table to refine how the data behaves:

  • Rename columns
  • Add human-readable column descriptions (tooltips)
  • Classify geometry columns (WKB/WKT, WGS84) or latitude/longitude pairs
  • Choose columns to index (partition) for improved query performance

Tip

Editing the schema changes how the data is structured. Assigning the correct class (e.g. Geometry or Latitude/Longitude) unlocks features such as map visualisation.
For large datasets, restructuring may take some time. It continues until finished, even if you close the browser.


Adding Data Programmatically

For programmatic workflows, large datasets, or automation, use the Python SDK.

Programmatic uploads also allow:

  • Direct ingestion without intermediate files
  • More precise schema control
  • Adding metadata at upload time (UI support is limited and evolving)

Geospatial Data

Tables in ODP can hold geospatial data. Classifying the geometry correctly unlocks features such as map visualisation, spatial queries, OGC Features, and (soon) vector tile endpoints.

| Format | Geometry handling |
|---|---|
| GeoJSON | Geometry is detected automatically |
| CSV | Geometry must be provided as WKT or latitude/longitude float columns |
| Parquet | Geometry may be WKT, WKB, or latitude/longitude float columns |
| Direct table ingest | Geometry may be WKT, WKB, or latitude/longitude float columns |
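
For the CSV case, the sketch below shows one way to prepare a file that carries geometry both as latitude/longitude float columns and as a WKT column. It uses only the Python standard library. The column names, the helper `point_wkt`, and the sample observations are assumptions for illustration, and in practice one of the two representations is enough.

```python
import csv
import io


def point_wkt(lon: float, lat: float) -> str:
    """Format a WGS84 point as WKT. WKT uses x y order, i.e. lon then lat."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"coordinates out of WGS84 range: lon={lon}, lat={lat}")
    return f"POINT ({lon} {lat})"


# Hypothetical observations: (station, longitude, latitude, temperature).
observations = [
    ("A1", 10.75, 59.91, 8.4),
    ("B2", 5.32, 60.39, 9.1),
]

# Write to an in-memory buffer; swap in open("stations.csv", "w", newline="")
# to produce a file for upload.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["station", "longitude", "latitude", "geometry", "temp_c"])
for station, lon, lat, temp in observations:
    writer.writerow([station, lon, lat, point_wkt(lon, lat), temp])

print(buffer.getvalue())
```

Validating coordinate ranges before upload catches the most common mistake with point data, which is swapping latitude and longitude.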

Automatic Table Profiling

When a dataset contains a table, ODP automatically derives column-level statistics, including:

  • Data types
  • Value ranges
  • Spatial and temporal extents

These statistics power search, filtering, and visualisation features. You can view them on the main dataset page, or by clicking Preview in My Data.
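
To make the idea of column-level statistics concrete, here is a minimal local sketch that derives a type name and a value range per column from a few rows. It illustrates the concept only and is not how ODP computes its profiles. The function `profile` and the sample rows are assumptions for this example.

```python
from datetime import date


def profile(rows: list[dict]) -> dict[str, dict]:
    """Derive a type name and min/max range for every column."""
    stats: dict[str, dict] = {}
    for row in rows:
        for column, value in row.items():
            if value is None:
                continue  # missing values do not contribute to the range
            entry = stats.setdefault(
                column, {"type": type(value).__name__, "min": value, "max": value}
            )
            entry["min"] = min(entry["min"], value)
            entry["max"] = max(entry["max"], value)
    return stats


# Hypothetical sample rows.
rows = [
    {"latitude": 59.91, "longitude": 10.75, "sampled": date(2023, 5, 1), "depth_m": 12},
    {"latitude": 60.39, "longitude": 5.32, "sampled": date(2023, 8, 15), "depth_m": None},
]

for column, entry in profile(rows).items():
    print(column, entry["type"], entry["min"], entry["max"])
```

In this sketch, the ranges of the latitude and longitude columns together give a spatial extent, and the range of the date column gives a temporal extent.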

Note

Automatic profiling applies to tables, not to files alone.


Sharing Data via Access Control

You can control who can view or modify your dataset.

From the dataset UI, open Manage Access and assign roles:

  • Admin – full control to add permissions, publish, and edit
  • Editor – can modify data and metadata
  • Viewer – read-only access

Permissions can be granted to:

  • Individual users who are already registered with ODP (via email address)
  • Groups (coming soon)

Public access

A dataset can be public without being published. If you enable Public Sharing: Share via link, anyone with the link will be able to access the dataset, but it will not be findable in the catalogue. Publication controls catalogue visibility, not access mechanics.


Publishing Your Dataset

Publishing makes a dataset discoverable in the public ODP catalogue. All functionality (files, tables, APIs) works the same before and after publication.

Warning

Once published, only Admins of the dataset can make edits (Editors no longer have edit rights).
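
The interaction between roles and publication status can be summarised as a small decision function. This is an illustrative model of the rules described in this guide, not ODP's actual implementation. The names `Role`, `can_edit`, and `can_manage_access` are assumptions for this sketch.

```python
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"
    EDITOR = "editor"
    ADMIN = "admin"


def can_edit(role: Role, published: bool) -> bool:
    """Admins can always edit; Editors only while the dataset is unpublished."""
    if role is Role.ADMIN:
        return True
    if role is Role.EDITOR:
        return not published
    return False  # Viewers have read-only access


def can_manage_access(role: Role) -> bool:
    """Only Admins can add permissions or publish."""
    return role is Role.ADMIN


for role in Role:
    print(role.value, can_edit(role, published=False), can_edit(role, published=True))
```

The practical consequence is that anyone who needs to keep editing after publication should be made an Admin beforehand.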


What to Check Before Publishing

Before submitting a dataset for publication, ensure that:

  • You have the right to publish the data
  • The provider (owner/author) is clearly specified
  • The license accurately reflects usage rights
  • Any required citation is included

Rich metadata (spatial/temporal coverage, descriptions, tags) is strongly encouraged, but the most critical requirements are licensing and provenance.
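
If you prepare many datasets, the checklist above can be encoded as a small local check to run before submitting. This is a sketch under assumptions: `DatasetMetadata` and its field names are hypothetical and do not correspond to an ODP type or API.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical stand-in for a dataset's metadata; not an ODP type."""
    provider: str = ""
    license: str = ""
    citation: str = ""
    citation_required: bool = False
    rights_confirmed: bool = False


def publication_blockers(meta: DatasetMetadata) -> list[str]:
    """Return the checklist items that are still unmet."""
    blockers = []
    if not meta.rights_confirmed:
        blockers.append("confirm you have the right to publish the data")
    if not meta.provider.strip():
        blockers.append("specify the provider (owner/author)")
    if not meta.license.strip():
        blockers.append("choose a license that reflects usage rights")
    if meta.citation_required and not meta.citation.strip():
        blockers.append("include the required citation")
    return blockers


draft = DatasetMetadata(provider="Example Institute", license="CC BY 4.0")
for item in publication_blockers(draft):
    print("TODO:", item)
```

An empty list means the licensing and provenance requirements are covered. The ODP team's review still applies afterwards.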


Publishing Process

  1. Review the dataset metadata
  2. Confirm ownership, licensing, and citation
  3. Submit the dataset for review
  4. The ODP team will review and contact you if changes are required
  5. Once approved, the dataset is published to the public catalogue

Licensing Guidance

Choosing the correct license is essential.

  • If the data has been published previously, the original license must be reproduced unless explicitly agreed otherwise with the provider
  • If the data is derived from one or more sources, the resulting license must be compatible with the original licenses
  • If the data is entirely your own, you may choose an appropriate license

If the choice is yours, we recommend a widely adopted open license whenever possible, such as Creative Commons CC BY 4.0 (only attribution required) or CC0 1.0 Universal (the least restrictive). Alternatively, you can browse the other Creative Commons license options.

Warning

Please do not publish data unless you are confident you have the right to do so under the stated license.
Where possible, it is both polite and safest to check with the original author or provider. This is also our best practice when publishing data through the ODP.