Accessing Data

Continuing from the "Finding data" section, we can see that the Aker BioMarine data collection contains one raw dataset.

img_42.png
Aker BioMarine datasets


Click the dataset to open its page, which shows information about the dataset and its files. To see the dataset's UUID, expand the API section.

img_43.png
Dataset's UUID and qualified name

Download the raw dataset's files by clicking the download button.

img_44.png
Files section


If you open a tabular dataset (for example, GLODAP), you can download its data by clicking "Export to CSV".

img_45.png
Tabular data

Raw datasets contain this data:

  • List of dataset's files
  • Files

The list of a dataset's files contains the files' metadata. Retrieve it by sending a POST request to https://api.hubocean.earth/data/{dataset_uuid}/list. For example, the file list of the "Aker BioMarine EK60, EK80 Ecosounder AKBM data" dataset is available at https://api.hubocean.earth/data/a22c1e17-b00e-43f3-91f1-9aefedf58ec0/list.
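
The request above can be sketched with Python's standard library. This is a minimal sketch under stated assumptions, not the official client: the bearer-token Authorization header, the empty JSON object sent when no body is given, and the JSON response are assumptions, and the helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.hubocean.earth/data"


def build_list_request(dataset_uuid, token, body=None):
    """Build (but do not send) the POST request for a dataset's file list."""
    url = f"{API_BASE}/{dataset_uuid}/list"
    # Assumption: an empty JSON object is acceptable when no body is given.
    data = json.dumps(body or {}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: the API authenticates with a bearer token.
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def list_files(dataset_uuid, token, body=None):
    """Send the request and decode the response, assumed to be JSON."""
    request = build_list_request(dataset_uuid, token, body)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling list_files("a22c1e17-b00e-43f3-91f1-9aefedf58ec0", token) would then return the decoded response for the Aker BioMarine dataset, provided the token is valid.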

Paginate files info

File info can be paginated. To limit the number of results per page, send the request with the page_size parameter: https://api.hubocean.earth/data/{dataset_uuid}/list?page={cursor}&page_size={page_size}. If more results remain, the response contains a next parameter. To fetch the following page of metadata, send the request again with the value of next as the cursor: https://api.hubocean.earth/data/{dataset_uuid}/list?page={cursor}&page_size={page_size}.
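
The pagination loop can be sketched as a generator that keeps requesting pages until the response no longer carries a next value. This is an illustrative sketch: fetch_page is a hypothetical callable standing in for the HTTP request, the "results" key is an assumed name for the list of entries in the response, and the in-memory pages below are made-up data used only to exercise the loop.

```python
from typing import Callable, Iterator, Optional


def iter_file_metadata(
    fetch_page: Callable[[Optional[str], int], dict],
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield file metadata entries page by page until no cursor is returned."""
    cursor = None  # the first request is sent without a cursor
    while True:
        response = fetch_page(cursor, page_size)
        # "results" is an assumed key name for the entries of one page.
        yield from response.get("results", [])
        cursor = response.get("next")
        if not cursor:
            break


# In-memory stand-in for the API, for illustration only.
_PAGES = {
    None: {"results": [{"name": "a.raw"}, {"name": "b.raw"}], "next": "cursor-1"},
    "cursor-1": {"results": [{"name": "c.raw"}]},
}


def fake_fetch_page(cursor, page_size):
    """Return the canned page for a cursor; page_size is ignored here."""
    return _PAGES[cursor]


names = [entry["name"] for entry in iter_file_metadata(fake_fetch_page, page_size=2)]
```

Keeping the HTTP call behind a callable means the loop can be tested without network access and reused with any HTTP client.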

Filter the result list by adding an Object Query Structure (OQS) filter to the request body.

The following request body returns the list of files whose metadata contains a "custom_label" parameter with the value "custom":

{"metadata": {"custom_label": "custom"}}

Download a raw file by sending a GET request to https://api.hubocean.earth/data/{dataset_uuid}/{file_name}. Example URL of a file in the "Aker BioMarine EK60, EK80 Ecosounder AKBM data" dataset: https://api.hubocean.earth/data/a22c1e17-b00e-43f3-91f1-9aefedf58ec0/AKBM-SagaSea-2023-D20230119-T220626.raw.
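
A raw file download over HTTP might look like the following sketch, which streams the response to disk instead of loading the whole file into memory. The bearer-token header is an assumption, and file_url and download_raw_file are hypothetical helper names.

```python
import shutil
import urllib.parse
import urllib.request

API_BASE = "https://api.hubocean.earth/data"


def file_url(dataset_uuid, file_name):
    """Build the download URL, percent-encoding unsafe characters in the name."""
    return f"{API_BASE}/{dataset_uuid}/{urllib.parse.quote(file_name)}"


def download_raw_file(dataset_uuid, file_name, token, destination_path):
    """Stream one raw file to destination_path."""
    request = urllib.request.Request(
        file_url(dataset_uuid, file_name),
        # Assumption: the API authenticates with a bearer token.
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response, open(destination_path, "wb") as out:
        # Copy in chunks so large files do not have to fit in memory.
        shutil.copyfileobj(response, out)
```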


Download tabular data by sending a POST request to https://api.hubocean.earth/data/{dataset_uuid}/list. Example URL for the "GLODAP" dataset's data: https://api.hubocean.earth/data/8a477f7b-8fd5-403e-b021-89dda7848997/list.

Partial tabular data return

A request processes data for up to 30 seconds. If data remains after that, the response includes a next parameter. To continue downloading, send a request to https://api.hubocean.earth/data/{dataset_uuid}/list?cursor={next}.
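
Because a single request may return only part of the table, a client has to keep following the next value until it disappears. The sketch below shows one way to do that. The "data" key for the rows is an assumed name, and post_json is a hypothetical callable that sends the POST request to the given URL and returns the decoded JSON response.

```python
from urllib.parse import urlencode

API_BASE = "https://api.hubocean.earth/data"


def list_url(dataset_uuid, cursor=None):
    """Build the list URL, adding the cursor parameter when one is given."""
    url = f"{API_BASE}/{dataset_uuid}/list"
    if cursor is not None:
        url += "?" + urlencode({"cursor": cursor})
    return url


def fetch_all_rows(dataset_uuid, post_json):
    """Collect rows from successive partial responses until none remain."""
    rows = []
    cursor = None
    while True:
        response = post_json(list_url(dataset_uuid, cursor))
        # "data" is an assumed key name for the rows of one partial response.
        rows.extend(response.get("data", []))
        cursor = response.get("next")
        if not cursor:
            return rows
```

For large tables it may be preferable to process each partial response as it arrives rather than accumulating every row in a single list.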

To download a raw file with the SDK:

  1. Get the file's metadata.
  2. Download the file.

The following example gets the files' metadata and downloads a file:

from odp.client import OdpClient

client = OdpClient()

# Get the raw dataset that contains the files
my_dataset = client.catalog.get("a22c1e17-b00e-43f3-91f1-9aefedf58ec0")

# Optional metadata filter, for example:
# metadata_filter = {"metadata": {"custom_label": "custom"}}
metadata_filter = None

# .list() returns the files' metadata
for file_metadata in client.raw.list(my_dataset, metadata_filter):
    print(file_metadata)

    # Save the file under its own name in the current directory
    destination_path = file_metadata.name

    client.raw.download_file(my_dataset, file_metadata, destination_path)
    print("File downloaded successfully.")
    # Download only a single file for demonstration purposes
    break

The following example downloads tabular data with the SDK:

from odp.client import OdpClient

client = OdpClient()

# Get the tabular dataset whose data we want to read
my_dataset = client.catalog.get("8a477f7b-8fd5-403e-b021-89dda7848997")

limit = 100

data = client.tabular.select_as_list(my_dataset, limit=limit)
print(data)

Data download

Unlike the API, the SDK downloads all of the data if limit is not set. Therefore, the select_as_list function has no cursor parameter.
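
Once the rows are in memory, they can be saved locally in the same spirit as the "Export to CSV" button. The sketch below assumes that select_as_list returns a list of dictionaries keyed by column name. That return shape is an assumption, and write_rows_as_csv is a hypothetical helper that is not part of the SDK.

```python
import csv


def write_rows_as_csv(rows, stream):
    """Write a list of dict rows to a text stream as CSV."""
    # Collect column names in first-seen order, since rows may differ in keys.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills in an empty cell when a row lacks a column.
    writer = csv.DictWriter(stream, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
```

To write to disk, open the target file with newline="" as the csv module recommends, for example: with open("data.csv", "w", newline="") as f: write_rows_as_csv(data, f).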