Accessing Data

Continuing from the "Finding data" section, we can access the data.


Continuing from Finding Data, we see that the Aker BioMarine data collection contains 1 raw dataset.

Click the dataset to open its page. The dataset page contains information about the dataset and its files. To see the dataset's UUID, expand the API section.

Download the raw dataset by clicking the download button.

If you open a tabular dataset (for example, GLODAP), you can download the data by clicking "Export to CSV".

Raw datasets contain the following data:

List of the dataset's files
Files

The list of a dataset's files contains the files' metadata. Get the list by sending a POST https://api.hubocean.earth/data/{dataset_uuid}/list request. Example files list URL for the "Aker BioMarine EK60, EK80 Ecosounder AKBM data" dataset: https://api.hubocean.earth/data/a22c1e17-b00e-43f3-91f1-9aefedf58ec0/list.
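As a minimal sketch of the request described above, the following builds the POST request with Python's standard library without sending it. The helper name build_list_request and its extra_headers parameter are hypothetical. Authentication is not covered in this section, so any required headers are left to the caller, and sending an empty JSON object when no filter is given is an assumption.

```python
import json
import urllib.request

BASE_URL = "https://api.hubocean.earth"


def build_list_request(dataset_uuid, oqs_filter=None, extra_headers=None):
    """Build (but do not send) the POST request that lists a dataset's files."""
    url = f"{BASE_URL}/data/{dataset_uuid}/list"
    # Assumption: an empty JSON object stands in for "no filter".
    body = json.dumps(oqs_filter or {}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    # Assumption: authentication headers, if needed, are supplied by the caller.
    headers.update(extra_headers or {})
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_list_request("a22c1e17-b00e-43f3-91f1-9aefedf58ec0")
print(request.get_method(), request.full_url)
# Sending the request would then be: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before any network call is made.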

Paginate files info

Files info can be paginated. To limit and paginate results, send the request with the page_size parameter: https://api.hubocean.earth/data/{dataset_uuid}/list?page={cursor}&page_size={page_size}. The response contains a next parameter, which is used to paginate results. To continue downloading paginated metadata, send the request again with the returned next value as {cursor}: https://api.hubocean.earth/data/{dataset_uuid}/list?page={cursor}&page_size={page_size}.
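The paginated URL can be assembled as in the sketch below. The helper name build_page_url is hypothetical, and omitting the page parameter on the first request (when no cursor is known yet) is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.hubocean.earth"


def build_page_url(dataset_uuid, page_size, cursor=None):
    """Return the files-list URL for one page of results."""
    params = {}
    # Assumption: the first request has no cursor, so "page" is left out.
    if cursor is not None:
        params["page"] = cursor
    params["page_size"] = page_size
    # urlencode escapes any special characters in the cursor value.
    return f"{BASE_URL}/data/{dataset_uuid}/list?{urlencode(params)}"


first_url = build_page_url("a22c1e17-b00e-43f3-91f1-9aefedf58ec0", page_size=10)
print(first_url)
```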

Filter the results list by adding an Object Query Structure (OQS) filter to the request body.

The following example request body returns a list of files whose metadata contains a "custom_label" parameter with the value "custom":

{"metadata": {"custom_label": "custom"}}

Download a raw file by sending a GET https://api.hubocean.earth/data/{dataset_uuid}/{file_name} request. Example URL of a file in the "Aker BioMarine EK60, EK80 Ecosounder AKBM data" dataset: https://api.hubocean.earth/data/a22c1e17-b00e-43f3-91f1-9aefedf58ec0/AKBM-SagaSea-2023-D20230119-T220626.raw.
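A raw file download over plain HTTP could look like the sketch below, using only the standard library. The helper names build_file_url and download_raw_file are hypothetical, as is the extra_headers parameter for whatever authentication the API requires. Only the URL is built when the example runs; the download function is defined but not called.

```python
import shutil
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.hubocean.earth"


def build_file_url(dataset_uuid, file_name):
    """Return the download URL for one file of a raw dataset."""
    # Percent-encode the file name so spaces or special characters survive.
    return f"{BASE_URL}/data/{dataset_uuid}/{quote(file_name)}"


def download_raw_file(dataset_uuid, file_name, destination_path, extra_headers=None):
    """Stream the file to disk without loading it fully into memory."""
    request = urllib.request.Request(
        build_file_url(dataset_uuid, file_name),
        headers=extra_headers or {},  # assumption: caller supplies auth headers
        method="GET",
    )
    with urllib.request.urlopen(request) as response, open(destination_path, "wb") as out:
        shutil.copyfileobj(response, out)


file_url = build_file_url(
    "a22c1e17-b00e-43f3-91f1-9aefedf58ec0",
    "AKBM-SagaSea-2023-D20230119-T220626.raw",
)
print(file_url)
```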

Download tabular data by sending a POST https://api.hubocean.earth/data/{dataset_uuid}/list request. Example URL for the "GLODAP" dataset's data: https://api.hubocean.earth/data/8a477f7b-8fd5-403e-b021-89dda7848997/list.

Partial tabular data return

Each request processes data for up to 30 seconds. If data remains, the response includes a next parameter. To continue downloading data, send a https://api.hubocean.earth/data/{dataset_uuid}/list?cursor={next} request.
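The "follow next until it disappears" loop can be sketched independently of the HTTP layer. In the example below, fetch_page is a hypothetical callable that performs one request for a given cursor and returns the parsed response. The response shape (a dict with a "data" list and an optional "next" value) is an assumption made for illustration, and the pages are faked so the example runs without network access.

```python
def fetch_all_rows(fetch_page):
    """Collect rows from every partial response by following the cursor.

    fetch_page(cursor) is expected to return a dict with a "data" list and,
    when more data remains, a "next" cursor (assumed response shape).
    """
    rows = []
    cursor = None  # the first request is sent without a cursor
    while True:
        page = fetch_page(cursor)
        rows.extend(page.get("data", []))
        cursor = page.get("next")
        if not cursor:
            # No "next" value means the server has returned everything.
            return rows


# Fake responses keyed by cursor, standing in for real API calls.
fake_pages = {
    None: {"data": [1, 2], "next": "c1"},
    "c1": {"data": [3], "next": "c2"},
    "c2": {"data": [4, 5]},
}
all_rows = fetch_all_rows(lambda cursor: fake_pages[cursor])
print(all_rows)  # [1, 2, 3, 4, 5]
```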

To download a raw file with the SDK:

Get the file's metadata.
Download the file.

The following is an example of getting files' metadata and downloading files:

from odp.client import OdpClient

client = OdpClient()

# Getting Raw dataset with file
my_dataset = client.catalog.get("a22c1e17-b00e-43f3-91f1-9aefedf58ec0")

# Example of filter:
# filter = {"metadata": {"custom_label": "custom"}}
filter = None

# .list() returns files' metadata
for file_metadata in client.raw.list(my_dataset, filter):
    print(file_metadata)

    # File save location
    destination_path = file_metadata.name

    client.raw.download_file(my_dataset, file_metadata, destination_path)
    print("File downloaded successfully.")
    # We only download single file for demonstration purposes
    break

The following is an example of downloading tabular data with the SDK:

from odp.client import OdpClient

client = OdpClient()

# Getting Tabular dataset whose data we want to see
my_dataset = client.catalog.get("8a477f7b-8fd5-403e-b021-89dda7848997")

limit = 100

data = client.tabular.select_as_list(my_dataset, limit=limit)
print(data)

Data download

Unlike the API, the SDK downloads all of the data if limit is not set. Therefore, the select_as_list function has no cursor parameter.