Accessing Data
Continuing from the "Finding data" section, this section shows how to access the data. There we saw that the Aker BioMarine data collection contains 1 raw dataset.
Click the dataset to open its page. The dataset's page contains information about the dataset and its files. To see the dataset's UUID, expand the API section.
Download a raw dataset by clicking the download button.
If you open a tabular dataset (for example GLODAP), you can download the data by clicking "Export to CSV".
Raw datasets contain this data:
- List of dataset's files
- Files
The list of a dataset's files contains the files' metadata. To get the list, send a POST https://api.hubocean.earth/data/{dataset_uuid}/list request.
Example for the file list of the "Aker BioMarine EK60, EK80 Ecosounder AKBM data" dataset: https://api.hubocean.earth/data/a22c1e17-b00e-43f3-91f1-9aefedf58ec0/list
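As a minimal sketch of the request described above, the following uses only the Python standard library. The helper names `list_files_request` and `fetch_json` are hypothetical, and the `headers` argument is a placeholder for whatever authentication headers your account requires, which this section does not specify.

```python
import json
import urllib.request

API_BASE = "https://api.hubocean.earth/data"


def list_files_request(dataset_uuid, headers=None):
    """Build (but do not send) the POST request for a dataset's file list."""
    return urllib.request.Request(
        f"{API_BASE}/{dataset_uuid}/list",
        method="POST",
        # Assumption: pass any authentication headers you need here.
        headers=dict(headers or {}),
    )


def fetch_json(request, opener=urllib.request.urlopen):
    """Send the request and decode the JSON response body."""
    with opener(request) as response:
        return json.load(response)


request = list_files_request("a22c1e17-b00e-43f3-91f1-9aefedf58ec0")
print(request.get_method(), request.full_url)
# To actually send it: file_list = fetch_json(request)
```

Building the request separately from sending it makes it easy to inspect the URL and method before making a network call.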
Paginate files info
The file list can be paginated. To limit and paginate the results, send the request with the page and page_size parameters: https://api.hubocean.earth/data/{dataset_uuid}/list?page={cursor}&page_size={page_size}
The response contains a next parameter, which is used to paginate the results. To continue downloading the paginated metadata, send the same request again with the value of next as {cursor}.
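The pagination loop can be sketched as follows. This is an illustration under assumptions, not the platform's reference client: `post_json` is a hypothetical callable that sends a POST to the given URL and returns the decoded JSON, the `results` key holding the entries is assumed, and the first request is assumed to work without a `page` value.

```python
from urllib.parse import urlencode

API_BASE = "https://api.hubocean.earth/data"


def list_url(dataset_uuid, page_size, cursor=None):
    """Build the list URL, adding `page` only once a cursor is known."""
    params = {"page_size": page_size}
    if cursor is not None:
        params["page"] = cursor
    return f"{API_BASE}/{dataset_uuid}/list?{urlencode(params)}"


def iter_file_metadata(post_json, dataset_uuid, page_size=100):
    """Yield file metadata entries page by page until no `next` remains."""
    cursor = None
    while True:
        page = post_json(list_url(dataset_uuid, page_size, cursor))
        # Assumption: entries live under a "results" key.
        yield from page.get("results", [])
        cursor = page.get("next")
        if not cursor:
            break
```

Because the function is a generator, a caller can stop iterating early without requesting the remaining pages.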
Filter the results list by adding an Object Query Structure (OQS) filter to the request body.
The following example request body returns the list of files whose metadata contains a "custom_label" parameter with the value "custom":
{"metadata": {"custom_label": "custom"}}
Download a raw file by sending a GET https://api.hubocean.earth/data/{dataset_uuid}/{file_name} request.
Example URL of a file in the "Aker BioMarine EK60, EK80 Ecosounder AKBM data" dataset: https://api.hubocean.earth/data/a22c1e17-b00e-43f3-91f1-9aefedf58ec0/AKBM-SagaSea-2023-D20230119-T220626.raw
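A raw file download over plain HTTP might look like the sketch below. The helper names are hypothetical, `headers` is a placeholder for any authentication headers you need, and the response is streamed to disk in chunks because raw files can be large.

```python
import shutil
import urllib.request
from urllib.parse import quote


def raw_file_url(dataset_uuid, file_name):
    """Build the download URL, percent-encoding the file name."""
    return f"https://api.hubocean.earth/data/{dataset_uuid}/{quote(file_name)}"


def download_raw_file(dataset_uuid, file_name, destination_path,
                      headers=None, opener=urllib.request.urlopen):
    """GET the file and stream it to destination_path in 1 MiB chunks."""
    request = urllib.request.Request(
        raw_file_url(dataset_uuid, file_name),
        # Assumption: pass any authentication headers you need here.
        headers=dict(headers or {}),
    )
    with opener(request) as response, open(destination_path, "wb") as out:
        shutil.copyfileobj(response, out, 1024 * 1024)


# Example call (performs a network request when uncommented):
# download_raw_file(
#     "a22c1e17-b00e-43f3-91f1-9aefedf58ec0",
#     "AKBM-SagaSea-2023-D20230119-T220626.raw",
#     "AKBM-SagaSea-2023-D20230119-T220626.raw",
# )
```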
Download tabular data by sending a POST https://api.hubocean.earth/data/{dataset_uuid}/list request.
Example URL for the "GLODAP" dataset's data: https://api.hubocean.earth/data/8a477f7b-8fd5-403e-b021-89dda7848997/list
Partial tabular data return
A request processes data for up to 30 seconds. If data remains after that, the response includes a next parameter. To continue downloading the data, send a https://api.hubocean.earth/data/{dataset_uuid}/list?cursor={next} request.
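Collecting a full tabular result across partial responses could be sketched like this. `post_json` is a hypothetical callable that sends a POST to the URL and returns the decoded JSON, and the `data` key holding the rows is an assumption. Since each request may run for up to 30 seconds, the loop is capped by a `max_requests` guard.

```python
from urllib.parse import urlencode


def fetch_all_rows(post_json, dataset_uuid, rows_key="data", max_requests=1000):
    """Follow `next` cursors until the tabular result is complete."""
    base = f"https://api.hubocean.earth/data/{dataset_uuid}/list"
    rows, cursor = [], None
    for _ in range(max_requests):
        url = base if cursor is None else f"{base}?{urlencode({'cursor': cursor})}"
        response = post_json(url)
        # Assumption: rows are returned under `rows_key`.
        rows.extend(response.get(rows_key, []))
        cursor = response.get("next")
        if not cursor:
            return rows
    raise RuntimeError("result still incomplete after max_requests requests")
```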
To download a raw file with the SDK:
- Get the file's metadata.
- Download the file.
The following example gets the files' metadata and downloads a file:
from odp.client import OdpClient

client = OdpClient()

# Get the raw dataset that contains the files
my_dataset = client.catalog.get("a22c1e17-b00e-43f3-91f1-9aefedf58ec0")

# Example of a filter:
# filter = {"metadata": {"custom_label": "custom"}}
filter = None

# .list() returns the files' metadata
for file_metadata in client.raw.list(my_dataset, filter):
    print(file_metadata)

    # File save location
    destination_path = file_metadata.name
    client.raw.download_file(my_dataset, file_metadata, destination_path)
    print("File downloaded successfully.")

    # We only download a single file for demonstration purposes
    break
The following example downloads tabular data with the SDK:
from odp.client import OdpClient
client = OdpClient()
# Getting Tabular dataset whose data we want to see
my_dataset = client.catalog.get("8a477f7b-8fd5-403e-b021-89dda7848997")
limit = 100
data = client.tabular.select_as_list(my_dataset, limit=limit)
print(data)
Data download
Unlike the API, the SDK downloads all of the data if limit is not set. Therefore, the select_as_list function has no cursor parameter.