Ocean Data Platform (ODP) R SDK
The Ocean Data Platform exposes curated ocean datasets through a tabular API.
This R SDK focuses on read-only workflows: authenticate, stream Arrow
batches, and transform results into native data.frames and tibbles.
Notice: The SDK is still in beta. Writing/mutating tables is intentionally out of scope for this first version. The SDK covers inspecting tabular metadata, streaming Arrow batches, and performing server-backed aggregations. More features will be added in later versions.
Installation
Install straight from GitHub (the examples below use remotes, but pak or
devtools work the same way). If you do not have remotes installed yet,
install it once via install.packages("remotes").
install.packages("remotes") # skip if already installed
remotes::install_github("C4IROcean/odp-sdkr")
Load the package in your session via library(odp) after the installation
finishes.
Authentication
odp_client() expects an API key. Either pass it explicitly when constructing
the client, or set the ODP_API_KEY environment variable.
client <- odp_client(api_key = "Sk_....")
Sys.setenv(ODP_API_KEY = "Sk_...")
client <- odp_client()
Connecting to a Dataset
Use the dataset ID that you would normally copy from the catalog UI (https://app.hubocean.earth/catalog). The snippet below targets the public GLODAP dataset:
glodap <- client$dataset("aea06582-fc49-4995-a9a8-2f31fcc65424")
table <- glodap$table
If the dataset has an attached tabular store, you can now work with the table using the helpers described below. When a dataset is not tabular, the table calls raise errors.
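If you are not sure whether a dataset is tabular, a defensive pattern is to wrap the first table call in tryCatch(). A minimal sketch (the exact error condition raised by the SDK is an assumption here, so only the generic error class is caught):

```r
glodap <- client$dataset("aea06582-fc49-4995-a9a8-2f31fcc65424")

# Probe the table so a non-tabular dataset does not abort the script.
schema <- tryCatch(
  glodap$table$schema(),
  error = function(e) {
    message("Dataset has no tabular store: ", conditionMessage(e))
    NULL
  }
)

if (!is.null(schema)) {
  # Safe to continue with the tabular helpers below.
}
```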
Tabular helpers
schema()
The schema call returns the Arrow layout for the table so you can plan your queries:
schema <- table$schema()
print(schema)
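One common use of the schema is to check column names before building a projection. A short sketch, assuming the returned object is an arrow::Schema (which supports names() in R); the pattern matched below is purely illustrative:

```r
schema <- table$schema()

# Column names from the Arrow schema
cols <- names(schema)
print(cols)

# Pick out fields of interest before selecting (pattern is illustrative)
carbon_cols <- grep("tco2|salinity", cols, value = TRUE)
cursor <- table$select(columns = carbon_cols)
```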
select()
select() returns an OdpCursor that lazily streams arrow::RecordBatch
chunks. Iterate with next_batch() or materialise into a data.frame, Arrow
Table, or tibble when you need the full result.
cursor <- table$select()
while (!is.null(batch <- cursor$next_batch())) {
cat("chunk rows:", batch$num_rows, "\n")
# process or transform each RecordBatch on the fly
}
# collect into familiar structs when ready
df <- cursor$dataframe()
arrow_tbl <- cursor$arrow()
# optional tidyverse helper
# tib_tbl <- cursor$tibble()
Materialisation helpers only drain batches that have not been streamed yet. To collect the full result after iterating with
next_batch(), start a new cursor and call dataframe()/collect() before consuming chunks.
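A concrete version of that pattern, using the same table handle: peek at the first batch on one cursor, then open a fresh cursor to materialise the complete result.

```r
# Peek at the first chunk without committing to a full download
peek <- table$select()
first_batch <- peek$next_batch()
cat("first chunk rows:", first_batch$num_rows, "\n")

# first_batch is excluded from peek's remaining stream, so open a
# fresh cursor when you want every row in one data.frame
full <- table$select()
df <- full$dataframe()
```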
filter
Filters use SQL/Arrow-style expressions, including geospatial helpers such as
within, contains, and intersect.
cursor <- table$select(filter = "G2year >= 2020 AND G2year < 2025")
bbox <- 'geometry within "POLYGON ((-10 50, -5 50, -5 55, -10 55, -10 50))"'
cursor <- table$select(filter = bbox)
columns
Restrict the projection when you only need specific fields:
cursor <- table$select(columns = c("G2tco2", "G2year"))
vars
Bind parameters inside the filter using either named or positional variables:
cursor <- table$select(
filter = "G2year >= $start AND G2year < $end",
vars = list(start = 2020, end = 2025)
)
# positional form
table$select(filter = "G2year >= ? AND G2year < ?", vars = list(2020, 2025))
stats()
Fetch read-only stats for the table:
stats <- table$stats()
str(stats)
Aggregations
table$aggregate() performs the heavy lifting on the backend and stitches the
partial results together locally.
agg <- table$aggregate(
group_by = "G2year",
filter = "G2year >= 2020 AND G2year < 2025",
aggr = list(G2salinity = "mean", G2tco2 = "max")
)
print(agg)
aggr entries specify which aggregation to apply ("sum", "min", "max",
"count", or "mean"). Advanced expressions such as h3(geometry, 6) or
bucket(depth, 0, 200, 400) are accepted in group_by.
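For example, a spatial roll-up over H3 cells might look like the sketch below (resolution 6 is arbitrary, and the column names follow the GLODAP examples above):

```r
# Average salinity per H3 cell at resolution 6, with an observation count
spatial <- table$aggregate(
  group_by = "h3(geometry, 6)",
  aggr = list(G2salinity = "mean", G2tco2 = "count")
)
print(spatial)

# Depth buckets: boundaries at 0, 200, and 400 as in the expression above
by_depth <- table$aggregate(
  group_by = "bucket(depth, 0, 200, 400)",
  aggr = list(G2tco2 = "mean")
)
```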
Troubleshooting
- Ensure arrow is installed; the SDK relies on Arrow IPC streams for transport
- Increase timeout when scanning very large tables
- Keep the ODP_API_KEY secret; never commit it to source control
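Where the timeout is configured depends on the client; the sketch below assumes odp_client() accepts a timeout argument in seconds (check ?odp_client for the authoritative signature):

```r
# Assumption: timeout is an odp_client() argument in seconds
client <- odp_client(timeout = 300)
```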
These primitives let pipelines, Shiny dashboards, and notebooks pull from ODP tabular datasets using idiomatic R.