# Ocean Data Platform (ODP) R SDK
The Ocean Data Platform exposes curated ocean datasets through a tabular API.
This R SDK lets you authenticate, stream Arrow batches, transform results into
native data.frames, and modify data (insert, update, delete). You can also
inspect schemas, perform server-backed aggregations, and manage raw files.
## Quick Start
Want a fast tour? Download and run the quick-start script.
> **Note:** The SDK is still in beta. Please report any bugs and share your feedback!
## Installation
Install straight from GitHub (the examples below use remotes, but pak or
devtools work the same way). If you do not have remotes installed yet,
install it once via install.packages("remotes").
```r
install.packages("remotes") # skip if already installed
remotes::install_github("C4IROcean/odp-sdkr")
```
After the installation finishes, load the package in your session with
`library(odp)`.
## Authentication
Authenticate using your API key:
```r
client <- odp_client(api_key = "Sk_....")
```
Or call it without arguments to authenticate via the browser flow or the `ODP_API_KEY` environment variable:
```r
# Sys.setenv(ODP_API_KEY = "Sk_...")
client <- odp_client()
```
## Connecting to a Dataset
Use the dataset ID that you would normally copy from the [catalog UI](https://app.hubocean.earth/catalog). The snippet below targets the public GLODAP dataset:
```r
glodap <- client$dataset("aea06582-fc49-4995-a9a8-2f31fcc65424")
table <- glodap$table
```
If the dataset has an attached tabular store, you can work with the table using the helpers described below. When a dataset is not tabular, table calls raise errors.
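If you are unsure whether a dataset is tabular, you can probe it defensively; a minimal sketch, assuming the SDK signals a regular R error condition when no tabular store is attached:

```r
# Probe a dataset for a tabular store before using the helpers;
# the error-signalling behaviour assumed here is not guaranteed by the SDK
glodap <- client$dataset("aea06582-fc49-4995-a9a8-2f31fcc65424")
tbl_schema <- tryCatch(
  glodap$table$schema(),
  error = function(e) {
    message("Not a tabular dataset: ", conditionMessage(e))
    NULL
  }
)
if (!is.null(tbl_schema)) {
  # safe to use the tabular helpers below
}
```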
## Tabular helpers
### schema()
`schema()` returns the Arrow layout for the table so you can plan your queries:
```r
schema <- table$schema()
print(schema)
```
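Assuming the returned object is an `arrow::Schema`, the standard Arrow accessors apply; for example, listing field names before building a projection:

```r
schema <- table$schema()
names(schema)      # field names, e.g. to pick a projection for select()
schema$num_fields  # number of columns
```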
### select()
`select()` returns an `OdpCursor` that lazily streams `arrow::RecordBatch`
chunks. Iterate with `next_batch()` or materialise into a data.frame, Arrow
Table, or tibble when you need the full result.
```r
cursor <- table$select()
while (!is.null(batch <- cursor$next_batch())) {
  cat("chunk rows:", batch$num_rows, "\n")
  # process or transform each RecordBatch on the fly
}

# collect into familiar structures when ready
df <- cursor$dataframe()
arrow_tbl <- cursor$arrow()

# optional tidyverse helper
# tib_tbl <- cursor$tibble()
```
Materialisation helpers only drain batches that have not been streamed yet. To collect the full result after iterating with `next_batch()`, start a new cursor and call `dataframe()`/`collect()` before consuming any chunks.
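In other words, mixing the two styles on one cursor yields only the tail of the result; a short sketch of the difference:

```r
# A cursor that has already streamed a chunk only materialises the rest
cursor <- table$select()
first <- cursor$next_batch()   # consumes the first chunk
df_tail <- cursor$dataframe()  # remaining chunks only

# For the complete result, collect from a fresh cursor instead
df_full <- table$select()$dataframe()
```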
#### filter
Filters use SQL/Arrow-style expressions, including geospatial helpers such as
`within`, `contains`, and `intersect`.
```r
cursor <- table$select(filter = "G2year >= 2020 AND G2year < 2025")

bbox <- 'geometry within "POLYGON ((-10 50, -5 50, -5 55, -10 55, -10 50))"'
cursor <- table$select(filter = bbox)
```
#### columns
Restrict the projection when you only need specific fields:
```r
cursor <- table$select(columns = c("G2tco2", "G2year"))
```
#### vars
Bind parameters inside the filter using either named or positional variables:
```r
cursor <- table$select(
  filter = "G2year >= $start AND G2year < $end",
  vars = list(start = 2020, end = 2025)
)

# positional form
table$select(filter = "G2year >= ? AND G2year < ?", vars = list(2020, 2025))
```
### stats()
Fetch read-only stats for the table:
```r
stats <- table$stats()
str(stats)
```
## Aggregations
`table$aggregate()` performs the heavy lifting on the backend and stitches the
partial results together locally.
```r
agg <- table$aggregate(
  group_by = "G2year",
  filter = "G2year >= 2020 AND G2year < 2025",
  aggr = list(G2salinity = "mean", G2tco2 = "max")
)
print(agg)
```
`aggr` entries specify which aggregation to apply (`"sum"`, `"min"`, `"max"`,
`"count"`, or `"mean"`). Advanced expressions such as `h3(geometry, 6)` or
`bucket(depth, 0, 200, 400)` are accepted in `group_by`.
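As a sketch of those expressions (column names mirror the earlier GLODAP examples; `depth` is a hypothetical column used for illustration):

```r
# Group measurements into H3 cells at resolution 6
agg_h3 <- table$aggregate(
  group_by = "h3(geometry, 6)",
  aggr = list(G2tco2 = "mean")
)

# Bucket a hypothetical depth column at the boundaries 0, 200, and 400
agg_depth <- table$aggregate(
  group_by = "bucket(depth, 0, 200, 400)",
  aggr = list(depth = "count")
)
```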
## Writing data
### Creating and inserting
Create a table from a schema or data and optionally insert rows:
```r
# Create from schema
schema <- arrow::schema(latitude = arrow::float64(), depth = arrow::int32())
table$create(schema)

# Or create and insert in one call
df <- data.frame(latitude = c(10.5, 20.3), depth = c(100L, 200L))
table$create(df)

# Append more rows (df_more: another data.frame with the same columns)
table$insert(df_more)
```
### Transactions
For multi-step workflows (insert, update, delete) within a transaction:
```r
tx <- table$begin()
tx$insert(df)
df_replace <- tx$replace(filter = "depth > 50")$dataframe()
df_replace$depth <- df_replace$depth + 5
tx$insert(df_replace)
tx$delete(query = "depth > 1000")
tx$commit()
```
### Schema management
Alter the table schema and re-ingest data, or remove data:
```r
new_schema <- arrow::schema(
  latitude = arrow::float64(),
  depth = arrow::int32(),
  quality = arrow::int8()
)
table$alter(new_schema) # re-ingest with the new schema
table$truncate()        # remove all rows
table$drop()            # remove the table entirely
```
## Files
Manage raw files attached to a dataset:
```r
dataset <- client$dataset("dataset-id")
dataset$files$list()              # list files
dataset$files$upload("file.csv")  # upload
dataset$files$download(file_id)   # download (file_id from files$list())
```
## Troubleshooting
- Ensure `arrow` is installed; the SDK relies on Arrow IPC streams for transport
- Increase `timeout` when scanning very large tables
- Keep the `ODP_API_KEY` secret; never commit it to source control
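One way to keep the key out of your scripts is R's `.Renviron` file, which R reads at startup; note that the `timeout` argument name below is an assumption, so check `?odp_client` for the exact parameter:

```r
# In ~/.Renviron (not committed to source control):
#   ODP_API_KEY=Sk_...

# The client then picks the key up from the environment:
client <- odp_client()

# For very large scans, a longer timeout (argument name assumed):
# client <- odp_client(timeout = 600)
```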
These operations let you read, write, and manage ODP datasets from pipelines, Shiny dashboards, and notebooks using idiomatic R.