Skip to content

Ocean Data Platform (ODP) R SDK

The Ocean Data Platform exposes curated ocean datasets through a tabular API. This R SDK lets you authenticate, stream Arrow batches, transform results into native data.frames, and modify data (insert, update, delete). You can also inspect schemas, perform server-backed aggregations, and manage raw files.

Quick Start

Want a fast tour? Download and run the quick-start script.

Notice: The SDK is still in beta. Please report any bugs and share your feedback!

Installation

Install straight from GitHub (the examples below use remotes, but pak or devtools work the same way). If you do not have remotes installed yet, install it once via install.packages("remotes").

install.packages("remotes")  # skip if already installed
remotes::install_github("C4IROcean/odp-sdkr")

Load the package in your session via library(odp) after the installation finishes.
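For example:

```r
library(odp)
```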

Authentication

Authenticate using your API key:

client <- odp_client(api_key = "Sk_...")

Or call odp_client() without arguments to authenticate via the browser flow or the ODP_API_KEY environment variable:

# Sys.setenv(ODP_API_KEY = "Sk_...")
client <- odp_client()

Connecting to a Dataset

Use the dataset ID that you would normally copy from the catalog UI (https://app.hubocean.earth/catalog). The snippet below targets the public GLODAP dataset:

glodap <- client$dataset("aea06582-fc49-4995-a9a8-2f31fcc65424")
table <- glodap$table

If the dataset has an attached tabular store, you can work with the table using the helpers described below. If the dataset is not tabular, table calls will raise an error.
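If you are not sure whether a dataset is tabular, you can guard the first table call with base R's tryCatch (a sketch; it catches any error rather than a specific SDK error class):

```r
tbl <- tryCatch(
  {
    t <- glodap$table
    t$schema()  # touching the schema confirms a tabular store is attached
    t
  },
  error = function(e) {
    message("No tabular store: ", conditionMessage(e))
    NULL
  }
)
```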

Tabular helpers

schema()

The schema call returns the Arrow layout for the table so you can plan your queries:

schema <- table$schema()
print(schema)

select()

select() returns an OdpCursor that lazily streams arrow::RecordBatch chunks. Iterate with next_batch() or materialise into a data.frame, Arrow Table, or tibble when you need the full result.

cursor <- table$select()
while (!is.null(batch <- cursor$next_batch())) {
  cat("chunk rows:", batch$num_rows, "\n")
  # process or transform each RecordBatch on the fly
}

# collect into familiar structs when ready
df <- cursor$dataframe()
arrow_tbl <- cursor$arrow()
# optional tidyverse helper
# tib_tbl <- cursor$tibble()

Materialisation helpers only drain the batches that have not been streamed yet. If you need the full result after iterating with next_batch(), start a new cursor and call dataframe()/collect() on it before consuming any chunks.
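For example, to both process chunks and keep a full copy, open two cursors over the same query (a sketch using the calls shown above):

```r
# stream once for chunk-wise processing
streaming <- table$select(columns = c("G2tco2", "G2year"))
while (!is.null(batch <- streaming$next_batch())) {
  # ... handle each RecordBatch ...
}

# open a fresh cursor to materialise the full result
df <- table$select(columns = c("G2tco2", "G2year"))$dataframe()
```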

filter

Filters use SQL/Arrow-style expressions, including geospatial helpers such as within, contains, and intersect.

cursor <- table$select(filter = "G2year >= 2020 AND G2year < 2025")
bbox <- 'geometry within "POLYGON ((-10 50, -5 50, -5 55, -10 55, -10 50))"'
cursor <- table$select(filter = bbox)

columns

Restrict the projection when you only need specific fields:

cursor <- table$select(columns = c("G2tco2", "G2year"))

vars

Bind parameters inside the filter using either named or positional variables:

cursor <- table$select(
  filter = "G2year >= $start AND G2year < $end",
  vars = list(start = 2020, end = 2025)
)

# positional form
table$select(filter = "G2year >= ? AND G2year < ?", vars = list(2020, 2025))

stats()

Fetch read-only stats for the table:

stats <- table$stats()
str(stats)

Aggregations

table$aggregate() pushes the heavy lifting to the backend and merges the partial results locally.

agg <- table$aggregate(
  group_by = "G2year",
  filter = "G2year >= 2020 AND G2year < 2025",
  aggr = list(G2salinity = "mean", G2tco2 = "max")
)
print(agg)

aggr entries specify which aggregation to apply ("sum", "min", "max", "count", or "mean"). Advanced expressions such as h3(geometry, 6) or bucket(depth, 0, 200, 400) are accepted in group_by.
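As a sketch, grouping by an H3 cell expression might look like this (column names follow the GLODAP examples above; resolution 6 is an arbitrary choice):

```r
agg <- table$aggregate(
  group_by = "h3(geometry, 6)",   # aggregate per H3 hex cell at resolution 6
  aggr = list(G2tco2 = "mean")
)
print(agg)
```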

Writing data

Creating and inserting

Create a table from a schema or data and optionally insert rows:

# Create from schema
schema <- arrow::schema(latitude = arrow::float64(), depth = arrow::int32())
table$create(schema)

# Or create and insert in one call
df <- data.frame(latitude = c(10.5, 20.3), depth = c(100L, 200L))
table$create(df)

# Append more rows with a matching schema
df_more <- data.frame(latitude = c(30.1), depth = c(300L))
table$insert(df_more)

Transactions

For multi-step workflows (insert, update, delete) within a transaction:

tx <- table$begin()
tx$insert(df)                      # stage new rows
df_replace <- tx$replace(filter = "depth > 50")$dataframe()
df_replace$depth <- df_replace$depth + 5
tx$insert(df_replace)              # re-insert the modified rows
tx$delete(query = "depth > 1000")  # remove rows matching the query
tx$commit()                        # apply all staged changes

Schema management

Alter the table schema and re-ingest data, or remove data:

new_schema <- arrow::schema(latitude = arrow::float64(), depth = arrow::int32(), quality = arrow::int8())
table$alter(new_schema)   # re-ingest with new schema

table$truncate()          # remove all rows
table$drop()              # remove table entirely

Files

Manage raw files attached to a dataset:

dataset <- client$dataset("dataset-id")
dataset$files$list()              # list files attached to the dataset
dataset$files$upload("file.csv")  # upload a local file
dataset$files$download(file_id)   # download by id (e.g. taken from list())

Troubleshooting

  • Ensure arrow is installed; the SDK relies on Arrow IPC streams for transport
  • Increase timeout when scanning very large tables
  • Keep the ODP_API_KEY secret; never commit it to source control

These operations let you read, write, and manage ODP datasets from pipelines, Shiny dashboards, and notebooks using idiomatic R.