SDK

The current Python SDK for the Ocean Data Platform is in an experimental phase and only supports Tabular v2, our new generation of tabular storage. Previous versions and features have been deprecated in preparation for a more streamlined and powerful SDK release coming soon.

Tabular v2

Key features:

  • Serialization: JSON is inefficient for large datasets and has limitations for some data types. The new tabular storage is based on the Arrow IPC format, which is much more efficient in both data size and serialization/deserialization speed.
  • Schema: The new tabular storage uses an Arrow Schema for the schema definition instead of a custom JSON schema.
  • Partitions: Data is partitioned incrementally as it is ingested, with no need to specify a partition key. A custom engine then checks each query against every partition to determine whether it might contain matching rows, and skips the partitions that cannot.

Getting Started

  • quick: a quick overview of how to use the new table_v2(), with Python examples.
  • reference: detailed documentation of the new tabular storage in the Python SDK.

Coming soon

We're actively working on the next major version of the SDK, which will offer a unified and intuitive interface for accessing datasets on the Ocean Data Platform. This upcoming release will integrate Tabular v2 as a core component and provide a more consistent and user-friendly developer experience across the platform.

If you're interested in trying a preview version or sharing feedback, we'd love to hear from you. Feel free to reach out through our community channels.