
Tabular v2

Experimental

Tabular v2 is a new version of the Ocean Data Platform (ODP) tabular storage, built on the lessons learned from v1.

Key features:

  • serialization: JSON is inefficient for large datasets and cannot represent some data types well. Tabular v2 uses the Arrow IPC format instead, which is much more efficient in both data size and serialization/deserialization speed.
  • query: OQS had a steep learning curve and was not very readable, so we switched to a more standard SQL-like query language.
  • schema: Tabular v2 uses an Arrow Schema for the schema definition instead of a custom JSON schema.
  • partitions: data is partitioned incrementally at ingestion time, without the need to specify a partition key. At query time, a custom engine checks each partition against the query to decide whether it might contain matching rows, and skips the partitions that cannot.

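The page does not describe how the engine decides whether a partition "might have candidates". One common technique is to keep min/max statistics per column for each partition and compare them with the query's predicates. The sketch below assumes that approach. PartitionStats, might_match and prune are hypothetical names, not part of the ODP SDK.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PartitionStats:
    """Hypothetical per-partition statistics: min and max value per column."""
    name: str
    min_values: Dict[str, float]
    max_values: Dict[str, float]


def might_match(stats: PartitionStats, column: str, op: str, value: float) -> bool:
    """Return False only when the partition certainly has no matching rows."""
    lo = stats.min_values[column]
    hi = stats.max_values[column]
    if op == ">":
        return hi > value
    if op == ">=":
        return hi >= value
    if op == "<":
        return lo < value
    if op == "<=":
        return lo <= value
    if op == "==":
        return lo <= value <= hi
    # Unknown operator: keep the partition, since we cannot rule it out.
    return True


def prune(partitions: List[PartitionStats], column: str, op: str, value: float) -> List[PartitionStats]:
    """Keep only the partitions that might contain rows satisfying `column op value`."""
    return [p for p in partitions if might_match(p, column, op, value)]


partitions = [
    PartitionStats("p0", {"depth_m": 0.0}, {"depth_m": 50.0}),
    PartitionStats("p1", {"depth_m": 40.0}, {"depth_m": 300.0}),
    PartitionStats("p2", {"depth_m": 500.0}, {"depth_m": 900.0}),
]

# For a filter like "depth_m > 100", p0 (max 50) cannot match and is skipped.
candidates = prune(partitions, "depth_m", ">", 100.0)
print([p.name for p in candidates])  # ['p1', 'p2']
```

The check is deliberately conservative. A partition is dropped only when its statistics prove that no row can match. Surviving partitions still have to be scanned row by row.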
Getting Started

  • quick: a quick overview of how to use the new table_v2(), with Python examples.
  • reference: detailed documentation of the new tabular storage in the Python SDK.
  • usage guide: a notebook with examples.

Coming soon

  • extend the API to support formats other than pyarrow IPC
  • add aggregation functions