Tabular v2
Experimental
The Ocean Data Platform (ODP) Tabular v2 is a new version of the tabular storage, based on the lessons learned from v1.
Key features:
- serialization: JSON is not very efficient for large datasets and has limitations for some data types. The new tabular storage is based on the Arrow IPC format, which is much more efficient in both data usage and speed of serialization/deserialization.
- query: OQS has a steep learning curve and is not very readable, so we switched to a more standard SQL-like query language.
- schema: the new tabular storage uses an Arrow Schema for the schema definition instead of a custom JSON schema.
- partitions: data is incrementally partitioned when ingested, without the need to specify a partition key. A custom engine then checks each query against every partition to decide whether the partition might contain matching rows, and drops the partitions that cannot.
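To make the partition pruning idea concrete, here is a minimal pure-Python sketch. It is not the ODP engine. It assumes that each partition keeps per-column min/max statistics and that the query is a simple range filter, and all names (`Partition`, `build_partitions`, `might_match`, `query_range`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Partition:
    """A chunk of ingested rows plus min/max statistics per column (assumed layout)."""
    rows: List[Dict[str, Any]]
    stats: Dict[str, Tuple[Any, Any]]  # column name -> (min, max)


def build_partitions(rows: List[Dict[str, Any]], size: int) -> List[Partition]:
    """Split rows into partitions in ingestion order; no partition key is required."""
    partitions = []
    for start in range(0, len(rows), size):
        chunk = rows[start:start + size]
        stats = {
            col: (min(r[col] for r in chunk), max(r[col] for r in chunk))
            for col in chunk[0]
        }
        partitions.append(Partition(chunk, stats))
    return partitions


def might_match(part: Partition, column: str, low: Any, high: Any) -> bool:
    """True if the partition's [min, max] range overlaps the query range [low, high]."""
    p_min, p_max = part.stats[column]
    return p_min <= high and p_max >= low


def query_range(partitions: List[Partition], column: str, low: Any, high: Any):
    """Scan only the partitions that might hold candidates; return rows and scan count."""
    result, scanned = [], 0
    for part in partitions:
        if not might_match(part, column, low, high):
            continue  # pruned: no row in this partition can satisfy the filter
        scanned += 1
        result.extend(r for r in part.rows if low <= r[column] <= high)
    return result, scanned


# Ten rows ingested in order, split into partitions of four rows each.
rows = [{"depth": d, "temp": 20.0 - 0.1 * d} for d in range(0, 100, 10)]
partitions = build_partitions(rows, size=4)

# Only the middle partition (depth 40..70) overlaps the range 40..60.
matches, scanned = query_range(partitions, "depth", 40, 60)
print(f"scanned {scanned} of {len(partitions)} partitions, found {len(matches)} rows")
```

The overlap check is deliberately conservative: it may keep a partition that turns out to contain no matching rows, but it never drops one that does, which is the property the bullet above describes.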
Getting Started
- quick: a quick overview of how to use the new table_v2(), with Python examples.
- reference: detailed documentation of the new tabular storage in the Python SDK.
- usage guide: notebook with examples
Coming soon
- expand the API to formats other than pyarrow IPC
- add aggregation functions