
Tabular Storage

Tabular Storage Key Features

Schemas:

A schema is the recipe for the data that a dataset will hold. It contains the dataset's field names, types, and partitioning information. The column information of the dataset is provided in the TableSpec format.

  • Create Schema: Easily create a schema object to represent the shape of the data that will be stored in the Tabular Storage.

  • Get Schema: Retrieve schema from the Tabular Storage.

  • Delete Schema: Manage schemas by deleting unwanted or obsolete schemas.
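To make the idea of a schema as a "recipe" concrete, here is a minimal sketch that checks a row against a schema dictionary of the shape used later on this page (`{"Data": {"type": "string"}}`). The helper `find_schema_violations` and the `PYTHON_TYPES` mapping are hypothetical and not part of the SDK; only the `"string"` type name appears on this page, and the other type names are assumptions.

```python
# Hypothetical mapping from schema type names to Python types.
# Only "string" appears on this page; the other names are assumptions.
PYTHON_TYPES = {"string": str, "long": int, "double": float}


def find_schema_violations(table_schema, row):
    """Return a list of human-readable problems found in one row."""
    problems = []
    for column, spec in table_schema.items():
        if column not in row:
            problems.append(f"missing column: {column}")
            continue
        expected = PYTHON_TYPES.get(spec["type"])
        # Unknown type names are skipped rather than rejected.
        if expected is not None and not isinstance(row[column], expected):
            problems.append(f"wrong type for column: {column}")
    for column in row:
        if column not in table_schema:
            problems.append(f"unexpected column: {column}")
    return problems


table_schema = {"Data": {"type": "string"}}
print(find_schema_violations(table_schema, {"Data": "hello"}))  # []
print(find_schema_violations(table_schema, {"Data": 42}))
```

A check like this runs locally before any data is sent, so it only illustrates the concept; the Tabular Storage itself remains the authority on what a schema accepts.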

Examples

Data Operations:

  • Select Data: Retrieve data from the Tabular Storage as a stream, a list, or a pandas DataFrame.

  • Insert Data: Insert data into the Tabular Storage.

  • Update Data: Update data in the Tabular Storage.

  • Delete Data: Delete data from the Tabular Storage.

Querying:

To query the data in the Tabular Storage, ODP provides a query language called Object Query Structure (OQS). It can be used to filter, sort, and aggregate data. For more information on OQS, please refer to the OQS documentation.

Functions

Create dataset's schema

.create_schema() example

Create Schema.

Arguments

  • resource_dto (DatasetDto): Dataset resource.
  • table_spec (TableSpec): Specifications of the schema to be created.

Returns

  • Specifications of the schema that is being created.

Raises

  • OdpResourceExistsError: If a schema with the same identifier already exists.

Example

from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo  
from odp.client.dto.table_spec import TableSpec

client = OdpClient()

# Getting Tabular dataset
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")

table_schema = {"Data": {"type": "string"}}
my_table_spec = TableSpec(table_schema=table_schema)

# create_schema returns the specification of the schema that was created
created_table_spec = client.tabular.create_schema(resource_dto=my_dataset, table_spec=my_table_spec)
print(created_table_spec)
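Because create_schema raises OdpResourceExistsError when the schema already exists, a script that may run more than once could fall back to fetching the existing schema. The following is a minimal sketch of that pattern, not the SDK's own behavior. The helper `create_or_get_schema` is hypothetical, and the `fake_*` names are stand-ins so the sketch runs without a live connection.

```python
def create_or_get_schema(create, get, exists_error):
    """Call create(); if it raises exists_error, fall back to get()."""
    try:
        return create()
    except exists_error:
        return get()


# Stand-ins for the real client calls and exception (assumptions).
class FakeExistsError(Exception):
    pass


def fake_create():
    raise FakeExistsError("schema already exists")


def fake_get():
    return {"Data": {"type": "string"}}


schema = create_or_get_schema(fake_create, fake_get, FakeExistsError)
print(schema)
```

With the real client, `create` and `get` would be small lambdas wrapping `client.tabular.create_schema(...)` and `client.tabular.get_schema(...)`, and `exists_error` would be OdpResourceExistsError. The import path of that exception is not shown on this page, so check it in the SDK before use.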

Get dataset's schema

.get_schema() example

Get schema.

Arguments

  • resource_dto (DatasetDto): Dataset resource.

Returns

  • Specifications of the schema that is being queried.

Raises

  • OdpResourceNotFoundError: If the schema cannot be found

Example

from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo  

client = OdpClient()

# Getting Tabular dataset whose schema we want to see
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")

my_table_spec = client.tabular.get_schema(resource_dto=my_dataset)
print(my_table_spec)

Delete dataset's schema

.delete_schema() example

Delete schema.

Arguments

  • resource_dto (DatasetDto): Dataset resource.
  • delete_data (Optional[bool], default=False): Whether the data should be deleted as well.

Raises

  • OdpResourceNotFoundError: If the schema cannot be found

Example

from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo  

client = OdpClient()

# Getting Tabular dataset whose schema we want to delete
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")

client.tabular.delete_schema(my_dataset)
print("Schema deleted successfully")

Get dataset's data in stream format

.select_as_stream() example

Select data from dataset as stream.

Arguments

  • resource_dto (DatasetDto): Dataset resource.
  • filter_query (Optional[dict]): Filter query in OQS format. Read more about OQS here.
  • limit (int): Limit for the number of rows returned.

Yields

  • Data that is queried as a stream.

Example

from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo  

client = OdpClient()

# Getting Tabular dataset whose data we want to see
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")

# OQS filter
filter_query = {...}

limit = 1000

data = client.tabular.select_as_stream(my_dataset, filter_query, limit)
# Consume the stream into a list so that it can be printed
print("Dataset's data:", list(data))
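The example above collects the whole stream into a list, which gives up the memory benefit of streaming. A minimal sketch of processing a stream in fixed-size batches follows. The helper `iter_batches` is hypothetical, and `fake_stream` is a stand-in for the generator returned by select_as_stream; it also assumes that each item is a row dictionary.

```python
from itertools import islice


def iter_batches(stream, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_stream():
    # Stand-in for client.tabular.select_as_stream(...) (assumption).
    for i in range(5):
        yield {"Data": f"row-{i}"}


for batch in iter_batches(fake_stream(), batch_size=2):
    print(len(batch), batch[0]["Data"])
```

Only one batch is held in memory at a time, so the same loop works regardless of how many rows the query returns.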

Get dataset's data in list format

.select_as_list() example

Select data from dataset as list.

Arguments

  • resource_dto (DatasetDto): Dataset resource.
  • filter_query (Optional[dict]): Filter query in OQS format. Read more about OQS here.
  • limit (int): Limit for the number of rows returned.

Returns

  • Data that is queried as a list.

Example

from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo  

client = OdpClient()

# Getting Tabular dataset whose data we want to see
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")

# OQS filter
filter_query = {...}

limit = 1000

data = client.tabular.select_as_list(my_dataset, filter_query, limit)
print(data)
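If the returned list holds one dictionary per row (an assumption based on the shape of the data passed to write elsewhere on this page), it can be pivoted into columns without pandas. The helper `rows_to_columns` below is a hypothetical sketch, not part of the SDK.

```python
def rows_to_columns(rows):
    """Pivot a list of row dictionaries into a dict of column lists."""
    columns = {}
    for index, row in enumerate(rows):
        for name, value in row.items():
            # A column first seen at this row is padded with None
            # for all earlier rows.
            columns.setdefault(name, [None] * index).append(value)
        for values in columns.values():
            # Columns absent from this row get a None placeholder.
            if len(values) <= index:
                values.append(None)
    return columns


rows = [{"a": 1}, {"a": 2, "b": 3}, {"b": 4}]
print(rows_to_columns(rows))
```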

Get dataset's data in DataFrame format

.select_as_dataframe() example

Select data from dataset as a pandas DataFrame.

Arguments

  • resource_dto (DatasetDto): Dataset resource.
  • filter_query (Optional[dict]): Filter query in OQS format. Read more about OQS here.

Returns

  • Data that is queried as a pandas DataFrame.

Example

from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo  
from pandas import DataFrame

client = OdpClient()

# Getting Tabular dataset whose data we want to see
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")

# OQS filter
filter_query = {...}

dataframe = client.tabular.select_as_dataframe(my_dataset, filter_query)
print(dataframe)

Upload dataset's data

.write() example

Write data to dataset.

Arguments

  • resource_dto (DatasetDto): Dataset resource.
  • data (list): Data to ingest.

Example

from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo  

client = OdpClient()

# Getting Tabular dataset to which we will write data
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")

test_data = [{...}, {...}, {...}]

client.tabular.write(resource_dto=my_dataset, data=test_data)
print("Dataset's data:", client.tabular.select_as_list(my_dataset))
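For large inputs it may be preferable to send the rows in several smaller calls instead of one. This is a minimal sketch under that assumption; this page does not state any size limit for write. The helpers `chunked` and `write_in_chunks` are hypothetical, and `received.append` stands in for a call to `client.tabular.write`.

```python
def chunked(rows, chunk_size):
    """Yield consecutive slices of rows, each at most chunk_size long."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def write_in_chunks(write, rows, chunk_size):
    """Call write(chunk) once per chunk and return the number of calls."""
    calls = 0
    for chunk in chunked(rows, chunk_size):
        write(chunk)
        calls += 1
    return calls


received = []  # Records what the stand-in writer was given.
rows = [{"Data": str(i)} for i in range(7)]
print(write_in_chunks(received.append, rows, chunk_size=3))  # 3
```

With the real client, `write` could be `lambda chunk: client.tabular.write(resource_dto=my_dataset, data=chunk)`.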

Upload dataset's data in DataFrame format

.write_dataframe() example

Write data to dataset in Pandas DataFrame format.

Arguments

  • resource_dto (DatasetDto): Dataset resource.
  • data (DataFrame): Data to ingest.

Example

from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo
from pandas import DataFrame  # needed for the DataFrame constructed below

client = OdpClient()

# Getting Tabular dataset to which we will write data
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")

test_data = DataFrame({...}, {...}, {...})

client.tabular.write_dataframe(resource_dto=my_dataset, data=test_data)
print("Dataset's data:", client.tabular.select_as_list(my_dataset))

Update dataset's data

.update() example

Update data in dataset.

Arguments

  • resource_dto (DatasetDto): Dataset resource.
  • data (list): Data to ingest.
  • filter_query (Optional[dict]): Filter query in OQS format. Read more about OQS here.

Example

from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo  

client = OdpClient()

# Getting Tabular dataset, whose data we will update
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")

data = [{...}, {...}, {...}]

filter_query = {...}

client.tabular.update(resource_dto=my_dataset, data=data, filter_query=filter_query)
print("Dataset's data:", client.tabular.select_as_list(my_dataset))

Update dataset's data in DataFrame format

.update_dataframe() example

Update data in dataset in pandas DataFrame format.

Arguments

  • resource_dto (DatasetDto): Dataset resource.
  • data (DataFrame): Data to ingest.
  • filter_query (Optional[dict]): Filter query in OQS format. Read more about OQS here.

Example

from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo  
from pandas import DataFrame

client = OdpClient()

# Getting Tabular dataset whose data we will update
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")

dataframe = DataFrame({...}, {...}, {...})

filter_query = {...}

client.tabular.update_dataframe(resource_dto=my_dataset, data=dataframe, filter_query=filter_query)
print("Dataset's data:", client.tabular.select_as_list(my_dataset))

Delete dataset's data matching OQS filter

.delete() example

Delete data from dataset.

Arguments

  • resource_dto (DatasetDto): Dataset resource.
  • filter_query (Optional[dict]): Filter query in OQS format. Read more about OQS here.

Example

from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo  

client = OdpClient()

# Getting Tabular dataset whose data we will delete
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")

# OQS filter. Only the data matching filter will be deleted. Check SDK documentation's arguments section for more information.
filter_query = {...}

client.tabular.delete(resource_dto=my_dataset, filter_query=filter_query)
print("Data deleted successfully")
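Since filter_query is optional, a script might accidentally call delete without one. The sketch below guards against that, assuming that a missing or empty filter would match more rows than intended; this page does not state what happens in that case. The helper `require_filter` is hypothetical, and the dictionary passed to it is a placeholder, not valid OQS.

```python
def require_filter(filter_query):
    """Return filter_query unchanged, or raise if it is None or empty."""
    if not filter_query:
        raise ValueError("refusing to delete without a filter query")
    return filter_query


# Placeholder dictionary for illustration only; not real OQS syntax.
checked = require_filter({"placeholder": True})
print(checked)

try:
    require_filter(None)
except ValueError as error:
    print("blocked:", error)
```

With the real client, the checked value would be passed on as `client.tabular.delete(resource_dto=my_dataset, filter_query=checked)`.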