Tabular Storage
Tabular Storage Key Features
Schemas:
A schema is the recipe for the data that will be stored in a dataset. It contains the field names, types, and partitioning information of the dataset. The column information of the dataset is provided in the TableSpec format.
- Create Schema: Easily create a schema object to represent the shape of the data that will be stored in the Tabular Storage.
- Get Schema: Retrieve schema from the Tabular Storage.
- Delete Schema: Manage schemas by deleting unwanted or obsolete schemas.
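To make the idea of a schema as a "recipe" concrete, the following is a minimal sketch that checks rows against a schema dict of the same shape used later on this page (`{"Data": {"type": "string"}}`). The `validate_row` helper and the `TYPE_MAP` table are hypothetical and not part of the SDK; only the `"string"` type name appears in this document, and the other type names are assumptions.

```python
# Hypothetical mapping from schema type names to Python types.
# Only "string" appears in this document; the other names are assumptions.
TYPE_MAP = {"string": str, "long": int, "double": float}


def validate_row(row: dict, table_schema: dict) -> list:
    """Return a list of problems found in `row`; an empty list means the row fits."""
    problems = []
    for column, spec in table_schema.items():
        if column not in row:
            problems.append(f"missing column: {column}")
            continue
        expected = TYPE_MAP.get(spec["type"])
        # Unknown type names are skipped rather than rejected.
        if expected is not None and not isinstance(row[column], expected):
            actual = type(row[column]).__name__
            problems.append(f"{column}: expected {spec['type']}, got {actual}")
    for column in row:
        if column not in table_schema:
            problems.append(f"unexpected column: {column}")
    return problems


table_schema = {"Data": {"type": "string"}}
print(validate_row({"Data": "hello"}, table_schema))
print(validate_row({"Data": 42}, table_schema))
```

A local check like this can catch mismatched rows before they are sent to the service; how the service itself reacts to such rows is not described here.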
Data Operations:
- Select Data: Retrieve data from the Tabular Storage as stream, list or Pandas DataFrame.
- Insert Data: Insert data into the Tabular Storage.
- Update Data: Update data in the Tabular Storage.
- Delete Data: Delete data from the Tabular Storage.
Querying:
To query data in the Tabular Storage, ODP provides a query language called Object Query Structure (OQS). It can be used to filter, sort, and aggregate data. For more information on OQS, please refer to the OQS documentation.
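The functions on this page take OQS filters as Python dicts. The sketch below only illustrates the general idea of a filter expressed as a nested dict; it is a toy evaluator that runs locally on a list of rows. The operator names (`#AND`, `#OR`, `#EQUALS`, `#GREATER_THAN`) and the `$column` reference convention are assumptions made for illustration and should be checked against the OQS documentation before use.

```python
def resolve(row: dict, operand):
    """Treat strings starting with '$' as column references, anything else as a literal."""
    if isinstance(operand, str) and operand.startswith("$"):
        return row[operand[1:]]
    return operand


def matches(row: dict, query: dict) -> bool:
    """Evaluate a nested filter dict against one row (toy evaluator, not the real OQS engine)."""
    (operator, operands), = query.items()  # each dict holds exactly one operator
    if operator == "#AND":
        return all(matches(row, sub) for sub in operands)
    if operator == "#OR":
        return any(matches(row, sub) for sub in operands)
    left, right = (resolve(row, operand) for operand in operands)
    if operator == "#EQUALS":
        return left == right
    if operator == "#GREATER_THAN":
        return left > right
    raise ValueError(f"unsupported operator: {operator}")


rows = [
    {"station": "A", "depth": 10},
    {"station": "B", "depth": 250},
    {"station": "A", "depth": 400},
]
# Hypothetical filter: station equals "A" and depth greater than 100.
query = {"#AND": [{"#EQUALS": ["$station", "A"]}, {"#GREATER_THAN": ["$depth", 100]}]}
selected = [row for row in rows if matches(row, query)]
print(selected)
```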
Functions
Create dataset's schema
.create_schema() example
Create Schema.
Arguments
- resource_dto (DatasetDto): Dataset resource.
- table_spec (TableSpec): Specifications of the schema to be created.
Returns
- Specifications of the schema that is being created.
Raises
- OdpResourceExistsError: If the schema already exists with the same identifier
Example
from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo
from odp.client.dto.table_spec import TableSpec
client = OdpClient()
# Getting Tabular dataset
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")
table_schema = {"Data": {"type": "string"}}
my_table_spec = TableSpec(table_schema=table_schema)
mt_table_spec = client.tabular.create_schema(resource_dto=my_dataset, table_spec=my_table_spec)
print(mt_table_spec)
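Because creating a schema raises an error when one already exists, a "get or create" pattern can be useful. The following is a minimal sketch of that pattern written against plain callables so it runs without the SDK; `get_or_create`, `NotFound`, `fake_get`, and `fake_create` are hypothetical names used only for illustration.

```python
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def get_or_create(get: Callable[[], T], create: Callable[[], T], not_found: Type[Exception]) -> T:
    """Return the existing resource, creating it only when `get` raises `not_found`."""
    try:
        return get()
    except not_found:
        return create()


# Stand-ins so the sketch runs without the SDK.
class NotFound(Exception):
    pass


store = {}


def fake_get():
    if "schema" not in store:
        raise NotFound("no schema yet")
    return store["schema"]


def fake_create():
    store["schema"] = {"Data": {"type": "string"}}
    return store["schema"]


first = get_or_create(fake_get, fake_create, NotFound)   # nothing stored yet, so this creates
second = get_or_create(fake_get, fake_create, NotFound)  # found, so this reuses the stored schema
print(first, second is first)
```

With the SDK, `get` and `create` could be lambdas wrapping `client.tabular.get_schema(resource_dto=my_dataset)` and `client.tabular.create_schema(resource_dto=my_dataset, table_spec=my_table_spec)`, with `OdpResourceNotFoundError` passed as `not_found`. The import path of that exception is not shown on this page.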
Get dataset's schema
.get_schema() example
Get schema.
Arguments
- resource_dto (DatasetDto): Dataset resource.
Returns
- Specifications of the schema that is being queried.
Raises
- OdpResourceNotFoundError: If the schema cannot be found
Example
from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo
client = OdpClient()
# Getting Tabular dataset whose schema we want to see
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")
mt_table_spec = client.tabular.get_schema(resource_dto=my_dataset)
print(mt_table_spec)
Delete dataset's schema
.delete_schema() example
Delete schema.
Arguments
- resource_dto (DatasetDto): Dataset resource.
- delete_data (Optional[bool], default=False): Whether the data should be deleted as well.
Raises
- OdpResourceNotFoundError: If the schema cannot be found
Example
from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo
client = OdpClient()
# Getting Tabular dataset whose schema we want to delete
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")
client.tabular.delete_schema(my_dataset)
print("Schema deleted successfully")
Get dataset's data in stream format
.select_as_stream() example
Select data from dataset as stream.
Arguments
- resource_dto (DatasetDto): Dataset resource.
- filter_query (Optional[dict]): Filter query in OQS format. See the OQS documentation for details.
- limit (int): Limit for the number of rows returned.
Yields
- Data that is queried as a stream.
Example
from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo
client = OdpClient()
# Getting Tabular dataset whose data we want to see
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")
# OQS filter
filter_query = {...}
limit = 1000
data = client.tabular.select_as_stream(my_dataset, filter_query, limit)
print("Dataset's data:", f"{[datapoint for datapoint in data]}")
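The example above collects the whole stream into a list, which gives up the main benefit of streaming. As a sketch of lazy consumption, the helper below processes a stream in fixed-size chunks using `itertools.islice`. `fake_stream` is a stand-in generator and `in_chunks` is a hypothetical helper; neither is part of the SDK.

```python
from itertools import islice


def fake_stream(n):
    """Stand-in for a streaming select: yields rows one at a time."""
    for i in range(n):
        yield {"Data": f"row-{i}"}


def in_chunks(stream, size):
    """Yield lists of up to `size` rows without materializing the whole stream."""
    iterator = iter(stream)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


chunk_sizes = [len(chunk) for chunk in in_chunks(fake_stream(7), 3)]
print(chunk_sizes)  # [3, 3, 1]
```

The same helper should work on any iterable, so the result of `client.tabular.select_as_stream(...)` could be passed in place of `fake_stream(7)`, assuming it behaves like an ordinary Python generator.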
Get dataset's data in list format
.select_as_list() example
Select data from dataset as list.
Arguments
- resource_dto (DatasetDto): Dataset resource.
- filter_query (Optional[dict]): Filter query in OQS format. See the OQS documentation for details.
- limit (int): Limit for the number of rows returned.
Returns
- Data that is queried as a list.
Example
from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo
client = OdpClient()
# Getting Tabular dataset whose data we want to see
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")
# OQS filter
filter_query = {...}
limit = 1000
data = client.tabular.select_as_list(my_dataset, filter_query, limit)
print(data)
Get dataset's data in DataFrame format
.select_as_dataframe() example
Select data from dataset as a Pandas DataFrame.
Arguments
- resource_dto (DatasetDto): Dataset resource.
- filter_query (Optional[dict]): Filter query in OQS format. See the OQS documentation for details.
Returns
- Data that is queried as a Pandas DataFrame.
Example
from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo
from pandas import DataFrame
client = OdpClient()
# Getting Tabular dataset whose data we want to see
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")
# OQS filter
filter_query = {...}
dataframe = client.tabular.select_as_dataframe(my_dataset, filter_query)
print(dataframe)
Upload dataset's data
.write() example
Write data to dataset.
Arguments
- resource_dto (DatasetDto): Dataset resource.
- data (list): Data to ingest.
Example
from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo
client = OdpClient()
# Getting Tabular dataset to which we will write data
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")
test_data = [{...}, {...}, {...}]
client.tabular.write(resource_dto=my_dataset, data=test_data)
print("Dataset's data:", client.tabular.select_as_list(my_dataset))
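For large inputs it can be convenient to split the rows into batches and write each batch separately. The following is a minimal sketch of that idea; `write_in_batches` is a hypothetical helper, and `written.extend` stands in for the real write call. Whether the SDK already batches internally is not stated on this page.

```python
def write_in_batches(write, rows, batch_size):
    """Call `write(batch)` for consecutive slices of `rows`; return the number of calls made."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    calls = 0
    for start in range(0, len(rows), batch_size):
        write(rows[start:start + batch_size])
        calls += 1
    return calls


# Stand-in target: collect everything that would have been written.
written = []
rows = [{"Data": str(i)} for i in range(5)]
calls = write_in_batches(written.extend, rows, batch_size=2)
print(calls, len(written))  # 3 5
```

With the SDK, the `write` argument could be `lambda batch: client.tabular.write(resource_dto=my_dataset, data=batch)`.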
Upload dataset's data in DataFrame format
.write_dataframe() example
Write data to dataset in Pandas DataFrame format.
Arguments
- resource_dto (DatasetDto): Dataset resource.
- data (DataFrame): Data to ingest.
Example
from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo
from pandas import DataFrame
client = OdpClient()
# Getting Tabular dataset to which we will write data
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")
# DataFrame built from a list of row dicts
test_data = DataFrame([{...}, {...}, {...}])
client.tabular.write_dataframe(resource_dto=my_dataset, data=test_data)
print("Dataset's data:", client.tabular.select_as_list(my_dataset))
Update dataset's data
.update() example
Update data in dataset.
Arguments
- resource_dto (DatasetDto): Dataset resource.
- data (list): Data to ingest.
- filter_query (Optional[dict]): Filter query in OQS format. See the OQS documentation for details.
Example
from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo
client = OdpClient()
# Getting Tabular dataset, whose data we will update
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")
data = [{...}, {...}, {...}]
filter_query = {...}
client.tabular.update(resource_dto=my_dataset, data=data, filter_query=filter_query)
print("Dataset's data:", client.tabular.select_as_list(my_dataset))
Update dataset's data in DataFrame format
.update_dataframe() example
Update data in dataset in Pandas DataFrame format.
Arguments
- resource_dto (DatasetDto): Dataset resource.
- data (DataFrame): Data to ingest.
- filter_query (Optional[dict]): Filter query in OQS format. See the OQS documentation for details.
Example
from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo
from pandas import DataFrame
client = OdpClient()
# Getting Tabular dataset whose data we will update
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")
dataframe = DataFrame([{...}, {...}, {...}])
filter_query = {...}
client.tabular.update_dataframe(resource_dto=my_dataset, data=dataframe, filter_query=filter_query)
print("Dataset's data:", client.tabular.select_as_list(my_dataset))
Delete dataset's data matching OQS filter
.delete() example
Delete data from dataset.
Arguments
- resource_dto (DatasetDto): Dataset resource.
- filter_query (Optional[dict]): Filter query in OQS format. See the OQS documentation for details.
Example
from odp.client import OdpClient
from odp.dto import Metadata
from odp.dto.catalog import DatasetDto, DatasetSpec
from odp.dto.common.contact_info import ContactInfo
client = OdpClient()
# Getting Tabular dataset whose data we will delete
my_dataset = client.catalog.get("3d797de8-f4ec-48a5-b211-cae1bcfa432c")
# OQS filter. Only the data matching filter will be deleted. Check SDK documentation's arguments section for more information.
filter_query = {...}
client.tabular.delete(resource_dto=my_dataset, filter_query=filter_query)
print("Data deleted successfully")
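Since a delete removes every row matching the filter, it can help to check how many rows match before deleting. The following is a minimal sketch of such a guard written against plain callables; `delete_if_small`, `fake_select`, and `fake_delete` are hypothetical, and the stand-in filter is a plain value rather than an OQS query.

```python
def delete_if_small(select, delete, filter_query, max_rows):
    """Delete rows matching `filter_query` only if at most `max_rows` rows match."""
    matching = len(select(filter_query))
    if matching > max_rows:
        raise RuntimeError(f"{matching} rows match, limit is {max_rows}; nothing deleted")
    delete(filter_query)
    return matching


# Stand-ins so the sketch runs without the SDK; the "filter" here is just a value to compare.
table = [{"Data": "a"}, {"Data": "a"}, {"Data": "b"}]


def fake_select(value):
    return [row for row in table if row["Data"] == value]


def fake_delete(value):
    table[:] = [row for row in table if row["Data"] != value]


deleted = delete_if_small(fake_select, fake_delete, "a", max_rows=5)
print(deleted, table)
```

With the SDK, `select` and `delete` could wrap `client.tabular.select_as_list` and `client.tabular.delete` for the same dataset. Rows written between the check and the delete would not be counted, so the guard is a safety net rather than a guarantee.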