Concepts

Before we proceed, there are some key concepts in our platform that you should be aware of. Let's go through the most important ones.

Catalog

The catalog is the part of our platform that organizes our data.

Collections

Data often comes in several parts. Associating related parts with each other in a collection makes them easier to find and manage.

Tip

Think of collections as folders.
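To make the folder analogy concrete, here is a minimal sketch in plain Python. It does not use the platform or the SDK; the collection and dataset names are invented for illustration, and `find_collection` is a hypothetical helper.

```python
# Hypothetical catalog: each collection name maps to the datasets it groups,
# much like a folder maps to the files it contains.
catalog = {
    "north-sea-survey": ["ctd-profiles", "seabed-images"],
    "satellite-products": ["sea-surface-temperature"],
}


def find_collection(catalog, dataset_name):
    """Return the name of the collection that holds a dataset, or None."""
    for collection, datasets in catalog.items():
        if dataset_name in datasets:
            return collection
    return None


print(find_collection(catalog, "seabed-images"))
```

Grouping related datasets this way means you can look for one collection and find everything that belongs together.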

Datasets

Datasets hold the actual data, and they take different shapes. Some are images, such as satellite imagery; others are tabular data organized in rows and columns. These differences are called storage classes.

Storage Classes

Storage classes are different ways to store data.

Note

When a new dataset is created, our platform needs to know which storage class the dataset uses, so that it is prepared for the data about to be uploaded.

Raw storage

A dataset that is set up for raw storage can accept anything you can store as a file on your computer.

Tabular storage

Tabular data is organized in rows and columns, which makes it well suited to querying.
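As a rough illustration of why rows and columns lend themselves to queries, the sketch below filters rows and selects columns in plain Python. It is not the platform's query interface; the column names, the values, and the `query` helper are all hypothetical.

```python
# Hypothetical observations; every row has the same columns.
rows = [
    {"station": "A", "depth_m": 5, "temperature_c": 11.2},
    {"station": "A", "depth_m": 50, "temperature_c": 7.9},
    {"station": "B", "depth_m": 5, "temperature_c": 12.4},
]


def query(rows, predicate, columns):
    """Return the selected columns of every row that satisfies the predicate."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


# Keep only shallow measurements, and only two of the three columns.
shallow = query(rows, lambda r: r["depth_m"] < 10, ["station", "temperature_c"])
print(shallow)
```

Because every row shares the same columns, a single condition can be applied uniformly to the whole dataset.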

Gridded storage - Coming soon

Gridded storage divides areas into square cells instead of representing them as points, and allows for more dimensions than rows and columns, such as time and place.
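Since gridded storage is not yet available, the following is only a conceptual sketch of gridded data in plain Python, not a preview of the platform's format. The axes, the values, and the helper functions are hypothetical.

```python
# Hypothetical grid: 2 time steps x 2 latitude cells x 3 longitude cells.
times = ["2024-01-01", "2024-01-02"]
lats = [60.0, 61.0]
lons = [4.0, 5.0, 6.0]

# grid[t][i][j] holds the value for time step t in cell (lats[i], lons[j]).
grid = [
    [[7.1, 7.3, 7.0], [6.8, 6.9, 6.7]],
    [[7.2, 7.4, 7.1], [6.9, 7.0, 6.8]],
]


def value_at(time, lat, lon):
    """Look up one cell by its coordinates rather than by a row number."""
    return grid[times.index(time)][lats.index(lat)][lons.index(lon)]


def time_series(lat, lon):
    """Return the values of one cell across the time dimension."""
    i, j = lats.index(lat), lons.index(lon)
    return [grid[t][i][j] for t in range(len(times))]


print(value_at("2024-01-02", 61.0, 5.0))
print(time_series(60.0, 6.0))
```

Each value is addressed by its position along several axes at once, which is what distinguishes a grid from a table of rows.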

Metadata

Metadata describes data and adds meaning to it. It is useful both for our users and for our platform, which reads the metadata to understand how to handle the data. This is also where the storage class is defined when creating a new dataset.

Data Transfer Object (DTO)

For our platform to understand metadata, it has to be written in a certain structured way. Data transfer objects are used in our API and SDK to communicate with our platform.

You don't have to handle Data Transfer Objects yourself when interacting with the Ocean Data Platform, but they are helpful when using the SDK.
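To show the general idea of a DTO independently of the SDK, here is a minimal sketch using a standard-library dataclass. `ExampleDatasetDto` is a hypothetical class and is not part of the SDK; it only illustrates how a structured object can check its fields before anything is sent.

```python
from dataclasses import asdict, dataclass


@dataclass
class ExampleDatasetDto:
    """Hypothetical DTO used for illustration only; not an SDK class."""

    kind: str
    version: str
    name: str
    storage_class: str

    def __post_init__(self):
        # Reject missing or malformed fields before the object is used.
        for field_name, value in asdict(self).items():
            if not isinstance(value, str) or not value:
                raise ValueError(f"{field_name} must be a non-empty string")


dto = ExampleDatasetDto(
    kind="catalog.hubocean.io/dataset",
    version="v1alpha3",
    name="sdk-raw-example",
    storage_class="registry.hubocean.io/storageClass/raw",
)

# A plain dictionary like this is what would typically be serialized and sent.
payload = asdict(dto)
print(payload)
```

Validating at construction time means mistakes surface in your own code rather than as a rejected request.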

Here's an example of a bare-bones Data Transfer Object defined with our Python SDK for raw storage. More fields are available, but these are the required ones. The code creates the metadata DTO for a new raw storage dataset and sends it to the platform.

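# This snippet assumes that ResourceDto and DatasetSpec have been imported
# from the SDK and that `client` is an authenticated SDK client (not shown).
#
# When passing metadata to the ResourceDto class,
# it checks that the fields are correct for that data type.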
my_dataset = ResourceDto(
    **{
        "kind": "catalog.hubocean.io/dataset",  # It's a dataset
        "version": "v1alpha3",
        "metadata": {
            "name": "sdk-raw-example",  # Computer friendly dataset name
        },
        "spec": DatasetSpec(
            # I want this to be a raw dataset.
            storage_class="registry.hubocean.io/storageClass/raw",
            # Which controller to use
            storage_controller="registry.hubocean.io/storageController/storage-raw-cdffs",

            maintainer={
                # Who I am
                "contact": "Just Me <raw_client_example@hubocean.earth>"  # <- Strict syntax here
            },
        ),
    }
)

# The DTO is then sent to the platform with the create function of the catalog.
my_dataset = client.catalog.create(my_dataset)

In the next part we will start finding data ...