Quickstart Guide for Consuming the API from a Jupyter Notebook

Welcome to the quickstart guide for consuming our low-level API from our Jupyter Notebook environment. This guide walks you through making API requests, with examples in Python. The notebook environment sits close to our data, which makes it convenient for data analysis and exploration.

In this guide we will cover the following topics:

Fetching our token and setting headers.
Listing available data collections.
Creating a data collection.
Creating a dataset.
Creating a schema.
Uploading data.
Querying data.
Deleting a dataset.
Deleting a data collection.

To get started, please access our Jupyter Notebook environment by clicking on the following link: Jupyter Notebook Environment

Import the requests library and define variables

python

import requests

base_url = "https://api.hubocean.earth"
dataset_collection_name = "my-test-collection"  # Use the existing name or make-your-own-computer-friendly-name
dataset_name = "my-seahorses"  # Use the existing name or make-your-own-computer-friendly-name
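
The guide asks for a "computer-friendly" name without spelling out the rules. As a rough sketch, assuming such names consist of lowercase ASCII letters, digits and hyphens (an assumption made here, not a documented API rule), a hypothetical helper `to_friendly_name` could derive one from a display name:

```python
import re


def to_friendly_name(display_name: str) -> str:
    """Turn a human-readable title into a lowercase, hyphen-separated name.

    Assumption: the allowed character set (a-z, 0-9, "-") is a guess made
    for illustration; check the API documentation for the actual rules.
    """
    # Lowercase, then replace every run of other characters with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", display_name.lower())
    # Drop hyphens left over at either end.
    return slug.strip("-")


print(to_friendly_name("My Test Collection"))  # my-test-collection
print(to_friendly_name("My Seahorses!"))       # my-seahorses
```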

Get a token and set headers

Fetching a token is the first step before making any API request, because the token authenticates your requests. It is valid for 24 hours, after which you need to request a new one. Because you are already logged in to the workspace, the token can be fetched from the workspace's localhost endpoint on our server. The code below stores it in a variable called token and uses it to build the headers variable.

python

token = requests.post("http://localhost:8000/access_token").json()['token']
headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}
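
Since the token expires after 24 hours, a long-running notebook may want to refresh it automatically. The following is a minimal sketch of one way to do that, not part of the API itself: `TokenCache`, its `margin` parameter and the injectable `clock` are hypothetical names introduced here, and the function that actually fetches the token is passed in by the caller.

```python
import time
from typing import Callable, Optional

TOKEN_LIFETIME_SECONDS = 24 * 60 * 60  # the guide states a 24-hour validity


class TokenCache:
    """Cache a token and fetch a new one shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], str],
        lifetime: float = TOKEN_LIFETIME_SECONDS,
        margin: float = 60.0,  # hypothetical safety margin, in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._lifetime = lifetime
        self._margin = margin
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def headers(self) -> dict:
        """Return request headers, refreshing the token when it is stale."""
        age = self._clock() - self._fetched_at
        if self._token is None or age >= self._lifetime - self._margin:
            self._token = self._fetch_token()
            self._fetched_at = self._clock()
        return {
            "Authorization": f"Bearer {self._token}",
            "Content-Type": "application/json",
        }
```

In the notebook, `fetch_token` could wrap the same call used above, for example `lambda: requests.post("http://localhost:8000/access_token").json()["token"]`, and each request would then pass `headers=cache.headers()`.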

List available data collections

By using the /catalog/list endpoint and defining what we are looking for in our selector, we can get a list of all the collections of datasets available to us.

python

endpoint = "/catalog/list"
body = {
    "#EQUALS": [
        "$kind",
        "catalog.hubocean.io/dataCollection"
    ]
}

url = base_url + endpoint
response = requests.post(url, json=body, headers=headers)

if response.status_code == 200:
    json_response = response.json()
    print(json_response)

else:
    print(f"Request failed with status code {response.status_code} - {response.text}")

Should respond with:

json

{
  'results': [
    ..lots
    of
    collections..
  ]
}
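
A full list response can be long, so it may help to pull out just the names. The sketch below assumes every entry under `results` carries a `metadata.name` field, as the dataset listing later in this guide does; `resource_names` and the sample data are made up for illustration.

```python
def resource_names(list_response: dict) -> list:
    """Collect metadata.name from every entry under 'results'."""
    names = []
    for item in list_response.get("results", []):
        name = item.get("metadata", {}).get("name")
        if name is not None:
            names.append(name)
    return names


# Made-up sample shaped like the list responses in this guide.
sample = {
    "results": [
        {"kind": "catalog.hubocean.io/dataCollection",
         "metadata": {"name": "my-test-collection"}},
        {"kind": "catalog.hubocean.io/dataCollection",
         "metadata": {"name": "another-collection"}},
    ]
}
print(resource_names(sample))  # ['my-test-collection', 'another-collection']
```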

Create data collection

We'll define our own data collection. Let's use the computer-friendly name we defined earlier and set the other information we want to store as metadata.

python

# Create data collection

endpoint = "/catalog"
body = {
    "kind": "catalog.hubocean.io/dataCollection",
    "version": "v1alpha1",
    "metadata": {
        "name": dataset_collection_name,
        "display_name": "My Test Collection",
        "description": "A test data collection, containing datasets that i want to interact with.",
        "labels": {
            "hubocean.io/test": "true"
        }
    },
    "spec": {
        "distribution": {
            "published_by": {
                "contact": "LastName, FirstName <mail@address.earth>",
                "organisation": "HUB Ocean"
            },
            "published_date": "2019-06-19T06:00:00",
            "website": "https://hubocean.earth",
            "license": {
                "name": "proprietary",
                "full_text": "This is a very strict legal text describing the data license for this data collection. The lawyer would be proud.",
                "href": "www.license.com"
            }
        },
        "tags": ["test", "hubocean"]
    }
}

url = base_url + endpoint
response = requests.post(url, json=body, headers=headers)

if response.status_code == 200:
    json_response = response.json()
    print(json_response)

else:
    print(f"Request failed with status code {response.status_code} - {response.text}")

Should respond with the information you provided on creation:

json

{
  'kind': 'catalog.hubocean.io/dataCollection',
  'version': 'v1alpha1',
  'metadata': {
    'name': 'my-test-collection',
    'display_name': 'My Test Collection',
    'description': 'A test data collection, containing datasets that i want to interact with.',
    'uuid': '686dbb41-d9cb-47ec-8b11-57ba893e7706',
    'labels': {
      'hubocean.io/test': 'true'
    },
    'owner': '2883db88-4205-488a-aea6-6273ccbaad87'
  },
  'status': {
    'num_updates': 0,
    'created_time': '2023-08-09T08:48:27.595433',
    'created_by': '2883db88-4205-488a-aea6-6273ccbaad87',
    'updated_time': '2023-08-09T08:48:27.595433',
    'updated_by': '2883db88-4205-488a-aea6-6273ccbaad87',
    'deleted_time': None,
    'deleted_by': None
  },
  'spec': {
    'distribution': {
      'published_by': {
        'contact': 'LastName, FirstName <mail@address.earth>',
        'organisation': 'HUB Ocean'
      },
      'published_date': '2019-06-19T06:00:00',
      'website': 'https://hubocean.earth',
      'license': {
        'name': 'proprietary',
        'href': 'www.license.com',
        'full_text': 'This is a very strict legal text describing the data license for this data collection. The lawyer would be proud.'
      }
    },
    'tags': [
      'hubocean',
      'test'
    ]
  }
}

Get data collection

Did it work? Let's fetch the collection to check. The URL is built from the resource group, the resource type and the name of the data collection.

python

resource_group = "catalog.hubocean.io"
resource_type = "dataCollection"

endpoint = f"/catalog/{resource_group}/{resource_type}/{dataset_collection_name}"
url = base_url + endpoint
response = requests.get(url, headers=headers)

if response.status_code == 200:
    json_response = response.json()
    print(json_response)

else:
    print(f"Request failed with status code {response.status_code} - {response.text}")

If it worked, the data collection object should be returned again.

json

{
  'kind': 'catalog.hubocean.io/dataCollection',
  'version': 'v1alpha1',
  'metadata': {
    'name': 'my-test-collection',
    'display_name': 'My Test Collection',
    'description': 'A test data collection, containing datasets that i want to interact with.',
    'uuid': '6eb61f98-3a68-489f-9d86-5e0cb6a6da17',
    'labels': {
      'hubocean.io/test': 'true'
    },
    'owner': '2883db88-4205-488a-aea6-6273ccbaad87'
  },
  'status': {
    'num_updates': 0,
    'created_time': '2023-08-10T10:24:19.044762',
    'created_by': '2883db88-4205-488a-aea6-6273ccbaad87',
    'updated_time': '2023-08-10T10:24:19.044762',
    'updated_by': '2883db88-4205-488a-aea6-6273ccbaad87',
    'deleted_time': None,
    'deleted_by': None
  },
  'spec': {
    'distribution': {
      'published_by': {
        'contact': 'LastName, FirstName <mail@address.earth>',
        'organisation': 'HUB Ocean'
      },
      'published_date': '2019-06-19T06:00:00',
      'website': 'https://hubocean.earth',
      'license': {
        'name': 'proprietary',
        'href': 'www.license.com',
        'full_text': 'This is a very strict legal text describing the data license for this data collection. The lawyer would be proud.'
      }
    },
    'tags': [
      'hubocean',
      'test'
    ]
  }
}
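
The same URL pattern, `/catalog/{resource_group}/{resource_type}/{name}`, is used again when deleting resources. As a small sketch, a hypothetical helper `catalog_url` could build it in one place; percent-encoding the name is a precaution added here, not something the guide requires.

```python
from urllib.parse import quote


def catalog_url(base_url: str, resource_group: str, resource_type: str, name: str) -> str:
    """Build '<base_url>/catalog/<resource_group>/<resource_type>/<name>'."""
    return "/".join([
        base_url.rstrip("/"),  # tolerate a trailing slash on the base URL
        "catalog",
        resource_group,
        resource_type,
        quote(name, safe=""),  # percent-encode characters that are unsafe in a path
    ])


print(catalog_url(
    "https://api.hubocean.earth",
    "catalog.hubocean.io",
    "dataCollection",
    "my-test-collection",
))
# https://api.hubocean.earth/catalog/catalog.hubocean.io/dataCollection/my-test-collection
```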

Create a dataset within the new collection

A collection keeps our datasets organized, much like a folder. Let's create a new, empty dataset inside the collection we just made, using the dataset name we defined earlier and specifying the collection it should be stored in.

python

# Create dataset inside collection

endpoint = "/catalog"
body = {
    "kind": "catalog.hubocean.io/dataset",
    "version": "v1alpha3",
    "metadata": {
        "name": dataset_name,
        "display_name": "My Seahorses",
        "description": "Testing seahorses",
        "labels": {
            "hubocean.io/test": "true"
        }
    },
    "spec": {
        "data_collection": f"catalog.hubocean.io/dataCollection/{dataset_collection_name}",
        "storage_class": "registry.hubocean.io/storageClass/tabular",
        "storage_controller": "registry.hubocean.io/storageController/storage-tabular",
        "maintainer": {
            "contact": "LastName, FirstName <mail@address.earth>",
            "organisation": "HUB Ocean"
        }
    }
}

url = base_url + endpoint
response = requests.post(url, json=body, headers=headers)

if response.status_code == 200:
    json_response = response.json()
    print(json_response)

else:
    print(f"Request failed with status code {response.status_code} - {response.text}")

If it worked, the response should be the dataset object we defined:

json

{
  'kind': 'catalog.hubocean.io/dataset',
  'version': 'v1alpha3',
  'metadata': {
    'name': 'my-seahorses',
    'display_name': 'My Seahorses',
    'description': 'Testing seahorses',
    'uuid': '5109dfca-89c6-4bd7-8378-07c764ad7ba2',
    'labels': {
      'hubocean.io/test': 'true'
    },
    'owner': '2883db88-4205-488a-aea6-6273ccbaad87'
  },
  'status': {
    'num_updates': 0,
    'created_time': '2023-08-10T10:31:28.212523',
    'created_by': '2883db88-4205-488a-aea6-6273ccbaad87',
    'updated_time': '2023-08-10T10:31:28.212523',
    'updated_by': '2883db88-4205-488a-aea6-6273ccbaad87',
    'deleted_time': None,
    'deleted_by': None
  },
  'spec': {
    'storage_class': 'registry.hubocean.io/storageClass/tabular',
    'storage_controller': 'registry.hubocean.io/storageController/storage-tabular',
    'data_collection': 'catalog.hubocean.io/dataCollection/my-test-collection',
    'maintainer': {
      'contact': 'LastName, FirstName <mail@address.earth>',
      'organisation': 'HUB Ocean'
    },
    'citation': None,
    'documentation': [],
    'attributes': [],
    'tags': []
  }
}
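
To confirm that a returned dataset really points at the intended collection, one could compare its `spec.data_collection` field with the reference string used in the request. This is only a sketch; `collection_reference` and `belongs_to` are hypothetical helper names, and the sample dictionary is a trimmed copy of the response shape shown above.

```python
def collection_reference(collection_name: str) -> str:
    """Build the fully qualified reference used in spec.data_collection."""
    return f"catalog.hubocean.io/dataCollection/{collection_name}"


def belongs_to(dataset: dict, collection_name: str) -> bool:
    """Return True when the dataset's spec points at the given collection."""
    actual = dataset.get("spec", {}).get("data_collection")
    return actual == collection_reference(collection_name)


# Trimmed sample in the shape of the create-dataset response.
dataset = {
    "kind": "catalog.hubocean.io/dataset",
    "metadata": {"name": "my-seahorses"},
    "spec": {
        "data_collection": "catalog.hubocean.io/dataCollection/my-test-collection",
    },
}
print(belongs_to(dataset, "my-test-collection"))  # True
print(belongs_to(dataset, "other-collection"))    # False
```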

List datasets within a collection

Let's list all the datasets within our collection. We can do this by using the collection name we defined earlier.

python

# List datasets within collection

endpoint = "/catalog/list"
body = {
    "selectors": [
        {"kind": "catalog.hubocean.io/dataset"},
        {
            "path": {
                "spec.data_collection": f"catalog.hubocean.io/dataCollection/{dataset_collection_name}"
            }
        }
    ]
}

url = base_url + endpoint
response = requests.post(url, json=body, headers=headers)

if response.status_code == 200:
    json_response = response.json()
    print(json_response)

else:
    print(f"Request failed with status code {response.status_code} - {response.text}")

Should respond with something like this:

json

{
  'results': [
    {
      'kind': 'catalog.hubocean.io/dataset',
      'version': 'v1alpha3',
      'metadata': {
        'name': 'my-seahorses',
        'display_name': 'My Seahorses',
        'description': 'Testing seahorses',
        'uuid': '5109dfca-89c6-4bd7-8378-07c764ad7ba2',
        'labels': {
          'hubocean.io/test': 'true'
        },
        'owner': '2883db88-4205-488a-aea6-6273ccbaad87'
      },
      'status': {
        'num_updates': 0,
        'created_time': '2023-08-10T10:31:28.212523',
        'created_by': '2883db88-4205-488a-aea6-6273ccbaad87',
        'updated_time': '2023-08-10T10:31:28.212523',
        'updated_by': '2883db88-4205-488a-aea6-6273ccbaad87',
        'deleted_time': None,
        'deleted_by': None
      },
      'spec': {
        'storage_class': 'registry.hubocean.io/storageClass/tabular',
        'storage_controller': 'registry.hubocean.io/storageController/storage-tabular',
        'data_collection': 'catalog.hubocean.io/dataCollection/my-test-collection',
        'maintainer': {
          'contact': 'LastName, FirstName <mail@address.earth>',
          'organisation': 'HUB Ocean'
        },
        'citation': None,
        'documentation': [],
        'attributes': [],
        'tags': []
      }
    }
  ],
  'prev': None,
  'next': None,
  'num_results': 1
}

Create a schema for the new dataset

Now we need to define what our data will look like when we upload it. For each "column" in our dataset, we define its name and its data type.

python

# Create table schema

kind = "catalog.hubocean.io/dataset"

endpoint = f"/data/{kind}/{dataset_name}/schema"
body = {
    "table_schema": {
        "Name": {
            "type": "string"
        },
        "Recommended": {
            "type": "bool"
        },
        "Rating": {
            "type": "int"
        },
        "Station": {
            "type": "string"
        },
        "ObservationDate": {
            "type": "string"
        },
        "Location": {
            "type": "geometry"
        }
    },
    "table_description": "Seahorses dataset with geospatial filtering",
    "geospatial_partition_columns": [
        "Location"
    ],
    "geospatial_partition_hash_precision": 5,
    "table_metadata": {
        "geometry": {
            "primary_location": "Location"
        }
    }
}

url = base_url + endpoint
response = requests.post(url, json=body, headers=headers)

if response.status_code == 200:
    json_response = response.json()
    print(json_response)

else:
    print(f"Request failed with status code {response.status_code} - {response.text}")

Should respond with:

json

{
  'status': 'OK',
  'message': 'Schema created.'
}
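
Writing out `{"type": ...}` for every column gets repetitive. As an illustrative sketch, a hypothetical helper `table_schema` could expand a compact column-to-type mapping into the nested form used in the request body above. The type names are the ones that appear in this guide; no other types are assumed to exist.

```python
def table_schema(columns: dict) -> dict:
    """Expand {"Name": "string"} into {"Name": {"type": "string"}}."""
    return {name: {"type": type_name} for name, type_name in columns.items()}


schema = table_schema({
    "Name": "string",
    "Recommended": "bool",
    "Rating": "int",
    "Station": "string",
    "ObservationDate": "string",
    "Location": "geometry",
})
print(schema["Location"])  # {'type': 'geometry'}
```

The result can be used as the value of `"table_schema"` in the request body, with the remaining keys written out as before.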

Get schema

Let's fetch the newly created schema to see what it looks like.

python

# Get Schema by dataset name

kind = "catalog.hubocean.io/dataset"

endpoint = f"/data/{kind}/{dataset_name}/schema"

url = base_url + endpoint
response = requests.get(url, headers=headers)

if response.status_code == 200:
    json_response = response.json()
    print(json_response)

else:
    print(f"Request failed with status code {response.status_code} - {response.text}")

Should respond with:

json

{
  'Name': {
    'type': 'string',
    'metadata': {},
    'nullable': True
  },
  'Recommended': {
    'type': 'bool',
    'metadata': {},
    'nullable': True
  },
  'Rating': {
    'type': 'int',
    'metadata': {},
    'nullable': True
  },
  'Station': {
    'type': 'string',
    'metadata': {},
    'nullable': True
  },
  'ObservationDate': {
    'type': 'string',
    'metadata': {},
    'nullable': True
  },
  'Location': {
    'type': 'geometry',
    'metadata': {},
    'nullable': True
  }
}
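
The returned schema adds `metadata` and `nullable` to each column, so it will not compare equal to what was sent. One way to check that the column types survived the round trip, sketched here with hypothetical helpers `column_types` and `schema_differences`, is to reduce both sides to a column-to-type mapping first.

```python
def column_types(schema: dict) -> dict:
    """Reduce a schema to {column name: type name}, ignoring other keys."""
    return {name: spec["type"] for name, spec in schema.items()}


def schema_differences(requested: dict, returned: dict) -> dict:
    """Map each mismatching column to (requested type, returned type).

    None stands in for a column that is missing on one side.
    """
    want, got = column_types(requested), column_types(returned)
    return {
        name: (want.get(name), got.get(name))
        for name in sorted(set(want) | set(got))
        if want.get(name) != got.get(name)
    }


requested = {"Name": {"type": "string"}, "Rating": {"type": "int"}}
returned = {
    "Name": {"type": "string", "metadata": {}, "nullable": True},
    "Rating": {"type": "int", "metadata": {}, "nullable": True},
}
print(schema_differences(requested, returned))  # {}
```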

Add datapoints to the dataset

This is where we actually add data to the dataset. We need to provide the data in a specific format, which is defined by the schema we created earlier.

python

# Create datapoints


kind = "catalog.hubocean.io/dataset"

endpoint = f"/data/{kind}/{dataset_name}"

body = {"data": [
    {
        "Name": "Gustav",
        "Recommended": True,
        "Rating": 5,
        "Station": "North Pole Station",
        "ObservationDate": "2015-09-01",
        "Location": {
            "coordinates": [
                4.887465709902727,
                59.32141637236472
            ],
            "type": "Point"
        }
    },
    {
        "Name": "Aurora",
        "Recommended": True,
        "Rating": 3,
        "Station": "Coral Cove",
        "ObservationDate": "2022-06-15",
        "Location": {
            "coordinates": [
                -87.630489,
                23.016650
            ],
            "type": "Point"
        }
    },
    {
        "Name": "Marlin",
        "Recommended": False,
        "Rating": 4,
        "Station": "Seagrass Haven",
        "ObservationDate": "2018-07-22",
        "Location": {
            "coordinates": [
                -80.104475,
                25.023989
            ],
            "type": "Point"
        }
    },
    {
        "Name": "Neptune",
        "Recommended": True,
        "Rating": 4,
        "Station": "Underwater Canyon",
        "ObservationDate": "2017-08-08",
        "Location": {
            "coordinates": [
                -67.518795,
                -35.205904
            ],
            "type": "Point"
        }
    },
    {
        "Name": "Sebastian",
        "Recommended": False,
        "Rating": 2,
        "Station": "Mangrove Coast",
        "ObservationDate": "2016-11-12",
        "Location": {
            "coordinates": [
                -87.218176,
                25.980086
            ],
            "type": "Point"
        }
    },
    {
        "Name": "Pearl",
        "Recommended": True,
        "Rating": 4,
        "Station": "Seahorse Cove",
        "ObservationDate": "2019-07-05",
        "Location": {
            "coordinates": [
                -73.662101,
                40.659304
            ],
            "type": "Point"
        }
    }
]
}

url = base_url + endpoint
response = requests.post(url, json=body, headers=headers)

if response.status_code == 200:
    json_response = response.json()
    print(json_response)

else:
    print(f"Request failed with status code {response.status_code} - {response.text}")

Should respond with:

json

{
  'status': 'OK'
}
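
Because the rows must follow the schema, it can be useful to check them locally before uploading. The sketch below is one possible check, under assumptions that are not taken from the API documentation: the mapping from schema types to Python types is a guess, a geometry is only checked for `type` and `coordinates` keys, and missing or `None` values are skipped because the schema reports every column as nullable. `row_problems` is a hypothetical helper name.

```python
# Assumed mapping from schema type names to Python types.
PYTHON_TYPES = {"string": str, "bool": bool}


def row_problems(row: dict, schema: dict) -> list:
    """Return human-readable problems found in one row (empty list if none)."""
    problems = []
    for column, spec in schema.items():
        value = row.get(column)
        if value is None:
            continue  # nullable column: nothing to check
        type_name = spec["type"]
        if type_name == "geometry":
            ok = isinstance(value, dict) and "type" in value and "coordinates" in value
        elif type_name == "int":
            # bool is a subclass of int in Python, so exclude it explicitly.
            ok = isinstance(value, int) and not isinstance(value, bool)
        else:
            expected = PYTHON_TYPES.get(type_name)
            ok = expected is None or isinstance(value, expected)
        if not ok:
            problems.append(f"{column}: expected {type_name}, got {type(value).__name__}")
    for column in row:
        if column not in schema:
            problems.append(f"{column}: not in schema")
    return problems


schema = {
    "Name": {"type": "string"},
    "Recommended": {"type": "bool"},
    "Rating": {"type": "int"},
    "Location": {"type": "geometry"},
}
row = {"Name": "Pearl", "Rating": "4", "Depth": 3}
print(row_problems(row, schema))
# ['Rating: expected int, got str', 'Depth: not in schema']
```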

Query for datapoints with OQS

Let's query for our dataset with the OQS query language. We can do that by calling the query endpoint.

python

# Query for our dataset with the OQS syntax.

resource_group = "catalog.hubocean.io"
resource_type = "dataset"

endpoint = f"/data/{resource_group}/{resource_type}/{dataset_name}/list"

body = {
    "#EQUALS": [
        "$Name",
        "Pearl"
    ]
}

url = base_url + endpoint
response = requests.post(url, json=body, headers=headers)

if response.status_code == 200:
    print(response.json())

else:
    print(f"Request failed with status code {response.status_code} - {response.text}")
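
The filter body used here has a fixed shape: an `#EQUALS` key whose value is a two-element list with a `$`-prefixed column reference followed by the value to match. As a small sketch, a hypothetical helper `equals` could build it; only the `#EQUALS` operator shown in this guide is covered, and no other operators are assumed.

```python
def equals(column: str, value) -> dict:
    """Build an '#EQUALS' filter: '$'-prefixed column reference, then the value."""
    return {"#EQUALS": [f"${column}", value]}


print(equals("Name", "Pearl"))
# {'#EQUALS': ['$Name', 'Pearl']}
print(equals("kind", "catalog.hubocean.io/dataCollection"))
# {'#EQUALS': ['$kind', 'catalog.hubocean.io/dataCollection']}
```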

If you want to know more about the OQS syntax and its flexibility, check out our guide on querying.

Delete dataset

If we want to delete the dataset we can do that by calling the delete endpoint.

python

# Delete dataset by name

resource_group = "catalog.hubocean.io"
resource_type = "dataset"

endpoint = f"/catalog/{resource_group}/{resource_type}/{dataset_name}"

url = base_url + endpoint
response = requests.delete(url, headers=headers)

if response.status_code == 200:
    json_response = response.json()
    print(json_response)

else:
    print(f"Request failed with status code {response.status_code} - {response.text}")

Should respond with:

json

{
  'status': 'OK'
}

Delete data collection

If we want to delete a data collection we have created, we can send a delete request like the following.

python

# Delete data collection

resource_group = "catalog.hubocean.io"
resource_type = "dataCollection"
dataset_collection_name = "my-test-collection"

endpoint = f"/catalog/{resource_group}/{resource_type}/{dataset_collection_name}"

url = base_url + endpoint
response = requests.delete(url, headers=headers)

if response.status_code == 200:
    json_response = response.json()
    print(json_response)

else:
    print(f"Request failed with status code {response.status_code} - {response.text}")

Should respond with:

json

{
  'status': 'OK'
}
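
Every request in this guide ends with the same status check. As a sketch, that pattern could be factored into a hypothetical helper `report`, which relies only on the `status_code`, `text` and `json()` members of the response object. The `FakeResponse` class is a stand-in used here so the example runs without network access.

```python
import json
from typing import Any, Optional


def report(response: Any) -> Optional[Any]:
    """Print and return the JSON body on HTTP 200; otherwise print the error."""
    if response.status_code == 200:
        payload = response.json()
        print(json.dumps(payload, indent=2))
        return payload
    print(f"Request failed with status code {response.status_code} - {response.text}")
    return None


class FakeResponse:
    """Stand-in exposing the three members that report() uses."""

    def __init__(self, status_code: int, payload: Any = None, text: str = "") -> None:
        self.status_code = status_code
        self._payload = payload
        self.text = text

    def json(self) -> Any:
        return self._payload


report(FakeResponse(200, {"status": "OK"}))
report(FakeResponse(404, text="Not found"))
```

With a real response, the call would look like `report(requests.delete(url, headers=headers))`.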

Congratulations! You have successfully consumed our API from the Jupyter Notebook environment. Feel free to explore more API endpoints in our API documentation.

If you have any further questions or need assistance, please reach out to our support team. Enjoy your data exploration!