Quickstart Guide for Consuming API from Jupyter Notebook
Welcome to the quickstart guide for consuming our low-level API from our Jupyter Notebook environment. This guide will walk you through the process of making API requests and demonstrate examples using Python code. Our Jupyter Notebook environment provides proximity to our data, making it convenient for data analysis and exploration.
In this guide we will cover the following topics:
- Fetching our token and setting headers.
- Listing available data collections.
- Creating a data collection.
- Creating a dataset.
- Creating a schema.
- Uploading data.
- Querying data.
- Deleting a dataset.
- Deleting a data collection.
To get started, please access our Jupyter Notebook environment by clicking on the following link: Jupyter Notebook Environment
Import the requests library and define variables
import requests
base_url = "https://api.hubocean.earth"
dataset_collection_name = "my-test-collection" # Use the existing name or make-your-own-computer-friendly-name
dataset_name = "my-seahorses" # Use the existing name or make-your-own-computer-friendly-name
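The comments above ask for a computer-friendly name without defining one. As a minimal sketch, assuming that this means lowercase letters, digits, and hyphens (an assumption; the API's actual naming rules are not stated in this guide), a hypothetical helper `to_friendly_name` could derive such a name from a display name:

```python
import re


def to_friendly_name(display_name: str) -> str:
    """Lowercase the text and join its alphanumeric runs with hyphens.

    Assumption: "computer-friendly" means the characters a-z, 0-9 and "-".
    The API may accept more, or less, than this.
    """
    words = re.findall(r"[a-z0-9]+", display_name.lower())
    return "-".join(words)


print(to_friendly_name("My Test Collection"))  # my-test-collection
print(to_friendly_name("My Seahorses"))        # my-seahorses
```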
Get a token and set headers
This is the first step in making any API request. The token authenticates your requests and is valid for 24 hours, after which you need to request a new one. The token is stored in a variable called token and is used in the headers variable. As you are already logged in with your user in the workspace, we can fetch what we need from the workspace localhost on our server.
token = requests.post("http://localhost:8000/access_token").json()['token']
headers = {
"Authorization": f"Bearer {token}",
"Content-Type": "application/json"
}
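Because the token is only valid for 24 hours, a long-running notebook may need to fetch a new one. The sketch below shows one way to track that. The helper names `build_headers` and `token_expired` are hypothetical, and the code only does local bookkeeping based on the 24-hour lifetime stated above; it does not ask the server whether the token is still valid.

```python
from datetime import datetime, timedelta

# Validity period stated in this guide.
TOKEN_LIFETIME = timedelta(hours=24)


def build_headers(token: str) -> dict:
    """Build the same headers dictionary as the cell above."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


def token_expired(fetched_at: datetime, now: datetime) -> bool:
    """Return True once the token is at least 24 hours old."""
    return now - fetched_at >= TOKEN_LIFETIME


fetched_at = datetime(2023, 8, 10, 10, 0, 0)
print(token_expired(fetched_at, datetime(2023, 8, 10, 18, 0, 0)))  # False
print(token_expired(fetched_at, datetime(2023, 8, 11, 10, 0, 1)))  # True
```

When `token_expired` returns True, rerun the token cell above and rebuild the headers.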
List available data collections
By using the /catalog/list endpoint and defining what we are looking for in our selector, we can get a list of all the collections of datasets available to us.
endpoint = "/catalog/list"
body = {
"#EQUALS": [
"$kind",
"catalog.hubocean.io/dataCollection"
]
}
url = base_url + endpoint
response = requests.post(url, json=body, headers=headers)
if response.status_code == 200:
    json_response = response.json()
    print(json_response)
else:
    print(f"Request failed with status code {response.status_code} - {response.text}")
Should respond with:
{
'results': [
..lots of collections..
]
}
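Every request in this guide repeats the same status check. As a sketch, that check could be moved into one hypothetical helper, `handle_response`, that returns the decoded body or raises an error. It mirrors the guide's own test for status code 200; whether the API also uses other success codes is not stated here. The stand-in objects below only mimic the response attributes used in this guide so that the example runs without a network connection.

```python
from types import SimpleNamespace


def handle_response(response):
    """Return the decoded JSON body on HTTP 200, otherwise raise with the details."""
    if response.status_code == 200:
        return response.json()
    raise RuntimeError(
        f"Request failed with status code {response.status_code} - {response.text}"
    )


# Stand-ins for real responses: only status_code, text and json() are needed.
ok = SimpleNamespace(status_code=200, text="", json=lambda: {"results": []})
bad = SimpleNamespace(status_code=404, text="Not found", json=lambda: {})

print(handle_response(ok))
try:
    handle_response(bad)
except RuntimeError as error:
    print(error)
```

With a real request, the call would look like `json_response = handle_response(requests.post(url, json=body, headers=headers))`.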
Create data collection
We'll define our own data collection. Let's use the computer-friendly name we defined and set the other information we want to store as metadata.
# Create data collection
endpoint = "/catalog"
body = {
"kind": "catalog.hubocean.io/dataCollection",
"version": "v1alpha1",
"metadata": {
"name": f"{dataset_collection_name}",
"display_name": "My Test Collection",
"description": "A test data collection, containing datasets that i want to interact with.",
"labels": {
"hubocean.io/test": 'true'
}
},
"spec": {
"distribution": {
"published_by": {
"contact": "LastName, FirstName <mail@address.earth>",
"organisation": "HUB Ocean"
},
"published_date": "2019-06-19T06:00:00",
"website": "https://hubocean.earth",
"license": {
"name": "proprietary",
"full_text": "This is a very strict legal text describing the data license for this data collection. The lawyer would be proud.",
"href": "www.license.com"
}
},
"tags": ["test", "hubocean"]
}
}
url = base_url + endpoint
response = requests.post(url, json=body, headers=headers)
if response.status_code == 200:
    json_response = response.json()
    print(json_response)
else:
    print(f"Request failed with status code {response.status_code} - {response.text}")
Should respond with the information you provided on creation:
{
'kind': 'catalog.hubocean.io/dataCollection',
'version': 'v1alpha1',
'metadata': {
'name': 'my-test-collection',
'display_name': 'My Test Collection',
'description': 'A test data collection, containing datasets that i want to interact with.',
'uuid': '686dbb41-d9cb-47ec-8b11-57ba893e7706',
'labels': {
'hubocean.io/test': 'true'
},
'owner': '2883db88-4205-488a-aea6-6273ccbaad87'
},
'status': {
'num_updates': 0,
'created_time': '2023-08-09T08:48:27.595433',
'created_by': '2883db88-4205-488a-aea6-6273ccbaad87',
'updated_time': '2023-08-09T08:48:27.595433',
'updated_by': '2883db88-4205-488a-aea6-6273ccbaad87',
'deleted_time': None,
'deleted_by': None
},
'spec': {
'distribution': {
'published_by': {
'contact': 'LastName, FirstName <mail@address.earth>',
'organisation': 'HUB Ocean'
},
'published_date': '2019-06-19T06:00:00',
'website': 'https://hubocean.earth',
'license': {
'name': 'proprietary',
'href': 'www.license.com',
'full_text': 'This is a very strict legal text describing the data license for this data collection. The lawyer would be proud.'
}
},
'tags': [
'hubocean',
'test'
]
}
}
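The response repeats what was sent and adds server-side fields such as uuid, owner, and status. To check that programmatically, a sketch like the following could compare the request body with the response. The helper name `echoes` is hypothetical. Lists are compared without regard to order, because the sample response above returns the tags in a different order than they were sent.

```python
def echoes(sent, received) -> bool:
    """Return True if every value in `sent` reappears in `received`.

    Dicts are compared key by key (extra keys in `received` are allowed),
    lists are compared ignoring order, and everything else by equality.
    """
    if isinstance(sent, dict):
        return isinstance(received, dict) and all(
            key in received and echoes(value, received[key])
            for key, value in sent.items()
        )
    if isinstance(sent, list):
        return isinstance(received, list) and (
            sorted(map(repr, sent)) == sorted(map(repr, received))
        )
    return sent == received


# Trimmed-down versions of the request and response shown above.
sent = {
    "metadata": {"name": "my-test-collection"},
    "spec": {"tags": ["test", "hubocean"]},
}
received = {
    "metadata": {"name": "my-test-collection", "uuid": "686dbb41-d9cb-47ec-8b11-57ba893e7706"},
    "status": {"num_updates": 0},
    "spec": {"tags": ["hubocean", "test"]},
}
print(echoes(sent, received))  # True
```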
Get Data Collection
Did it work? Let's fetch the collection to check. We put the resource group, the resource type, and the name of the data collection in the URL.
resource_group = "catalog.hubocean.io"
resource_type = "dataCollection"
endpoint = f"/catalog/{resource_group}/{resource_type}/{dataset_collection_name}"
url = base_url + endpoint
response = requests.get(url, headers=headers)
if response.status_code == 200:
    json_response = response.json()
    print(json_response)
else:
    print(f"Request failed with status code {response.status_code} - {response.text}")
If it worked, the data collection object should be returned again.
{
'kind': 'catalog.hubocean.io/dataCollection',
'version': 'v1alpha1',
'metadata': {
'name': 'my-test-collection',
'display_name': 'My Test Collection',
'description': 'A test data collection, containing datasets that i want to interact with.',
'uuid': '6eb61f98-3a68-489f-9d86-5e0cb6a6da17',
'labels': {
'hubocean.io/test': 'true'
},
'owner': '2883db88-4205-488a-aea6-6273ccbaad87'
},
'status': {
'num_updates': 0,
'created_time': '2023-08-10T10:24:19.044762',
'created_by': '2883db88-4205-488a-aea6-6273ccbaad87',
'updated_time': '2023-08-10T10:24:19.044762',
'updated_by': '2883db88-4205-488a-aea6-6273ccbaad87',
'deleted_time': None,
'deleted_by': None
},
'spec': {
'distribution': {
'published_by': {
'contact': 'LastName, FirstName <mail@address.earth>',
'organisation': 'HUB Ocean'
},
'published_date': '2019-06-19T06:00:00',
'website': 'https://hubocean.earth',
'license': {
'name': 'proprietary',
'href': 'www.license.com',
'full_text': 'This is a very strict legal text describing the data license for this data collection. The lawyer would be proud.'
}
},
'tags': [
'hubocean',
'test'
]
}
}
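The same URL pattern, resource group, then resource type, then name, is used again later for deleting resources. A small sketch of a hypothetical helper, `catalog_url`, that assembles it is shown below. Percent-encoding each path segment is a precaution added here, not a documented requirement of the API.

```python
from urllib.parse import quote


def catalog_url(base_url: str, resource_group: str, resource_type: str, name: str) -> str:
    """Join the three path segments onto the /catalog endpoint."""
    segments = [resource_group, resource_type, name]
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{base_url.rstrip('/')}/catalog/{path}"


print(catalog_url(
    "https://api.hubocean.earth",
    "catalog.hubocean.io",
    "dataCollection",
    "my-test-collection",
))
```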
Create a dataset within the new collection
A collection keeps our datasets organized, like a folder. Let's create a new, empty dataset within our newly made collection, using the dataset name we defined earlier and specifying the collection we want to store it in.
# Create dataset inside collection
endpoint = "/catalog"
body = {
"kind": "catalog.hubocean.io/dataset",
"version": "v1alpha3",
"metadata": {
"name": f"{dataset_name}",
"display_name": "My Seahorses",
"description": "Testing seahorses",
"labels": {
"hubocean.io/test": "true"
}
},
"spec": {
"data_collection": f"catalog.hubocean.io/dataCollection/{dataset_collection_name}",
"storage_class": "registry.hubocean.io/storageClass/tabular",
"storage_controller": "registry.hubocean.io/storageController/storage-tabular",
"maintainer": {
"contact": "LastName, FirstName <mail@address.earth>",
"organisation": "HUB Ocean"
}
}
}
url = base_url + endpoint
response = requests.post(url, json=body, headers=headers)
if response.status_code == 200:
    json_response = response.json()
    print(json_response)
else:
    print(f"Request failed with status code {response.status_code} - {response.text}")
If it worked, the response should be the dataset object we defined:
{
'kind': 'catalog.hubocean.io/dataset',
'version': 'v1alpha3',
'metadata': {
'name': 'my-seahorses',
'display_name': 'My Seahorses',
'description': 'Testing seahorses',
'uuid': '5109dfca-89c6-4bd7-8378-07c764ad7ba2',
'labels': {
'hubocean.io/test': 'true'
},
'owner': '2883db88-4205-488a-aea6-6273ccbaad87'
},
'status': {
'num_updates': 0,
'created_time': '2023-08-10T10:31:28.212523',
'created_by': '2883db88-4205-488a-aea6-6273ccbaad87',
'updated_time': '2023-08-10T10:31:28.212523',
'updated_by': '2883db88-4205-488a-aea6-6273ccbaad87',
'deleted_time': None,
'deleted_by': None
},
'spec': {
'storage_class': 'registry.hubocean.io/storageClass/tabular',
'storage_controller': 'registry.hubocean.io/storageController/storage-tabular',
'data_collection': 'catalog.hubocean.io/dataCollection/my-test-collection',
'maintainer': {
'contact': 'LastName, FirstName <mail@address.earth>',
'organisation': 'HUB Ocean'
},
'citation': None,
'documentation': [],
'attributes': [],
'tags': []
}
}
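The data_collection field links the dataset to its collection through a reference made of the kind and the name, separated by a slash. As an illustration, the two hypothetical helpers below build such a reference and take it apart again. They are based only on the format visible in this guide.

```python
def qualified_name(kind: str, name: str) -> str:
    """Combine a kind and a resource name into one reference string."""
    return f"{kind}/{name}"


def split_qualified_name(reference: str) -> tuple:
    """Split a reference at its last slash into (kind, name)."""
    kind, _, name = reference.rpartition("/")
    return kind, name


reference = qualified_name("catalog.hubocean.io/dataCollection", "my-test-collection")
print(reference)
print(split_qualified_name(reference))
```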
List datasets within a collection
Let's list all the datasets within our collection. We can do this by using the collection name we defined earlier.
# List datasets within collection
endpoint = "/catalog/list"
body = {
"selectors": [
{"kind": "catalog.hubocean.io/dataset"},
{
"path": {
"spec.data_collection": f"catalog.hubocean.io/dataCollection/{dataset_collection_name}"
}
}
]
}
url = base_url + endpoint
response = requests.post(url, json=body, headers=headers)
if response.status_code == 200:
    json_response = response.json()
    print(json_response)
else:
    print(f"Request failed with status code {response.status_code} - {response.text}")
Should respond with something like this:
{
'results': [
{
'kind': 'catalog.hubocean.io/dataset',
'version': 'v1alpha3',
'metadata': {
'name': 'my-seahorses',
'display_name': 'My Seahorses',
'description': 'Testing seahorses',
'uuid': '5109dfca-89c6-4bd7-8378-07c764ad7ba2',
'labels': {
'hubocean.io/test': 'true'
},
'owner': '2883db88-4205-488a-aea6-6273ccbaad87'
},
'status': {
'num_updates': 0,
'created_time': '2023-08-10T10:31:28.212523',
'created_by': '2883db88-4205-488a-aea6-6273ccbaad87',
'updated_time': '2023-08-10T10:31:28.212523',
'updated_by': '2883db88-4205-488a-aea6-6273ccbaad87',
'deleted_time': None,
'deleted_by': None
},
'spec': {
'storage_class': 'registry.hubocean.io/storageClass/tabular',
'storage_controller': 'registry.hubocean.io/storageController/storage-tabular',
'data_collection': 'catalog.hubocean.io/dataCollection/my-test-collection',
'maintainer': {
'contact': 'LastName, FirstName <mail@address.earth>',
'organisation': 'HUB Ocean'
},
'citation': None,
'documentation': [],
'attributes': [],
'tags': []
}
}
],
'prev': None,
'next': None,
'num_results': 1
}
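The full listing is verbose, so it can help to pull out only the names. The sketch below uses a trimmed copy of the response above and a hypothetical helper, `dataset_names`. The prev and next fields suggest that long listings are split into pages, but this guide does not describe how, so the helper reads a single response only.

```python
def dataset_names(list_response: dict) -> list:
    """Collect metadata.name from every entry under 'results'."""
    return [item["metadata"]["name"] for item in list_response.get("results", [])]


# Trimmed copy of the listing response shown above.
sample = {
    "results": [
        {"kind": "catalog.hubocean.io/dataset", "metadata": {"name": "my-seahorses"}},
    ],
    "prev": None,
    "next": None,
    "num_results": 1,
}

names = dataset_names(sample)
print(names)  # ['my-seahorses']
print(len(names) == sample["num_results"])  # True
```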
Create a schema for the new dataset
Now we need to define what our data will look like when we upload it. For each "column" in our dataset, we define its name and its data type.
# Create table schema
kind = "catalog.hubocean.io/dataset"
endpoint = f"/data/{kind}/{dataset_name}/schema"
body = {
"table_schema": {
"Name": {
"type": "string"
},
"Recommended": {
"type": "bool"
},
"Rating": {
"type": "int"
},
"Station": {
"type": "string"
},
"ObservationDate": {
"type": "string"
},
"Location": {
"type": "geometry"
}
},
"table_description": "Seahorses dataset with geospatial filtering",
"geospatial_partition_columns": [
"Location"
],
"geospatial_partition_hash_precision": 5,
"table_metadata": {
"geometry": {
"primary_location": "Location"
}
}
}
url = base_url + endpoint
response = requests.post(url, json=body, headers=headers)
if response.status_code == 200:
    json_response = response.json()
    print(json_response)
else:
    print(f"Request failed with status code {response.status_code} - {response.text}")
Should respond with:
{
'status': 'OK',
'message': 'Schema created.'
}
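Each column entry in table_schema has the same shape, a name mapped to a dictionary with a type. For wider tables, the body could be generated from a compact mapping instead of being written out by hand. This is a sketch with a hypothetical helper, `make_table_schema`; it only reproduces the structure used above and uses only the type names that appear in this guide.

```python
def make_table_schema(columns: dict) -> dict:
    """Turn {"Name": "string", ...} into {"Name": {"type": "string"}, ...}."""
    return {name: {"type": type_name} for name, type_name in columns.items()}


body = {
    "table_schema": make_table_schema({
        "Name": "string",
        "Recommended": "bool",
        "Rating": "int",
        "Station": "string",
        "ObservationDate": "string",
        "Location": "geometry",
    }),
    "table_description": "Seahorses dataset with geospatial filtering",
    "geospatial_partition_columns": ["Location"],
    "geospatial_partition_hash_precision": 5,
    "table_metadata": {"geometry": {"primary_location": "Location"}},
}

print(body["table_schema"]["Location"])  # {'type': 'geometry'}
```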
Get schema
Let's fetch the newly created schema to see what it looks like.
# Get Schema by dataset name
kind = "catalog.hubocean.io/dataset"
endpoint = f"/data/{kind}/{dataset_name}/schema"
url = base_url + endpoint
response = requests.get(url, headers=headers)
if response.status_code == 200:
    json_response = response.json()
    print(json_response)
else:
    print(f"Request failed with status code {response.status_code} - {response.text}")
Should respond with:
{
'Name': {
'type': 'string',
'metadata': {},
'nullable': True
},
'Recommended': {
'type': 'bool',
'metadata': {},
'nullable': True
},
'Rating': {
'type': 'int',
'metadata': {},
'nullable': True
},
'Station': {
'type': 'string',
'metadata': {},
'nullable': True
},
'ObservationDate': {
'type': 'string',
'metadata': {},
'nullable': True
},
'Location': {
'type': 'geometry',
'metadata': {},
'nullable': True
}
}
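The returned schema adds metadata and nullable to every column. To confirm that the stored types match what was requested, the response can be reduced to a plain mapping of column name to type and then compared. The helper names `column_types` and `schema_differences` below are hypothetical, and the sample is a shortened copy of the response above.

```python
def column_types(schema: dict) -> dict:
    """Reduce a schema response to {column name: type name}."""
    return {name: spec["type"] for name, spec in schema.items()}


def schema_differences(expected: dict, returned_schema: dict) -> dict:
    """Map each mismatching column to a tuple (expected type, returned type)."""
    actual = column_types(returned_schema)
    return {
        name: (expected.get(name), actual.get(name))
        for name in sorted(set(expected) | set(actual))
        if expected.get(name) != actual.get(name)
    }


# Shortened copy of the schema response shown above.
returned = {
    "Name": {"type": "string", "metadata": {}, "nullable": True},
    "Rating": {"type": "int", "metadata": {}, "nullable": True},
    "Location": {"type": "geometry", "metadata": {}, "nullable": True},
}

print(schema_differences({"Name": "string", "Rating": "int", "Location": "geometry"}, returned))  # {}
```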
Add datapoints to the dataset
This is where we actually add data to the dataset. We need to provide the data in a specific format, which is defined by the schema we created earlier.
# Create datapoints
kind = "catalog.hubocean.io/dataset"
endpoint = f"/data/{kind}/{dataset_name}"
body = {"data": [
{
"Name": "Gustav",
"Recommended": True,
"Rating": 5,
"Station": "North Pole Station",
"ObservationDate": "2015-09-01",
"Location": {
"coordinates": [
4.887465709902727,
59.32141637236472
],
"type": "Point"
}
},
{
"Name": "Aurora",
"Recommended": True,
"Rating": 3,
"Station": "Coral Cove",
"ObservationDate": "2022-06-15",
"Location": {
"coordinates": [
-87.630489,
23.016650
],
"type": "Point"
}
},
{
"Name": "Marlin",
"Recommended": False,
"Rating": 4,
"Station": "Seagrass Haven",
"ObservationDate": "2018-07-22",
"Location": {
"coordinates": [
-80.104475,
25.023989
],
"type": "Point"
}
},
{
"Name": "Neptune",
"Recommended": True,
"Rating": 4,
"Station": "Underwater Canyon",
"ObservationDate": "2017-08-08",
"Location": {
"coordinates": [
-67.518795,
-35.205904
],
"type": "Point"
}
},
{
"Name": "Sebastian",
"Recommended": False,
"Rating": 2,
"Station": "Mangrove Coast",
"ObservationDate": "2016-11-12",
"Location": {
"coordinates": [
-87.218176,
25.980086
],
"type": "Point"
}
},
{
"Name": "Pearl",
"Recommended": True,
"Rating": 4,
"Station": "Seahorse Cove",
"ObservationDate": "2019-07-05",
"Location": {
"coordinates": [
-73.662101,
40.659304
],
"type": "Point"
}
}
]
}
url = base_url + endpoint
response = requests.post(url, json=body, headers=headers)
if response.status_code == 200:
    json_response = response.json()
    print(json_response)
else:
    print(f"Request failed with status code {response.status_code} - {response.text}")
Should respond with:
{
'status': 'OK'
}
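Since the rows must follow the schema, it can be useful to check them locally before uploading. The sketch below is one way to do that. The helper `row_problems` is hypothetical, and the mapping from schema type names to Python types is an assumption that covers only the four types used in this guide. None is accepted everywhere because the schema reports every column as nullable. The server may apply stricter or different rules.

```python
# Assumed mapping from the schema's type names to Python types.
PYTHON_TYPES = {"string": str, "bool": bool, "int": int, "geometry": dict}


def row_problems(row: dict, types: dict) -> list:
    """Return a list of human-readable problems; an empty list means the row looks fine."""
    problems = []
    for column, type_name in types.items():
        if column not in row:
            problems.append(f"missing column {column}")
            continue
        value = row[column]
        if value is None:
            continue  # every column in the schema above is nullable
        expected = PYTHON_TYPES[type_name]
        # bool is a subclass of int in Python, so reject True/False for "int" columns.
        is_bool_for_int = type_name == "int" and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_for_int:
            problems.append(f"{column} should be {type_name}")
    for column in row:
        if column not in types:
            problems.append(f"unexpected column {column}")
    return problems


types = {"Name": "string", "Recommended": "bool", "Rating": "int", "Location": "geometry"}
good = {
    "Name": "Pearl",
    "Recommended": True,
    "Rating": 4,
    "Location": {"coordinates": [-73.662101, 40.659304], "type": "Point"},
}
bad = dict(good, Rating="4")

print(row_problems(good, types))  # []
print(row_problems(bad, types))   # ['Rating should be int']
```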
Query for datapoints with OQS
Let's query for our dataset with the OQS query language. We can do that by calling the query endpoint.
# Query for our dataset with the OQS syntax.
resource_group = "catalog.hubocean.io"
resource_type = "dataset"
endpoint = f"/data/{resource_group}/{resource_type}/{dataset_name}/list"
body = {
"#EQUALS": [
"$Name",
"Pearl"
]
}
url = base_url + endpoint
response = requests.post(url, json=body, headers=headers)
if response.status_code == 200:
    print(response.json())
else:
    print(f"Request failed with status code {response.status_code} - {response.text}")
If you want to know more about the OQS syntax and its flexibility, check out our guide on querying.
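The query bodies in this guide all share one shape: an #EQUALS key whose value is a pair of a column reference, prefixed with $, and the value to match. As a sketch, a hypothetical helper `oqs_equals` can build that body. The `matches` function is a local illustration of the assumed meaning (compare the referenced column with the value), which is how the examples in this guide read. It is not the server's implementation and covers none of the other OQS operators.

```python
def oqs_equals(column: str, value) -> dict:
    """Build the single-condition query body used in this guide."""
    return {"#EQUALS": [f"${column}", value]}


def matches(row: dict, query: dict) -> bool:
    """Local illustration only: does `row` satisfy one #EQUALS condition?"""
    column_reference, value = query["#EQUALS"]
    return row.get(column_reference.lstrip("$")) == value


query = oqs_equals("Name", "Pearl")
print(query)  # {'#EQUALS': ['$Name', 'Pearl']}

rows = [{"Name": "Gustav", "Rating": 5}, {"Name": "Pearl", "Rating": 4}]
print([row for row in rows if matches(row, query)])  # [{'Name': 'Pearl', 'Rating': 4}]
```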
Delete dataset
If we want to delete the dataset we can do that by calling the delete endpoint.
# Delete dataset by name
resource_group = "catalog.hubocean.io"
resource_type = "dataset"
endpoint = f"/catalog/{resource_group}/{resource_type}/{dataset_name}"
url = base_url + endpoint
response = requests.delete(url, headers=headers)
if response.status_code == 200:
    json_response = response.json()
    print(json_response)
else:
    print(f"Request failed with status code {response.status_code} - {response.text}")
Should respond with:
{
'status': 'OK'
}
Delete Data Collection
If we want to delete a data collection we have created, we can send a delete request like the following.
# Delete data collection
resource_group = "catalog.hubocean.io"
resource_type = "dataCollection"
dataset_collection_name = "my-test-collection"
endpoint = f"/catalog/{resource_group}/{resource_type}/{dataset_collection_name}"
url = base_url + endpoint
response = requests.delete(url, headers=headers)
if response.status_code == 200:
    json_response = response.json()
    print(json_response)
else:
    print(f"Request failed with status code {response.status_code} - {response.text}")
Should respond with:
{
'status': 'OK'
}
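When cleaning up after a test run, the delete URLs can be prepared in one place. The sketch below lists the datasets first and the collection last, which mirrors the order used in this guide; whether the API requires that order is not stated here. The helper name `cleanup_urls` is hypothetical.

```python
def cleanup_urls(base_url: str, collection_name: str, dataset_names: list) -> list:
    """Return the delete URLs for the datasets, followed by the one for the collection."""
    resource_group = "catalog.hubocean.io"
    urls = [
        f"{base_url}/catalog/{resource_group}/dataset/{name}"
        for name in dataset_names
    ]
    urls.append(f"{base_url}/catalog/{resource_group}/dataCollection/{collection_name}")
    return urls


for url in cleanup_urls("https://api.hubocean.earth", "my-test-collection", ["my-seahorses"]):
    print(url)
```

Each URL would then be passed to `requests.delete(url, headers=headers)`, as in the cells above.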
Congratulations! You have successfully consumed our API from the Jupyter Notebook environment. Feel free to explore more API endpoints in our API documentation.
If you have any further questions or need assistance, please reach out to our support team. Enjoy your data exploration!