Skip to content

Geographic Data

This guide aims to highlight the vital parts of interacting with geographical data in a tabular dataset.

It contains excepts from an example in our SDK repository.

Let's look at the data we want to use first.

GeoJSON example
{
  "name": "Oslo",
  "location": {
    "type": "Point",
    "coordinates": [
      10.74609,
      59.91273
    ]
  }
}

The objects look like this, simply a name and a position. The position is defined with GeoJSON, which suits our platform well.

We also accept WKT (Well Known Text) and WKB (Well Known Binary).

In WKT the point would be POINT(10.74609 59.91273).

WKT example
{
  "name": "Oslo",
  "location": "POINT(10.74609 59.91273)"
}

And in WKB: 0101000000CBBE2B82FF7D2540FF092E56D4F44D40.

WKB example
{
  "name": "Oslo",
  "location": "0101000000CBBE2B82FF7D2540FF092E56D4F44D40"
}

Well Known Binary

WKB is not very human-readable, but machines love binary. That's why we are converting datapoints from what format you choose to put in - and store them as WKB.

This is the schema for the data we looked at above. It defines what each column should be stored as. String for the name of a feature, and geometry the position of it.

Schema
{
  "name": {
    "type": "string"
  },
  "location": {
    "type": "geometry"
  }
}

Partitioning

Partitioning is not mandatory when ingesting data to the Ocean Data Platform, but greatly affects querying speed. When it comes to geospatial data management and optimization within our system, we utilize partitioning strategies to enhance query performance and data organization. For geopartitioning, we use the "geohash" technique. If you are curious to how it works, there is a great geohash Wikipedia page on it.

Geohashing

Geohashing is a way to map the Earth's surface using a mix of letters and numbers to create unique "addresses" for different areas. Imagine dividing the whole planet into a big grid where each square of the grid gets a special code. At first, these squares are pretty big, covering large areas. But here's where it gets interesting: we can make the squares smaller and smaller by adding more characters to the code. Each time we add a character, we divide a square into smaller squares, zooming in on a specific spot. This process can repeat many times, making the address more precise with each step.

The beauty of geohashing lies in its simplicity and precision. With just a string of characters, we can pinpoint any location on Earth, from a broad region to a specific point in a city. It's like giving the world an easy-to-use digital address system.

This method is fantastic for storing, searching, and organizing geographical data because it's straightforward yet powerful. Whether you're mapping out cities or pinpointing a hiking spot, geohashing provides a neat way to say exactly where it is on the planet.

When we want to geohash, we can decide how much precision we need.

img_27.png
Geohash length table from Wikipedia

Why not just pick 8 every time?

If your datasett is relatively small, then having too much precision can affect the performance negatively. Try to find a good balance.

Choosing the right partitioning precision for your data

Here's a list of examples to give an idea of what to pick.

1 Digit Geohash: ±2,500 km error.

Ideal for broad oceanic regions, such as distinguishing between the Pacific, Atlantic, and Indian Oceans.

2 Digits Geohash: ±630 km error.

Suitable for large marine areas like the Gulf of Mexico or the Coral Sea, useful for regional marine management and overview analyses.

3 Digits Geohash: ±78 km error.

Good for identifying specific marine ecosystems or large bays, such as the Chesapeake Bay or the Great Barrier Reef areas.

4 Digits Geohash: ±20 km error. Enables mapping of smaller marine protected areas, significant coral reefs, or coastal

zones. Useful for fisheries management at a more localized level.

5 Digits Geohash: ±2.4 km error.

Useful for detailed study of specific habitats or smaller islands and their surrounding waters, aiding in conservation efforts or site-specific studies.

6 Digits Geohash: ±0.61 km error.

Allows for precise tracking and mapping of individual marine assets, such as buoys, underwater vehicles, or detailed mapping of coral reefs.

7 Digits Geohash: ±0.076 km error.

Excellent for pinpointing specific locations such as shipwrecks, underwater archaeological sites, or precise locations for scientific sampling.

8 Digits Geohash: ±0.019 km error.

Provides extremely high precision, perfect for tracking marine wildlife with fine-scale movements or for detailed asset tracking, like monitoring individual containers on ships or detailed navigation routes.

Defining partitioning for your schema

Let's define a partitioning scheme for our data.

The name of the column we want to add the geopartitioning to is the location column. We have a list of large cities all over the world, so precision 2 seems suitable, so we'll put that as the argument for our transformer.

Partitioning
[
  {
    "columns": [
      "location"
    ],
    "transformer_name": "geohash",
    "args": [
      2
    ]
  }
]

Note: The geohash will only work with Point-type geometries. Other geometry types will result in an error.