Spatialdata - A quick introduction#

This tutorial provides a quick overview over key functionalities of the spatialdata format. This includes

  1. Data modalities

  2. Plotting, and

  3. Visualization in napari image viewers.

For detailed explanations and tutorials, check out the official documentation

Getting started#

In the following, we will just explore the spatialdata format a little more. Note that you should also checkout the official documentation documentation, and especially the tutorials on the spatialdata format

and spatialelements

Modules#

import numpy as np
import shapely
import geopandas
import matplotlib.pyplot as plt

import spatialdata as sd
from spatialdata.models import Image2DModel

import spatialdata_plot
from napari_spatialdata import Interactive
%%capture
spatialdata_plot

Data#

Let’s get a built-in mock dataset from spatialdata

sdata = sd.datasets.blobs(extra_coord_system="second_coordinate_system")

Read/Write#

Spatialdata has a native on-disk format, based on the .zarr standard, that can be used to store the data. De-facto, .zarr stores all data in a directory with subdirectories and specific metadata files. (.zattrs)

path = "/path/to/file/"

# Write spatialdata 
sdata.write(path)

# Read spatialdata 
sdata = sd.read_zarr(path)

Access attributes#

Let’s explore the structure of the spatialdata object. You can see that the dataset contains multiple attributes

sdata
SpatialData object
├── Images
│     ├── 'blobs_image': DataArray[cyx] (3, 512, 512)
│     └── 'blobs_multiscale_image': DataTree[cyx] (3, 512, 512), (3, 256, 256), (3, 128, 128)
├── Labels
│     ├── 'blobs_labels': DataArray[yx] (512, 512)
│     └── 'blobs_multiscale_labels': DataTree[yx] (512, 512), (256, 256), (128, 128)
├── Points
│     └── 'blobs_points': DataFrame with shape: (<Delayed>, 4) (2D points)
├── Shapes
│     ├── 'blobs_circles': GeoDataFrame shape: (5, 2) (2D shapes)
│     ├── 'blobs_multipolygons': GeoDataFrame shape: (2, 1) (2D shapes)
│     └── 'blobs_polygons': GeoDataFrame shape: (5, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (26, 3)
with coordinate systems:
    ▸ 'global', with elements:
        blobs_image (Images), blobs_multiscale_image (Images), blobs_labels (Labels), blobs_multiscale_labels (Labels), blobs_points (Points), blobs_circles (Shapes), blobs_multipolygons (Shapes), blobs_polygons (Shapes)
    ▸ 'second_coordinate_system', with elements:
        blobs_image (Images), blobs_multiscale_image (Images), blobs_labels (Labels), blobs_multiscale_labels (Labels), blobs_points (Points), blobs_circles (Shapes), blobs_multipolygons (Shapes), blobs_polygons (Shapes)

You can access these attributes as class attributes (output hidden)

%%capture
# Access `images`
sdata.images

Their contents behave like a dictionary (output hidden)

%%capture
# Access the `blobs_image` layer in `.images`
sdata.images["blobs_image"]

Images#

Images are stored as xarray.DataArrays (or, in case of pyramidal/multiscale images as xarray.DataTree). xarray supports the lazy loading of data via its dask interface, i.e. it only loads the parts of the image that are required for your computations. This allows users to access imaging data that is significanly larger than their memory.

type(sdata.images["blobs_image"])
xarray.core.dataarray.DataArray

Images have the dimensions \((c, y, x)\), where \(c\) is the number of channels (e.g \(c=3\) for RGB images, \(c=C\) for \(C\) different fluorescence channels), \(y\) the number of rows, and \(x\) the number of columns.

sdata.images["blobs_image"]
<xarray.DataArray 'image' (c: 3, y: 512, x: 512)> Size: 6MB
dask.array<array, shape=(3, 512, 512), dtype=float64, chunksize=(3, 512, 512), chunktype=numpy.ndarray>
Coordinates:
  * c        (c) int64 24B 0 1 2
  * y        (y) float64 4kB 0.5 1.5 2.5 3.5 4.5 ... 508.5 509.5 510.5 511.5
  * x        (x) float64 4kB 0.5 1.5 2.5 3.5 4.5 ... 508.5 509.5 510.5 511.5
Attributes:
    transform:  {'global': Identity , 'second_coordinate_system': Identity }

Visualise#

Let’s plot the image

sdata.pl.render_images("blobs_image").pl.show(coordinate_systems="global")
../_images/6ec7d373947bfa78b5986a814f3cffc1e1a2d32ea6deb96081cd8961d642c7df.png

Shapes#

Shapes represent vectorized annotations. This means that each shape is defined as a set of points in a coordinate space. For example to define a small triangle as vectorized shape, you would define it with 4 points (3 unique points and 1 point to close the loop):

(Pseudocode) triangle := { Point1(0, 0), Point2(1, 0), Point3(0, 1), Point4(0, 0) }

Spatialdata stores such vectorized annotations in its .shapes attribute. The shapes attribute can store different types of shapes, including Polygons (arbitrary shapes) and Points. The shapes attribute can store multiple distinct and independent spatial elements.

# Access shapes (3 independent elements: 'blobs_circles', 'blobs_polygons', 'blobs_multipolygons')
list(sdata.shapes.keys())
['blobs_circles', 'blobs_polygons', 'blobs_multipolygons']

Each spatialelement is represented as geopandas.GeoDataFrame

sdata.shapes["blobs_polygons"]
geometry
0 POLYGON ((340.197 258.214, 316.177 197.065, 29...
1 POLYGON ((284.141 420.454, 267.249 371.319, 25...
2 POLYGON ((203.195 229.528, 285.506 204.414, 19...
3 POLYGON ((240.46 196.846, 227.206 188.623, 149...
4 POLYGON ((446.703 349.433, 421.086 258.9, 369....

Let’s plot the data

(
    sdata.pl.render_shapes("blobs_circles", alpha=0.5, color="#16A085")  # green
    .pl.render_shapes("blobs_polygons", alpha=0.5, color="#2980B9")  # blue
    .pl.render_shapes("blobs_multipolygons", alpha=0.5, color="#cccccc")  # gray
    .pl.show(coordinate_systems="global")
)
INFO     Value for parameter 'color' appears to be a color, using it as such.                                      
INFO     Value for parameter 'color' appears to be a color, using it as such.                                      
INFO     Value for parameter 'color' appears to be a color, using it as such.
../_images/af5c45293f6e21c2ec18ed791ff1df1ccd098e2cb62c8c852f460e25ccab028f.png

Background - Shapes representation in spatialdata#

shapely#

spatialdata uses shapely to define the shapes. We can define our own shape

triangle = shapely.Polygon([[0, 0], [1, 0], [0, 1], [0, 0]])
triangle
../_images/81b81b179d35216fafff0a71e41f180935756e0adc4ac9243f7117fd5e2098e5.svg

geopandas#

Spatialdata stores multiple shape annotations as geopandas.GeoDataFrame.

geopandas.GeoDataFrames behave exactly like a pandas dataframe, except that they support spatial data in a designated geopandas.GeoSeries column (usually denoted with geometry)

We can add the previously defined shape to a geopandas dataframe:

geodf = geopandas.GeoDataFrame(data={"name": ["triangle", "triangle2"]}, geometry=[triangle, triangle])
geodf
name geometry
0 triangle POLYGON ((0 0, 1 0, 0 1, 0 0))
1 triangle2 POLYGON ((0 0, 1 0, 0 1, 0 0))
# Geopandas behaves like pandas. E.g. Select the first row in the geodataframe
geodf.iloc[[0]]
name geometry
0 triangle POLYGON ((0 0, 1 0, 0 1, 0 0))

Tables#

Tables are used in spatialdata to store omics modalities. Under the hood, tables are represented by an anndata object. Anndata is commonly used to store -omics data as it provides functionalities to jointly anlayse the actual measurements and associated metadata.

!!! Tables support by DVP IO#

DVP IO does currently not provide reading functionalities for proteomics data. This feature will be introduced in early 2025

sdata.tables["table"]
AnnData object with n_obs × n_vars = 26 × 3
    obs: 'instance_id', 'region'
    uns: 'spatialdata_attrs'

Get observation-specific metadata

adata = sdata.tables["table"]

adata.obs.head()
instance_id region
1 1 blobs_labels
2 2 blobs_labels
3 3 blobs_labels
4 4 blobs_labels
5 5 blobs_labels

Get variable-specific metadata

adata.var.head()
channel_0_sum
channel_1_sum
channel_2_sum

Assigning values#

We can also add new attributes to a spatialdata object. To assure the compatibility of data with its format, spatialdata implements spatialdata.models classes, that define required attributes and processing rules for the different types of data. To assign a new object to spatialdata, we need to parse it with the suitable model class. For details, please see

Example#

Say, that we would like to add another image to our spatialdata object. Let’s define a small 100 px x 100px RGB image (all white) as array and pass it to spatialdata.

# Define new image (all white, 3 channels [RGB])
new_image = np.ones(shape=(100, 100, 3))

We cannot immediately assign this image to spatialdata. This will fail:

try:
    # This code block fails
    sdata["new_image"] = new_image
except TypeError as e:
    print(f"TypeError: {e}")
TypeError: Unsupported type <class 'numpy.ndarray'>

Instead, we first have to parse the image with the designated model class spatialdata.models.Image2DModel

new_image_parsed = Image2DModel.parse(
    new_image,
    dims=("x", "y", "c"),  # define axes of image (x, y, channels)
    c_coords=("r", "g", "b"),  # optional: name channels
)


# Assign image to existing spatialdata, this works
sdata.images["new_image"] = new_image_parsed
INFO     Transposing `data` of type: <class 'dask.array.core.Array'> to ('c', 'y', 'x').

We have to proceed similarly for other attributes

  • .tables: spatialdata.models.TableModel

  • .shapes: spatialdata.models.ShapesModel

Plotting#

As showcased above, spatialdata implements plotting functionalities that allow you to visualize your omics dataset in static images. To do so, spatialdata relies on its extension package spatialdata_plot. Make sure to check out its documentation

spatialdata_plot can be used by employing the .pl accessor, followed by a render_<attribute> command. To show the final plot, call .pl.show()

(sdata.pl.render_images().pl.render_shapes().pl.show(coordinate_systems="global"))
../_images/7e866b78c52a3e02399a60342ec4b7b5c3e0cade50d6f90fa00973ade0fbf2ee.png

Interactive viewing in Napari#

You can inspect the data in napari using the napari_spatialdata plugin. For details, take a look at the official documentation

# Start session
interactive = Interactive(sdata)

# # Basic viz:
interactive.add_element("blobs_image", element_coordinate_system="global")
interactive.add_element("blobs_circles", element_coordinate_system="global")
interactive.add_element("blobs_labels", element_coordinate_system="global")


# If you would like to explore it locally, uncomment the following line:
# interactive.run()

# Visualize screenshot
screenshot = interactive.screenshot()
plt.imshow(screenshot)
plt.axis("off")
(np.float64(-0.5), np.float64(3023.5), np.float64(1885.5), np.float64(-0.5))
../_images/25a413c7a6a4733213434990e129c6b8ec8bc4e21a84effe04538a713b6090eb.png