Splane&Scube tutorial (1/2): Identify uniform spatial domain on mouse brain MERFISH dataset

July 2023

Dataset: 33 MERFISH slices of mouse brain (here)

Data preprocessing

[1]:

from SPACEL.setting import set_environ_seed
set_environ_seed(42)
from SPACEL import Splane
import scanpy as sc
import numpy as np
import pandas as pd
import matplotlib

[2]:

st_merfish = sc.read_h5ad('../data/merfish_mouse_brain/merfish_mouse_brain.h5ad')

Here, we add cell type information to the spatial anndata object with the add_cell_type_composition function so that Splane can use it for spatial domain identification. For spot-based spatial transcriptomic data, the function takes a DataFrame containing the cell type composition matrix (for example, the composition predicted by Spoint). For single-cell resolution spatial transcriptomic data such as this MERFISH dataset, it takes a Series of cell type annotations instead, passed here through the celltype_anno argument. The anndata object is then split by slice_id into a list with one anndata object per slice.

[ ]:

Splane.utils.add_cell_type_composition(st_merfish, celltype_anno=st_merfish.obs['label'])
adata_list = Splane.utils.split_ad(st_merfish,'slice_id')
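For single-cell resolution data, each cell carries exactly one label, so one way to picture the annotation is as a composition matrix whose rows are one-hot. The sketch below illustrates that idea with plain pandas on a toy table. It is not SPACEL's implementation; the `obs` DataFrame, its cell type names, and the slice names are made up for illustration.

```python
import pandas as pd

# Toy stand-in for st_merfish.obs: one row per cell (all values are made up).
obs = pd.DataFrame(
    {
        "label": ["Astro", "Neuron", "Neuron", "Micro", "Astro", "Neuron"],
        "slice_id": ["s1", "s1", "s1", "s2", "s2", "s2"],
    },
    index=[f"cell_{i}" for i in range(6)],
)

# One-hot encode the annotations: each cell gets weight 1 for its own type.
composition = pd.get_dummies(obs["label"]).astype(float)

# Every row sums to 1, like a (degenerate) cell type composition.
row_sums = composition.sum(axis=1)

# Group cells by slice, analogous in spirit to splitting by 'slice_id'.
per_slice = {sid: df for sid, df in composition.groupby(obs["slice_id"])}

print(composition)
print({sid: df.shape for sid, df in per_slice.items()})
```

Seen this way, spot-based and single-cell inputs have the same shape (cells or spots by cell types); only the row contents differ.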

Training Splane model

In this step, we initialize the Splane model with Splane.init_model(...), using the list of anndata objects as input. The n_clusters parameter sets the number of spatial domains to identify. The k parameter controls the degree of neighbors considered in the model: a larger k places more emphasis on global structure than on local structure. The gnn_dropout parameter influences the smoothness of the model's predictions: a higher gnn_dropout gives a smoother output, which helps accommodate the sparsity of spatial transcriptomics data.
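The training log in this tutorial reports "Calculating Chebyshev polynomials up to order 2". Assuming that k plays the role of this Chebyshev order (an assumption; the tutorial does not state it), the NumPy sketch below shows why a larger order reaches more distant neighbors. The path graph and the helper names `chebyshev_polynomials` and `farthest_hop` are hypothetical and are not part of SPACEL.

```python
import numpy as np

# Toy graph (assumption for illustration): a path of 6 nodes, 0-1-2-3-4-5,
# so the index of a node equals its hop distance from node 0.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

# Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
deg = A.sum(axis=1)
d_inv_sqrt = 1.0 / np.sqrt(deg)
L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# Rescale with the largest eigenvalue so the spectrum lies in [-1, 1].
lam_max = np.linalg.eigvalsh(L).max()
L_scaled = (2.0 / lam_max) * L - np.eye(n)


def chebyshev_polynomials(L_scaled, order):
    """Return [T_0, ..., T_order] using T_k = 2 L T_{k-1} - T_{k-2}."""
    polys = [np.eye(L_scaled.shape[0]), L_scaled.copy()]
    for _ in range(2, order + 1):
        polys.append(2.0 * L_scaled @ polys[-1] - polys[-2])
    return polys[: order + 1]


def farthest_hop(T, node=0, tol=1e-12):
    """Largest node index with a non-zero entry in row `node` of T."""
    return int(np.flatnonzero(np.abs(T[node]) > tol).max())


for order in (1, 2, 3):
    T = chebyshev_polynomials(L_scaled, order)[order]
    print(f"order {order}: node 0 is coupled to nodes up to {farthest_hop(T)} hops away")
```

In this toy setting, each additional order extends the reach of a node by one hop, which matches the intuition that a larger value shifts emphasis from local to global structure.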

We train the model with splane_model.train(...) to obtain the latent features of each spot/cell. The d_l parameter affects the level of batch effect correction between slices. By default, d_l is 0.2 for spatial transcriptomics data with single-cell resolution.

Then, we identify the spatial domain to which each spot/cell belongs with splane_model.identify_spatial_domain(...). By default, the results are saved in the spatial_domain column of .obs. If the key parameter is provided, the results are saved in .obs[key] instead.
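Once the domain labels are stored in .obs, a quick sanity check is to tabulate how many cells of each slice fall into each domain. The sketch below does this on a toy DataFrame that stands in for the concatenated .obs tables. The column names slice_id and spatial_domain follow this tutorial; the values are made up.

```python
import pandas as pd

# Toy stand-in for the concatenated .obs tables (values are made up).
obs = pd.DataFrame(
    {
        "slice_id": ["s1", "s1", "s1", "s2", "s2", "s2"],
        "spatial_domain": [0, 1, 1, 0, 0, 2],
    }
)

# Cells per (slice, domain) pair.
counts = pd.crosstab(obs["slice_id"], obs["spatial_domain"])

# Fraction of each slice assigned to each domain (rows sum to 1).
fractions = counts.div(counts.sum(axis=1), axis=0)

print(counts)
print(fractions.round(2))
```

A domain that appears in only one slice, or that dominates a single slice, can be a hint to revisit n_clusters or d_l.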

[6]:

splane_model = Splane.init_model(adata_list, n_clusters=7, use_gpu=False, n_neighbors=25, gnn_dropout=0.5)
splane_model.train(d_l=0.2)
splane_model.identify_spatial_domain()

Setting environment seed: 42
Setting global seed: 42
Calculating cell type weights...
Generating GNN inputs...
Calculating largest eigenvalue of normalized graph Laplacian...
Calculating Chebyshev polynomials up to order 2...

The best epoch 115 total loss=-16.317 g loss=-15.619 d loss=3.488 d acc=0.060 simi loss=-0.997 db loss=0.614:  17%|█▋        | 170/1000 [7:43:09<37:41:19, 163.47s/it]

Stop trainning because of loss convergence

[7]:

# Concatenate the annotated slices and save them; note that this overwrites the input file.
sc.concat(adata_list).write('../data/merfish_mouse_brain/merfish_mouse_brain.h5ad')