Splane&Scube tutorial (1/2): Identify uniform spatial domain on mouse brain MERFISH dataset
July 2023
Dataset: 33 MERFISH slices of mouse brain (here)
Data preprocessing
[1]:
from SPACEL.setting import set_environ_seed
set_environ_seed(42)
from SPACEL import Splane
import scanpy as sc
import numpy as np
import pandas as pd
import matplotlib
[2]:
st_merfish = sc.read_h5ad('../data/merfish_mouse_brain/merfish_mouse_brain.h5ad')
Here, we add the cell type composition to the spatial anndata object with the add_cell_type_composition function, for subsequent spatial domain identification in Splane. The function takes either a DataFrame containing the cell type composition matrix (for spot-based spatial transcriptomic data, e.g. as predicted by Spoint) or a series of cell type annotations (for single-cell resolution spatial transcriptomic data, as in this MERFISH dataset).
[ ]:
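# Add cell type composition from the per-cell annotations in .obs['label']
Splane.utils.add_cell_type_composition(st_merfish, celltype_anno=st_merfish.obs['label'])
# Split the anndata object into one object per slice
adata_list = Splane.utils.split_ad(st_merfish, 'slice_id')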
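To make the two input types concrete, the following is a minimal sketch of how a series of per-cell annotations can be expressed as a cell type composition matrix, where each cell has proportion 1 for its own type. The toy labels and cell names are hypothetical, and this uses plain pandas; it illustrates the data shape only and is not necessarily what add_cell_type_composition does internally.

```python
import pandas as pd

# Toy per-cell annotations standing in for st_merfish.obs['label']
# (hypothetical values, not taken from the dataset).
labels = pd.Series(
    ["Astro", "Neuron", "Neuron", "Oligo"],
    index=["cell_0", "cell_1", "cell_2", "cell_3"],
    name="label",
)

# One-hot encode the labels: one row per cell, one column per cell type.
# Every row sums to 1, so it can be read as a composition matrix.
composition = pd.get_dummies(labels, dtype=float)
print(composition)
```

For spot-based data, the rows of such a matrix would instead hold fractional proportions across several cell types.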
Training Splane model
In this step, we initialize the Splane model with Splane.init_model(...), using the anndata object list as input. The n_clusters parameter determines the number of spatial domains to be identified. The k parameter controls the degree of neighbors considered in the model; a larger k places more emphasis on global structure than on local structure (the call below does not pass k, so its default applies). The gnn_dropout parameter influences the smoothness of the model’s predictions; a higher gnn_dropout gives a smoother output that accommodates the sparsity of spatial transcriptomics data.
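The training log further down reports Chebyshev polynomials of a normalized graph Laplacian. Assuming that k corresponds to that polynomial order (an assumption, not confirmed by this tutorial), the sketch below shows why a larger order reaches more distant neighbors. It uses a toy 5-node path graph and plain NumPy; the helper chebyshev_polynomials is hypothetical and is not SPACEL's implementation.

```python
import numpy as np

# Toy path graph with 5 nodes: 0 - 1 - 2 - 3 - 4.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

# Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
deg = A.sum(axis=1)
d_inv_sqrt = 1.0 / np.sqrt(deg)
L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# Rescale with the largest eigenvalue so the spectrum lies in [-1, 1].
lam_max = np.linalg.eigvalsh(L).max()
L_scaled = 2.0 * L / lam_max - np.eye(n)


def chebyshev_polynomials(L_scaled, k):
    """Return [T_0, ..., T_k] via the recurrence T_j = 2 L T_{j-1} - T_{j-2}."""
    T = [np.eye(L_scaled.shape[0]), L_scaled.copy()]
    for _ in range(2, k + 1):
        T.append(2.0 * L_scaled @ T[-1] - T[-2])
    return T[: k + 1]


T = chebyshev_polynomials(L_scaled, k=2)

# On a path graph, the column index equals the hop distance from node 0,
# so the largest nonzero column in row 0 is the farthest node reached.
reach = [int(np.flatnonzero(np.abs(t[0]) > 1e-12).max()) for t in T]
print(reach)  # [0, 1, 2]
```

Each additional order extends the reach by one hop, which matches the description of a larger k emphasizing global over local structure.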
We train the model with splane_model.train(...) to obtain the latent features of each spot/cell. The d_l parameter affects the level of batch effect correction between slices. By default, d_l is 0.2 for spatial transcriptomics data with single-cell resolution.
Then, we identify the spatial domain to which each spot/cell belongs with splane_model.identify_spatial_domain(...). By default, the results are saved in the spatial_domain column of .obs. If the key parameter is provided, the results are saved in .obs[key].
[6]:
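# Initialize the model on the list of slices: 7 spatial domains, CPU only
splane_model = Splane.init_model(adata_list, n_clusters=7, use_gpu=False, n_neighbors=25, gnn_dropout=0.5)
# Train to obtain latent features, then assign a spatial domain to each cell
splane_model.train(d_l=0.2)
splane_model.identify_spatial_domain()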
Setting environment seed: 42
Setting global seed: 42
Calculating cell type weights...
Generating GNN inputs...
Calculating largest eigenvalue of normalized graph Laplacian...
Calculating Chebyshev polynomials up to order 2...
The best epoch 115 total loss=-16.317 g loss=-15.619 d loss=3.488 d acc=0.060 simi loss=-0.997 db loss=0.614: 17%|█▋ | 170/1000 [7:43:09<37:41:19, 163.47s/it]
Stop trainning because of loss convergence
[7]:
# Concatenate the slices and save the result (this overwrites the input file)
sc.concat(adata_list).write('../data/merfish_mouse_brain/merfish_mouse_brain.h5ad')
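Once the domains are stored in .obs, a quick check is to tabulate how many cells of each slice fall into each domain. The sketch below uses a toy DataFrame standing in for sc.concat(adata_list).obs, with hypothetical values; only the slice_id and spatial_domain column names come from the steps above.

```python
import pandas as pd

# Toy stand-in for sc.concat(adata_list).obs (hypothetical values).
obs = pd.DataFrame({
    "slice_id": ["s1", "s1", "s1", "s2", "s2", "s2"],
    "spatial_domain": [0, 1, 1, 0, 0, 2],
})

# Cells per spatial domain in each slice.
counts = pd.crosstab(obs["slice_id"], obs["spatial_domain"])

# Row-normalize to get the fraction of each slice assigned to each domain.
fractions = counts.div(counts.sum(axis=1), axis=0)
print(fractions.round(2))
```

A domain that appears in only one slice, or whose fraction varies strongly between neighboring slices, may point to residual batch effects and could be a reason to revisit d_l.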