SPACEL workflow (1/3): Deconvolution by Spoint on mouse brain ST dataset

July 2023

Dataset: 75 ST slices of mouse brain (here)

[2]:
import pandas as pd
import scanpy as sc
import anndata
import os
from tqdm import tqdm
import numpy as np
import sys

Load spatial transcriptomics data

The input data are anndata objects that store raw counts for the scRNA-seq and ST data. The scRNA-seq anndata object must have cell type annotations in .obs.

[3]:
adata = sc.read('../data/ST_mouse_brain/mouse_brain_st.h5ad')
scadata = sc.read_h5ad('../data/ST_mouse_brain/scRNA_Mouse_Nervous_System.h5ad')
[4]:
scadata.var_names_make_unique()
scadata.obs_names_make_unique()
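Before training, it can help to confirm that both inputs meet the requirements stated above. The following is a minimal sketch on toy data, not part of SPACEL: `looks_like_raw_counts` and `has_annotation` are hypothetical helper names, and the integer test is only a heuristic for "raw counts".

```python
import numpy as np
import pandas as pd

def looks_like_raw_counts(values):
    """Heuristic: True if all values are finite, non-negative integers."""
    values = np.asarray(values, dtype=float)
    return bool(
        np.isfinite(values).all()
        and (values >= 0).all()
        and (values == np.round(values)).all()
    )

def has_annotation(obs, key):
    """True if `key` is a column of `obs` and contains no missing labels."""
    return bool(key in obs.columns and not obs[key].isna().any())

# Toy stand-ins for an expression matrix and an .obs table (hypothetical data)
raw = np.array([[0, 3, 1], [2, 0, 0]])
normalized = np.log1p(raw)
obs = pd.DataFrame({"Description": ["Purkinje cells", "Neuroblasts, cerebellum"]})

print(looks_like_raw_counts(raw))
print(looks_like_raw_counts(normalized))
print(has_annotation(obs, "Description"))
```

With the real objects, the same checks could be applied to `adata.X` and `scadata.X` (for a sparse matrix, to its stored nonzero values in `.data`) and to `scadata.obs`.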

Initialize and train the Spoint model

In this step, we initialize the Spoint model using the scRNA-seq and ST anndata objects as input. The celltype_key parameter is the name of the column in the .obs attribute of the scRNA-seq anndata object that holds the cell type annotation. The sm_size parameter controls the number of simulated spots. A sufficiently large sm_size is important for accurate prediction, but increasing it also increases the simulation and training time. In general, we recommend setting sm_size to a value greater than 100,000.
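To give an intuition for what a "simulated spot" is, here is a generic sketch of building one pseudo-spot from annotated single cells: draw a few cells, sum their counts, and record the cell type proportions as the training label. This is an illustration under simple assumptions (uniform sampling, a fixed number of cells per spot), not SPACEL's actual simulation code; `simulate_spot` and the toy data are hypothetical.

```python
import numpy as np

def simulate_spot(counts, cell_types, n_cells, rng):
    """Sum the counts of `n_cells` randomly drawn cells.

    Returns the pseudo-spot expression vector and a dict mapping each
    sampled cell type to its proportion among the drawn cells.
    """
    idx = rng.choice(counts.shape[0], size=n_cells, replace=True)
    spot_expr = counts[idx].sum(axis=0)
    types, n = np.unique(cell_types[idx], return_counts=True)
    proportions = dict(zip(types.tolist(), (n / n_cells).tolist()))
    return spot_expr, proportions

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(50, 10))         # 50 toy cells x 10 toy genes
cell_types = np.array(["A"] * 30 + ["B"] * 20)   # toy cell type labels

spot_expr, proportions = simulate_spot(counts, cell_types, n_cells=8, rng=rng)
print(spot_expr.shape, proportions)
```

Each simulated spot yields one (expression, proportions) training pair, which is why a larger sm_size gives the model more examples to learn from and also takes longer to generate and train on.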

[5]:
import SPACEL
from SPACEL.setting import set_environ_seed
set_environ_seed()
from SPACEL import Spoint
Setting environment seed: 42
Using GPU: 1
Global seed set to 0
[6]:
spoint_model = Spoint.init_model(scadata,adata,celltype_key='Description',sm_size=500000,use_gpu=True,n_threads=2)
spoint_model.train(max_steps=5000, batch_size=1024)
Setting global seed: 42
### Finding marker genes...
Adrenergic cell groups of the medulla              200
Noradrenergic neurons of the medulla               200
Non-telencephalon astrocytes, fibrous              200
Non-border Cck interneurons, hippocampus           200
Non-border Cck interneurons, cortex/hippocampus    200
                                                  ...
Neuroblasts, cerebellum                            122
Purkinje cells                                     116
Granular layer interneurons, cerebellum             67
Pmch neurons, hypothalamus                          46
Neuroblast-like, habenula                           18
Name: Description, Length: 103, dtype: int64
### Used gene numbers: 5723
### Initializing sample probability
### Genetating simulated spatial data using scRNA data with mode: unbalance
### Genetating simulated spatial data using scRNA data with mode: sqrt
### Genetating simulated spatial data using scRNA data with mode: balance
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1]
Epoch 100/100: 100%|██████████| 100/100 [51:24<00:00, 32.87s/it, loss=3.79e+03, v_num=1]
`Trainer.fit` stopped: `max_epochs=100` reached.
Epoch 100/100: 100%|██████████| 100/100 [51:24<00:00, 30.84s/it, loss=3.79e+03, v_num=1]
Step 5000: test inf loss=-0.787, train inf loss=-0.736, test rec loss=-0.381, train rec loss=-0.345, st test rec loss=-0.399, mmd loss=0.015: 100%|██████████| 5000/5000 [4:36:21<00:00,  3.32s/it]

Next, we use the trained model to predict the cell type composition of each spot in the spatial transcriptomics data. The prediction is returned as a DataFrame in which each row corresponds to a spot, each column to a cell type from the scRNA-seq data, and each entry gives the proportion of that cell type in that spot. The anndata object of the spatial transcriptomics data, with the deconvolution results stored in its .obs attribute, is also available from the model.

[7]:
pre = spoint_model.deconv_spatial()
st_ad = spoint_model.st_ad
st_ad.write('../data/ST_mouse_brain/mouse_brain_st.h5ad')
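The layout of the prediction table can be illustrated with a small stand-in. The sketch below uses a toy DataFrame (`pre_toy`, hypothetical values and cell type names) with spots as rows and cell types as columns, and assumes the proportions in each row sum to one. It shows two common sanity checks: row sums and the dominant cell type per spot.

```python
import pandas as pd

# Toy stand-in for the deconvolution result: rows are spots, columns are cell types
pre_toy = pd.DataFrame(
    {
        "Astrocytes": [0.7, 0.1, 0.2],
        "Neurons": [0.2, 0.8, 0.3],
        "Microglia": [0.1, 0.1, 0.5],
    },
    index=["spot_1", "spot_2", "spot_3"],
)

# Sum of proportions in each spot (expected to be close to 1 under our assumption)
row_sums = pre_toy.sum(axis=1)

# Cell type with the largest predicted proportion in each spot
dominant = pre_toy.idxmax(axis=1)

print(row_sums)
print(dominant)
```

The same two calls could be run on `pre` to inspect the real predictions.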

Visualize results

As a demonstration, we plot the predicted cell type compositions on a single slice (slice 37). The 20 cell types shown are those with the highest maximum predicted proportion across the spots of that slice.

[8]:
import matplotlib
matplotlib.rcParams['pdf.fonttype'] = 42
matplotlib.rcParams['ps.fonttype'] = 42
matplotlib.rcParams['font.serif'] = ['Arial']
sc.settings.set_figure_params(dpi=50,dpi_save=300,facecolor='white',fontsize=10,vector_friendly=True,figsize=(3,3))
sc.settings.verbosity = 3
[9]:
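# Keep only the spots that belong to slice 37
st_ad = st_ad[st_ad.obs['slice'] == 37]
# Maximum predicted proportion of each cell type across the spots of this slice
celltype = st_ad.obs.loc[:, pre.columns].max(0)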
[10]:
sc.pl.embedding(st_ad,color=celltype.sort_values(ascending=False)[:20].index,basis='spatial',ncols=5)
[Figure: predicted proportions of the 20 selected cell types on slice 37, plotted in spatial coordinates]
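The choice of ranking statistic affects which cell types get plotted. The sketch below, on a toy proportion table with hypothetical names and values, contrasts ranking by the maximum proportion across spots (as in the cell above) with ranking by the mean proportion, which is an alternative and not something the tutorial itself uses.

```python
import pandas as pd

# Toy proportions: rows are spots, columns are cell types
props = pd.DataFrame(
    {
        "TypeA": [0.90, 0.0, 0.0, 0.0],    # abundant in one spot only
        "TypeB": [0.05, 0.6, 0.5, 0.5],    # moderately abundant in most spots
        "TypeC": [0.05, 0.4, 0.5, 0.5],
    }
)

# Rank cell types by their highest proportion in any spot
top_by_max = props.max(axis=0).nlargest(2).index.tolist()

# Rank cell types by their average proportion over all spots
top_by_mean = props.mean(axis=0).nlargest(2).index.tolist()

print(top_by_max)
print(top_by_mean)
```

Ranking by maximum favors cell types that dominate even a handful of spots, which suits spatially restricted populations. Ranking by mean favors cell types that are widespread across the slice.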