Scube tutorial: 3D expression modeling with gaussian process regression

July 2023

Dataset: 3D STARmap of mouse brain (here).

Note: If you want to run 3D expression GPR model in Scube, you need to install the Open3D python library first.

[2]:

from SPACEL.setting import set_environ_seed
set_environ_seed()
from SPACEL import Scube
from SPACEL.Scube.utils_3d import create_mesh, smooth_mesh, sample_in_mesh, get_surface_colors, save_view_parameters, load_view_parameters
import scanpy as sc
import numpy as np
import pandas as pd
np.random.seed(42)

Setting environment seed: 42

Load data

The input data type of 3D expression model is normalized expression and 3D coordinates of each spot/cell.

[5]:

st_ad = sc.read_h5ad('../data/starmap_3d_mouse_brain/starmap_3d_mouse_brain.h5ad')
sc.pp.normalize_total(st_ad,target_sum=1e4)
sc.pp.log1p(st_ad)

# Normalized expression data (rows as spots/cells, columns as genes)
norm_expr = st_ad.to_df()

# 3D location (rows as spots/cells, columns as x,y,z axis coordinate)
loc = st_ad.obsm['spatial'].copy()

Generate 3D surface

In this step, we create a triangle mesh from 3D location coordinates of spots/cells by using alpha shape methods. By default the parameter α is automatically estimated, which affects the number of triangles in the mesh.

[6]:

mesh = create_mesh(loc.values,alpha=80,show=False)

alpha=80.000
[Open3D WARNING] [CreateFromPointCloudAlphaShape] invalid tetra in TetraMesh

To obtain smoothed mesh, we can used Taubin filter and mesh subdivision for the mesh. By increasing taubin_iter and subdivide_iter will generate a smoother mesh.

[7]:

mesh = smooth_mesh(mesh,taubin_iter=10,subdivide_iter=3,show=False)

filter with Taubin with 10 iterations
After subdivision it has 345092 vertices and 690176 triangles

Sampling 500000 location coordinates from inside the mesh and getting the vertices of the mesh surface.

[8]:

loc_sampled,loc_surface = sample_in_mesh(mesh,loc.values)

Gaussian process regression

For better performance of GPR model, we recommend to resize the location coordinates value to about 0 to 10.

[9]:

loc = loc/1000
loc_sampled = loc_sampled/1000

Then, we construct the GPR model and training it until convergence. If GPU devices avaliable, set use_gpu to True

[10]:

gpr_model = Scube.GPRmodel(norm_expr,loc.values,loc_sampled,used_genes=norm_expr.columns,use_gpu=False)

[11]:

gpr_model.train(lr=.02,training_iter=50,save_model=True,cal_bf=True,optim_l=True,optimize_method='Adam')

Modeling Slc17a7
Optimize lenthscale...

/opt/miniconda3/envs/SPACEL/lib/python3.8/site-packages/SPACEL-1.0a0-py3.8.egg/SPACEL/Scube/gpr.py:107: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  constant = torch.tensor(self.train_y.mean())
/opt/miniconda3/envs/SPACEL/lib/python3.8/site-packages/gpytorch/functions/_pivoted_cholesky.py:118: UserWarning: torch.triangular_solve is deprecated in favor of torch.linalg.solve_triangularand will be removed in a future PyTorch release.
torch.linalg.solve_triangular has its arguments reversed and does not return a copy of one of the inputs.
X = torch.triangular_solve(B, A).solution
should be replaced with
X = torch.linalg.solve_triangular(A, B). (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/BatchLinearAlgebra.cpp:2189.)
  [L, torch.triangular_solve(Krows[..., m:, :].transpose(-1, -2), L, upper=False)[0].transpose(-1, -2)],

Initialize lenthscale alpha as 1.000
Best model loss: 1.1666253805160522
Best model loss: 1.548324704170227
0.3816993236541748
Modeling Mgp
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 0.6543543338775635
Best model loss: 0.991144061088562
0.33678972721099854
Modeling Gad1
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.7622023820877075
Best model loss: 1.8489148616790771
0.08671247959136963
Modeling Nov
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.706119179725647
Best model loss: 1.7335630655288696
0.027443885803222656
Modeling Rasgrf2
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 0.5395879149436951
Best model loss: 0.8652480244636536
0.3256601095199585
Modeling Rorb
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 0.9758266806602478
Best model loss: 1.1724785566329956
0.1966518759727478
Modeling Cux2
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.439920425415039
Best model loss: 2.089233636856079
0.64931321144104
Modeling Plcxd2
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.360386848449707
Best model loss: 1.7752330303192139
0.41484618186950684
Modeling Sulf2
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 0.696236789226532
Best model loss: 0.8046963810920715
0.10845959186553955
Modeling Ctgf
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.3529804944992065
Best model loss: 1.5123544931411743
0.15937399864196777
Modeling Pcp4
Optimize lenthscale...

/opt/miniconda3/envs/SPACEL/lib/python3.8/site-packages/gpytorch/utils/cholesky.py:40: NumericalWarning: A not p.d., added jitter of 1.0e-06 to the diagonal
  warnings.warn(

Initialize lenthscale alpha as 1.000
Best model loss: 1.4984480142593384
Best model loss: 1.8526679277420044
0.354219913482666
Modeling Sema3e
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.415098786354065
Best model loss: 1.5002400875091553
0.08514130115509033
Modeling Npy
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.6847648620605469
Best model loss: 2.013796091079712
0.32903122901916504
Modeling Sst
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 2.00138521194458
Best model loss: 2.1906580924987793
0.18927288055419922
Modeling Pvalb
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 2.120755910873413
Best model loss: 2.386117696762085
0.2653617858886719
Modeling Vip
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 2.1072800159454346
Best model loss: 2.3525805473327637
0.2453005313873291
Modeling Calb2
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 0.5619483590126038
Best model loss: 0.9096982479095459
0.34774988889694214
Modeling Cck
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.456984281539917
Best model loss: 1.6856448650360107
0.22866058349609375
Modeling Reln
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.3323649168014526
Best model loss: 1.425437092781067
0.09307217597961426
Modeling Fos
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.4171642065048218
Best model loss: 1.7169770002365112
0.29981279373168945
Modeling Egr1
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.6372140645980835
Best model loss: 1.9656829833984375
0.328468918800354
Modeling Prok2
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.3214738368988037
Best model loss: 1.5150748491287231
0.19360101222991943
Modeling Egr2
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.829037070274353
Best model loss: 2.1909055709838867
0.3618685007095337
Modeling Bdnf
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.3676759004592896
Best model loss: 1.5427753925323486
0.17509949207305908
Modeling Gja1
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.0627855062484741
Best model loss: 1.2011107206344604
0.13832521438598633
Modeling Ctss
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.0996112823486328
Best model loss: 1.183321237564087
0.0837099552154541
Modeling Mbp
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.1970020532608032
Best model loss: 1.6298285722732544
0.43282651901245117
Modeling Flt1
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.5456669330596924
Best model loss: 1.8425877094268799
0.2969207763671875

By calculating bayes factor, we can statistical identification the spatially varying genes in 3D expression model. For a gene, larger Bayes factor values indicate more spatial variation.

[12]:

gpr_model.log_bf.sort_values(by='log_bf',ascending=False)

[12]:

	log_bf
Cux2	0.649313
Mbp	0.432827
Plcxd2	0.414846
Slc17a7	0.381699
Egr2	0.361869
Pcp4	0.35422
Calb2	0.34775
Mgp	0.33679
Npy	0.329031
Egr1	0.328469
Rasgrf2	0.32566
Fos	0.299813
Flt1	0.296921
Pvalb	0.265362
Vip	0.245301
Cck	0.228661
Rorb	0.196652
Prok2	0.193601
Sst	0.189273
Bdnf	0.175099
Ctgf	0.159374
Gja1	0.138325
Sulf2	0.10846
Reln	0.093072
Gad1	0.086712
Sema3e	0.085141
Ctss	0.08371
Nov	0.027444

Visualization

Visualization of the GRP model result for any genes.

[13]:

g = 'Cux2'

# Expression of sampled location from the 3D mesh
sampled_expr = gpr_model.plot_gpr_expr(g,s=.1,training_iter=50,lr=.02,save=False,show=True,elev=45,azim=45,cmap='Spectral_r',zlim=(-0.3,0.4))

# Expression of surface location from the 3D mesh
surface_expr = gpr_model.plot_gpr_expr(g,data=loc_surface/1000,s=.1,training_iter=50,lr=.02,save=False,show=True,elev=45,azim=45,cmap='Spectral_r',zlim=(-0.3,0.4))

# Expression of measurement location from the observation
Scube.plot_3d(loc=loc.values,val=norm_expr[g],s=.1,show=True,elev=45,azim=45,zlim=(-0.3,0.4))

/opt/miniconda3/envs/SPACEL/lib/python3.8/site-packages/SPACEL-1.0a0-py3.8.egg/SPACEL/Scube/gpr.py:281: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  observed_pred_batch = self.model.likelihood(self.model(torch.tensor(data_batch)))

../_images/tutorials_STARmap_mouse_brain_GPR_25_1.png

../_images/tutorials_STARmap_mouse_brain_GPR_25_2.png

../_images/tutorials_STARmap_mouse_brain_GPR_25_3.png

This step colored mesh surface with GPR expression. It can be used for visualization of the GRP model result with Open3D library.

[14]:

get_surface_colors(mesh,surface_expr)

Save the camera view parameters in a json file and then load it at the next visualization.

[15]:

save_view_parameters(mesh, 'view_parameters.json')

[16]:

g = 'Mbp'
# Expression of surface location from the 3D mesh
surface_expr = gpr_model.plot_gpr_expr(g,data=loc_surface/1000,s=.1,training_iter=50,lr=.02,save=False,show=False,elev=45,azim=45,cmap='Spectral_r',zlim=(-0.3,0.4))
get_surface_colors(mesh,surface_expr)
load_view_parameters(mesh, 'view_parameters.json')