Scube tutorial: 3D expression modeling with gaussian process regression
July 2023
Dataset: 3D STARmap of mouse brain (here).
Note: If you want to run 3D expression GPR model in Scube, you need to install the Open3D python library first.
[2]:
from SPACEL.setting import set_environ_seed
set_environ_seed()
from SPACEL import Scube
from SPACEL.Scube.utils_3d import create_mesh, smooth_mesh, sample_in_mesh, get_surface_colors, save_view_parameters, load_view_parameters
import scanpy as sc
import numpy as np
import pandas as pd
np.random.seed(42)
Setting environment seed: 42
Load data
The input data type of 3D expression model is normalized expression and 3D coordinates of each spot/cell.
[5]:
st_ad = sc.read_h5ad('../data/starmap_3d_mouse_brain/starmap_3d_mouse_brain.h5ad')
sc.pp.normalize_total(st_ad,target_sum=1e4)
sc.pp.log1p(st_ad)
# Normalized expression data (rows as spots/cells, columns as genes)
norm_expr = st_ad.to_df()
# 3D location (rows as spots/cells, columns as x,y,z axis coordinate)
loc = st_ad.obsm['spatial'].copy()
Generate 3D surface
In this step, we create a triangle mesh from 3D location coordinates of spots/cells by using alpha shape methods. By default the parameter α is automatically estimated, which affects the number of triangles in the mesh.
[6]:
mesh = create_mesh(loc.values,alpha=80,show=False)
alpha=80.000
[Open3D WARNING] [CreateFromPointCloudAlphaShape] invalid tetra in TetraMesh
To obtain smoothed mesh, we can used Taubin filter and mesh subdivision for the mesh. By increasing taubin_iter
and subdivide_iter
will generate a smoother mesh.
[7]:
mesh = smooth_mesh(mesh,taubin_iter=10,subdivide_iter=3,show=False)
filter with Taubin with 10 iterations
After subdivision it has 345092 vertices and 690176 triangles
Sampling 500000 location coordinates from inside the mesh and getting the vertices of the mesh surface.
[8]:
loc_sampled,loc_surface = sample_in_mesh(mesh,loc.values)
Gaussian process regression
For better performance of GPR model, we recommend to resize the location coordinates value to about 0 to 10.
[9]:
loc = loc/1000
loc_sampled = loc_sampled/1000
Then, we construct the GPR model and training it until convergence. If GPU devices avaliable, set use_gpu
to True
[10]:
gpr_model = Scube.GPRmodel(norm_expr,loc.values,loc_sampled,used_genes=norm_expr.columns,use_gpu=False)
[11]:
gpr_model.train(lr=.02,training_iter=50,save_model=True,cal_bf=True,optim_l=True,optimize_method='Adam')
Modeling Slc17a7
Optimize lenthscale...
/opt/miniconda3/envs/SPACEL/lib/python3.8/site-packages/SPACEL-1.0a0-py3.8.egg/SPACEL/Scube/gpr.py:107: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
constant = torch.tensor(self.train_y.mean())
/opt/miniconda3/envs/SPACEL/lib/python3.8/site-packages/gpytorch/functions/_pivoted_cholesky.py:118: UserWarning: torch.triangular_solve is deprecated in favor of torch.linalg.solve_triangularand will be removed in a future PyTorch release.
torch.linalg.solve_triangular has its arguments reversed and does not return a copy of one of the inputs.
X = torch.triangular_solve(B, A).solution
should be replaced with
X = torch.linalg.solve_triangular(A, B). (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/BatchLinearAlgebra.cpp:2189.)
[L, torch.triangular_solve(Krows[..., m:, :].transpose(-1, -2), L, upper=False)[0].transpose(-1, -2)],
Initialize lenthscale alpha as 1.000
Best model loss: 1.1666253805160522
Best model loss: 1.548324704170227
0.3816993236541748
Modeling Mgp
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 0.6543543338775635
Best model loss: 0.991144061088562
0.33678972721099854
Modeling Gad1
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.7622023820877075
Best model loss: 1.8489148616790771
0.08671247959136963
Modeling Nov
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.706119179725647
Best model loss: 1.7335630655288696
0.027443885803222656
Modeling Rasgrf2
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 0.5395879149436951
Best model loss: 0.8652480244636536
0.3256601095199585
Modeling Rorb
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 0.9758266806602478
Best model loss: 1.1724785566329956
0.1966518759727478
Modeling Cux2
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.439920425415039
Best model loss: 2.089233636856079
0.64931321144104
Modeling Plcxd2
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.360386848449707
Best model loss: 1.7752330303192139
0.41484618186950684
Modeling Sulf2
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 0.696236789226532
Best model loss: 0.8046963810920715
0.10845959186553955
Modeling Ctgf
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.3529804944992065
Best model loss: 1.5123544931411743
0.15937399864196777
Modeling Pcp4
Optimize lenthscale...
/opt/miniconda3/envs/SPACEL/lib/python3.8/site-packages/gpytorch/utils/cholesky.py:40: NumericalWarning: A not p.d., added jitter of 1.0e-06 to the diagonal
warnings.warn(
Initialize lenthscale alpha as 1.000
Best model loss: 1.4984480142593384
Best model loss: 1.8526679277420044
0.354219913482666
Modeling Sema3e
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.415098786354065
Best model loss: 1.5002400875091553
0.08514130115509033
Modeling Npy
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.6847648620605469
Best model loss: 2.013796091079712
0.32903122901916504
Modeling Sst
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 2.00138521194458
Best model loss: 2.1906580924987793
0.18927288055419922
Modeling Pvalb
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 2.120755910873413
Best model loss: 2.386117696762085
0.2653617858886719
Modeling Vip
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 2.1072800159454346
Best model loss: 2.3525805473327637
0.2453005313873291
Modeling Calb2
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 0.5619483590126038
Best model loss: 0.9096982479095459
0.34774988889694214
Modeling Cck
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.456984281539917
Best model loss: 1.6856448650360107
0.22866058349609375
Modeling Reln
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.3323649168014526
Best model loss: 1.425437092781067
0.09307217597961426
Modeling Fos
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.4171642065048218
Best model loss: 1.7169770002365112
0.29981279373168945
Modeling Egr1
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.6372140645980835
Best model loss: 1.9656829833984375
0.328468918800354
Modeling Prok2
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.3214738368988037
Best model loss: 1.5150748491287231
0.19360101222991943
Modeling Egr2
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.829037070274353
Best model loss: 2.1909055709838867
0.3618685007095337
Modeling Bdnf
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.3676759004592896
Best model loss: 1.5427753925323486
0.17509949207305908
Modeling Gja1
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.0627855062484741
Best model loss: 1.2011107206344604
0.13832521438598633
Modeling Ctss
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.0996112823486328
Best model loss: 1.183321237564087
0.0837099552154541
Modeling Mbp
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.1970020532608032
Best model loss: 1.6298285722732544
0.43282651901245117
Modeling Flt1
Optimize lenthscale...
Initialize lenthscale alpha as 1.000
Best model loss: 1.5456669330596924
Best model loss: 1.8425877094268799
0.2969207763671875
By calculating bayes factor, we can statistical identification the spatially varying genes in 3D expression model. For a gene, larger Bayes factor values indicate more spatial variation.
[12]:
gpr_model.log_bf.sort_values(by='log_bf',ascending=False)
[12]:
log_bf | |
---|---|
Cux2 | 0.649313 |
Mbp | 0.432827 |
Plcxd2 | 0.414846 |
Slc17a7 | 0.381699 |
Egr2 | 0.361869 |
Pcp4 | 0.35422 |
Calb2 | 0.34775 |
Mgp | 0.33679 |
Npy | 0.329031 |
Egr1 | 0.328469 |
Rasgrf2 | 0.32566 |
Fos | 0.299813 |
Flt1 | 0.296921 |
Pvalb | 0.265362 |
Vip | 0.245301 |
Cck | 0.228661 |
Rorb | 0.196652 |
Prok2 | 0.193601 |
Sst | 0.189273 |
Bdnf | 0.175099 |
Ctgf | 0.159374 |
Gja1 | 0.138325 |
Sulf2 | 0.10846 |
Reln | 0.093072 |
Gad1 | 0.086712 |
Sema3e | 0.085141 |
Ctss | 0.08371 |
Nov | 0.027444 |
Visualization
Visualization of the GRP model result for any genes.
[13]:
g = 'Cux2'
# Expression of sampled location from the 3D mesh
sampled_expr = gpr_model.plot_gpr_expr(g,s=.1,training_iter=50,lr=.02,save=False,show=True,elev=45,azim=45,cmap='Spectral_r',zlim=(-0.3,0.4))
# Expression of surface location from the 3D mesh
surface_expr = gpr_model.plot_gpr_expr(g,data=loc_surface/1000,s=.1,training_iter=50,lr=.02,save=False,show=True,elev=45,azim=45,cmap='Spectral_r',zlim=(-0.3,0.4))
# Expression of measurement location from the observation
Scube.plot_3d(loc=loc.values,val=norm_expr[g],s=.1,show=True,elev=45,azim=45,zlim=(-0.3,0.4))
/opt/miniconda3/envs/SPACEL/lib/python3.8/site-packages/SPACEL-1.0a0-py3.8.egg/SPACEL/Scube/gpr.py:281: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
observed_pred_batch = self.model.likelihood(self.model(torch.tensor(data_batch)))
This step colored mesh surface with GPR expression. It can be used for visualization of the GRP model result with Open3D
library.
[14]:
get_surface_colors(mesh,surface_expr)
Save the camera view parameters in a json file and then load it at the next visualization.
[15]:
save_view_parameters(mesh, 'view_parameters.json')
[16]:
g = 'Mbp'
# Expression of surface location from the 3D mesh
surface_expr = gpr_model.plot_gpr_expr(g,data=loc_surface/1000,s=.1,training_iter=50,lr=.02,save=False,show=False,elev=45,azim=45,cmap='Spectral_r',zlim=(-0.3,0.4))
get_surface_colors(mesh,surface_expr)
load_view_parameters(mesh, 'view_parameters.json')