Tutorial 4: HD data(MSI)

Here, we apply GraphPCA-Turbo to analyze a HD spatial transcriptomics sample from the Mouse Small Intestine(MSI). GraphPCA-Turbo is a highly scalable enhancement of GraphPCA that reformulates the optimization problem as sparse, symmetric positive-definite linear systems solved via an alternating iteration strategy. The intestine is a highly organized organ characterized by its complex vertical architecture, ranging from the deep crypts to the mature villus tips.MSI dataset can be download from 10x Genomics Visium HD website (https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-mouse-intestine)

Load packages

[ ]:

import GraphPCA as sg
import numpy as np
import pandas as pd
import scanpy as sc
import squidpy as sq
from sklearn.cluster import KMeans

%config InlineBackend.figure_format = 'retina'

Load data

[2]:

adata = sc.read_h5ad('../data/after_gpca_008um_0306.h5ad')

Preprocessing

[3]:

sc.pp.filter_cells(adata,min_counts=100)
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
adata

WARNING: adata.X seems to be already log-transformed.

[3]:

AnnData object with n_obs × n_vars = 308813 × 19059
    obs: 'in_tissue', 'array_row', 'array_col', 'location_id', 'region', 'n_counts', 'GPCA_pred_7'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'GPCA_pred_7_colors', 'log1p', 'spatialdata_attrs'
    obsm: 'spatial'

Perform GraphPCA-Turbo

In GraphPCA-Turbo, we have significantly refactored the computational core to introduce a high-performance accelerated mode. By leveraging a C++ backend powered by Eigen3 and pybind11, the algorithm achieves a 5x to 20x speedup compared to standard implementations.

[4]:

Z, W = sg.Run_GPCA(
    adata,
    location=adata.obsm["spatial"],
    n_components=50,
    method="knn",
    n_neighbors=6,
    _lambda=0.5,
    max_iter=10,
    mode='accelerated'
)

Reached maximum iterations (10) without convergence.

[5]:

adata.obsm["GraphPCA-Turbo"] = Z
print(Z.shape)

(308813, 50)

Clustering

[6]:

estimator = KMeans(n_clusters=7,random_state=101)
res = estimator.fit(Z[:,:])
lable_pred=res.labels_
adata.obs["GPCA_pred_7"]= lable_pred
adata.obs["GPCA_pred_7"] = adata.obs["GPCA_pred_7"].astype('category')

d:\Miniconda3\envs\sctm\lib\site-packages\sklearn\cluster\_kmeans.py:1416: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  super()._check_params_vs_input(X, default_n_init=10)

Visualization

[7]:

sc.set_figure_params(color_map = 'Set1',figsize=(5,5))
sq.pl.spatial_scatter(
    adata,
    library_id="spatial",
    shape=None,
    color=[
        "GPCA_pred_7",
    ],
    wspace=0.4
)

d:\Miniconda3\envs\sctm\lib\site-packages\squidpy\pl\_spatial_utils.py:955: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  _cax = scatter(

ROI Analysis

Read the csv of roi point information

[8]:

roi_info_from_csv = pd.read_csv('../data/roi_points_info.csv', index_col=0)
selected_indices = roi_info_from_csv.index
adata = adata[adata.obs_names.isin(selected_indices)].copy()
print(f"Extracted ROI adata: {adata.n_obs} cells")

Extracted ROI adata: 1347 cells

[9]:

sc.set_figure_params(color_map = 'Set1',figsize=(5,5))
sq.pl.spatial_scatter(
    adata,
    library_id="spatial",
    shape=None,
    color=[
        "GPCA_pred_7",
    ],
    wspace=0.4,
    size=10,
)

d:\Miniconda3\envs\sctm\lib\site-packages\squidpy\pl\_spatial_utils.py:955: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  _cax = scatter(