Spatial Decomposition
Calling cell-type compositions for spot-based spatial transcriptomics data
10 datasets · 9 methods · 2 control methods · 1 metric
Spatial decomposition (also often referred to as spatial deconvolution) is applicable to spatial transcriptomics data where the transcription profile of each capture location (spot, voxel, bead, etc.) does not share a bijective relationship with the cells in the tissue, i.e., multiple cells may contribute to the same capture location. The task of spatial decomposition then refers to estimating the composition of cell types/states that are present at each capture location. The cell type/state estimates are presented as proportion values, representing the proportion of the cells at each capture location that belong to a given cell type.
We distinguish between reference-based decomposition and de novo decomposition: the former leverages external data (e.g., scRNA-seq or scNuc-seq) to guide the inference process, while the latter works only with the spatial data. We require that all datasets have an associated single-cell reference dataset, but methods are free to ignore this information.
Due to the lack of real datasets with the necessary ground truth, this task makes use of a simulated dataset generated by creating cell aggregates by sampling from a Dirichlet distribution. The ground-truth dataset consists of the spatial expression matrix, XY coordinates of the spots, true cell-type proportions for each spot, and the reference single-cell data (from which the cell aggregates were simulated).
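The simulation idea described above can be illustrated with a minimal sketch. It is not the benchmark's actual generator: the function name `simulate_spots`, the number of cells per spot, the symmetric concentration parameter `alpha`, and the multinomial step that turns a Dirichlet draw into integer cell numbers are all assumptions made for illustration.

```python
import numpy as np

def simulate_spots(sc_counts, cell_types, n_spots=100, cells_per_spot=10,
                   alpha=1.0, seed=0):
    """Aggregate single cells into synthetic spots with known proportions.

    sc_counts:  (n_cells, n_genes) count matrix of the single-cell reference
    cell_types: (n_cells,) array of cell-type labels
    All parameter names and defaults are hypothetical.
    """
    rng = np.random.default_rng(seed)
    types = np.unique(cell_types)
    spot_counts = np.zeros((n_spots, sc_counts.shape[1]))
    true_props = np.zeros((n_spots, len(types)))
    for s in range(n_spots):
        # Draw the expected cell-type mixture of this spot from a Dirichlet.
        mix = rng.dirichlet(np.full(len(types), alpha))
        # Convert the mixture into an integer number of cells per type.
        n_per_type = rng.multinomial(cells_per_spot, mix)
        for k, t in enumerate(types):
            if n_per_type[k] == 0:
                continue
            # Sample cells of this type (with replacement) and sum their counts.
            pool = np.flatnonzero(cell_types == t)
            idx = rng.choice(pool, size=n_per_type[k], replace=True)
            spot_counts[s] += sc_counts[idx].sum(axis=0)
        # Ground truth is the realised fraction of cells of each type.
        true_props[s] = n_per_type / cells_per_spot
    return spot_counts, true_props, types

# Toy reference: 60 cells, 5 genes, 3 cell types.
toy_rng = np.random.default_rng(1)
toy_counts = toy_rng.poisson(2.0, size=(60, 5))
toy_labels = np.repeat(np.array(["A", "B", "C"]), 20)
spots, props, types = simulate_spots(toy_counts, toy_labels, n_spots=8)
```

Each row of `props` sums to one and plays the role of the true cell-type proportions; XY coordinates would have to be assigned separately.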
Results
Results table of the scores per method, dataset and metric (after scaling). Use the filters to make a custom subselection of methods and datasets. The “Overall mean” dataset is the mean value across all datasets.
Dataset info
HypoMap
A unified single cell gene expression atlas of the murine hypothalamus (Steuernagel et al. 2022).
The hypothalamus plays a key role in coordinating fundamental body functions. Despite recent progress in single-cell technologies, a unified catalogue and molecular characterization of the heterogeneous cell types and, specifically, neuronal subtypes in this brain region are still lacking. Here we present an integrated reference atlas “HypoMap” of the murine hypothalamus consisting of 384,925 cells, with the ability to incorporate new additional experiments. We validate HypoMap by comparing data collected from SmartSeq2 and bulk RNA sequencing of selected neuronal cell types with different degrees of cellular heterogeneity.
CeNGEN
Complete Gene Expression Map of an Entire Nervous System (Hammarlund et al. 2018).
100k FACS-isolated C. elegans neurons from 17 experiments sequenced on 10x Genomics.
Zebrafish embryonic cells
Single-cell mRNA sequencing of zebrafish embryonic cells (Wagner et al. 2018).
90k cells from zebrafish embryos throughout the first day of development, with and without a knockout of chordin, an important developmental gene.
Human pancreas
Human pancreas cells dataset from the scIB benchmarks (Luecken et al. 2021).
Human pancreatic islet scRNA-seq data from 6 datasets across technologies (CEL-seq, CEL-seq2, Smart-seq2, inDrop, Fluidigm C1, and SMARTER-seq).
GTEX v9
Single-nucleus cross-tissue molecular reference maps to decipher disease gene function (Eraslan et al. 2022).
Understanding the function of genes and their regulation in tissue homeostasis and disease requires knowing the cellular context in which genes are expressed in tissues across the body. Single cell genomics allows the generation of detailed cellular atlases in human tissues, but most efforts are focused on single tissue types. Here, we establish a framework for profiling multiple tissues across the human body at single-cell resolution using single nucleus RNA-Seq (snRNA-seq), and apply it to 8 diverse, archived, frozen tissue types (three donors per tissue). We apply four snRNA-seq methods to each of 25 samples from 16 donors, generating a cross-tissue atlas of 209,126 nuclei profiles, and benchmark them vs. scRNA-seq of comparable fresh tissues. We use a conditional variational autoencoder (cVAE) to integrate an atlas across tissues, donors, and laboratory methods. We highlight shared and tissue-specific features of tissue-resident immune cells, identifying tissue-restricted and non-restricted resident myeloid populations. These include a cross-tissue conserved dichotomy between LYVE1- and HLA class II-expressing macrophages, and the broad presence of LAM-like macrophages across healthy tissues that is also observed in disease. For rare, monogenic muscle diseases, we identify cell types that likely underlie the neuromuscular, metabolic, and immune components of these diseases, and biological processes involved in their pathology. For common complex diseases and traits analyzed by GWAS, we identify the cell types and gene modules that potentially underlie disease mechanisms. The experimental and analytical frameworks we describe will enable the generation of large-scale studies of how cellular and molecular processes vary across individuals and populations.
Human immune
Human immune cells dataset from the scIB benchmarks (Luecken et al. 2021).
Human immune cells from peripheral blood and bone marrow taken from 5 datasets comprising 10 batches across technologies (10X, Smart-seq2).
Mouse Pancreatic Islet Atlas
Mouse pancreatic islet scRNA-seq atlas across sexes, ages, and stress conditions including diabetes (Hrovatin et al. 2023).
To better understand pancreatic β-cell heterogeneity we generated a mouse pancreatic islet atlas capturing a wide range of biological conditions. The atlas contains scRNA-seq datasets of over 300,000 mouse pancreatic islet cells, of which more than 100,000 are β-cells, from nine datasets with 56 samples, including two previously unpublished datasets. The samples vary in sex, age (ranging from embryonic to aged), chemical stress, and disease status (including T1D NOD model development and two T2D models, mSTZ and db/db) together with different diabetes treatments. Additional information about data fields is available in anndata uns field ‘field_descriptions’ and on https://github.com/theislab/mm_pancreas_atlas_rep/blob/main/resources/cellxgene.md.
Diabetic Kidney Disease
Multimodal single cell sequencing implicates chromatin accessibility and genetic background in diabetic kidney disease progression (Wilson et al. 2022).
Multimodal single cell sequencing is a powerful tool for interrogating cell-specific changes in transcription and chromatin accessibility. We performed single nucleus RNA (snRNA-seq) and assay for transposase accessible chromatin sequencing (snATAC-seq) on human kidney cortex from donors with and without diabetic kidney disease (DKD) to identify altered signaling pathways and transcription factors associated with DKD. Both snRNA-seq and snATAC-seq had an increased proportion of VCAM1+ injured proximal tubule cells (PT_VCAM1) in DKD samples. PT_VCAM1 has a pro-inflammatory expression signature and transcription factor motif enrichment implicated NFkB signaling. We used stratified linkage disequilibrium score regression to partition heritability of kidney-function-related traits using publicly-available GWAS summary statistics. Cell-specific PT_VCAM1 peaks were enriched for heritability of chronic kidney disease (CKD), suggesting that genetic background may regulate chromatin accessibility and DKD progression. snATAC-seq found cell-specific differentially accessible regions (DAR) throughout the nephron that change accessibility in DKD and these regions were enriched for glucocorticoid receptor (GR) motifs. Changes in chromatin accessibility were associated with decreased expression of insulin receptor, increased gluconeogenesis, and decreased expression of the GR cytosolic chaperone, FKBP5, in the diabetic proximal tubule. Cleavage under targets and release using nuclease (CUT&RUN) profiling of GR binding in bulk kidney cortex and an in vitro model of the proximal tubule (RPTEC) showed that DAR co-localize with GR binding sites. CRISPRi silencing of GR response elements (GRE) in the FKBP5 gene body reduced FKBP5 expression in RPTEC, suggesting that reduced FKBP5 chromatin accessibility in DKD may alter cellular response to GR. 
We developed an open-source tool for single cell allele specific analysis (SALSA) to model the effect of genetic background on gene expression. Heterozygous germline single nucleotide variants (SNV) in proximal tubule ATAC peaks were associated with allele-specific chromatin accessibility and differential expression of target genes within cis-coaccessibility networks. Partitioned heritability of proximal tubule ATAC peaks with a predicted allele-specific effect was enriched for eGFR, suggesting that genetic background may modify DKD progression in a cell-specific manner.
Immune Cell Atlas
Cross-tissue immune cell analysis reveals tissue-specific features in humans (Domínguez Conde et al. 2022).
Despite their crucial role in health and disease, our knowledge of immune cells within human tissues remains limited. We surveyed the immune compartment of 16 tissues from 12 adult donors by single-cell RNA sequencing and VDJ sequencing generating a dataset of ~360,000 cells. To systematically resolve immune cell heterogeneity across tissues, we developed CellTypist, a machine learning tool for rapid and precise cell type annotation. Using this approach, combined with detailed curation, we determined the tissue distribution of finely phenotyped immune cell types, revealing hitherto unappreciated tissue-specific features and clonal architecture of T and B cells. Our multitissue approach lays the foundation for identifying highly resolved immune cell types by leveraging a common reference dataset, tissue-integrated expression analysis, and antigen receptor sequencing.
Tabula Sapiens
A multiple-organ, single-cell transcriptomic atlas of humans (Jones et al. 2022).
Tabula Sapiens is a benchmark, first-draft human cell atlas of nearly 500,000 cells from 24 organs of 15 normal human subjects. This work is the product of the Tabula Sapiens Consortium. Taking the organs from the same individual controls for genetic background, age, environment, and epigenetic effects and allows detailed analysis and comparison of cell types that are shared between tissues. Our work creates a detailed portrait of cell types as well as their distribution and variation in gene expression across tissues and within the endothelial, epithelial, stromal and immune compartments.
Method info
Cell2Location
Cell2location uses a Bayesian model to resolve cell types in spatial transcriptomic data and create comprehensive cellular maps of diverse tissues.
Cell2location is a decomposition method based on Negative Binomial regression that is able to account for batch effects in estimating the single-cell gene expression signature used for the spatial decomposition step. Note that when batch information is unavailable for this task, we can use either a hard-coded reference, or a negative-binomial learned reference without batch labels. The parameter alpha refers to the detection efficiency prior.
DestVI
DestVI is a probabilistic method for multi-resolution analysis for spatial transcriptomics that explicitly models continuous variation within cell types.
Deconvolution of Spatial Transcriptomics profiles using Variational Inference (DestVI) is a spatial decomposition method that leverages a conditional generative model of spatial transcriptomics down to the sub-cell-type variation level, which is then used to decompose the cell-type proportions determining the spatial organization of a tissue.
NMFreg
NMFreg reconstructs gene expression as a weighted combination of cell type signatures defined by scRNA-seq.
Non-Negative Matrix Factorization regression (NMFreg) is a decomposition method that reconstructs expression of each spatial location as a weighted combination of cell-type signatures defined by scRNA-seq. It was originally developed for Slide-seq data. This is a re-implementation from https://github.com/tudaga/NMFreg_tutorial.
NNLS
NNLS is a decomposition method based on Non-Negative Least Squares regression.
Non-Negative Least Squares (NNLS) is a convex optimization problem with convex constraints. It was used by the AutoGeneS method to infer cellular proportions by solving a multi-objective optimization problem.
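As a rough illustration of NNLS-based decomposition, the sketch below regresses each spot's expression onto mean cell-type signatures under a non-negativity constraint and rescales the weights to proportions. It relies on `scipy.optimize.nnls`; the helper name `nnls_proportions`, the toy signatures, and the final normalisation step are assumptions for illustration, not the benchmark's implementation.

```python
import numpy as np
from scipy.optimize import nnls

def nnls_proportions(spot_expr, signatures):
    """Estimate cell-type proportions per spot with non-negative least squares.

    spot_expr:  (n_spots, n_genes) expression of each spot
    signatures: (n_types, n_genes) mean expression of each cell type
    """
    # Solve min ||signatures.T @ w - y||_2 subject to w >= 0 for every spot y.
    weights = np.array([nnls(signatures.T, y)[0] for y in spot_expr])
    totals = weights.sum(axis=1, keepdims=True)
    # Rescale weights so they sum to one; all-zero rows stay zero.
    return np.divide(weights, totals, out=np.zeros_like(weights),
                     where=totals > 0)

# Toy example: three cell types, four genes, noise-free mixtures.
signatures = np.array([[10.0, 0.0, 1.0, 0.0],
                       [0.0, 8.0, 0.0, 2.0],
                       [1.0, 1.0, 6.0, 6.0]])
true_mix = np.array([[0.5, 0.5, 0.0],
                     [0.2, 0.0, 0.8]])
spot_expr = true_mix @ signatures
estimated = nnls_proportions(spot_expr, signatures)
```

In this noise-free toy case the estimated proportions match `true_mix` up to numerical tolerance; on real data the fit is only approximate.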
RCTD
RCTD learns cell type profiles from scRNA-seq to decompose cell type mixtures while correcting for differences across sequencing technologies.
RCTD (Robust Cell Type Decomposition) is a decomposition method that uses signatures learnt from single-cell data to decompose spatial expression of tissues. It is able to use a platform effect normalization step, which normalizes the scRNA-seq cell type profiles to match the platform effects of the spatial transcriptomics dataset.
Seurat
A Seurat-based method that relies on Canonical Correlation Analysis (CCA).
This method applies the ‘anchor’-based integration workflow introduced in Seurat v3, which enables the probabilistic transfer of annotations from a reference to a query set. First, mutual nearest neighbors (anchors) are identified from the reference scRNA-seq and query spatial datasets. Then, annotations are transferred from the single-cell reference data to the spatial data along with prediction scores for each spot.
Stereoscope
Stereoscope is a decomposition method based on Negative Binomial regression.
Stereoscope is a decomposition method based on Negative Binomial regression. It is similar in scope and implementation to cell2location but less flexible in incorporating additional covariates such as batch effects and other types of experimental design annotations.
Tangram
Tangram maps single-cell gene expression data onto spatial gene expression data by fitting gene expression on shared genes.
Tangram is a method to map gene expression signatures from scRNA-seq data to spatial data. It performs the cell type mapping by learning a similarity matrix between single-cell and spatial locations based on gene expression profiles.
NMF
NMF reconstructs gene expression as a weighted combination of cell type signatures defined by scRNA-seq.
NMF is a decomposition method based on Non-negative Matrix Factorization (NMF) that reconstructs expression of each spatial location as a weighted combination of cell-type signatures defined by scRNA-seq. It is a simpler baseline than NMFreg: it performs only the NMF step, based on mean expression signatures of cell types, and returns the NMF weight loadings as (normalized) cell-type proportions, without the regression step.
Control method info
Random Proportions
Negative control method that randomly assigns cell-type proportions from a Dirichlet distribution.
A negative control method with random assignment of predicted cell-type proportions from a Dirichlet distribution.
True Proportions
Positive control method that assigns cell-type proportions from the ground truth.
A positive control method with perfect assignment of predicted cell-type proportions from the ground truth.
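The two controls can be sketched in a few lines. This is only an illustration of the idea: the function names are hypothetical, and the flat Dirichlet (all concentration parameters equal to 1) is an assumption, since the concentration used by the benchmark is not stated here.

```python
import numpy as np

def random_proportions(n_spots, n_types, seed=0):
    """Negative control: one Dirichlet draw per spot, ignoring the data."""
    rng = np.random.default_rng(seed)
    # Assumed flat Dirichlet; every row is non-negative and sums to one.
    return rng.dirichlet(np.ones(n_types), size=n_spots)

def true_proportions(ground_truth):
    """Positive control: return a copy of the ground-truth proportions."""
    return np.array(ground_truth, dtype=float, copy=True)

ground_truth = np.array([[0.2, 0.8, 0.0],
                         [0.5, 0.25, 0.25]])
negative = random_proportions(n_spots=2, n_types=3)
positive = true_proportions(ground_truth)
```

Scores obtained by these two controls give a floor and a ceiling against which the scores of real methods can be read.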
Metric info
R2
R2 represents the proportion of variance in the true proportions which is explained by the predicted proportions (DOI: 10.1002/0470013192.bsa526).
R2, or the “coefficient of determination”, reports the fraction of the variance in the true proportion values that can be explained by the predicted proportion values. The best score, and upper bound, is 1.0. There is no fixed lower bound for the metric. The uniform (non-weighted) average across all cell types/states is used to summarise performance. By default, a score of NaN (which arises for perfect predictions) is replaced with 1.0, and a score of -Inf (which arises for imperfect predictions) is replaced with 0.0.
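The metric as described can be written out directly with NumPy. The sketch below is an assumption-laden illustration, not the benchmark's code: the function name `r2_uniform_average` is hypothetical, and NaN and -Inf are taken to arise only when a cell type's true proportions have zero variance across spots.

```python
import numpy as np

def r2_uniform_average(true_props, pred_props):
    """Per-cell-type R2, averaged with equal weights over cell types.

    Both inputs have shape (n_spots, n_types).
    """
    true_props = np.asarray(true_props, dtype=float)
    pred_props = np.asarray(pred_props, dtype=float)
    # Residual and total sums of squares, one value per cell type.
    ss_res = ((true_props - pred_props) ** 2).sum(axis=0)
    ss_tot = ((true_props - true_props.mean(axis=0)) ** 2).sum(axis=0)
    # Start at 1.0: a constant column predicted perfectly (0/0, NaN) scores 1.0.
    scores = np.ones(true_props.shape[1])
    varying = ss_tot > 0
    scores[varying] = 1.0 - ss_res[varying] / ss_tot[varying]
    # A constant column predicted imperfectly (-Inf) scores 0.0.
    scores[~varying & (ss_res > 0)] = 0.0
    return float(scores.mean())

truth = np.array([[0.2, 0.8],
                  [0.6, 0.4],
                  [0.4, 0.6]])
perfect_score = r2_uniform_average(truth, truth)
# Predicting each cell type's mean proportion for every spot explains no variance.
baseline = np.tile(truth.mean(axis=0), (truth.shape[0], 1))
baseline_score = r2_uniform_average(truth, baseline)
```

Here `perfect_score` is 1.0 and `baseline_score` is 0.0; predictions worse than the per-type mean produce negative values, which is why the metric has no fixed lower bound.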
Quality control results
| Category | Name | Value | Condition | Severity |
|---|---|---|---|---|
| Scaling | Worst score cell2location r2 | -3.3126000 | worst_score >= -1 | ✗✗✗ |
| Scaling | Worst score seurat r2 | -3.1724000 | worst_score >= -1 | ✗✗✗ |
| Dataset info | Pct 'task_id' missing | 1.0000000 | percent_missing(dataset_info, field) | ✗✗ |
| Method info | Pct 'paper_reference' missing | 0.8181818 | percent_missing(method_info, field) | ✗✗ |
| Scaling | Worst score nnls r2 | -2.8586000 | worst_score >= -1 | ✗✗ |
| Scaling | Worst score rctd r2 | -2.5379000 | worst_score >= -1 | ✗✗ |
| Scaling | Worst score nmfreg r2 | -1.8296000 | worst_score >= -1 | ✗ |
| Scaling | Worst score destvi r2 | -1.1763000 | worst_score >= -1 | ✗ |
References
Domínguez Conde, C., C. Xu, L. B. Jarvis, D. B. Rainbow, S. B. Wells, T. Gomes, S. K. Howlett, et al. 2022. “Cross-Tissue Immune Cell Analysis Reveals Tissue-Specific Features in Humans.” Science 376 (6594). https://doi.org/10.1126/science.abl5197.
Eraslan, Gökcen, Eugene Drokhlyansky, Shankara Anand, Evgenij Fiskin, Ayshwarya Subramanian, Michal Slyper, Jiali Wang, et al. 2022. “Single-Nucleus Cross-Tissue Molecular Reference Maps Toward Understanding Disease Gene Function.” Science 376 (6594). https://doi.org/10.1126/science.abl4290.
Hammarlund, Marc, Oliver Hobert, David M. Miller, and Nenad Sestan. 2018. “The CeNGEN Project: The Complete Gene Expression Map of an Entire Nervous System.” Neuron 99 (3): 430–33. https://doi.org/10.1016/j.neuron.2018.07.042.
Hrovatin, Karin, Aimée Bastidas-Ponce, Mostafa Bakhti, Luke Zappia, Maren Büttner, Ciro Sallino, Michael Sterr, et al. 2023. “Delineating Mouse β-Cell Identity During Lifetime and in Diabetes with a Single Cell Atlas.” bioRxiv. https://doi.org/10.1101/2022.12.22.521557.
Jones, Robert C., Jim Karkanias, Mark A. Krasnow, Angela Oliveira Pisco, Stephen R. Quake, Julia Salzman, Nir Yosef, et al. 2022. “The Tabula Sapiens: A Multiple-Organ, Single-Cell Transcriptomic Atlas of Humans.” Science 376 (6594). https://doi.org/10.1126/science.abl4896.
Luecken, Malte D., M. Büttner, K. Chaichoompu, A. Danese, M. Interlandi, M. F. Mueller, D. C. Strobl, et al. 2021. “Benchmarking Atlas-Level Data Integration in Single-Cell Genomics.” Nature Methods 19 (1): 41–50. https://doi.org/10.1038/s41592-021-01336-8.
Steuernagel, Lukas, Brian Y. H. Lam, Paul Klemm, Georgina K. C. Dowsett, Corinna A. Bauder, John A. Tadross, Tamara Sotelo Hitschfeld, et al. 2022. “HypoMap—a Unified Single-Cell Gene Expression Atlas of the Murine Hypothalamus.” Nature Metabolism 4 (10): 1402–19. https://doi.org/10.1038/s42255-022-00657-y.
Wagner, Daniel E., Caleb Weinreb, Zach M. Collins, James A. Briggs, Sean G. Megason, and Allon M. Klein. 2018. “Single-Cell Mapping of Gene Expression Landscapes and Lineage in the Zebrafish Embryo.” Science 360 (6392): 981–87. https://doi.org/10.1126/science.aar4362.
Wilson, Parker C., Yoshiharu Muto, Haojia Wu, Anil Karihaloo, Sushrut S. Waikar, and Benjamin D. Humphreys. 2022. “Multimodal Single Cell Sequencing Implicates Chromatin Accessibility and Genetic Background in Diabetic Kidney Disease Progression.” Nature Communications 13 (1). https://doi.org/10.1038/s41467-022-32972-z.