Abstract
Recent technical developments have enabled the transcriptomes of hundreds of cells to be assayed in an unbiased manner, opening up the possibility that new subpopulations of cells can be found. However, the effects of potential confounding factors, such as the cell cycle, on the heterogeneity of gene expression and therefore on the ability to robustly identify subpopulations remain unclear. We present and validate a computational approach that uses latent variable models to account for such hidden factors. We show that our single-cell latent variable model (scLVM) allows the identification of otherwise undetectable subpopulations of cells that correspond to different stages during the differentiation of naive T cells into T helper 2 cells. Our approach can be used not only to identify cellular subpopulations but also to tease apart different sources of gene expression heterogeneity in single-cell transcriptomes.
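The idea of accounting for a hidden factor such as the cell cycle can be illustrated with a deliberately simplified sketch. This is not the scLVM implementation: it swaps in plain PCA on a set of annotated cell-cycle genes to estimate one latent factor, then removes that factor from every gene by ordinary least squares. The function name `remove_latent_factor`, the simulated data and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def remove_latent_factor(expr, marker_idx, n_factors=1):
    """Estimate latent factor(s) from marker genes and regress them out.

    expr       : (cells x genes) matrix of log-expression values
    marker_idx : column indices of genes annotated to the hidden process
                 (here: hypothetical cell-cycle genes)
    """
    centered = expr - expr.mean(axis=0)
    # Latent factor(s) = leading principal component(s) of the marker genes.
    u, s, _ = np.linalg.svd(centered[:, marker_idx], full_matrices=False)
    factors = u[:, :n_factors] * s[:n_factors]
    # Least-squares fit of every gene on the factor(s), then subtract the fit.
    coef, *_ = np.linalg.lstsq(factors, centered, rcond=None)
    corrected = expr - factors @ coef
    return corrected, factors


# Toy data (simulated, not from the paper): 200 cells, 50 genes.
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 50
cycle = rng.normal(size=n_cells)            # hidden cell-cycle state
group = rng.integers(0, 2, size=n_cells)    # hidden subpopulation label

cycle_loading = np.zeros(n_genes)
cycle_loading[:20] = 2.0                    # genes 0-19: annotated markers
cycle_loading[20:40] = 1.5                  # genes 20-39: also cycle-driven
group_effect = np.zeros(n_genes)
group_effect[30:] = 1.0                     # genes 30-49: subpopulation signal

expr = (np.outer(cycle, cycle_loading)
        + np.outer(group, group_effect)
        + rng.normal(scale=0.3, size=(n_cells, n_genes)))

corrected, factors = remove_latent_factor(expr, marker_idx=np.arange(20))

# Gene 25 is cycle-driven only; gene 45 carries subpopulation signal only.
corr_before = abs(np.corrcoef(expr[:, 25], cycle)[0, 1])
corr_after = abs(np.corrcoef(corrected[:, 25], cycle)[0, 1])
group_diff = corrected[group == 1, 45].mean() - corrected[group == 0, 45].mean()
```

In this toy setting the cycle-driven variation is largely removed from the corrected matrix while the subpopulation difference is preserved, which is the intuition behind correcting for a confounder before searching for subpopulations. The published approach uses latent variable models rather than this PCA-plus-regression shortcut.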
Acknowledgements
We thank S. Anders and A. Baud for helpful discussions. We also thank the Sanger-EBI Single Cell Centre for technical support. We acknowledge support of the European Research Council (Starting grant no. 260507 thSWITCH to S.A.T., Starting Grant LatentCauses to F.J.T., Marie Curie FP7 fellowship 253524 to O.S.), the Sanger-EBI Single Cell Centre (K.N.N. & A.S.) and the European Molecular Biology Organization (short-term fellowship to F.B.).
Author information
Contributions
F.B. developed the method, performed the analysis and wrote the paper. K.N.N. performed the mESC experiments and contributed to the analysis. F.P.C. and A.S. contributed to method development and analysis. V.P., S.A.T. and F.J.T. helped interpret the biological results. S.A.T. and V.P. designed the mouse TH2 differentiation experiment. J.C.M. and O.S. designed and supervised this study, contributed to the method development and wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–22, Supplementary Tables 3, 6, 8–9 and Supplementary Notes (PDF 31436 kb)
Supplementary Table 1
List of genes annotated as cell cycle genes, from either GO or CycleBase. (XLSX 61 kb)
Supplementary Table 2
Contributions of all variance components for the T-cell data. (XLSX 484 kb)
Supplementary Table 4
List of genes significantly differentially expressed between the identified cell subclusters. (XLSX 47 kb)
Supplementary Table 5
Manually curated list of 122 Th2 signature genes. (XLSX 33 kb)
Supplementary Table 7
List of genes with more than 5% of their variance explained by the interaction between cell cycle and differentiation. (XLSX 34 kb)
Supplementary Data 1
Corrected and uncorrected expression values for T-cell data. (XLSX 9175 kb)
Supplementary Data 2
Corrected and uncorrected expression values for the newly generated mouse ESC data. (XLSX 37981 kb)
About this article
Cite this article
Buettner, F., Natarajan, K., Casale, F. et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat Biotechnol 33, 155–160 (2015). https://doi.org/10.1038/nbt.3102
DOI: https://doi.org/10.1038/nbt.3102