  • Analysis

Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells

Abstract

Recent technical developments have enabled the transcriptomes of hundreds of cells to be assayed in an unbiased manner, opening up the possibility that new subpopulations of cells can be found. However, the effects of potential confounding factors, such as the cell cycle, on the heterogeneity of gene expression and therefore on the ability to robustly identify subpopulations remain unclear. We present and validate a computational approach that uses latent variable models to account for such hidden factors. We show that our single-cell latent variable model (scLVM) allows the identification of otherwise undetectable subpopulations of cells that correspond to different stages during the differentiation of naive T cells into T helper 2 cells. Our approach can be used not only to identify cellular subpopulations but also to tease apart different sources of gene expression heterogeneity in single-cell transcriptomes.
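The idea of accounting for a hidden factor such as the cell cycle can be illustrated with a deliberately simplified sketch. It is not the scLVM implementation: PCA on annotated cell-cycle genes stands in for the latent variable model, ordinary least squares stands in for the model used to remove the factor's effect, and the data are simulated. All function names (`fit_latent_factor`, `regress_out`, `pc1_label_corr`, `simulate`) and all simulation parameters are assumptions made for this illustration.

```python
import numpy as np


def fit_latent_factor(Y_cc, n_factors=1):
    """Estimate latent factor(s) from cell-cycle genes (cells x genes) via PCA.

    PCA is used here only as a simple stand-in for a latent variable model.
    """
    Yc = Y_cc - Y_cc.mean(axis=0)
    U, S, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :n_factors] * S[:n_factors]  # cells x n_factors


def regress_out(Y, X):
    """Remove the linear effect of factors X from each gene in Y.

    Returns corrected expression and, per gene, the fraction of variance
    attributed to the factor(s).
    """
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    fitted = Xc @ beta
    var_explained = fitted.var(axis=0) / Yc.var(axis=0)
    return Y - fitted, var_explained


def pc1_label_corr(Y, labels):
    """Absolute correlation between the first principal component and labels."""
    Yc = Y - Y.mean(axis=0)
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    return abs(np.corrcoef(U[:, 0], labels)[0, 1])


def simulate(seed=0, n_cells=100, n_cc=50, n_other=30):
    """Simulate cells with a strong cell-cycle signal and a weaker subpopulation."""
    rng = np.random.default_rng(seed)
    cycle = rng.normal(size=n_cells)                  # hidden confounder
    labels = np.repeat([0.0, 1.0], n_cells // 2)      # two subpopulations
    # Annotated cell-cycle genes: driven by the confounder only.
    Y_cc = np.outer(cycle, rng.uniform(1, 2, n_cc)) \
        + rng.normal(0, 0.5, (n_cells, n_cc))
    # Remaining genes: large confounder effect, small subpopulation effect.
    Y_other = np.outer(cycle, rng.normal(0, 3, n_other)) \
        + labels[:, None] * 1.0 \
        + rng.normal(0, 0.5, (n_cells, n_other))
    return Y_cc, Y_other, labels


if __name__ == "__main__":
    Y_cc, Y_other, labels = simulate()
    X = fit_latent_factor(Y_cc)
    corrected, var_explained = regress_out(Y_other, X)
    print("PC1 vs. subpopulation before:", pc1_label_corr(Y_other, labels))
    print("PC1 vs. subpopulation after: ", pc1_label_corr(corrected, labels))
    print("Mean variance attributed to the factor:", var_explained.mean())
```

In this toy setting the leading principal component of the uncorrected data mostly tracks the simulated cell cycle, whereas after correction it tracks the simulated subpopulation. The per-gene `var_explained` values give a crude analogue of decomposing expression variability into sources.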

Figure 1: Overview of the scLVM approach.
Figure 2: Validation of scLVM on cell cycle–staged mESCs.
Figure 3: Application of scLVM to identify subpopulations in differentiating T-cells.
Figure 4: Application of scLVM to decompose gene expression variability in differentiating T-cells, considering both cell cycle and the TH2 differentiation factor.

Accession codes

Primary accessions

ArrayExpress

Acknowledgements

We thank S. Anders and A. Baud for helpful discussions. We also thank the Sanger-EBI Single Cell Centre for technical support. We acknowledge support of the European Research Council (Starting grant no. 260507 thSWITCH to S.A.T., Starting Grant LatentCauses to F.J.T., Marie Curie FP7 fellowship 253524 to O.S.), the Sanger-EBI Single Cell Centre (K.N.N. & A.S.) and the European Molecular Biology Organization (short-term fellowship to F.B.).

Author information

Contributions

F.B. developed the method, performed the analysis and wrote the paper. K.N.N. performed the mESC experiments and contributed to the analysis. F.P.C. and A.S. contributed to method development and analysis. V.P., S.A.T. and F.J.T. helped interpret the biological results. S.A.T. and V.P. designed the mouse TH2 differentiation experiment. J.C.M. and O.S. designed and supervised this study, contributed to the method development and wrote the paper.

Corresponding authors

Correspondence to John C Marioni or Oliver Stegle.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–22, Supplementary Tables 3, 6, 8–9 and Supplementary Notes (PDF 31436 kb)

Supplementary Table 1

List of genes annotated from cell cycle, either from GO or from CycleBase. (XLSX 61 kb)

Supplementary Table 2

List of contribution of all variance components for the T-cell data. (XLSX 484 kb)

Supplementary Table 4

List of genes significantly differentially expressed between the identified cell subclusters. (XLSX 47 kb)

Supplementary Table 5

Manually curated list of 122 Th2 signature genes. (XLSX 33 kb)

Supplementary Table 7

List of genes with more than 5% of the variance explained by interaction between cell cycle and differentiation. (XLSX 34 kb)

Supplementary Data 1

Corrected and uncorrected expression values for T-cell data. (XLSX 9175 kb)

Supplementary Data 2

Corrected and uncorrected expression values for the newly generated mouse ESC data. (XLSX 37981 kb)

Supplementary Source Code (ZIP 9010 kb)

Cite this article

Buettner, F., Natarajan, K., Casale, F. et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat Biotechnol 33, 155–160 (2015). https://doi.org/10.1038/nbt.3102

  • DOI: https://doi.org/10.1038/nbt.3102
