Authors : Lê Cao, Kim-Anh (Author of the conference)
CIRM (Publisher)
Abstract :
Gene module detection methods aim to group genes with similar expression profiles to shed light on functional relationships and co-regulation, and to infer gene regulatory networks. Methods proposed so far use clustering to group genes based on global similarity in their expression profiles (co-expression), bi-clustering to group genes and samples simultaneously, or network inference to model regulatory relationships between genes. In this talk I will focus on multivariate matrix decomposition techniques that enable dimension reduction and the identification of molecular signatures.
We will consider two types of assays: bulk and single cell. Bulk transcriptomics assays use RNA-sequencing techniques to monitor the average expression profile of all the constituent cells, but fail to identify the distinct transcriptional profiles of different cell types. Single cell assays (scRNA-seq) use RNA-seq techniques similar to those used for bulk cell populations, but provide unprecedented resolution at the cell level to understand cellular heterogeneity and uncover new biology. However, scRNA-seq data present new computational and analytical challenges because of their sheer size (100K to 500K cells are sequenced) and their zero-inflated distribution due to technical drop-outs.
I will illustrate how we can use matrix factorisation techniques to mine these data and identify gene modules that underpin the molecular mechanisms of cell identity in scRNA-seq. I will also give further perspectives on how similar concepts could be extended to integrate different omics data types (e.g. bulk transcriptomics, proteomics, metabolomics) and identify tightly connected multi-omics signatures that holistically describe a biological system.
Keywords : biomathematics; dimension reduction; data integration
MSC Codes :
15A23 - Factorization of matrices
92B15 - General biostatistics, See also {62P10}
Film maker : Hennenfent, Guillaume
Language : English
Available date : 23/03/2020
Conference Date : 05/03/2020
Subseries : Research talks
arXiv category : Machine Learning ; Quantitative Biology
Mathematical Area(s) : Probability & Statistics
Format : MP4 (.mp4) - HD
Video Time : 01:26:46
Targeted Audience : Researchers
Download : https://videos.cirm-math.fr/2020-03-05_Le Cao.mp4
Event Title : Thematic Month Week 5: Networks and Molecular Biology / Mois thématique Semaine 5 : Réseaux et biologie moléculaire
Event Organizers : Baudot, Anais ; Hubert, Florence ; Moss, Brigitte ; Rémy, Elisabeth ; Tichit, Laurent ; Vignes, Matthieu
Dates : 02/03/2020 - 06/03/2020
Event Year : 2020
Event URL : https://conferences.cirm-math.fr/2305.html
DOI : 10.24350/CIRM.V.19620803
Cite this video as:
Lê Cao, Kim-Anh (2020). Matrix factorisation techniques for data integration. CIRM. Audiovisual resource. doi:10.24350/CIRM.V.19620803
URI : http://dx.doi.org/10.24350/CIRM.V.19620803
Bibliography
- DRIER, Yotam, SHEFFER, Michal, et DOMANY, Eytan. Pathway-based personalized analysis of cancer. Proceedings of the National Academy of Sciences, 2013, vol. 110, no 16, p. 6388-6393. - https://doi.org/10.1073/pnas.1219651110
- LIU, Chao, SRIHARI, Sriganesh, LÊ CAO, Kim-Anh, et al. A fine-scale dissection of the DNA double-strand break repair machinery and its implications for breast cancer therapy. Nucleic acids research, 2014, vol. 42, no 10, p. 6106-6127. - https://doi.org/10.1093/nar/gku284
- LIU, Chao, SRIHARI, Sriganesh, LAL, Samir, et al. Personalised pathway analysis reveals association between DNA repair pathway dysregulation and chromosomal instability in sporadic breast cancer. Molecular oncology, 2016, vol. 10, no 1, p. 179-193. - https://doi.org/10.1016/j.molonc.2015.09.007
- HASTIE, Trevor et STUETZLE, Werner. Principal curves. Journal of the American Statistical Association, 1989, vol. 84, no 406, p. 502-516. - https://www.tandfonline.com/doi/abs/10.1080/01621459.1989.10478797
- SAELENS, Wouter, CANNOODT, Robrecht, et SAEYS, Yvan. A comprehensive evaluation of module detection methods for gene expression data. Nature communications, 2018, vol. 9, no 1, p. 1-12. - https://doi.org/10.1038/s41467-018-03424-4
- COMON, Pierre. Independent component analysis, a new concept?. Signal processing, 1994, vol. 36, no 3, p. 287-314. - https://doi.org/10.1016/0165-1684(94)90029-9
- YAO, Fangzhou, COQUERY, Jeff, et LÊ CAO, Kim-Anh. Independent principal component analysis for biologically meaningful dimension reduction of large biological data sets. BMC bioinformatics, 2012, vol. 13, no 1, p. 24. - http://dx.doi.org/10.1186/1471-2105-13-24
- SCHAUM, Nicholas, KARKANIAS, Jim, NEFF, Norma F., et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris: The Tabula Muris Consortium. Nature, 2018, vol. 562, no 7727, p. 367. - https://dx.doi.org/10.1038%2Fs41586-018-0590-4
- LÊ CAO, Kim-Anh, ROSSOUW, Debra, ROBERT-GRANIÉ, Christèle, et al. A sparse PLS for variable selection when integrating omics data. Statistical Applications in Genetics & Molecular Biology, 2008, vol. 7, no 1, p. 1-29. - https://doi.org/10.2202/1544-6115.1390
- LÊ CAO, Kim-Anh, BOITARD, Simon, et BESSE, Philippe. Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics, 2011, vol. 12, p. 253. - https://doi.org/10.1186/1471-2105-12-253
- TENENHAUS, Arthur, PHILIPPE, Cathy, GUILLEMOT, Vincent, et al. Variable selection for generalized canonical correlation analysis. Biostatistics, 2014, vol. 15, no 3, p. 569-583. - https://doi.org/10.1093/biostatistics/kxu001
- SINGH, Amrit, SHANNON, Casey P., GAUTIER, Benoît, et al. DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics, 2019, vol. 35, no 17, p. 3055-3062. - https://doi.org/10.1093/bioinformatics/bty1054
- ROHART, Florian, GAUTIER, Benoit, SINGH, Amrit, et al. mixOmics: An R package for ‘omics feature selection and multiple data integration. PLoS computational biology, 2017, vol. 13, no 11, p. e1005752. - https://doi.org/10.1371/journal.pcbi.1005752
- LEE, Amy H., SHANNON, Casey P., AMENYOGBE, Nelly, et al. Dynamic molecular changes during the first week of human life follow a robust developmental trajectory. Nature communications, 2019, vol. 10, no 1, p. 1-14. - https://doi.org/10.1038/s41467-019-08794-x
- LÊ CAO, Kim-Anh, COSTELLO, Mary-Ellen, LAKIS, Vanessa Anne, et al. MixMC: a multivariate statistical framework to gain insight into microbial communities. PloS one, 2016, vol. 11, no 8. - https://dx.doi.org/10.1371%2Fjournal.pone.0160169
- WANG, Yiwen et LÊ CAO, Kim-Anh. Managing batch effects in microbiome data. Briefings in bioinformatics, 2019. - https://doi.org/10.1093/bib/bbz105
- BODEIN, Antoine, CHAPLEUR, Olivier, DROIT, Arnaud, et al. A generic multivariate framework for the integration of microbiome longitudinal studies with other data types. Frontiers in Genetics, 2019, vol. 10. - https://dx.doi.org/10.3389%2Ffgene.2019.00963