CIRM - Videos & books Library - Exploring the presence of complex dependence structures in epidemiological and genomic data through flexible clustering

Multi angle

Authors: Richardson, Sylvia (Speaker)
CIRM (Publisher)


Abstract: Faced with data containing a large number of inter-related explanatory variables, finding ways to investigate complex multi-factorial effects is an important statistical task. This is particularly relevant for epidemiological study designs, where large numbers of covariates are typically collected in an attempt to capture complex interactions between host characteristics and risk factors. A related task, of great interest in stratified medicine, is to use multi-omics data to discover subgroups of patients with distinct molecular phenotypes and clinical outcomes, thus providing the potential to target treatments more precisely. Flexible clustering is a natural way to tackle such problems. It can be used in an unsupervised manner, or in a semi-supervised manner by adding a link between the clustering structure and the outcomes and performing joint modelling; in the latter case, the clustering structure is used to help predict the outcome. This approach, known as profile regression, has recently been implemented in a Bayesian nonparametric Dirichlet process (DP) modelling framework, which specifies a joint clustering model for covariates and outcome, with an additional variable selection step to uncover the variables driving the clustering (Papathomas et al., 2012). In this talk, two related issues will be discussed. Firstly, we will focus on categorical covariates, a common situation in epidemiological studies, and examine the relation between (i) dependence structures highlighted by Bayesian partitioning of the covariate space incorporating variable selection, and (ii) log-linear modelling with interaction terms, a traditional approach to modelling dependence. We will show how the clustering approach can be employed to assist log-linear model determination, a challenging task because the model space quickly becomes very large (Papathomas and Richardson, 2015).
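To make the profile regression idea concrete, here is a minimal Python sketch, not the PReMiuM implementation, under these assumptions: categorical covariates, a binary outcome with a cluster-specific success probability, a DP prior approximated by truncated stick-breaking with K components, and uniform Dirichlet(1) and Beta(1, 1) priors. The 0/1 indicators `gamma` are a hypothetical, simplified stand-in for the variable selection step: a deselected covariate contributes the same global category frequencies `phi0` to every cluster, so it cannot drive the partition. In this sketch they are held fixed rather than sampled. All function names are illustrative.

```python
import numpy as np


def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking draw of K mixture weights under a DP(alpha) prior."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # close the stick so that the K weights sum to one
    return v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))


def allocation_probs(x_i, y_i, weights, phi, phi0, gamma, theta):
    """P(z_i = k | x_i, y_i, parameters) for each of the K clusters.

    x_i: (J,) category indices; y_i: 0/1 outcome; weights: (K,) mixture weights;
    phi: (K, J, M) cluster-specific category probabilities;
    phi0: (J, M) global category frequencies used for deselected covariates;
    gamma: (J,) hypothetical 0/1 selection indicators; theta: (K,) outcome probabilities.
    """
    log_p = np.log(weights)
    for j, x in enumerate(x_i):
        cluster_term = np.log(phi[:, j, x])  # varies with the cluster k
        global_term = np.log(phi0[j, x])  # identical for every cluster
        log_p = log_p + np.where(gamma[j] == 1, cluster_term, global_term)
    # The outcome enters the same allocation step: this is the "joint" part.
    log_p = log_p + np.where(y_i == 1, np.log(theta), np.log1p(-theta))
    p = np.exp(log_p - log_p.max())  # subtract the max for numerical stability
    return p / p.sum()


def gibbs_sweep(X, y, alpha, weights, phi, phi0, gamma, theta, rng):
    """One blocked Gibbs sweep over allocations, weights, phi and theta."""
    n, J = X.shape
    K, _, M = phi.shape
    # 1. Allocate each individual given the current parameters.
    z = np.array([
        rng.choice(K, p=allocation_probs(X[i], y[i], weights, phi, phi0, gamma, theta))
        for i in range(n)
    ])
    counts = np.bincount(z, minlength=K)
    # 2. Stick-breaking update: v_k ~ Beta(1 + n_k, alpha + sum_{l > k} n_l).
    tail = np.concatenate((np.cumsum(counts[::-1])[::-1][1:], [0]))
    v = rng.beta(1.0 + counts, alpha + tail)
    v[-1] = 1.0
    weights = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    # 3. Conjugate Dirichlet(1 + category counts) update per cluster and covariate.
    phi = phi.copy()
    for k in range(K):
        for j in range(J):
            phi[k, j] = rng.dirichlet(1.0 + np.bincount(X[z == k, j], minlength=M))
    # 4. Conjugate Beta(1 + successes, 1 + failures) update of outcome probabilities.
    successes = np.bincount(z, weights=y, minlength=K)
    theta = rng.beta(1.0 + successes, 1.0 + counts - successes)
    return z, weights, phi, theta


# Usage on synthetic data (all sizes and settings are arbitrary choices).
rng = np.random.default_rng(0)
n, J, M, K, alpha = 200, 4, 3, 10, 1.0
X = rng.integers(0, M, size=(n, J))
y = rng.integers(0, 2, size=n)
phi0 = np.stack([np.bincount(X[:, j], minlength=M) for j in range(J)]) / n
gamma = np.array([1, 1, 0, 0])  # covariates 3 and 4 treated as deselected
weights = stick_breaking_weights(alpha, K, rng)
phi = rng.dirichlet(np.ones(M), size=(K, J))
theta = rng.beta(1.0, 1.0, size=K)
for _ in range(20):
    z, weights, phi, theta = gibbs_sweep(X, y, alpha, weights, phi, phi0, gamma, theta, rng)
print("occupied clusters:", np.unique(z).size)
```

Because the outcome term enters the allocation probabilities alongside the covariate terms, the outcome influences which partition is found; this is what distinguishes profile regression from clustering the covariates alone.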
Secondly, we will discuss clustering as a tool for integrating information from multiple datasets, with a view to discovering structure that is useful for prediction. Several related issues arise in this context. Each dataset may carry a different amount of information for the predictive task, so methods for learning how to reweight each data type for this task will be presented. In the context of multi-omics datasets, the efficiency of different methods for performing integrative clustering will also be discussed, contrasting joint modelling with stepwise approaches. This will be illustrated by an analysis of cancer genomics datasets.
Joint work with Michael Papathomas and Paul Kirk.
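The reweighting idea in the integrative-clustering part of the abstract can be illustrated, under strong simplifying assumptions, by a power-likelihood combination: each dataset d contributes its cluster log-likelihood multiplied by a non-negative weight w_d before the allocation probabilities are normalised. This is a hypothetical sketch, not the method presented in the talk: the weights are fixed by hand here, whereas the talk concerns learning them, and the function name `integrated_allocation_probs` is illustrative.

```python
import numpy as np


def integrated_allocation_probs(log_prior, dataset_loglik, dataset_weights):
    """Allocation probabilities when several datasets inform one shared clustering.

    log_prior: (K,) log mixture weights.
    dataset_loglik: list of (n, K) arrays; entry [i, k] is log p(x_i^(d) | cluster k).
    dataset_weights: (D,) non-negative weights (hypothetical): 0 ignores a dataset,
        1 uses its likelihood as is, values above 1 up-weight it.
    Returns an (n, K) array whose rows sum to one.
    """
    log_p = log_prior[None, :] + sum(w * ll for w, ll in zip(dataset_weights, dataset_loglik))
    log_p = log_p - log_p.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


# Two toy datasets that disagree: A favours cluster 0, B favours cluster 1.
log_prior = np.log(np.array([0.5, 0.5]))
ll_a = np.log(np.tile([0.9, 0.1], (3, 1)))
ll_b = np.log(np.tile([0.2, 0.8], (3, 1)))
p_a_only = integrated_allocation_probs(log_prior, [ll_a, ll_b], [1.0, 0.0])  # rows ~ [0.9, 0.1]
p_equal = integrated_allocation_probs(log_prior, [ll_a, ll_b], [1.0, 1.0])  # rows ~ [0.69, 0.31]
p_b_heavy = integrated_allocation_probs(log_prior, [ll_a, ll_b], [0.5, 2.0])  # rows ~ [0.16, 0.84]
```

Setting a weight to zero removes a dataset's influence entirely, and raising a weight lets that dataset dominate the shared partition; a learned weighting would aim to let the more informative data types drive the clustering.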

MSC codes:
62F15 - Bayesian inference
62P10 - Applications of statistics to biology and medical sciences

Video information

Director:

Hennenfent, Guillaume

Download:

https://videos.cirm-math.fr/2016-02-29_Richardson.mp4

Event information

Event name: Thematic month on statistics - Week 5: Bayesian statistics and algorithms
Organizers: Le Gouic, Thibaut; Pommeret, Denys; Willer, Thomas
Dates: 29/02/16 - 04/03/16
Year: 2016
Conference URL: http://conferences.cirm-math.fr/1619.html

Citation data

DOI: 10.24350/CIRM.V.18937503
Cite this video: Richardson, Sylvia (2016). Exploring the presence of complex dependence structures in epidemiological and genomic data through flexible clustering. CIRM. Audiovisual resource. doi:10.24350/CIRM.V.18937503
URI: http://dx.doi.org/10.24350/CIRM.V.18937503

See also

[Multi angle] Variational Bayes methods and algorithms - Part 1 / Speaker: Keribin, Christine.
[Multi angle] The expectation-propagation algorithm: a tutorial - Part 1 / Speaker: Barthelmé, Simon.
[Multi angle] Approximate Bayesian Computation methods for model choice: a machine learning point of view - Part 1 / Speaker: Marin, Jean-Michel.
[Post-edited] Markov Chain Monte Carlo Methods - Part 1 / Speaker: Robert, Christian P.

Bibliography

Chung, Y., & Dunson, D.B. (2009). Nonparametric Bayes conditional distribution modelling with variable selection. Journal of the American Statistical Association, 104(488), 1646-1660 - http://dx.doi.org/10.1198/jasa.2009.tm08302

Kirk, P., Griffin, J.E., Savage, R., Ghahramani, Z., & Wild, D.L. (2012). Bayesian correlated clustering to integrate multiple datasets. Bioinformatics, 28(24), 3290-3297 - http://dx.doi.org/10.1093/bioinformatics/bts595

Liverani, S., Hastie, D.I., Papathomas, M., & Richardson, S. (2015). PReMiuM: An R package for profile regression mixture models using Dirichlet processes. Journal of Statistical Software, 64(7) - http://dx.doi.org/10.18637/jss.v064.i07

Molitor, J.T., Papathomas, M., Jerrett, M., & Richardson, S. (2010). Bayesian profile regression with an application to the national survey of children's health. Biostatistics, 11(3), 484-498 - http://dx.doi.org/10.1093/biostatistics/kxq013

Papathomas, M., Molitor, J., Richardson, S., Riboli, E., & Vineis, P. (2011). Examining the joint effect of multiple risk factors using exposure risk profiles: lung cancer in non-smokers. Environmental Health Perspectives, 119, 84-91 - http://dx.doi.org/10.1289/ehp.1002118

Papathomas, M., Molitor, J., Hoggart, C., Hastie, D., & Richardson, S. (2012). Exploring data from genetic association studies using Bayesian variable selection and the Dirichlet process: application to searching for gene-gene patterns. Genetic Epidemiology, 36(6), 663-674 - http://dx.doi.org/10.1002/gepi.21661

Papathomas, M. & Richardson, S. (2015). On the utility of the Dirichlet process for linear model determination: application to graphical log-linear model determination. To appear in Journal of Statistical Planning and Inference. - http://arxiv.org/abs/1401.7214

Papathomas, M. & Richardson, S. (2016). Exploring dependence between categorical variables: benefits and limitations of using variable selection within Bayesian clustering in relation to log-linear modelling with interaction terms. Journal of Statistical Planning and Inference, 173, 47-63 - http://dx.doi.org/10.1016/j.jspi.2016.01.002
