Documents  62P10 | records found: 9

I shall classify current approaches to multiple inference according to their goals, and discuss the basic approaches being used. I shall then highlight a few challenges that await our attention: some are simple inequalities, others arise in particular applications.

62J15 ; 62P10

Post-edited  Bayesian modelling
Mengersen, Kerrie (Author of the conference) | CIRM (Publisher)

This tutorial will be a beginner’s introduction to Bayesian statistical modelling and analysis. Simple models and computational tools will be described, followed by a discussion about implementing these approaches in practice. A range of case studies will be presented and possible solutions proposed, followed by an open discussion about other ways that these problems could be tackled.

62C10 ; 62F15 ; 62P12 ; 62P10

We analyse patterns of genetic variability of populations in the presence of a large seed bank with the help of a new coalescent structure called the seed bank coalescent. This ancestral process appears naturally as the scaling limit of the genealogy of large populations that sustain seed banks, if the seed bank size and individual dormancy times are of the same order as the active population. Mutations appear as a Poisson process on the active lineages, and potentially at a reduced rate also on the dormant lineages. The presence of ‘dormant’ lineages leads to qualitatively altered times to the most recent common ancestor and non-classical patterns of genetic diversity. To illustrate this, we provide a Wright-Fisher model with a seed bank component and mutation, motivated by recent models of microbial dormancy, whose genealogy can be described by the seed bank coalescent. Based on our coalescent model, we derive recursions for the expectation and variance of the time to the most recent common ancestor, the number of segregating sites, pairwise differences, and singletons. Commonly employed distance statistics, in the presence and absence of a seed bank, are compared. The effect of a seed bank on the expected site-frequency spectrum is also investigated. Our results indicate that the presence of a large seed bank considerably alters the distribution of some distance statistics, as well as the site-frequency spectrum. Thus, one should be able to detect the presence of a large seed bank in genetic data. Joint work with Bjarki Eldon, Adrián González Casanova, Noemi Kurt, and Maite Wilke-Berenguer.

92D10 ; 60K35 ; 62P10

Faced with data containing a large number of inter-related explanatory variables, finding ways to investigate complex multi-factorial effects is an important statistical task. This is particularly relevant for epidemiological study designs where large numbers of covariates are typically collected in an attempt to capture complex interactions between host characteristics and risk factors. A related task, which is of great interest in stratified medicine, is to use multi-omics data to discover subgroups of patients with distinct molecular phenotypes and clinical outcomes, thus providing the potential to target treatments more precisely. Flexible clustering is a natural way to tackle such problems. It can be used in an unsupervised manner or in a semi-supervised manner, the latter by adding a link between the clustering structure and outcomes and performing joint modelling. In this case, the clustering structure is used to help predict the outcome. This latter approach, known as profile regression, has been implemented recently using a Bayesian nonparametric Dirichlet process (DP) modelling framework, which specifies a joint clustering model for covariates and outcome, with an additional variable selection step to uncover the variables driving the clustering (Papathomas et al., 2012). In this talk, two related issues will be discussed. Firstly, we will focus on categorical covariates, a common situation in epidemiological studies, and examine the relation between: (i) dependence structures highlighted by Bayesian partitioning of the covariate space incorporating variable selection; and (ii) log-linear modelling with interaction terms, a traditional approach to modelling dependence. We will show how the clustering approach can be employed to assist log-linear model determination, a challenging task as the model space quickly becomes very large (Papathomas and Richardson, 2015).
Secondly, we will discuss clustering as a tool for integrating information from multiple datasets, with a view to discovering useful structure for prediction. In this context several related issues arise. Each dataset may carry a different amount of information for the predictive task. Methods for learning how to reweight each data type for this task will therefore be presented. In the context of multi-omics datasets, the efficiency of different methods for performing integrative clustering will also be discussed, contrasting joint modelling and stepwise approaches. This will be illustrated by an analysis of cancer genomics datasets.
Joint work with Michael Papathomas and Paul Kirk.

62F15 ; 62P10

In the first part, I will present various problems related to word occurrence statistics in genomes and examine in more detail the question of how to detect whether a word occurs with a significantly abnormal frequency in a sequence. In the second part, I will present several extensions that take into account the fact that a functional DNA motif is not always a "word", but may have a more complex structure that requires the development of new statistical methods.

92C40 ; 62P10 ; 60J20 ; 92C42

Multi angle  Selective inference in genetics
Sabatti, Chiara (Author of the conference) | CIRM (Publisher)

Geneticists have always been aware that, when looking for signal across the entire genome, one has to be very careful to avoid false discoveries. Contemporary studies often involve a very large number of traits, increasing the challenges of "looking everywhere". I will discuss novel approaches that allow an adaptive exploration of the data, while guaranteeing reproducible results.

62F15 ; 62J15 ; 62P10 ; 92D10

Multi angle  Learning on the symmetric group
Vert, Jean-Philippe (Author of the conference) | CIRM (Publisher)

Many kinds of data can be represented as rankings or permutations, raising the question of how to develop machine learning models on the symmetric group. When the number of items in the permutations gets large, manipulating permutations can quickly become computationally intractable. I will discuss two computationally efficient embeddings of the symmetric group in Euclidean spaces that lead to fast machine learning algorithms, and illustrate their relevance to biological applications and image classification.

62H30 ; 62P10 ; 68T05

In many health studies, interest often lies in assessing health effects on a large set of outcomes or specific outcome subtypes, which may be sparsely observed, even in big data settings. For example, while the overall prevalence of birth defects is not low, the vast heterogeneity in types of congenital malformations leads to challenges in estimation for sparse groups. However, lumping small groups together to facilitate estimation is often controversial and may have limited scientific support.
There is a very rich literature proposing Bayesian approaches for clustering starting with a prior probability distribution on partitions. Most approaches assume exchangeability, leading to simple representations in terms of Exchangeable Partition Probability Functions (EPPF). Gibbs-type priors encompass a broad class of such cases, including Dirichlet and Pitman-Yor processes. Even though there have been some proposals to relax the exchangeability assumption, allowing covariate-dependence and partial exchangeability, limited consideration has been given to how to include concrete prior knowledge on the partition. We wish to cluster birth defects into groups to facilitate estimation, and we have prior knowledge of an initial clustering provided by experts. As a general approach for including such prior knowledge, we propose a Centered Partition (CP) process that modifies the EPPF to favor partitions close to an initial one. Some properties of the CP prior are described, a general algorithm for posterior computation is developed, and we illustrate the methodology through simulation examples and an application to the motivating epidemiology study of birth defects.

62F15 ; 62H30 ; 60G09 ; 60G57 ; 62G05 ; 62P10

The term ‘Public Access Defibrillation’ (PAD) refers to programs based on the placement of Automated External Defibrillators (AEDs) in key locations across city territories, together with the development of a training plan for users (first responders). PAD programs are considered necessary because the time available for intervention in cases of sudden cardiac arrest outside a medical environment (out-of-hospital cardiocirculatory arrest, OHCA) is severely limited: survival potential decreases from a 67% baseline by 7 to 10% for each minute of delay in first defibrillation. However, it is widely recognized that current PAD performance is well below its full potential. We provide a Bayesian spatio-temporal statistical model for predicting OHCAs. Then we construct a risk map for Ticino, adjusted for demographic covariates, that explains and forecasts the spatial distribution of OHCAs, their temporal dynamics, and how the spatial distribution changes over time. The objective is twofold: to efficiently estimate, in each area of interest, the occurrence intensity of the OHCA event, and to suggest a new optimized distribution of AEDs that accounts for population exposure to the geographic risk of OHCA occurrence and that includes both the relocation of current devices and the installation of new ones.
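The survival figures quoted in this abstract can be turned into a rough back-of-the-envelope calculation. The sketch below is illustrative only: it assumes the "7 to 10%" loss is meant in percentage points per minute and that the decline is linear, neither of which the abstract states, and the function name `survival_after_delay` is hypothetical.

```python
def survival_after_delay(minutes, baseline=0.67, drop_per_minute=0.07):
    """Survival potential after a defibrillation delay.

    Assumption (not from the abstract): the loss is a fixed number of
    percentage points per minute, floored at zero.
    """
    return max(0.0, baseline - drop_per_minute * minutes)


if __name__ == "__main__":
    # Compare the optimistic (7 points/min) and pessimistic (10 points/min) ends.
    for minutes in (0, 2, 4, 6, 8):
        low_loss = survival_after_delay(minutes, drop_per_minute=0.07)
        high_loss = survival_after_delay(minutes, drop_per_minute=0.10)
        print(f"{minutes} min delay: {high_loss:.2f} to {low_loss:.2f}")
```

Under this linear reading, survival potential reaches zero after roughly 7 to 10 minutes, which is one way to see why device placement matters so much.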

62F15 ; 62P10 ; 62H11 ; 91B30
