
Documents 62P10: 18 results


Bayesian modelling - Mengersen, Kerrie (Conference Speaker) | CIRM

Post-edited

This tutorial will be a beginner's introduction to Bayesian statistical modelling and analysis. Simple models and computational tools will be described, followed by a discussion about implementing these approaches in practice. A range of case studies will be presented and possible solutions proposed, followed by an open discussion about other ways that these problems could be tackled.

62C10 ; 62F15 ; 62P12 ; 62P10


Genetic variability under the seed bank coalescent - Blath, Jochen (Conference Speaker) | CIRM

Multi angle

We analyse patterns of genetic variability of populations in the presence of a large seed bank with the help of a new coalescent structure called the seed bank coalescent. This ancestral process appears naturally as the scaling limit of the genealogy of large populations that sustain seed banks, if the seed bank size and individual dormancy times are of the same order as the active population. Mutations appear as a Poisson process on the active lineages, and potentially at a reduced rate also on the dormant lineages. The presence of 'dormant' lineages leads to qualitatively altered times to the most recent common ancestor and non-classical patterns of genetic diversity. To illustrate this, we provide a Wright-Fisher model with a seed bank component and mutation, motivated by recent models of microbial dormancy, whose genealogy can be described by the seed bank coalescent. Based on our coalescent model, we derive recursions for the expectation and variance of the time to the most recent common ancestor, the number of segregating sites, pairwise differences, and singletons. Commonly employed distance statistics, in the presence and absence of a seed bank, are compared. The effect of a seed bank on the expected site-frequency spectrum is also investigated. Our results indicate that the presence of a large seed bank considerably alters the distribution of some distance statistics, as well as the site-frequency spectrum. Thus, one should be able to detect the presence of a large seed bank in genetic data. Joint work with Bjarki Eldon, Adrián González Casanova, Noemi Kurt, and Maite Wilke-Berenguer.

92D10 ; 60K35 ; 62P10

Faced with data containing a large number of inter-related explanatory variables, finding ways to investigate complex multi-factorial effects is an important statistical task. This is particularly relevant for epidemiological study designs where large numbers of covariates are typically collected in an attempt to capture complex interactions between host characteristics and risk factors. A related task, which is of great interest in stratified medicine, is to use multi-omics data to discover subgroups of patients with distinct molecular phenotypes and clinical outcomes, thus providing the potential to target treatments more precisely. Flexible clustering is a natural way to tackle such problems. It can be used in an unsupervised or a semi-supervised manner by adding a link between the clustering structure and outcomes and performing joint modelling. In this case, the clustering structure is used to help predict the outcome. This latter approach, known as profile regression, has been implemented recently using a Bayesian nonparametric Dirichlet process (DP) modelling framework, which specifies a joint clustering model for covariates and outcome, with an additional variable selection step to uncover the variables driving the clustering (Papathomas et al., 2012). In this talk, two related issues will be discussed. Firstly, we will focus on categorical covariates, a common situation in epidemiological studies, and examine the relation between: (i) dependence structures highlighted by Bayesian partitioning of the covariate space incorporating variable selection; and (ii) log-linear modelling with interaction terms, a traditional approach to modelling dependence. We will show how the clustering approach can be employed to assist log-linear model determination, a challenging task as the model space quickly becomes very large (Papathomas and Richardson, 2015).
Secondly, we will discuss clustering as a tool for integrating information from multiple datasets, with a view to discovering useful structure for prediction. In this context several related issues arise. It is clear that each dataset may carry a different amount of information for the predictive task. Methods for learning how to reweight each data type for this task will therefore be presented. In the context of multi-omics datasets, the efficiency of different methods for performing integrative clustering will also be discussed, contrasting joint modelling and stepwise approaches. This will be illustrated by an analysis of genomics cancer datasets.
Joint work with Michael Papathomas and Paul Kirk.

62F15 ; 62P10


A story of unexpected words and genomes (Une histoire de mots inattendus et de génomes) - Schbath, Sophie (Conference Speaker) | CIRM

Multi angle

In the first part, I will present various problems related to the statistics of word occurrences in genomes, and examine in more detail how to detect whether a word occurs with a significantly abnormal frequency in a sequence. In the second part, I will present several extensions that account for the fact that a functional DNA motif is not always a "word", but may have a more complex structure that requires the development of new statistical methods.

92C40 ; 62P10 ; 60J20 ; 92C42

I shall classify current approaches to multiple inference according to their goals, and discuss the basic approaches being used. I shall then highlight a few challenges that await our attention: some are simple inequalities, others arise in particular applications.

62J15 ; 62P10


Selective inference in genetics - Sabatti, Chiara (Conference Speaker) | CIRM

Multi angle

Geneticists have always been aware that, when looking for signal across the entire genome, one has to be very careful to avoid false discoveries. Contemporary studies often involve a very large number of traits, increasing the challenges of "looking everywhere". I will discuss novel approaches that allow an adaptive exploration of the data, while guaranteeing reproducible results.

62F15 ; 62J15 ; 62P10 ; 92D10


Learning on the symmetric group - Vert, Jean-Philippe (Conference Speaker) | CIRM

Multi angle

Many kinds of data can be represented as rankings or permutations, raising the question of how to develop machine learning models on the symmetric group. When the number of items in the permutations gets large, manipulating permutations can quickly become computationally intractable. I will discuss two computationally efficient embeddings of the symmetric group in Euclidean spaces leading to fast machine learning algorithms, and illustrate their relevance to biological applications and image classification.

62H30 ; 62P10 ; 68T05

In many health studies, interest often lies in assessing health effects on a large set of outcomes or specific outcome subtypes, which may be sparsely observed, even in big data settings. For example, while the overall prevalence of birth defects is not low, the vast heterogeneity in types of congenital malformations leads to challenges in estimation for sparse groups. However, lumping small groups together to facilitate estimation is often controversial and may have limited scientific support.
There is a very rich literature proposing Bayesian approaches for clustering starting with a prior probability distribution on partitions. Most approaches assume exchangeability, leading to simple representations in terms of Exchangeable Partition Probability Functions (EPPF). Gibbs-type priors encompass a broad class of such cases, including Dirichlet and Pitman-Yor processes. Even though there have been some proposals to relax the exchangeability assumption, allowing covariate-dependence and partial exchangeability, limited consideration has been given to how to include concrete prior knowledge on the partition. We wish to cluster birth defects into groups to facilitate estimation, and we have prior knowledge of an initial clustering provided by experts. As a general approach for including such prior knowledge, we propose a Centered Partition (CP) process that modifies the EPPF to favor partitions close to an initial one. Some properties of the CP prior are described, a general algorithm for posterior computation is developed, and we illustrate the methodology through simulation examples and an application to the motivating epidemiology study of birth defects.

62F15 ; 62H30 ; 60G09 ; 60G57 ; 62G05 ; 62P10

The term 'Public Access Defibrillation' (PAD) refers to programs based on the placement of Automated External Defibrillators (AEDs) in key locations across cities' territory, together with the development of a training plan for users (first responders). PAD programs are considered necessary because the time available for intervention in cases of sudden cardiac arrest outside of a medical environment (out-of-hospital cardiocirculatory arrest, OHCA) is strongly limited: survival potential decreases from a 67% baseline by 7 to 10% for each minute of delay in first defibrillation. However, it is widely recognized that current PAD performance is well below its full potential. We provide a Bayesian spatio-temporal statistical model for predicting OHCAs. Then we construct a risk map for Ticino, adjusted for demographic covariates, that explains and forecasts the spatial distribution of OHCAs, their temporal dynamics, and how the spatial distribution changes over time. The objective is twofold: to efficiently estimate, in each area of interest, the occurrence intensity of the OHCA event, and to suggest a new optimized distribution of AEDs that accounts for population exposure to the geographic risk of OHCA occurrence and that includes both displacement of current devices and installation of new ones.

62F15 ; 62P10 ; 62H11 ; 91B30

Low-dimensional compartment models for biological systems can be fitted to time series data using Monte Carlo particle filter methods. As dimension increases, for example when analyzing a collection of spatially coupled populations, particle filter methods rapidly degenerate. We show that many independent Monte Carlo calculations, each of which does not attempt to solve the filtering problem, can be combined to give a global filtering solution with favorable theoretical scaling properties under a weak coupling condition. The independent Monte Carlo calculations are called islands, and the operation carried out on each island is called adapted simulation, so the complete algorithm is called an adapted simulation island filter. We demonstrate this methodology and some related algorithms on a model for measles transmission within and between cities.

60G35 ; 60J20 ; 62M02 ; 62M05 ; 62M20 ; 62P10 ; 65C35
