CIRM - Videos & books Library

Data mining methods based on finite mixture models are quite common in many areas of applied science, such as marketing, to segment data and to identify subgroups with specific features. Recent work shows that these methods are also useful in microeconometrics to analyze the behavior of workers in labor markets. Since these data are typically available as time series with discrete states, clustering kernels based on Markov chains with group-specific transition matrices are applied to capture both persistence in the individual time series and cross-sectional unobserved heterogeneity. Markov chain clustering has been applied to data from the Austrian labor market (a) to understand the effect of labor market entry conditions on long-run career developments for male workers (Frühwirth-Schnatter et al., 2012), (b) to study mothers' long-run career patterns after first birth (Frühwirth-Schnatter et al., 2016), and (c) to study the effects of a plant closure on future career developments for male workers (Frühwirth-Schnatter et al., 2018). To capture non-stationary effects for the latter study, time-inhomogeneous Markov chains based on time-varying group-specific transition matrices are introduced as clustering kernels. For all applications, a mixture-of-experts formulation helps to understand which workers are likely to belong to a particular group. Finally, it will be shown that Markov chain clustering is also useful in a business application in marketing and helps to identify loyal consumers within a customer relationship management (CRM) program.
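To make the idea of a Markov chain clustering kernel concrete, the following minimal sketch computes the posterior probability that each discrete-state time series belongs to each group, given group-specific transition matrices. It is an illustration only, not the speaker's implementation: the function names, the two example groups ("persistent" and "mobile"), the fixed group weights, and the requirement that every transition probability be strictly positive are all assumptions made here.

```python
import numpy as np

def transition_counts(seq, n_states):
    """Count transitions j -> k in one discrete-state time series."""
    counts = np.zeros((n_states, n_states))
    for j, k in zip(seq[:-1], seq[1:]):
        counts[j, k] += 1
    return counts

def membership_probs(seqs, trans_mats, weights):
    """Posterior group membership probabilities for each series.

    Assumes every entry of every transition matrix is strictly positive
    and conditions on the first observed state of each series.
    """
    n_states = trans_mats[0].shape[0]
    log_post = np.zeros((len(seqs), len(trans_mats)))
    for i, seq in enumerate(seqs):
        counts = transition_counts(seq, n_states)
        for h, (P, w) in enumerate(zip(trans_mats, weights)):
            # log group weight + sum over (j, k) of count * log P[j, k]
            log_post[i, h] = np.log(w) + np.sum(counts * np.log(P))
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Two hypothetical groups with different persistence.
P_persistent = np.array([[0.9, 0.1], [0.1, 0.9]])
P_mobile = np.array([[0.5, 0.5], [0.5, 0.5]])
seqs = [[0, 0, 0, 0, 1, 1, 1, 1],   # one switch: looks persistent
        [0, 1, 0, 1, 0, 1, 0, 1]]   # switches every period: looks mobile
probs = membership_probs(seqs, [P_persistent, P_mobile], [0.5, 0.5])
```

In a full clustering procedure these membership probabilities would be one step of an EM or MCMC scheme that also updates the transition matrices and group weights; a mixture-of-experts formulation would additionally let the weights depend on covariates.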

62C10 ; 62M05 ; 62M10 ; 62H30 ; 62P20 ; 62F15

In many health studies, interest often lies in assessing health effects on a large set of outcomes or specific outcome subtypes, which may be sparsely observed, even in big data settings. For example, while the overall prevalence of birth defects is not low, the vast heterogeneity in types of congenital malformations leads to challenges in estimation for sparse groups. However, lumping small groups together to facilitate estimation is often controversial and may have limited scientific support.
There is a very rich literature proposing Bayesian approaches for clustering, starting with a prior probability distribution on partitions. Most approaches assume exchangeability, leading to simple representations in terms of Exchangeable Partition Probability Functions (EPPF). Gibbs-type priors encompass a broad class of such cases, including Dirichlet and Pitman-Yor processes. Even though there have been some proposals to relax the exchangeability assumption, allowing covariate dependence and partial exchangeability, limited consideration has been given to how to include concrete prior knowledge on the partition. We wish to cluster birth defects into groups to facilitate estimation, and we have prior knowledge of an initial clustering provided by experts. As a general approach for including such prior knowledge, we propose a Centered Partition (CP) process that modifies the EPPF to favor partitions close to an initial one. Some properties of the CP prior are described, a general algorithm for posterior computation is developed, and we illustrate the methodology through simulation examples and an application to the motivating epidemiology study of birth defects.
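As a rough illustration of how an EPPF can be modified to favor partitions close to an initial one, the sketch below enumerates all partitions of a small set, evaluates a Dirichlet process EPPF, and reweights it by an exponential penalty on the distance to an initial partition `c0`. The abstract does not state the exact form of the modification, so the exponential penalty, the Variation of Information distance, the Dirichlet process base prior, and all function and parameter names (`alpha`, `psi`) are assumptions made for this example.

```python
import math
from collections import Counter

def set_partitions(n):
    """Yield all partitions of {0..n-1} as restricted-growth label lists."""
    def grow(labels, n_blocks):
        if len(labels) == n:
            yield list(labels)
            return
        for b in range(n_blocks + 1):
            labels.append(b)
            yield from grow(labels, max(n_blocks, b + 1))
            labels.pop()
    yield from grow([], 0)

def dp_eppf(labels, alpha):
    """Dirichlet process EPPF: alpha^K * prod (n_j - 1)! / (alpha)_n."""
    sizes = Counter(labels).values()
    num = alpha ** len(sizes) * math.prod(math.factorial(m - 1) for m in sizes)
    den = math.prod(alpha + i for i in range(len(labels)))
    return num / den

def entropy(counts, n):
    return -sum(m / n * math.log2(m / n) for m in counts)

def variation_of_information(c1, c2):
    """VI distance between two partitions given as label sequences."""
    n = len(c1)
    h1 = entropy(Counter(c1).values(), n)
    h2 = entropy(Counter(c2).values(), n)
    h12 = entropy(Counter(zip(c1, c2)).values(), n)
    return 2 * h12 - h1 - h2

def centered_partition_prior(c0, alpha, psi):
    """Assumed form: p(c) proportional to EPPF(c) * exp(-psi * d(c, c0)).

    c0 must be given in restricted-growth form, e.g. (0, 0, 1, 1).
    """
    parts = list(set_partitions(len(c0)))
    w = [dp_eppf(c, alpha) * math.exp(-psi * variation_of_information(c, c0))
         for c in parts]
    total = sum(w)
    return {tuple(c): wi / total for c, wi in zip(parts, w)}

c0 = (0, 0, 1, 1)                                   # hypothetical expert grouping
base = centered_partition_prior(c0, alpha=1.0, psi=0.0)   # psi = 0: plain EPPF
cp = centered_partition_prior(c0, alpha=1.0, psi=2.0)     # mass pulled toward c0
```

Full enumeration is only feasible for a handful of items, since the number of partitions grows as the Bell numbers; it is used here purely to show how the penalty shifts prior mass toward `c0` as `psi` increases.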

62F15 ; 62H30 ; 60G09 ; 60G57 ; 62G05 ; 62P10

With the growing capabilities of Geographic Information Systems (GIS) and user-friendly software, statisticians today routinely encounter geographically referenced data containing observations from a large number of spatial locations and time points. Over the last decade, hierarchical spatiotemporal process models have become widely deployed statistical tools for researchers to better understand the complex nature of spatial and temporal variability. However, fitting hierarchical spatiotemporal models often involves expensive matrix computations whose complexity grows cubically in the number of spatial locations and time points. This renders such models infeasible for large data sets. I will present a focused review of two methods for constructing well-defined, highly scalable spatiotemporal stochastic processes. Both processes can be used as "priors" for spatiotemporal random fields. The first approach constructs a low-rank process operating on a lower-dimensional subspace. The second approach constructs a Nearest-Neighbor Gaussian Process (NNGP) that ensures sparse precision matrices for its finite realizations. Both processes can be exploited as a scalable prior embedded within a rich hierarchical modeling framework to deliver full Bayesian inference. These approaches can be described as model-based solutions for big spatiotemporal datasets. The models ensure that the algorithmic complexity is on the order of n floating point operations (flops) per iteration, where n is the number of spatial locations. We compare these methods and provide some insight into their methodological underpinnings.
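The sparse precision matrix of a nearest-neighbor Gaussian process can be sketched as follows: each location is regressed on at most m nearest previously ordered locations, giving a strictly lower-triangular weight matrix A and conditional variances D, with precision (I - A)^T D^{-1} (I - A). This is a minimal illustration, not the speaker's code: the exponential covariance function, the parameters `sigma2` and `phi`, the ordering of locations as given, and the function names are assumptions, and dense arrays are used for readability even though a scalable implementation would store A in sparse form.

```python
import numpy as np

def exp_cov(locs, sigma2=1.0, phi=3.0):
    """Exponential covariance between all pairs of locations (assumed kernel)."""
    d = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

def nngp_precision(locs, m, sigma2=1.0, phi=3.0):
    """Precision (I - A)^T D^{-1} (I - A) from m nearest previous neighbors."""
    n = len(locs)
    C = exp_cov(locs, sigma2, phi)
    A = np.zeros((n, n))          # regression weights on earlier neighbors
    D = np.zeros(n)               # conditional variances
    D[0] = C[0, 0]
    for i in range(1, n):
        prev = np.arange(i)
        dist = np.linalg.norm(locs[prev] - locs[i], axis=1)
        nb = prev[np.argsort(dist)[:m]]            # at most m neighbors
        w = np.linalg.solve(C[np.ix_(nb, nb)], C[nb, i])
        A[i, nb] = w
        D[i] = C[i, i] - C[i, nb] @ w
    I_minus_A = np.eye(n) - A
    return I_minus_A.T @ (I_minus_A / D[:, None])

rng = np.random.default_rng(0)
locs = rng.uniform(size=(30, 2))       # hypothetical locations in the unit square
Q_sparse = nngp_precision(locs, m=3)   # few nonzero entries per row
Q_full = nngp_precision(locs, m=29)    # all previous neighbors: exact precision
```

With m equal to n - 1 the construction reproduces the inverse of the full covariance matrix, while a small m leaves most entries of the precision exactly zero, which is what makes per-iteration cost grow roughly linearly in n for fixed m.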

62P12 ; 62M30 ; 62F15

Arctic sea-ice extent has been of considerable interest to scientists in recent years, mainly due to its decreasing trend over the past 20 years. In this talk, I propose a hierarchical spatio-temporal generalized linear model (GLM) for binary Arctic sea-ice data, where data dependencies are introduced through a latent, dynamic, spatio-temporal mixed-effects model. By using a fixed number of spatial basis functions, the resulting model achieves both dimension reduction and non-stationarity for spatial fields at different time points. An EM algorithm is used to estimate model parameters, and an MCMC algorithm is developed to obtain the predictive distribution of the latent spatio-temporal process. The methodology is applied to spatial, binary, Arctic sea-ice data for each September over the past 20 years, and several posterior summaries are computed to detect changes in Arctic sea-ice cover. The fully Bayesian version is under development and will be discussed.

62M30 ; 62M10 ; 62M15

Search by event 1912: 4 results

Bayesian econometrics in the Big Data Era - Frühwirth-Schnatter, Sylvia (Conference speaker) | CIRM

Centered partition processes: lumping versus splitting in sparse health data - Herring, Amy (Conference speaker) | CIRM

High-dimensional Bayesian geostatistics - Banerjee, Sudipto (Conference speaker) | CIRM

Inference for spatio-temporal changes of Arctic sea ice - Cressie, Noel A. C. (Conference speaker) | CIRM