Déposez votre fichier ici pour le déplacer vers cet enregistrement.

The continuous time Bayesian networks (CTBNs) represent a class of stochastic processes, which can be used to model complex phenomena, for instance, they can describe interactions occurring in living processes, in social science models or in medicine. The literature on this topic is usually focused on the case, when the dependence structure of a system is known and we are to determine conditional transition intensities (parameters of the network). In the paper, we study the structure learning problem, which is a more challenging task and the existing research on this topic is limited. The approach, which we propose, is based on a penalized likelihood method. We prove that our algorithm, under mild regularity conditions, recognizes the dependence structure of the graph with high probability. We also investigate the properties of the procedure in numerical studies to demonstrate its effectiveness .

The continuous time Bayesian networks (CTBNs) represent a class of stochastic processes, which can be used to model complex phenomena, for instance, they can describe interactions occurring in living processes, in social science models or in medicine. The literature on this topic is usually focused on the case, when the dependence structure of a system is known and we are to determine conditional transition intensities (parameters of the ...

62M05 ; 62F30 ; 60J27

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

In high-dimensional regression, the number of explanatory variables with nonzero effects - often referred to as sparsity - is an important measure of the difficulty of the variable selection problem. As a complement to sparsity, this paper introduces a new measure termed effect size heterogeneity for a finer-grained understanding of the trade-off between type I and type II errorsor, equivalently, false and true positive rates using the Lasso. Roughly speaking, a regression coefficient vector has higher effect size heterogeneity than another vector (of the same sparsity) if the nonzero entries of the former are more heterogeneous than those of the latter in terms of magnitudes. From the perspective of this new measure, we prove that in a regime of linear sparsity, false and true positive rates achieve the optimal trade-off uniformly along the Lasso path when this measure is maximum in the sense that all nonzero effect sizes have very differentmagnitudes, and the worst-case trade-off is achieved when it is minimum in the sense that allnonzero effect sizes are about equal. Moreover, we demonstrate that the Lasso path produces anoptimal ranking of explanatory variables in terms of the rank of the first false variable when the effect size heterogeneity is maximum, and vice versa. Metaphorically, these two findings suggest that variables with comparable effect sizes—no matter how large they are—would compete with each other along the Lasso path, leading to an increased hardness of the variable selection problem. Our proofs use techniques from approximate message passing theory as well as a novel argument for estimating the rank of the first false variable.

In high-dimensional regression, the number of explanatory variables with nonzero effects - often referred to as sparsity - is an important measure of the difficulty of the variable selection problem. As a complement to sparsity, this paper introduces a new measure termed effect size heterogeneity for a finer-grained understanding of the trade-off between type I and type II errorsor, equivalently, false and true positive rates using the Lasso. ...

62F03 ; 62J07 ; 62J05

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

Averages of proper scoring rules are often used to rank probabilistic forecasts. In many cases, the individual observations and their predictive distributions in these averages have variable scale (variance). I will show that some of the most popular proper scoring rules, such as the continuous ranked probability score (CRPS), up-weight observations with large uncertainty which can lead to unintuitive rankings. We have developed a new scoring rule, scaled CRPS (SCRPS), this new proper scoring rule is locally scale invariant and therefore works in the case of varying uncertainty.

I will demonstrate this how this affects model selection through parameter estimation in spatial statitics.

Averages of proper scoring rules are often used to rank probabilistic forecasts. In many cases, the individual observations and their predictive distributions in these averages have variable scale (variance). I will show that some of the most popular proper scoring rules, such as the continuous ranked probability score (CRPS), up-weight observations with large uncertainty which can lead to unintuitive rankings. We have developed a new scoring ...

60G25 ; 62M30 ; 62C25

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

Classical approaches to experimental design assume that intervening on one unit does not affect other units. There are many important settings, however, where this non-interference assumption does not hold, e.g., when running experiments on supply-side incentives on a ride-sharing platform or subsidies in an energy marketplace. In this paper, we introduce a new approach to experimental design in large-scale stochastic systems with considerable cross-unit interference, under an assumption that the interference is structured enough that it can be captured using mean-field asymptotics. Our approach enables us to accurately estimate the effect of small changes to system parameters by combining unobstrusive randomization with light-weight modeling, all while remaining in equilibrium. We can then use these estimates to optimize the system by gradient descent. Concretely, we focus on the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and show that our approach enables the platform to optimize p based on perturbations whose magnitude can get vanishingly small in large systems.

Classical approaches to experimental design assume that intervening on one unit does not affect other units. There are many important settings, however, where this non-interference assumption does not hold, e.g., when running experiments on supply-side incentives on a ride-sharing platform or subsidies in an energy marketplace. In this paper, we introduce a new approach to experimental design in large-scale stochastic systems with considerable ...

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

We present a novel theoretical framework for statistical analysis of Large-scale problems that builds on the Robbins compound decision approach. We present a hierarchical Bayesian approach for implementing this framework and illustrate its application to simulated data.

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

The maximum score statistic is used to detect and estimate changes in the level, slope, or other local feature of a sequance of observations, and to segment the sequence xhen there appear to be multiple changes. Control of false positive errors when observations are auto-correlated is achieved by using a first order autoregressive model. True changes in level or slope can lead to badly biased estimates of the autoregressive parameter and variance, which can result in a loss of power. Modifications of the natural estimators to deal with this difficulty are partially successful. Applications to temperature time series, atmospheric CO2 levels, COVID-19 incidence, excess deaths, copy number variations, and weather extremes illustrate the general theory.

This is joint research with Xiao Fang.

The maximum score statistic is used to detect and estimate changes in the level, slope, or other local feature of a sequance of observations, and to segment the sequence xhen there appear to be multiple changes. Control of false positive errors when observations are auto-correlated is achieved by using a first order autoregressive model. True changes in level or slope can lead to badly biased estimates of the autoregressive parameter and ...

62H10 ; 62J02 ; 62L10

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

We introduce a new method for high-dimensional, online changepoint detection in settings where a p-variate Gaussian data stream may undergo a change in mean. The procedure works by performing likelihood ratio tests against simple alternatives of different scales in each coordinate, and then aggregating test statistics across scales and coordinates.

The algorithm is online in the sense that its worst-case computational complexity per new observation, namely O(p2log(ep)), is independent of the number of previous observations; in practice, it may even be significantly faster than this. We prove that the patience, or average run length under the null, of our procedure is at least at the desired nominal level, and provide guarantees on its response delay under the alternative that depend on the sparsity of the vector of mean change. Simulations confirm the practical effectiveness of our proposal.

We introduce a new method for high-dimensional, online changepoint detection in settings where a p-variate Gaussian data stream may undergo a change in mean. The procedure works by performing likelihood ratio tests against simple alternatives of different scales in each coordinate, and then aggregating test statistics across scales and coordinates.

The algorithm is online in the sense that its worst-case computational complexity per new ...

62L10 ; 62L15 ; 62F30

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level. The canonical pivotal estimator is the square-root Lasso, formulated along with its derivatives as a "non-smooth + non-smooth'' optimization problem.

Modern techniques to solve these include smoothing the datafitting term, to benefit from fast efficient proximal algorithms.

In this work we focus on minimax sup-norm convergence rates for non smoothed and smoothed, single task and multitask square-root Lasso-type estimators. We also provide some guidelines on how to set the smoothing hyperparameter, and illustrate on synthetic data the interest of such guidelines.

This is joint work with Quentin Bertrand (INRIA), Mathurin Massias, Olivier Fercoq and Alexandre Gramfort.

In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level. The canonical pivotal estimator is the square-root Lasso, formulated along with its derivatives as a "non-smooth + non-smooth'' optimization problem.

Modern techniques to solve these include smoothing the datafitting term, to benefit from fast efficient proximal algorithms.

In this work we ...

62J05 ; 62J12 ; 62P10

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

The framework of knockoffs has been recently proposed to perform variable selection under rigorous type-I error control, without relying on strong modeling assumptions. We extend the methodology of knockoffs to a rich family of problems where the distribution of the covariates can be described by a hidden Markov model. We develop an exact and efficient algorithm to sample knockoff variables in this setting and then argue that, combined with the existing selective framework, this provides a natural and powerful tool for performing principled inference in genomewide association studies with guaranteed false discovery rate control. To handle the high level of dependence that can exist between SNPs in linkage disequilibrium, we propose a multi-resolution analysis, that simultaneously identifies loci of importance and provides results analogous to those obtained in fine mapping.

This is joint work with Matteo Sesia, Eugene Katsevich, Stephen Bates and Emmanuel Candes.

The framework of knockoffs has been recently proposed to perform variable selection under rigorous type-I error control, without relying on strong modeling assumptions. We extend the methodology of knockoffs to a rich family of problems where the distribution of the covariates can be described by a hidden Markov model. We develop an exact and efficient algorithm to sample knockoff variables in this setting and then argue that, combined with the ...

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

Multiple testing problems are a staple of modern statistics. The fundamental objective is to reject as many false null hypotheses as possible, subject to controlling an overall measure of false discovery, like family-wise error rate (FWER) or false discovery rate (FDR). We formulate multiple testing of simple hypotheses as an infinite-dimensional optimization problem, seeking the most powerful rejection policy which guarantees strong control of the selected measure. We show that for exchangeable hypotheses, for FWER or FDR and relevant notions of power, these problems lead to infinite programs that can provably be solved. We explore maximin rules for complex alternatives, and show they can be found in practice, leading to improved practical procedures compared to existing alternatives. We derive explicit optimal tests for FWER or FDR control for three independent normal means. We find that the power gain over natural competitors is substantial in all settings examined. We apply our optimal maximin rule to subgroup analyses in systematic reviews from the Cochrane library, leading to an increased number of findings compared to existing alternatives.

Multiple testing problems are a staple of modern statistics. The fundamental objective is to reject as many false null hypotheses as possible, subject to controlling an overall measure of false discovery, like family-wise error rate (FWER) or false discovery rate (FDR). We formulate multiple testing of simple hypotheses as an infinite-dimensional optimization problem, seeking the most powerful rejection policy which guarantees strong control of ...

62F03 ; 62J15 ; 62P10

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

When performing multiple testing, adjusting the distribution of the null hypotheses is ubiquitous in applications. However, the effect of such an operation remains largely unknown, especially in terms of false discovery proportion (FDP) and true discovery proportion (TDP). In this talk, we explore this issue in the most classical case where the null distributions are Gaussian with an unknown rescaling parameters (mean and variance) and where the Benjamini-Hochberg (BH) procedure is applied after a datarescaling step.

When performing multiple testing, adjusting the distribution of the null hypotheses is ubiquitous in applications. However, the effect of such an operation remains largely unknown, especially in terms of false discovery proportion (FDP) and true discovery proportion (TDP). In this talk, we explore this issue in the most classical case where the null distributions are Gaussian with an unknown rescaling parameters (mean and variance) and where the ...

62G10 ; 62C20 ; 62G30

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

This paper addresses the following question: “Can regression trees do what other machine learning methods cannot?” To answer this question, we consider the problem of estimating regression functions with spatial inhomogeneities. Many real life applications involve functions that exhibit a variety of shapes including jump discontinuities or high-frequency oscillations. Unfortunately, the overwhelming majority of existing asymptotic minimaxity theory (for density or regression function estimation) is predicated on homogeneous smoothness assumptions which are inadequate for such data. Focusing on locally Holder functions, we provide locally adaptive posterior concentration rate results under the supremum loss. These results certify that trees can adapt to local smoothness by uniformly achieving the point-wise (near) minimax rate. Such results were previously unavailable for regression trees (forests). Going further, we construct locally adaptive credible bands whose width depends on local smoothness and which achieve uniform coverage under local self-similarity. Unlike many other machine learning methods, Bayesian regression trees thus provide valid uncertainty quantification. To highlight the benefits of trees, we show that Gaussian processes cannot adapt to local smoothness by showing lower bound results under a global estimation loss. Bayesian regression trees are thus uniquely suited for estimation and uncertainty quantification of spatially inhomogeneous functions.

This paper addresses the following question: “Can regression trees do what other machine learning methods cannot?” To answer this question, we consider the problem of estimating regression functions with spatial inhomogeneities. Many real life applications involve functions that exhibit a variety of shapes including jump discontinuities or high-frequency oscillations. Unfortunately, the overwhelming majority of existing asymptotic minimaxity ...

62G20 ; 62G15

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

We propose a general method for constructing confidence sets and hypothesis tests that have finite-sample guarantees without regularity conditions. We refer to such procedures as “universal.” The method is very simple and is based on a modified version of the usual likelihood ratio statistic, that we call “the split likelihood ratio test” (split LRT) statistic. The (limiting) null distribution of the classical likelihood ratio statistic is often intractable when used to test composite null hypotheses in irregular statistical models. Our method is especially appealing for statistical inference in these complex setups. The method we suggest works for any parametric model and also for some nonparametric models, as long as computing a maximum likelihood estimator (MLE) is feasible under the null. Canonical examples arise in mixture modeling and shape-constrained inference, for which constructing tests and confidence sets has been notoriously difficult. We also develop various extensions of our basic methods. We show that in settings when computing the MLE is hard, for the purpose of constructing valid tests and intervals, it is sufficient to upper bound the maximum likelihood. We investigate some conditions under which our methods yield valid inferences under model-misspecification. Further, the split LRT can be used with profile likelihoods to deal with nuisance parameters, and it can also be run sequentially to yield anytime-valid p-values and confidence sequences. Finally, when combined with the method of sieves, it can be used to perform model selection with nested model classes.

We propose a general method for constructing confidence sets and hypothesis tests that have finite-sample guarantees without regularity conditions. We refer to such procedures as “universal.” The method is very simple and is based on a modified version of the usual likelihood ratio statistic, that we call “the split likelihood ratio test” (split LRT) statistic. The (limiting) null distribution of the classical likelihood ratio statistic is often ...

62C05 ; 62F03 ; 62G10 ; 62L12

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

We present confidence bounds on the false positives contained in subsets of selected null hypotheses. As the coverage probability holds simultaneously over all possible subsets, these bounds can be applied to an arbitrary number of possibly datadriven subsets.

These bounds are built via a two-step approach. First, build a family of candidate rejection subsets together with associated bounds, holding uniformly on the number of false positives they contain (call this a reference family). Then, interpolate from this reference family to find a bound valid for any subset.

This general program is exemplified for two particular types of reference families: (i) when the bounds are fixed and the subsets are p-value level sets (ii) when the subsets are fixed, spatially structured and the bounds are estimated.

We present confidence bounds on the false positives contained in subsets of selected null hypotheses. As the coverage probability holds simultaneously over all possible subsets, these bounds can be applied to an arbitrary number of possibly datadriven subsets.

These bounds are built via a two-step approach. First, build a family of candidate rejection subsets together with associated bounds, holding uniformly on the number of false positives ...

62J15

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

A quasi logistic distribution on the real line has density proportional to $(cosh x+cos a)^{-1}$ if $V> 0$ and $Z$ with standard normal law are independent, we say that $\sqrt{V}$ has a quasi Kolmogorov distribution if $Z\sqrt{V}$ is quasi logistic. We study the numerous properties of these generalizations of the logistic and Kolmogorov laws.

62E10 ; 60E07 ; 33E05

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

We consider the problem of estimating the mean vector of the multivariate complex normaldistribution with unknown covariance matrix under an invariant loss function when the samplesize is smaller than the dimension of the mean vector. Following the approach of Chételat and Wells (2012, Ann.Statist, p. 3137-3160), we show that a modification of Baranchik-tpye estimatorsbeats the MLE if it satisfies certain conditions. Based on this result, we propose the James-Stein-like shrinkage and its positive-part estimators.

We consider the problem of estimating the mean vector of the multivariate complex normaldistribution with unknown covariance matrix under an invariant loss function when the samplesize is smaller than the dimension of the mean vector. Following the approach of Chételat and Wells (2012, Ann.Statist, p. 3137-3160), we show that a modification of Baranchik-tpye estimatorsbeats the MLE if it satisfies certain conditions. Based on this result, we ...

62F10 ; 62C20 ; 62H12

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

Inferring causal effects of a treatment or policy from observational data is central to many applications. However, state-of-the-art methods for causal inference suffer when covariates have missing values, which is ubiquitous in application.

Missing data greatly complicate causal analyses as they either require strong assumptions about the missing data generating mechanism or an adapted unconfoundedness hypothesis. In this talk, I will first provide a classification of existing methods according to the main underlying assumptions, which are based either on variants of the classical unconfoundedness assumption or relying on assumptions about the mechanism that generates the missing values. Then, I will present two recent contributions on this topic: (1) an extension of doubly robust estimators that allows handling of missing attributes, and (2) an approach to causal inference based on variational autoencoders adapted to incomplete data.

I will illustrate the topic an an observational medical database which has heterogeneous data and a multilevel structure to assess the impact of the administration of a treatment on survival.

Inferring causal effects of a treatment or policy from observational data is central to many applications. However, state-of-the-art methods for causal inference suffer when covariates have missing values, which is ubiquitous in application.

Missing data greatly complicate causal analyses as they either require strong assumptions about the missing data generating mechanism or an adapted unconfoundedness hypothesis. In this talk, I will first ...

62P10 ; 62H12 ; 62N99

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

Many modern applications seek to understand the relationship between an outcome variable of interest and a high-dimensional set of covariates. Often the first question asked is which covariates are important in this relationship, but the immediate next question, which in fact subsumes the first, is \emph{how} important each covariate is in this relationship. In parametric regression this question is answered through confidence intervals on the parameters. But without making substantial assumptions about the relationship between the outcome and the covariates, it is unclear even how to \emph{measure} variable importance, and for most sensible choices even less clear how to provide inference for it under reasonable conditions. In this paper we propose \emph{floodgate}, a novel method to provide asymptotic inference for a scalar measure of variable importance which we argue has universal appeal, while assuming nothing but moment bounds about the relationship between the outcome and the covariates. We take a model-X approach and thus assume the covariate distribution is known, but extend floodgate to the setting that only a \emph{model} for the covariate distribution is known and also quantify its robustness to violations of the modeling assumptions. We demonstrate floodgate's performance through extensive simulations and apply it to data from the UK Biobank to quantify the effects of genetic mutations on traits of interest.

Many modern applications seek to understand the relationship between an outcome variable of interest and a high-dimensional set of covariates. Often the first question asked is which covariates are important in this relationship, but the immediate next question, which in fact subsumes the first, is \emph{how} important each covariate is in this relationship. In parametric regression this question is answered through confidence intervals on the ...

62G15 ; 62G20

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

As a generalization of fill-in free property of a sparse positive definite real symmetric matrix with respect to the Cholesky decomposition, we introduce a notion of (quasi-)Cholesky structure for a real vector space of symmetric matrices. The cone of positive definite symmetric matrices in a vector space with a quasi-Cholesky structure admits explicit calculations and rich analysis similar to the ones for Gaussian selsction model associated to a decomposable graph. In particular, we can apply our method to a decomposable graphical model with a vertex pemutation symmetry.

As a generalization of fill-in free property of a sparse positive definite real symmetric matrix with respect to the Cholesky decomposition, we introduce a notion of (quasi-)Cholesky structure for a real vector space of symmetric matrices. The cone of positive definite symmetric matrices in a vector space with a quasi-Cholesky structure admits explicit calculations and rich analysis similar to the ones for Gaussian selsction model associated to ...

15B48 ; 62E15