In high-dimensional regression, the number of explanatory variables with nonzero effects, often referred to as sparsity, is an important measure of the difficulty of the variable selection problem. As a complement to sparsity, this paper introduces a new measure, termed effect size heterogeneity, for a finer-grained understanding of the trade-off between type I and type II errors or, equivalently, false and true positive rates using the Lasso. Roughly speaking, a regression coefficient vector has higher effect size heterogeneity than another vector (of the same sparsity) if the nonzero entries of the former are more heterogeneous than those of the latter in terms of magnitudes. From the perspective of this new measure, we prove that in a regime of linear sparsity, false and true positive rates achieve the optimal trade-off uniformly along the Lasso path when this measure is maximal, in the sense that all nonzero effect sizes have very different magnitudes, and the worst-case trade-off is achieved when it is minimal, in the sense that all nonzero effect sizes are about equal. Moreover, we demonstrate that the Lasso path produces an optimal ranking of explanatory variables in terms of the rank of the first false variable when the effect size heterogeneity is maximal, and vice versa. Metaphorically, these two findings suggest that variables with comparable effect sizes, no matter how large they are, compete with each other along the Lasso path, making the variable selection problem harder. Our proofs use techniques from approximate message passing theory as well as a novel argument for estimating the rank of the first false variable.
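The abstract describes the false/true positive rate trade-off along the Lasso path. The following is a minimal simulation sketch, not taken from the paper, contrasting a coefficient vector with highly heterogeneous nonzero magnitudes against one with equal magnitudes at the same sparsity; the design, signal strengths, and the FDP cutoff of 0.1 are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions throughout): compare false/true
# positive proportions along the Lasso path for two coefficient vectors of
# the same sparsity, one with heterogeneous and one with equal nonzero
# magnitudes.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n, p, k = 400, 1000, 50                    # samples, features, sparsity

def fdp_tpp_along_path(beta):
    X = rng.standard_normal((n, p)) / np.sqrt(n)
    y = X @ beta + rng.standard_normal(n)
    _, coefs, _ = lasso_path(X, y, n_alphas=60)   # coefs has shape (p, n_alphas)
    support = beta != 0
    fdp, tpp = [], []
    for j in range(coefs.shape[1]):
        selected = coefs[:, j] != 0
        fdp.append((selected & ~support).sum() / max(selected.sum(), 1))
        tpp.append((selected & support).sum() / k)
    return np.array(fdp), np.array(tpp)

beta_hetero = np.zeros(p); beta_hetero[:k] = np.geomspace(1.0, 100.0, k)  # very different magnitudes
beta_equal = np.zeros(p);  beta_equal[:k] = 10.0                          # about equal magnitudes

for name, beta in [("heterogeneous", beta_hetero), ("equal", beta_equal)]:
    fdp, tpp = fdp_tpp_along_path(beta)
    print(name, "best TPP with FDP <= 0.1:", np.max(tpp[fdp <= 0.1], initial=0.0))
```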
62F03 ; 62J07 ; 62J05
In this talk, we consider function-indexed normalized weighted integrated periodograms for equidistantly sampled multivariate continuous-time state space models, which are multivariate continuous-time ARMA processes. Here, the sampling distance is fixed and the driving Lévy process has at least a finite fourth moment. Under different sets of assumptions on the function space and on the moments of the driving Lévy process, we derive a central limit theorem for the function-indexed normalized weighted integrated periodogram; in each case, either the assumption on the function space or the moment assumption on the Lévy process is the weaker one. The results can be used to derive the asymptotic behavior of the Whittle estimator and to construct goodness-of-fit test statistics such as the Grenander-Rosenblatt statistic and the Cramér-von Mises statistic.
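As a rough numerical illustration of the object in the abstract, the sketch below computes a weighted integrated periodogram for an equidistantly sampled bivariate series by a Riemann sum over frequencies; the white-noise placeholder data and the weight function phi are assumptions made only for illustration and do not represent the talk's state space model or estimator.

```python
# Illustrative sketch: a weighted integrated periodogram, approximated by a
# Riemann sum over [-pi, pi). The data here are white noise standing in for
# an equidistantly sampled MCARMA process.
import numpy as np

rng = np.random.default_rng(1)
n, d = 512, 2
X = rng.standard_normal((n, d))            # placeholder for the sampled process

def periodogram(X, omega):
    """I_n(omega) = (2*pi*n)^{-1} J_n(omega) J_n(omega)^*, with J_n(omega) = sum_t X_t e^{-i t omega}."""
    t = np.arange(1, len(X) + 1)
    J = (X * np.exp(-1j * omega * t)[:, None]).sum(axis=0)
    return np.outer(J, J.conj()) / (2 * np.pi * len(X))

def integrated_periodogram(X, phi, n_freq=256):
    """Riemann-sum approximation of the integral of tr(phi(omega) I_n(omega)) over [-pi, pi)."""
    freqs = np.linspace(-np.pi, np.pi, n_freq, endpoint=False)
    vals = [np.trace(phi(w) @ periodogram(X, w)) for w in freqs]
    return (2 * np.pi / n_freq) * np.sum(vals)

phi = lambda w: np.cos(w) * np.eye(d)      # an arbitrary illustrative weight function
print(integrated_periodogram(X, phi).real)
```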
62F03 ; 62F12 ; 62M10
We propose a general method for constructing confidence sets and hypothesis tests that have finite-sample guarantees without regularity conditions. We refer to such procedures as “universal.” The method is very simple and is based on a modified version of the usual likelihood ratio statistic, which we call the “split likelihood ratio test” (split LRT) statistic. The (limiting) null distribution of the classical likelihood ratio statistic is often intractable when used to test composite null hypotheses in irregular statistical models. Our method is especially appealing for statistical inference in these complex setups. The method we suggest works for any parametric model, and also for some nonparametric models, as long as computing a maximum likelihood estimator (MLE) is feasible under the null. Canonical examples arise in mixture modeling and shape-constrained inference, for which constructing tests and confidence sets has been notoriously difficult. We also develop various extensions of our basic methods. We show that in settings where computing the MLE is hard, it suffices, for the purpose of constructing valid tests and intervals, to upper bound the maximum likelihood. We investigate some conditions under which our methods yield valid inferences under model misspecification. Further, the split LRT can be used with profile likelihoods to deal with nuisance parameters, and it can also be run sequentially to yield anytime-valid p-values and confidence sequences. Finally, when combined with the method of sieves, it can be used to perform model selection with nested model classes.
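For concreteness, here is a minimal sketch of a split LRT in the simplest possible setting, testing H0: mu = 0 for a normal mean with known unit variance. The model, the even split, and reporting the p-value as min(1, 1/U) are illustrative choices consistent with the data-splitting idea described above, not the paper's general construction.

```python
# Minimal sketch of a split likelihood ratio test for H0: mu = 0, data ~ N(mu, 1).
# Split the sample, fit the unrestricted MLE on one half, evaluate the likelihood
# ratio on the other half, and reject at level alpha when the statistic exceeds 1/alpha.
import numpy as np
from scipy.stats import norm

def split_lrt_pvalue(x, mu0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    d0, d1 = x[idx[: len(x) // 2]], x[idx[len(x) // 2:]]
    mu_hat1 = d1.mean()                                  # unrestricted MLE on D1
    # log of U = L_{D0}(mu_hat1) / L_{D0}(mu0); the null MLE is mu0 for a simple null
    log_u = norm.logpdf(d0, loc=mu_hat1).sum() - norm.logpdf(d0, loc=mu0).sum()
    return float(np.exp(-max(log_u, 0.0)))               # p-value min(1, 1/U)

x = np.random.default_rng(2).normal(loc=0.5, size=200)
print("split LRT p-value:", split_lrt_pvalue(x))         # reject H0 if p <= alpha
```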
62C05 ; 62F03 ; 62G10 ; 62L12
The highly influential two-group model in testing a large number of statistical hypotheses assumes that the test statistics are drawn independently from a mixture of a high probability null distribution and a low probability alternative. Optimal control of the marginal false discovery rate (mFDR), in the sense that it provides maximal power (expected true discoveries) subject to mFDR control, is known to be achieved by thresholding the local false discovery rate (locFDR), i.e., the probability of the hypothesis being null given the set of test statistics, with a fixed threshold.
We address the challenge of optimally controlling the popular false discovery rate (FDR) or positive FDR (pFDR), rather than the mFDR, in the general two-group model, which also allows for dependence between the test statistics. These criteria are less conservative than the mFDR criterion, so they allow more rejections in expectation.
We derive their optimal multiple testing (OMT) policies, which turn out to be thresholding the locFDR with a threshold that is a function of the entire set of statistics. We develop an efficient algorithm for finding these policies, and use it for problems with thousands of hypotheses. We illustrate these procedures on gene expression studies.
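The sketch below illustrates only the classical fixed-threshold locFDR rule in an independent two-group Gaussian model with known mixture parameters; the null proportion, alternative mean, and cutoff are assumptions for illustration, and the OMT policies described above would instead use a threshold that depends on the entire set of statistics.

```python
# Illustrative sketch: fixed-threshold locFDR rule in an independent two-group
# Gaussian model, null N(0, 1) with probability pi0 and alternative N(3, 1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
m, pi0, mu1 = 5000, 0.9, 3.0
is_null = rng.random(m) < pi0
z = np.where(is_null, rng.normal(0.0, 1.0, m), rng.normal(mu1, 1.0, m))

# local false discovery rate under the (here, known) two-group mixture
f0, f1 = norm.pdf(z), norm.pdf(z, loc=mu1)
locfdr = pi0 * f0 / (pi0 * f0 + (1 - pi0) * f1)

cutoff = 0.2                                  # fixed locFDR threshold (mFDR-style rule)
rejected = locfdr <= cutoff
fdp = (rejected & is_null).sum() / max(rejected.sum(), 1)
print("rejections:", int(rejected.sum()), "realized FDP:", round(float(fdp), 3))
```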
62F03 ; 62J15 ; 62P10
Multiple testing problems are a staple of modern statistics. The fundamental objective is to reject as many false null hypotheses as possible, subject to controlling an overall measure of false discovery, like family-wise error rate (FWER) or false discovery rate (FDR). We formulate multiple testing of simple hypotheses as an infinite-dimensional optimization problem, seeking the most powerful rejection policy which guarantees strong control of the selected measure. We show that for exchangeable hypotheses, for FWER or FDR and relevant notions of power, these problems lead to infinite programs that can provably be solved. We explore maximin rules for complex alternatives, and show they can be found in practice, leading to improved practical procedures compared to existing alternatives. We derive explicit optimal tests for FWER or FDR control for three independent normal means. We find that the power gain over natural competitors is substantial in all settings examined. We apply our optimal maximin rule to subgroup analyses in systematic reviews from the Cochrane library, leading to an increased number of findings compared to existing alternatives.
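In standard notation, and only as a schematic restatement of the problem described above (the talk's exact power functional and constraint set may differ), the optimization can be written as follows, where a rejection policy D makes R(D) rejections of which V(D) are false:

```latex
% Schematic formulation with standard definitions; details are assumptions.
\[
  \mathrm{FWER}(D) = \Pr\bigl(V(D) \ge 1\bigr), \qquad
  \mathrm{FDR}(D)  = \mathbb{E}\!\left[\frac{V(D)}{\max\{R(D),\,1\}}\right],
\]
\[
  \max_{D}\; \mathbb{E}\bigl[R(D) - V(D)\bigr]
  \quad \text{subject to} \quad
  \mathrm{FWER}(D) \le \alpha \ \text{ or } \ \mathrm{FDR}(D) \le q .
\]
```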
62F03 ; 62J15 ; 62P10