Authors : Janson, Lucas (Author of the conference)
CIRM (Publisher)
Abstract :
Many modern applications seek to understand the relationship between an outcome variable of interest and a high-dimensional set of covariates. Often the first question asked is which covariates are important in this relationship, but the immediate next question, which in fact subsumes the first, is \emph{how} important each covariate is. In parametric regression, this question is answered through confidence intervals on the parameters. But without substantial assumptions about the relationship between the outcome and the covariates, it is unclear even how to \emph{measure} variable importance, and for most sensible choices it is even less clear how to provide inference for it under reasonable conditions. In this paper we propose \emph{floodgate}, a novel method that provides asymptotic inference for a scalar measure of variable importance which we argue has universal appeal, while assuming nothing about the relationship between the outcome and the covariates beyond moment bounds. We take a model-X approach and thus assume that the covariate distribution is known, but we also extend floodgate to the setting in which only a \emph{model} for the covariate distribution is known, and we quantify its robustness to violations of the modeling assumptions. We demonstrate floodgate's performance through extensive simulations and apply it to data from the UK Biobank to quantify the effects of genetic mutations on traits of interest.
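The abstract leaves the procedure implicit, so what follows is a minimal Python sketch of a floodgate-style lower confidence bound under our own reading of the method, not the authors' code. Assumptions, all ours: the importance of covariate j is taken to be I_j = sqrt(E[Var(E[Y | X] | X_{-j})]), the square root of the increase in minimum mean squared error when X_j is dropped; a working regression function mu, fitted on a separate data split, gives by Cauchy-Schwarz the lower bound E[Cov(mu(X), Y | X_{-j})] / sqrt(E[Var(mu(X) | X_{-j})]) <= I_j; the model-X conditional moments are approximated with K Monte Carlo draws of X_j from its known law given X_{-j}; and a delta-method normal interval turns the plug-in ratio into a one-sided bound. The names `floodgate_lcb` and `sample_xj` and the defaults `alpha=0.05`, `K=200` are hypothetical, and any finite-K correction the paper may apply is omitted.

```python
import math
import random
from statistics import NormalDist, fmean


def floodgate_lcb(X, Y, mu, sample_xj, j, alpha=0.05, K=200, seed=0):
    """Hypothetical sketch of a floodgate-style lower confidence bound for I_j.

    X, Y      : held-out covariate vectors (lists of floats) and outcomes.
    mu        : working regression function, assumed fitted on *separate* data.
    sample_xj : sample_xj(x, rng) draws X_j from its (assumed known) law given X_{-j}.
    """
    rng = random.Random(seed)
    R, V = [], []
    for x, y in zip(X, Y):
        # Monte Carlo: resample X_j given X_{-j} and re-evaluate mu.
        draws = []
        for _ in range(K):
            x_tilde = list(x)
            x_tilde[j] = sample_xj(x, rng)
            draws.append(mu(x_tilde))
        m = fmean(draws)  # estimates E[mu(X) | X_{-j}]
        V.append(sum((d - m) ** 2 for d in draws) / (K - 1))  # Var(mu(X) | X_{-j})
        R.append(y * (mu(x) - m))  # unbiased for E[Cov(mu(X), Y | X_{-j})]
    n = len(Y)
    r_bar, v_bar = fmean(R), fmean(V)
    if v_bar < 1e-12:
        return 0.0  # mu (numerically) ignores X_j, so only the trivial bound remains
    # Sample (co)variances of (R_i, V_i), needed for the delta method.
    s_rr = sum((r - r_bar) ** 2 for r in R) / (n - 1)
    s_vv = sum((v - v_bar) ** 2 for v in V) / (n - 1)
    s_rv = sum((r - r_bar) * (v - v_bar) for r, v in zip(R, V)) / (n - 1)
    # Delta-method variance of g(r, v) = r / sqrt(v) evaluated at (r_bar, v_bar).
    s2 = (s_rr - (r_bar / v_bar) * s_rv + r_bar ** 2 / (4 * v_bar ** 2) * s_vv) / v_bar
    z = NormalDist().inv_cdf(1 - alpha)
    lcb = r_bar / math.sqrt(v_bar) - z * math.sqrt(max(s2, 0.0) / n)
    return max(lcb, 0.0)  # the importance measure is non-negative


if __name__ == "__main__":
    # Toy model-X setting: independent N(0, 1) covariates, so X_0 | X_1 ~ N(0, 1).
    rng = random.Random(1)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
    Y = [x[0] + x[1] + rng.gauss(0, 1) for x in X]
    # mu is the true regression function here, standing in for a fit on other data.
    print(floodgate_lcb(X, Y, lambda x: x[0] + x[1], lambda x, r: r.gauss(0, 1), j=0))
```

In this toy example E[Y | X] = X_0 + X_1, so I_0 = sqrt(Var(X_0)) = 1 and the printed bound should usually fall somewhat below 1; a working function that ignores X_j yields the trivial bound 0. Only the ratio's numerator and denominator are averaged, which is what makes a plain delta-method interval applicable.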
Keywords : Variable importance; effect size; model-X; heterogeneous treatment effects; heritability
MSC Codes :
62G15 - Tolerance and confidence regions
62G20 - Nonparametric asymptotic efficiency
Additional resources :
https://www.cirm-math.com/uploads/2/6/6/0/26605521/janson.pdf
Film maker : Hennenfent, Guillaume
Language : English
Available date : 15/06/2020
Conference Date : 05/06/2020
Subseries : Research talks
arXiv category : Statistics ; Methodology
Mathematical Area(s) : Probability & Statistics
Format : MP4 (.mp4) - HD
Video Time : 00:48:44
Targeted Audience : Researchers
Download : https://videos.cirm-math.fr/2020-06-05_Janson.mp4
Event Title : Mathematical Methods of Modern Statistics 2 / Méthodes mathématiques en statistiques modernes 2
Event Organizers : Bogdan, Malgorzata ; Graczyk, Piotr ; Panloup, Fabien ; Proïa, Frédéric ; Roquain, Etienne
Dates : 15/06/2020 - 19/06/2020
Event Year : 2020
Event URL : https://www.cirm-math.com/cirm-virtual-...
DOI : 10.24350/CIRM.V.19641303
Cite this video as:
Janson, Lucas (2020). Floodgate: inference for model-free variable importance. CIRM. Audiovisual resource. doi:10.24350/CIRM.V.19641303
URI : http://dx.doi.org/10.24350/CIRM.V.19641303
See Also
- [Virtual conference] Experimenting in equilibrium / Author of the conference Wager, Stefan.
- [Virtual conference] Structure learning for CTBN's / Author of the conference Miasojedow, Błażej.
- [Virtual conference] The price of competition: effect size heterogeneity matters in high dimensions! / Author of the conference Wang, Hua.
- [Virtual conference] Scaling of scoring rules / Author of the conference Wallin, Jonas.
- [Virtual conference] Hierarchical Bayes modeling for large-scale inference / Author of the conference Yekutieli, Daniel.
- [Virtual conference] Change: detection, estimation, segmentation / Author of the conference Siegmund, David.
- [Virtual conference] High-dimensional, multiscale online changepoint detection / Author of the conference Samworth, Richard.
- [Virtual conference] The smoothed multivariate square-root Lasso: an optimization lens on concomitant estimation / Author of the conference Salmon, Joseph.
- [Virtual conference] Knockoff genotypes: value in counterfeit / Author of the conference Sabatti, Chiara.
- [Virtual conference] Optimal and maximin procedures for multiple testing problems / Author of the conference Rosset, Saharon.
- [Virtual conference] Sparse multiple testing: can one estimate the null distribution? / Author of the conference Roquain, Etienne.
- [Virtual conference] Bayesian spatial adaptation / Author of the conference Rockova, Veronika.
- [Virtual conference] Universal inference using the split likelihood ratio test / Author of the conference Ramdas, Aaditya K.
- [Virtual conference] How to estimate a density on a spider web? / Author of the conference Picard, Dominique.
- [Virtual conference] Post hoc bounds on false positives using reference families / Author of the conference Neuvial, Pierre.
- [Virtual conference] Quasi logistic distributions and Gaussian scale mixing / Author of the conference Letac, Gerard.
- [Virtual conference] Shrinkage estimation of mean for complex multivariate normal distribution with unknown covariance when p > n / Author of the conference Konno, Yoshihiko.
- [Virtual conference] Treatment effect estimation with missing attributes / Author of the conference Josse, Julie.
- [Virtual conference] On Cholesky structures on real symmetric matrices and their applications / Author of the conference Ishi, Hideyuki.
- [Virtual conference] Optimal control of false discovery criteria in the general two-group model / Author of the conference Heller, Ruth.
- [Virtual conference] Isotonic Distributional Regression (IDR) - leveraging monotonicity, uniquely so! / Author of the conference Gneiting, Tilmann.
- [Virtual conference] De-biasing arbitrary convex regularizers and asymptotic normality / Author of the conference Bellec, Pierre C.
- [Virtual conference] Consistent model selection criteria and goodness-of-fit test for common time series models / Author of the conference Bardet, Jean-Marc.
- [Virtual conference] High-dimensional classification by sparse logistic regression / Author of the conference Abramovich, Felix.