
Adaptive rates for trend filtering using dual certificates - Lecture 2

Virtual conference
Authors: Van de Geer, Sara (Conference speaker)
CIRM (Publisher)


Abstract: The theory for trend filtering has been developed in a series of papers: Tibshirani [2014], Wang et al. [2016], Sadhanala and Tibshirani [2019], Sadhanala et al. [2017], and Guntuboyina et al. [2020]; see Tibshirani [2020] for a very good overview and further references. In this course we will combine the approach of Dalalyan et al. [2017] with dual certificates as given in Candes and Plan [2011]. This allows us to obtain new results and extensions.

The trend filtering problem in one dimension is as follows. Let $Y\sim \mathcal{N}(f^{0},\ I)$ be a vector of observations with unknown mean $f^{0} :=\mathrm{E}Y$. For $f\in \mathbb{R}^{n}$, define
$$
(\triangle f)_{i}\ :=\ f_{i}-f_{i-1},\ i\geq 2,
$$
$$
(\triangle^{2}f)_{i}\ :=\ f_{i}-2f_{i-1}+f_{i-2},\ i\geq 3,
$$
$$
(\triangle^{k}f)_{i}\ :=\ (\triangle(\triangle^{k-1}f))_{i},\ i\geq k+1.
$$
Then we consider the estimator
$$
\hat{f}\ :=\ \arg\min_{f\in \mathbb{R}^{n}}\{\Vert Y-f\Vert_{2}^{2}/n+2\lambda\Vert\triangle^{k}f\Vert_{1}\}.
$$
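As a minimal numerical sketch (not from the lecture), the estimator above can be computed by building the $k$-th order difference operator with numpy and handing the convex program to a generic solver such as cvxpy; the signal, noise level and the value of $\lambda$ below are arbitrary placeholders for illustration only.

    # Sketch of one-dimensional trend filtering; lambda and the test signal are ad hoc.
    import numpy as np
    import cvxpy as cp

    def diff_operator(n, k):
        """Matrix D with (D f)_i = (Delta^k f)_i for i = k+1, ..., n."""
        D = np.eye(n)
        for _ in range(k):
            D = D[1:, :] - D[:-1, :]   # apply the first-order difference once more
        return D                        # shape (n - k, n)

    def trend_filter(Y, k, lam):
        """Solve min_f ||Y - f||_2^2 / n + 2 * lam * ||Delta^k f||_1."""
        n = len(Y)
        D = diff_operator(n, k)
        f = cp.Variable(n)
        objective = cp.sum_squares(Y - f) / n + 2 * lam * cp.norm1(D @ f)
        cp.Problem(cp.Minimize(objective)).solve()
        return f.value

    # Example: a piecewise-linear mean (k = 2 penalizes changes in slope).
    rng = np.random.default_rng(0)
    n = 100
    f0 = np.concatenate([np.linspace(0, 1, n // 2), np.linspace(1, 0, n - n // 2)])
    Y = f0 + rng.standard_normal(n)
    f_hat = trend_filter(Y, k=2, lam=0.1)   # tuning parameter chosen ad hoc
    print(np.sum((f_hat - f0) ** 2) / n)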
Let $S_{0} :=\{j\ :\ (\triangle^{k}f^{0})_{j}\neq 0\}$ and let $s_{0} :=|S_{0}|$ be its size. We want to prove an "oracle"-type result: for an appropriate choice of the tuning parameter, and modulo $\log$-factors, with high probability $\Vert\hat{f}-f^{0}\Vert_{2}^{2}/n \lesssim (s_{0}+1)/n$. For this one may apply the general theory for the Lasso. Indeed, the above is a Lasso estimator if we write $f=\Psi b$, where $\Psi\in \mathbb{R}^{n\times n}$ is the "design matrix" or "dictionary" and where $b_{j}=(\triangle^{k}f)_{j}$ for $j\geq k+1$. We present the explicit expression for this dictionary and then notice that the restricted eigenvalue conditions that are typically imposed for Lasso problems do not hold. What we do instead is use a "dual certificate" $q$ with index set $\mathcal{D} :=\{k+1,\ldots,n\}$. We require that $q_{j}=\mathrm{sign}(\triangle^{k}f^{0})_{j}$ if $j\in S_{0}$ and that $|q_{j}|\leq 1-w_{j}$ if $j\in \mathcal{D}\backslash S_{0}$, where $\{w_{j}\}_{j\in \mathcal{D}\backslash S_{0}}$ is a given set of noise weights. Moreover, we require $q_{k+1}=q_{n}=0$. We call such a $q$ an interpolating vector. We show that for an appropriate choice of $\lambda$,
$$
\Vert\hat{f}-f^{0}\Vert_{2}^{2}/n\ \lesssim\ (s_{0}+1)/n+n\lambda^{2}\Vert\triangle^{k}q\Vert_{2}^{2},
$$
and that, up to $\log$-factors, the two terms on the right-hand side are of the same order if the elements in $S_{0}$ corresponding to different signs satisfy a minimal distance condition.
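The reformulation $f=\Psi b$ can also be checked numerically. One concrete completion of $\triangle^{k}$ to a square, invertible map (an assumption made here for illustration; the lecture presents its own explicit dictionary) is to keep the first $k$ coordinates of $f$ and take $\Psi$ as the inverse of that map, so that $b_{j}=(\triangle^{k}f)_{j}$ for $j\geq k+1$.

    # Sketch of the Lasso reformulation f = Psi b under the completion described above.
    import numpy as np

    def full_difference_map(n, k):
        """Square A with (A f)_j = f_j for j <= k and (A f)_j = (Delta^k f)_j for j > k."""
        D = np.eye(n)
        for _ in range(k):
            D = D[1:, :] - D[:-1, :]
        top = np.eye(n)[:k, :]              # keeps the first k coordinates of f
        return np.vstack([top, D])          # lower triangular with unit diagonal, hence invertible

    n, k = 10, 2
    A = full_difference_map(n, k)
    Psi = np.linalg.inv(A)                  # the "design matrix" / dictionary in this sketch

    f = np.random.default_rng(1).standard_normal(n)
    b = A @ f                               # b[k:] are the k-th order differences of f
    assert np.allclose(Psi @ b, f)          # f = Psi b, so the penalty reads 2*lambda*||b[k:]||_1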

We develop a non-asymptotic theory for the problem and refine the above to sharp oracle results. Moreover, the approach we use allows extensions to higher dimensions and to total variation on graphs. For the case of graphs with cycles the main issue is to determine the noise weights, which can be done by counting the number of times an edge is used when traveling from one node to another. Extensions to other loss functions will be considered as well.
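As a toy illustration of the edge-counting bookkeeping mentioned above (the exact definition of the noise weights in the lecture may differ), one can tally how often each edge of a graph with a cycle is used on shortest paths between pairs of nodes, for instance with networkx.

    # Count, for each edge, how often it is used when traveling between pairs of nodes.
    import networkx as nx
    from collections import Counter

    G = nx.cycle_graph(6)                       # a small graph containing a cycle
    usage = Counter()
    for source, paths in nx.all_pairs_shortest_path(G):
        for target, path in paths.items():
            for u, v in zip(path[:-1], path[1:]):
                usage[frozenset((u, v))] += 1   # undirected edge usage count

    for edge, count in sorted(usage.items(), key=lambda item: -item[1]):
        print(tuple(edge), count)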

Keywords: Lasso; total variation; dual certificates

MSC codes:
62J05 - Linear regression
62J99 - None of the above but in this section

Additional resources:
https://www.cirm-math.fr/RepOrga/2536/Slides/MMS2020Lecture_vandeGeer.pdf

    Video information

    Director: Hennenfent, Guillaume
    Language: English
    Publication date: 11/01/2021
    Recording date: 15/12/2020
    Subcollection: Research talks
    arXiv category: Statistics Theory
    Domain: Probability & Statistics
    Format: MP4 (.mp4) - HD
    Duration: 01:01:36
    Audience: Researchers
    Download: https://videos.cirm-math.fr/2020-12-17_De Geer 2.mp4

Conference information

Conference name: Meeting in Mathematical Statistics / Rencontres de statistique mathématique
Conference organizers: Butucea, Cristina; Minsker, Stanislav; Pouet, Christophe; Spokoiny, Vladimir
Dates: 14/12/2020 - 18/12/2020
Conference year: 2020
Conference URL: https://conferences.cirm-math.fr/2536.html

Citation data

DOI: 10.24350/CIRM.V.19692403
Cite this video: Van de Geer, Sara (2020). Adaptive rates for trend filtering using dual certificates - Lecture 2. CIRM. Audiovisual resource. doi:10.24350/CIRM.V.19692403
URI: http://dx.doi.org/10.24350/CIRM.V.19692403


Bibliography

  • TIBSHIRANI, Ryan J., et al. Adaptive piecewise polynomial estimation via trend filtering. The Annals of Statistics, 2014, vol. 42, no 1, p. 285-323. - http://dx.doi.org/10.1214/13-AOS1189

  • CANDES, Emmanuel J. and PLAN, Yaniv. A probabilistic and RIPless theory of compressed sensing. IEEE Transactions on Information Theory, 2011, vol. 57, no 11, p. 7235-7254. - https://doi.org/10.1109/TIT.2011.2161794

  • DALALYAN, Arnak S., HEBIRI, Mohamed, LEDERER, Johannes, et al. On the prediction performance of the lasso. Bernoulli, 2017, vol. 23, no 1, p. 552-581. - http://dx.doi.org/10.3150/15-BEJ756

  • GUNTUBOYINA, Adityanand, LIEU, Donovan, CHATTERJEE, Sabyasachi, et al. Adaptive risk bounds in univariate total variation denoising and trend filtering. The Annals of Statistics, 2020, vol. 48, no 1, p. 205-229. - http://dx.doi.org/10.1214/18-AOS1799

  • SADHANALA, Veeranjaneyulu, TIBSHIRANI, Ryan J., et al. Additive models with trend filtering. The Annals of Statistics, 2019, vol. 47, no 6, p. 3032-3068. - http://dx.doi.org/10.1214/19-AOS1833

  • SADHANALA, Veeranjaneyulu, WANG, Yu-Xiang, SHARPNACK, James L., et al. Higher-order total variation classes on grids: Minimax theory and trend filtering methods. In : Advances in Neural Information Processing Systems. 2017. p. 5800-5810. - https://papers.nips.cc/paper/2017/file/3e60e09c222f206c725385f53d7e567c-Paper.pdf

  • TIBSHIRANI, Ryan J. Divided Differences, Falling Factorials, and Discrete Splines: Another Look at Trend Filtering and Related Problems. arXiv preprint arXiv:2003.03886, 2020. - https://arxiv.org/abs/2003.03886

  • WANG, Yu-Xiang, SHARPNACK, James, SMOLA, Alexander J., et al. Trend filtering on graphs. The Journal of Machine Learning Research, 2016, vol. 17, no 1, p. 3651-3691. - https://dl.acm.org/doi/pdf/10.5555/2946645.3007058


