Authors : Pagès, Gilles (Author of the conference)
CIRM (Publisher )
Abstract :
Optimal vector quantization has been originally introduced in Signal processing as a discretization method of random signals, leading to an optimal trade-off between the speed of transmission and the quality of the transmitted signal. In machine learning, similar methods applied to a dataset are the historical core of unsupervised classification methods known as “clustering”. In both case it appears as an optimal way to produce a set of weighted prototypes (or codebook) which makes up a kind of skeleton of a dataset, a signal and more generally, from a mathematical point of view, of a probability distribution.
Quantization has encountered in recent years a renewed interest in various application fields like automatic classification, learning algorithms, optimal stopping and stochastic control, Backward SDEs and more generally numerical probability. In all these various applications, practical implementation of such clustering/quantization methods more or less rely on two procedures (and their countless variants): the Competitive Learning Vector Quantization $(CLV Q)$ which appears as a stochastic gradient descent derived from the so-called distortion potential and the (randomized) Lloyd's procedure (also known as k- means algorithm, nu ees dynamiques) which is but a fixed point search procedure. Batch version of those procedures can also be implemented when dealing with a dataset (or more generally a discrete distribution).
In a more formal form, if is probability distribution on an Euclidean space $\mathbb{R}^d$, the optimal quantization problem at level $N$ boils down to exhibiting an $N$-tuple $(x_{1}^{*}, . . . , x_{N}^{*})$, solution to
argmin$_{(x1,\dotsb,x_N)\epsilon(\mathbb{R}^d)^N} \int_{\mathbb{R}^d 1\le i\le N} \min |x_i-\xi|^2 \mu(d\xi)$
and its distribution i.e. the weights $(\mu(C(x_{i}^{*}))_{1\le i\le N}$ where $(C(x_{i}^{*})$ is a (Borel) partition of $\mathbb{R}^d$ satisfying
$C(x_{i}^{*})\subset \lbrace\xi\epsilon\mathbb{R}^d :|x_{i}^{*} -\xi|\le_{1\le j\le N} \min |x_{j}^{*}-\xi|\rbrace$.
To produce an unsupervised classification (or clustering) of a (large) dataset $(\xi_k)_{1\le k\le n}$, one considers its empirical measure
$\mu=\frac{1}{n}\sum_{k=1}^{n}\delta_{\xi k}$
whereas in numerical probability $\mu = \mathcal{L}(X)$ where $X$ is an $\mathbb{R}^d$-valued simulatable random vector. In both situations, $CLV Q$ and Lloyd's procedures rely on massive sampling of the distribution $\mu$.
As for clustering, the classification into $N$ clusters is produced by the partition of the dataset induced by the Voronoi cells $C(x_{i}^{*}), i = 1, \dotsb, N$ of the optimal quantizer.
In this second case, which is of interest for solving non linear problems like Optimal stopping problems (variational inequalities in terms of PDEs) or Stochastic control problems (HJB equations) in medium dimensions, the idea is to produce a quantization tree optimally fitting the dynamics of (a time discretization) of the underlying structure process.
We will explore (briefly) this vast panorama with a focus on the algorithmic aspects where few theoretical results coexist with many heuristics in a burgeoning literature. We will present few simulations in two dimensions.
MSC Codes :
62L20
- Stochastic approximation
65C05
- Monte Carlo methods
93E25
- Computational methods in stochastic control
94A12
- Signal theory (characterization, reconstruction, filtering, etc.)
91G60
- Numerical methods in mathematical finance
Additional resources :
http://smai.emath.fr/cemracs/cemracs17/Slides/pages.pdf
Film maker : Hennenfent, Guillaume
Language : English
Available date : 31/07/17
Conference Date : 19/07/17
Subseries : Research School
arXiv category : Probability ; Numerical Analysis ; Computer Science
Mathematical Area(s) : Probability & Statistics ; Numerical Analysis & Scientific Computing ; Computer Science
Format : MP4 (.mp4) - HD
Video Time : 01:51:24
Targeted Audience : Researchers ; Graduate Students
Download : https://videos.cirm-math.fr/2017-07-19_Pages.mp4
|
Event Title : CEMRACS - Summer school: Numerical methods for stochastic models: control, uncertainty quantification, mean-field / CEMRACS - École d'été : Méthodes numériques pour équations stochastiques : contrôle, incertitude, champ moyen Event Organizers : Bouchard, Bruno ; Chassagneux, Jean-François ; Delarue, François ; Gobet, Emmanuel ; Lelong, Jérôme Dates : 17/07/17 - 25/08/17
Event Year : 2017
Event URL : http://conferences.cirm-math.fr/1556.html
DOI : 10.24350/CIRM.V.19199603
Cite this video as:
Pagès, Gilles (2017). Optimal vector quantization: from signal processing to clustering and numerical probability. CIRM. Audiovisual resource. doi:10.24350/CIRM.V.19199603
URI : http://dx.doi.org/10.24350/CIRM.V.19199603
|
See Also
-
[Multi angle]
Forward and backward simulation of Euler scheme
/ Author of the conference Gobet, Emmanuel.
-
[Multi angle]
Stochastic variational inequalities for random mechanics
/ Author of the conference Mertz, Laurent.
-
[Multi angle]
Mean field type control with congestion
/ Author of the conference Laurière, Mathieu.
-
[Multi angle]
On the discretization of some nonlinear Fokker-Planck-Kolmogorov equations and applications
/ Author of the conference Silva Álvarez, Francisco José.
-
[Multi angle]
Project evaluation under uncertainty
/ Author of the conference Zubelli, Jorge P..
-
[Multi angle]
Some asymptotic results about American options and volativity
/ Author of the conference De Marco, Stefano.
-
[Multi angle]
Projected particle methods for solving McKean Vlasov SDEs
/ Author of the conference Belomestny, Denis.
-
[Multi angle]
Splitting algorithm for nested events
/ Author of the conference Goudenège, Ludovic.
-
[Multi angle]
Lecture 2: Introduction to HPC - MPI: design of parallel program and MPI
/ Author of the conference Lelong, Jérôme.
-
[Multi angle]
Lecture 1: Introduction to HPC, random generation, and OpenMP
/ Author of the conference Lelong, Jérôme.
-
[Post-edited]
Numerical methods for mean field games - Lecture 2: Monotone finite difference schemes
/ Author of the conference Achdou, Yves.
-
[Multi angle]
Numerical methods for mean field games - Lecture 3: Variational MFG and related algorithms for solving the discrete system of nonlinear equations
/ Author of the conference Achdou, Yves.
-
[Multi angle]
Numerical methods for mean field games - Lecture 1: Introduction to the system of PDEs and its interpretation. Uniqueness of classical solutions
/ Author of the conference Achdou, Yves.
-
[Multi angle]
Metamodels for uncertainty quantification and reliability analysis
/ Author of the conference Marelli, Stefano.
-
[Multi angle]
Global sensitivity analysis in stochastic systems
/ Author of the conference Le Maître, Olivier.
-
[Multi angle]
Subsurface flow with uncertainty : applications and numerical analysis issues
/ Author of the conference Charrier, Julia.
-
[Multi angle]
Dynamic formulations of optimal transportation and variational MFGs
/ Author of the conference Benamou, Jean-David.
-
[Multi angle]
Least squares regression Monte Carlo for approximating BSDES and semilinear PDES
/ Author of the conference Turkedjiev, Plamen.
-
[Multi angle]
Bandits in auctions (& more)
/ Author of the conference Perchet, Vianney.
-
[Multi angle]
Multilevel and multi-index sampling methods with applications - Lecture 2: Multilevel and Multi-index Monte Carlo methods for the McKean-Vlasov equation
/ Author of the conference Tempone, Raul.
-
[Multi angle]
Multilevel and multi-index sampling methods with applications - Lecture 1: Adaptive strategies for Multilevel Monte Carlo
/ Author of the conference Tempone, Raul.
-
[Multi angle]
Model-free control and deep learning
/ Author of the conference Bellemare, Marc.
-
[Multi angle]
The Metropolis Hastings algorithm: introduction and optimal scaling of the transient phase
/ Author of the conference Jourdain, Benjamin.
-
[Multi angle]
Branching for PDEs
/ Author of the conference Warin, Xavier.
-
[Multi angle]
On the interplay between kinetic theory and game theory
/ Author of the conference Degond, Pierre.
-
[Multi angle]
Particle algorithm for McKean SDE: a short review on numerical analysis
/ Author of the conference Bossy, Mireille.
-
[Multi angle]
Cubature methods and applications
/ Author of the conference Crisan, Dan.
-
[Multi angle]
Capacity expansion games with application to competition in power generation investments
/ Author of the conference Aïd, René.
-
[Multi angle]
An introduction to BSDE
/ Author of the conference Imkeller, Peter.
-
[Multi angle]
Mean field games with major and minor players
/ Author of the conference Carmona, René.
Bibliography
- Duflo, M. (1996). Algorithmes stochastiques. Paris: Springer-Verlag - http://www.springer.com/fr/book/9783540606994
- Gersho, A., & Gray, R.M. (1992). Vector Quantization and Signal Compression. Boston: Kluwer Academic Publishers - http://dx.doi.org/10.1007/978-1-4615-3626-0
- Graf, S., & Luschgy, H. (2000). Foundations of quantization for probability distributions. Berlin: Springer - http://dx.doi.org/10.1007/BFb0103945
- Kushner, H., & Yin, G.G. (2003). Stochastic approximation and recursive algorithms and applications. 2nd ed. New York: Springer - http://dx.doi.org/10.1007/b97441
- Pagès, G. (2015). Introduction to vector quantization and its applications for numerics. ESAIM Proceedings and Surveys, 48, 29-79 - http://dx.doi.org/10.1051/proc/201448002
- Pagès, G., & Printems, J. (2009). Optimal quantization for finance: from random vectors to stochastic processes. In A. Bensoussan, & Q. Zhang (Eds.), Handbook of numerical analysis. XV, Special volume, Mathematical modeling and numerical methods in finance (pp. 595-649). Amsterdam: Elsevier/North-Holland - http://dx.doi.org/10.1016/S1570-8659(08)00015-x
- Pagès, G., & Wilbertz, B. (2011). Optimal Delaunay and Voronoi quantization schemes for pricing American style options. - https://hal.archives-ouvertes.fr/hal-00572709