CIRM - Videos & books Library


Optimal vector quantization was originally introduced in signal processing as a discretization method for random signals, leading to an optimal trade-off between the speed of transmission and the quality of the transmitted signal. In machine learning, similar methods applied to a dataset are the historical core of the unsupervised classification methods known as "clustering". In both cases it appears as an optimal way to produce a set of weighted prototypes (or codebook) which makes up a kind of skeleton of a dataset, of a signal and, more generally, from a mathematical point of view, of a probability distribution.
Quantization has attracted renewed interest in recent years in various application fields such as automatic classification, learning algorithms, optimal stopping and stochastic control, backward SDEs and, more generally, numerical probability. In all these applications, the practical implementation of such clustering/quantization methods relies more or less on two procedures (and their countless variants): the Competitive Learning Vector Quantization (CLVQ), which appears as a stochastic gradient descent derived from the so-called distortion potential, and the (randomized) Lloyd's procedure (also known as the k-means algorithm, or nuées dynamiques), which is simply a fixed point search procedure. Batch versions of these procedures can also be implemented when dealing with a dataset (or more generally a discrete distribution).
More formally, if $\mu$ is a probability distribution on a Euclidean space $\mathbb{R}^d$, the optimal quantization problem at level $N$ boils down to exhibiting an $N$-tuple $(x_{1}^{*}, \dots, x_{N}^{*})$, solution to

$\operatorname{argmin}_{(x_1,\dots,x_N)\in(\mathbb{R}^d)^N} \int_{\mathbb{R}^d} \min_{1\le i\le N} |x_i-\xi|^2 \, \mu(d\xi)$

and its distribution, i.e. the weights $(\mu(C(x_{i}^{*})))_{1\le i\le N}$, where $(C(x_{i}^{*}))_{1\le i\le N}$ is a (Borel) partition of $\mathbb{R}^d$ satisfying

$C(x_{i}^{*})\subset \lbrace\xi\in\mathbb{R}^d : |x_{i}^{*}-\xi|\le \min_{1\le j\le N} |x_{j}^{*}-\xi|\rbrace$.
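
To make the two objects above concrete, here is a minimal sketch that estimates the quadratic distortion and the cell weights of a given codebook from a finite sample standing in for $\mu$. The function names, the Gaussian test sample and the tie-breaking rule (lowest index wins, which is one admissible choice of Borel partition) are illustrative assumptions, not part of the lecture.

```python
import random

def sq_dist(x, xi):
    """Squared Euclidean distance |x - xi|^2 between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, xi))

def nearest_index(codebook, xi):
    """Index i minimizing |x_i - xi|^2; ties go to the lowest index."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], xi))

def distortion_and_weights(codebook, sample):
    """Sample-based estimates of the distortion integral and of the cell weights."""
    counts = [0] * len(codebook)
    total = 0.0
    for xi in sample:
        i = nearest_index(codebook, xi)
        counts[i] += 1                      # xi falls in the cell of x_i
        total += sq_dist(codebook[i], xi)   # min over i of |x_i - xi|^2
    n = len(sample)
    return total / n, [c / n for c in counts]

# Hypothetical example: a 2-point codebook for a 2-D standard normal sample.
rng = random.Random(0)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
distortion, weights = distortion_and_weights([(-1.0, 0.0), (1.0, 0.0)], sample)
```

The weights sum to one by construction, since every sample point is assigned to exactly one cell.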

To produce an unsupervised classification (or clustering) of a (large) dataset $(\xi_k)_{1\le k\le n}$, one considers its empirical measure

$\mu=\frac{1}{n}\sum_{k=1}^{n}\delta_{\xi_k}$

whereas in numerical probability $\mu = \mathcal{L}(X)$, where $X$ is an $\mathbb{R}^d$-valued random vector that can be simulated. In both situations, the CLVQ and Lloyd's procedures rely on massive sampling of the distribution $\mu$.
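
As an illustration of how CLVQ consumes samples of $\mu$ one at a time, the following is a minimal sketch of the stochastic gradient step on the distortion: the prototype closest to the new sample (the "winner") is moved toward it, and the others are left unchanged. The step schedule `gamma0 / (1 + decay * n)`, the function name and the uniform test distribution are assumptions chosen for the example.

```python
import random

def clvq(sampler, codebook, n_steps, gamma0=0.5, decay=0.01):
    """Competitive Learning Vector Quantization, one sample per step."""
    codebook = [list(x) for x in codebook]
    for n in range(n_steps):
        xi = sampler()                        # fresh draw from mu
        gamma = gamma0 / (1.0 + decay * n)    # decreasing step in (0, 1)
        # Competitive phase: find the prototype closest to xi.
        winner = min(
            range(len(codebook)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(codebook[j], xi)),
        )
        # Learning phase: move only the winner toward xi.
        codebook[winner] = [a - gamma * (a - b) for a, b in zip(codebook[winner], xi)]
    return codebook

# Hypothetical example: 2-point quantization of the uniform law on [0, 1],
# whose optimal quadratic quantizer is (0.25, 0.75).
rng = random.Random(0)
result = clvq(lambda: (rng.random(),), [(0.1,), (0.9,)], n_steps=20000)
```

Because each update is a convex combination of the winner and the sample, the prototypes stay inside the convex hull of the initial codebook and the observed samples.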
As for clustering, the classification into $N$ clusters is produced by the partition of the dataset induced by the Voronoi cells $C(x_{i}^{*}), i = 1, \dotsb, N$ of the optimal quantizer.
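
For a dataset, the batch version of Lloyd's procedure can be sketched as the fixed point iteration below: assign every point to its nearest prototype, then replace each prototype by the mean of its cell. The function names, the fixed iteration count and the handling of empty cells (the prototype is kept as is) are illustrative assumptions.

```python
def nearest(codebook, xi):
    """Index of the prototype whose Voronoi cell contains xi (lowest index on ties)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], xi)),
    )

def lloyd(data, codebook, n_iter=20):
    """Batch Lloyd (k-means) iteration; returns the prototypes and the cluster labels."""
    codebook = [tuple(x) for x in codebook]
    for _ in range(n_iter):
        labels = [nearest(codebook, xi) for xi in data]
        updated = []
        for i, x in enumerate(codebook):
            cell = [xi for xi, lab in zip(data, labels) if lab == i]
            if cell:
                # Fixed point map: prototype <- centroid of its cell.
                updated.append(tuple(sum(p[k] for p in cell) / len(cell)
                                     for k in range(len(x))))
            else:
                updated.append(x)  # empty cell: keep the prototype (assumption)
        codebook = updated
    return codebook, [nearest(codebook, xi) for xi in data]

# Hypothetical example: two well-separated groups of points in the plane.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = lloyd(data, [(0, 0), (10, 10)])
```

The returned labels are the classification of the dataset induced by the Voronoi cells of the final prototypes.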
In numerical probability, which is of interest for solving nonlinear problems such as optimal stopping problems (variational inequalities in terms of PDEs) or stochastic control problems (HJB equations) in medium dimensions, the idea is to produce a quantization tree optimally fitting the dynamics of (a time discretization of) the underlying structure process.
We will explore (briefly) this vast panorama with a focus on the algorithmic aspects, where few theoretical results coexist with many heuristics in a burgeoning literature. We will present a few simulations in two dimensions.


62L20 ; 93E25 ; 94A12 ; 91G60 ; 65C05


Compared to artificial neural networks (ANNs), the brain seems to learn faster, generalize better to new situations and consume much less energy. ANNs are motivated by the functioning of the brain but differ in several crucial aspects. While ANNs are deterministic, biological neural networks (BNNs) are stochastic. Moreover, it is biologically implausible that the learning of the brain is based on gradient descent. In recent years, statistical theory for artificial neural networks has been developed. The idea now is to extend this to biological neural networks, as the future of AI is likely to draw even more inspiration from biology. In this lecture series we will survey the challenges and present some first statistical risk bounds for different biologically inspired learning rules.


62L20 ; 62J05


We investigate a method based on risk minimization to hedge observable but non-tradable sources of risk on financial or energy markets. The optimal portfolio strategy is obtained by dynamically minimizing the Conditional Value-at-Risk (CVaR) using three main tools: a stochastic approximation algorithm, optimal quantization and variance reduction techniques (importance sampling (IS) and linear control variable (LCV)), since the quantities of interest are naturally related to rare events. We illustrate our approach by considering several portfolios in connection with energy markets.
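
The stochastic approximation ingredient can be illustrated on a toy problem. The sketch below estimates the VaR and CVaR of a loss $L$ at level $\alpha$ with a Robbins-Monro recursion based on the Rockafellar-Uryasev representation $\mathrm{CVaR}_\alpha(L) = \min_\xi \{\xi + \mathbb{E}[(L-\xi)^+]/(1-\alpha)\}$. It is only a sketch under stated assumptions: the standard normal loss, the step sequence and the function name are hypothetical, and the quantization, variance reduction and hedging portfolio of the talk are deliberately left out.

```python
import random

def var_cvar_robbins_monro(sample_loss, alpha, n_steps, xi0=0.0):
    """Joint recursive estimation of (VaR, CVaR) at level alpha."""
    xi, cvar = xi0, 0.0
    for n in range(n_steps):
        loss = sample_loss()
        excess = max(loss - xi, 0.0)
        # Running average of xi + (L - xi)^+ / (1 - alpha) at the current xi.
        cvar += (xi + excess / (1.0 - alpha) - cvar) / (n + 1)
        # Robbins-Monro step on the derivative in xi of the same function:
        # 1 - 1{L >= xi} / (1 - alpha), whose mean vanishes at xi = VaR.
        gamma = 1.0 / (n + 100) ** 0.75   # assumed step sequence
        indicator = 1.0 if loss >= xi else 0.0
        xi -= gamma * (1.0 - indicator / (1.0 - alpha))
    return xi, cvar

# Hypothetical example: standard normal loss at level 95%.
rng = random.Random(1)
var_est, cvar_est = var_cvar_robbins_monro(lambda: rng.gauss(0.0, 1.0), 0.95, 200_000)
```

For a standard normal loss the targets are known in closed form (VaR about 1.645 and CVaR about 2.063 at the 95% level), which makes this a convenient sanity check. Only about 5% of the draws exceed the VaR and drive the upward moves, which is the rare-event issue that motivates variance reduction in the abstract.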

Keywords: VaR, CVaR, Stochastic Approximation, Robbins-Monro algorithm, Quantization


91G70 ; 91B30 ; 62L20


Many machine learning and signal processing problems are traditionally cast as convex optimization problems. A common difficulty in solving these problems is the size of the data, where there are many observations ("large n") and each of these is large ("large p"). In this setting, online algorithms such as stochastic gradient descent, which pass over the data only once, are usually preferred over batch algorithms, which require multiple passes over the data. Given n observations/iterations, the optimal convergence rates of these algorithms are $O(1/\sqrt{n})$ for general convex functions and reach $O(1/n)$ for strongly convex functions. In this tutorial, I will first present the classical results in stochastic approximation and relate them to classical optimization and statistics results. I will then show how the smoothness of loss functions may be used to design novel algorithms with improved behavior, both in theory and practice: in the ideal infinite-data setting, an efficient novel Newton-based stochastic approximation algorithm leads to a convergence rate of $O(1/n)$ without strong convexity assumptions, while in the practical finite-data setting, an appropriate combination of batch and online algorithms leads to unexpected behaviors, such as a linear convergence rate for strongly convex problems, with an iteration cost similar to stochastic gradient descent.
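
As a point of reference for the single-pass setting described above, here is a minimal sketch of stochastic gradient descent on a least-squares problem, where each observation is used exactly once. The averaging of the iterates (Polyak-Ruppert averaging) is a classical device chosen for this illustration and is not claimed to be one of the tutorial's algorithms; the step constant, the synthetic linear model and all names are assumptions.

```python
import math
import random

def linear_stream(w_star, noise, rng):
    """Endless stream of (x, y) pairs with y = <w_star, x> + Gaussian noise."""
    while True:
        x = [rng.gauss(0.0, 1.0) for _ in w_star]
        y = sum(a * b for a, b in zip(w_star, x)) + noise * rng.gauss(0.0, 1.0)
        yield x, y

def averaged_sgd(stream, dim, n_steps, c=0.1):
    """Single-pass SGD on the squared loss, returning last and averaged iterates."""
    w = [0.0] * dim
    w_bar = [0.0] * dim
    for n in range(1, n_steps + 1):
        x, y = next(stream)                      # each observation is seen once
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        gamma = c / math.sqrt(n)                 # step of order 1 / sqrt(n)
        # Gradient of 0.5 * (<w, x> - y)^2 with respect to w is residual * x.
        w = [wi - gamma * residual * xi for wi, xi in zip(w, x)]
        # Running mean of the iterates.
        w_bar = [wb + (wi - wb) / n for wb, wi in zip(w_bar, w)]
    return w, w_bar

# Hypothetical example: recover w_star = (1, -2) from 20000 noisy observations.
stream = linear_stream([1.0, -2.0], noise=0.1, rng=random.Random(0))
w_last, w_avg = averaged_sgd(stream, dim=2, n_steps=20000)
```

The cost per iteration is proportional to the dimension p and does not depend on n, which is the property that makes online methods attractive when n is large.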


62L20 ; 68T05 ; 90C06 ; 90C25


Many machine learning and signal processing problems are traditionally cast as convex optimization problems. A common difficulty in solving these problems is the size of the data, where there are many observations ("large n") and each of these is large ("large p"). In this setting, online algorithms such as stochastic gradient descent, which pass over the data only once, are usually preferred over batch algorithms, which require multiple passes over the data. Given n observations/iterations, the optimal convergence rates of these algorithms are $O(1/\sqrt{n})$ for general convex functions and reach $O(1/n)$ for strongly convex functions. In this tutorial, I will first present the classical results in stochastic approximation and relate them to classical optimization and statistics results. I will then show how the smoothness of loss functions may be used to design novel algorithms with improved behavior, both in theory and practice: in the ideal infinite-data setting, an efficient novel Newton-based stochastic approximation algorithm leads to a convergence rate of $O(1/n)$ without strong convexity assumptions, while in the practical finite-data setting, an appropriate combination of batch and online algorithms leads to unexpected behaviors, such as a linear convergence rate for strongly convex problems, with an iteration cost similar to stochastic gradient descent.


62L20 ; 68T05 ; 90C06 ; 90C25


Documents 62L20: 5 results

Optimal vector quantization: from signal processing to clustering and numerical probability - Pagès, Gilles (Conference speaker) | CIRM

Statistical learning in biological neural networks - Schmidt-Hieber, Johannes (Conference speaker) | CIRM

CVaR hedging using quantization based stochastic approximation algorithm - Pagès, Gilles (Conference speaker) | CIRM

Large-scale machine learning and convex optimization 1/2 - Bach, Francis (Conference speaker) | CIRM

Large-scale machine learning and convex optimization 2/2 - Bach, Francis (Conference speaker) | CIRM