Déposez votre fichier ici pour le déplacer vers cet enregistrement.

## Multi angle  CVaR hedging using quantization based stochastic approximation algorithm Pagès, Gilles (Auteur de la Conférence) | CIRM (Editeur )

We investigate a method based on risk minimization to hedge observable but non-tradable source of risk on financial or energy markets. The optimal portfolio strategy is obtained by minimizing dynamically the Conditional Value-at-Risk (CVaR) using three main tools: a stochastic approximation algorithm, optimal quantization and variance reduction techniques (importance sampling (IS) and linear control variable (LCV)) as the quantities of interest are naturally related to rare events. We illustrate our approach by considering several portfolios in connection with energy markets.

Keywords : VaR, CVaR, Stochastic Approximation, Robbins-Monro algorithm, Quantification
We investigate a method based on risk minimization to hedge observable but non-tradable source of risk on financial or energy markets. The optimal portfolio strategy is obtained by minimizing dynamically the Conditional Value-at-Risk (CVaR) using three main tools: a stochastic approximation algorithm, optimal quantization and variance reduction techniques (importance sampling (IS) and linear control variable (LCV)) as the quantities of interest ...

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

## Multi angle  Large-scale machine learning and convex optimization 1/2 Bach, Francis (Auteur de la Conférence) | CIRM (Editeur )

Many machine learning and signal processing problems are traditionally cast as convex optimization problems. A common difficulty in solving these problems is the size of the data, where there are many observations ("large n") and each of these is large ("large p"). In this setting, online algorithms such as stochastic gradient descent which pass over the data only once, are usually preferred over batch algorithms, which require multiple passes over the data. Given n observations/iterations, the optimal convergence rates of these algorithms are $O(1/\sqrt{n})$ for general convex functions and reaches $O(1/n)$ for strongly-convex functions. In this tutorial, I will first present the classical results in stochastic approximation and relate them to classical optimization and statistics results. I will then show how the smoothness of loss functions may be used to design novel algorithms with improved behavior, both in theory and practice: in the ideal infinite-data setting, an efficient novel Newton-based stochastic approximation algorithm leads to a convergence rate of $O(1/n)$ without strong convexity assumptions, while in the practical finite-data setting, an appropriate combination of batch and online algorithms leads to unexpected behaviors, such as a linear convergence rate for strongly convex problems, with an iteration cost similar to stochastic gradient descent. Many machine learning and signal processing problems are traditionally cast as convex optimization problems. A common difficulty in solving these problems is the size of the data, where there are many observations ("large n") and each of these is large ("large p"). In this setting, online algorithms such as stochastic gradient descent which pass over the data only once, are usually preferred over batch algorithms, which require multiple passes ...

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

## Multi angle  Large-scale machine learning and convex optimization 2/2 Bach, Francis (Auteur de la Conférence) | CIRM (Editeur )

Many machine learning and signal processing problems are traditionally cast as convex optimization problems. A common difficulty in solving these problems is the size of the data, where there are many observations ("large n") and each of these is large ("large p"). In this setting, online algorithms such as stochastic gradient descent which pass over the data only once, are usually preferred over batch algorithms, which require multiple passes over the data. Given n observations/iterations, the optimal convergence rates of these algorithms are $O(1/\sqrt{n})$ for general convex functions and reaches $O(1/n)$ for strongly-convex functions. In this tutorial, I will first present the classical results in stochastic approximation and relate them to classical optimization and statistics results. I will then show how the smoothness of loss functions may be used to design novel algorithms with improved behavior, both in theory and practice: in the ideal infinite-data setting, an efficient novel Newton-based stochastic approximation algorithm leads to a convergence rate of $O(1/n)$ without strong convexity assumptions, while in the practical finite-data setting, an appropriate combination of batch and online algorithms leads to unexpected behaviors, such as a linear convergence rate for strongly convex problems, with an iteration cost similar to stochastic gradient descent. Many machine learning and signal processing problems are traditionally cast as convex optimization problems. A common difficulty in solving these problems is the size of the data, where there are many observations ("large n") and each of these is large ("large p"). In this setting, online algorithms such as stochastic gradient descent which pass over the data only once, are usually preferred over batch algorithms, which require multiple passes ...

Déposez votre fichier ici pour le déplacer vers cet enregistrement.

## Multi angle  Optimal vector quantization: from signal processing to clustering and numerical probability Pagès, Gilles (Auteur de la Conférence) | CIRM (Editeur )

Optimal vector quantization has been originally introduced in Signal processing as a discretization method of random signals, leading to an optimal trade-off between the speed of transmission and the quality of the transmitted signal. In machine learning, similar methods applied to a dataset are the historical core of unsupervised classification methods known as “clustering”. In both case it appears as an optimal way to produce a set of weighted prototypes (or codebook) which makes up a kind of skeleton of a dataset, a signal and more generally, from a mathematical point of view, of a probability distribution.
Quantization has encountered in recent years a renewed interest in various application fields like automatic classification, learning algorithms, optimal stopping and stochastic control, Backward SDEs and more generally numerical probability. In all these various applications, practical implementation of such clustering/quantization methods more or less rely on two procedures (and their countless variants): the Competitive Learning Vector Quantization $(CLV Q)$ which appears as a stochastic gradient descent derived from the so-called distortion potential and the (randomized) Lloyd's procedure (also known as k- means algorithm, nu ees dynamiques) which is but a fixed point search procedure. Batch version of those procedures can also be implemented when dealing with a dataset (or more generally a discrete distribution).
In a more formal form, if is probability distribution on an Euclidean space $\mathbb{R}^d$, the optimal quantization problem at level $N$ boils down to exhibiting an $N$-tuple $(x_{1}^{*}, . . . , x_{N}^{*})$, solution to

argmin$_{(x1,\dotsb,x_N)\epsilon(\mathbb{R}^d)^N} \int_{\mathbb{R}^d 1\le i\le N} \min |x_i-\xi|^2 \mu(d\xi)$

and its distribution i.e. the weights $(\mu(C(x_{i}^{*}))_{1\le i\le N}$ where $(C(x_{i}^{*})$ is a (Borel) partition of $\mathbb{R}^d$ satisfying

$C(x_{i}^{*})\subset \lbrace\xi\epsilon\mathbb{R}^d :|x_{i}^{*} -\xi|\le_{1\le j\le N} \min |x_{j}^{*}-\xi|\rbrace$.

To produce an unsupervised classification (or clustering) of a (large) dataset $(\xi_k)_{1\le k\le n}$, one considers its empirical measure

$\mu=\frac{1}{n}\sum_{k=1}^{n}\delta_{\xi k}$

whereas in numerical probability $\mu = \mathcal{L}(X)$ where $X$ is an $\mathbb{R}^d$-valued simulatable random vector. In both situations, $CLV Q$ and Lloyd's procedures rely on massive sampling of the distribution $\mu$.
As for clustering, the classification into $N$ clusters is produced by the partition of the dataset induced by the Voronoi cells $C(x_{i}^{*}), i = 1, \dotsb, N$ of the optimal quantizer.
In this second case, which is of interest for solving non linear problems like Optimal stopping problems (variational inequalities in terms of PDEs) or Stochastic control problems (HJB equations) in medium dimensions, the idea is to produce a quantization tree optimally fitting the dynamics of (a time discretization) of the underlying structure process.
We will explore (briefly) this vast panorama with a focus on the algorithmic aspects where few theoretical results coexist with many heuristics in a burgeoning literature. We will present few simulations in two dimensions.
Optimal vector quantization has been originally introduced in Signal processing as a discretization method of random signals, leading to an optimal trade-off between the speed of transmission and the quality of the transmitted signal. In machine learning, similar methods applied to a dataset are the historical core of unsupervised classification methods known as “clustering”. In both case it appears as an optimal way to produce a set ...

Z