
Documents with MSC classification 68T99 (6 results)

Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. Towards understanding this phenomenon, we analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations. We show that the limit of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions.
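
As a point of reference, here is a hedged sketch of the kind of max-margin problem such a characterization refers to; the function space $\mathcal{F}_1$ and its variation norm are an assumption drawn from the literature on infinitely wide two-layer networks, not from the abstract itself.

    % Hypothetical formalization: f ranges over two-layer networks
    % f(x) = \int \sigma(\langle w, x \rangle) \, d\mu(w) with homogeneous
    % activation \sigma, and \|f\|_{\mathcal{F}_1} denotes the total-variation
    % norm of the representing measure \mu (a non-Hilbertian norm).
    \max_{\|f\|_{\mathcal{F}_1} \le 1} \; \min_{1 \le i \le n} \; y_i f(x_i)

Under this reading, the gradient-flow limit attains the optimal margin, so the non-Hilbertian geometry of $\mathcal{F}_1$ is the relevant measure of complexity for generalization.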

65K10 ; 65K05 ; 68W99 ; 68T99


What does back propagation compute? - Pauwels, Edouard (Author of the conference) | CIRM H

Multi angle

We are interested in nonsmooth analysis of backpropagation as implemented in modern machine learning libraries, such as TensorFlow or PyTorch. First I will illustrate how blind application of differential calculus to nonsmooth objects can be problematic, requiring a proper mathematical model. Then I will introduce a weak notion of generalized derivative, named conservativity, and illustrate how it complies with calculus and optimization for well-structured objects. We provide stability results for empirical risk minimization, analogous to those in the smooth setting, for the combination of nonsmooth automatic differentiation, minibatch stochastic approximation and first-order optimization. This is joint work with Jérôme Bolte.
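
As a minimal illustration of the first point (a standard example in the folklore of nonsmooth automatic differentiation, not taken from the talk itself), blind backpropagation through ReLU can return a value that is not any derivative of the function actually computed:

    import torch

    # f(x) = relu(x) - relu(-x) equals x for every real x,
    # so the true derivative is 1 everywhere, including at 0.
    x = torch.tensor(0.0, requires_grad=True)
    f = torch.relu(x) - torch.relu(-x)
    f.backward()
    print(x.grad.item())  # 0.0, not 1.0: autodiff applies relu'(0) = 0 branchwise

Conservative derivatives give a weak calculus in which such outputs, while not classical gradients, remain usable for first-order optimization.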

65K05 ; 65K10 ; 68T99


Machine learning on graphs - Vandergheynst, Pierre (Author of the conference) | CIRM H

Multi angle

There is a plethora of interesting applications that can leverage graph-structured data, from drug discovery to route planning, so it is only natural that graph machine learning has attracted a lot of attention lately. We will review approaches in graph representation learning, leveraging intuition from graph signal processing to design and study graph neural networks and some of their recent extensions.
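
For concreteness, here is a hedged NumPy sketch of one common graph neural network layer (a GCN-style convolution; this specific design is an illustrative assumption, not necessarily one of the architectures covered in the talk):

    import numpy as np

    def gcn_layer(A, H, W):
        # Symmetrically normalized adjacency with self-loops.
        A_hat = A + np.eye(A.shape[0])
        d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
        A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
        # Aggregate neighbor features, then apply a shared linear map and ReLU.
        return np.maximum(A_norm @ H @ W, 0.0)

    # Toy usage: a triangle graph with 2-dimensional node features.
    A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
    H = np.random.randn(3, 2)
    W = np.random.randn(2, 4)
    print(gcn_layer(A, H, W).shape)  # (3, 4)

The normalized adjacency acts as a low-pass graph filter, which is the graph-signal-processing intuition mentioned above.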

05C90 ; 05C50 ; 68T99


Enhancing sampling with learned transport maps - Gabrié, Marylou (Author of the conference) | CIRM H

Multi angle

Deep generative models parametrize very flexible families of distributions able to fit complicated datasets of images or text. These models provide independent samples from complex high-dimensional distributions at negligible cost. On the other hand, sampling exactly from a target distribution, such as a Boltzmann distribution or a Bayesian posterior, is typically challenging: because of dimensionality, multi-modality, ill-conditioning, or a combination of these. In this talk, I will review recent works trying to enhance traditional inference and sampling algorithms with learning. I will present in particular flowMC, an adaptive MCMC sampler built on normalizing flows, along with first applications and remaining challenges.
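
As a rough sketch of the idea (one generic flow-assisted MCMC move, not flowMC's actual API; flow_sample and flow_log_prob are hypothetical stand-ins for a trained normalizing flow):

    import numpy as np

    def flow_mh_step(x, log_target, flow_sample, flow_log_prob, rng):
        # Independence Metropolis-Hastings with the learned flow as proposal:
        # accept y with probability min(1, pi(y) q(x) / (pi(x) q(y))).
        y = flow_sample(rng)
        log_alpha = (log_target(y) - log_target(x)
                     + flow_log_prob(x) - flow_log_prob(y))
        if np.log(rng.uniform()) < log_alpha:
            return y, True   # accepted: jump, possibly across modes
        return x, False      # rejected: stay put

A flow trained on the chain's own history concentrates proposals on the modes discovered so far, which is how learning can help with multi-modality.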

68T99 ; 82B80 ; 62F15


The linear algebra of Large Language Models - Saad, Yousef (Author of the conference) | CIRM H

Multi angle

In an era where Artificial Intelligence (AI) is permeating virtually every field of science and engineering, it is becoming critical for members of the numerical linear algebra community to understand and embrace AI, to contribute to its advancement, and more broadly to the advancement of machine learning. What is fascinating and rather encouraging is that Numerical Linear Algebra (NLA) is at the core of machine learning and AI. In this talk we will give an overview of deep learning with an emphasis on Large Language Models (LLMs) and Transformers [3, 4]. The very first step of LLMs is to convert the problem into one that can be exploited by numerical methods, or, to be more accurate, by optimization techniques. All AI methods rely essentially on four ingredients: data, optimization methods, statistical intuition, and linear algebra. Thus, the first task is to map words or sentences into tokens, which are then embedded into Euclidean spaces. From there on, the models operate on vectors and matrices. We will show a few examples of important developments in ML that were heavily based on linear algebra ideas. Among these, we will briefly discuss LoRA [1], a technique in which low-rank approximation was used to reduce computational cost in some models, leading to gains of a few orders of magnitude. Another contribution that used purely algebraic arguments and that had a major impact on LLMs is the article [2]. Here the main discovery is that the nonlinear "self-attention" in LLMs can be approximated linearly, resulting in huge savings in computation, as the computational complexity was decreased from $O(n^2)$ to $O(n)$. The talk will be mostly a survey of recent methods in AI with the primary goal of unraveling the mathematics of Transformers. A secondary goal is to initiate a discussion on how NLA specialists can participate in AI research.
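
To make the $O(n^2)$-to-$O(n)$ claim concrete, here is a hedged NumPy sketch of the reassociation trick behind linearized attention (the feature map phi is an illustrative choice, not necessarily the one used in [2]):

    import numpy as np

    def softmax_attention(Q, K, V):
        # Standard self-attention materializes an n-by-n score matrix: O(n^2 d).
        S = np.exp(Q @ K.T / np.sqrt(Q.shape[1]))
        return (S / S.sum(axis=1, keepdims=True)) @ V

    def linear_attention(Q, K, V, phi=lambda M: np.maximum(M, 0.0) + 1.0):
        # Replace the softmax kernel by phi(q)^T phi(k) and reassociate:
        # (phi(Q) phi(K)^T) V == phi(Q) (phi(K)^T V), which costs O(n d^2).
        Qp, Kp = phi(Q), phi(K)
        num = Qp @ (Kp.T @ V)                       # d-by-d_v intermediate, never n-by-n
        den = Qp @ Kp.sum(axis=0, keepdims=True).T  # per-query normalization
        return num / den

Because the expensive intermediate no longer grows with the square of the sequence length, long contexts become tractable; the saving is purely a linear algebra observation about the order of matrix products.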

65F99 ; 68T99

Modern machine learning architectures often embed their inputs into a lower-dimensional latent space before generating a final output. A vast set of empirical results, and some emerging theory, predicts that these lower-dimensional codes are often highly structured, capturing lower-dimensional variation in the data. Based on this observation, in this talk I will describe efforts in my group to develop lightweight algorithms that navigate, restructure, and reshape learned latent spaces. Along the way, I will consider a variety of practical problems in machine learning, including low-rank adaptation of large models, regularization to promote local latent structure, and efficient training/evaluation of generative models.
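
As a toy illustration of the embed-then-generate pattern described above (the dimensions and the tanh nonlinearity are arbitrary choices for this sketch):

    import numpy as np

    rng = np.random.default_rng(0)

    # A minimal encode/decode bottleneck: inputs in R^20 pass through a
    # 3-dimensional latent code before the final output is generated.
    d, k = 20, 3
    W_enc = rng.normal(size=(k, d))   # encoder: R^d -> R^k
    W_dec = rng.normal(size=(d, k))   # decoder: R^k -> R^d

    x = rng.normal(size=d)
    z = np.tanh(W_enc @ x)            # the latent code such algorithms operate on
    x_hat = W_dec @ z                 # final output generated from the code
    print(z.shape, x_hat.shape)       # (3,) (20,)

Algorithms that act on z rather than on the full model weights are naturally lightweight, since the latent space is far smaller than either the input space or the parameter space.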

62E20 ; 62F99 ; 62G07 ; 62P30 ; 65C50 ; 68T99
