Authors: Lazaric, Allesandro (Speaker)
CIRM (Publisher)
Abstract:
Reinforcement learning (RL) studies the problem of learning how to optimally control a dynamical and stochastic environment. Unlike in supervised learning, an RL agent does not receive direct supervision on which actions to take in order to maximize the long-term reward; it needs to learn from the samples collected through direct interaction with the environment. Combined with deep learning tools, RL algorithms have recently achieved impressive results on a variety of problems ranging from recommendation systems to computer games, often reaching human-competitive performance (e.g., in the game of Go). In this course, we will review the mathematical foundations of RL and the most popular algorithmic strategies. In particular, we will build around the model of Markov decision processes (MDPs) to formalize the agent-environment interaction, and ground RL algorithms in popular dynamic programming algorithms such as value and policy iteration. We will study how such algorithms can be made online and incremental, and how to integrate approximation techniques from the deep learning literature. Finally, we will discuss the exploration-exploitation dilemma in the simpler bandit scenario as well as in the full RL case. Throughout the course, we will try to identify the main current limitations of RL algorithms and the main open questions in the field.
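The dynamic programming methods mentioned above can be illustrated with value iteration, which repeatedly applies the Bellman optimality update V(s) ← max_a [R(s,a) + γ Σ_{s'} P(s'|s,a) V(s')] until the values stop changing, then reads off a greedy policy. What follows is a minimal pure-Python sketch; the two-state MDP, the function name `value_iteration` and the data layout (`P[s][a]` as a list of `(probability, next_state)` pairs, `R[s][a]` as the expected reward) are illustrative assumptions, not material taken from the lecture.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration for a finite MDP (illustrative sketch).

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    expected immediate reward for taking action a in state s.
    Returns an approximately optimal value function and a greedy policy.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman optimality update applied to every state at once.
        V_new = [
            max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in range(len(P[s])))
            for s in range(n_states)
        ]
        delta = max(abs(v1 - v0) for v1, v0 in zip(V_new, V))
        V = V_new
        if delta < tol:  # stop once the update barely changes the values
            break
    # Greedy policy with respect to the final value estimate.
    policy = [
        max(range(len(P[s])),
            key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
        for s in range(n_states)
    ]
    return V, policy


# Hypothetical two-state MDP: action 0 = stay, action 1 = switch state.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: stay in 0, or move to 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: stay in 1, or move to 0
]
R = [[0.0, 0.0], [1.0, 0.0]]   # only staying in state 1 is rewarded
V, pi = value_iteration(P, R, gamma=0.9)
print(V, pi)
```

In this toy MDP the optimal values are V(1) = 1/(1 − 0.9) = 10 and V(0) = 0.9 × 10 = 9, so the computed values should be very close to those numbers, with a greedy policy that switches out of state 0 and stays in state 1.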
Theoretical part
- Introduction to reinforcement learning (recent advances and current limitations)
- How to model an RL problem: Markov decision processes (MDPs)
- How to solve an MDP: dynamic programming methods (value and policy iteration)
- How to solve an MDP from direct interaction: RL algorithms (Monte Carlo, temporal difference, SARSA, Q-learning)
- How to solve an MDP with approximation (a.k.a. deep RL): value-based (e.g., DQN) and policy gradient methods (e.g., REINFORCE, TRPO)
- How to efficiently explore an MDP: from bandits to RL
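Q-learning, listed above among the methods that learn from direct interaction, replaces the known transition model of dynamic programming with sampled transitions and the temporal-difference update Q(s,a) ← Q(s,a) + α [r + γ max_{a'} Q(s',a') − Q(s,a)]. Below is a minimal tabular sketch with an epsilon-greedy behaviour policy; the chain environment, the hyperparameters and the name `q_learning_chain` are hypothetical choices made for illustration only.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning on a toy deterministic chain (hypothetical example).

    States are 0..n_states-1; action 0 moves left (clipped at 0) and action 1
    moves right. Reaching the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def greedy(s):
        # Break ties at random so the untrained agent performs a random walk.
        best = max(Q[s])
        return rng.choice([a for a in (0, 1) if Q[s][a] == best])

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy behaviour policy.
            a = rng.randrange(2) if rng.random() < epsilon else greedy(s)
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == goal
            r = 1.0 if done else 0.0
            # Off-policy TD target: max over next actions, no bootstrap at the goal.
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if done:
                break
    return Q


Q = q_learning_chain()
# Greedy action in each non-terminal state (1 means "move right").
greedy_policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(len(Q) - 1)]
print(greedy_policy)
```

The learned greedy policy should move right in every state, with Q(s, right) approaching γ^(3−s). Because the target takes the max over next actions, the method is off-policy; using instead the Q-value of the action the epsilon-greedy policy actually selects in `s_next` (and then executing that action) would turn it into SARSA.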
Practical part
- Simple example of value iteration and Q-learning
- More advanced example with policy gradient
- Simple bandit example for exploration
- More advanced example for exploration in RL
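For the bandit exploration example, one standard strategy (not necessarily the one used in the practical session) is UCB1, which pulls the arm that maximizes its empirical mean plus a confidence bonus sqrt(2 ln t / n_i), where n_i is the number of times arm i has been pulled so far. The sketch below simulates Bernoulli arms; the arm means, the horizon and the function name `ucb1` are assumptions made for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """UCB1 on a simulated Bernoulli bandit (hypothetical arm means).

    Returns the number of pulls of each arm and the total collected reward.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # n_i: number of pulls of each arm
    sums = [0.0] * k   # cumulative reward of each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # Optimism in the face of uncertainty: empirical mean + confidence bonus.
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total


counts, total = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, total)
```

The bonus shrinks as an arm is pulled more often, so suboptimal arms are tried only about logarithmically often in the horizon, and most pulls should go to the arm with mean 0.8.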
MSC codes:
62C05
- General considerations
68T05
- Learning and adaptive systems
90C15
- Stochastic programming
68Q87
- Probability in computer science (algorithm analysis, random structures, phase transitions, etc.)
93B47
- Iterative learning control
Event information
Event name: Mathematics, Signal Processing and Learning / Mathématiques, traitement du signal et apprentissage
Organizers: Anthoine, Sandrine ; Chaux, Caroline ; Mélot, Clothilde ; Richard, Frédéric
Dates: 25/01/2021 - 29/01/2021
Event year: 2021
Event URL: https://conferences.cirm-math.fr/2472.html
DOI : 10.24350/CIRM.V.19705103
Cite this video:
Lazaric, Allesandro (2021). Reinforcement learning - lecture 3. CIRM. Audiovisual resource. doi:10.24350/CIRM.V.19705103
URI : http://dx.doi.org/10.24350/CIRM.V.19705103
See also
-
[Multi angle]
Teasing poster: mathematics, signal processing and learning
/ Speakers: Antonsanti, Pierre-Louis ; Belotto Da Silva, André ; Cano, Cyril ; Cohen, Jeremy ; Doz, Cyprien ; Lazzaretti, Marta ; Pilavci, Yusuf Yigit ; Rodriguez, Willy ; Stergiopoulou, Vasiliki ; Kaloga, Yacouba ; Safaa, Al-Ali.
-
[Virtualconference]
Optimization - lecture 4
/ Speaker: Pustelnik, Nelly.
-
[Virtualconference]
Optimization - lecture 3
/ Speaker: Pustelnik, Nelly.
-
[Virtualconference]
Optimization - lecture 2
/ Speaker: Pustelnik, Nelly.
-
[Virtualconference]
Optimization - lecture 1
/ Speaker: Pustelnik, Nelly.
-
[Multi angle]
One signal processing view on deep learning - lecture 2
/ Speaker: Oyallon, Edouard.
-
[Multi angle]
One signal processing view on deep learning - lecture 1
/ Speaker: Oyallon, Edouard.
-
[Virtualconference]
Signal processing tutorial - part 2
/ Speaker: Oudre, Laurent.
-
[Virtualconference]
Signal processing tutorial - part 1
/ Speaker: Oudre, Laurent.
-
[Virtualconference]
Reinforcement learning - lecture 4
/ Speaker: Lazaric, Allesandro.
-
[Virtualconference]
Reinforcement learning - lecture 2
/ Speaker: Lazaric, Allesandro.
-
[Virtualconference]
Reinforcement learning - lecture 1
/ Speaker: Lazaric, Allesandro.
-
[Multi angle]
Basics in machine learning - practical session 2
/ Speaker: Clausel, Marianne.
-
[Multi angle]
Basics in machine learning - practical session 1
/ Speaker: Clausel, Marianne.
-
[Multi angle]
Basics in machine learning - lecture 2
/ Speaker: Clausel, Marianne.
-
[Multi angle]
Basics in machine learning - lecture 1
/ Speaker: Clausel, Marianne.