CIRM - Videos & books Library


We propose a general method for constructing confidence sets and hypothesis tests that have finite-sample guarantees without regularity conditions. We refer to such procedures as “universal.” The method is very simple and is based on a modified version of the usual likelihood ratio statistic, which we call the “split likelihood ratio test” (split LRT) statistic. The (limiting) null distribution of the classical likelihood ratio statistic is often intractable when used to test composite null hypotheses in irregular statistical models. Our method is especially appealing for statistical inference in these complex setups. The method we suggest works for any parametric model and also for some nonparametric models, as long as computing a maximum likelihood estimator (MLE) is feasible under the null. Canonical examples arise in mixture modeling and shape-constrained inference, for which constructing tests and confidence sets has been notoriously difficult. We also develop various extensions of our basic methods. We show that in settings where computing the MLE is hard, it is sufficient to upper bound the maximum likelihood in order to construct valid tests and intervals. We investigate some conditions under which our methods yield valid inferences under model misspecification. Further, the split LRT can be used with profile likelihoods to deal with nuisance parameters, and it can also be run sequentially to yield anytime-valid p-values and confidence sequences. Finally, when combined with the method of sieves, it can be used to perform model selection with nested model classes.
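To make the split LRT idea concrete, here is a minimal Python sketch on a toy problem that the abstract itself does not discuss: testing the composite null H0: mu <= mu0 for unit-variance Gaussian data. It follows the usual description of the statistic (split the sample, fit any estimator on one half, evaluate the likelihood ratio against the null MLE on the other half, and reject when the ratio exceeds 1/alpha). The Gaussian model, the even split, and the names `gaussian_loglik` and `split_lrt_rejects` are assumptions made for illustration only.

```python
import math


def gaussian_loglik(xs, mu):
    """Log-likelihood of N(mu, 1) data, dropping the constant that cancels in ratios."""
    return -0.5 * sum((x - mu) ** 2 for x in xs)


def split_lrt_rejects(data, mu0, alpha=0.05):
    """Split LRT for H0: mu <= mu0 with N(mu, 1) observations (toy setting).

    Assumes `data` is i.i.d., so splitting by position stands in for a random split.
    """
    half = len(data) // 2
    d0, d1 = data[:half], data[half:]

    # Any estimator computed from D1 only; here the unrestricted MLE (sample mean).
    mu_hat_1 = sum(d1) / len(d1)

    # MLE under the null, computed from D0: the sample mean clipped to mu <= mu0.
    mu_hat_0 = min(sum(d0) / len(d0), mu0)

    # Log of the split likelihood ratio; both likelihoods are evaluated on D0.
    log_ratio = gaussian_loglik(d0, mu_hat_1) - gaussian_loglik(d0, mu_hat_0)

    # Reject when the ratio exceeds 1/alpha.
    return log_ratio > math.log(1.0 / alpha)
```

The only model-specific ingredients are the likelihood and the null MLE; no limiting distribution of the statistic is needed, which is the point the abstract makes about irregular models.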


62C05 ; 62F03 ; 62G10 ; 62L12


Reinforcement learning (RL) studies the problem of learning how to optimally control a dynamical and stochastic environment. Unlike in supervised learning, an RL agent does not receive direct supervision on which actions to take in order to maximize the long-term reward, and it needs to learn from the samples collected through direct interaction with the environment. RL algorithms combined with deep learning tools recently achieved impressive results in a variety of problems ranging from recommendation systems to computer games, often reaching human-competitive performance (e.g., in the game of Go). In this course, we will review the mathematical foundations of RL and the most popular algorithmic strategies. In particular, we will build around the model of Markov decision processes (MDPs) to formalize the agent-environment interaction and ground RL algorithms in popular dynamic programming algorithms, such as value and policy iteration. We will study how such algorithms can be made online and incremental, and how to integrate approximation techniques from the deep learning literature. Finally, we will discuss the exploration-exploitation dilemma in the simpler bandit scenario as well as in the full RL case. Throughout the course, we will try to identify the main current limitations of RL algorithms and the main open questions in the field.

Theoretical part
- Introduction to reinforcement learning (recent advances and current limitations)
- How to model an RL problem: Markov decision processes (MDPs)
- How to solve an MDP: Dynamic programming methods (value and policy iteration)
- How to solve an MDP from direct interaction: RL algorithms (Monte-Carlo, temporal difference, SARSA, Q-learning)
- How to solve an MDP with approximation (aka deep RL): value-based (e.g., DQN) and policy gradient methods (e.g., Reinforce, TRPO)
- How to efficiently explore an MDP: from bandit to RL
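
As a rough illustration of the dynamic programming item in the list above, the following Python sketch runs value iteration on a two-state MDP. The MDP (`TOY_MDP`), its rewards, the discount factor, and the function names are all invented for this example and are not taken from the course material.

```python
# Toy MDP (hypothetical): P[state][action] is a list of
# (probability, next_state, reward) outcomes.
TOY_MDP = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0: stay (reward 0) or move to state 1 (reward 1)
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # state 1: stay (reward 2) or move to state 0 (reward 0)
]


def q_value(P, V, state, action, gamma):
    """Expected one-step return of `action` in `state`, bootstrapping from V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[state][action])


def value_iteration(P, gamma=0.9, tol=1e-10):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = [0.0] * len(P)
    while True:
        new_V = [
            max(q_value(P, V, s, a, gamma) for a in range(len(P[s])))
            for s in range(len(P))
        ]
        delta = max(abs(new - old) for new, old in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [
        max(range(len(P[s])), key=lambda a: q_value(P, V, s, a, gamma))
        for s in range(len(P))
    ]
    return V, policy


V, policy = value_iteration(TOY_MDP)
```

On this toy problem the fixed point can be checked by hand: staying in state 1 forever is worth 2 / (1 - 0.9) = 20, and moving there from state 0 is worth 1 + 0.9 * 20 = 19.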

Practical part
- Simple example of value iteration and Q-learning
- More advanced example with policy gradient
- Simple bandit example for exploration
- More advanced example for exploration in RL
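
The Q-learning item in the practical part could look roughly like the sketch below: tabular Q-learning with epsilon-greedy exploration, learning only from sampled transitions. The two-state deterministic environment in `step`, the hyperparameters, and the function names are assumptions chosen for illustration, not the examples used in the lectures.

```python
import random

GAMMA = 0.9  # discount factor (assumed for this example)


def step(state, action):
    """Hypothetical deterministic environment: returns (next_state, reward)."""
    if state == 0:
        return (1, 1.0) if action == 1 else (0, 0.0)
    return (1, 2.0) if action == 0 else (0, 0.0)


def q_learning(n_steps=20000, alpha=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning on the toy environment, without access to its model."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]  # Q[state][action]
    state = 0
    for _ in range(n_steps):
        # Epsilon-greedy: explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        # Move Q(s, a) toward the sampled one-step Bellman target.
        target = reward + GAMMA * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state
    policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
    return Q, policy


Q, policy = q_learning()
```

Because Q-learning is off-policy, the exploratory actions do not bias the learned values; on this deterministic toy problem the table approaches the optimal action values, in which staying in state 1 is worth 20.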


68T05 ; 62C05 ; 68Q87 ; 90C15 ; 93B47





Documents 62C05 5 results

Universal inference using the split likelihood ratio test - Ramdas, Aaditya K. (Author of the conference) | CIRM

Reinforcement learning - lecture 1 - Lazaric, Alessandro (Author of the conference) | CIRM

Reinforcement learning - lecture 2 - Lazaric, Alessandro (Author of the conference) | CIRM

Reinforcement learning - lecture 3 - Lazaric, Alessandro (Author of the conference) | CIRM

Reinforcement learning - lecture 4 - Lazaric, Alessandro (Author of the conference) | CIRM