Documents 68Q87: 8 results

RNA secondary structures - Hofacker, Ivo (Speaker) | CIRM H

Multi angle

Reinforcement learning - lecture 1 - Lazaric, Alessandro (Speaker) | CIRM H

Virtual conference

Reinforcement learning (RL) studies the problem of learning how to optimally control a dynamical and stochastic environment. Unlike in supervised learning, an RL agent does not receive direct supervision on which actions to take in order to maximize the long-term reward, and it needs to learn from the samples collected through direct interaction with the environment. RL algorithms combined with deep learning tools recently achieved impressive results in a variety of problems ranging from recommendation systems to computer games, often reaching human-competitive performance (e.g., in the game of Go). In this course, we will review the mathematical foundations of RL and the most popular algorithmic strategies. In particular, we will build on the model of Markov decision processes (MDPs) to formalize the agent-environment interaction and ground RL algorithms in popular dynamic programming algorithms, such as value and policy iteration. We will study how such algorithms can be made online and incremental, and how to integrate approximation techniques from the deep learning literature. Finally, we will discuss the exploration-exploitation dilemma in the simpler bandit scenario as well as in the full RL case. Throughout the course, we will try to identify the main current limitations of RL algorithms and the main open questions in the field.

Theoretical part
- Introduction to reinforcement learning (recent advances and current limitations)
- How to model an RL problem: Markov decision processes (MDPs)
- How to solve an MDP: Dynamic programming methods (value and policy iteration)
- How to solve an MDP from direct interaction: RL algorithms (Monte-Carlo, temporal difference, SARSA, Q-learning)
- How to solve an MDP with approximation (aka deep RL): value-based (e.g., DQN) and policy gradient methods (e.g., Reinforce, TRPO)
- How to efficiently explore an MDP: from bandit to RL

Practical part
- Simple example of value iteration and Q-learning (see the value-iteration sketch after this list)
- More advanced example with policy gradient
- Simple bandit example for exploration
- More advanced example for exploration in RL
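
The first practical item can be illustrated with a minimal value-iteration sketch. The two-state MDP below (transition tensor P, rewards R, discount gamma) is an invented toy, not material from the course; the loop applies the Bellman optimality backup V(s) <- max_a [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')] until convergence.

```python
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9

# P[s, a, s']: probability of landing in s' after taking a in s (toy values).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
# R[s, a]: expected immediate reward (toy values).
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * (P @ V)        # Q[s, a] = one-step lookahead values
    V_new = Q.max(axis=1)          # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

print("V* ~", V, "| greedy policy:", Q.argmax(axis=1))
```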

68T05 ; 62C05 ; 68Q87 ; 90C15 ; 93B47

Reinforcement learning - lecture 2 - Lazaric, Alessandro (Speaker) | CIRM H

Virtual conference

Reinforcement learning (RL) studies the problem of learning how to optimally control a dynamical and stochastic environment. Unlike in supervised learning, an RL agent does not receive direct supervision on which actions to take in order to maximize the long-term reward, and it needs to learn from the samples collected through direct interaction with the environment. RL algorithms combined with deep learning tools recently achieved impressive results in a variety of problems ranging from recommendation systems to computer games, often reaching human-competitive performance (e.g., in the game of Go). In this course, we will review the mathematical foundations of RL and the most popular algorithmic strategies. In particular, we will build on the model of Markov decision processes (MDPs) to formalize the agent-environment interaction and ground RL algorithms in popular dynamic programming algorithms, such as value and policy iteration. We will study how such algorithms can be made online and incremental, and how to integrate approximation techniques from the deep learning literature. Finally, we will discuss the exploration-exploitation dilemma in the simpler bandit scenario as well as in the full RL case. Throughout the course, we will try to identify the main current limitations of RL algorithms and the main open questions in the field.

Theoretical part
- Introduction to reinforcement learning (recent advances and current limitations)
- How to model an RL problem: Markov decision processes (MDPs)
- How to solve an MDP: Dynamic programming methods (value and policy iteration)
- How to solve an MDP from direct interaction: RL algorithms (Monte-Carlo, temporal difference, SARSA, Q-learning)
- How to solve an MDP with approximation (aka deep RL): value-based (e.g., DQN) and policy gradient methods (e.g., Reinforce, TRPO)
- How to efficiently explore an MDP: from bandit to RL

Practical part
- Simple example of value iteration and Q-learning (see the Q-learning sketch after this list)
- More advanced example with policy gradient
- Simple bandit example for exploration
- More advanced example for exploration in RL
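
For the same practical item seen from the learning side, here is a minimal tabular Q-learning sketch with epsilon-greedy exploration. The toy MDP (P, R, gamma) is invented for illustration; the agent only uses samples drawn from it, never the model itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])   # P[s, a, s'] (toy values)
R = np.array([[1.0, 0.0], [0.0, 2.0]])     # R[s, a] (toy values)

Q = np.zeros((n_states, n_actions))
alpha, eps, s = 0.1, 0.1, 0
for _ in range(50_000):
    # epsilon-greedy: explore with probability eps, otherwise act greedily
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    s_next = rng.choice(n_states, p=P[s, a])   # sample a transition
    # temporal-difference update toward the Bellman optimality target
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print("learned Q:\n", Q, "\ngreedy policy:", Q.argmax(axis=1))
```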

68T05 ; 62C05 ; 68Q87 ; 90C15 ; 93B47

Reinforcement learning - lecture 3 - Lazaric, Alessandro (Speaker) | CIRM H

Virtual conference

Reinforcement learning (RL) studies the problem of learning how to optimally control a dynamical and stochastic environment. Unlike in supervised learning, an RL agent does not receive direct supervision on which actions to take in order to maximize the long-term reward, and it needs to learn from the samples collected through direct interaction with the environment. RL algorithms combined with deep learning tools recently achieved impressive results in a variety of problems ranging from recommendation systems to computer games, often reaching human-competitive performance (e.g., in the game of Go). In this course, we will review the mathematical foundations of RL and the most popular algorithmic strategies. In particular, we will build on the model of Markov decision processes (MDPs) to formalize the agent-environment interaction and ground RL algorithms in popular dynamic programming algorithms, such as value and policy iteration. We will study how such algorithms can be made online and incremental, and how to integrate approximation techniques from the deep learning literature. Finally, we will discuss the exploration-exploitation dilemma in the simpler bandit scenario as well as in the full RL case. Throughout the course, we will try to identify the main current limitations of RL algorithms and the main open questions in the field.

Theoretical part
- Introduction to reinforcement learning (recent advances and current limitations)
- How to model an RL problem: Markov decision processes (MDPs)
- How to solve an MDP: Dynamic programming methods (value and policy iteration)
- How to solve an MDP from direct interaction: RL algorithms (Monte-Carlo, temporal difference, SARSA, Q-learning)
- How to solve an MDP with approximation (aka deep RL): value-based (e.g., DQN) and policy gradient methods (e.g., Reinforce, TRPO)
- How to efficiently explore an MDP: from bandit to RL

Practical part
- Simple example of value iteration and Q-learning
- More advanced example with policy gradient (see the REINFORCE sketch after this list)
- Simple bandit example for exploration
- More advanced example for exploration in RL
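
For the policy-gradient item, here is a minimal REINFORCE (Monte-Carlo policy gradient) sketch with a softmax policy. To keep it short, the environment is a one-step decision problem with invented reward means; the course's examples may use longer horizons and a different setup. The update uses grad log pi(a) = e_a - pi, the score function of a softmax policy.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3
true_means = np.array([0.2, 0.5, 0.8])   # hypothetical expected rewards

theta = np.zeros(n_actions)              # one logit per action

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

lr = 0.1
for _ in range(5000):
    pi = softmax(theta)
    a = rng.choice(n_actions, p=pi)       # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)    # observed return of this "episode"
    grad_log_pi = -pi                     # d/d theta_j log pi(a) = 1{j=a} - pi_j
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi         # stochastic gradient ascent on E[R]

print("learned action probabilities:", softmax(theta))
```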

68T05 ; 62C05 ; 68Q87 ; 90C15 ; 93B47

Reinforcement learning - lecture 4 - Lazaric, Alessandro (Speaker) | CIRM H

Virtual conference

Reinforcement learning (RL) studies the problem of learning how to optimally control a dynamical and stochastic environment. Unlike in supervised learning, an RL agent does not receive direct supervision on which actions to take in order to maximize the long-term reward, and it needs to learn from the samples collected through direct interaction with the environment. RL algorithms combined with deep learning tools recently achieved impressive results in a variety of problems ranging from recommendation systems to computer games, often reaching human-competitive performance (e.g., in the game of Go). In this course, we will review the mathematical foundations of RL and the most popular algorithmic strategies. In particular, we will build on the model of Markov decision processes (MDPs) to formalize the agent-environment interaction and ground RL algorithms in popular dynamic programming algorithms, such as value and policy iteration. We will study how such algorithms can be made online and incremental, and how to integrate approximation techniques from the deep learning literature. Finally, we will discuss the exploration-exploitation dilemma in the simpler bandit scenario as well as in the full RL case. Throughout the course, we will try to identify the main current limitations of RL algorithms and the main open questions in the field.

Theoretical part
- Introduction to reinforcement learning (recent advances and current limitations)
- How to model an RL problem: Markov decision processes (MDPs)
- How to solve an MDP: Dynamic programming methods (value and policy iteration)
- How to solve an MDP from direct interaction: RL algorithms (Monte-Carlo, temporal difference, SARSA, Q-learning)
- How to solve an MDP with approximation (aka deep RL): value-based (e.g., DQN) and policy gradient methods (e.g., Reinforce, TRPO)
- How to efficiently explore an MDP: from bandit to RL

Practical part
- Simple example of value iteration and Q-learning
- More advanced example with policy gradient
- Simple bandit example for exploration (see the UCB sketch after this list)
- More advanced example for exploration in RL
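
For the bandit item, a minimal UCB1 sketch on a stochastic multi-armed bandit; the Bernoulli arm means are invented for the demo. Each arm's index adds an optimism bonus sqrt(2 log t / n_a) to its empirical mean, which drives the exploration-exploitation trade-off discussed in the course.

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.3, 0.5, 0.7])    # hypothetical Bernoulli arm means
K, T = len(means), 10_000
counts = np.zeros(K)                 # pulls per arm
sums = np.zeros(K)                   # total reward per arm

for t in range(1, T + 1):
    if t <= K:
        a = t - 1                    # pull each arm once to initialise
    else:
        ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
        a = int(ucb.argmax())        # optimism in the face of uncertainty
    r = float(rng.random() < means[a])   # Bernoulli reward
    counts[a] += 1
    sums[a] += r

regret = means.max() * T - sums.sum()
print("pulls per arm:", counts, "| empirical regret:", round(regret, 1))
```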

68T05 ; 62C05 ; 68Q87 ; 90C15 ; 93B47

Stabilising shifts of finite type with cellular automata - Taati, Siamak (Speaker) | CIRM H

Multi angle

We say that a CA F stabilises an SFT X if (1) every element of X is a fixed point of F, and (2) starting from any finite perturbation of a configuration in X, the CA returns to X in finitely many steps. Does every SFT admit a stabilising CA? If so, what is the optimal stabilisation time for a given SFT? Do conjugate SFTs have the same optimal stabilisation times? What about stabilisation from random perturbations? I will present joint work with Nazim Fatès and Irène Marcovici providing (partial) answers to these questions.
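
The definition can be made concrete on the simplest nontrivial SFT, X = {all-zeros, all-ones}. A classical candidate stabilising CA is the Gacs-Kurdyumov-Levin (GKL) rule, introduced precisely to wash out finite islands over either uniform background. The sketch below (a toy illustration under one common orientation of the rule, not the construction from the talk) simulates the erosion of a finite perturbation on a ring large enough that boundary effects play no role:

```python
import numpy as np

def gkl_step(x):
    """One synchronous GKL update on a binary 1D array (periodic boundary)."""
    left1, left3 = np.roll(x, 1), np.roll(x, 3)      # x_{i-1}, x_{i-3}
    right1, right3 = np.roll(x, -1), np.roll(x, -3)  # x_{i+1}, x_{i+3}
    maj_left = (x + left1 + left3) >= 2
    maj_right = (x + right1 + right3) >= 2
    # state 0 looks left, state 1 looks right (one standard convention)
    return np.where(x == 0, maj_left, maj_right).astype(int)

# A finite perturbation of the all-zeros configuration in X.
x = np.zeros(60, dtype=int)
x[25:31] = 1                      # an island of six 1s

steps = 0
while x.any() and steps < 1000:   # iterate until we are back in X (all zeros)
    x = gkl_step(x)
    steps += 1
print(f"returned to the all-zeros configuration after {steps} steps")
```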

68Q80 ; 37B15 ; 37B10 ; 68Q87

Random hyperbolic graphs - Kiwi, Marcos (Speaker) | CIRM H

Multi angle

Random hyperbolic graphs (RHG) were proposed rather recently (2010) as a model of real-world networks. Informally speaking, they are like random geometric graphs where the underlying metric space has negative curvature (i.e., is hyperbolic). In contrast to other models of complex networks, RHG simultaneously and naturally exhibit characteristics such as sparseness, small diameter, non-negligible clustering coefficient and power-law degree distribution. We will give a slow-paced introduction to RHG, explain why they have attracted a fair amount of attention, and then survey most of what is known about this promising young model of real-world networks.
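
As a concrete companion, here is a minimal sampler for the threshold variant of the Krioukov et al. (2010) model: N points in a hyperbolic disk of radius R, with uniform angles and radii of density proportional to sinh(alpha*r), joined whenever their hyperbolic distance is at most R. The parameters below are illustrative choices, not canonical ones.

```python
import numpy as np

rng = np.random.default_rng(0)
N, alpha = 500, 1.0            # alpha tunes the power-law degree exponent
R = 2 * np.log(N)              # a standard choice of disk radius

theta = rng.uniform(0, 2 * np.pi, N)   # uniform angles
u = rng.uniform(size=N)                # radii by inverse-CDF sampling
r = np.arccosh(1 + u * (np.cosh(alpha * R) - 1)) / alpha

edges = []
for i in range(N):
    for j in range(i + 1, N):
        dtheta = np.pi - abs(np.pi - abs(theta[i] - theta[j]))  # angle gap
        # hyperbolic law of cosines:
        # cosh d = cosh r_i cosh r_j - sinh r_i sinh r_j cos(dtheta)
        cosh_d = (np.cosh(r[i]) * np.cosh(r[j])
                  - np.sinh(r[i]) * np.sinh(r[j]) * np.cos(dtheta))
        if np.arccosh(max(cosh_d, 1.0)) <= R:
            edges.append((i, j))

deg = np.bincount(np.array(edges).ravel(), minlength=N)
print(f"{len(edges)} edges; max degree {deg.max()} (heavy-tailed degrees expected)")
```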

05C80 ; 68Q87 ; 74E35

Counting and multiple design of RNA - Ponty, Yann (Speaker) | CIRM H

Multi angle

RiboNucleic Acids (RNA) are linear biopolymers, ubiquitous in our organisms, that can be encoded as sequences over the alphabet {A, C, G, U}. These molecules fold back on themselves, forming hydrogen bonds that pair up certain positions, according to letter-compatibility rules allowing only the pairs in the set {(A,U), (C,G), (G,U)}. This pairing mechanism leads to the adoption of one or several conformations, called secondary structures, which are in bijection with peak-free Motzkin words. Numerous applications, in nanotechnology, medicine, or biostatistics, require counting, or randomly generating, RNA sequences that are simultaneously compatible with a given set of secondary structures. An exponential algorithm, based on an ear decomposition of the dependency graph induced by the union of the pairings, was proposed by Höner zu Siederdissen et al [A]. This algorithm uses the recursive method/dynamic programming to precompute the numbers of compatible assignments before/after each local choice. A generation phase then uses these numbers to guarantee the uniformity of the generation. However, this algorithm could not take more complex energy criteria into account, as these require a formalism more expressive than dependency graphs (hypergraphs). Moreover, the complexity of the algorithm, theoretically exponential in an unbounded parameter and sometimes high in practice, raised the question of the complexity of the counting problem.
In recent work with Hammer, Wang and Will [B], we establish the #P-completeness, and the hardness of approximation, of the problem of counting compatible sequences. Our proof relies on a simple bijection between compatible sequences and the independent sets of the dependency graph. We propose an alternative approach, based on tree decomposition, to probabilistically control [C] the average energy of the sequences for the different structures, or their composition in the different letters. These results provide a flexible and expressive framework for RNA design, and raise questions about the use of alternative strategies (Boltzmann sampling, perfect simulation) for random generation, as well as about the notion of average-case analysis in a context where the input is more complex than the size of the generated object.
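
The core constraint is easy to state operationally: a sequence is compatible with a set of structures when every position pair in the union of their pairings forms one of AU, UA, CG, GC, GU, UG. Below is a brute-force sketch of the count (the toy structures are invented for the demo, not data from the talk):

```python
from itertools import product

# the six admissible ordered base pairs
OK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}

def compatible_count(n, structures):
    """Count length-n sequences over ACGU compatible with every structure.

    structures: list of sets of 0-indexed position pairs (i, j)."""
    pairs = set().union(*structures)   # dependency graph = union of pairings
    return sum(
        all((seq[i], seq[j]) in OK for i, j in pairs)
        for seq in product("ACGU", repeat=n)
    )

s1 = {(0, 3), (1, 2)}   # toy structure 1: pairs 0-3 and 1-2
s2 = {(1, 3)}           # toy structure 2: pairs 1-3
# The union of pairings forms the path 0-3-1-2 in the dependency graph;
# brute force agrees with the transfer-matrix count along that path (16).
print(compatible_count(4, [s1, s2]))
```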

05A05 ; 05B45 ; 60C05 ; 68Q87 ; 68Q45 ; 68R05 ; 68W32 ; 90C27 ; 92D20
