Reinforcement learning (RL)
Reinforcement learning (RL) is an area of AI/ML concerned with how intelligent agents ought to take actions in an environment in order to maximize cumulative reward.
# Resources
- https://en.wikipedia.org/wiki/Reinforcement_learning
- Reinforcement learning is the task of learning what actions to take, given a certain situation/environment, so as to maximize a reward signal. The interesting difference between supervised and reinforcement learning is that this reward signal simply tells you whether the action the agent takes is good or bad; it doesn’t tell you what the best action would have been. Contrast this with CNNs, where the label for each image input is a definite instruction of what the output should be. Another unique component of RL is that an agent’s actions affect the subsequent data it receives. For example, moving left instead of right means the agent will receive different input from the environment at the next time step.
- Curriculum for Reinforcement Learning
- Andrej Karpathy’s introduction to RL
- Spinning Up as a Deep RL Researcher
- Evolution strategies vs RL
- Reinforcement learning derivations (math)
- Introduction to various RL algos
- Q-learning
- Temporal difference (TD) learning is a prediction-based machine learning method.
- It has primarily been used for the reinforcement learning problem, and is said to be “a combination of Monte Carlo ideas and dynamic programming (DP) ideas.”
- TD resembles a Monte Carlo method because it learns by sampling the environment according to some policy, and is related to dynamic programming techniques as it approximates its current estimate based on previously learned estimates (a process known as bootstrapping). The TD learning algorithm is related to the temporal difference model of animal learning. As a prediction method, TD learning considers that subsequent predictions are often correlated in some sense.
- TD-Lambda: This algorithm was famously applied by Gerald Tesauro to create TD-Gammon, a program that learned to play the game of backgammon at the level of expert human players. The lambda parameter refers to the trace decay parameter, with 0 <= lambda <= 1. Higher settings lead to longer-lasting traces; that is, a larger proportion of credit from a reward can be given to more distant states and actions when lambda is higher, with lambda = 1 producing learning that parallels Monte Carlo RL algorithms.
- SARSA, an on-policy TD control method (a minimal sketch of the tabular TD(0), Q-learning and SARSA updates follows this list)
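To make these update rules concrete, below is a minimal, self-contained NumPy sketch of the tabular TD(0), Q-learning and SARSA updates. The environment size, state/action indices and hyperparameter values are illustrative assumptions, not taken from any of the resources above.

```python
import numpy as np

n_states, n_actions = 16, 4     # assumed sizes for a toy grid-world-style task
alpha, gamma = 0.1, 0.99        # learning rate and discount factor (illustrative values)

V = np.zeros(n_states)               # state-value table used by TD(0)
Q = np.zeros((n_states, n_actions))  # action-value table used by Q-learning / SARSA

def td0_update(s, r, s_next):
    """TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def q_learning_update(s, a, r, s_next):
    """Q-learning (off-policy): bootstrap with the greedy action in s'."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next):
    """SARSA (on-policy): bootstrap with the action actually taken in s'."""
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
```

TD(lambda) generalizes the one-step TD(0) update by also assigning credit to earlier states through an eligibility trace decayed by lambda, recovering Monte Carlo-style updates at lambda = 1.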
# Courses, talks and books
- #COURSE Reinforcement Learning (UCL)
- #COURSE CS294-112 Deep Reinforcement Learning Sp17
- #COURSE Practical Reinforcement Learning (Yandex)
- #COURSE Tutorial: Introduction to Reinforcement Learning
- #TALK Deep Learning and Reinforcement Learning Summer School, Toronto 2018
- #TALK Deep RL Bootcamp
- #BOOK Deep Reinforcement Learning (Springer, 2020)
# Code
- #CODE Acme: a research framework for reinforcement learning
- #CODE Deep Reinforcement Learning Model ZOO
- #CODE OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms (a minimal interaction-loop sketch follows this list)
- #CODE Horizon (Facebook) - The first open source reinforcement learning platform for large-scale products and services
- #CODE Keras-rl - Deep Reinforcement Learning for Keras
- #CODE TRFL (pronounced “truffle”) - A library built on top of TensorFlow that exposes several useful building blocks for implementing Reinforcement Learning agents
- #CODE Surreal - Open-Source Distributed Reinforcement Learning Framework by Stanford Vision and Learning Lab
- #CODE Tensorforce - An open-source deep reinforcement learning framework, with an emphasis on modularized flexible library design and straightforward usability for applications in research and practice
- #CODE Tensorlayer - Deep Learning and Reinforcement Learning Library for Scientists and Engineers
- https://tensorlayer.readthedocs.io/en/latest/index.html
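As a rough illustration of the agent-environment loop these toolkits expose (and of the point above that an agent's actions determine the data it sees next), here is a minimal random-agent sketch using OpenAI Gym's classic reset/step interface. The environment id and the 4-tuple returned by step assume an older (pre-0.26) Gym API, so treat this as a sketch rather than a version-exact example.

```python
import gym

env = gym.make("CartPole-v1")  # a standard Gym environment id

for episode in range(3):
    obs = env.reset()                  # initial observation
    done, total_reward = False, 0.0
    while not done:
        action = env.action_space.sample()          # random policy as a placeholder
        obs, reward, done, info = env.step(action)  # the chosen action changes what the agent observes next
        total_reward += reward
    print(f"episode {episode}: return = {total_reward}")

env.close()
```

A real agent would replace the random `action_space.sample()` call with a learned policy, e.g. acting greedily with respect to the Q-table from the sketch above.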
# References
- #PAPER DQN: Human-level control through Deep Reinforcement Learning (Mnih 2015)
- #PAPER Learning to Optimize (Li 2016)
- #PAPER Deep Recurrent Q-Learning for Partially Observable MDPs (Hausknecht 2017)
- #PAPER Neural Episodic Control (Pritzel 2017)
- Deep reinforcement learning methods attain super-human performance in a wide range of environments, but they are grossly data-inefficient, often needing orders of magnitude more data than humans to reach reasonable performance.
- Neural Episodic Control is a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them.
- The agent uses a semi-tabular representation of the value function: a buffer of past experience containing slowly changing state representations and rapidly updated estimates of the value function (a toy sketch of this kind of episodic memory is included at the end of this note).
- https://www.technologyreview.es/s/6656/olvidese-del-aprendizaje-profundo-el-nuevo-enfoque-de-google-funciona-mucho-mejor
- Explanation of Neural Episodic Control
- #PAPER #REVIEW A Brief Survey of Deep Reinforcement Learning (Arulkumaran 2017)
- Many of the successes in DRL have been based on scaling up prior work in RL to high-dimensional problems. This is due to the learning of low-dimensional feature representations and the powerful function approximation properties of neural networks. By means of representation learning, DRL can deal efficiently with the curse of dimensionality, unlike tabular and traditional non-parametric methods.
- https://adeshpande3.github.io/adeshpande3.github.io/Deep-Learning-Research-Review-Week-2-Reinforcement-Learning
- #PAPER #REVIEW An Introduction to Deep Reinforcement Learning (Francois-Lavet 2018)
- #PAPER Supervising strong learners by amplifying weak experts (Christiano 2018)
- #PAPER MuZero - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (Schrittwieser 2019)
- #PAPER #REVIEW Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems (Levine 2020)
- #PAPER Decision Transformer: Reinforcement Learning via Sequence Modeling (Chen 2021) ^decisiontransformer
- #PAPER Reward is enough (Silver 2021)
- https://towardsdatascience.com/reward-is-enough-ml-paper-review-e448ee0a6092
- From David Silver and colleagues (including Richard Sutton), this paper proposes the intriguing hypothesis that incentivizing AI agents with reward maximization is enough to achieve artificial general intelligence
- “General intelligence, of the sort possessed by humans and perhaps also other animals, may be defined as the ability to flexibly achieve a variety of goals in different contexts. According to our hypothesis, general intelligence can instead be understood as, and implemented by, maximising a singular reward in a single, complex environment”
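Following up on the Neural Episodic Control entry above, here is a toy NumPy sketch of the semi-tabular idea: a per-action memory of (state embedding, value estimate) pairs that is read with a kernel-weighted nearest-neighbour lookup. The class and method names (EpisodicMemory, lookup, write), the inverse-distance kernel and all constants are illustrative assumptions, not the paper's differentiable neural dictionary implementation.

```python
import numpy as np

class EpisodicMemory:
    """Toy per-action memory: keys are state embeddings, values are Q estimates."""

    def __init__(self, k=5):
        self.keys = []      # slowly changing state representations
        self.values = []    # rapidly updated value estimates
        self.k = k

    def lookup(self, query):
        """Kernel-weighted average of the values of the k nearest stored keys."""
        if not self.keys:
            return 0.0
        keys = np.stack(self.keys)
        dists = np.linalg.norm(keys - query, axis=1)
        idx = np.argsort(dists)[: self.k]
        weights = 1.0 / (dists[idx] + 1e-3)   # inverse-distance kernel (assumed)
        return float(np.dot(weights, np.array(self.values)[idx]) / weights.sum())

    def write(self, key, value):
        """Append a new (embedding, value) pair; the real agent also updates existing entries."""
        self.keys.append(key)
        self.values.append(value)

# Usage: one memory per action; act greedily over the looked-up values
memories = [EpisodicMemory() for _ in range(4)]           # 4 assumed actions
state_embedding = np.random.randn(8)                      # assumed 8-dimensional embedding
q_values = [m.lookup(state_embedding) for m in memories]
action = int(np.argmax(q_values))
memories[action].write(state_embedding, 1.0)              # store an observed return estimate
```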