Description
The BAIR Blog
Summary
- The two most common perspectives on reinforcement learning (RL) are optimization and dynamic programming.
- The penultimate section will discuss how goal relabeling, a modified problem definition, and inverse RL extract “good data” in the multi-task setting.
- For example, reward-weighted regression [Williams 2007] and advantage-weighted regression [Neumann 2009, Peng 2019] combine the two steps by doing behavior cloning on reward-weighted data.
- More generally, this result is exciting. Future Directions: In this article, we discussed how RL can be viewed as solving a sequence of standard supervised learning problems, but using optimized (relabeled) data.
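
To make the "behavior cloning on reward-weighted data" idea from the summary concrete, here is a minimal sketch, not the implementation from any of the cited papers. It assumes a linear policy, a one-step synthetic problem, and exponentiated-reward weights with a hypothetical `temperature` parameter; the function name, the `ridge` term, and the toy data are all illustrative assumptions.

```python
import numpy as np


def reward_weighted_regression(states, actions, rewards, temperature=1.0, ridge=1e-6):
    """Fit a linear policy a ~ s @ W by weighted least squares.

    Each (state, action) pair is weighted by exp(reward / temperature), so this
    is behavior cloning where high-reward samples count more. The exponential
    weighting and the linear policy are assumptions made for this sketch.
    """
    # Subtract the max reward before exponentiating for numerical stability.
    weights = np.exp((rewards - rewards.max()) / temperature)
    weights /= weights.sum()
    # Weighted normal equations: (S^T D S + ridge * I) W = S^T D A
    weighted_states = states * weights[:, None]
    gram = weighted_states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, weighted_states.T @ actions)


# Toy one-step problem (synthetic, for illustration only).
rng = np.random.default_rng(0)
n = 5000
w_star = np.array([[1.0], [-2.0]])           # optimal linear policy
w_behavior = w_star + np.array([[0.5], [0.5]])  # suboptimal data-collection policy

states = rng.normal(size=(n, 2))
actions = states @ w_behavior + rng.normal(size=(n, 1))       # noisy behavior data
rewards = -np.sum((actions - states @ w_star) ** 2, axis=1)   # closer to optimal is better

# Constant rewards give uniform weights, i.e. plain behavior cloning.
w_bc = reward_weighted_regression(states, actions, np.zeros(n))
w_rwr = reward_weighted_regression(states, actions, rewards)

err_bc = np.linalg.norm(w_bc - w_star)
err_rwr = np.linalg.norm(w_rwr - w_star)
print("behavior cloning error:", err_bc)
print("reward-weighted error: ", err_rwr)
```

In this toy setting the reward-weighted fit should land closer to the optimal policy than plain behavior cloning, because the weights emphasize the samples where exploration noise happened to push the action toward the optimum. Repeating the collect-then-fit loop with the new policy would give the iterative version of the procedure.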