Description
Reinforcement learning: the policy gradient REINFORCE algorithm with a baseline, covering the concepts and a Python implementation in TensorFlow 2 that plays CartPole
Summary
- Policy gradient methods are very popular reinforcement learning (RL) algorithms.
- To optimize the policy, we search for a maximum of V(θ) by ascending its gradient with respect to the policy parameters θ.
- In fact, the value function itself is a good candidate for the baseline.
- Let’s run the code and render a video once training is done.
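
The summary points on gradient ascent and the baseline can be illustrated with a small, framework-free sketch. This is not the article's TensorFlow implementation; the function names (`discounted_returns`, `advantages`, `reinforce_losses`), the default discount factor, and the mean-squared-error value loss are assumptions chosen for illustration. It computes the discounted return for each step of an episode, subtracts a state-value estimate as the baseline, and forms the two loss terms that a REINFORCE-with-baseline update would typically minimize.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns, baseline_values):
    """Subtract the baseline (assumed here: a state-value estimate) from each return."""
    return [g - v for g, v in zip(returns, baseline_values)]


def reinforce_losses(log_probs, returns, baseline_values):
    """Compute illustrative policy and value losses for one episode.

    log_probs: log pi(a_t | s_t) of the actions actually taken.
    Minimizing policy_loss corresponds to ascending the policy gradient.
    In an autodiff framework the advantage would be treated as a constant
    (no gradient flows through it) in the policy term.
    """
    adv = advantages(returns, baseline_values)
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, adv))
    # Assumed choice: fit the baseline to the returns with mean squared error.
    value_loss = sum(a * a for a in adv) / len(adv)
    return policy_loss, value_loss
```

For example, `discounted_returns([1, 1, 1], gamma=0.5)` accumulates the rewards from the last step backwards, and passing those returns with a baseline of `[1.0, 1.0, 1.0]` to `advantages` shows how the baseline shrinks the magnitude of the weights applied to each log-probability without changing which actions are favoured on average.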