#bellman-equation

Why Reward Shaping Sucks and What You Can Do About It

In my previous article on reward shaping, I walked through four hard-won lessons about balancing collision penalties in a navigation task. I eventually found the "Goldilocks solution": a -0.1 penalty that let my agent learn to navigate obstacles...

Sep 18, 2025 · 6 min read · 19
