Reality Navigation: Baseline Results

https://www.youtube.com/watch?v=U4ckkoolNH0

Quick update on our Reality simulator navigation experiments. We've moved from Y-axis rewards to proper target-based navigation with compass guidance.

What Changed

We switched from rewarding Y-axis movement to giving the agent +1 for actually reaching targets. The agent now gets compass bearing and distance info, so it knows where to go but has to figure out how to get there. No more arbitrary direction rewards.
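To make the setup concrete, here is a minimal sketch of what a compass observation and a sparse target reward could look like. This is not the Reality simulator's actual code: the function names, the 2D simplification, the heading convention (radians, counterclockwise from the +X axis), and the `reach_radius` value are all assumptions for illustration.

```python
import math


def compass_observation(agent_xy, agent_heading, target_xy):
    """Return (relative_bearing, distance) from the agent to the target.

    Hypothetical helper, not part of the Reality project.
    Assumes heading is in radians, counterclockwise from the +X axis.
    relative_bearing is wrapped to [-pi, pi); 0 means "target straight ahead".
    """
    dx = target_xy[0] - agent_xy[0]
    dy = target_xy[1] - agent_xy[1]
    distance = math.hypot(dx, dy)
    world_bearing = math.atan2(dy, dx)
    # Wrap the difference so a small left/right turn is a small signed angle.
    relative_bearing = (world_bearing - agent_heading + math.pi) % (2 * math.pi) - math.pi
    return relative_bearing, distance


def target_reward(distance, reach_radius=0.5):
    """Sparse reward: +1 only when the agent is within reach_radius of the target.

    reach_radius is an assumed threshold, not a value from the experiments.
    """
    return 1.0 if distance <= reach_radius else 0.0
```

The point of this shape is that the observation says where the target is, but the reward says nothing about how to get there, so route choice is left entirely to the policy.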

Results

The agent navigates successfully and reaches targets consistently without crashing. But I've noticed two annoying behaviors. First, it's way too cautious about collisions and takes unnecessarily wide routes when shorter paths clearly exist. Second, it spends forever circling around looking for the "perfect" entry point before committing to go through gates or between obstacles.

Figure: Scattered cubes environment. Agent navigating through obstacles; note the overly cautious routing.

Next Steps

I need to fix both the excessive cautiousness and the circling behavior. My plan is to increase exploration aggression during training, start rewarding shorter paths to targets, and add some time pressure to discourage all that circling around.
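One possible way to express the shorter-path reward and the time pressure is sketched below, assuming a per-step distance-progress term and a small constant step penalty on top of the +1 target reward. The function name and all coefficients are placeholders I made up for illustration, not tuned values or the final design.

```python
def shaped_reward(prev_distance, distance, reached,
                  progress_scale=0.1, step_penalty=0.01, reach_bonus=1.0):
    """Hypothetical shaped reward for one simulation step.

    prev_distance / distance: distance to target before and after the step.
    reached: True if the agent reached the target on this step.
    All coefficients are assumed values for illustration only.
    """
    # Progress term: positive when the agent gets closer, negative when it
    # moves away, so wide detours earn less in total than direct routes.
    reward = progress_scale * (prev_distance - distance)
    # Time pressure: every step costs a little, so circling is never free.
    reward -= step_penalty
    # Original sparse reward for actually reaching the target.
    if reached:
        reward += reach_bonus
    return reward
```

With these placeholder numbers, a step that circles at constant distance nets -0.01, while a step that closes one unit of distance nets about +0.09, so the agent is nudged toward committing to a route.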

Want to dive deeper into our simulation framework? Check out the Reality project - built on the Madrona engine for high-performance RL simulation.
