Reinforcement Learning Grid World

Compare Q-Learning and SARSA algorithms

Grid Configuration:

Environment

🤖 Agent | 🚀 Start | 🎯 Goal | 💀 Hazard | 💰 Reward
Tap cells to see Q-values. Green intensity shows learning progress.

Learning Algorithm

Key Differences

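Q-Learning: Uses the maximum Q-value over all actions in the next state (off-policy). Because it ignores the risk of its own exploratory moves, it can be more aggressive, for example by learning routes that pass right next to hazards.
SARSA: Uses the Q-value of the action actually chosen in the next state (on-policy). Because exploratory missteps feed back into its estimates, it is generally more conservative and keeps a safer margin from hazards while exploring.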
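The two algorithms differ only in which next-state value goes into the update target. Below is a minimal tabular sketch in Python, not the app's own implementation. It assumes Q-values are stored in a dict keyed by (state, action), a four-move action set, and the conventional parameter names alpha (learning rate), gamma (discount factor) and epsilon (exploration rate). The function names are hypothetical.

```python
import random
from collections import defaultdict

# Assumed action set for a 4-connected grid world (illustrative, not from the app).
ACTIONS = ("up", "down", "left", "right")


def epsilon_greedy(Q, state, epsilon, rng):
    """Explore with probability epsilon; otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def q_learning_update(Q, s, a, r, s_next, alpha, gamma, done):
    """Off-policy: bootstrap from the best action in s_next, whatever the agent does next."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done):
    """On-policy: bootstrap from a_next, the action the policy actually chose in s_next."""
    next_value = 0.0 if done else Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (r + gamma * next_value - Q[(s, a)])
    return Q[(s, a)]


if __name__ == "__main__":
    # Same transition for both learners, but the exploring agent's next move is
    # "down" (assumed here to lead toward a hazard, hence its low value).
    start = {((0, 1), "right"): 10.0, ((0, 1), "down"): -5.0}
    q_tab, s_tab = defaultdict(float, start), defaultdict(float, start)
    print(q_learning_update(q_tab, (0, 0), "right", -1.0, (0, 1), 0.5, 0.9, False))  # 4.0
    print(sarsa_update(s_tab, (0, 0), "right", -1.0, (0, 1), "down", 0.5, 0.9, False))  # -2.75
```

From the same transition, Q-learning moves its estimate toward the greedy value of 10, while SARSA is pulled down by the exploratory "down" move it will actually take. This is why SARSA tends to steer clear of hazards while epsilon is still large.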

Learning Parameters

Core

Directional Heuristics

Advanced

Episode Rewards
