Reinforcement Learning Grid World

Compare Q-Learning and SARSA algorithms

Grid Configuration:

Episode

Total Reward

Environment

🤖 Agent | 🚀 Start | 🎯 Goal | 💀 Hazard | 💰 Reward
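Below is a minimal Python sketch of an environment with these cell types, for reproducing the setup outside the page. Several details are assumptions for illustration, since the page does not specify them: the reward values, that hazards end the episode, that reward cells pay out once per episode, and the names `GridWorld` and `step`.

```python
from dataclasses import dataclass, field

# Hypothetical reward values; the page does not state the real ones.
GOAL_REWARD = 10.0
HAZARD_REWARD = -10.0
BONUS_REWARD = 5.0
STEP_REWARD = -0.1

# Actions as (row delta, column delta): up, right, down, left.
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


@dataclass
class GridWorld:
    rows: int
    cols: int
    start: tuple
    goal: tuple
    hazards: set = field(default_factory=set)
    bonuses: set = field(default_factory=set)

    def __post_init__(self):
        self.reset()

    def reset(self):
        """Place the agent on the start cell and restore all reward cells."""
        self.agent = self.start
        self.collected = set()
        return self.agent

    def step(self, action):
        """Apply one action and return (next_cell, reward, done)."""
        dr, dc = ACTIONS[action]
        r, c = self.agent[0] + dr, self.agent[1] + dc
        # Assumption: a move off the grid leaves the agent where it was.
        if 0 <= r < self.rows and 0 <= c < self.cols:
            self.agent = (r, c)
        if self.agent == self.goal:
            return self.agent, GOAL_REWARD, True
        if self.agent in self.hazards:
            # Assumption: stepping on a hazard ends the episode.
            return self.agent, HAZARD_REWARD, True
        reward = STEP_REWARD
        if self.agent in self.bonuses and self.agent not in self.collected:
            # Assumption: each reward cell pays out once per episode.
            self.collected.add(self.agent)
            reward += BONUS_REWARD
        return self.agent, reward, False


env = GridWorld(rows=3, cols=4, start=(0, 0), goal=(2, 3),
                hazards={(1, 1)}, bonuses={(0, 2)})
print(env.step(1))  # ((0, 1), -0.1, False)
```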

Q-Values: High Positive | Medium Positive | Low Positive | Zero | Negative

Numbers show the maximum Q-value among valid actions; arrows show the best valid action.
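The sketch below shows one way to compute a cell's number and arrow. It assumes "valid" means any action that keeps the agent on the grid; the page may also exclude walls or other blocked moves. The helpers `valid_actions` and `cell_summary` and the up/right/down/left action order are hypothetical.

```python
# Action order (assumed): up, right, down, left.
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]
ARROWS = ["↑", "→", "↓", "←"]


def valid_actions(cell, rows, cols):
    """Indices of actions that keep the agent inside a rows x cols grid."""
    r, c = cell
    return [a for a, (dr, dc) in enumerate(ACTIONS)
            if 0 <= r + dr < rows and 0 <= c + dc < cols]


def cell_summary(q_row, cell, rows, cols):
    """Return (max Q-value over valid actions, arrow of the best valid action)."""
    valid = valid_actions(cell, rows, cols)
    best = max(valid, key=lambda a: q_row[a])  # ties go to the first valid action
    return q_row[best], ARROWS[best]


# Corner cell (0, 0) of a 3x3 grid: "up" and "left" would leave the grid.
print(cell_summary([5.0, 1.0, 2.0, 9.0], (0, 0), 3, 3))  # (2.0, '↓')
```

Without masking, the corner cell would show 9.0 and "←", an action the agent cannot take from that position.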

Tap cells to see Q-values. Green intensity shows learning progress.
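Below is one possible mapping from a Q-value to the legend's five categories and to a shade of green. It is a hedged sketch: the page does not document how it picks colors, so the one-third and two-thirds thresholds, the white-to-green blend, and the names `q_bucket` and `green_rgb` are all assumptions.

```python
def q_bucket(q, q_max, eps=1e-9):
    """Map a Q-value to one of the five legend categories.

    Assumption: positive values are split into thirds of q_max, the largest
    Q-value currently on the board.
    """
    if abs(q) <= eps:
        return "Zero"
    if q < 0:
        return "Negative"
    frac = q / q_max if q_max > 0 else 1.0
    if frac >= 2 / 3:
        return "High Positive"
    if frac >= 1 / 3:
        return "Medium Positive"
    return "Low Positive"


def green_rgb(q, q_max):
    """Blend from white (nothing learned) to pure green (q == q_max).

    Non-positive values stay white here; a real display would color
    negatives separately.
    """
    t = min(max(q / q_max, 0.0), 1.0) if q_max > 0 else 0.0
    fade = round(255 * (1 - t))
    return (fade, 255, fade)


print([q_bucket(q, 9.0) for q in (9.0, 4.0, 1.0, 0.0, -2.0)])
# ['High Positive', 'Medium Positive', 'Low Positive', 'Zero', 'Negative']
```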

Learning Algorithm

Key Differences

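Q-Learning: Updates toward the highest Q-value available in the next state, regardless of which action is actually taken next (off-policy); can be more aggressive, for example learning routes that pass close to hazards.

SARSA: Updates toward the Q-value of the action actually taken in the next state (on-policy); generally more conservative, because the cost of exploratory moves is reflected in its estimates.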
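The difference comes down to the bootstrap target. Q-Learning's target is r + γ·max over a′ of Q(s′, a′), while SARSA's is r + γ·Q(s′, a′), where a′ is the action the agent actually picks next. Below is a minimal Python sketch of both updates. It assumes a tabular `Q` stored as a dict mapping each state to a list of action values; the function names are illustrative.

```python
def q_learning_update(Q, s, a, r, s_next, done, alpha, gamma):
    """Off-policy: bootstrap from the best action in s_next, whatever is taken next."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha, gamma):
    """On-policy: bootstrap from a_next, the action the agent will actually take."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


# Same transition for both: from state 0, action 0 earns r = -1 and lands in
# state 1, where action 0 looks good (10.0) but the agent explores action 1 (-5.0).
q_table = {0: [0.0, 0.0], 1: [10.0, -5.0]}
sarsa_table = {0: [0.0, 0.0], 1: [10.0, -5.0]}
q_learning_update(q_table, 0, 0, -1.0, 1, False, alpha=0.5, gamma=0.9)
sarsa_update(sarsa_table, 0, 0, -1.0, 1, 1, False, alpha=0.5, gamma=0.9)
print(q_table[0][0], sarsa_table[0][0])  # 4.0 -2.75
```

When the next action is exploratory and poor, SARSA's estimate drops and Q-Learning's does not. This is why SARSA tends to steer away from states where exploration is costly.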

Learning Parameters

Core

Directional Heuristics

Advanced
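The page does not list what each tab contains. The sketch below assumes the Core group holds the three standard tabular parameters: learning rate α, discount factor γ, and exploration rate ε. It shows how these are typically used with ε-greedy action selection. The default values, the decay schedule, and the names `CoreParams`, `epsilon_greedy`, and `decay_epsilon` are hypothetical. The sketch says nothing about the Directional Heuristics or Advanced settings.

```python
import random
from dataclasses import dataclass


@dataclass
class CoreParams:
    # Hypothetical defaults; the page does not show its actual values.
    alpha: float = 0.1           # learning rate: step size of each update
    gamma: float = 0.9           # discount factor: weight of future rewards
    epsilon: float = 0.2         # exploration rate for epsilon-greedy
    epsilon_decay: float = 0.99  # multiplicative decay applied after each episode
    epsilon_min: float = 0.01    # floor so the agent never stops exploring


def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def decay_epsilon(params):
    """Shrink epsilon once per episode, never below epsilon_min."""
    params.epsilon = max(params.epsilon_min, params.epsilon * params.epsilon_decay)


params = CoreParams()
for _ in range(100):
    decay_epsilon(params)
print(round(params.epsilon, 4))  # 0.0732
```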

Episode Rewards

No episodes completed yet. Run some episodes to see the reward history.
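The self-contained sketch below shows how a per-episode reward history like this one can be produced and smoothed. It compares both algorithms on a small hypothetical grid, with start and goal on the bottom row and two hazards between them. The layout, rewards (+10 goal, -10 hazard, -1 per step), hyperparameters, and function names are all assumptions for illustration.

```python
import random

# Hypothetical 3x4 layout: start and goal on the bottom row, hazards between.
ROWS, COLS = 3, 4
START, GOAL = (2, 0), (2, 3)
HAZARDS = {(2, 1), (2, 2)}
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def step(state, action):
    """Move with clamping at the edges; return (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    nxt = (min(max(state[0] + dr, 0), ROWS - 1), min(max(state[1] + dc, 0), COLS - 1))
    if nxt == GOAL:
        return nxt, 10.0, True
    if nxt in HAZARDS:
        return nxt, -10.0, True
    return nxt, -1.0, False


def choose(Q, s, epsilon, rng):
    """Epsilon-greedy action selection over the four moves."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[s][a])


def run(algorithm, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1,
        max_steps=100, seed=0):
    """Train one agent and return its total reward for every episode."""
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    history = []
    for _ in range(episodes):
        s, total = START, 0.0
        a = choose(Q, s, epsilon, rng)
        for _ in range(max_steps):
            s2, reward, done = step(s, a)
            total += reward
            if done:
                target = reward
            elif algorithm == "sarsa":
                a2 = choose(Q, s2, epsilon, rng)  # on-policy: commit to A' first
                target = reward + gamma * Q[s2][a2]
            else:  # "q_learning"
                target = reward + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            if done:
                break
            if algorithm != "sarsa":
                a2 = choose(Q, s2, epsilon, rng)  # off-policy: act after updating
            s, a = s2, a2
        history.append(total)
    return history


def moving_average(values, window=20):
    """Trailing mean, useful for smoothing a noisy reward-per-episode plot."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


for name in ("q_learning", "sarsa"):
    smoothed = moving_average(run(name))
    print(name, round(smoothed[-1], 2))
```

Sutton and Barto's textbook includes a classic cliff-walking example. There, SARSA with a fixed ε usually earns higher online reward than Q-Learning, because it learns a path that keeps its distance from the hazards. Comparing the two smoothed curves is one way to look for a similar effect on a given grid.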