Q-Learning Grid World

Visualizing the Q-Table as an agent learns to navigate a 4x4 environment

Note: Numbers in cells represent $Q(s, a)$. Blue highlights the highest-valued (greedy) action for each state, which is the optimal action once the values have converged.
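
To make the table concrete, here is a minimal sketch of one way it could be stored and how the highlighted action could be picked. It assumes 16 states (the cells of the 4x4 grid, numbered row by row) and four actions; the action names, their order, and the helper names `make_q_table` and `greedy_action` are illustrative assumptions, not taken from this page.

```python
# Assumed layout: 4x4 grid, cells numbered 0..15 row by row.
N_STATES = 16
# Assumed action set and ordering (hypothetical, for illustration only).
ACTIONS = ["up", "down", "left", "right"]


def make_q_table():
    """Return a table of Q(s, a) values, initialised to zero."""
    return [[0.0] * len(ACTIONS) for _ in range(N_STATES)]


def greedy_action(q_table, state):
    """Index of the action with the highest Q-value in the given state.

    Ties resolve to the lowest action index.
    """
    row = q_table[state]
    return max(range(len(row)), key=lambda a: row[a])


q = make_q_table()
q[5][3] = 0.7                # pretend the agent has learned something in cell 5
best = greedy_action(q, 5)   # index of the action that would be highlighted
```

With every value still at zero, all actions tie, so a real visualization would likely need its own tie-breaking rule; the sketch simply takes the first.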

Controls

Update Rule

$$Q(s, a) \leftarrow Q(s, a) + \alpha [r + \gamma \max_{a'} Q(s', a') - Q(s, a)]$$
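
The rule can be sketched as a single function. This is an illustration under stated assumptions, not the code behind this page: the function name `q_update`, the default values of `alpha` and `gamma`, and the `done` flag (which treats terminal states as having zero future value) are all hypothetical choices.

```python
def q_update(q_table, s, a, r, s_next, alpha=0.1, gamma=0.9, done=False):
    """Apply one Q-learning update in place and return the new Q(s, a).

    q_table is a list of rows, one row of action values per state.
    alpha is the learning rate, gamma the discount factor (assumed defaults).
    """
    # max_{a'} Q(s', a'); assumed convention: no future value after a terminal step.
    best_next = 0.0 if done else max(q_table[s_next])
    td_target = r + gamma * best_next
    # Move Q(s, a) a fraction alpha of the way toward the target.
    q_table[s][a] += alpha * (td_target - q_table[s][a])
    return q_table[s][a]


# Tiny example with 2 states and 2 actions.
q = [[0.0, 0.0], [0.0, 1.0]]
new_value = q_update(q, s=0, a=1, r=0.0, s_next=1)
```

In the example, the best value available in the next state is 1.0, so the target is $0 + 0.9 \times 1.0 = 0.9$ and $Q(0, 1)$ moves a tenth of the way there, to about 0.09.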

Training Metrics & Params

Episodes
0