Actor-Critic with a linear network: each head is a single linear layer, with no hidden layers.
Network Architecture (Linear):
Input: One-Hot Vector (Size 12)
Actor: Linear → Softmax (4 outputs) | Critic: Linear (1 output)
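The architecture above can be sketched in plain Python as follows. This is a minimal illustration rather than the original implementation: the class name LinearActorCritic, the helper names, the weight initialisation range, and the seed are all assumptions; only the sizes (12 inputs, 4 actor outputs, 1 critic output) come from the description.

```python
import math
import random

N_STATES = 12   # one-hot input size, from the description
N_ACTIONS = 4   # number of actor outputs, from the description


def one_hot(state, size=N_STATES):
    """Return a one-hot list with a 1.0 at index `state`."""
    vec = [0.0] * size
    vec[state] = 1.0
    return vec


def linear(weights, bias, x):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearActorCritic:
    """Hypothetical container for the two linear heads."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Assumption: small uniform initial weights and zero biases.
        self.actor_w = [[rng.uniform(-0.1, 0.1) for _ in range(N_STATES)]
                        for _ in range(N_ACTIONS)]
        self.actor_b = [0.0] * N_ACTIONS
        self.critic_w = [[rng.uniform(-0.1, 0.1) for _ in range(N_STATES)]]
        self.critic_b = [0.0]

    def forward(self, state):
        """Return (action probabilities, state value) for a state index."""
        x = one_hot(state)
        probs = softmax(linear(self.actor_w, self.actor_b, x))
        value = linear(self.critic_w, self.critic_b, x)[0]
        return probs, value
```

Because the input is one-hot, each linear head effectively reads out one column of its weight matrix per state, so this model behaves like a lookup table of action preferences and state values.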
Legend: S: Start | G: Goal (+10) | X: Trap (-5) | -: Path (-1)
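The legend can be turned into a reward lookup along these lines. The reward values for G, X, and - come from the legend; everything else is assumed for illustration: the 3x4 layout is hypothetical (chosen only because it has 12 cells, matching the one-hot size), the reward for re-entering S is not stated and is assumed equal to a path cell, and the names GRID, state_index, and reward_at are made up here.

```python
# Rewards taken from the legend.
REWARDS = {"G": 10.0, "X": -5.0, "-": -1.0}
# Assumption: stepping back onto the start cell costs the same as a path cell.
REWARDS["S"] = REWARDS["-"]

# Hypothetical 3x4 layout with 12 cells; the actual map may differ.
GRID = [
    "S---",
    "-X--",
    "--XG",
]
N_COLS = len(GRID[0])


def state_index(row, col):
    """Map a (row, col) cell to the index used for the one-hot input."""
    return row * N_COLS + col


def reward_at(row, col):
    """Reward received on entering the cell at (row, col)."""
    return REWARDS[GRID[row][col]]
```

With this row-major indexing, the state index can be passed directly to a one-hot encoder of size 12.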