JEPA Pattern Laboratory

A browser toy that demonstrates the I-JEPA idea in minutes. Each row is a permutation of three user-drawn patches; one cell is masked, and its representation is predicted from the visible cells plus position cues. The teacher is an EMA (exponential moving average) of the encoder. This is not full-scale CV training.
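The "teacher = EMA of the encoder" idea can be sketched in a few lines. This is only an illustration, not this page's actual code: the function name `ema_update`, the flat list of weights, and the decay value are all assumptions chosen for readability.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight a small step toward the matching student weight.

    `teacher` and `student` are flat lists of floats (a hypothetical stand-in
    for real network parameters). A `decay` close to 1 makes the teacher
    change slowly.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy usage: the teacher drifts toward the student without copying it outright.
teacher = [0.0, 1.0]
student = [1.0, 1.0]
teacher = ema_update(teacher, student, decay=0.9)
```

Because the teacher only receives this slow average and no gradient, it gives the predictor a more stable target than the rapidly changing encoder would.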

1 Pattern editors (4×4 each)

Draw three slightly different motifs so masked prediction has something to latch onto.

2 Generated pattern grid

Each row contains patterns 1, 2, and 3 exactly once, in a random order per row. After OK, click a cell to mark the slot you want to inspect (shown as “?”). At each training step, a random cell on this grid is masked, and its coordinates are passed to the predictor.
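A minimal sketch of how such a grid and a per-step mask might be generated. The names `build_grid` and `sample_mask` are hypothetical, and the 3-row grid size is an assumption (inferred from the "8 visible cells" mentioned in panel 3), not something this page states directly.

```python
import random


def build_grid(rows=3, rng=random):
    """Return a grid where each row is an independent random permutation of 1, 2, 3."""
    return [rng.sample([1, 2, 3], 3) for _ in range(rows)]


def sample_mask(grid, rng=random):
    """Pick one (row, col) cell uniformly at random to hide for this training step."""
    row = rng.randrange(len(grid))
    col = rng.randrange(len(grid[0]))
    return row, col


# Toy usage with a seeded generator so the run is repeatable.
rng = random.Random(0)
grid = build_grid(rng=rng)
mask = sample_mask(grid, rng=rng)
```

Drawing a fresh mask every step means the predictor sees every slot hidden over time, rather than learning a single fixed position.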

Training steps (×1000)

Status: Ready

Tap OK in panel 1 to build this grid.

3 Latent space & masked cell

● Readout: dots visualize the context latents (mean over the 8 visible cells, L2-normalized after the encoder). Below, an auxiliary decoder sketches the masked patch from the predicted latent; this is for visual inspection only, and the core loss is latent MSE against the EMA teacher. The predictor receives the normalized row/col of the masked cell, so changing the masked slot at inference time updates the decoded patch.
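The readout quantities can be sketched as follows. Several details here are assumptions rather than facts about this page: that each latent is L2-normalized before averaging, that row/col are scaled to [0, 1] by dividing by (size - 1), and that the grid is 3×3. All function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def context_latent(latents, mask_index):
    """Mean of the L2-normalized latents of every cell except the masked one."""
    visible = [l2_normalize(z) for i, z in enumerate(latents) if i != mask_index]
    dim = len(visible[0])
    return [sum(z[d] for z in visible) / len(visible) for d in range(dim)]


def mask_position(row, col, rows=3, cols=3):
    """Row/col scaled to [0, 1]: the cue telling the predictor which slot is hidden."""
    return [row / (rows - 1), col / (cols - 1)]


def latent_mse(predicted, target):
    """Mean squared error between a predicted latent and the teacher's latent."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Toy usage: 9 identical 2-D latents, with the centre cell (index 4) masked.
latents = [[3.0, 4.0] for _ in range(9)]
context = context_latent(latents, mask_index=4)
position = mask_position(row=1, col=1)
# `context` stands in for a predictor output here; a real predictor would be
# a trained network taking `context` and `position` as inputs.
loss = latent_mse(context, l2_normalize([3.0, 4.0]))
```

Since the loss is computed in latent space, the auxiliary decoder can be dropped without changing what the predictor learns.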

Loss: 0.0000

Step: 0

Masked cell decode

predicted

ground truth

After OK: pick a mask cell on the grid, train, then compare below. Tap another cell to re-run inference with a different mask.