← Extensions to Claude plays Geoguessr

ORM to check how right each step is? Then finetune? I think you could do something smart with test-time inference if you could train a reward model on GeoGuessr (this is so overkill)
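A minimal sketch of the test-time side, assuming best-of-N selection: sample N candidate guesses, score each with a reward model, keep the top one. Everything here is illustrative; a hand-written distance-based score stands in for the trained reward model (the `5000 * exp(-d / 1492.7)` decay is a common approximation of GeoGuessr's scoring curve, and at test time a learned model would have to score guesses from the image alone, without the true location).

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # great-circle distance between two (lat, lon) points, in km
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def reward(guess, truth):
    # proxy reward: GeoGuessr-style exponential decay with distance.
    # A trained reward model would replace this (it wouldn't see `truth`);
    # this hand-written version is just the training target it might regress to.
    return 5000 * math.exp(-haversine_km(*guess, *truth) / 1492.7)

def best_of_n(candidates, reward_fn):
    # test-time inference: keep whichever sampled guess the reward
    # model scores highest
    return max(candidates, key=reward_fn)
```

Usage: with the truth at Paris, a near-Paris guess beats a New York guess, e.g. `best_of_n([(48.8, 2.3), (40.7, -74.0)], lambda g: reward(g, (48.8566, 2.3522)))` returns `(48.8, 2.3)`.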

Humans as the ground truth for LLMs feels a bit lazy. They need to find ways to ground themselves.

Claude plays GeoGuessr