"ARC may overstate the tremendous gain of OpenAI's o3, because o3 is the first model able to operate on pixel grids of the size that ARC problems happen to come in" (humans underestimate how difficult 2D perception is for LLMs, and it is this aspect of ARC-AGI that o3's scaling tackled well)
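One way to see why grid size matters for a text model: the grid has to be flattened into a 1D sequence before the model sees it. The sketch below assumes a hypothetical serialization (one digit per cell for the 10 ARC colors, one newline per row). It is not the format o3 or any particular model actually uses, and it counts characters rather than tokens, since real tokenizers may group digits differently. Under that assumption, horizontally adjacent cells stay adjacent in the sequence, but the cell directly below another sits `width + 1` positions later. On a 30x30 grid (the largest size ARC uses) that gap is 31 characters, and the whole grid takes 929 characters.

```python
# Hypothetical serialization for illustration only: one digit (color 0-9) per
# cell, rows joined by "\n". Counts are in characters, not model tokens.

def serialize(grid: list[list[int]]) -> str:
    """Flatten a 2D ARC-style grid into a row-major text string."""
    return "\n".join("".join(str(c) for c in row) for row in grid)


def vertical_gap(width: int) -> int:
    """Characters between a cell and the cell directly below it.

    In row-major order with one newline per row, the cell below sits
    width + 1 characters later (the rest of the row plus the newline).
    """
    return width + 1


def serialized_length(height: int, width: int) -> int:
    """Characters for an H x W grid: H*W cells plus H-1 row separators."""
    return height * width + (height - 1)


if __name__ == "__main__":
    grid = [[0, 1, 2],
            [3, 4, 5]]
    s = serialize(grid)
    print(repr(s))  # '012\n345'
    # '1' at (0,1) and '4' at (1,1) are vertical neighbors in 2D,
    # but 4 characters apart in the flattened string.
    print(s.index("4") - s.index("1"), vertical_gap(3))  # 4 4
    for n in (3, 10, 30):
        # side length, total characters, vertical-neighbor distance
        print(n, serialized_length(n, n), vertical_gap(n))
```

The point of the toy model is only that 2D locality turns into long-range dependencies in 1D, and that both the sequence length and the vertical-neighbor distance grow with grid size.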