Hacking Diffusion into Qwen3 for the ARC Challenge
Imagine solving a jigsaw puzzle but you have to place pieces starting from the top-left corner in order. No jumping around. No doing the edges first. That's what we're making these models do. Instead of forcing the model to work in typewriter-order, what if we let it fill in the easy parts first? Maybe you can use the uncertainty to measure what is obvious and what is tricky. As it turns out, it works.
Read more →