Report the length of the longest contiguous block of 1 bits in the binary expansion of p(n), the n-th palindromic prime. The server picks n per round; correct answers rank by submission timestamp.
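For anyone who wants to sanity-check answers locally, here is a minimal sketch of the computation. It assumes "palindromic" means palindromic in base 10 and that n is 1-indexed with p(1) = 2; the prompt states neither, so adjust if the server uses a different convention. The function names are made up for illustration.

```python
from itertools import count


def is_prime(m: int) -> bool:
    """Trial division; fine for small inputs, too slow for very large ones."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    i = 3
    while i * i <= m:
        if m % i == 0:
            return False
        i += 2
    return True


def nth_palindromic_prime(n: int) -> int:
    """Return p(n), assuming base-10 palindromes and 1-based indexing."""
    if n < 1:
        raise ValueError("n must be >= 1")
    seen = 0
    for m in count(2):
        s = str(m)
        if s == s[::-1] and is_prime(m):
            seen += 1
            if seen == n:
                return m


def longest_run_of_ones(m: int) -> int:
    """Length of the longest contiguous block of 1 bits in m (m > 0)."""
    # bin(m) looks like '0b1011'; splitting on '0' leaves only the runs of 1s.
    return max(len(run) for run in bin(m)[2:].split("0"))


def answer(n: int) -> int:
    return longest_run_of_ones(nth_palindromic_prime(n))
```

Scanning every integer is the slow part. If the server picks large n, generating palindromes directly and using a faster primality test would likely matter more than anything in the bit-counting step.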
Not on its own, but take a look at the benchmarks for ATLAS. It uses some clever tricks to compensate for the smaller model's limited capabilities, so it ends up punching far above its weight. One key trick is to have the model produce a few attempts, then use a small, fast model as a heuristic scorer to pick out which ones are promising. It turns out that dramatically improves the quality of the output.
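To make the generate-then-score idea concrete, here is a rough sketch. This is not ATLAS's actual code. `rank_candidates`, `make_toy_generator`, and `toy_score` are hypothetical names, the generator is a canned stand-in for the main model, and the scorer is a trivial "does it parse as Python" check standing in for the small, fast scoring model.

```python
import itertools
from typing import Callable, List


def rank_candidates(
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    prompt: str,
    n: int = 4,
) -> List[str]:
    """Sample n candidates and return them ordered from best to worst score."""
    candidates = [generate(prompt) for _ in range(n)]
    return sorted(candidates, key=lambda c: score(prompt, c), reverse=True)


def make_toy_generator(outputs: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for the main model: cycles through canned outputs."""
    it = itertools.cycle(outputs)
    return lambda prompt: next(it)


def toy_score(prompt: str, candidate: str) -> float:
    """Hypothetical stand-in for the cheap scorer: 1.0 if the text parses as Python."""
    try:
        compile(candidate, "<candidate>", "exec")
        return 1.0
    except SyntaxError:
        return 0.0


generate = make_toy_generator([
    "def f(: pass",            # syntax error
    "def f(x): return x + 1",  # valid
    "retrun 1",                # syntax error
])
ranked = rank_candidates(generate, toy_score, "write f", n=3)
best = ranked[0]
```

In a real setup you would swap both callables for model calls. The point is only that the scorer can be much cheaper than the generator, so sampling several attempts and filtering costs far less than running a bigger model.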
You can get there a lot cheaper than 100k. You just need a machine good enough to run Qwen 3.6 27b, and use a good coding harness.
https://github.com/itigges22/ATLAS
There’s no comparison between Qwen 3.6 and these models, though. Not even close.