It can never be just reasoning, right? Reasoning is the multiplier on some base model, and surely no amount of reasoning on top of something like gpt-2 will get you o1.
This model is too expensive right now, but as compute gets cheaper — and we should keep in mind that it will — having a better base to multiply with will enable things that just more thinking won't.
You can try this for yourself with the distilled R1 models that DeepSeek released. The Qwen-7B-based one is quite impressive for its size, and it can do a lot when you provide additional context. I imagine that for some domains you can provide enough context and let inference time eventually solve the problem; for others you can't. A rough sketch of running it locally follows below.
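For anyone who wants to try it, here's a minimal sketch using Hugging Face transformers. The model id and the generation settings are my assumptions (the id matches what DeepSeek published on the Hub, but double-check it), not anything prescribed by the comment above:

```python
# Minimal sketch: run the Qwen-7B R1 distill locally via transformers.
# Model id and generation budget are assumptions; adjust to taste.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed HF model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Pack the extra domain context directly into the prompt, then let the
# model spend tokens "thinking" before it answers.
prompt = (
    "Context: <paste your domain-specific context here>\n\n"
    "Question: <your question>"
)
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# A generous max_new_tokens budget gives the reasoning chain room to unfold.
outputs = model.generate(input_ids, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The point of the big token budget is exactly the tradeoff discussed here: you're buying extra "reasoning" at inference time on top of whatever the 7B base can support.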