My point was to caution against being too confident about the underlying architecture, not to argue for any particular alternative.
Your statement is false. A lot changed under the hood between GPT-4 and o1, but notably not an increase in model size. In fact, o1 is believed to be much smaller than GPT-4, though OpenAI has not published parameter counts for either model. The improvements are coming from other places.