I have a very similar prompting style to yours and share this experience.
I am an experienced programmer and usually have a fairly exact idea of what I want, so I write detailed requirements and use the models more as typing accelerators.
GPT-4 is useful in this regard, but I also tried about a dozen older prompts on Gemini Advanced/Ultra recently and in every case preferred the Ultra output. The code was usually more complete and prod-ready, with higher sophistication in its construction and somewhat higher density. It was just closer to what I would have hand-written.
It's increasingly clear, though, that LLM use falls into a few distinct modes of end-user behavior: knowledge base vs. reasoning, exploratory vs. completion, instruction following vs. getting suggestions, etc.
For programming I want an obedient instruction-following completer with great reasoning. Gemini Ultra seems to do this better than GPT-4 for me.
It constantly hallucinates APIs for me; I really wonder why people's perceptions are so radically different. For me it's basically unusable for coding. Perhaps I'm getting served a cheaper model because I live in a poorer country.
Spent a few hours comparing Gemini Advanced with GPT-4.
Gemini Advanced is nowhere even close to GPT-4, whether for text generation, code generation, or logical reasoning.
Gemini Advanced constantly asks for directions ("What are your thoughts on this approach?") even when creating a short task list of 10 items, and keeps doing so after being told several times to provide the full list rather than stopping every three or four items to ask again. It also constantly gives moral lessons or ends its results with annoying marketing-style comments like "Let's make this an awesome product!"
Its code is more generic and its solutions less sophisticated. In a discussion of options trading strategies, Gemini Advanced got core risk-management strategies wrong and apologized only when the errors were pointed out. GPT-4 answered with no errors and even went into the subtleties of some exotic risk scenarios without mistakes.
Maybe 1.5 will be it, or maybe Google realized this quite quickly and is trying the increased context window as a Hail Mary to catch up. Why release so soon?
I asked Gemini Advanced, the paid one, to "Write a script to delete some files" and it told me that it couldn't do that because deleting files was unethical. At that point I cancelled my subscription since even GPT-4 with all its problems isn't nearly as broken as Gemini.
If you share your prompt I'm sure people here can help you.
Here's a prompt I used; I got a script that not only accomplishes the objective but even has an option to show which files will be deleted and asks for confirmation before deleting them.
Write a bash script to delete all files with the extension .log in the current directory and all subdirectories of the current directory.
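For reference, a script along those lines might look like the sketch below. This is my own reconstruction, not the model's actual output; the `--dry-run` flag name, the `delete_logs` function name, and the confirmation wording are all my choices.

```shell
#!/usr/bin/env bash
# Sketch: delete all .log files under the current directory, with a
# preview (--dry-run) and a confirmation prompt before deleting.

delete_logs() {
    local dry_run=false
    [ "${1:-}" = "--dry-run" ] && dry_run=true

    # Collect matches null-delimited so filenames with spaces survive.
    local files=() f
    while IFS= read -r -d '' f; do files+=("$f"); done \
        < <(find . -type f -name '*.log' -print0)

    if [ "${#files[@]}" -eq 0 ]; then
        echo "No .log files found."
        return 0
    fi

    printf 'Files to delete:\n'
    printf '  %s\n' "${files[@]}"
    "$dry_run" && return 0

    local answer
    read -r -p "Delete ${#files[@]} file(s)? [y/N] " answer
    case "$answer" in
        [yY]) rm -- "${files[@]}"; echo "Deleted." ;;
        *)    echo "Aborted." ;;
    esac
}

# Run only when executed directly, not when sourced.
[ "${BASH_SOURCE[0]}" = "$0" ] && delete_logs "$@"
```

Anything other than an explicit "y" aborts, so the default is the safe path.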
I’m going to have to try Gemini for code again. It just occurred to me, as a Xoogler, that if they used Google’s codebase as training data it's going to be unbeatable. Now, did they do that? No idea, but quality wins over quantity, even with LLMs.