The benchmarks I've seen all provide substantial scaffolding and specification details, and that's if they are zero-shot at all, which they often are not. In reality, nobody wants to spend that much time writing out details or examples just to get the AI to write the correct function, when the same time and effort could have gone into writing it yourself.
Also, those benchmarks often run the model k times on the same question, and if any one of the attempts is correct, they count it as a pass (pass@k). That could mean that if you ran the model 8 times, it might come up with the right answer only once. And then you have to waste your time checking which answer, if any, is right.
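To put a rough number on that, here is a back-of-envelope sketch. It assumes each attempt is independent and correct with the same probability `p`, which is a simplification, and `pass_at_k` is just a name picked for this illustration, not any benchmark's actual scoring code.

```python
def pass_at_k(p: float, k: int) -> float:
    """Chance that at least one of k independent attempts is correct,
    assuming each attempt is correct with probability p."""
    return 1.0 - (1.0 - p) ** k


# Hypothetical model that solves a given problem 1 time in 8.
p_single = 1 / 8
print(f"pass@1 = {pass_at_k(p_single, 1):.3f}")  # pass@1 = 0.125
print(f"pass@8 = {pass_at_k(p_single, 8):.3f}")  # pass@8 = 0.656
```

Under those assumptions, a model that is right only 12.5% of the time on a single try still passes that problem about 66% of the time when it gets 8 tries.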
I want to ask: "Write a function to count unique numbers in a list" and get the correct answer the first time.
What you need to ask:
"""
Write a Python function that takes a list of integers as input and returns
the count of numbers that appear exactly once in the list.
The function should:
- Accept a single parameter: a list of integers
- Count elements that appear exactly once
- Return an integer representing the count
- Handle empty lists and return 0
- Handle lists with duplicates correctly
Please provide a complete implementation.
"""
And then you run it 8 times and, if you're lucky, it gets it right zero-shot.
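For comparison, here is a quick sketch of what is actually being asked for. The short prompt can be read two ways, and the long prompt exists mostly to pin down the second one. The function names and the sample list are made up for this illustration.

```python
from collections import Counter


def count_distinct(numbers: list[int]) -> int:
    """Reading 1 of the short prompt: how many different values occur."""
    return len(set(numbers))


def count_appearing_once(numbers: list[int]) -> int:
    """Reading 2 (the one the long prompt spells out): values that
    occur exactly once in the list."""
    counts = Counter(numbers)
    return sum(1 for c in counts.values() if c == 1)


sample = [1, 2, 2, 3, 3, 3, 4]
print(count_distinct(sample))        # 4 (the values 1, 2, 3, 4)
print(count_appearing_once(sample))  # 2 (the values 1 and 4)
print(count_appearing_once([]))      # 0 (empty list)
```

Either version is a couple of lines, which is shorter than the prompt needed to get it reliably.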
Edit: I'm not even aware of a benchmark that is pass@1, zero-shot, and uses natural prompting (no detailed specification). If anyone knows of one, let me know.