
It’d be great if someone ran the same data and prompt through other models.

I did like the formatting and attributions, but didn’t necessarily want attributions like that for every section. I’m also not sure it fully matches what I’m seeing in the thread, but maybe the data I’m seeing is just newer.





Thanks for sharing. To me, purely on personal preference, the Gemini models did best on this task, which also fits with my personal experience using Google's models to summarize extensive, highly specialized text. The Gemini 2.0 models do especially well on needle-in-a-haystack type tests in my experience.


At a glance, none of these appear to be meaningfully worse than GPT-4.5


Seeing the other models, I actually come away impressed with how well GPT-4.5 is organizing the information and how well it reads. I find it a lot easier to quickly parse. It's more human-like.


I noticed 4o mini didn't follow the directions to quote users. My favourite part of the 4.5 summary was how it quoted Antirez. 4o mini brought out the same quote, but failed to attribute it as instructed.


It's fascinating, but while this does mean it strays from the given example, I actually feel the result is a better summary. The 4.5 version is so long you might just read the whole thread yourself.


I actually think the Claude 3.7 Sonnet summary is better.


Yeah, I liked it too, especially at a tenth of the price lol


Interesting, thanks for doing this. I'd say that (at a glance) for now it's still worth using more passes with smaller models rather than one pass with 4.5.

Now, if you wanted to generate training data, I could see wanting to have the best answers possible, where even slight nuances would matter. 4.5 seems to adhere to instructions much better than the others. You might get the same result by generating n samples and having a mixture of models "reflect" on them, but then again you might not. Going through thousands of generations manually is also costly.
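
For what it's worth, here is a rough sketch of the "generate n samples, then let a mixture of models pick" idea. Everything in it is an assumption for illustration: `best_of_n`, the `Generator` and `Judge` types, and the stub judges are hypothetical names, and the stubs stand in for real model calls so the snippet runs without any API.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical types: a generator maps a prompt to one candidate answer;
# a judge maps (prompt, candidates) to the index of the candidate it prefers.
Generator = Callable[[str], str]
Judge = Callable[[str, Sequence[str]], int]


def best_of_n(prompt: str, generate: Generator,
              judges: Sequence[Judge], n: int = 4) -> str:
    """Draw n candidates, let each judge vote once, return the most-voted one."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    votes = Counter(judge(prompt, candidates) for judge in judges)
    # most_common(1) returns the (index, vote_count) pair with the most votes.
    winner_index, _ = votes.most_common(1)[0]
    return candidates[winner_index]


# Stand-ins for real models (assumptions, not any vendor's API).
def make_stub_generator(outputs: Sequence[str]) -> Generator:
    """Return a generator that yields the given outputs one call at a time."""
    it = iter(outputs)
    return lambda prompt: next(it)


def longest_judge(prompt: str, candidates: Sequence[str]) -> int:
    """Prefer the longest candidate."""
    return max(range(len(candidates)), key=lambda i: len(candidates[i]))


def shortest_judge(prompt: str, candidates: Sequence[str]) -> int:
    """Prefer the shortest candidate."""
    return min(range(len(candidates)), key=lambda i: len(candidates[i]))


def quotes_judge(prompt: str, candidates: Sequence[str]) -> int:
    """Prefer the first candidate containing a quotation mark, else the first."""
    for i, text in enumerate(candidates):
        if '"' in text:
            return i
    return 0
```

In practice the judges would be different models prompted to compare the candidates against the instructions (for example, "did it attribute quotes as asked?"), and majority voting is only one of several ways to combine them. Whether this matches a single pass from a stronger model is exactly the open question.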


Compared to GPT-4.5 I prefer the GPT-4o version because it is less wordy. It summarizes and gives the gist of the conversation rather than reproducing it along with commentary.



