1. Public Goods Game Benchmark: Contribute and Punish, a Multi-Agent Benchmark (github.com/lechmazur)
    7 points by zone411 46 days ago | past
2. Elimination Game: Multi-Agent LLM Social Reasoning, Strategy, and Deception (github.com/lechmazur)
    5 points by zone411 69 days ago | past
3. SWE-Lancer: a benchmark of freelance software engineering tasks from Upwork (arxiv.org)
    111 points by zone411 77 days ago | past | 74 comments
4. LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Think Exp 01-21 (github.com/lechmazur)
    17 points by zone411 84 days ago | past | 3 comments
5. Multi-Agent Step Race Benchmark: LLM Collaboration and Deception Under Pressure (github.com/lechmazur)
    7 points by zone411 3 months ago | past | 1 comment
6. Show HN: LLM Thematic Generalization Benchmark (github.com/lechmazur)
    6 points by zone411 3 months ago | past
7. Show HN: LLM Creative Story-Writing Benchmark (github.com/lechmazur)
    5 points by zone411 3 months ago | past
8. Show HN: LLM Divergent Thinking Creativity Benchmark (github.com/lechmazur)
    8 points by zone411 4 months ago | past
9. Show HN: LLM Deceptiveness and Gullibility Benchmark (github.com/lechmazur)
    7 points by zone411 6 months ago | past | 1 comment
10. LLM Confabulation (Hallucination) Leaderboard (github.com/lechmazur)
    6 points by zone411 6 months ago | past
11. O1-preview and o1-mini results on NYT Connections (twitter.com/lechmazur)
    2 points by zone411 7 months ago | past | 1 comment
12. Grok is an AI modeled after the Hitchhiker’s Guide to the Galaxy (twitter.com/xai)
    213 points by zone411 on Nov 5, 2023 | past | 228 comments
13. Can you beat a stochastic parrot? ParrotChess.com (parrotchess.com)
    3 points by zone411 on Sept 22, 2023 | past | 4 comments
14. Generative AI while browsing in Chrome (labs.google.com)
    3 points by zone411 on Aug 15, 2023 | past
15. Statement on AI Risk (safe.ai)
    341 points by zone411 on May 30, 2023 | past | 921 comments
16. Google tells staff it plans to limit publishing AI research (businessinsider.com)
    63 points by zone411 on May 5, 2023 | past | 28 comments
17. 4th Gen Intel Xeon Scalable Sapphire Rapids Leaps Forward (servethehome.com)
    2 points by zone411 on Jan 10, 2023 | past | 1 comment
18. Fast and Furious Movie Titles by 'Claude' from Anthropic AI (twitter.com/jayelmnop)
    2 points by zone411 on Jan 9, 2023 | past
19. SatelliteXplorer (esri.com)
    2 points by zone411 on Dec 30, 2022 | past
20. SBF Arrested by Bahamian Authorities (twitter.com/tier10k)
    1308 points by zone411 on Dec 12, 2022 | past | 812 comments
21. Large Language Models Can Self-Improve (openreview.net)
    3 points by zone411 on Oct 2, 2022 | past | 1 comment
22. America Reached One Million Covid Deaths (nytimes.com)
    5 points by zone411 on May 14, 2022 | past
23. Show HN: Catchy melodies made with a diffusion-based neural net assistant (youtube.com)
    38 points by zone411 on May 11, 2022 | past | 14 comments
24. Honduras Repeals ZEDEs (laprensa.hn)
    47 points by zone411 on April 21, 2022 | past | 10 comments
25. Russian Tech Giant Yandex Says Might Default (barrons.com)
    12 points by zone411 on March 4, 2022 | past
26. Maryland hasn't updated Covid case data for 15 days due to a security incident (maryland.gov)
    112 points by zone411 on Dec 20, 2021 | past | 56 comments
27. Astronomers want NASA to build a giant space telescope to peer at alien Earths (npr.org)
    63 points by zone411 on Nov 4, 2021 | past | 34 comments
28. Have Italian Scholars Figured Out the Identity of Elena Ferrante? (lithub.com)
    1 point by zone411 on April 13, 2021 | past
29. Trump tells reporters aboard Air Force One he is banning TikTok (twitter.com/joshnbcnews)
    26 points by zone411 on Aug 1, 2020 | past | 17 comments
30. Moravec Transfer (everything2.com)
    1 point by zone411 on Jan 18, 2020 | past