zone411's submissions

1.		Public Goods Game Benchmark: Contribute and Punish, a Multi-Agent Benchmark (github.com/lechmazur)
		7 points by zone411 46 days ago
2.		Elimination Game: Multi-Agent LLM Social Reasoning, Strategy, and Deception (github.com/lechmazur)
		5 points by zone411 69 days ago
3.		SWE-Lancer: a benchmark of freelance software engineering tasks from Upwork (arxiv.org)
		111 points by zone411 77 days ago | 74 comments
4.		LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Think Exp 01-21 (github.com/lechmazur)
		17 points by zone411 84 days ago | 3 comments
5.		Multi-Agent Step Race Benchmark: LLM Collaboration and Deception Under Pressure (github.com/lechmazur)
		7 points by zone411 3 months ago | 1 comment
6.		Show HN: LLM Thematic Generalization Benchmark (github.com/lechmazur)
		6 points by zone411 3 months ago
7.		Show HN: LLM Creative Story-Writing Benchmark (github.com/lechmazur)
		5 points by zone411 3 months ago
8.		Show HN: LLM Divergent Thinking Creativity Benchmark (github.com/lechmazur)
		8 points by zone411 4 months ago
9.		Show HN: LLM Deceptiveness and Gullibility Benchmark (github.com/lechmazur)
		7 points by zone411 6 months ago | 1 comment
10.		LLM Confabulation (Hallucination) Leaderboard (github.com/lechmazur)
		6 points by zone411 6 months ago
11.		o1-preview and o1-mini results on NYT Connections (twitter.com/lechmazur)
		2 points by zone411 7 months ago | 1 comment
12.		Grok is an AI modeled after the Hitchhiker’s Guide to the Galaxy (twitter.com/xai)
		213 points by zone411 on Nov 5, 2023 | 228 comments
13.		Can you beat a stochastic parrot? ParrotChess.com (parrotchess.com)
		3 points by zone411 on Sept 22, 2023 | 4 comments
14.		Generative AI while browsing in Chrome (labs.google.com)
		3 points by zone411 on Aug 15, 2023
15.		Statement on AI Risk (safe.ai)
		341 points by zone411 on May 30, 2023 | 921 comments
16.		Google tells staff it plans to limit publishing AI research (businessinsider.com)
		63 points by zone411 on May 5, 2023 | 28 comments
17.		4th Gen Intel Xeon Scalable Sapphire Rapids Leaps Forward (servethehome.com)
		2 points by zone411 on Jan 10, 2023 | 1 comment
18.		Fast and Furious Movie Titles by 'Claude' from Anthropic AI (twitter.com/jayelmnop)
		2 points by zone411 on Jan 9, 2023
19.		SatelliteXplorer (esri.com)
		2 points by zone411 on Dec 30, 2022
20.		SBF Arrested by Bahamian Authorities (twitter.com/tier10k)
		1308 points by zone411 on Dec 12, 2022 | 812 comments
21.		Large Language Models Can Self-Improve (openreview.net)
		3 points by zone411 on Oct 2, 2022 | 1 comment
22.		America Reached One Million Covid Deaths (nytimes.com)
		5 points by zone411 on May 14, 2022
23.		Show HN: Catchy melodies made with a diffusion-based neural net assistant (youtube.com)
		38 points by zone411 on May 11, 2022 | 14 comments
24.		Honduras Repeals ZEDEs (laprensa.hn)
		47 points by zone411 on April 21, 2022 | 10 comments
25.		Russian Tech Giant Yandex Says Might Default (barrons.com)
		12 points by zone411 on March 4, 2022
26.		Maryland hasn't updated Covid case data for 15 days due to a security incident (maryland.gov)
		112 points by zone411 on Dec 20, 2021 | 56 comments
27.		Astronomers want NASA to build a giant space telescope to peer at alien Earths (npr.org)
		63 points by zone411 on Nov 4, 2021 | 34 comments
28.		Have Italian Scholars Figured Out the Identity of Elena Ferrante? (lithub.com)
		1 point by zone411 on April 13, 2021
29.		Trump tells reporters aboard Air Force One he is banning TikTok (twitter.com/joshnbcnews)
		26 points by zone411 on Aug 1, 2020 | 17 comments
30.		Moravec Transfer (everything2.com)
		1 point by zone411 on Jan 18, 2020