
I'm sure the debate over the definition of AGI is important and will continue for a while, but... I can't care about it anymore.

Between Perplexity searching and summarizing, Claude explaining, and qwen (and other tools) coding, I'm already as happy as can be with whatever you want to call this level of intelligence.

Just today I used a completely local AI research tool, based on Ollama. It worked great.

Maybe it won't get much better? Or maybe it'll take decades instead of years? Ok. I remember not having these tools. I never want to go back.




Same here.

The ability to “talk to an expert” about any topic I’m curious about and ask very specific questions has been invaluable to me.

It reminds me of being a kid and asking my grandpa a million questions, like how light bulbs worked, or what was inside his radio, or how we have day and night.

And before anyone talks about accuracy or hallucinations: these conversations are usually treated as starting-off points for then googling specific terms, people, laws, treaties, etc. to dig deeper and verify.

Last year during a visit to my first Indian reservation, I had a whole bunch of questions that nobody in person had answers to. And ChatGPT was invaluable in understanding concepts like where a reservation’s autonomy begins and ends. And why certain tribes are richer than others. What happens when someone calls 911 on a reservation. Or speeds. Or wants to start a factory without worrying about import/export rules. And what causes some tribes to lose their language faster than others. And 20 other questions like this.

And most of those resulted in google searches to verify the information. But I literally could never do this before.

Same this year when I’m visiting family in India. To learn about the politics, the major players, WHY they are considered major players (like the Chief Minister of Bengal or Uttar Pradesh or Maharashtra being major players because of their populations and economies). Criticisms, explanations of laws, etc etc.

For insanely curious people who often feel unsatisfied with the answers given by those around them, it’s the greatest thing ever.


> The ability to “talk to an expert” about any topic I’m curious about and ask very specific questions has been invaluable to me.

It is dangerous to assume that LLMs are experts on any topic. With or without quotes. You are getting a super fast journalist intern with a huge memory but no ability to reason critically, no real understanding of anything, and huge unreliability when it comes to answering questions (you can get completely different answers to the same question depending on how you phrase it, and sometimes even the identical question can get you different answers). LLMs are very useful and are a true game changer. But calling that expertise is a disservice to the true experts.


I actually find LLMs lacking true expertise to be a feature, not a bug. Most of the time I'm starting from a place of no knowledge on a topic that's novel to me, I ask some questions, it replies with summaries, keywords, names of things, basic concepts. I enter with the assumption that it's really no different than googling phrases and sifting through results (except I don't know what phrases I'm supposed to be googling in the first place), so the summaries help a lot. I then ask a lot of questions and ask for examples and explanations, some of which of course turn out to be wrong, but the more I push back, re-generate, re-question, etc (while using traditional search engines in another tab), the better responses I can get it to provide.

Come to think of it, it's really no different than walking into Home Depot and asking "the old guys" working in the aisles about stuff -- you can access some fantastic knowledge if you know the names of all the tools and techniques, and if not, can show them a picture or describe what you're trying to do and they'll at least point you in a starting direction with regards to names of tools needed, techniques to use, etc.

Just like I don't expect Home Depot hourly worker Grandpa Bob to be the end-all-be-all expert (for free, as well!), neither do I expect ChatGPT to be an all-knowing-all-encompassing oracle of knowledge.

It'll probably get you 95% of the way there though!


You forget that it makes stuff up and you won't know it until you google it. When googling, fake stuff stands out because truth is consistent.

Querying multiple LLMs at the same time and comparing the results is a much closer analogue to googling, but no one does this.

As I said, you are talking to a super confident journalist intern who can give you answers but you won't know if it is true or partially true until you consult with a human source of knowledge.

It's not even similar to asking the old guys at the Home Depot, because they can tell you if they are unsure they have a good answer for you. An LLM won't. Old guys won't hallucinate facts the way an LLM will.

It really is Searle's Chinese room, 21st-century epistemological nightmare edition. The grammar checks out, but whatever is spit out doesn't necessarily bear any resemblance to reality.


LLMs train from online info. Online info is full of misinformation. So I would not trust an answer to be true just because it is given by multiple LLMs. That is actually a really good way to fall into the misinformation trap.


Most of OpenAI's training data is written by hired experts now. They also buy datasets of professional writing such as Time's archives.


My point was that googling gets you a variety of results from independent sources. So I said that querying multiple LLMs is as close as you can get for a similar experience.


I agree with everything you said, except I think we're both right at the same time.

Ol' boy at the Depot is constrained by his own experiences and knowledge, absolutely can hallucinate, oftentimes will insert wild, irrelevant opinions and stories while getting to the point, and frankly if you line 6 of them up side by side to answer the same question, you're probably leaving with 8 different answers.

There's never One True Solution (tm) for any query; there are 100 ways to plumb your way out of a problem, and you're asking a literal stranger who you assume will at least point you in the right direction (which is kind of preposterous to begin with)

I encourage people to treat LLMs the same way -- use it as a jumping off point, a tool for discovery that's no more definitive than if you're asking for directions at some backwoods gas station. Take the info you get, look deeper with other tools, work the problem, and you'll find a solution.

Don't accept anything they provide at face value. I'm sure we all remember at least a couple teachers growing up who were the literal authority figures in our lives at the time, fully accredited and presented to us as masters of their curriculum, who were completely human, oftentimes wrong, and totally full of shit. So goes the LLM.


TBF they're only truly useful when hooked up to RAG imo. I'm honestly surprised that we haven't yet built a digital seal of authenticity for truth that can be used by AI agents + RAG to conceivably give the most accurate answer possible.

Scientists should be writing papers sealed digitally once they're peer reviewed and considered "truth", same thing with journalist/news articles - sealed once confirmed true or backed up by a solid source in the same way we trust root certificates.
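Concretely, the kind of gate I'm imagining before anything enters a RAG corpus is a signature check against trusted publisher keys, roughly like the sketch below (the function names and key scheme are made up for illustration, using Ed25519 from the Python cryptography library):

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    def verify_and_ingest(doc: bytes, signature: bytes, publisher: str,
                          trusted_keys: dict, corpus: list) -> bool:
        # Only add a document to the retrieval corpus if its "seal" verifies
        # against a key we already trust (analogous to a root certificate).
        key = trusted_keys.get(publisher)  # publisher -> Ed25519PublicKey
        if key is None:
            return False  # unknown publisher, no chain of trust
        try:
            key.verify(signature, doc)  # raises InvalidSignature if tampered with
        except InvalidSignature:
            return False
        corpus.append(doc)
        return True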

But then again, especially when it comes to journalism, there's cropping photos, chopping quotes, etc., all to misrepresent. Turns out we're all the bad actors; it's in our DNA. And tbf, many people, when presented with hard evidence to the contrary of the opinion that they cling onto like a babe to a breast, just plug their ears and cover their eyes.

Okay so maybe there's no point seeking truth/factual correctness, our species doesn't want it 99% of the time, unless it affects them directly (eg people that shoot down public healthcare until they have an expensive illness themselves).


People who are experts (PhD and 20 years of experience) often have very dumb opinions in their field of expertise. Experts make amateur mistakes too. Look at the books written by expert economists, expert psychologists, expert historians, expert philosophers, expert software engineers. Most books are not worth the paper they're written on, despite the authors being experts with decades of experience in their respective fields.

I think you overestimate the ability of a typical 'expert'. You can earn a PhD without the ability to reason critically. You can testify as an expert in a courtroom without understanding conditional probability. Lawyers and accountants in real life also totally contradict themselves when they get asked the same question twice but phrased slightly differently.


My personal criterion for calling somebody an expert, or "educated", or a "scholar" is that they have any random area of expertise where they really know their shit.

And as a consequence, they know where that area of expertise ends. And they know what half-knowing something feels like compared to really knowing something. And thus, they will preface and qualify their statements.

LLMs don't do any of that. I don't know if they could, I do know it would be inconvenient for the sales pitch around them. But the people that I call experts distinguish themselves not by being right with their predictions a lot, but rather by qualifying their statements with the degree of uncertainty that they have.

And no "expert system" does that.


> And as a consequence, they know where that area of expertise ends. And they know what half-knowing something feels like compared to really knowing something. And thus, they will preface and qualify their statements.

How do you count examples like Musk, then?

He is very cautious about rockets, and all the space science people I follow and hold in high regard, say he's actually a ___domain expert there. He regularly expectation-manages experimental SpaceX launches downward.

He's also very bold and brash about basically everything else; the majority of people I've seen saying he's skilled in any other area have turned out to not themselves have any skills in those areas, while the people who do have expertise say he's talking nonsense at best and taking wild safety risks at worst.


Musk is probably really good at back of the envelope calculations. The kind that lets you excel in first year physics. That skill puts you above a lot of people in finance and engineering when it comes to quickly assessing an idea. It is also a gimmick, but I respect it. My wild guess is that he uses that one skill to find out who to believe among the people he hires.

The rest of the genius persona comes from growing up with enough ego that he could become a good salesman, plus badly managed autism and a badly managed drug habit.

Seeing him dabble in politics and social media shows instantly how little he understands the limits of his knowledge. A scholar he is not.


Anecdotal, but I told ChatGPT to include its level of confidence in its answers and to let me know if it didn't know something. This priming resulted in it starting almost every answer with some variation of "I'm not sure, but..." when I asked it vague / speculative questions, and when I asked it direct, matter-of-fact questions with easy answers it would answer with confidence.

That's not to say I think it is rationalizing its own level of understanding, but that somewhere in the vector space it seems to have a gradient for speculative language. If primed to include language about it, that could help cut down on some of the hallucination. No idea if this will affect the rate of false positives on the statements it does still answer confidently, however.
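For reference, the priming I mean is nothing fancier than a system prompt along these lines (a minimal sketch with the OpenAI Python client; the model name and the exact wording of the prompt are my own assumptions):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SYSTEM_PROMPT = (
        "State your level of confidence in each answer. If you are unsure "
        "or do not know something, say so explicitly before answering."
    )

    def ask(question: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o",  # assumed model name
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": question},
            ],
        )
        return response.choices[0].message.content

    # Speculative questions tend to come back prefixed with "I'm not sure,
    # but...", while easy factual ones are answered plainly.
    print(ask("What was inside a 1950s tube radio?"))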


You'd have to find out the veracity of those leading phrases. I'm guessing that it just prefaces the answer with a randomly chosen statement of doubtfulness. The error bar behind every bit of knowledge would have to exist in the dataset.

(And in neural network terms, that error bar could be represented by the number of connections, by congruency of separate paths of arguing, by vividness of memories, etc ... it's not above human reasoning either, no need for new data structures ...)


The level of confidence with which people express themselves is a (neutral to me) style choice. I'm indifferent because when I don't know somebody I don't know whether to take their opinions seriously regardless of the level of confidence they project. Some people who really know their shit are brash and loud and other experts hedge and qualify everything they say. Outward humility isn't a reliable signal. Even indisputably brilliant people frequently don't know where their expertise ends. How often have we seen tech luminaries put a sophomoric understanding of politics on display on twitter or during podcast interviews? People don't end up with correctly calibrated uncertainty unless they put a ton of effort into it. It's a skill that doesn't develop by itself.


I agree, and a lot of that is cultural as well. But there is still a variety of confidence within the statements of a single person, hopefully a lot, and I calibrate to that.


AIs are a "master of all trades", so it is very unlikely they'll ever be able to admit they don't know something. What makes them very unreliable with topics where there is little available knowledge.


The fact that humans make mistakes has little to no bearing on their capacity to create monumental intellectual works. I recently finished The Power Broker by Robert Caro, and found a mistake in the acknowledgements where he mixed up two towns in New York. Does that invalidate his 500+ interviews and years of research? No.

Also, expert historians, philosophers, psychologists, etc. aren't judged based on their correctness, but on their breadth and depth of knowledge and their capacity to derive novel insights. Some of the best works of history I've read are both detailed and polemical, trying to argue for a new framework for understanding a historical epoch that shifts how we understand our modern world.

I don't know, I think I know very little about the world and there are people who know far more and I appreciate reading what they have to say, of course with a critical eye. It seems to me that disagreeing with that is just regurgitated anti-intellectualism, which is a coherent position, but it's good to be honest about it.


I don't disagree with what you say, but one difference is that we generally hold these people accountable and often shift liability to them when they are wrong (though not always, admittedly), which is not something I have ever seen done with any AI system.


This sounds like an argument in favor of AI personhood, not an argument against AI experts.


Right, but, then what? If you throw away all of the books from experts, what do you do, go out in your backyard and start running experiments to re-create all of science? Or start googling? What, some random person on the internet is going to be a better 'expert' than someone that wrote a book?

Books might not be great, but they are at least some minimum bar to reach. You had to do some study and analysis.

Seems like any criticism of books, if you scratch the surface, is just the whole anti-science/anti-education trope again and again. What is the alternative? Don't like peer-reviewed science? Fine, it has flaws; propose an alternative.


Many terrific books have been published in the past 500 years. The median book is not worth your time, however, and neither is the top 10%. You cannot possibly read everything so you have to be very selective or you will read only dreck. This is the opposite of being anti-science or anti-education.


But compared to the content on the internet?

So

Top 10% of books: OK.

90% of books: marginal, a lot of bad.

The internet: just millions of pages of junk.

- Books still take some effort. So why not start there.

It isn't either/or, binary: a lot of books are bad, so I guess I'll get my medical degree from browsing the web because I don't trust those 'experts'.


The median book about medicine is over 100 years old, written in a language you don't speak, and filled to the brim with quackery. Worse than useless. Maybe you don't realize that bookstores and libraries only carry a minuscule fraction of all published works? You will get better information from reddit than from a book written before the discovery of penicillin.

I'll get you started with one of the great works from the 1600s:

https://www.gutenberg.org/cache/epub/49513/pg49513-images.ht...


You seem to have excluded the possibility of a "Top 10% of the Internet" tranche.


""Top 10% of the Internet""

What is the top 10% of the Internet that isn't part of some publishing arm of existing media? And how can you tell? Some dude's blog about vaccines versus Harvard? Which do you believe?

Where are the self funded scientific studies that are occurring outside of academia? And thus not 'biased' by the 'elites'.

For internet-only writing, there aren't a ton of "Astral Codex Ten"s to draw upon as independent thinkers. And even then, he didn't sprout out of the ether fully formed; he has a degree, he was trained in academia.


> What is the top 10% of the Internet that isn't part of some publishing arm of existing media?

Why does that even matter?


?? You said "You seem to have excluded the possibility of a "Top 10% of the Internet" tranche. "

So you brought up the top 10% of the Internet, possibly as an argument against books? That maybe there is valuable information on the Internet.

I was just saying, that 10% is also created by the same people that create books. So if you are arguing against books, then the top 10% of the Internet isn't some golden age of knowledge coming from some different more reliable source.


An appeal to expertise is actually a fallacy. This is because experts can be wrong.

The scientific method relies on evidence and reproducible results, not authority alone.

Edited to add a reference: see under Appeal to authority. https://writingcenter.unc.edu/tips-and-tools/fallacies/


The fact is that in science, facts are only definitions and everything else is a theory which by definition is never 100% true.


> everything else is a theory which by definition is never 100% true.

Which definition of theory includes that it can never be 100% true? It can't be proven to be true, but surely it could be true without anyone knowing about it.


Frankly, I'm not sure what the point of the parent's comment is. Experts can be dumb and ChatGPT is dumb so it's an expert?

> People who are experts (PhD and 20 years of experience) often have very dumb opinions in their field of expertise.

The conventional wisdom is that experts are dumb OUTSIDE of their fields of expertise.

I don't know about you, but I would be very insulted by someone passing judgement like this on my own work in my field. I am sure that I would doubt their qualifications to even make the judgement.

Are there experienced fools? Sure. We both probably work with some. To me they are not experts, though.


> People who are experts (PhD and 20 years of experience) often have very dumb opinions in their field of expertise.

And the training data contains all those dumb opinions.


and rehashes it unthinkingly, without an idea of what it means to consider and disagree with it.


It's scary to think that we are moving in this direction: I can see how in the next few years politicians and judges will use LLMs as neutral experts.

And all in the hand of a few big tech corporations...


They aren't just in the hands of big corporations though.

The open source, local LLM community is absolutely buzzing right now.

Yes, the big companies are making the models, but enough of them are open weights that they can be fine tuned and run however you like.

I think LLMs genuinely do present an opportunity to be neutral experts, or at the least neutral third parties. If they're run in completely transparent ways, they may be preferable to humans in some circumstances.


The whole problem is that they are not neutral. They token-complete based on the corpus that was fed into them and the dimensions that were extracted out of those corpuses and the curve-fitting done to those dimensions. Being "completely transparent" means exposing _all_ of that, but that's too large for anyone to reasonably understand without becoming an expert in that particular model.

And then we're right back to "trusting expert human beings" again.


Nothing is truly neutral. Humans all have a different corpus too. We roughly know what data has gone in, and what the RL process looks like, and how the models handle a given ethical situation.

With good prompting, the SOTA models already act in ways I think most reasonable people would agree with, and that's without trying to build this specifically for that use case.


> Yes, the big companies are making the models, but enough of them are open weights that they can be fine tuned and run however you like.

And how long is that going to last? This is a well known playbook at this point, we'd be better off if we didn't fall for it yet again - it's comical at this point. Sooner or later they'll lock the ecosystem down, take all the free stuff away and demand to extract the market value out of the work they used to "graciously" provide for free to build an audience and market share.


How will they do this?

You can't take the free stuff away. It's on my hard drive.

They can stop releasing them, but local models aren't going anywhere.


They can't take the current open models away, but those will eventually (and I imagine, rather quickly) become obsolete for many areas of knowledge work that require relatively up to date information.


What are the hardware and software requirements for a self-hosted LLM that is akin to Claude?


Llama v3.3 70B after quantization runs reasonably well on a 24GB GPU (7900XTX or 4090) and 64GB of regular RAM. Software: https://github.com/ggerganov/llama.cpp .
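If you'd rather drive it from Python than the raw llama.cpp CLI, a minimal sketch with the llama-cpp-python bindings looks roughly like this (the model filename and the layer/context numbers are assumptions to tune for your quantization and GPU):

    from llama_cpp import Llama

    llm = Llama(
        model_path="Llama-3.3-70B-Instruct-Q4_K_M.gguf",  # assumed local GGUF file
        n_gpu_layers=40,  # offload as many layers as fit in 24GB of VRAM
        n_ctx=8192,       # context window; larger needs more memory
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain quantization in one paragraph."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])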


The world was such a boring and dark place before everybody was constantly swiping on their smartphone in every situation, and before basically everything said got piped through a big tech data center, where their algorithms control where it goes.

Now we finally have a tool where all of you can prove every day how strong/smart/funny/foo you are (not actually). How was life even possible without it?

So, don't be so pessimistic. ;)


> I can see how in the next few years politicians and judges will use LLMs as neutral experts.

While also noting that "neutral" is not well-defined, I agree. They will be used as if they were.


Will they though?

We humans are very good at rejecting any information that doesn’t confirm our priors or support our political goals.

Like, if ChatGPT says (say) vaccines are good/bad, I expect the other side will simply attack and reject it as misinformation, conspiracy, and similar.


From what I can see, LLMs default to being sycophants; acting as if a sycophant were neutral is entirely compatible with the cognitive bias you describe.


Shrug

I treat LLM answers about the same way I treat wikipedia articles. If it's critical I get it right, I go to the wiki sources referenced. Recent models have gotten good at 'showing their sources', which is helpful.


> If it's critical I get it right, I go to the wiki sources referenced

the problem with this is that humans will likely use it for low key stuff, see that it works (or that the errors don't affect them too badly) and start using it for more serious stuff. It will all be good until someone uses it in something more serious and some time later it ends badly.

Human basic thinking is fairly primitive. If yesterday was sunny, the assumption is that today should be too. The more this happens, the higher your confidence. The problem is that this confidence emboldens people to gamble on that, and when it is not sunny anymore, terrible things happen. A lot of hype-driven behaviour is like that. Crypto was like that. The economic crisis of the late 00s was like that. And LLMs look set to be like that too.

It is going to take a big event involving big critical damage or a high profile series of deaths via misuse of an LLM to give policymakers and business leaders around the world a reality check and get them looking at LLMs in a more critical way. An AI autumn if you wish. It is going to happen at some point. Maybe not in 2025 or 2026 but it will definitely happen.

You may argue that it is the fault of the human using the LLM/crypto/giving out loans but it really doesn't matter when those decisions affect others.


But hasn’t it become quite easy to deal with this issue simply by asking for the sources of the information and then validating? I quite like using the Consensus app and then asking for specific academic paper references, which I can then quickly check. However, this has also taught me that academic claims must be validated…


If you need to validate the sources, you might as well go to the sources directly and bypass the LLM. The whole point of LLMs is not needing to go to the sources. The LLM consumes them for you. If you need to read and understand the sources yourself well enough to tell if the LLM is lying, the LLM is a wasteful middleman.

It's like buying supermarket food and also buying the same food from the farmers themselves.


It's dangerous to assume that the person you have access to is an expert either.


IMO it’s dangerous to call experts experts as well. Possibly more dangerous.


No. Expertise isn’t a synonym for ‘infallible’; it denotes someone whose lived experience, learned knowledge, and skill mean that you should listen to their opinion in their area of expertise, and defer to it, unless you have direct and evidence-based reasons for thinking they are wrong.


By that definition an expert would be <more> trustworthy. (Usually they want you to look at credentials instead.)

However that still ignores human nature to use that trust for personal gain.

Nothing about expertise makes someone a saint.


They have tried to address it with the o1 and o3 models, at least, to help it understand and reason better than before, but one of the lines my manager repeats with regard to these is: trust it, but verify it also.


“Believe in God, but tie up your camels”.


LLMs suffer from the "Igon Value Problem" https://rationalwiki.org/wiki/Igon_Value_Problem

Similar to reading a pop sci book, you're getting entertainment from a thing with no actual understanding of the source material, rather than an education.


Earlier in this thread, people mention the counterpoint to this: they Google the information from the LLM and do more reading. It's an excellent starting point for researching a topic: you can't trust everything it says, but if you don't know where to start, it will very likely get you to a good place to start researching.

Similarly, while you can't fully trust everything a journalist says, it's obviously better to have journalism than to have nothing: the "Igon Value Problem" doesn't mean that journalism should be eradicated. Pre-LLMs, we really had nothing like them in this way.


> they Google the information from the LLM and do more reading

The runway on this one seems to be running out fast - how long before all the google results are also non-expert opinions regurgitated by LLMs?


People are forgetting about content farms like Associated Content [1]. Since the early aughts, these content farms would happily produce expert-sounding content on anything that people were searching for. They would buy top search terms from search engines like Yahoo, hire English majors for dirt cheap, and have them produce "expert" content targeting those search terms. At least the LLMs have been trained on some relevant data!

[1] https://en.wikipedia.org/wiki/Yahoo_Voices


So with AI Google has cut out the middleman and insourced the content farm.


The way I see it, they have been like that for at least a decade. Of course, before the transformers revolution these were generated in a more crude way, but still, the end result is that 99% of Google results for any topic have been trash for me since the early 2000s.

Google gave up on fighting the SEO crowd a long time ago. I worry they'll give up on the entire idea of search and just serve answers from their LLM.


You can turn to actual experts, e.g. YouTube or books. But yes, I have recently had the misfortune of working with a personal trainer who was using ChatGPT to come up with training programs, and it felt confusing and like I was wasting time and money.


When I'm looking for actual experts, the first thing that comes to my mind is definitely YouTube!!

And least when it's about YouTube specific topics, like where the like button and the subscribe button is.

They will tell me. Every. Single. F*cking. 5. Minute. Clip. Again. And. Again.

Not soooo much for anything actually important or interesting, though.... ;)

PS: Also which of the always same ~5 shady companies their sponsor is, of course.


Unironically, youtube is a great place to find actual experts on a given subject.


But he explicitly mentions books. That contrast makes it interesting. I assume that he is explicitly fine with text content.

And then he does not mention the web in general (or even Reddit - it wouldn't be worth more than an eyeroll to me), but YouTube.

On the one hand, yeah, well, the web was probably in a better shape in the past. (And YT even is a major aspect of that, imho, but anyways...) On the other hand, you really must be a die hard YT fanatic to only mention that single website (which by the way is mostly video clips, and has all the issues of the entire web), instead of just the web.

It's really well outside of the sphere of my imagination. The root cause of my reply wasn't even disagreement at first, but surprise and confusion.


You've made an error here...

>They will tell me. Every. Single. F*cking. 5. Minute. Clip. Again. And. Again.

Do you know why you got that video? Because people liked and subscribed to them, and the 'experts' with the best information in the universe are hidden 5000 videos below with 10 views.

And this is 100% Googles fault for the algorithms they created that force these behaviors on anyone that wants to use their platform and have visibility.

Lastly, if you can't find anything interesting or important on YT, this points at a failure of your own. While there is an ocean of crap, there is more than enough amazing content out there.


Yeah, well, I never said that there aren't any experts in any topic who at some point decided to publish something there. The fact that entire generations of human beings basically look there and at TikTok and Instagram for any topic, probably also helps with this decision. It's still wildly bizarre to me anyways when people don't mention the web in general in such a context, but one particular commercial website, which is a lot about video based attention economy (and rather classic economy via so-called influencers). Nothing of that sounds ideal to me when it comes to learning about actually useful topics from actual experts. Not even the media type. It's hard for them to hyperlink between content, it's hard for me to search, to skip stuff, reread a passage or two, choose my own speed for each section, etc, etc. Sure, you can find it somewhere there. In the same spirit, McD is a salad bar, though... ;)

> And this is 100% Googles fault for the algorithms they created that force these behaviors on anyone that wants to use their platform and have visibility.

Wrong assumption. It's not their fault; a lot of it is probably by intent. It's just that they and you are not in the same boat. You are the product at big tech sites. It's 100% (impersonally) your fault to be sooo resistant to understanding that. ;)


LLMs are pretty good at attacking the "you don't know what you don't know" problem on a given topic.


You just state this as if it was obviously true, but I don't see how. Why is using LLM like reading a pop sci book and not like reading a history book? Or even less like either, because you have to continually ask questions to get anything?


A history book is written by someone who knows the topic, and then reviewed by more people who also know the topic, and then it's out there where people can read it and criticize it if it's wrong about the topic.

A question asked to an AI is not reviewed by anyone, and it's ephemeral. The AI can answer "yes" today, and "no" tomorrow, so it's not possible to build a consensus on whether it answers specific questions correctly.


A pop sci book can be written by someone who knows the topic and reviewed by people who know the topic, and a history book can fail to be.

LLM generated answers are more comparable to ad-hoc human expert's answers and not to written books. But it's much simpler to statistically evaluate and correct them. That is how we can know that, on average, LLMs are improving and are outperforming human experts on an increasing number of tasks and topics.


In my experience LLM generated answers are more comparable to an ad-hoc answer by a human with no special expertise, moderate google skills, but good bullshitting skills spending a few minutes searching the web, reading what they find and synthesizing it, waiting long enough for the details to get kind of hazy, and then writing up an answer off the top of their head based on that, filling in any missing material by just making something up. They can do this significantly faster than a human undergraduate student might be able to, so if you need someone to do this task very quickly / prolifically this can be beneficial (e.g. this could be effective for generating banter for video game non-player characters, for astroturfing social media, or for cheating on student essays read by an overworked grader). It's not a good way to get expert answers about anything though.

More specifically: I've never gotten an answer from an LLM to a tricky or obscure question about a subject I already know anything about that seemed remotely competent. The answers to basic and obvious questions are sometimes okay, but also sometimes completely wrong (but confidently stated). When asked follow-up questions the LLM will repeatedly directly contradict itself with additional answers each as wrong as the first, all just as confidently stated.


More like "have already skimmed half of the entire Internet in the past", but yeah. That's exactly the mental model IMO one should have with LLMs.

Of course don't forget that "writing up an answer off the top of their head based on that, filling in any missing material by just making something up" is what everyone does all the time, and in particular it's what experts do in their areas of expertise. How often those snap answers and hasty extrapolations turn out correct is, literally, how you measure understanding.

EDIT:

There's some deep irony here, because with LLMs being "all system 1, no system 2", we're trying to give them the same crutches we use on the road to understanding, but have them move the opposite direction. Take "chain of thought" - saying "let's think step by step" and then explicitly going through your reasoning is not understanding - it's the direct opposite of it. Think of a student that solves a math problem step by step - they're not demonstrating understanding or mastery of the subject. On the contrary, they're just demonstrating they can emulate understanding by more mechanistic, procedural means.


Okay, but if you read written work by an expert (e.g. a book published by a reputable academic press or a journal article in a peer-reviewed journal), you get a result whose details were all checked out, and can be relied on to some extent. By looking up in the citation graph you can track down their sources, cross-check claims against other scholars', look up survey sources putting the work in context, think critically about each author's biases, etc., and it's possible to come to some kind of careful analysis of the work's credibility and assess the truth value of claims made. By doing careful search and study it's possible to get to some sense of the scholarly consensus about a topic and some idea of the level of controversy about various details or interpretations.

If instead you are reading the expert's blog post or hastily composed email or chatting with them on an airplane you get a different level of polish and care, but again you can use context to evaluate the source and claims made. Often the result is still "oh yeah this seems pretty insightful" but sometimes "wow, this person shouldn't be speculating outside of their area of expertise because they have no clue about this".

With LLM output, the appropriate assessment (at least in any that I have tried, which is far from exhaustive) is basically always "this is vaguely topical bullshit; you shouldn't trust this at all".


I am just curious about this. You said the word never, and I think your claim can be tested: perhaps you could post a list of five obscure questions, and then someone could put them to a good LLM for you, or to an expert in that field, to assess the value of the answers.

Edited: I just submitted an ASK HN post about this.


> I've never gotten an answer from an LLM to a tricky or obscure question about a subject I already know anything about that seemed remotely competent.

Certainly not my experience with the current SOTA. Without being more specific, it's hard to discuss. Feel free to name something that can be looked at.


The same is true of Google, no?


> A question asked to an AI is not reviewed by anyone, and it's ephemeral. The AI can answer "yes" today, and "no" tomorrow, so it's not possible to build a consensus on whether it answers specific questions correctly.

It's even more so with humans! Most of our conversations are, and has always been, ephemeral and unverifiable (and there's plenty of people who want to undo the little of permanence and verifiability we still have on the Internet...). Along the dimension of permanence and verifiability, asking an LLM is actually much better than asking a human - there's always a log of the conversation you had with the AI produced and stored somewhere for at least a while (even if only until you clear your temp folder), and if you can get ahold of that log, you can not just verify the answers, you can actually debug the AI. You can rerun the conversation with different parameters, different prompting, perhaps even inspect the inference process itself. You can do that ten times, hundred times, a million times, and won't be asked to come to Hague and explain yourself. Now try that with a human :).


The context of my comment was what is the difference between an AI and a history book. Or going back to the top comment, between an AI and an expert.

If you want to compare AI with ephemeral unverifiable conversations with uninformed people, go ahead. But that doesn't make them sound very valuable. I believe they are more valuable than that for sure, but how much, I'm not sure.


When I tried studying, I got really frustrated because I had to search for so many things, and not a lot of people would explain basic math things to me in a simple way.

LLMs already do a lot better job at this. A lot faster, accurate enough, and easy to use.

I can now study something alone, which I was not able to do before.


> accurate enough

Ask it something non-trivial about a subject you are an expert in and get back to me.


Accurate enough for it to explain to me details of 101, 201 and 301 university courses in math or physics.

Besides, when I ask it about things like SRE, cloud, etc., it's a very good starting point.


Sadly, I lack expertise. Do you have any concrete examples? How does, say, the wiki entry on the topic compare to your expert opinion?


Oh so you mean I have at my fingertips a tool that can generate me a Scientific American issue on any topic I fancy? That's still some non-negative utility right there :).


A Scientific American issue where the authors have no idea that they don’t know a topic so just completely make up the content, including the sources. At least magazine authors are reading the sources before misunderstanding the content (or asking the authors what the research means).

I don’t even trust the summaries after watching LLMs think we have meetings about my boss’s cat just because I mentioned it once as she sniffed the camera…


It's good to not trust it, but that's not the same as it having no idea. There is a lot of value in being close for many tasks!


I think it’s a very dangerous place to be in an area you’re not familiar with. I can read Python code and figure out if it’s what I want or not. I couldn’t read an article about physics and tell you what’s accurate and what’s not.

Legal Eagle has a great video on how ChatGPT was used to present a legal argument, including made up case references! Stuff like this is why I’m wary to rely on it in areas outside of my expertise.


There’s a world of difference between blindly trusting an LLM and using it to generate clues for further research.

You wouldn’t write a legal argument based on what some random stranger told you, would you?


> Oh so you mean I have at my fingertips a tool that can generate me a Scientific American issue on any topic I fancy?

I’m responding to this comment, where I think it’s clear that an LLM can’t even achieve the goal the poster would like.

> You wouldn’t write a legal argument based on what some random stranger told you, would you?

I wouldn’t, but a lawyer actually went to court with arguments literally written by a machine, without verification.


> I’m responding to this comment, where I think it’s clear that an LLM can’t event achieve the goal the poster would like.

I know it can't - the one thing it's missing is the ability to generate coherent and correct (and not ugly) ___domain-specific illustrations and diagrams to accompany the text. But that's not a big deal, it just means I need to add some txt2img and img2img models, and perhaps some old-school computer vision and image processing algos. They're all there at my fingertips too, the hardest thing about this is finding the right ComfyUI blocks to use and wiring them correctly.

Nothing in the universe says an LLM has to do the whole job zero-shot, end-to-end, in a single interaction.

> I wouldn’t but a lawyer actually went to court with arguments literally written by a machine without verification.

And surely a doctor somewhere tried to heal someone with whatever was on the first WebMD page returned by Google. There are always going to be lazy lawyers and doctors doing stupid things; laziness is natural for humans. It's not a valid argument against tools that aren't 100% reliable and idiot-proof; it's an argument for professional licensure.


Your entire argument seems to be “it’s fine if you’re knowledgeable about an area,” which may be true. However, this entire discussion is in response to a comment who is explicitly not knowledgeable in the area they want to read about.

All the examples you give require ___domain knowledge which is the opposite of what OP wants, so I’m not sure what your issue is with what I’m saying.


> Its good to not trust it but that's not the same as it having no idea. There is a lot of value in being close for many tasks!

The task is to replace Hazelcast with Infinispan in a stand-alone IMDG setup. You're interested in Locks and EntryProcessors.

ChatGPT 4 and o1 tell you, in their enthusiastic style, that Infinispan has all those features.

You test it locally and it does....

But the thing is, Infinispan doesn't have explicit locks in client-server mode, just in embedded mode, and that's something you find out from another human who has tried doing the same thing.

Are you better off using ChatGPT in this case?

I could go on and on and on about the times ChatGPT has bullshitted me and wasted days of my time, but hey, it helps with one-liners, and Copilot occasionally has spectacular method auto-complete, learns some stuff on the fly, and makes me cry when it remembers random tidbits about me that not even family members do.


Given I have never heard of any of {hazelcast, infinispan, IMDG, EntryProcessors}, even that kind of wrong would probably be an improvement by virtue of reducing the time I spend working on the wrong answer.

But only "probably" — the very fact that I've not heard of those things means I don't know if there's a potential risk from trying to push this onto a test server.

You do have a test server, and aren't just testing locally, right? Whatever this is?


> You do have a test server, and aren't just testing locally, right? Whatever this is?

Of course I didn't test in a client-server setup; that's why ChatGPT managed to fool me, because I know all those terms, and that was not the only alternative I looked up. Before trying Infinispan I tried Apache Ignite, and the API was the same for client-server and embedded mode; in Hazelcast the API was the same for client-server and embedded mode, so I just presumed it would be the same for Infinispan, AND I had ChatGPT reassuring me.

The takeaway about ChatGPT for me is -- if there's plenty of examples/knowledge out there, it's OK to trust it, but if you're pushing the envelope, the knowledge is obscure, there aren't many examples, DO NOT TRUST it.

DO NOT assume that just because the information is in the documentation, ChatGPT has the knowledge or insight and you can cut corners by asking ChatGPT.

And it's not even obscure information -- we've asked ChatGPT about the behavior of PostgreSQL batch upserts/locking and it also failed to understand how that works.

Basically, I cannot trust it on anything that's hard -- my 20 years of experience have made me wary of certain topics, and whenever those come up, I KNOW that I don't know, I KNOW that that particular topic is tricky, obscure, niche, and my output is low confidence, and I need to slow down.

The more you use ChatGPT, the more likely it is to screw you over in subtle ways; I remember being very surprised at how such subtle bugs could arise EXACTLY in the pieces of code I deemed very unlikely to need tests.

I know our interns/younger folks use it for everything, and I just hope there's some way to profit from people mindlessly using it.


> There is a lot of value in being close for many tasks!

horseshoes and hand-grenades?


Yes. Despite this apparently popular saying, "close enough" is sufficient in almost everything in life. Usually it's the best you can get anyway - and this is fine, because on most things, you can also iterate, and then the only thing that matters is that you keep getting closer (fast enough to converge in reasonable time, anyway).

Where "close" does not count, it suggests there's some artificial threshold at play. Some are unavoidable, some might be desirable to push through, but in general, life sucks when you surround yourself or enforce artificial hard cut-offs.


I notice that you've just framed most knowledge creation/discovery as a form of gradient descent.

Which it is, of course.
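For fun, a toy version of that framing (arbitrary numbers, just to show "close enough" arriving quickly):

    def gradient_descent(start, lr=0.1, steps=50):
        # Minimize f(x) = (x - 3)^2 by repeatedly stepping downhill.
        x = start
        for _ in range(steps):
            grad = 2 * (x - 3)   # derivative of (x - 3)^2
            x -= lr * grad       # move a little closer to the minimum
        return x

    print(gradient_descent(10.0))  # ~3.0001 -- never exact, but close enough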


So they have reached human level intelligence :D


Yes! But now you get a specific pop sci book _in any subject you want to learn about_ and _you can ask the book about comparisons_ (e.g. how were Roman and Parthian legal systems similar?). This at least gives you a bunch of keywords to go wild with in Wikipedia and publications (sci-hub! Cough! Sci-hub!)


(throwaway account because of what I'm about to say, but it needs to be said)

While my main use case for LLMs is coding just like most people here, there are lots of areas that are being ignored.

Did you know llama 3.X models have been trained as psychotherapists? It's been invaluable to dump and discuss feelings with it in ways I wouldn't trust any regular person. When real therapists also cost more than what people can afford (and will have you committed if you say the wrong thing), this ends up being a very good option.

And you know how escorts are traditionally known as therapists lite? Yeah, it works in reverse too. The main use case most are sleeping on is, well, emotional porn and erotic role play. Let me explain.

My generation (i.e. Z) doesn't do drugs, we don't drink, we don't go out. Why? Because we can hang on discord, play games, scroll tiktok and goon to our heart's content. 60% of gen Z men are single, 30% women. The loneliness epidemic hit hard along with covid. It's basically a match made in heaven for LLMs that can pretend to love you, like everything about you, ask you about your day, and of course, can sext on a superhuman level. When you're lonely enough, the fact that it's all just simulated doesn't matter one bit.

It's so interesting that the porn industry is usually on the forefront of innovation, adopting Blu-ray and HD DVD and whatnot before anyone else, but they're largely asleep on this, and so is everyone else who doesn't want to touch it with a 10ft pole. Well, except maybe c.ai to some extent. The business case is there, and it's a wide-open market that OAI, Anthropic, Google and the rest won't ever stoop down to themselves, so the bar for entry is far lower.

Right now the best experience is had by heading over to r/locallama and doing it yourself, but there are millions to be made for someone who improves on it and figures out a platform to sell it on in the next few years. It can be done well enough with existing, properly tuned, open-weight, Apache-licensed LLMs, and progress isn't stopping.


While I empathize with the therapeutic effects, wouldn't this create even more powerful echo chambers? Maybe so many men and women of your generation are single because of already established echo chambers.

It's in our nature to crave outside acceptance of who we are. But maybe, taken to the extreme, when we stop wanting to be challenged at all, we could lose touch with reality, society...


I don't think anyone is saying that porn is healthy or something anyone should consume. Or smoking or whatever, but unhealthy enjoyable things are trillion dollar industries regardless.

The thing is though, LLMs do whatever you tune them to do. If you train them on a sycophantic corporate drone butler dataset, you get the average assistant model that's obviously a bad fit for this use case. If you train them on something else, you get whatever you want, even someone that challenges you. I wouldn't be surprised if having some sort of simulated soulmate partner thing that also does the job of an educator and life guide will be the norm in the future.


> 60% of gen Z men are single, 30% women

I always do a double take when I read such statistics. How can they possibly add up? Are gen Z men considered particularly undesirable leading to lots of relationships with large age gaps? Is there a ridiculously large overhang of gay women (over men)? Is there a huge number of men with multiple partners?

These gender disparities are difficult enough to believe when they come to sexual relations, it gets even harder when talking about relationships.

I guess what I'm saying is: I don't believe those numbers as stated and would be interested in an explanation or at least a source.


I think I recall that being somewhat disputed because the relationship status was self-reported. Some suggested that men might not consider certain types of relationships serious but women do, so there's a disparity in reporting what is and isn't an actual relationship, and the reality might be more balanced. Sweden statistics, xd.

From what I can find after a brief search, there's this one [0] that claims 63% for men, 34% for women, and [1] there's a generally known toxicity around dating these days that makes these numbers entirely believable. I don't pretend to have a large enough network of acquaintances to make a good guess, but hardly anyone I know isn't single, and I know maybe two or three religious types that are actually married.

As for gen Z men being especially undesirable, there's well... [2].

[0] https://www.pewresearch.org/short-reads/2023/02/08/for-valen...

[1] https://old.reddit.com/r/GenZ/comments/1eo9bzj/interesting_b...

[2] https://www.ft.com/content/29fd9b5c-2f35-41bf-9d4c-994db4e12...


So are you saying, some gen Z men are in a relationship, but don't know it? I do buy that, it seems to be the basis of some rom-com plots. The clueless guy that doesn't know he's being reeled in.

Another factor, as the other post suggested: there are large age gaps. Women date older, men date younger. This is also long known. Does it add up to 60/30? That does seem high, but maybe with every other factor thrown in, it explains it?


> So are you saying, some gen Z men are in a relationship, but don't know it?

Or, you know, are leading women on.


Both could be happening. Guess if we are assigning some guilt, then it would depend on self awareness?


I do find these reported numbers hard to believe.

I could certainly invent explanations for them. For example, I can say "no man would date until they've earned enough money to buy a house." This means younger males won't be dating but that doesn't appear to describe the world we live in.

I could say "Every man who dates is dating 2 women" but that also doesn't appear to describe the world we live in.


> It's so interesting that the porn industry is usually on the forefront of innovation, adopting blueray and hddvd and whatnot before anyone else, but they're largely asleep on this

Isn’t that the result of major credit card companies banning certain uses, thus pruning branches from the tree of possible futures?

What we need is digital central bank money in some form, to get rid of that type of censorship.


I remember seeing an article discussed here on HN a while ago about OnlyFans creators using LLMs to automate the pretend personal relationship with paying fans.

Isn't that exactly what you suggest? A paid one-sided relationship that helps people feel better about themselves, with a bit of naughtiness mixed in.


Ah shit you're right, I forgot about that, yeah they are absolutely on it. I guess it makes more profit for people to believe that they're actually talking to a real person if they can't tell the difference anyway.


This seems to be part of a side plot in Blade Runner 2049.

The movie was about replicants of course, but in the background, the technology shown, with the AI being a companion, was a huge corporate hit, a big seller. In the background you see ads for it, and they reference it as their most popular product. And, as you allude to, in the movie it was both for loneliness AND sexual. They interacted like a relationship, with talking and hooking up.

I don't doubt that with current AI, something similar could be done. We're just missing the holograms.

And as you say, I'm sure the porn industry will catch on.

Kind of crazy how Porn isn't leading this tech wave like past ones. Maybe because people are scared of tracking?


What I liked about this in Blade Runner was that if replicants are "people" (more the topic of the first movie), then it's not much of a stretch to consider software AI as people, too. It would have been great if this question had been further explored in the 2nd movie instead of just accepted.


I thought it was, with CSAM.


There are other players in this space than c.ai. One of the more interesting (and apparently less cynical) ones is Nomi. Tinkering with personalities on their platform can be quite fascinating.

It is possible, for example, to create a cunning and manipulative schemer that is entirely devoted to mentoring you with no romantic component whatsoever.


How do you know the answers are correct?

More than once I have gotten eloquent answers that were completely wrong.


I give AI a “water cooler chat” level of veracity, which means it’s about as true as chatting with a coworker at a water cooler when that used to happen. Which is to say if I just need to file the information away as a “huh” it’s fine, but if I need to act on it or cite it, I need to do deeper research.


Yes, so often I see/hear people asking "But how can you trust it?!"

I'm asking it a question about social dynamics in the USSR, what's the worst thing that'll happen?! I'll get the wrong impression?

What are people using this for? are you building a nuclear reactor where every mistake is catastrophic?

Almost none of my interactions with LLMs "Matter", they are things I'm curious about, if 10 out of 100 things I learnt from it are false, then I learned 90 new things. And these are things which mostly I'd have no way to learn about otherwise (without spending significant money on books/classes etc.)


I try hard not to pollute my learning with falsehoods. Like, I really hate spending time learning BS; not knowing is way better than knowing something wrong.


If you don't care if it's correct or not you can also just make the stuff up. No need to pay for AI to do it for you.


Yes, but how do you know which is which?


That is also a broader epistemological question one could ask about truth on the internet, or even truth in general. You have to interrogate reality.


That's certainly true, but I think it's also true that you have more contextual information about the trustworthiness of what you're reading when you pick up a book, magazine, or load a website.

As a simple example, LLMs will happily incorporate "facts" learned from marketing material into their knowledge base and then regurgitate them as part of a summary on the topic.


How do you address this problem with people? More than once, a real live person has told me something that was wrong.


You can divide your approach to asking questions with people (and I do believe this is something people do):

1. You ask someone you can trust for facts and opinions on topics, but you keep in mind that the answer might only be right in 90% of cases. Also, people tend to tell you if they are not sure.

2. For answers you need to rely on, you ask people who are legally or professionally responsible if they give you wrong advice: doctors, lawyers, car mechanics, the police, etc.

ChatGPT can't lose its job if it informs you incorrectly.


If ChatGPT keeps giving you wrong answers wouldn’t this make paying customers leave? Effectively “losing its job”. But I guess you could say it acts more like the person that makes stuff up at work if they don’t know, instead of saying they don’t know.


There was an article here just a few days ago which discussed how firms can be ineffective and still remain competitive.

https://danluu.com/nothing-works/

The idea that competition is effective is often in spherical cow territory.

There are tons of real-world conditions which can easily let a firm be terrible at its core competency and still survive.


> But I guess you could say it acts more like the person that makes stuff up at work if they don’t know, instead of saying they don’t know.

I have had language models tell me they don't know. Usually when using a RAG-based system like Perplexity, but they can say they don't know when prompted properly.


I've seen Perplexity misrepresent search results and also interpret them differently depending on whether GPT-4o or Claude Sonnet 3.5 is being used.


I'm not sure about your local laws, but at least in Lithuania it's completely legal to give wrong advice (by accident, of course)... Even a notary specialist would at most have to pay a larger insurance premium for a while, because human error falls under professional insurance.


You are contradicting yourself. If the notary specialist needs insurance then there's a legal liability they are insuring against.

If you had written "notaries don't even get insurance because giving bad advice is not something you can be sued for" you would be consistent.


Experience. If I recognize they give unreliable answers on a specific topic I don’t question them anymore on that topic.

If they lie on purpose I don’t ask them anything anymore.

The real experts give reliable answers, LLMs don’t.

The same question can yield different results.


So LLMs are unreliable experts, okay. They're still useful if you understand their particular flavor of unreliability (basically, they're way too enthusiastic) - but more importantly, I bet you have exactly zero human experts on speed dial.

Most people don't even know any experts personally, much less have one they could call for help on demand. Meanwhile, the unreliable, occasionally tripping pseudo-experts named GPT-4 and Claude are equally unreliably-expert in every domain of interest known to humanity, and don't mind me shoving a random 100-page-long PDF in their face in the middle of the night - they'll still happily answer within seconds, and the whole session costs me fractions of a cent, so I can ask for a second, and third, and tenth opinion, and then a meta-opinion, and then compare & contrast with search results, and they don't mind that either.

There's lots to LLMs that more than compensates for their inherent unreliability.


> Most people don't even know any experts personally, much less have one they could call for help on demand.

Most people can read original sources.


Which sources? How do I know I can trust the sources that I found?


They can, but they usually don't, unless forced to.

(Incidentally, not that different from LLMs, once again.)


How do you even know what original sources to read?


There's something called a bibliography at the end of every serious book.


I am recalling CGP Grey's descent into madness due to actually following such trails through historical archives: https://www.youtube.com/watch?v=qEV9qoup2mQ

Kurzgesagt had something along the same lines: https://www.youtube.com/watch?v=bgo7rm5Maqg


And yet here you are making an unsourced claim. Should I trust your assertion of “most”?


It's not that black and white. I know of no single person who is correct all the time. And if I did know such a person, I still would not be sure, since they would outsmart me.

I trust some LLMs more than most people because their BS rate is much much lower than most people I know.

For my work, that is easy to verify. Just try out the code, try out the tool or read more about the scientific topic. Ask more questions around it if needed. In the end it all just works and that's an amazing accomplishment. There's no way back.


In my experience hesitating to answer questions because of the complexity of involved material is a strong indicator of genuine expertise linked with conscientiousness. Careless bullshitters like LLMs don't exhibit this behavior.


I can draw on my past experience of interacting with the person to assign a probability to their answer being correct. Every single person in the world does this in every single human interaction they partake in, usually subconsciously.

I can't do this with an LLM because it does not have identity and may make random mistakes.

LLMs also lack the ability to say "I don't know", which my fellow humans have.


It’s trivial to address this.

You ask an actual expert.

I don’t treat any water cooler conversation as accurate. It’s for fun and socializing.


Asking an expert is only trivial if you have access to an expert to ask!


And can judge which one is an expert and which one is bullshitting for the consultancy fee.


And as we've seen in the last few years, large chunks of the population do not trust experts.

Think this thread has gone from "how to Trust AI", to "how do we Trust Anything".


This is a true statement.

This is also not related to the problem being trivialized in the presented solution.

Lack of access to experts doesn’t improve the quality of water cooler conversations.


Well, if you’re a sensible person, you stop treating them as a subject matter expert.


And people just don't know what they don't know; they just answer with silliness the same way.


All you have to do is remember you’re asking your uncle Bob, a man of extensive, usually not-too-inaccurate knowledge.

There’s no reason a source has to be authoritative, just because it’s a computer.

It is a bit of an adjustment, though. We are used to our machines being accurate, or failing loudly.

But it looks like the future is opinionated machines.


So do teachers and books; in the future we need to have multiple variants to cross-check.


Cross-check against what? AI-generated texts will flood the internet and bury the real knowledge just like SEO did before. But this time the fake knowledge will be less obvious and harder to check.


If that turns out to be true, then it looks like AI just gave universities a new reason for being.

What a shift from twenty years ago, when the optimistic view was that "information superhighways" on the "world wide web" would end knowledge gatekeeping and educate the masses, to now: worries of AI slop and finely tuned ML algorithms frying older and younger generations' brains, while information of human value gets buried, siloed, and paywalled, with no way to verify anything at all.


Models from different vendors, plus Google search. For serious stuff, we'll still have to check manually ourselves.


You enable the search functionality.


There's something here that I feel is pretty deep, though offensive for some minds: What is the actual consequence of being wrong? Of not getting right the base reality of a situation?

Usually, stasis is an enemy much greater than false information. If people with 90% truth can take a step forward in the world, even if they mistakenly think they have 100% truth, what does it matter? They're learning more and acting more for that step taken. If the mistaken ground truth is false in an important enough way, they'll learn it, because their experience is grounded in the reality they navigate anyhow. If they don't learn it, it's of no consequence.

This is on my mind because I work in democratic reform, and I am acutely aware (from books like "Democracy for Realists", that eviscerate common assumptions about "how democracy works") that it often doesn't matter if we understand how democracy is working, so long as we feel like we do, enough to take steps forward and keep trying and learning. We literally don't even know how democracy works, and yet we've been living under it for centuries, to decent enough ends.

I think often about the research of Donald Hoffman. His lab runs evolutionary simulations, putting "creatures" that see "reality" (of the simulation) against creatures that see only "fitness" (the abstraction, but also the lie, that is more about seeing what gets the creature living to the next click of the engine, whether that's truth or falsehood about the reality). https://www.youtube.com/watch?v=oYp5XuGYqqY

Basically, creatures that see only fitness (that see only the lie), they drive to extinction every creature that insists on seeing "reality as it is".

I take this to mean truth is in no way, shape, or form favoured in the universe. This is just a convenient lie we tell ourselves, to motivate our current cultural work and preferences.

So tl;dr -- better to move forward and feel high agency with imperfect information than to wait for a fully truthful solution that might never come, or might come at such a high cost as to arrive too late. Those moving forward rapidly with imperfect information will perhaps drive to extinction those methods that insist on full grounding in reality.

Maybe this is always the way the world has worked... I mean, does any mammal before us have any idea how any of reality worked? No, they just used their senses to detect the gist of reality (often heuristics and lies), and operated in the world as such. Maybe the human sphere of language and thought will settle on similar ruthlessness.


Incorrect information by itself is at best useless. Incorrect information that is thought to be correct is outright dangerous. Objective truth is crucial to science and progress.

We've come too far since the age of enlightenment to just give it all up.


The hundred-year functioning of democracy begs to differ. It literally works nothing like how anyone tells themselves it does, and not just laypeople, but arguably even political scientists. It's quite possible that no echelon of society has had the correct story so far, and yet... (again, see "Democracy for Realists")

Also, the vision heuristics that brains use to help us monitor motion are another obvious example. They lie. They work. They won.

https://x.com/foone/status/1014267515696922624?s=46

> Objective truth is crucial to science

Agreed. That's how we define science: science is about truth regarding base reality.

> Objective truth is crucial to [...] progress.

More contentious imho. Depends if progress is some abstract human ideal that we pursue, or simply "survival". If it's the former, maybe objective truth is required. If it's the latter, I find the simulation evidence to be that over-adherence to objective truth (at least information-theoretically) is in fact detrimental to our survival.


> “My father once told me that respect for truth comes close to being the basis for all morality. 'Something cannot emerge from nothing,' he said. This is profound thinking if you understand how unstable 'the truth' can be.”

Frank Herbert, Dune


Yes! There’s no ‘element’ of truth. Funnily enough, this isn’t a philosophical question for me either.

The industrialization of content generation, misinformation, and inauthentic behavior are very problematic.

I’ve hit on an analogy that’s proving very resilient at framing the crossroads we seem to be at - namely the move to fiat money from the gold standard.

The gold standard is easy to understand, and fiat money honestly seems like madness.

This is really similar to what we seem to be doing with genAI, as it vastly outstrips humanity’s capacity to verify.

There are a few studies out there that show that people have different modes of content consumption. A large chunk of content consumption is for casual purposes, without any desire to get mired in questions of accuracy. About 10% of the time (some small percentage; I don’t remember the exact figure) people care about the content being accurate.


The ability to "talk to an expert" on any topic would indeed have been very useful. Sadly, we have the ability to talk to something which tries very very hard to appear as an expert despite knowing nothing about the subject. A human who knows some things pretty well but will talk about stuff they don't know with the same certainty and authority as they walk about stuff they know is a worthless conversation partner. In my experience,"AI" is that but significantly worse.


Semi-related, but I find that sometimes it just completely ruins a type of conversation.

Like in your example, I would previously have asked people "how would 911 handle a US reservation area?" and watched how my friends think and reason. To me, getting a conclusive answer was not the point. Now they just copy and paste from ChatGPT, no fun haha.


That's just the 2020s version of how Google and smartphones ruined the ages-old social pastime of arguing about trivia in a pub :P


Yeah it can definitely be a crutch too in some situations. I notice it with my kids where they’ll want to tell me about something but then seek a video or something to show it.

Sometimes I have to say “no! just use your words to describe it! I want to hear your description”


I think it's good of you to make them critically engage with the subject by verbalizing it themselves. Evidence suggests that video consumption is relatively un-engaging mentally, likely as it demands nothing of you.


For me the problem is that you always need to double-check this particular type of expert, as it can be confidently wrong about pretty much any topic.

It's useful as a starting point, not as a definitive expert answer.


What human experts do you blindly trust without double checking?


Most human experts, when asked about their area of expertise, don't parrot what some guy said as joke on Reddit five years ago.

Most lawyers, when you ask them to write a brief, will cite only real cases.


I coined the term "fancy cruise control" on Reddit, as a joke, to describe Autopilot. One of the mods of the self-driving car sub thought the term was so funny he made a joke subreddit for it. A few years later, Tesla lawyers invoked the term in court to downplay the capabilities of Autopilot.


"Most" is the key word here. In my experience that's also the case for LLMs.


LLM proponents really have succeeded in moving the overton window on this discussion. "Sure, you cannot trust LLMs, but you cannot trust humans, either".


I don’t think “Overton window” works in that construction. It typically refers to the range of politically acceptable opinions.

LLMs are too new to have such a thing. It sounds like you’re an “LLM opponent” (whatever that means) who believes the appropriate standard is infallibility? I don’t even get that line of thinking, but you’re welcome to it. But let’s not pretend this is a decades-long topic with a social consensus that people try to influence.


I didn't mean Overton window in a political sense (I'm not a native English speaker). It's more about moving the goalposts, maybe.

> I don’t even get that line of thinking, but you’re welcome to it

I would not say "LLM oponent". Rather "LLM critic". I'm not against LLMs as a technology. I'm worried about how the technology is deployed and used, and what the consequences are. Specifically, copyright issues, power use issues, inherent biases in the traning data that strengthen existing discrimation against minorities, raciscm and sexism. I'm not convinced by the hype created by LLM proponents (mostly investors and other companies and people who financially benefit from LLMs). I'm not saying that machine learning doesn't bring any value or does not have use cases. I'm talking more about the recent AI/LLM hype.


Most of them. Are you constantly doing validation studies for every piece of information you take in? If the independent experts tell me that a new car is safe to drive, then I trust them.


> The ability to “talk to an expert” about any topic I’m curious about and ask very specific questions has been invaluable to me.

Even the ability to talk to a university work placement student/intern on any topic is very useful, never mind true experts.

Even Google's indexing and Wikipedia opened up a huge quantity of low-hanging fruit for knowledge sharing. Even to the extent that LLMs must be treated with caution because their default mode is over-confident, and even to the extent one can call them a "blurry JPEG of the internet", LLMs likewise make a lot of low-hanging fruit available before we get to an AI that reasons more like we do from limited examples.


Libraries and books were pretty cool too though. You could go to a library and find information on anything and a librarian would help you. Not super efficient but good for humans.


Talk to an expert? You are aware of them hallucinating right?


I've been "talking" quite a bit with Ollama models, they're often confidently wrong about Wikipedia level stuff and even if the system prompt is explicitly constrained in this regard. Usually I get Wikipedia as understood by a twelve year old with the self-confidence of adult Peter Thiel. If it isn't factually wrong, it's often subtly wrong in the way that a cursory glance at some web search results is unlikely to rectify.

It takes more time for me to verify the stuff they output than to grab a book off Anna's Archive or my paid collections and look something up immediately. I'd rather spend that time making notes than waiting for the LLM to respond and double-checking it.


> For insanely curious people who often feel unsatisfied with the answers given by those around them, it’s the greatest thing ever.

As an insanely curious person who's often unsatisfied with the answers given by those around me, I can't agree. The greatest thing ever is libraries. I don't want to outsource my thinking to a computer any more than I want to outsource it to the people around me.


In the not so distant past we already had a tool that allowed us to look up any question that came into our minds.

It was super fast and always provided you with sources. It never hallucinated. It was completely free except for some advertisement. You could build a whole career out of being good at using it.

It was a search engine. Young people might not remember but there was a time when Google wasn't shite but actually magic.


> we already had a tool that allowed us to look up any question that came into our minds … It never hallucinated. … It was a search engine.

Except for all the times the search results were wrong answers.

https://searchengineland.com/when-google-gets-it-wrong-direc...


Being biased is not the same as hallucinating. LLMs have both problems.

At least you could check whether a source was reputable and where the bias was. With LLM's the connection between the answer and the source is completely lost. You can't even tell why it answered a certain way.


> Being biased is not the same as hallucinating. LLMs have both problems.

I didn't deny either of those things, I said that search engines also hallucinate — my actual link gave several examples, including "King of the United States" -> "Barack Obama".

Just because it showed the link to breitbart doesn't mean it was not hallucinating.

> At least you could check whether a source was reputable and where the bias was.

The former does not imply the latter. You could tell where a search engine got an answer from, but not which answers were hidden — an argument that I saw some on the American right make to criticise Google for failing to show their version of events.

> With LLM's the connection between the answer and the source is completely lost. You can't even tell why it answered a certain way.

Also not so. The free version of ChatGPT supports search directly, so it allows you to have references.


> I said that search engines also hallucinate — my actual link gave several examples

They don't. Google added a weird widget that does hallucinate, but the result list is still accurate, even though it may be biased towards certain sources.

> You could tell where a search engine got an answer from, but not which answers were hidden

A bit pedantic, but a search engine returns a list of results according to the query you posted. There's no question-answer oracle. If you type "King of the United States", you will get pages that have the terms listed. Maybe there will be semantic manipulations like "King -> Head of state -> President", but generally it's on you to post the correct keywords.


One of my favorite successes was getting an LLM to write me a program to graph how I subjectively feel the heat of steam coming off of the noodles I'm pouring the water out from as a function of the ambient temperature.

I was wondering which effects were at play and the graph matched my subjective experience well.


I mostly feel sorry for grandpa; he'll receive far fewer of these questions, if any. This is partially because I expect to become this grandpa, and I already suspect that some people aren't asking me questions they would be asking if they had no access to ChatGPT.


> And most of those resulted in google searches to verify the information. But I literally could never do this before.

Could you elaborate on this? What happened before when you had that type of question? What was stopping you from typing "911 emergency indian reservation" into Google and learning that the "Prairie Band Potawatomi Nation" has their own 911 dispatch?

In my youth, before the internet was everywhere, we were taught that we could always ask the nearest librarian and that they would help us find some useful information. The information was all there, in books, the challenge was to know which books to read. As I got older, and Google started to become more available, we were taught how to filter out bad information. The challenge shifted from finding information into how not to find misinformation.

When I hear what you say here, I'm reminded of that shift. There doesn't seem to be any fundamental change there, except maybe that it makes it harder not to find misinformation by obscuring the source of the information, which I was taught was an important indicator of its legitimacy.


The change is that when I am immersed in that scenario (on holiday and without any normal life distractions so I can truly learn about this topic), then my mind is the most curious about that topic.

The alternative is that when I return from vacation, I get back to the busy life and am only reminded of these questions in casual conversations.


How do you know the AI didn't hallucinate the answers? For topics like these, where there is little information available, the probability of hallucination is very high.


The amount of value creation is off the scale. It's like when people started using Google, or Google maps.


At this point I think even the most bearish have to concede that LLM's are an amazing tool. But OpenAI was never supposed to be about creating tools. They're supposed to create something that can completely take over entire projects for you, not just something that can help you work on the projects faster. If they can't pull that off in the next year or two, they're gonna seriously struggle to raise the next 10B they'll need to keep the lights on.

Of course LLMs aren't going anywhere, but I do not envy Sam Altman right now.


At this point it’s quite likely that they could pivot and just be the chatgpt company. I’ve found chatgpt-4o with web search and plugins to be more useful than o1 for most tasks.

It’s possible we’re nearing the end of the LLM race, but I doubt that’s the end of the AI story this decade, or OpenAI.


Ya I think they probably will, but "the chatgpt company" is not worth 157B. It might not even be worth 1B.


I'd be hard-pressed to come up with a valuation under 30B based on the publicly known finances. OpenAI is certainly crushing the metrics of other highly valued startups like Snowflake and Databricks.

The cash burn and the claim of imminent AGI are where the valuation trouble could be.


We’ve barely seen the first wave of companies being built on their APIs, too. The billions being put into thousands of startups will take around five years to hit full scale.


It has replaced ~50% of my Google searches.


Yes but it also hasn't been attacked by ads yet. Google doesn't suck for lack of search results, it sucks because of ads.

Imagine asking chatgpt to tell you about slopes in Colorado, and the first five answers are about how awesome North Face is and how you can order from them. You probably wouldn't use it as much.


Local models are GOOD as well, and easy to use (ollama + open web ui). OpenAI has to perform a huge trick in order to stay relevant.


Does ChatGPT need ads? I feel as though people are much more willing to pay for the service than they are to pay for a Google search.


Yes but remember we’re in a tech bubble and the average person still doesn’t know what ChatGPT is.


Not worth 1B? Come on, man. I see them improving the tool enough that most people would be willing to pay $50 a month for a subscription, and most companies would be willing to pay $300 per employee. It's perhaps not there yet, but I'm sure they'll reach that amount of value with their offering. It remains to be seen what competition will do to the prices, though.


The market of people willing to pay $50 a month for OAI vs $0/month for one of the open source LLAMA variants is not large enough to justify their current valuation, imo


I'm not that familiar with the open source ones - how good are they in comparison?


It doesn't really matter how much people are willing to pay. It matters how much margin the market will allow you to charge. OpenAI may be a bit better than most competitors most of the time (IMO they keep getting leap-frogged by Anthropic et al. though), but if your customers can get 90% of the value for 50% less, they will bail. There is no moat. Margins will be razor thin. That's not a 1B+ company.


I think the difference between 90% and 95% is huge. As a coder, if the LLM is wrong 10% of the time that's pretty bad, I can't really trust it. If it's wrong 5% of the time, still not great but much better - I'd pay much more for that kind of reliability improvement.


Depends on the competition of course. They need an edge to stop me going to the bald guy and running it there.


I keep thinking about that Idris Elba Microsoft ad about how much AI can help my business, how both true and untrue that is, and how much distance there is between the now and the possible promise of AI. I imagine this is what keeps Altman up at night.


Tesla is still valued highly, even though FSD never came despite being promised. So OpenAI would get away with just delivering ChatGPT 5, if it is better than the competition.


Tesla is profitable and they have a big technological moat. OpenAI is in a very competitive industry and they burn ~5B a year.


I believe the car industry is somewhat competitive as well, and they needed almost 10 years to become profitable.


Sure, but if you want to compete with Tesla, you need many billions in funding and 10+ years to catch up. If you want to compete with OpenAI, you need maybe half a billion (easy to raise in the current climate; many have done so) and maybe a few months to catch up.


I've been thinking the same thing lately. Even if we don't get to AGI, LLMs have revolutionized the way I work. I can produce code and copy at superhuman speeds now. I love it. Honestly, if we never get to AGI and just have the LLMs, it's probably the best possible outcome, as I don't think true AGI is going to be a good thing for humanity.


That's all fine, but I think you are missing the bigger picture. It's not about whether what we already got out of this is good. Of course it is. But this is about where it's going.

Until about 120 years ago, people were happy with horses and horse carriages. Such a great help! Travel long distances, pull weights, I never want to go back! But then the automobile was invented and within a few years little travel was done by horses anymore.

More recently, everybody had a landline phone at home. Such great tech! Talk to grandma hundreds of miles away! I never want to go back! Then suddenly the mobile phone, and shortly after the smartphone, came along; now nobody has a landline anymore, but everybody can record TikToks anywhere, anytime, and share them with the world within seconds.

Now imagine "AI". Sure, we have some new tools right now. Sure we don't want to go back. But imagine the transformative effects that could come if the train didn't stop here. Question is just: will it?


Amen. Everyone is talking about plateaus and diminishing returns on training but I don’t care one bit. I get that this is a startup focused forum and the financial sustainability of the market players is important but I can’t wait to see what the next decade of UX improvements will be like even if model improvements slow to a crawl.


As a consumer you should always evaluate the product that is in front of you, not the one they promise in 6 months. If what's there is valuable to you, then that's great.

When we discuss potential AGI, we're not talking as consumers; we're talking about the business side. If AGI is not reached, you'll see an absolutely enormous market correction as the market realizes that the product is not going to replace any human workers.

The current generation of products are not profitable. They're investments towards that AGI dream. If that dream doesn't happen, then the current generation of stuff will disappear too, as it becomes impossible to provide at a cost you'd be comfortable with.


Human workers have already been replaced.


This is me. If things never improve and Sonnet 3.6 is the best we have… I'm fine. It's good enough to drastically improve productivity.


> completely local AI research tool, based on Ollama

Could you elaborate? Was it easy to install?



Not OP, but yeah, ollama is super easy to install.

I just installed the Docker version and created a little wrapper script which starts and stops the container. Installing different models is trivial.

I think I already had CUDA set up, not sure if that made a difference. But it's quick and easy. Set it up, fuck around for an hour or so while you get things working, then you've got your own local LLM you can spin up whenever you want.
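
For reference, the wrapper really is tiny. Here's a minimal sketch of the idea, assuming the official ollama/ollama Docker image and the default API port 11434 (the container name, volume, and script name are arbitrary, and GPU flags are left out):

    #!/usr/bin/env python3
    # ollama_ctl.py - minimal start/stop wrapper for an Ollama container (sketch).
    # Assumes the official ollama/ollama image and the default API port 11434.
    import subprocess
    import sys

    CONTAINER = "ollama"  # arbitrary container name

    def start() -> None:
        # Reuse the container if it already exists, otherwise create it.
        exists = subprocess.run(["docker", "inspect", CONTAINER],
                                capture_output=True).returncode == 0
        if exists:
            subprocess.run(["docker", "start", CONTAINER], check=True)
        else:
            subprocess.run(["docker", "run", "-d", "--name", CONTAINER,
                            "-v", "ollama:/root/.ollama",  # persist downloaded models
                            "-p", "11434:11434",           # default Ollama API port
                            "ollama/ollama"], check=True)

    def stop() -> None:
        subprocess.run(["docker", "stop", CONTAINER], check=True)

    if __name__ == "__main__":
        cmd = sys.argv[1] if len(sys.argv) > 1 else ""
        if cmd == "start":
            start()
        elif cmd == "stop":
            stop()
        else:
            print("usage: ollama_ctl.py start|stop")

Once the container is up, "docker exec -it ollama ollama run <model>" (or any client pointed at port 11434) works as usual.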


Does ollama still execute whatever arbitrary python code is in the model?


VSCode + the Cline extension + Gemini 2.0 is pretty awesome. Highly recommend checking out Cline; it quickly became one of my favorite coding tools.


Gemini 2.0 isn't particularly great at coding. The Gemini 1206 preview that was released just before 2.0 is quite good, though. Still, it hasn't taken the crown from Claude 3.5 Sonnet (which appears to now be tied with o1). Very much agree about Cline + VSCode, BTW. My preferred models with Cline are 3.5 Sonnet and 3.5 Haiku. I can throw the more complex problems at Sonnet and use Haiku for everything else.

https://aider.chat/docs/leaderboards/edit.html


In the wake of the o1 release, and with the old aider benchmark saturating, Paul from aider has created a new, much harder benchmark. o1 dominates by a substantial margin.

https://aider.chat/docs/leaderboards/ https://aider.chat/2024/12/21/polyglot.html


the context limits on google are nuts! Being able to pump 2 million tokens in and having it cost $0 is pretty crazy rn. Cline makes it seamless to switch between APIs and isn't trying to shoehorn their SaaS AI into a custom VSCode (looking at you, Cursor).


>the context limits on google are nuts! Being able to pump 2 million tokens in and having it cost $0 is pretty crazy rn.

What's the catch though? I was looking at Gemini recently and it seemed too good to be true.


Your code becomes training data[0]:

> When you use Unpaid Services, including, for example, Google AI Studio and the unpaid quota on Gemini API, Google uses the content you submit to the Services and any generated responses to provide, improve, and develop Google products and services and machine learning technologies, including Google's enterprise features, products, and services, consistent with our Privacy Policy.

[0] https://ai.google.dev/gemini-api/terms


Google's inference is a lot cheaper since they have their own hardware and don't have to pay NVIDIA, so their free tier can give you much more than others can.

Other than that, the catch is the same as with all other free tiers: it's marketing, and it can be withdrawn at any moment, to get you to pay once you're used to their product.


I will check it out. The number of new tools is staggering.

I enjoy image and video generation and I have a 4090 and ComfyUI; I can't keep up with everything coming out anymore.


If you're interested in the latest tools for coding, join this subreddit and you'll always be on top of it:

https://www.reddit.com/r/ChatGPTCoding/

There are a lot of tools, but only a small pool of tools that are worth checking out. Cline, Continue, Windsurf, CoPilot, Cursor, and Aider are the ones that come to mind.


"ChatGPT" Coding... is it impartial? the name sorta sounds biased.


ChatGPT was the first to come along, so the subreddit was given a perhaps short-sighted name. It's now about coding with LLMs in general.


If you're an offline kind of guy, try LM Studio + Cline :)

/not affiliated with cline, just a happy user


Curious about the AI research tool you mentioned, would you mind sharing it? Been trying to get a good local research setup with Ollama but still figuring out what works best.



Not OP, but based on their mention of Ollama, I can tell you that it has built-in search tools; all you need to do is supply an API key for one of the tools, or even run one of the search tools locally using Docker.


I have the opposite reaction.

AI right now feels like that MBA person at work.

They don’t know anything.

But sounding like they speak with authority & confidence allows them to get promoted at work.

(While all of the experts at work roll their eyes because they know the MBA/AI is just spitting out nonsense & wish the company never had any MBA/AI people)


And the MBA person (at my company this is everyone in middle management) is also the person who goes around and suggests we shoehorn AI into everything...


I'm pretty sure the plan has never been to just make these tools that make us more efficient. If AI stays at the level it's at, it would be a profound failure for companies like OpenAI. We're all benefiting from the capital being poured into these technologies now. The enshittification will come. The enshittification always comes.


I’m paying $240 a year to Anthropic that I wasn’t paying before, and it’s worth it. While I don’t use Claude every single day, I use it several times a day when I’m working. More times than the free tier allows.


Why do people say this like it's a refutation? Current valuations and investments were not based on getting a very small group of nerds (affectionately) on HN to pay $250/yr, which probably doesn't even cover inference costs for the models, let alone training and R&D.


> Just today I used a completely local AI research tool, based on Ollama. It worked great.

Is it on github?


Can you walk me through the steps you've taken to set up the Ollama-based tool so far?


Cline was fixing my type errors and unit tests while I was doing my V60 pourover.


If progress in capabilities stalls, then product fit, adoption, and ease of use are the next battlefield.

OpenAI may be the first to realize this and switch, so they still have a chance to recoup some of those billions.


I feel like AGI is an arbitrary line in the sand anyway.

I think as humans we put too much emphasis on what intelligence means relative to ourselves, instead of relative to nature.


> Just today I used a completely local AI research tool, based on Ollama. It worked great

What’s it called? Could you post a link please?

Thank you



At this point, most conceivable beneficial use cases for LLMs have been covered. If the economics of AI tech were aligned with making a good product that people want and/or need, we'd basically take everything we have at this point and make it lighter, smaller, and faster. I doubt that's what will happen.


The definition of AGI is a linguistic problem, but people confuse it with a philosophical problem. Think about it: the term is basically just a classification, and which features and qualities fit the classification is an arbitrary, linguistic choice.

The debate stems from a delusion and a failure to realize that people are simply picking and choosing different fringe features for what qualifies as AGI. Additionally, the term exists in a fuzzy state inside our minds as well. It's not that the concept is profound; it's that we aren't sure about some of the features that define the classification. But this doesn't matter, because we are basically just unsure about the definition of a term that we made up arbitrarily.

For example, the definition of consciousness seems like a profound debate, but it's not. The word "consciousness" is a human invention, and the definition is vague because we have chosen to leave it ill-defined and controversial.

Much of the debate on this stuff is, as I stated, purely a language issue.


If it's genuinely what you say, then how is what is going on not slavery?

I don't believe AGI is possible, but if it were, and if what is and isn't conscious were as subjective as you say, then it starts to take on an even more altogether evil character.

Akin to cloning slave humans or something for free cheap labor.


How does a linguistic and language issue relate to slavery? It's the definition of a word. That's all.

Slavery is also a word. Don’t you find it strange that your entire moral framework is constructed on the basis of arbitrary definitions of vocabulary? Base what you think is right or wrong on something other than language. Language is a delusion that masquerades as something with actual meaning when it is just an invention, a tool, to facilitate communication.

Right now your concept of right and wrong is a vocabulary issue. Does this make sense? No.


Local search with Ollama? Please share!


They’re garbage, they will always be garbage. Changing a 4 to a 5 will not make it not garbage.

The whole sector is a hype bubble artificially inflating stock prices.



If that’s supposed to be impressive, it really isn’t.


What was the AI search tool?


How do you interact with Perplexity? Mobile app?


Let's revisit this comment in one year – after the explosion of agentic systems. (:


You mean, the explosion of human-centipede LLM prompts shitting into each other?

Yes that will be a sight to behold.


We already have agentic systems; they're not particularly impressive [1].

There's no specific reason to expect them to get better.

Things that will shift the status quo are: MCTS-LLMs (like with ARC-AGI), much bigger LLMs (like GPT-5, if they ever turn up), or some completely novel architecture.

[1] - It's provable; if just chaining LLMs of a particular size into agentic systems could scale indefinitely, then you could use a 1-param LLM and get AGI. You can't. QED. Chaining LLMs with agentic systems has a capped maximum level of function, which we basically already see with the current LLMs.

I.e., adding 'agentic' to your system has a finite, probably already reached, upper bound of value.


> It's provable; if just chaining LLMs of a particular size into agentic systems could scale indefinitely, then you could use a 1-param LLM and get AGI. You can't. QED.

Perhaps I misunderstand your reply, but that has not been my experience at all.

There are 3 types of "agentic" behaviour that has worked for a while for me, and I don't know how else it would work without "agents":

1. Task decomposition - this was my manual flow since pre-ChatGPT models: a) provide an overview of topic x with chapter names; b) expand on chapters 1 ... n; c) make a summary of each chapter; d) make an introduction based on the summaries. I now have an "agent" that does that with minimal scripting and no "libraries", just a pure Python control loop (a rough sketch follows after this list).

This gets me pretty reasonable documents for my daily needs.

2. Tool use (search, DB queries, API hits). I don't know how you'd use an LLM without this functionality, and chaining them into flows absolutely works.

3. Coding. I use the following "flow": input a paragraph or two about what I want, send that plus some embedding-based context from the codebase to an LLM (3.5 or 4o, recently o1 or Gemini) -> get code -> run code -> /terminal if error -> paste results -> re-iterate if needed. This flow really works today, especially with 3.5. In my testing it needs somewhere under 3 "iterations" to "get" what's needed in more than 80% of cases. I intervene in the remaining 20%.
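
Here's a rough sketch of the control loop from point 1. It's simplified, not my exact code; ask_llm() is a stand-in for whatever chat-completion call you actually use, and the prompts are illustrative:

    # Simplified task-decomposition loop; ask_llm() is a placeholder
    # for your chat-completion call of choice.

    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("wrap your chat API here")

    def write_document(topic: str, n_chapters: int = 5) -> str:
        # a) overview: ask only for chapter titles, one per line
        outline = ask_llm(
            f"Outline a document about '{topic}' as {n_chapters} chapter titles, one per line."
        )
        chapters = [line.strip() for line in outline.splitlines() if line.strip()]

        bodies, summaries = [], []
        for title in chapters:
            # b) expand each chapter
            body = ask_llm(f"Write the chapter '{title}' for a document about '{topic}'.")
            bodies.append(title + "\n\n" + body)
            # c) summarise it for later
            summaries.append(ask_llm("Summarise this chapter in two sentences:\n\n" + body))

        # d) introduction built from the chapter summaries
        intro = ask_llm("Write an introduction based on these chapter summaries:\n\n"
                        + "\n".join(summaries))
        return "\n\n".join([topic, intro] + bodies)

The same control-loop shape extends to points 2 and 3: swap some prompt steps for tool calls, or add a run-the-code-and-paste-the-error step.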


A Zed user? Love that editor and the dev flow with it.


Haha, yes! I'm trying it out and have been loving it so far. I find that I go there for most of my EDA scripts these days. I do a lot of dataset collection and exploration, and it's amazing that I can now type one paragraph and get pretty much what would have taken me ~30 minutes to code myself. Claude 3.5 is great for most exploration tasks, and the flow of "this doesn't work /terminal" + Claude using prints to debug is really starting to come together.

I use Zed for this, Cursor for my more involved sessions, and Aider + VSCode + Continue for local stuff when I want to see how far along local models have come. Haven't tried Cline yet, but I've heard great stuff.


I didn’t say they don’t work, I said there is an upper bound on the function they provide.

If a discrete system can be composed of multiple LLMs, the upper bound on the function it provides is set by the function of the LLM, not the number of agents.

Ie. We have agentic systems.

Saying “wait till you see those agentic systems!” is like saying “wait til you see those c++ programs!”

Yes. I see them. Mmm. Ok. I don’t think I’m going to be surprised by seeing them doing exactly the same things in a year.

The impressive part in a year will be the non-agentic part of things.

I.e., explicitly: if the underlying LLMs don't get any better, there is no reason to expect the system built out of them to get any better.

If that was untrue, you would expect to be able to build agentic systems out of much smaller LLMs, but that overwhelmingly doesn’t work.


> if the underlying LLMs don't get any better, there is no reason to expect the system built out of them to get any better.

Actually, o1 and o3 are doing exactly this, and very well. I.e., explicitly: with proper orchestration the same LLM can do a much better job. There is a price, but...

> you would expect to be able to build agentic systems out of much smaller LLMs

Good point; it should be possible to do it on a high-end PC or even on embedded hardware.


> but that overwhelmingly doesn’t work.

MCTS will be the next big “thing”; not agents.


They are not mutually exclusive. Likely we'll get a clearer separation of architecture and underlying technology. In this case agents (i.e. the architecture) can use different technologies or a mix of them, including 'AI' and algorithms. The trick is to make them work together.



