tkellogg's comments

The R1 paper did it as well. Agreed, it's always very interesting.


It's just an example, but it's great to see smolagents in practice. I wonder how well the import whitelist approach works for code interpreter security.
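
For anyone curious, the basic shape of an import whitelist is something like this (a minimal sketch of the general idea, not smolagents' actual implementation; the names and whitelist are made up):

  import ast

  ALLOWED = {"math", "json", "re"}  # hypothetical whitelist

  def check_imports(code: str) -> None:
      # Reject generated code that imports anything outside the whitelist.
      for node in ast.walk(ast.parse(code)):
          if isinstance(node, ast.Import):
              roots = [alias.name.split(".")[0] for alias in node.names]
          elif isinstance(node, ast.ImportFrom):
              roots = [(node.module or "").split(".")[0]]
          else:
              continue
          for root in roots:
              if root not in ALLOWED:
                  raise ImportError(f"import of '{root}' is not allowed")

  check_imports("import os; os.system('ls')")  # raises ImportError

Of course, an AST-level check like this doesn't stop __import__ or exec tricks on its own, which is presumably why people still reach for VMs and sandboxes on top.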


I know part of the point of this is running things locally, but for agent workflows like this, isolation seems like a solved problem: just run it on a throwaway VM. There are lots of ways to do that quickly.


A VM is not the right abstraction because of its performance and resource overhead. VMs are used because nothing else exists that provides the same or better isolation. Using a throwaway VM for each AI agent would be highly inefficient (think wasted compute and other resources, which is the opposite of what DeepSeek exemplified).


To which performance and resource requirements are you referring? A cloud VM runs as long as the agent runs, then stops running.


I mean the performance overhead of an OS process running in a VM (vs. no VM), plus the additional resource requirements of running a VM, including memory and an extra kernel. You can pull relevant numbers from academic papers.


A linear bar graph comparing compute/memory requirements?

  - OS process
  - virtual machine
  - LLM inference
Could have longevity as a PC master race meme template.


OK. Thanks for clarifying. I think you're pretty wrong on this one, for what it's worth.


Is “DeepSeek” going to be the new trendy way of saying “don’t be wasteful”? I don’t think DS is a good example here, mostly because it’s a trendy thing and the company still has $1B in capex spend to get there.

Firecracker has changed the nature of “VMs” into something cheap and easy to spin up and throw away while maintaining isolation. There’s no reason not to use it (besides complexity, I guess).

Besides, the entire rest of this is a Python notebook. With headless browsers. Using LLMs. This is entirely setting silicon on fire. The overhead from a VM is the least of the compute efficiency problems. Just hit a quick cloud API, run your Python or browser automation in isolation, and move on.


I think you are assuming that inference happens on the same machine/VM that executes code generated by an AI agent.


I'm not even talking about Firecracker; for the duration of time things like these run, you could get a satisfactory UX with basic EC2.
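
Something like this is roughly all it takes (a hedged boto3 sketch; the AMI ID and instance type are placeholders):

  import boto3

  ec2 = boto3.client("ec2", region_name="us-east-1")

  # Launch a throwaway instance for the agent run (AMI ID is a placeholder).
  resp = ec2.run_instances(
      ImageId="ami-0123456789abcdef0",
      InstanceType="t3.small",
      MinCount=1,
      MaxCount=1,
  )
  instance_id = resp["Instances"][0]["InstanceId"]
  ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

  # ... copy the agent's code over (SSM, SSH, user-data) and let it run ...

  # Tear everything down once the agent finishes.
  ec2.terminate_instances(InstanceIds=[instance_id])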


The rise of captchas on regular content, no longer just for posting content, could ruin this. Cloudflare and other companies have set things up so that access goes through a few hand-selected scrapers, and only they will be able to offer AI browsing and research services.


I think the opposite problem is going to occur with captchas for whatever it's worth: LLMs are going to obsolete them. It's an arms race where the defender has a huge constraint the attacker doesn't (pissing off real users); in that way, it's kind of like the opposite dynamics that password hashes exploit.


I’m not sure about that. There’s a lot of runway left for obstacles that are easy for humans and hard/impossible for AI, such as direct manipulation puzzles. (AI models have latency that would be impossible to mask.) On the other hand, a11y needs do limit what can be lawfully deployed…


> There’s a lot of runway left for obstacles that are easy for humans and hard/impossible for AI, such as direct manipulation puzzles.

That's irrelevant. Humans totally hate CAPTCHAs and they are an accessibility and cultural nightmare. Just forget about them. Forget about making better ones, forget about what AI can and can't do. We moved on from CAPTCHAs for all those reasons. Everyone else needs to.


Agreed. When I open a link and get a Cloudflare CAPTCHA I just close the tab.


We eliminated all CAPTCHA use at Cloudflare in September 2023: https://blog.cloudflare.com/turnstile-ga/


OK, what you now call Turnstile. If I get one of those screens, I just close the tab rather than wait several seconds for the algorithm to run and give me a green checkbox to proceed.


> AI models have latency

So do humans, or can my friend with cerebral palsy not use the internet any longer?


Totally different type of latency. A person with a motor disability dragging a puzzle piece with their finger will look very different from an AI model being called frame by frame.



That's a great video. To be clear, I'm not defending Captchas - I just don't know if I believe they're dead yet.


Cloudflare is more than captchas; it's centralized monitoring of them too. What do you think happens when your research assistant solves 50 captchas in 5 minutes from your home IP? It has to slow down to human research speeds.


What about Cloudflare itself? It might constitute an abuse of sorts of their leadership position, but couldn’t they dominate the AI research/agent market if they wanted? (Or maybe that’s what you were implying too)


Additionally, in the days since, I've seen evidence suggesting that the SFT part might not even be necessary. I'd argue that work wouldn't have happened if R1 hadn't been released in the open.


Ah! My bad, I edited the article to simply quote Francois. Thanks for catching this, Simon.


Author here. I do believe it's going to be exponential (not yet), but that's out of scope for the article. However, if someone has a good explainer link for that, please put it here and I'll link it into the post.


All the past data shows is exponential growth in the cost of AI systems, not exponential growth in capability. Capabilities have certainly expanded, but that is hard to measure. The growth curve is just as likely to be sigmoid-shaped: a phase transition from "computers process information strictly procedurally" to "computers use fuzzy logic sometimes too". And if we've exhausted all the easy wins, that explains the increased interest in alternative scaling paths.

Obviously predicting the future is hard, and we won't know where this stops till we get there. But I think a degree of skepticism is warranted.


Once AI becomes self-improving, using its intelligence to make itself more intelligent, exponential progress seems like the logical consequence. Any lack of exponential progress before it becomes self-improving doesn't have much bearing on that.

It certainly will be sigmoid-shaped in the end, but the top of the sigmoid could be way beyond human intelligence.


I'm not completely convinced of this, even in the presence of AGI that is peak-human intelligence in all ways (let's say on par with the top 1% of researchers at top AGI labs, with agency and online learning fully solved). One reason for this is what the sibling comment argues:

> Exponentially smarter AI meets exponentially more difficult wins.

Another is that it doesn't seem like intelligence is the main/only bottleneck to producing better AIs right now. OpenAI seems to think building a $100-500B data center is necessary to stay ahead*, and it seems like most progress thus far has been from scaling compute (not to trivialize architectures and systems optimizations that make that possible). But if GPT-N decides that GPT-N+1 needs another OOM increase in compute, it seems like progress will mostly be limited by how fast increasingly enormous data centers and power plants can be built.

That said, if smart-human-level AGI is reached, I don't think it needs to be exponentially improving to change almost everything. I think AGI is possibly (probably?) coming in the near future, and believing that it won't improve exponentially doesn't ease my anxiety about potential bad outcomes.

*Though admittedly DeepSeek _may_ have proven this wrong. Some people seem to think their stated training budget is misleading and/or that they trained on OpenAI outputs (though I'm not sure how this would work for the o models given that they don't provide their thinking trace). I'd be nervous if it was my money going towards Stargate right now.


Well we do have an existence proof that human-level intelligence can be trained and run on a few thousand calories per day. We just haven't figured out how to build something that efficient yet.


The inference and online fine-tuning stages can run on a few thousand calories a day. The training stage took roughly 100 TW * 1bn years ≈ 10²⁸ calories.
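
Back of the envelope, taking ~100 TW as the power captured by the biosphere:

  watts = 100e12                     # ~100 TW captured by the biosphere
  seconds = 1e9 * 365.25 * 86400     # one billion years
  joules = watts * seconds           # ~3.2e30 J

  print(joules / 4.184)              # ~7.5e29 small calories
  print(joules / 4184)               # ~7.5e26 kilocalories (food calories)

So somewhere in the 10²⁷–10³⁰ range, depending on whether you count food calories or small calories; either way, vastly more than a few thousand a day.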


Hmm I'm not convinced that human brains have all that much preprogrammed at birth. Babies don't even start out with object permanence. All of human DNA is only six billion bits, which wouldn't be much even if it encoded neural weights instead of protein structures.
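
Rough arithmetic (the fp16 comparison is just mine, to put "wouldn't be much even as neural weights" in perspective):

  base_pairs = 3e9        # haploid human genome
  bits = base_pairs * 2   # 2 bits per base (A/C/G/T)

  print(bits / 8 / 1e6)   # ~750 MB
  print(bits / 16)        # ~3.75e8 parameters, if it were raw fp16 weights

That's well under half a billion fp16-equivalent parameters, i.e. small-model territory.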


Human babies are born significantly premature as a compromise between our upright gait and large head-to-body ratio. A whole lot of neurological development that happens in the first couple of years is innate in humans just like in other mammals, the other mammals just develop them before being born. E.g. a foal can walk within hours of being born.

Babies are born with a fully functioning image recognition stack complete with a segmentation model, facial recognition, gaze estimator, motion tracker and more.

Likewise, most of the language model is pre-trained, and language acquisition is in large part a pruning process to coalesce unused phonemes, specialize general syntax rules, etc. Compare with other animals that lack such a pre-trained model - no matter how much you fine-tune a dog, it's not going to recite Shakespeare.

Several other subsystems come online in the first few years with or without training; one example that humans share with other great apes is universal gesture production and recognition models. You can stretch out your arm towards just about any human or chimpanzee on the planet and motion your hand towards your chest, and they will understand that you want them to come over.

Babies also ship with a highly sophisticated stereophonic audio source segmentation model that can easily isolate speaking voices from background noise. Even when you limit yourself to just I/O-related functions, the list goes on, from reflexively blinking in response to rapidly approaching objects to complicated balance sensor fusion.


If you're claiming that humans are born with more data than the six gigabits of data encoded in DNA, then how do you think the extra data is passed to the next generation?


I'm not claiming that humans are somehow born with way more than a few billion parameters, no. I'm agreeing that we have an existence proof for the possibility of an efficient model encoding that only requires a few thousand calories to run inference. What we don't have is an existence proof that finding such an encoding can be done with similar efficiency because the one example we have took billions of years of the Earth being irradiated with terawatts of power.

Can we do better than evolution? Probably; evolution is a fairly brute force search approach and we are pretty clever monkeys. After all, we have made multiple orders of magnitude improvements in the state of the art of computations per watt in just a few decades. Can we do MUCH better than evolution at finding efficient intelligences? Maybe, maybe not.


I agree with your take and would slightly refine it: given how protein folding and production work in our bodies, I'd say our genome is heavily compressed, and we can witness the decompression with an electron microscope (RNA serves as a command sequence determining the resulting protein).


The human genome has 6 billion bases, not 6 billion bits. Each base can take one of 4 values, so significantly more data than binary. But maybe not enough of a difference to affect your point.


Looks like it's actually three billion base pairs in human DNA: https://www.genome.gov/genetics-glossary/Base-Pair#:~:text=O...

So six billion bits since two bits can represent four values. Base pairs and bases are effectively the same because (from the link) "the identity of one of the bases in the pair determines the other member of the pair."


It's 6 billion because you have 2 copies of each chromosome. So 12 billion bits, right? But I do think your original point stands. I'm mostly being pedantic.


Self-improvement only works when it knows how to test itself. If the test is a predictable outcome defined by humans, most companies are going to fine-tune to pass the self-improvement test, but what happens next? "Improvement" is vague in terms of who reaps the benefit, and it may not line up with how humans have come to think over millions of years of evolution.


I think we are already way past single-human intelligence. No one person understands (or could possibly understand) the whole system from the silicon up. Even if you had one AI "person" 100x smarter than their coworkers, who could solve hard problems at many levels of the stack, what could they come up with that generations of tens of thousands of humans working together haven't? Something, surely, but it could wind up being marginal. Exponentially smarter AI meets exponentially more difficult wins.


>No one person understands (or could possibly understand) the whole system from the silicon up.

I'm not a fan of this meme that seems to be very popular on HN. Someone with knowledge in EE and drivers can easily acquire enough programming knowledge in the higher layers of programming, at which point they can fill the gaps and understand the entire stack. The only real barrier is that hardware today is largely proprietary, meaning you need to actually work at the company that makes it to have access to the details.


Good point. I agree actually, many people do put in the work to understand the whole stack. But one person could not have built the whole thing themselves, obviously. All I was trying to say is that we already live with superhuman intelligences every day; they are called "teams".


Your argument is that no one person can build a whole cargo container ship, hence cargo container ships are intelligent? The whole of humanity cannot build a working human digestive tract from scratch, hence the human digestive tract is more intelligent than all of humanity?

Things can be complex without being intelligent.


Nope, not my point. My point was that even if we get superhuman AGI, the effect of self-improvement may not be that large.


Care to justify those beliefs, or are we just supposed to trust your intuition? Why exponential and not merely quadratic (or some other polynomial)? How do you even quantify "it"? I'm teasing, somewhat, because I don't actually expect you're able to answer. Yours isn't a reasoned argument, merely religious fervor dressed up in techy garb. Prove me wrong!


Not necessarily 'exponential' (more like superlinear) in capabilities (yet), but rather in parameters/training data/compute/costs, which sometimes get confused for one another.

[0]: https://ourworldindata.org/grapher/exponential-growth-of-par...

[1]: https://ourworldindata.org/grapher/exponential-growth-of-dat...

[2]: https://epoch.ai/blog/trends-in-training-dataset-sizes

[3]: https://ourworldindata.org/grapher/exponential-growth-of-com...

[4]: https://blog.tebs-lab.com/p/not-exponential-growth


If you read the article, he explains that there are multiple scaling paths now, whereas before it was just parameter scaling. I think it's reasonable to estimate faster progress as a result of that observation.

I like that the HN crowd wants to believe AI is hype (as do I), but it's starting to look like wishful thinking. What is useful to consider is that once we do get AGI, the entirety of society will be upended. Not just programming jobs or other niches, but everything all at once. As such, it's pointless to resist the reality that AGI is a near term possibility.

It would be wise from a fulfillment perspective to make shorter term plans and make sure to get the most out of each day, rather than make 30-40 year plans by sacrificing your daily tranquility. We could be entering a very dark era for humanity, from which there is no escape. There is also a small chance that we could get the tech utopia our billionaire overlords constantly harp on about, but I wouldn't bet on it.


>There is also a small chance that we could get the tech utopia our billionaire overlords constantly harp on about, but I wouldn't bet on it.

Mr. Musk's excitement knew no bounds. Like, if they are the ones in control of a near-AGI computer system, we are so screwed.


This outcome is exactly what I fear most. Paul Graham described Altman as the type of individual who would become the chief of a cannibal tribe after he was parachuted onto their island. I call this type the inverse of the effective altruist: the efficient psychopath. This is the type of person that would have first access to an AGI. I don't think I'm being an alarmist when I say that this type of individual having sole access to AGI would likely produce hell on earth for the rest of us. All wrapped up in very altruistic language of "safety" and "flourishing" of course.

Unfortunately, we seem to be on this exact trajectory. If open source AGI does not keep up with the billionaires, we risk sliding into an inescapable hellscape.


Ye. Altman, Musk. Which Sam was the exploding slave head bracelet guy, was that Sam Fridman?

Dunno about Zuckerberg. Standing still, he has somewhat slid into the saner end of the tech-lord spectrum. Nightmare fuel...

"FOSS"-ish LLMs is like. We need those.


That seems a bit harsh, don't you think? Besides, you're the one making the assertion; you kinda need to do the proving ;)


No, I don't think it's overly harsh. This hype is out of control and it's important to push back on breathless "exponential" nonsense. That's a term with a well-defined, easily demonstrated mathematical meaning. If you're going to claim growth in some quantity x is exponential, show me that measurements of that quantity fit an exponential function (as opposed to some other function), or provide me a falsifiable theory predicting said fit.


I believe they are using 'exponential' as a colloquialism rather than a strict mathematical definition.

That aside, we would need to see some evidence of AI development being bootstrapped by the previous SOTA model as a key part of building the next model.

For now, it's still human researchers pushing the SOTA models forwards.

When people use the term exponential, I feel that what they really mean is 'making something so _good_ that it can be used to make the N+1 iteration _more good_ than the last'.


Well, any shift from "not able to do X" to "possibly able to do X sometimes" is at least exponential. 0.0001% is at least exponentially greater than 0%.


I believe we call that a "step change". It's only really two data points at most so you can't fit a continuous function to it with any confidence.


> It's a bit crazy to think AI capabilities will improve exponentially. I am a very reasonable person, so I just think they'll improve some amount proportional to their current level.

https://www.lesswrong.com/posts/qLe4PPginLZxZg5dP/almost-all...


>No, I don't think it's overly harsh.

Where's the falsifiable framework that demonstrates your conclusion? Or are we just supposed to trust your intuition?


Why is it “important to push back”? XKCD 386?


I think their thought process is unconvincing, although I think they're probably correct.

A much better paper is "Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models". They took it much further and trained an LLM such that, for every inference, they could see exactly which documents from the training dataset it referenced to answer a particular question. ngl, this paper AGI-pilled me.

https://arxiv.org/abs/2411.12580


Right, there are already some very encouraging trends (this study out of Nigeria). Clearly AI can lead to laziness, but it can also increase our intelligence. So it's not a simple "better" or "worse"; it's a new thing that we have to navigate.

https://blogs.worldbank.org/en/education/From-chalkboards-to...


I added Phi-4 to my reasoning model collection because it seems to exhibit reasoning behavior: it stopped to consider alternatives before concluding. I assume this is related to their choice of training data:

> Chain-of-Thought: Data should encourage systematic reasoning, teaching the model various approaches to the problems in a step-by-step manner.

https://github.com/tkellogg/lrm-reasoning/blob/main/phi4.md


I feel like this conversation isn't complete without referencing Hillel Wayne's "Are We Really Engineers?"

https://www.hillelwayne.com/post/are-we-really-engineers/


Entropix gives you a framework for doing that sort of thing. The architecture is essentially to detect the current state, and then adjust sampler settings or swap in an entirely new sampler strategy.

You absolutely could experiment with pushing it into a denial, and I highly encourage you to try it out. The smollm-entropix repo[1] implements the whole thing in a Jupyter notebook, so it's easier to try out ideas.

[1]: https://github.com/SinatrasC/entropix-smollm
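
If you just want the flavor of it, the state-detection loop looks roughly like this (a toy sketch with made-up thresholds, not the actual entropix code, which as I understand it also tracks varentropy and attention stats):

  import numpy as np

  def softmax(logits):
      e = np.exp(logits - logits.max())
      return e / e.sum()

  def entropy(logits):
      # Shannon entropy of the next-token distribution, in nats.
      p = softmax(logits)
      return float(-(p * np.log(p + 1e-12)).sum())

  def sample(logits):
      # Detect the current state, then swap sampling strategy accordingly.
      h = entropy(logits)
      if h < 0.5:                       # confident: just take the argmax
          return int(np.argmax(logits))
      temp = 0.7 if h < 3.0 else 1.5    # uncertain: sample, hotter when lost
      return int(np.random.choice(len(logits), p=softmax(logits / temp)))

The "very uncertain" branch is where you could experiment with injecting a pause or denial instead of sampling hotter.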


