You can tell this claim is false, because that level of productivity increase would be glaringly obvious to an outside observer; it wouldn't need to be self-reported.
A YC summer batch is 84 days culminating in Demo Day. So a 100x speed improvement would be like a team spending less than 1 day of coding and ending up with something that's on par with Demo Day in terms of functionality. Maybe the design would be wrong, but that wrong design would be just as fully-featured as a Demo Day app.
So if 100x were true, the partners in that video would be talking about how the new batch dynamic is "They get breakfast with a customer, learn something new, have an epiphany, and then later the same day they have their entire app rewritten based on what they learned, and that scratch-rewrite is already at a Demo Day level of functionality." The partners aren't talking about that dynamic because it's not happening. So clearly the self-reported 100x is inaccurate.
Even 10x would result in partners saying "Whoa, in this batch people have a Demo Day-quality app in production by the end of week 1 instead of week 12." The partners have a huge sample size on how much teams get done in what time period, so it would be glaringly obvious to them if this batch were shipping 10x as fast as previous batches.
That external observation would be the headline if it were what the partners were actually seeing. Since that's not the headline, it's clearly not what they're seeing, so 10x can't be the number either.
To be fair, if your benchmark is against Demo Day, at some point Amdahl's law kicks in regardless of how many multiples you have on engineering. Not sure if I believe the multiple of 10x or 100x anyway, but “number of customer feedback loops” is a better metric than “can complete one (1) Demo Day in X time”. My (non-YC) impression is that people are hitting more loops.
Also multiples “up” versus “down” are not symmetric. Airplanes are around 10x faster than cars, but that doesn’t mean I’ll be getting to work in 60 seconds.
If you can't do something extremely impressive with the equivalent of hundreds of full-time engineers, then there is something wrong with you as a founder.
Note that real engineers help you come up with new features and test your product and all that, not just add code; adding code was never the bottleneck on just about any problem ever.
So Amdahl's law applies in terms of the time to add code, not in terms of engineers. They don't do the work of 100 engineers; at best they take 100x less time to add lines of code when they know what they want to make. But that isn't particularly game-changing, as adding code is not the hard or even the time-consuming part.
I see your problem (at least according to Yegge's theory): the batch applicants are just too senior. If they were more junior, only then would they benefit from the 100x multiplier. The olds are just too far removed from the enlightened way, you see.
At this point in time, we're following, step for step, the path corporate took during the outsourcing craze. It had all the hype: hands-off, cheaper for the same work, faster to market, every other argument you've all certainly heard. Then reality hit. The whole discussion around LLM coding agents feels indistinguishable.
The reality in which people like me get to do work for US/UK companies for 4x the salary of equivalent work locally, where some of this work is actually cleaning up after folks elsewhere who, being the cheapest labor available, still got 4x their local salary for it, and the total is still 4x cheaper than what the US/UK company would pay locally? :).
(I'm only half-joking; in a previous life, I worked on a project with this exact development history.)
The outsourcing market is alive and kicking, and offers a whole spectrum of quality and price. The further east of the US you are, the easier it is to see :).
> The whole discussion around LLM coding agents feels indistinguishable.
Nah, the difference here is that in the outsourcing-to-LLMs scenario, there are no people who do the work and benefit from a favorable salary/cost-of-living ratio.
The company I worked for 15 years ago outsourced QA to save money. It was sold as paying $15 for a 4080 video card. When they opened the package, instead of a 4080, it was a brick. The salt in the wound was when they realized they overpaid 3x for a $5 brick. If it wasn't for vendor lock-in (they being the vendor), they would be dead.
The local QA person could run through 100 or so scenarios a day. The offshore people could do 2 a day. They never improved. The offshore people who are tops aren't cheap.
There's an art to outsourcing, and - even worse than with LLMs - it's not something you can ever just do and forget, because without active management, you'll eventually end up wasting money and time while getting nothing in return.
QA is a whole other story, too. Outsourcing QA is stupid, but even more stupid and short-sighted is not having QA in the first place, and that unfortunately is becoming a norm.
There's a lot of false economy going on with jobs, too. Getting rid of QA may save you salaries, but the work doesn't disappear - it just gets dumped on everyone else, and now you're distracting much more expensive engineers (software or otherwise), who do a much worse job at it (not being dedicated specialists) and cost more. On net, I doubt it ever saves companies any money, but the positives are easy to count, while the negatives are diffuse and hard to track beyond an overall feeling that "somehow, everything takes longer than it should, and comes out worse than it should, who knows why?".
>Outsourcing QA is stupid, but even more stupid and short-sighted is not having QA in the first place, and that unfortunately is becoming a norm.
Yes, they were moving in that direction. They centralized the QA team over a suite of probably 15 products, which means no one has any expertise. The QA VP would get mad when QA found bugs because "the devs were supposed to find all the bugs and QA was just supposed to certify the release." The number of people in high positions who don't understand how software dev works is mind-boggling.
They ended up firing all the devs except me and this other guy who gave zero shits and wanted to be a manager. "We" maintained 3 products. Two were pretty standard web apps but one was a full blown decision support system (rules engine) that only I knew. I quit after a few months of killing myself. They paid me a whole lot of money a few years later when they were trying to add features to get a very lucrative government contract.
Speaking from experience: I started my career 15 years ago cleaning up codebases developed by cheap outsourced developers. They were an absolute mess. Today, I work with many very talented developers overseas - people who develop code at my level of quality and above (and I am a stickler for high-quality, maintainable code).
The thing is how you do your research, what you expect to get out of it, and what you're willing to pay. There was absolutely a gold rush on bottom-dollar development by cheap overseas developers, driven by management who had no idea how software really worked and thought they could build a business on cheap offshore development. These were software farms staffed by unappreciated, undertrained people from diploma mills. I saw truly shocking things. I saw code written entirely with gotos instead of loops, because the developer had never learned how to write a while loop. The companies spent way more in the long run trying to iterate, asking for changes, and ultimately having to hire higher-quality talent for much more money.
I agree with the GP. The LLMs will get better, maybe people will learn that you need an LLM with a skilled developer, or maybe the agents will get good enough to fully drive themselves properly, or maybe just good enough that a non-technical pilot can get good work out of them. Right now, "vibe coding" is largely non-technical people making messy, unmaintainable, insecure code. Some of these are programmers and non-programmers just playing around, but some people are trying to build money-making businesses off this, and it does feel like a very similar situation.
And who cares if it is true? So far programming is one of the very few professions where a person can set themselves up for life in a relatively short period of time. When / if it is gone there will be something else. I have a few friends who switched to being handymen. They are doing great from what I see.
> So far programming is one of the very few professions where a person can set themselves up for life in a relatively short period of time.
If you specifically optimize for it. Most people don't - they specialize and expect to be in their line of work for decades.
> When / if it is gone there will be something else.
There will be something else for young people who are just starting. If you're 20 years into a career and then your line of work disappears overnight, of course you can switch to something else - and enjoy your entry-level salary while competing for jobs with people who are 20+ years younger than you and have no meaningful costs or obligations yet.
Also, it depends on what you're optimizing for. If, like many, you're looking for a job that makes sense (e.g. working to develop technologies or research that you think can change the world for the better), you're probably never going to strike that particular gold.
My father never dissuaded me from the computer science degree I went for (starting in 2007), but later on in life, he told me that, at the time, he was worried about my choice because outsourcing was all the rage and everyone was saying there'd be no programming work in the States - that it was all going to be done in India.
I think that's a great point. There is a place for outsourcing. Some projects and organizations are well suited to it. Some end up using outsourcing only to supplement parts of the work. Some can't do it at all for quality or compliance reasons.
I think we will see something similar with LLMs. There will be areas where it will deliver cheaper and faster. There will be areas where it will deliver nothing but disaster. It'll change the industry but not eat it alive. The folks talking about fully autonomous coding on the near horizon are dreaming.
I’ve seen a number of outsourcing project failures. The two things they all had in common were that the organizations in question were terrible at managing projects but they blamed the developers for management’s inability to plan or make decisions, and they were trying for unrealistic savings – it wasn’t enough to save 30-50% on salary, they wanted 90% even if that was below the market rate for those skills even in India.
The first one is definitely happening with the LLM bubble, where companies really want to pretend that the hard part of the job isn't understanding what to build and how to build it maintainably.
The second one is going to be more interesting: I expect LLMs to put downward pressure on wages in a lot of places but also for smarter companies to realize that nothing short of true AGI is going to replace the need for people who can actually understand what the customer needs. If I’m right, this will swing the pendulum back towards specialists again – the seagull guys who come in, declare that their favorite framework will solve everything, and leave are more vulnerable to being replaced by an LLM than someone who knows how to code but is also bringing actual business-relevant experience and judgement which an LLM can’t have.
> the seagull guys who come in, declare that their favorite framework will solve everything, and leave are more vulnerable to being replaced by an LLM than someone who knows how to code but is also bringing actual business-relevant experience and judgement which an LLM can’t have.
But that's just a continuous variant of the discrete-sounding claim that programming will get eaten by AI soon. After all, the "actual business-relevant experience and judgement which an LLM can’t have" is mostly not related to programming - and the better LLMs get at coding, the less value the programming parts of the skillset will have; take it to the limit, and it's just saying the managers and salespeople will stay, while software developers will be gone.
I think that’s a question of how you define jobs. For example, I’ve worked with very few managers who could document their business processes in sufficient detail to build an app. Now, is the person who does that a business analyst, architect, senior developer, etc.? Who sits down with the users, gets feedback, and understands the needs of multiple parties well enough to tell which points are traps, which should be developed in a different direction, etc.?
Basically, I’m saying people should stop expecting to get six figures for being able to run create-react-app and deploy a container. The analytical and social parts of the job are where I predict LLMs to make fewer inroads because they require non-generic understanding.
> I’m saying people should stop expecting to get six figures for being able to run create-react-app and deploy a container. The analytical and social parts of the job are where I predict LLMs to make fewer inroads because they require non-generic understanding.
That's a fair take. I do wonder though, how much will those "analytical and social parts of the job" be paying - I imagine you might no longer get six figures for that either, because high tech salaries are fueled by absurd growth of the industry, which manifests in a large part in software that's basically just {framework du jour + basic-level CRUD, that hasn't changed much in 30 years + branding}, and that kind of software I expect to get eaten by LLMs entirely.
Even with cookie-cutter app coders out of the way, the remaining software engineers might see the number of jobs implode, crashing salaries for some time, until (maybe) the growth restarts around new kinds of software, kinds that'll be in high demand and not something that can be made by a few "analytical/social" people herding LLMs. I'd normally say this won't happen, but rather that the software economy will slow down, stabilize, and get boring like everything else - but then, so much of software is driven purely by advertising, and advertising is a negative sum game, so surely they'll invent more bullshit jobs for us.
Yeah, I’m really not sure either with the general backdrop of looming American disinvestment and the entire world reconsidering reliance on American companies. I don’t think reversing the trend of consolidation is going to be enough to balance it out.
The market can stay irrational for longer than you can stay solvent.
This is not to say outsourcing can't ever work, but the situations where it does work are much rarer than what every outsourcing vendor would like you to believe.
I bet a previous client's attempt at outsourcing (well into the 6 figures now) is included in that number... yet the expensive onshore devs outsourcing was supposed to replace are still there 2 years later except now they have to also babysit the offshore idiots and fix their messes.
But hey, the vendor got paid, the idiot executive who fell for their pitch wouldn't want to lose face, so it all gets handwaved away as a continuing success and more money gets thrown into the dumpster fire.
FWIW, if you believe that long-term, the market is a good optimizing engine (I think it's a very reasonable and well-proven belief), then this is just a matter of time before things sort themselves out.
The need for companies to get more value out of less spend won't disappear, nor will the comparative advantage of companies in lower CoL areas of the globe. That's two fundamental incentives on both sides that are aligned, driving the market to find the lowest-energy path from here to there. It'll get there, even if it ends up looking strange (like, idk., maybe cutting out management intermediaries but involving a middleman acting as insurance).
That is, if LLMs won't leapfrog it all and end software dev outsourcing before it started to work well.
Even at its current size, the software outsourcing business is multiple orders of magnitude smaller than the software business itself. While there's money to be made, clearly the hype didn't live up to even remotely what it promised to be.
My first job in the industry was cleaning up a large codebase created overseas by Indian developers. Maybe the new kids today will break into the industry by cleaning up messes that have been generated by AI.
Nah, the magic/promise of AI is that it has positive chance of getting there, so you can keep feeding it dollars until it eventually gets you the thing you want, and that it's still cheaper than having people do it the old-school way.
We're not there yet, but I don't see anything preventing us from getting there in ~5 years.
(Remember: 5 years ago, SOTA in this space was letting a genetic algorithm poke at an AST and hopefully maybe arrive at a trivial program solving a small algorithmic problem.)
I wouldn't bet against it. Self-driving tech benefits directly from the outputs of sudden and continued growth of R&D in AI, fueled by hype-driven investments.
Maybe just like with self-driving cars, trying to hook mechanical precision up to messy human society is going to be fraught and lead to blowback. Meaning that once planes start falling out of the sky from vibe coding Boeing contractors there will be PR and regulatory panics that soften the hype somewhat. Or, the next time Equifax gets mass leaked and they blame their security setup on generative AI. You can’t vibe code your way out of human stupidity and the consequences of production environments.
Maybe, but the same thing could be said about JavaScript and webshit and the world is still there. We can just continue to not use YOLO technologies and culture in safety-critical applications.
> It had all the hype, hands off, cheaper for the same work, faster to market, every other argument you've all certainly heard.
Like some of the other responses, I'm baffled by your comment. Have you not seen what's happened in the past 5 years or so?
Yes, there was an outsourcing craze to India after the .com bubble burst in the early 00s, and it largely failed - the timezone gap, cultural differences, and lack of good infrastructure support sank it.
The past 2 companies I've worked for offshored the majority of their software engineering work, and there was no quality difference compared to American devs. The offshore locations were Latin America and Europe, so plenty of timezone overlap. The companies are fully remote, so what difference does it make if the dev is in your same city or a thousand miles away?
I think offshoring has absolutely put downward pressure on US dev salaries in the past couple years.
I think you're talking about a different phenomenon, having remote mixed teams is IMHO different from offshoring.
There's at least the crucial difference that you had devs in both western countries and the traditional "third world", whereas the 90s view of offshoring was throwing whole processes abroad and only keeping "heads" in-house while the remote teams/companies dealt with all the execution, making it inherently difficult to deal with production monitoring.
PS: to your point, offshoring to India has become more common, but Indian companies are also not that cheap, so we're past the initial framework. Perhaps the same way outsourcing production to China used to be about sweatshops, whereas it can now be about unrivaled expertise at a cost.
It's about those agents being (mis)used in the very specific blind faith approach of "vibe coding", not least due to the hype merchants and grifters picking up the phrase and running with it shorn of the original cautionary notes about it being useful for bringing a bit of fun back into non-serious coding.
Criticizing the idea (and conflating it with the wider field of LLM coding agents) without understanding that original context is not really any better.
Vibe-coding ≠ LLM coding agents, which - when used properly - are brilliant for use in serious code and are here to stay.
This is the go community saying a computer will never best human go players.
We already have examples of a model finding more performant sorts [0]. Given the right incentives and time, and the right system for optimizing (LLMs trained on “average code” probably aren’t it), the computer will best us at creating things for the computer.
Is “vibe coding” real today? Not in my experience, with even Claude code. My hand has to be firmly on the tiller, using my experience and skill to correct its mistakes and guide it. But I can see the current trajectory of improvement, and I’m sure it’ll get there.
> This is the go community saying a computer will never best human go players.
I don’t see this. Board games are fundamentally different from software development problems. The latter have imperfect information, unknown requirements and constraints, fuzzy success criteria, and more.
> So you’re saying you can automate the coding part by… writing the code (in an inferior language)
Not OP, but yes. That means you don't need a dev, just someone who knows how to spec correctly in English/Jira, right? Is that likely a dev who moved on to PM? Very likely in 2025.
For better or worse, the future I imagine is a Jira plugin or MCP server that can read a project, the LLM IDE client then asks questions to fill in the blanks... and out comes the app.
For many years this will require a human in the loop. But will that human need to know the intricacies of the latest frontend framework? Less and less as time goes on.
Trying to provide a programming tool for non-programmers has been a wet dream of some for a while. See SQL, 4GL, DRAKON (lol), VBA, no-code platforms (btw, what happened with that hype? How are we not all replaced already?), and the most recent, but certainly not the last: LLMs. While sometimes yielding something useful, past attempts have consistently and spectacularly failed at this objective - fundamentally because non-programmers don't want to deal with this; otherwise they would have learned some damn proper PL a long time ago, it's not THAT hard. And LLMs add quite some special spice to that. How is a vibe coder going to fix LLM output that fails? Without understanding the code, that is.
Regarding "the latest frontend framework" the whole situation is a bit mysterious to me, because somehow everyone keep spending millions of man-hours on yet another react contraption where a static HTML would be enough. From the user perspective all this stuff brings no value, 80% of frontend stuff could have been automated long time ago or just not done at all, yet we keep reinventing the wheel. I don't see how LLMs can change the situation because there was clearly no demand for improved productivity there before.
Vibe coding is 100% real. Or maybe we should call it code vibing when there is no coding ability. But I just taught 18 professionals with no coding ability to build functional software. Their minds were blown.
Not according to your very specific stakeholder demands / environment/naming/data tables/data protection requirements, otherwise you would just use a library.
Those might seem like trivial differences, but plenty of things go wrong there - enough that you can't just use a library instead of a programmer, and enough such errors that vibe coding will also cause issues.
I think the point being made is that even though the specific set of requirements may be unique on a per-stakeholder basis, all the components already exist and have been combined in many ways. So it really boils down to prompting in such a way that the right set of components are brought together in the right way. That's where the skill now lies.
> So it really boils down to prompting in such a way that the right set of components are brought together in the right way. That's where the skill now lies.
And how is this different from just calling the libraries in the right way to make it adhere to stakeholder requirements?
The statement isn't "it's impossible to get an AI to print the code for a right program", but "the work and skills you need to get an AI to print the right program are as much or more than to do it yourself." That seems to be true for all but trivial programs. Here trivial means you can download a git repo and change some variables to get that result.
> And how is this different from just calling the libraries in the right way to make it adhere to stakeholder requirements?
In that you need humans that can understand stakeholder requirements, constraints of the ___domain, and limits of existing software, so that they can write the necessary glue to make everything work.
Thing is, LLMs know more about every ___domain than any non-expert (and for most software, "___domain experts" are just non-___domain-expert programmers who self-learn enough of it to make the project work), and they can understand what stakeholders say better than other humans can, at least superficially. I expect it won't take long until LLMs can do all of this better than an average professional.
(Yes, it may take a long time before an LLM can replace a senior Googler. But that's not the point. It's enough for an LLM to replace an average code monkey churning out mobile apps for brands, to have a huge chunk of the industry disappear overnight.)
Paul Graham argued that if you act like the average startup, you'll get the same results as the average startup. And the average startup fails.
It follows that if you want to have success, you need to do something new which hasn't been done before.
> LLMs know more about every ___domain than any non-expert
As soon as you're creating something new, or working in a niche field, LLMs struggle.
So do junior developers. But they learn and get better with time. While onboarding a junior developer requires more effort than doing the work yourself, it's worth it in the long run.
IMHO, that's the largest issue LLMs have today. They can't really adapt and learn "in the field". We build a lot of workarounds with memory to circumvent that, but that too only works until the memory exceeds the context.
I've tried using ChatGPT, Copilot, custom GPT-4o models, and Cursor. The task they did best at was generating a simple landing page (though they struggled with Tailwind 4; Cursor spent almost 8 hours debugging that issue).
With tasks that require more niche ___domain knowledge, it went much worse. Cursor finished some of the tasks I gave it, but it took over 10x more time than a junior developer would've spent, and I had to constantly babysit it the entire time, providing context, prompting, writing cursor rules, prompting again, etc. The others failed entirely.
If I start working on an unfamiliar task, I read all the docs, write some notes for myself, maybe build some sample projects to test my understanding of the edge cases. Similarly, if faced with a new task, I build some small prototypes before committing to a strategy for the actual task.
Maybe ML agents would fare better with that approach, instead of today's approach of just creating a mess in the codebase like an intern.
It's a switch of focus. Instead of being occupied by the grunge work of integrating libraries and code logic from the outset, focus can now remain on the larger picture for longer, with a need to jump into raw code only for the really tricky/unique problems, if there are any. That can be a lot of overhead avoided, if done well.
This one guy made, in bolt.new, a system that would generate lyrics for a song and sync the text-to-speech to different parts of the song. Creative and interesting.
Someone else made an “exquisite corpse” drawing game.
And another, a way to annotate medical images.
I think of all these things as functional prototypes. It’s obviously not engineering. But it is pretty magical —
What were your expectations and how did it exceed them?
I have to say, on the basis of your comment I just decided to try Cursor, and I'm sorry to report it immediately disappointed me.
First thing it did was it told me it found a syntax error in code that compiles perfectly. It went ahead and added a closing brace, telling me that "I've fixed the issue by properly closing the method with its brace. The method now has valid syntax and should compile correctly".
It was never broken! This seems like such a regression of tooling; we have parsers for the purpose of finding wrong syntax, and they don't just make up things like missing braces that aren't missing, and then break your code by inserting them. This seems like developing in Kafka's nightmares.
Should I even bother continuing to evaluate Cursor, when it can take perfectly valid and correct code, and immediately make it worse? What other nightmares will it reveal to me, and do I even want to know? I'm kind of astounded at how bad that first impression was, couldn't have been worse really.
Try refactoring anything in Cursor; that's where the shit really hits the fan. I guess all those folks claiming 100x productivity or applications made solely by vibing are only building little proof-of-concept apps - something which "works" but definitely doesn't need to follow proper requirements. Can it bootstrap an application for you? For sure, but that's just half a day you'd otherwise sink into it - all the following weeks of building on that scaffolding cannot today be automated, or vibed.
How did you build that experience? It's speculative now whether people can develop equivalent experience when your job is to constantly guide an LLM to do the right thing. We can speculate that you see patterns and seek to correct them, but that kind of hands-on experience and muscle memory is threatened with atrophy (very much like relying on Stack Overflow for answers can atrophy your ability to seek core knowledge).
I wonder if where we are now is similar to the early days of compiled languages, back when people still somewhat commonly wrote assembly by hand and didn't trust compilers.
Sure, things like RollerCoaster Tycoon exist, but writing in a compiled language is so much faster, easier, and more broadly accessible than writing in assembly that compilers took over.
TIL that RollerCoaster Tycoon was written in assembly.
“[Developer Chris] Sawyer wrote 99% of the code for RollerCoaster Tycoon in x86 assembly language for the Microsoft Macro Assembler, with the remaining one percent written in C.” - Wikipedia
My worry about this approach is: there is a reasonably popular saying that writing code is hard but debugging it is twice as hard (at least), which I think is an accurate description.
LLMs will greatly increase code production, will they also increase debuggability to match?
I think probably people will also use LLMs to debug.
At this point, I think you can consider vibe-coding with an LLM to be pretty equivalent to using a fairly junior developer with access to Stack Overflow. It's going to make a lot of mistakes, it's going to make a lot of questionable decisions. Sometimes it will be able to fix its mistakes, sometimes it will spin its wheels and never fix it. It may make a big ball of mud. The entire project may fail.
Two to three years from now, who knows. I'm pretty sure you still won't see 100% reliability in translating English->code, but you also don't see that even with senior developers.
LLMs don't just make it easy to accumulate code - they make it easy to throw code away and start again. This already enables taking a different approach to debuggability - if there's a bug and it can't be trivially solved, trash that bit of the code and write it again. It may not be broadly viable just yet, but it will be if the models keep getting cheaper and better.
This is also implied in the idea of "vibe coding" - don't bother understanding the code or debugging it yourself; if it doesn't work the way you like, just say it and have the model fix it until it gets it right (or you run out of money).
> LLMs don't just make it easy to accumulate code - they make it easy to throw code away and start again.
To throw away code you have to understand it, so no, it's the opposite. Code you don't understand is the hardest to get rid of, so it stays the longest in your codebase.
> if there's a bug and it can't be trivially solved, trash that bit of the code and write it again
How do you know where the bug is if you don't understand the code? There is no known algorithm to take a bug description and return the place in the code the bug is, otherwise bug fixing would be trivial.
Edit: Not to mention that in real production systems your bugs will corrupt the database, and if you haven't set up a logging system etc you will likely not realize for a while forcing you to do a rollback to a very old state losing so much data. You wont last long doing that.
> To throw away code you have to understand it, so no, it's the opposite. Code you don't understand is the hardest to get rid of, so it stays the longest in your codebase.
No, you don't.
> How do you know where the bug is if you don't understand the code? There is no known algorithm to take a bug description and return the place in the code the bug is, otherwise bug fixing would be trivial.
Yes, there is.
At worst, the bug is somewhere in the entire project. But you probably have a more narrow idea where the bug is, or when it was introduced. "In module X", "In feature Y", "In the last N days/weeks". Not to mention, for most bugs, `git bisect` is enough to precisely narrow down the problematic change, and doing that doesn't actually require understanding anything about the code.
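As a minimal sketch of what that looks like in practice (the test command and file names are made up for illustration, not from this thread): you hand `git bisect run` a throwaway script whose exit code says "good" or "bad", and git does the binary search over history for you.

    #!/usr/bin/env python3
    # bisect_check.py - hypothetical repro script for `git bisect run`.
    # Exit code 0 means this commit is good, non-zero means it is bad;
    # that is all git bisect looks at.
    import subprocess
    import sys

    result = subprocess.run(
        # Assumed repro command; substitute whatever reproduces the bug.
        ["python", "-m", "pytest", "tests/test_checkout.py", "-q"],
        capture_output=True,
        text=True,
    )
    sys.exit(0 if result.returncode == 0 else 1)

Driven with something like `git bisect start`, `git bisect bad HEAD`, `git bisect good <last-known-good>`, then `git bisect run python bisect_check.py` - none of which requires reading the code that changed.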
It all boils down to costs. Even if it takes AI a whole day and 1000 attempts to do what would be a relatively simple fix for an experienced developer, if those 1000 attempts cost less than the developer's work-hours, the business will soon learn to prefer AI over people. If and when we get to that point, is mostly just a function of LLM performance and cost. If they get cheap enough, it'll make as much sense to have human developers fix bugs in code, as it makes sense for you to mend holes in your socks instead of buying them in bulk on-line.
> Not to mention that in real production systems your bugs will corrupt the database, and if you haven't set up a logging system etc you will likely not realize for a while forcing you to do a rollback to a very old state losing so much data.
This really depends on the kind of system you're doing, and the kind of data you're storing.
> if there's a bug and it can't be trivially solved, trash that bit of the code and write it again
When I was in undergrad, I knew a few guys who approached every problem by pasting in snippets from Stack Overflow and tutorial sites until the code “worked”. Did not end well…
Look, I understand the sentiment. I too want the code to be done properly, well-engineered and thought through. But recall, this is not our job. We are Professionals, and by modern definition, a Professional does whatever is best for the business. And the business doesn't care about the product - it cares about the product's ability to earn them money. If generating shit code, then throwing it away and generating anew at any sign of bug (or spec change) gets cheap enough, this is what the business will want to do. This is what being Professional will mean.
(You can imagine I don't hold "professionalism" in a very high regard.)
See also: basic goods in meatspace. In developed economies, people generally don't repair clothes anymore, and increasingly rarely anyone bothers with repairing appliances. It's cheaper to just throw it away and buy a new one, than to try and repair it. Hell, even construction and remodeling these days involves a lot more of "affix it here permanently; if you need to move it, just smash it and install a new one" approach.
Why wouldn't the same eventually happen with code?
This is an interesting thought experiment, but I don’t think it’s realistic. Software that “works” doesn’t necessarily work. And software that doesn’t work can cost catastrophic financial losses. Worse yet, those losses are going to be felt more painfully when they are incurred recklessly.
There is a reason that humans developed analytical problem solving as an alternative to trial and error: when you can do it, it’s more effective, and safer.
That’s not to say that disposable code doesn’t have some interesting implications! One of them is that experimentation becomes a lot cheaper, so it’s faster to navigate the search space of possible solutions to a problem. But just taking a “solution” directly from an LLM without validating its correctness is fundamentally unserious and will be punished by reality sooner or later.
Since when does any business care about "engineering for the long-term"?
We, the software engineers, care. The business doesn't. In fact, the industry has systematically been trying to beat the care out of engineers - it's unprofessional to care about the work beyond the point it stops making money for the business.
I'm not saying this is right or wrong - but this is how companies roll; if they can fix product issues by having AI throw chunks of code away and do it again, if that's reliably cheaper than having engineers do actual engineering, then that's what businesses will do.
Hopefully it won't. You don't put webshit in control of rockets or cars, and you shouldn't put vibecoded software in control of them either. Programming safety-critical systems is its own thing, and it should be resistant to LLM incursions at least as long as human sign-off is an important part of the job.
In the past, if you'd tear your shirt, you'd spend time mending it, or pay someone to do it for you. Today, you just throw it in the trash and buy a new one, as it's much cheaper and faster.
Think of any other goods we don't repair anymore. Regardless of their internal complexity and beauty of engineering, and no matter how small the defect is, if it's cheaper to replace it wholesale than to repair it, people end up replacing it.
There's no reason to believe the same won't happen to code.
That's only cheaper because we've made it someone else's problem. Someone is paying that cost - fast fashion is a real issue that causes waste, pollution, and human rights violations. Now as we face tariffs on foreign manufactured goods, things don't seem so cheap anymore. Mending socks looks a little better.
You're right, there's no reason to believe the same won't happen to code, but there's also no reason to believe it won't similarly end in all kinds of problems that come back to bite us down the line when the goodtimes are over.
o1 has lately seemed more useful for debugging than for actually writing code. With writing code, it may give me some scaffold but I have a pretty particular idea of what I want and I have to rewrite most of it.
With debugging, there have been multiple cases where I was stuck on something, I described the problem in detail and o1 gave me an insightful explanation of what I didn't understand.
It's not magic, but mostly what makes it not magic is that it can't respond to very detailed questions. But if I can get the relevant information into a small space, it can draw connections I can't.
Your example of a model finding more performant sorts did not involve the use of LLMs. I've said multiple times on HN that I believe we need to look past LLMs for the next breakthrough in AI assisted coding. I don't believe LLMs can exceed humans in quality; they can only match humans in quality and beat humans in speed.
For creating websites (not apps) it absolutely is there. This is just the first rung on the ladder though. It's not doing Linux kernel development yet, but that time will come eventually. In between are all the other rungs. AI will climb them one by one.
For me it is very much hit or miss. I use Claude Sonnet with Cline. Sometimes I'm blown away that it can create a new page with CRUD functionality, nice UI, etc. in one go. Another time it struggles to create a simple web page. Yesterday I needed a very simple landing page. My prompt was something along the lines of `Very dark grainy background with centered "name of the page" text.`. It couldn't get either the background or the centering right. Later, when it got one right, it screwed up the other. I had to give up after consuming ~$1.5 because we were not getting closer to the result.
Grok 3 has been amazing, but it keeps on wanting me to do dark mode for an app I’m creating. When I finally said yes, it wasn’t able to get dark mode to work, despite multiple tries. Was funny tbh.
Check the actual paper on the type of sorts it actually got a speedup on :-) (hint: a few percentage points on larger n, similar to what PGO might find; the big speedup is for n around 8 or so, where it basically enumerated and found a sorting network).
Well… to be frank, this is someone not reading TFA.
The article is pointing out real limitations of vibe coding today (which you appear to agree with). It doesn't suggest AI coding won't be viable in the future. You should probably update your comment to say something like, “spot on”.
Nah, the article is generalizing from a single sample of current state, ignoring the larger trajectory (that, for AI coding, went from sci-fi to reality in two years).
Sure, the tools aren't perfect, so there's some art to using them now - which the author of TFA seems to be unaware of. Take for example:
> You cannot ask these tools today to develop a performant React application. You cannot ask these tools to implement a secure user registration flow. It will choose to execute functions like is user registered on the client instead of the server.
Of course you can ask them to do it. You can literally ask them to "write better code" (yes, with this exact phrase; see [0]), and you'll get better code. More performant, or more secure - it depends on specifics of the case. Or, you can ask them specifically to focus on security or performance, and you will typically get improvements on those axes.
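To make the registration example concrete, here's a minimal sketch (hypothetical Flask route and names, not the article's code) of the property being asked for: the authorization decision runs on the server for every request, so nothing the client computes can be trusted or forged around.

    # Minimal sketch with made-up names: the server re-checks registration
    # itself instead of trusting a flag computed in the browser.
    from flask import Flask, abort, session

    app = Flask(__name__)
    app.secret_key = "dev-only-placeholder"

    REGISTERED_USERS = {"alice", "bob"}  # stand-in for a real database lookup

    def is_user_registered(user_id):
        # Assumption: a real app would query persistent storage here.
        return user_id in REGISTERED_USERS

    @app.route("/premium-content")
    def premium_content():
        user_id = session.get("user_id")
        # The check happens server-side on every request; a client-side
        # "is registered" flag could be flipped from the browser's dev tools.
        if not user_id or not is_user_registered(user_id):
            abort(403)
        return "premium content"

Whether a model produces this or the client-side version is exactly the kind of thing prompting for "security" is supposed to steer.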
That's today. Next year, people will know how to prompt the models away from most common failure modes, and the models themselves will be further trained to avoid those same failure modes. In bringing specific problems up, TFA isn't making a convincing argument against future of "vibe coding" - it's literally helping in making it happen.
> Of course you can ask them to do it. You can literally ask them to "write better code" (yes, with this exact phrase; see [0]), and you'll get better code. More performant, or more secure - it depends on specifics of the case.
That wasn't the requirement though. He said he wanted a performant or secure app, not "more performant" or "more secure". "more" is trivial, but actually getting to a good state is not.
To actually make a larger program secure or performant you need a unified higher level architecture that is adhered to everywhere, vibe coding can't get you that. It can do some micro optimizations as you said, but it can't do these macro contexts and architecture. You can ask it for suggestions for such architectures, but you can't make it implement a full scale large app with all components using it.
> Nah, the article is generalizing from a single sample of current state, ignoring the larger trajectory (that, for AI coding, went from sci-fi to reality in two years).
You seem to have missed the part of the article that clearly lays out the recent progression of AI coding. You're also refuting arguments the article supposedly makes against the future of vibe coding, but the article doesn't make any arguments about the future of vibe coding!
We all just end up talking past each other if no one is actually talking about the same thing.
Ah yes, the classic "we didn't have this at all 2 years ago, so we can expect a linear increase in capability 2 years from now! Trends will of course continue!"
I think you're wrong. I think AI is going to stagnate and only the surrounding tooling will improve, but not enough to get us to the promised land. Arguably we already are seeing that happen. To see the supposed "this is how all code is written now" world AI proponents keep declaring, we're going to need to see improvements to the current AI where they can operate on context windows two orders of magnitude bigger than they are now, while costs for doing so also drop accordingly. Maybe that can be done, I'm betting it can't.
Remember 2016/17 when we all thought we would have self-driving cars by now? People were modelling intersections without traffic lights, and talking about not needing parking space because the cars would just drive around when you weren’t in them.
So much technology these days seems to be “get it to 80% so we can demo and cash out”, but 80% isn’t just an arbitrary number - it seems to be the point at which the remainder of the work is either very hard or (I suspect) impossible.
We do have self-driving cars by now. No drivers. Easy to find in SF and many other cities. It's not yet available everywhere, but as Amara's Law describes, "we tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run".
It's not available almost anywhere. I don't know exactly what the threshold should be, but it should be usable by most people in first world countries to make the claim "we have it". We don't have it.
Self-driving has been long proven to be a workable idea. Yes, there's the other 90% of the work making it work reliably and safely enough in diverse environments, but we know this can be done, it's just a matter of throwing money at it.
And it's not like there isn't a possible alternative either - we could be adapting roads to be much easier on self-driving cars. It's just that cooperation and coordination between humans is a way harder problem than self-driving, so it's easier to have a bunch of vendors solve the problems in tech the hard way, rather than to rely on the world to maintain roads properly.
My belief is that, if full self-driving isn't widely available in first world countries in 5 years, it's not going to be because of engineering problems, but rather because of legal and process issues around deploying it.
I don't think we know it can be done - has anyone demonstrated a self-driving car that works in all conditions, but is prohibitively expensive in its current form? If they had, you could extrapolate and say 'it's coming as soon as the tech gets cheaper'.
You can't adapt roads to fog or snow (unless you're going to enclose them in a tunnel). You can't adapt roads to pedestrians or bicycles (unless you prevent them from going near the road). Whatever adaptions you could make would be prohibitively expensive to roll out to all roads everywhere. In both the country I live in, and the USA, the government can barely afford to maintain the existing highway infrastructure.
Of course it will get better, and self-driving will be more widely deployed, but I don't think you'll ever get 100% coverage (assuming that's what we both agree is the goal).
I think in a way you're agreeing with my point - the last bit is just too hard, whether it's engineering, legal or political.
I'll share a new wrinkle that casts more shade on the coding LLMs.
We have a fair number of offshore resources that are used for dev. The developers are fully integrated into the team, are in all the stand-ups, and substitute for the usual role of junior programmers. They don't get the grunt work shoveled on them; they get the same work as everyone else, they're just expected to not be as fast.
In 6 months, 2 out of 4 of them have been sacked - and surprise, not because we could replace their work with LLM output, but because their use of LLMs was so unrestrained and scattershot that the pull requests they submitted had become nightmares. One thing mentioned in the article about unit test creation was something we saw as well. Perhaps this is partly due to working in an existing code base, where the LLM loses some of its advantage, and certainly some of it was cultural, in that progress was thought more important than actual manageable code. The two sacked fellows were told, literally, from my own mouth, multiple times: "You cannot just ask Copilot to write you code, paste the entire thing into Visual Studio with no thought of what has changed, with the end goal of just compiling and meeting the single set of acceptance criteria on your story. You're breaking other things and introducing bugs." It fell on deaf ears, and now they're gone. They were nice people, I didn't know how to get through to them, but they were convinced the LLMs were the way to go.
I use LLMs to help write code every day, and I wouldn't want to be without them, but I'm fairly surgical about it. Most of the time, when Copilot gives you a page of, say, React code or EF Core queries, you have to be really careful about anything you didn't explicitly ask for. Honestly, there is a time savings, but there is not a quality increase. The benefit is subverted by the time it takes to figure out how to ask correctly, the time to vet the output, and the time to fix the little tiny insidious bugs it can introduce.
So, don't go vibe coding and lose your job, is something to think about. I have to admit that it has worn me down meeting these interesting people from far-flung locations only to watch them flounder and get let go.
I've been contracting for a fairly largish company (around 300 devs split across ~40 teams). The US-based company was bought out by private equity about a year ago, and many senior/lead engineers were forced out, their work contracted out to cheaper overseas labor.
The company has been relatively ambivalent about the usage of code-assistant AI, but during PR reviews it has become very apparent - purely from the code duplication - that it's seen widespread adoption among the outsourced dev teams. Our company has a fairly large number of repositories and bespoke libs for utility-type functionality.
In the past, a programmer might have internally said to themselves, "There's no way that somebody hasn't already written this stupid function X or method Y", and they'd take a few minutes to search or reach out to see if it exists within the organization.
Instead, during some of the recent code reviews, there has been a huge uptick in core functionality that is very obviously being spit out by the LLM. At best it's just extra unnecessary code. At worst it will introduce new bugs, since our custom functions often handle business-___domain-specific edge cases that an LLM simply wouldn't know about.
Totally lines up with my experience as well. We also have the opposite problem: reams of code generated years ago during a period of unsupervised offshore work, where we're slowly paying down the debt but the LLMs will attempt to use the old code for new work. Most of it is spaghetti UI code and nearly impossible to reuse effectively, but the LLMs give it their best and we have to prompt around it.
I think people miss that "vibe coding" is a senior engineering tool. You still have to architect the project. You still have to ensure security and performance are handled (if relevant for the task). You still have to envision edge cases and usage patterns. You just don't have to type it out.
Now non-programmers can fumble forward to a working demo. But junior engineers are walking around with a loaded weapon - if they are not learning from how AI solves a problem or using it like StackOverflow to answer specific questions, they are blowing up their own careers.
The future is we are all product engineers or ___domain experts. No one is going to want an army of React engineers in 2 years.
BTW, Copilot is not the best at coding. The return grows exponentially with the quality of the LLM: bigger chunks, fewer bugs, less time checking. From my experience, LLMs do not impress on complex algorithms and shine on small utilities. They can use libs I don't even want to learn about.
I haven't really noticed that myself. I go "LLM shopping" fairly frequently, trying to find which one of the few I'm paying for gives the best result for the current problem. They all seem to have their shortfalls, although I will say Claude is better for greenfield work.
Well, you can probably try going down to simpler models to get the idea (they are almost useless). From my experience, a better model like Claude or o3 can do things that others simply cannot. At some complexity they start going in circles, making wrong decisions, forgetting things that are still in the context window. But the thing is, those complex tasks are usually the most interesting and important.
> "Vibe Coding" might get you 80% the way to a functioning concept. But to produce something reliable, secure, and worth spending money on, you’ll need experienced humans to do the hard work not possible with today’s models.
This would have been clear from Karpathy's full statement:
> It's not too bad for throwaway weekend projects, but still quite amusing.
I hate how what is effectively a stupid meme phrase became an actual term in a few days.
> "Vibe Coding" might get you 80% the way to a functioning concept. But to produce something reliable, secure, and worth spending money on, you’ll need experienced humans to do the hard work not possible with today’s models.
The problem is that 80% of the job is a proof of concept at best. 80% is effectively a QA walking into a bar[1].
And that's not even getting into the class of testers which try to order ';--drop table beers;-- beers, <script>alert(1)</script> beers or <!ENTITY q1 "&q0;"><!ENTITY q2 "&q1;&q1;&q1;">&q2; style beers
Makes me wonder how organic or astroturfed the name is. That a bunch of influencers all started using it at the same time makes me think someone is pushing it, perhaps because a focus group (or LLM) decided 'vibe' was a warm and fuzzy way of pushing the latest iteration of 'move fast and break things'.
The 80% mark means you just finished the happy flow and have written 30% of the code base. Now you need to handle the unhappy parts and write extra test code covering all those edge cases.
Precisely - now you need to finish off the remaining 70%, which is where most of the required work truly lies, with or without all the AI bollocks. Whether it takes 2 or 8 days of work to get that first 30% done doesn't make much of a difference if you need several months (or in some cases years) to get a project to a tolerable "production" state. I'd much rather spend the 8 than inherit an AI-generated codebase which will shoot itself in the head because someone scrolled over it and thought "lgtm".
The alternative to using AI to generate 80% of the project (30% of the code) isn't much better: coding the initial 30%, which is also a challenge (from 0 to 1).
So there is still a productivity gain for senior developers.
Over the last week I tried to use a combination of Claude and OpenAI o3-mini to do a direct conversion of about 500 lines of uncommented academic modeling code from Matlab to Python. I can’t stress enough how badly these models performed. Nearly every consequential line had some variety of off by one or logic error, often very subtle. I didn’t try cursor or the more agentic systems, but I would be astounded if they properly rigged up a test harness, inspected the output and were able to respond to the runtime errors. I’d be happy to share the code if anyone wants to surprise me.
This is exactly the kind of semi-mechanical, low added value work that would greatly benefit from automation, and they really fell on their faces. I really benefit from these models on greenfield tasks where I can delegate minor drudge work, but in this case I honestly think they actually increased the difficulty.
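For anyone who hasn't done this kind of port: the off-by-one class of error is easy to illustrate with a toy example (not the parent's actual model code). Matlab is 1-indexed with inclusive ranges, Python is 0-indexed with half-open ranges, so a mechanical translation quietly shifts or drops elements.

    import numpy as np

    x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

    # Matlab: x(2:4) selects elements 2, 3, 4 -> values 20, 30, 40.
    wrong = x[2:4]   # keeping the same numbers gives values 30, 40 - shifted and one short
    right = x[1:4]   # correct translation: values 20, 30, 40

    # Same trap in loops: Matlab's `for i = 1:n` visits 1..n inclusive,
    # while Python's `range(1, n)` stops at n-1, silently losing an iteration.
    n = len(x)
    matlab_iterations = list(range(1, n + 1))  # 1..5, what the Matlab loop did
    naive_port = list(range(1, n))             # 1..4, subtly wrong

Multiply that by 500 lines of index-heavy numerical code and it's easy to see how "nearly every consequential line" ends up subtly off.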
As a counter to this, I had grok build an entire set of micro services and all I had to clean up was some format strings. It blew me away. Did in an hour what should have taken a week.
Exactly. "uncommented academic modeling code from Matlab" is something very few people ever do, and the importance of that work is negligible in isolation. Meanwhile, "run of the mill micro services" are like most of what the software industry is doing these days; in aggregate, it's also how the industry delivers most of its value and how it makes most of its money.
I use cursor, and if there’s a nice interface for cursor to use grok I don’t know about it. Which leads to pain if I try to have the built in options try to fix things because they’re nowhere near as competent currently.
I've been "Vibe-TDDing" all afternoon and I'll tell you what, vibe tests are better than no tests.
And so long as you have some decent-to-solid understanding of coding and testing (this is non-trivial; I've been coding professionally for ~20 years), then you can direct the machine to put up decent guardrails first, and then you can kinda go nuts and let shit grow, prune it back, repeat.
Basically, if you know what code/tests ought to look and act like, then you can significantly reduce the negative externalities of having LLMs do your coding for you.
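(Concretely, the "guardrails first" step can be as simple as pinning down the behavior you care about in tests you write, or at least review, yourself before letting the model loose on the implementation. A toy pytest sketch; `pricing.apply_discount` is a hypothetical module/function standing in for whatever the LLM is allowed to rework:)

    # test_pricing.py
    import pytest
    from pricing import apply_discount  # hypothetical module the LLM is free to rewrite

    def test_discount_is_applied():
        assert apply_discount(100.0, 0.2) == pytest.approx(80.0)

    def test_discount_never_goes_negative():
        assert apply_discount(10.0, 1.5) == 0.0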
Yes, agreed. I just resurrected an old code base that I had never finished, as I really couldn't be bothered to test it properly. It would have been several weeks or even months doing this thankless task in my spare time, so it wasn't going to happen.
Just for fun, I asked the AI assistant in IntelliJ (free trial) to write most of the tests for me. I was actually blown away. The tests were largely really good. In many cases they were more thorough than I would have bothered with. Even when it didn't manage to write good tests on its own, the AI completion as I was writing them myself was incredibly useful. Most of the time it would predict the line I was going to write next; just press tab to accept.
I did have to review the tests, and there were a few minor mistakes I had to correct. The entire process took about 4 hours - and I was trying it out for the first time.
So this is a massive time saver for me, and lets me take on coding in the future that I would simply not have had the patience to complete.
I think it comes down to the fact that LLMs are powerful tools, but it's like a really good saw: it can be extremely helpful to a talented carpenter, while to an untrained person it's useless unless they actually learn to do the job.
> ever since I started to share how I built my SaaS using Cursor
> random thing are happening, maxed out usage on api keys, people bypassing the subscription, creating random shit on db
This idea that now everyone with little knowledge can code is absurd. For sure everyone can code: with hours of dedication, sitting down and trying things out, learning and improving from errors and past experiences. There is no other way round. I don't know what the next generation of coders is going to be like, but my advice still stands: read books and reliable sources, do your homework, and "vibe coders" will become so irrelevant that they will be extinguished by their own ignorance. Don't get fooled by number-crunching programs that seem to "program".
It makes me feel very secure in my job that so many engineers ITT are downplaying the ability and productivity of AI coding tools. You can pry cursor out of my cold dead hands. If you aren't seeing a 10x boost, then you must not have tried it lately, or haven't got the experience to prompt well.
What it excels at:
- Boilerplate code that's been written 1000x, which can sap your time and enthusiasm for the meaty problems beyond that.
- Complex DSA work. It has been demonstrated millions of times in training material.
- Simple and tedious tasks like making dummy data for tests and struct literals.
- Tightly scoped refactors.
Where does it falter?
- Mapping your product/business to the code or abstractions needed. I think this is where junior devs struggle to leverage it.
- Doing large scale multi-file refactors without proper specifics, guidance, and context. It also can't write a huge project from scratch. Humans are still needed to fit all the pieces together or provide guidance. I think this gap closes soon.
Code quality simply isn't a problem IME. If it didn't one-shot your dream abstraction, you probably weren't specific enough in the prompt. Most human-written code is also junk, so pointing out minor gaffes isn't really a dunk on AI. It's still a massive productivity booster if wielded by even a half-competent engineer.
The things you mentioned it does well on are things that help you avoid tedium, but I don't think that's what's most important to businesses. The things you mentioned it does poorly at are the things that matter most.
To pile on: if a large part of our job is purely mechanical, then there is a bigger problem with our engineering processes and AI can't fix that.
> if a large part of our job is purely mechanical, then there is a bigger problem with our engineering processes and AI can't fix that.
It is! And AI is fixing precisely that. What businesses actually care about (well, 99% of the businesses where code is written, anyway) is shipping fast and solving the immediate problem, NOT code quality and craft. It goes against what I want to believe as an engineer. Most problems are not new, they are not hard, and they are not sensitive. You do need to start with a good understanding of the business need, but it's not that the AI can't code to that. I will often stub out an abstraction, explain inputs/outputs in detail, provide sample data, etc. That's all. There are frighteningly few showstopper problems with AI coding at this point, and it's moving so quickly.
We're not at the point where non-engineers are capable engineers with AI, but if you are an engineer not using AI extensively, you are being lapped.
I don't think AI is really fixing business problems, though. I think it's only fixing developer problems. And unfortunately nobody really cares about that except for developers.
I just find it sad that instead of focusing on improving how we build things and reducing the need for so much mindless, tedious, repetitious, mechanical work, we're content to just build bad things faster with AI and call it a win.
> I just find it sad that instead of focusing on improving how we build things and reducing the need for so much mindless, tedious, repetitious, mechanical work, we're content to just build bad things faster with AI and call it a win.
The AI is doing precisely that: reducing the mindless, tedious, repetitious, mechanical work. And what "vibe coding" wants you to embrace is treating high-level code as if it were compiled assembly: an implementation detail you never want to look at or care about if you can help it.
Yes, in some sense AI isn't fixing anything, because all that "mindless, tedious, repetitious, mechanical" code still exists, it's just autogenerated. I too wish we could've first eliminated the need for it entirely. But we didn't, because most programmers and the industry at large still don't understand where the problem is in the first place. They can't see that we've long reached the Pareto frontier in our programming languages, and that we're being limited by the default paradigm of working directly on a plaintext codebase that's the single source of truth.
So yeah, in this sense, LLMs aren't fixing anything - they're just an abstraction layer on top of our exhausted coding paradigm.
> shipping fast and solving the immediate problem, NOT code quality and craft
This is also what puts many companies out of business and creates huge security issues. If AI is not fixing this but making it worse, then that's not improving software engineering.
> This is also what puts many companies out of business
Those companies you mention just overdid it. Like with everything else on the market, there's a limit to how much value/quality you can optimize away before the end result stops being fit for purpose. However, existence of this limit doesn't stop companies from racing to the very edge of it.
> and create huge security issues.
Security is mostly a solved problem.
Yes, it truly is - at least from the business point of view.
Nobody except attackers and infosec people cares about the mathematical and technical details, or whether your stack or coding practice is secure enough. Not the customers, as they neither understand any of this, nor could do anything about it even if they did. Not the companies, since they manage it at a higher level of abstraction. Whatever holes and vulnerabilities the AI coding introduces, the industry will account for it. Some headlines will be made, some stocks will move, and nothing will change.
FWIW, I don't like either of these things. I'm an engineer in my heart, so it pains me to be constantly reminded that our work is merely means to an end, and matters only to the extent it can't be substituted by some alternative.
Hot take: Vibe coding is going to be the new Excel of technical debt.
Most tech-savvy places will avoid it, and most good programmers will never encounter it. A bunch of us will make a career out of fixing the mess it makes after it explodes.
My first real job was doing just that at a broker trader which lost 10m on a trade made by an Excel spreadsheet that used a stale yahoo finance API to get exchange rates.
Not to mention the fact that those training clusters are not going to supervise themselves (or, maybe I just think that because I haven't had enough koolaid; where's the AWS Console MCP endpoint?)
Judging by the comments, most people couldn't even tell it was satire, which goes to show how absurd the hype is right now (and probably why it was buried).
He works for a company that tries to sell coding agents. He's absolutely trying to pump it up and induce FOMO. If you can't see that because he hides it behind a layer of humor, that's on you.
ThePrimeagen is doing some kind of vibe coding ad on Twitch right now, trying to build a game in 7 days. There are 10x coders in the room, and two days later they are struggling with hilarious basics like off-by-one errors while tweaking something that could be described as donkey.bas.
"ever since I started to share how I built my SaaS using Cursor"
"random thing are happening, maxed out usage on api keys, people bypassing the subscription, creating random shit on db"
"as you know, I'm not technical so this is taking me longer that usual to figure out"
That one is almost a meme on LinkedIn. For fun, I hope it is a long troll (like a long con, but for trolling).
And most of LinkedIn (as the algo shows me!) is basically this HN post in 100 words, or the polar opposite, saying how software engineers have had their chips.
I think it might be referring to a current lack of experience in programming and debugging.
The situation seems to reflect the issue that Kernighan's Law refers to, which is that debugging code is twice as hard (perhaps more?) as writing it in the first place. I imagine debugging AI-generated code might be even harder.
Doing something that's not your primary interest is one thing.
Not wanting to learn new things or understand how things work or put in any effort to succeed because you think you can just cheat your way through life is a totally different thing.
And that's what the difference between using AI programming tools responsibly and "vibe coding" is.
Agreed, but you're bunching a lot of people into the category of slacker who don't belong there.
A new tool was released that basically lets you design a GUI through guided voice prompts. The 50-year-old school teacher can "vibe code" all she wants.
The real problem arises when someone makes false claims they’re a developer when they’re just managing AI generated code. Making it your primary interest and then cheating to make it seem true.
> Or I’m in a different industry and it’s ridiculous to require years of schooling for a skill that’s not my primary interest?
It took me 6 months when I was 10 years old to become technical and create a website. If a 10-year-old can do it in 6 months, an adult can do it in 1 month if they work hard.
I feel exactly the same. On the other hand, the “speed up my work a lot” part is important, and shouldn’t be overlooked. Like, if someone reads this article and figures they don’t need to learn to use AI for their coding job, that’s the wrong conclusion to make.
Agreed, I feel that a developer who refuses to learn to work with AI tools will inevitably fall behind those that do. But I don't see AI being able to replace a developer who can work WITH AI anytime soon (at least on anything non trivial).
I feel like a lot of people are forgetting how good LLMs are at small isolated tasks because of how much better they've gotten at larger tasks. The best experiences I've had with LLMs all involve sketching out the interfaces for components I need and letting it fill in the implementation. That mentality also rewards choices that lead to good/maintainable code. You give functions good names so the AI knows what to implement. You make the code you ask it to generate as small as possible to minimize the chance of it hallucinating/going off the rails. You stub simple APIs for the same reason. And (unsurprisingly) small, well-defined functions are extremely testable! Which is a great trait to have for code that you know can very well be wrong.
In time the AI will be good enough to design whole applications in this vibe-code-y way... But all of the examples I've seen so far indicate that even the best publicly available models aren't there. It seems like every example I've seen has the developer bickering with the AI about something it just won't get right, often wasting more time than if they were slightly more hands-on. Until the tech gets over that, I'll stick to treating it as the "junior developer I give a UML diagram to so they can figure out the messy parts".
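(A concrete version of that workflow, with made-up names; the marked body is the part I'd hand to the model, and the test is the part I keep.)

    from dataclasses import dataclass

    @dataclass
    class Invoice:
        subtotal_cents: int
        country: str

    def total_with_tax(invoice: Invoice, rates: dict[str, float]) -> int:
        """Return the invoice total in cents, applying the country's tax rate.

        Unknown countries get a rate of 0. Result is rounded to the nearest cent.
        """
        # --- body below is what the model gets to fill in ---
        rate = rates.get(invoice.country, 0.0)
        return round(invoice.subtotal_cents * (1 + rate))

    def test_total_with_tax():
        assert total_with_tax(Invoice(1000, "DE"), {"DE": 0.19}) == 1190
        assert total_with_tax(Invoice(1000, "XX"), {"DE": 0.19}) == 1000

    if __name__ == "__main__":
        test_total_with_tax()
        print("ok")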
Now think where we were 5 years ago, and where we will be in the next 5-10 years.
A lot of kids are going to enroll in college to study CS, computer engineering, software engineering, etc., and will not finish their degrees for another 3-5 years. They might just find themselves redundant (in junior positions, that is).
The only defense of vibe coding I'll make is that LLMs are very good at identifying decent implementations of business logic, such as workflows I may not have otherwise considered or found on StackOverflow. That then becomes a decent starting point for future iteration, but I would never trust the "vibe" of the code itself, even if that's what all the AI hypesters are doing.
Despite developing LLMs for years I haven't actually used them much in day-to-day work, but asking Claude 3.7 Sonnet my coding questions has been a superior experience to just Googling them (particularly if there are specific functional requirements/constraints)
That assumes a static demand for development services, which has, more or less, never happened since computing became a thing.
Python and other high-level languages made a lot of development much faster, but it never led to reduced engineering needs. Cloud made deploying services massively easier, and as a result we actually have a lot more people working in infrastructure.
Faster development mostly leads to expanded economic viability for new types of software. The real question is what becomes economically feasible if development costs are halved.
"In 1865, the English economist William Stanley Jevons observed that technological improvements that increased the efficiency of coal use led to the increased consumption of coal in a wide range of industries. He argued that, contrary to common intuition, technological progress could not be relied upon to reduce fuel consumption."
This definitely feels true for tech companies, where the prospect of more productive engineers improves their bottom line. The same sort of companies that have a near-infinite appetite for talented engineers.
But there's a lot of coders in industries whose core business isn't technology. Knapheide, for example, a truck outfitting company where my brother codes. I'd imagine in those companies, being able to do the same work with fewer engineers and less cost would lead to fewer hires. Technology isn't their core product and they aren't being held back by software.
I actually think we may see the opposite happen with those kinds of companies too.
At the moment, a truck outfitting company building a customer CRM optimized for their workflow is an absurd idea: they would need a team of a dozen developers working for a year before they could even get a feel for if it was a feasible project or not.
Add LLM assistance and maybe a team of three developers could get to an initial working version in three months.
At that point, companies that had previously ruled out custom software development entirely may find that it makes sense for them - growing the demand for software engineers as a whole.
I guess I'm skeptical about how much of an impact better software could have on the business success of a truck outfitting company? But maybe I'm being overly skeptical. I bet if I really got in the weeds there I'd see tons of opportunities where better/more software could really accelerate things. Thanks for checking my pessimism!
The job loss depends on the average speed up, however. If the AI is only effective in 10% of tasks (the basic stuff), then that 3x improvement goes down to 1.3x.
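(To put rough numbers on that dilution, a time-weighted back-of-the-envelope; this is my arithmetic and my assumptions, and the exact figure obviously depends on how much of total time the "basic stuff" really takes.)

    # f = fraction of total time the AI actually helps with, s = speedup on that fraction
    def overall(f: float, s: float) -> float:
        return 1 / ((1 - f) + f / s)

    print(round(overall(0.5, 3.0), 2))  # ~1.5x if half of the time is AI-amenable
    print(round(overall(0.1, 3.0), 2))  # ~1.07x if only the basic 10% is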
That's such an economic fallacy that I'd expect the HN crowd to have understood this ages ago.
Compare the average productivity of somebody working in a car factory 80 years ago with somebody today. How many person-hours did it take then and how many does it take today to manufacture a car? Did the number of jobs between then and now shrink by that factor? To the contrary. The car industry had an incredible boom.
Efficiency increase does not imply job loss, since the market size is not static. If cost is reduced, then things which weren't viable before suddenly are, and the market size can explode. In the end you can end up with more jobs. Not always, obviously, but there are more examples than you can count which show that.
This is all broadly true, historically. Automating jobs mostly results in creating more jobs elsewhere.
But let's assume you have true, fully general AI. Further assume that it can do human-level cognition for $2/hour, and it's roughly as smart as a Stanford grad.
So once the AI takes your job, it goes on to take your new job, and the job after that, and the job after that. It is smarter and cheaper than the average human, after all.
This scenario goes one of three ways, depending on who controls the AI:
1. We all become fabulously wealthy and no longer need to work at all. (I have trouble visualizing exactly how we get this outcome.)
2. A handful of billionaires and politicians control the AI. They don't need the rest of us.
3. The AI controls itself, in which case most economic benefits and power go to the AI.
The last historical analog of this was the Neanderthals, who were unable (for whatever reason) to compete with humans.
So the most important question is: how close actually are we to this scenario? Is it impossible? A century away? Or something that will happen in the next decade?
> But let's assume you have true, fully general AI.
Very strong assumption, and a very narrow setting that is one of the counterexamples.
AI researchers in the 80s were already telling us that AI was just around the corner, 5 years away. Didn't happen. I wouldn't hold my breath this time either.
"AI" is a misnomer. LLMs are not "intelligence". They are a lossy compression algorithm of everything that was put into their training set. Pretty good at that, but that's essentially it.
This is what is really interesting: what will they do in 10 years? Will you need to learn mathematics to PhD level to produce code that an LLM cannot produce? Will we all become business analysts (will AI do that too)? I don't think BA is a step down or up; it is probably interesting, and I did think of going down that path.
People laugh at coders like we are the only manual loom operators, when everyone's job, even POTUS's, can be replaced by the AI we dream will exist.
My thoughts: buy SPX so you own a sliver of the new overlords.
Guess what happens when you can suddenly build features ahead of schedule? You can make them bulletproof instead of cutting corners.
Our code is better and more robust than it's ever been. Our rate of user-reported bugs has dropped more than 50% since we started "vibe" coding 6-ish months ago.
I'd take strong opinions for or against with a grain of salt. It's likely you're suffering a bit from Baader-Meinhof.
I could certainly see a possible reduction in engineering team size, but going from 9 to 2 makes me question how much of that reduction was a result of over-hiring in the first place.
There's def a lot of gatekeeping going on by the real coders™. The rest of us are just learning and adapting. Personally, I'm glad that it's now easy to build decent looking UIs and quickly tune SQL.
Product people love the idea of being able to fire their dev teams, but I'm not sure they understand the implications (some of which may not become clear for years).
It's interesting that you describe yourself as a developer now.
Because just three months ago, in your first post [1] to HN, you said:
> I'm somewhat non-technical but I've been using Claude to hack MVPs together for months now.
Sure: you might feel as though you have now 10x'ed yourself. But, quite honestly, when the reality is that just a few months back you self-described as "somewhat non-technical", it's clear that (a) you're at such an early stage in your learning and understanding of tech, as a developer, that it's relatively easy to experience big gains, and (b) you can't actually have much of an objective measure on this, because you are in fact quite new to the field.
I read a lot of your other comments. To me, even before I had confirmation that you were actually "somewhat non-technical", and fairly new to the field — effectively a junior developer by any real measure — this was already quite apparent to me.
Based upon having been a developer for some decades myself already: I can generally spot those that talk-the-talk — and similarly: I can generally spot those who have non-trivial / deeper experience with various fields of tech.
Powering-up with AI tooling doesn't remedy that. Even if it might seem otherwise from your "somewhat non-technical"-but-newly-empowered position.
Good luck with your coding endeavours though, and with your evangelism.
I have no doubts at all that the world is changing — including how software is developed. But I see your posts for what they are.
Yeah I was a jr developer for a year before I became a PM. That's the definition of being "somewhat non-technical" as I put it.
you've been a developer for some decades which is why your reality is threatened that your craft is increasingly becoming irrelevant so you had to snoop my profile to find some confirmation that your reality doesn't get shattered
this is nothing new of course. obnoxious neckbeard engineers who don't understand where the world is going have existed since the unix debates on irc. you'll find plenty of people who agree with you on mastodon lol.
> you've been a developer for some decades which is why your reality is threatened that your craft is increasingly becoming irrelevant so you had to snoop my profile to find some confirmation that your reality doesn't get shattered
Hahaha - no, that's really not accurate at all. On lots of levels. The ability to read another user's comments is there so that anyone who chooses can actually get a better understanding of who they're talking with, and what that person is about. One doesn't have to feel threatened at all to want to use it, one simply has to be intellectually curious, and interested to find out more...
There's no need to try and portray it as a negative, and make out there's something afoot which isn't actually taking place.
Anyone who's been here on HN for any significant amount of time knows exactly what that feature is for — as well as when it might be best to use it. And people absolutely will use it.
It helps separate the wheat from the chaff.
— Please do try and take care that your wide-of-the-mark unnecessary put-downs and name calling don't violate the HN guidelines! (Just for your own good!)
Ah right, junior dev for a year. Wow, how amazing.
Plenty of room for you to 10x many times over then.
Over the years, I’ve met plenty of folk who have dabbled with software development, before deciding it wasn’t for them - then pivoting to something less technical.
Nah, I don’t feel threatened at all by AI. My job is secure. Tools change, sure. But there’s plenty of years left in software development for sufficiently skilled humans. No matter what a junior-level dev / AI evangelist might claim.
I’ll be cleaning up and properly re-implementing the MVPs that less knowledgeable folk are throwing together, slap dash. For a long while yet. And doing other stuff that AI simply can’t do properly - and quite honestly is quite far from doing.
Your rhetoric betrays your knowledge, and your bravado and insults can’t make up for that in any way.
It’s easy to get enchanted by current generative AI, and believe it far more capable than it is. Particularly if not overly skilled in whatever ___domain. Particularly if one doesn’t have much of a grasp on how generative AI actually works. Good luck with that.
Unfortunately there are no bans for stuff like that.
But that’s why I call it out: yes, exactly, it degrades the conversation when someone is preaching about a new tech, and how it’s gonna change development, and claiming they’re a developer themselves - while not being upfront about the fact that they’ve not actually got much real-world experience as a developer at all in general.
And this kind of thing should always be called out when spotted. It’s just plain disingenuous at the end of the day.
I’ve probably been contracted to fix more broken projects (by devs who royally messed up), than the count of MVPs this person has made, or indeed the number of months they’ve been coding.
But at the end of the day, these kinds of folk simply make us more experienced folk more valuable to those that need a professional service in a bail-out scenario. I’ve got decades of real-world coding experience, and a healthy list of successfully published / deployed projects, including some fairly big clients over the years. My CV speaks volumes, particularly when contrast against someone with little experience in the field of software development. I’ve seen languages and tooling come and go. I’ve headed teams and worked solo. I’ve witnessed plenty of folk
like this in my time. It’s certainly not my first rodeo!
Unfortunate that someone chose to downvote me, as opposed to engaging me in conversation as to why my view might perhaps be incorrect or maybe shortsighted - as per the HN guidelines. But no real surprise - I guess that in itself is quite telling here.
Karma points might come and go sometimes, but whatever: I’ve been posting on HN (and other sites) for years, on and off. I’ve no need to try and portray myself as something I’m not, nor portray myself to have skills or experience that I don’t have. I generally post to share my knowledge and experience, because real-world experience adds up over time.
How ironic. I have literally made my career and fortune building the very systems that you’re yapping about.
Edit: see the other comment where you are called out for the lying fake that you are. How delightfully pathetic of you.
lol "fortune" whatever you say. that's why you're desperately snooping my other comments including that lame "gotcha" to find some confirmation I'm lying?
it's ok, all craftsmen who got automated away once thought they were special. you're not the first you won't be the last. you will likely be unemployed soon though.
Open mind here. Spill more details please. Were the 9 good, or was there dead weight? Could the 9-to-2 reduction have been done without AI anyway (because there was less work to do)?
So why wouldn't you keep them? If you're able to produce even more with AI enabled engineers, why downsize? To me, it sounds like a startup's dream to be able to output more without increasing headcount.
Trying to get my head around this. It must mean 90% of what they were doing was writing code. Like not even thinking, architecture, gathering requirements, making sure you built the right thing, etc. Just generating syntax.
In my experience it's usually a few devs doing most of that, and the rest are largely banging out features, debugging, refactoring. There's also just a lot more efficiency when you shrink the team.
> Cursor has some sort of "concise mode" (archived) that they'll turn on when there is high load where the model will still be rated at the normal price but behaves in a useless manner. This mode will omit details, drop important findings, and corrupt the output that is being produced.
This is a real problem that I have experienced on and off. It's getting to the point where everyone on my team is actively looking for alternatives. Generally, I've found Cursor works correctly after business hours. But, it's increasingly giving absolutely useless responses during business hours.
-----
That being said, I agree with many of the author's observations. However, for me, it's not really a deal-breaker. It's not much different than working with an intern or junior engineer. If you ask them to do too much all at once, they come up with bad solutions. Plus, they have a tendency to make "dumb" decisions.
For me, I've found solutions for nearly all of the listed issues. Much of it comes down to being diligent during code review (like you should be). For example, with the TypeScript issue, I come back later and have it fix it.
Specs are the one that still baffles me. It's absolutely terrible at writing proper specs. In particular, it falls into a really bad cycle whenever there are errors. I don't have a solution for this one.
> Generally, I've found Cursor works correctly after business hours.
Interestingly the times I've experienced the most weirdness were during extremely not normal business hours (from the California perspective). For 3 nights in a row last week, I found myself coding at/after 2:30am during what were apparently periods of excessive load on Claude Sonnet. When asking Cursor to do things, it would fail, tell me about the high load, and encourage me to try again soon. Well, I just kept clicking the button over and over again, thinking it would eventually be able to handle the request properly, and otherwise continue presenting the error. Not the case!
Incorrect/hilarious things Cursor/Claude did at points during those nights:
- repeat the inquiry back to me in full, then do nothing at all after that
- confidently assert it had located the bug I was looking for, then direct me towards the entire codebase
- assert that it had done what I asked, and request that I approve the changes it wanted to make to my code, which were... nothing, none whatsoever
- (possibly the most hilarious) begin to answer questions in borderline leetspeak, randomly substituting numbers in place of letters in words, before eventually devolving into total gibberish
Mostly just annoying due to the wasted time, though it's possible the entertainment value negated it. I don't expect miracles from Cursor to begin with, nor do I give it wide latitude to change very much in my projects, so the risk of damage wasn't really any worse then than at any other time. Of course, I am not a team working against deadlines on critical projects, just a guy screwing around at 2:30am.
The reference to Claude Plays Pokemon isn't applicable to the discussion of vibe coding, although the suggestion that AI agents can fix the issues with vibe coding is funny in an ironic way given the disproportionate hype around both.
The issues with Claude Plays Pokemon (an overview here: https://arstechnica.com/ai/2025/03/why-anthropics-claude-sti... ) is essentially due to the 200k context window being finite, which is why it has to use an intermediate notepad. In the case of coding assistants like Cursor, the "notepad" is self-documenting with the code itself, sometimes literally with excessive code comments. The functional constraints of code are also more defined both implicitly and optionally explicitly: For Pokemon Red, the 90's game design doesn't often give instructions on where to go for the next objective, which is why the run is effectively over after getting Lt. Surge's badge as the game becomes very nonlinear.
Although, both vibe coding and Claude Plays Pokemon rely on significant amounts of optimism about the capabilities of LLMs.
My very limited experience with LLM-assisted coding is that it depends...
For basic frameworks done in something like Python it is very good, but not perfect, yet. But the iteration cycle to get to where you want to be is still faster than doing the whole job manually and I see this as a big win.
For more esoteric, fast-changing languages/frameworks it has me chasing my tail in a chain of code updates where each fix breaks something in the n-1th or n-2th version. Sometimes it's deprecated code, or it hallucinates functions that would be valid if you were using a different language or framework. And sometimes it's simple coding errors.
But it will get better, a lot better.
The main benefit is that it will let an invested non-programmer client build a functional framework prototype, and then combine that with a list of missing features that a more skilled programmer can flesh out into a first-cut solution.
For the first time we 'might' get better requirements with an actual working model, instead of having the implementor do most of the requirements as a first pass from a high-level, hand-wavy spec. I think we're going to see some amazing tools for this.
What I don't see it doing is creating original algorithms to solve things being done for the first time.
I see statements like this a lot when talking about AI in general. People seem to think it is a foregone conclusion that no limit to LLM model improvement and capability exists. What causes you to believe this and what evidence do you have to back it up?
Because compared to a year ago, it's much better. Compared to two years ago, it almost didn't exist. Compared to three years ago, nobody was actually talking about it.
Like I get the jokes, and I totally agree that they won’t totally replace the humans. But come on, the way an average coder writes anything nowadays has dramatically changed. Especially for web and app stuff. I’ve onboarded some junior/mid-level engineers recently, and it’s such a different experience compared to 5 years ago.
Really? Opinion, based on the fact that there are basic improvements that can be implemented on what we have now, using the skills that we have now. If you don't agree, that's ok.
Now, about your comprehension skills, where is there any mention on my part of there being 'no limit'? In fact I go as far as to speculate on at least one.
I've been messing with this for a few days now, so I'm not going to claim to be any sort of expert, but as someone who has been coding for more than 20 years I do appreciate the set-it-and-forget-it nature of being able to throw Q Developer or whatever at a relatively simple problem that I'm curious about and let it crank away for half an hour while I'm working on something else. I've tried it on a couple of reasonably small and well-defined problems, mainly focusing on Python, and it works surprisingly well. It'll run the scripts and fix errors and can suggest prompt improvements. I've also tried it in a large codebase with much less success, so YMMV.
Also, it is important to be able to review the code, because it could be the case that it looks mostly correct but has some subtle errors in it that can mislead you. For example, I was trying a couple of different ways of computing some indices that involve a bunch of variables, and one way had a mask involved that made no sense. "Vibe coding" without being able to check the work of an LLM is almost certain to go poorly, IOW.
And when it finally works..? I suspect (vibe) coding can be significantly improved by a multi-step structured approach: coding, review, and testing in a loop, fully automatic, much like Chain of Thought helps with solving logical tasks. This can be done today. The main limitation here is the effective context window, i.e. a big project doesn't fit. The solution is splitting, separating, and documenting: a brief doc instead of the whole code in the prompt.
With this, developers need to level up to architects and get more ___domain knowledge.
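(A rough sketch of what such a loop could look like, strictly illustrative; `llm_generate` is a stand-in for whatever model or agent API you use, not a real library call:)

    import subprocess

    def llm_generate(spec: str, feedback: str = "") -> str:
        """Stand-in for a call to your model/agent of choice."""
        raise NotImplementedError

    def run_tests() -> tuple[bool, str]:
        proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr

    def build(spec: str, max_rounds: int = 5) -> bool:
        feedback = ""
        for _ in range(max_rounds):
            code = llm_generate(spec, feedback)      # 1. code against a brief doc/spec
            with open("generated.py", "w") as f:
                f.write(code)
            ok, output = run_tests()                 # 2. review/test automatically
            if ok:
                return True
            feedback = output                        # 3. feed the failures back and loop
        return False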
Arguing that these tools have flaws seems like a losing battle. Soon those flaws will be fixed[1] and you'll have to find new flaws to complain about. Eventually, hopefully, you'll realize that you just don't like feeling displaced.
[1] it's unbelievable what a difference in quality 1 year made for ChatGPT
Why are you so certain that the flaws will be fixed? Seems like there is a giant leap between a machine spewing words based on probability and an actual deep understanding of the code it's supposed to write.
A "machine spewing words based on probability" is an implementation detail. I'm not making a grandiose prediction about the future. All I'm saying is that these machines are improving super fast.
I'm also stricken by the superficiality of analysis like "oh it's just probabilities" from so many devs; might as well say "it's magnets".
In my comment I was questioning the certainty that those fundamental flaws will be fixed. I'm one of those people who don't believe that iterating over LLM will make that giant leap.
You can call it an implementation detail, but it's like this: both a wheel and a wing can take you over some distance, but the difference between them is staggering. A wheel will never send you flying (normally).
> For now, they are worth evaluating and discussing, but are not ready for us to delegate the precise task of creating reliable, secure, and scalable software that powers our society.
The good thing about vibe coding is that it hides the software development lifecycle completely from the user's perspective, inside a platform that has an integrated SDLC: from defining the idea, to ensuring visibility into changes, to a runtime where the user can see it. In my mind, modifying without a hassle in a controlled environment is what users look for. Software development assisted by AI will be a thing for engineers, but vibe coding is aimed at users outside of engineering.
I sadly see only a handful of companies being able to pull this off.
The thing is the SDLC exists to harden software to a point where you can run a business on it. It didn't appear because people were bored. This feels more like a comparison to a Figma or other prototyping tool than something that can produce quality software.
If you believe in vibe coding, surely you are holding a massive short position in every major software company, no? I mean, surely, any day now, a bored student will vibe code a full replacement for a major software package and destroy the income of the SW giants one by one, right?
Wake me up when someone vibe codes a Chrome replacement, or an iOS replacement, or MS Office...
Except we know this won't happen anytime soon because we all know vibe coding isn't very useful beyond toy projects that leverage complex libraries written by actual developers.
I mean ... most code out there is pretty bad, so LLM assistants contributing pretty bad code just keeps the mean where it is. And obviously it has to be, how can anybody expect an LLM to produce output with quality that's higher than its training input? Expecting that is appealing to magic or some consciousness that doesn't actually exist or just plain anthropomorphising.
If you are working at a place where that quality level is standard -- and let's face it, a large number of companies produce average or below-average quality code (by definition) -- then using an LLM assistant isn't that bad. At least if such an assistant doesn't have some extra flaws beyond producing the best summary of its training data, which is exactly what an LLM does. It actually justifiably replaces developers in such an average-or-below place. But if you are aiming for the top end of the quality scale then there is no way this can be achieved by LLM output. Purely on principle.
This shouldn't even be a controversial opinion. I'm quite surprised every time this is questioned or even just debated.
"Bad" is doing s lot of work in your sentence. Do you mean slow? Or buggy? Or unmaintainable? Or unextendable? Uses patterns the tech lead hates? Hard to read? High cyclomatic complexity? Doesn't meet requirements? Security issues? Uses out of date libraries? Too much reliance on 3rd party code? Too much NIH? ...
I think "one shot ready for production code" is what AI cannot do yet. Which is why I am not worried for another 12 months at least :)
Strongly agree with the article, and happy to see so many lucid people, comments and articles on HN that thoroughly deconstruct the "vibe coding" illusion.
Also, Andrej Karpathy really disappointed pushing such brittle BS as a revolution.
I wrote this because I was worried that "vibe coding" was being misinterpreted to mean "any time an LLM outputs code", as opposed to the intended definition: coding where you deliberately don't review the code and see how far you can get.
1) Cursor has been crashing several times an hour for me recently.
2) Cursor seems to ignore .cursorrules files. I'm using the json format that's supposed to let you filter on file name patterns (although how that works for cross-cutting agent stuff I don't know).
3) Cursor is obsessed with making sketchy, iffy, defensive code checking for the most recent symptom and trying to guess and shart its way out of it instead of addressing the real problem. And it's extremely hard to talk it out of doing that; I have to keep reminding it and admonishing it to cut it the fuck out, fail instead of mitigate, address the root cause not the symptoms, and stop trying to close the barn door after all the horses have escaped. It's as if it was only trained on Stack Overflow and PHP manual page discussions.
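(For what it's worth, the pattern I keep having to veto looks roughly like this; an illustrative Python sketch, not Cursor's literal output:)

    import json

    # What the agent keeps writing: swallow the symptom and guess a fallback.
    def load_config(path):
        try:
            with open(path) as f:
                return json.load(f)
        except Exception:
            return {}  # "mitigated" -- and the real error is now invisible

    # What I actually want: fail loudly so the root cause gets fixed.
    def load_config_strict(path):
        with open(path) as f:  # let FileNotFoundError / JSONDecodeError propagate
            return json.load(f)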
(3) Cursor does not do the coding, it delegates that task to whatever model you have picked. In more recent versions, it does pick the model automatically, which is not necessarily what I would prefer. Some of the models it delegates to may have been trained the way you suggested here, others are more sophisticated. The fact that none of the models is controlled by the Cursor crew quite naturally means that it will have quirks talking to newer models.
Saw this submission earlier in the day and chuckled. The whole "vibe coding" thing is hilarious, and I say this as someone who heavily leverages AI in my coding tasks.
I honestly am not sure if this ad is a joke. I assume not, which is hilarious. Put in your 12-16 hour days for hilariously bad pay, and your onboarding will be doing one of the most pathetic, deadbeat jobs possible, which is making collection calls. And your "vibe coding" is to use voice agents to...make collection calls.
Must be pretty grim pickings if this trash is getting advertised on here.
Please can everybody who dislikes LLMs stop conflating "vibe coding" (fun, see how far you can get without actually coding, but not intended for serious projects as per Karpathy's original tweet) with the grifter version that sells it without the cautionary note, or with LLM tool usage as a whole class.
They are 3 different things, and neither of the first two represents anything more than a subset of the capabilities of the last.
If you don't like LLMs that's cool, but at least take some time to understand the context here.
That one would be extra awesome because by the time anyone realized there was some subtle stats bug in the resulting kernels, it'd have vaporized $10MM in cloud training opex
> There's a trend on social media where many repeat Andrej Karpathy's words: "give in to the vibes, embrace exponentials, and forget that the code even exists." This belief — like many flawed takes humanity holds — comes from laziness, inexperience, and self-deluding imagination.
I'm going to go ahead and give the author the benefit of the doubt that they aren't literally saying Andrej Karpathy is "lazy and inexperienced", because that claim is obviously absurd.
In general though, I think the author is missing the actual point Karpathy was making! Let's look at his detailed criticisms for the typescript agent run, for example:
> Regularly clones TypeScript interfaces instead of exporting the original and importing it.
> Reinvents components all the time with the same structure without searching the code base for an existing copy of that component.
These are only problems for human codebases. You're not vibing if you are expecting agents to write code the way humans would.
Duplicating interfaces and implementations is inefficient, and would be a nightmare, in a human codebase. But, the code will still work! So if an AI agent is managing the codebase, who cares if it duplicates things all the time?
Maybe it'll see that it did that later and decide to consolidate things, maybe it won't. It doesn't affect the actual outcome of the code, unless you actually look at the code as a human, which is not "vibe coding."
> When told to fix styles with precise details, it will alter the wrong component entirely.
> When told specifically where there are many duplicated components and instructed to refactor, will only refactor the first instance of that component in the file instead of all instances in all files.
> When told to refactor code, fails to search for the breaks it caused even when told to do so.
You're thinking about the code again, gotta stop doing that if you actually want to ~vibe code~. Refactoring code isn't a thing when you're vibe coding, English is your programming language now, the Typescript (or w/e language) is the assembly. You wouldn't spend much time observing the assembly output of your compiler (especially for web dev), so why are you observing the code output of your agent?
If you don't want to vibe code, that's fine, nobody is forcing you to. But if you're going to do it, grade it on the metric that Andrej was actually claiming: that you can get working results on a lot of software projects today by telling coding agents to make some code do something, and then just keep running it with "fix this bug" until it works, and it'll often get to a working result.
He never claimed that the code outputted would be beautiful, from a human perspective, or well formatted, or well architected, or efficient.
Vibe Coding is a trigger word for devs who insist it's a pointless exercise because it doesn't do 100% of the job. Devs don't seem to realize that's not the point - the point is you can hire fewer devs if you're only worried about the remaining 20%.
Currently, AIs emulate a less skilled, junior developer. They can certainly get you up and running, but adding junior developers doesn’t speed up a lot of projects. What we are seeing is people falling into the “mythical man month” trap, where they believe that adding another coding entity will reduce the amount of work humans do, but that isn’t how most projects come out.
To put it simply, it doesn’t matter if AI does 80% of the work if that last 20% takes 5x longer. As long as you need a human in the loop who understands the code, that human is going to have to spend the normal amount of time understanding the problem.
Indeed. My roommate has just been put on a new project at his workplace. No AI involved anywhere. But he inherited a half-done project. Code is even 90% done. But he is spending so much time trying to understand all that existing code, noting down the issues it has which he'll need to fix. It's not just completing the remaining 10%. It's understanding and fixing and partially reworking the existing 90%. Which he has to do, since he'll be responsible for the thing once released. It's approaching a point where just building it from scratch on his own would have been more time efficient.
It seems to me that LLM output creates a similar situation.
Yeah but AI coding does speed up some simple tasks. Sometimes by a lot.
But we have to endure these tedious self-congratulatory "mwa ha well it's still not as good as my code" posts.
No shit. Nobody is saying AI can write a web browser or a compiler or even many far simpler things.
But it can do some very simple things like making basic websites. And sure it gets a lot of stuff wrong and you have to correct it, or fix it yourself. But it's still usually faster than doing everything manually.
This post feels like complaining about cruise control because it isn't level 5 autonomy. Nobody should use it because it doesn't do everything perfectly!
> This post feels like complaining about cruise control because it isn't level 5 autonomy.
It's nothing like that, because cruise control works reliably. There is never a situation where cruise control randomly starts going 90mph or 10mph while I have it set to 60mph. LLMs on the other hand...
This is why I disagree with people who argue (as you did) "it really does speed up simple tasks". No it doesn't, because even for simple tasks I have to check its work every time. In less than the time it takes me to do that, I could've written the code myself. So these tools slow me down, they don't speed me up.
> In less than the time it takes me to do that, I could've written the code myself.
This hasn't been my experience at all. At worst you skim the code and think "nah that's total nonsense, I'll write it myself from scratch", but that only takes a few seconds. So at worst it wastes a few seconds.
Usually though it spits out a load of stuff, which definitely requires fixing up and tweaking, but is usually way faster than doing it all.
Obviously it depends on the ___domain too. I wouldn't ask it to write a device driver or something UVM or whatever. But a website interface? Sure. "Spawn a process in C and capture its stdout"? Definitely. There's no way you are doing that faster by hand.
Honestly, I'm not sure if there is any correspondence between an AI and a particular skill level of developer. A junior developer won't know most of the things an AI does; but unlike an AI, they can be held accountable for a particular assignment. I feel like AI is more like "a skilled consultant who doesn't know that much about your situation and refuses to learn more than the bare minimum, but will spend an arbitrary amount of time on self-contained questions or tasks, without checking the output too carefully." Which is exactly as useful yet infuriating as it sounds.
Remember that 80% of your time and resources is going to be spent finishing up the last 20% of the project. If the first 80% is borked by LLM code salad, you’re going to need to spend time fixing that code and making it actually work. That might take just as much time, if not more, than only using AI as an assistant (i.e. code completion) instead of the main source of code.
I'm currently 2x to 10x as productive with Cursor. The larger the project, the lower my multiplier.
However, on small tasks and bug fixes, it often fixes the bug before I've even root caused it. It's amazing when I can focus on throwing it information about the bug then have it think in the background while I continue researching. In a surprising number of simpler cases, it one-shots the fix and eliminates any need to root cause (this is a bit easier when it's a feature you understand intimately).
Exactly. I see these threads over and over, and it's just senior devs complaining about how it's the tool's fault, and not that they haven't put the time in to learn the new tool
The cycle of tool and framework re-skilling is constant in industry, and those trying to fight the wave always lose. And this one is a tidal wave. UPDATE YOUR SKILLS FOLKS!
Most programmers are already doing a form of vibe coding when they, for example, let an ORM write their database queries for them. I think a decent number of Rails and Django devs probably would struggle to write raw SQL queries from scratch. Mostly I see vibe coding as an extension of that, and it's not necessarily a bad thing since it lets you spend more time focusing on the actual problem you're solving rather than on implementing it.
Of course, I haven't seen a single vibe-coded thing that I'd want to spend money paying for yet, but that's probably more reflective of the difficulty of making something people want than whether or not you use vibe coding to do it.
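(To make the analogy concrete, here's a sketch using SQLAlchemy, only because it runs standalone; the Django/Rails versions are morally identical. With echo=True the ORM prints the SQL it writes on your behalf.)

    from sqlalchemy import Column, Integer, String, create_engine, select
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class Beer(Base):
        __tablename__ = "beers"
        id = Column(Integer, primary_key=True)
        name = Column(String)
        abv = Column(Integer)

    engine = create_engine("sqlite://", echo=True)  # echo=True logs the generated SQL
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add(Beer(name="imperial stout", abv=9))
        session.commit()
        # The dev writes this line; the ORM writes the SELECT ... WHERE ... for them.
        strong = session.execute(select(Beer).where(Beer.abv >= 6)).scalars().all()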
And some circles hand wave away all criticism of any new thing as luddism.
This article is a bit more balanced, though, and clearly isn't criticising use of AI in programming, but specifically the "Jesus take the wheel" style of vibe coding. It's the same old "if you write code as cleverly as you possibly can, you are not smart enough to debug it", but to the next level, where people are writing code that they aren't even smart enough to read.
For example, here are YC partners quoting a company in a batch claiming "100x speedup" in coding performance compared to the previous month:
https://www.youtube.com/watch?v=IACHfKmZMr8&t=1837s