I would be very interested to know from everyone here:
What is the most impressive thing you have managed to get an AI to code - WITHOUT having to babysit it or give it any tips, hints, or corrections that a non-coder would have been unable to provide?
So far, the most impressive AI achievements for me were both by ChatGPT o1 pro:
Case one. I configured an IPsec VPN on a host machine that ran Docker containers. Everything worked from the host itself, but the containers could not reach the IPsec subnet. I spent quite a bit of time untangling Docker's iptables rules, figuring out how iptables interacts with IPsec, and running tcpdump everywhere. My skills were not enough; I probably would have resolved the issue given more time, but I decided to try ChatGPT. I wrote a very thorough question with everything I had tried, plus related logs. (I had actually intended to ask the question on some Linux forum, so I was already drafting it.) ChatGPT thought for a few minutes and then spewed out a single iptables command that simply resolved the issue. I was truly impressed.
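The actual command isn't shown in the post, so this is a hypothetical reconstruction of the usual culprit in this class of problem: Docker's MASQUERADE rule in the nat POSTROUTING chain rewrites a container's source address before the kernel's IPsec policy check, so the traffic never matches the tunnel's selectors. A commonly cited fix is a NAT exemption for policy-matched traffic:

```shell
# Hypothetical reconstruction -- the original command isn't given in the post.
# Docker's MASQUERADE rewrites container source addresses before the IPsec
# policy match, so traffic to the remote subnet never enters the tunnel.
# Exempting outbound IPsec-policy-matched traffic from NAT is a common fix:
iptables -t nat -I POSTROUTING 1 -m policy --pol ipsec --dir out -j ACCEPT
```

Whether this was the exact rule in the anecdote is anyone's guess, but it is the canonical shape of the Docker-plus-IPsec conflict.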
Case two. I was writing firmware for a device in C. One module was particularly complex: it managed two RAM buffers and one external SPI buffer. I spent two weeks writing this module and then asked ChatGPT to review my code for major bugs and issues. ChatGPT figured out that I was using SPI to talk to an FRAM chip, understood that my command usage was subtly wrong (I sent the WREN and WRITE commands in a single SPI transaction), and highlighted the issue. I tried other modes, and I also tried Claude, but so far only o1 pro has been able to find it. This was impressive because it required truly understanding the code's workflow, along with extensive knowledge of the protocols and their typical usage.
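For readers unfamiliar with this bug class: on typical SPI FRAM/EEPROM parts, WREN must be its own chip-select transaction; the write-enable latch is only set when CS toggles after a lone WREN, so gluing WREN and WRITE into one transfer silently drops the write. A toy Python sketch with a hypothetical mock chip (not the original C firmware) illustrates the difference:

```python
# Hypothetical sketch of the bug class (not the original firmware).
# On typical SPI FRAM/EEPROM parts, the WREN opcode must be a standalone
# chip-select transaction; bundling WREN with WRITE in one transfer
# means the write-enable latch is never set, and the write is dropped.

WREN, WRITE = 0x06, 0x02  # standard opcodes on many FRAM/EEPROM parts

class MockFram:
    """Toy FRAM model: a byte-addressed memory and a write-enable latch."""
    def __init__(self):
        self.wel = False          # write-enable latch
        self.mem = {}

    def transaction(self, data):
        """One CS-low .. CS-high transfer of `data` bytes."""
        op = data[0]
        if op == WREN and len(data) == 1:
            self.wel = True       # latch set only by a standalone WREN
        elif op == WRITE and self.wel:
            addr, value = data[1], data[2]
            self.mem[addr] = value
            self.wel = False      # latch auto-clears after a write
        # anything else (e.g. WREN glued to more bytes) is ignored

fram = MockFram()

# Buggy: WREN and WRITE in one transaction -- nothing is written.
fram.transaction([WREN, WRITE, 0x10, 0xAB])
assert 0x10 not in fram.mem

# Correct: WREN in its own transaction, then WRITE.
fram.transaction([WREN])
fram.transaction([WRITE, 0x10, 0xAB])
assert fram.mem[0x10] == 0xAB
```

The mock is deliberately minimal; a real chip would interpret the extra bytes after WREN differently, but the net effect is the same: the intended write never lands.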
Other than that, I don't think I've been impressed by AI. Of course I'm generally impressed by its progress; it's marvellous that AI exists at all and can write some code that makes sense. But so far I haven't fully integrated AI into my workflows. I use AI as a Google replacement for some queries, I use it as a code reviewer, and I use the Copilot plugin as a glorified autocomplete. I don't generate any complex code with it, and I rarely generate any meaningful code at all.
I can take my blog or website, fire up aider, and ask it to completely reformat or restyle it or add effects. It does a fantastic job at that, and that's something I'd have to pay $20 on Fiverr to get done before because I cannot coordinate colors or style things to save my life.
This is a good example. It's really good at updating old, small projects. Whereas most humans need to build context about the project, and do so slowly.
I wonder over time how those small, infrequent updates might hamper the ability to perform the next, small infrequent update (as your code begins to resemble less and less any examples the AI might have seen and more and more a kludge of differing styles, libraries, etc.), but that's really not any different than how most projects like a personal website operate today.
Well in this case this is a complexity-preserving change. The same few template files get updated with each change. So, I can't say for sure but I imagine this is effectively zero cost.
The conditions you’ve set here are very strict, but I have something that may almost fit. I used Grok 3 to create a decompiler and then a compiler for a made-up scripting language (for scripts extracted from Final Fantasy VII data files). I fed it an example of what the low-level “assembly/opcodes” look like and then how I wanted the higher-level scripting language to look. With minimal guiding (I was telling it what I wanted, but not how I wanted it) it implemented a working decompiler and compiler that matched my specifications exactly, written in TypeScript. It created a custom tokenizer, a parser producing a custom AST structure, and everything else that was needed. It took a few prompts to get right (and to add features I didn’t think about initially), but I found the resulting 1400+ lines of code very impressive for what many people call a “fancy autocomplete”.
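The FF7 opcode format and the commenter's mini-language aren't shown, so as a generic illustration of the pipeline described (opcodes in, AST, high-level source out), here is a toy sketch with invented opcode names, in Python rather than the TypeScript of the original:

```python
# Toy illustration of a decompiler pipeline: lift a stack-machine
# opcode list into an AST, then emit high-level source.
# Opcode names and the mini-language are invented, not FF7's.
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: object
    right: object

def lift(opcodes):
    """'Decompile' a flat opcode list into an expression AST."""
    stack = []
    for tok in opcodes:
        if tok.startswith("PUSH "):
            stack.append(Num(int(tok.split()[1])))
        elif tok in ("ADD", "MUL"):
            right, left = stack.pop(), stack.pop()
            stack.append(BinOp("+" if tok == "ADD" else "*", left, right))
    return stack.pop()

def emit(node):
    """Pretty-print the AST as high-level source text."""
    if isinstance(node, Num):
        return str(node.value)
    return f"({emit(node.left)} {node.op} {emit(node.right)})"

ast = lift(["PUSH 2", "PUSH 3", "ADD", "PUSH 4", "MUL"])
print(emit(ast))  # prints "((2 + 3) * 4)"
```

The real project (custom tokenizer, round-tripping compiler, control flow) is of course far larger; this only shows the lift-and-emit skeleton.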
> I fed it an example of what the low-level “assembly/opcodes” look like and then how I wanted the higher-level scripting language to look. With minimal guiding (I was telling it what I wanted, but not how I wanted it) it implemented a working decompiler and compiler that matched my specifications exactly, written in TypeScript.
I think this already falls out of OP's guidelines, which you pointed out are quite strict. They also happen to be the guidelines an AI would need to meet to "replace" competent engineers.
If there used to be ten engineers on a team, and with LLMs they only need eight because the LLMs help the team achieve the same workload, then two of them have been "replaced", even though an LLM can't do everything an engineer does.
I haven't heard of any prediction that AI will completely replace all programmers.
This doesn't make sense though unless your company isn't growing. Most dev teams have large backlogs of work and any increase in productivity can lead to an increase in revenue generation.
Programmers aren't paid to generate code. They're paid to figure out how to execute business initiatives X, Y, and Z. Unless you don't have that many initiatives, how much sense does it make to let go of the people who can do that for you? It makes sense when companies are trying to cut costs, but that happens when the cost of borrowing is higher than the realizable profit margin.
Exactly. The argument never was that AI will replace every single programmer. But it looks like it will replace a good chunk of them. How many exactly remains to be seen.
I know the popular sentiment is that writing compilers is hard, but that isn't true of simple ones like this without optimizations. Basic stages like lexing, parsing, building an AST, and code generation aren't conceptually difficult; there's just a lot of code to churn out.
Though I’ll admit LLMs have been weirdly good at regex.
Well, you’re saying this because you’re probably already familiar with the subject. I remember learning to write my own compiler for the first time, and it took me many hours to write the most simple lexer and fully understand it. Now an LLM can write that to my exact specification in minutes instead of hours. To me that’s quite a breakthrough.
"WITHOUT having to babysit it or give it any tips hints or corrections that a non-coder would have been unable to do"
I find that question impossible to answer, because my programming experience influences everything I use LLMs for. I can't turn that part of my brain off.
Getting AI to write code for you starts with understanding what's possible, and that's hugely informed by existing programming knowledge.
I won't ask an LLM to build me something unless I'm reasonably confident it will be able to do it - and that confidence comes from 25+ years of programming experience combined with 2+ years of intuition as to what LLMs themselves can handle.
Nothing that I've cared much about. Little python utility scripts like "merge the rows of these CSV files in such and such a way" are about the only thing that has worked first time without me adding more of the "how" to do it.
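The commenter's exact merge isn't specified, but tasks of that "merge the rows of these CSV files in such and such a way" flavor typically reduce to a few lines of the stdlib `csv` module. A hypothetical example, joining two files on a shared `id` column (file contents invented, inlined as strings so the sketch is self-contained):

```python
# Sketch of the kind of one-shot CSV task described above: merge rows
# from two CSV sources on a shared "id" key column.
# The data and column names are hypothetical.
import csv
import io

# Stand-ins for the two input files.
a = io.StringIO("id,name\n1,alice\n2,bob\n")
b = io.StringIO("id,score\n1,90\n2,85\n")

# Index the first file by key, then merge each row of the second into it.
rows_a = {row["id"]: row for row in csv.DictReader(a)}
merged = [{**rows_a[row["id"]], **row} for row in csv.DictReader(b)]

print(merged)  # prints [{'id': '1', 'name': 'alice', 'score': '90'},
               #         {'id': '2', 'name': 'bob', 'score': '85'}]
```

With real files you would swap the `StringIO` objects for `open(...)` handles; everything else stays the same.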
Essentially, for "real-life work scenarios", the performance is not that much better than GPT-3.5, with the exception of Claude 3.7 and GPT-4.5. Bear in mind, the question was not "challenging" in a "genius thinking required" way.
That being said, I use LLMs regularly for discovery. They do waste some of your time because of hallucinations, but you can call their bullshit most of the time, and when you can't, it's still faster than Googling.
You steered the ChatGPT 3.5 model to find the handlers HashMap issue that it didn’t find initially, but you dismiss the other models for not immediately recognizing it?
Python scripts for managing and automatically editing photos. Anything bigger and, once the code reaches a certain point, it starts actively breaking things, and in some cases makes them unrecoverably worse. I have gotten benefit as a programmer from being able to break down projects and spot when it's going down the wrong path, but I think you need quite a bit of experience to use AI on large codebases or with types of code it has few training examples of. I've also caught it 'fixing' code by essentially silencing errors without addressing the underlying problems.
Built out an entire web accessibility monitoring / scanning tool. The backend/scanning system required my knowledge as a programmer but the UI was entirely vibe coded. I said "build a UI for this scanning system on Cloudflare workers and use Hono.dev", and then just described in plain English how I wanted the UI to work.
Did this over a few weeks in my free time and now it has all of the features of accessibility monitoring SaaS that I was previously paying $600/mo for.
This is also something I've realized about LLM coding: I have learned wayyyy more about tech I wouldn't normally try just by vibe coding and explaining what I want to exist. In the last few months I've learned Cloudflare workers/queues/Puppeteer, heavy Postgres features like policies, triggers, functions, trigram search, migrations, and a ton about RLS.
I _could_ have learned this stuff on my own, but the friction was just too high; with LLM coding, it shows me a working solution and I can tweak it and ask questions from there.
Hono is pretty great, too! It has exactly what I've always wanted in a backend JS framework: a React/JSX-like way of writing UI components, but without actually using the React renderer on the FE. Next.js with SSR obviously also offers this, but I don't think Hono even uses that; it's just JSX.
How to write a Presto SQL query to get result X given example database Y, where I fill in Y as a CSV describing the table with the columns my query needs, and X as a CSV describing the results I'd like to get out.
It doesn't always work, but I have an easy, quick way to test it out. When it does work, I've often saved lots of time.
I've gotten it to write complex graph layout algorithms, including converting them to all work in the same orientation (about half of graph layout papers are written to produce vertical graphs, and about half horizontal graphs).
This is part of a node UI for modeling Satisfactory (the game) factories. Mostly for fun and learning, honestly.
So like, I have code to find optimal production chains and solve the node graph, using ILP through pyomo or Z3.
Some of the optimal production chains are ... a lot of nodes, as are plenty of factories.
Without really good layout, it becomes a mess. Existing modelers sort of suck and have no auto-layout.
I bridged ELK (and elk.js, actually) into Python, but wanted a fallback since this was ... crazy enough already, and who knows if it will work on anyone else's computer :)
So AI wrote me about 4000 lines of just about 100% correct Python graph layout code, in batches, one algorithm at a time, that I then combined. I did have to tell it what I wanted piece by piece, so I had to learn a lot about it; I could not get it to do all the pieces at once itself. I suspect due to context length limitations, etc.
It comes very close to ELK for this type of layout (ELK supports other layouts, edge routing, etc.), and implements the same algorithms with the same advanced techniques.
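ELK's "layered" algorithm is Sugiyama-style: assign nodes to layers, order nodes within each layer to reduce crossings, then assign coordinates and route edges. As a flavor of the first of those stages, here is a minimal longest-path layering sketch in Python (the node names are invented, Satisfactory-flavored; the commenter's actual code is much larger):

```python
# Minimal sketch of ONE stage of a Sugiyama-style layered layout (the
# family ELK's "layered" algorithm belongs to): longest-path layering.
# Node ordering, coordinate assignment, and edge routing -- the much
# larger remaining stages -- are not shown. Node names are invented.
from functools import lru_cache

# Adjacency list of a small production DAG: node -> successors.
edges = {
    "mine": ["smelter"],
    "mine2": ["assembler"],
    "smelter": ["constructor"],
    "constructor": ["assembler"],
    "assembler": [],
}

@lru_cache(maxsize=None)
def layer(node):
    """Longest path from any source: sources get layer 0."""
    preds = [n for n, outs in edges.items() if node in outs]
    return 0 if not preds else 1 + max(layer(p) for p in preds)

layers = {n: layer(n) for n in edges}
print(layers)  # prints {'mine': 0, 'mine2': 0, 'smelter': 1,
               #         'constructor': 2, 'assembler': 3}
```

Longest-path layering is the simplest choice; production implementations like ELK also offer network-simplex layering, which minimizes total edge length instead.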
Wow! I'm glad I asked, that sounds awesome. I hope you get a chance to do a writeup with some screenshots at the end of it; that would make for a great read. Do you have any code public yet?
And TIL about ELK! I've had kind of a half project, called `gstd`, which tries to create a standard library/API to make working with node graphs in code as easy as working with other abstract data types, like Arrays. One of the important bits is that it lets you console log a graph so you can see it. I've been using d3 force-directed graphs for layouts, and have played around with some custom layout algorithms, but have yet to find a good solution. ELK might actually be just what I've been looking for there!