Hacker News

I tried this (though with a different tool called aichat) for extremely simple stuff like just "convert this mov to mp4" and it generated overly complex commands that failed due to missing libraries. When I removed the "crap" from the commands, they worked.
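For context, a minimal command for this usually looks something like the following (assuming a typical ffmpeg build with libx264 and the native AAC encoder; the filenames are placeholders):

```shell
# Basic mov -> mp4 re-encode
ffmpeg -i input.mov -c:v libx264 -c:a aac output.mp4

# If the source streams are already mp4-compatible (H.264/AAC),
# remuxing avoids re-encoding entirely:
ffmpeg -i input.mov -c copy output.mp4
```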

So much like code assistance, they still need a fair amount of babysitting. A good boost for experienced operators, but it might suck for beginners.




Plus you need to know the format of your source file to design the command correctly: how many audio tracks, is the first video track a thumbnail or the video, are the subtitle tracks forced, etc.

And in some situations ffmpeg has warts you have to work around. Like the recently introduced moronic change of behaviour where the first subtitle track becomes forced/default irrespective of the original forced/default flag of the source. You need to add "-default_mode infer_no_subs" to counter that.
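A sketch of that workaround, for anyone hitting the same thing (the option belongs to the Matroska muxer; filenames are placeholders):

```shell
# Copy all streams, but infer default flags as if no subs were marked
ffmpeg -i input.mkv -map 0 -c copy -default_mode infer_no_subs output.mkv
```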


I usually just paste the output of `ffprobe` into Claude when it's ambiguous. Works a treat.
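Something like this gives a compact stream summary to paste (filename is a placeholder):

```shell
# Human-readable summary of the container and streams
ffprobe -hide_banner input.mov

# Or machine-readable, streams only
ffprobe -v error -show_streams -of json input.mov
```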


My feelings exactly, but I think that's OK!

It's another tool and one that might actually improve with time. I don't see GNU's man pages getting any better spontaneously.

Whoa, what if they started to use AI to auto-generate man pages...


> Whoa, what if they started to use AI to auto-generate man pages...

That’s the time to start my career in woodworking.


I already generate man pages (and POD) with Claude for my new projects. :D

It works really well.


Any links you wanna share? I've never seen an AI generated man page.


They are not public projects, but the man pages and their README.md are generated, albeit refined through prompts.

It is not simply a "generate man page" without context.


> what if they started to use AI to auto-generate man pages...

Then they'd be wrong about 20% of the time, and still no one would read them. ;-)

(NB: I'm of the age that I do read them, but I think I'm in the minority.)


Reading this feels like seeing a guy getting his first car in 1920 and complaining he still has to drive it himself.


To me it's more like a guy getting his first car and complaining that the car is driving him in a direction that may or may not be correct, despite his best efforts to steer it where he wants to go. And the only way to know whether he ends up in the right place is to get out of the car, look around, and maybe ask more experienced drivers. Failing that, his only option is to get back in and hope to be luckier in the next trip.

Or he can just ditch the car and walk. Sure, it's slower and requires more effort, but he knows exactly how to do that and where it will take him.


The beer brewers in my home town used to have a self-driving horse and cart which knew the daily delivery route going by all pubs and didn't really need a human to steer it or indeed be conscious during the trip. Expectedly, the delivery guy would get drunk first thing in the morning and just get carted about collecting the money.


Pony & trap could be largely self-driving, after an initial training period. That would have been a distinct negative to "upgrading" for some, I'd imagine.


It's speed and load capacity vs self-driving.

If we could imagine wiring a pony to control a car, its brain, while good at navigation, would likely be inadequate at the speeds a car attains.


Well, that guy probably got carried home by his horse after drinking half a bottle of whiskey, so maybe he had a point.


Or maybe calling a cab and telling the cab driver each direction to get to the destination instead of the cab driver just taking you there.


My experience exactly.

I no longer check with these AI tools after a number of failed attempts. In an unrelated case, a friend thought there was an NFL football game last Saturday at noon. Checking with Google's Gemini, it said "no", but then described one between two teams whose seasons had ended two weeks before, at 1:00 Eastern Time and 2:00 Central. (The times are backwards: Central should be an hour earlier, not later.)


Do LLMs have knowledge of current events?


> Do LLMs have knowledge of current events?

I don't think the notion of "current" has been explained to them. They just define it out of context.


Meta.ai got it right. The free ChatGPT only has data up to 2021 or something like that.


I mean, some are capable of searching the web.

Ask them about the fire in LA in 2025 January.


> "convert this mov to mp4"

Did any of the commands look like the ones in the left window:

https://beta.gitsense.com/?chats=12850fe4-ffb1-4618-9215-c13...

The left window contains a summary of all the LLMs asked, including all commands. The right window contains the individual LLM responses.

I asked about gotchas with missing libraries as well, and Sonnet 3.5 said there were. Were these the same libraries that were missing for you?


Looking at this, I am pretty sure I also received a "libx264" clause. Removing it made the command work for me.


libx264 is the best H.264 encoder ffmpeg has to offer, so it's pretty important to bundle it in your ffmpeg install. Those commands are perfectly standard; I've been using something like that for 10+ years.
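If you're not sure what a given build includes, a quick check (output varies by build; input/output names are placeholders):

```shell
# List the H.264-capable encoders in this ffmpeg build
ffmpeg -hide_banner -encoders | grep -i 264

# Without an explicit -c:v, ffmpeg falls back to whatever default
# encoder the mp4 muxer maps to in that particular build:
ffmpeg -i input.mov output.mp4
```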


I don't disagree that we need to be cautious with LLMs, but I've personally stopped asking GPT-4/GPT-4o mini for technical answers. Sonnet 3.5 and DeepSeek V3 (which is much cheaper but still not as good as Sonnet) are your best bet for technical questions.

Where I find GPT to perform better than Sonnet is with text processing. GPT seems to better understand what I want when it comes to processing documents.

I'm convinced that no LLM provider has created or will create a moat, and that we will always need to shop around for an answer.


everyone stopped using 4/4-mini because they're old.

4o replaced 4 back in April 2024. o1/o1-mini replaced 4o in fall 2024.

stop using 4. use o1-mini always. it's cheaper, faster, and better.

o1/o1-mini will be replaced by o3/o3-mini in a couple months.


Unfortunately you need to be tier 2 to use o1-mini. The only time I really use GPT is to summarize documents and for that, GPT-4o mini works well enough and it is significantly cheaper than other high quality models, so I never really rack up an OpenAI bill.


o1 is such a joke, worse than 4o in some ways like multi-turn.

The months-old Sonnet feels a generation ahead of any OAI product I've used. I'll believe the hype on o3 when I see it; remember the Sora and voice rollouts?


You may want to reconsider this position.

I had this bizarre bug in rust networking code where packets were getting dropped.

i dumped all 20k lines into o1 pro. it thought for about ten minutes and came back telling me that my packets had a chance of being merged if sent in quick succession, and that i needed to send the length before each message and scan packets in a loop for subdivisions on the client. this bug hadn't happened before, only when running locally on a newer, faster machine, and was frequent but hard to replicate.

it was correct, and provided detailed pseudo code to solve it.

the second case involved some front end code where, during an auth flow, ios would force a refresh on returning to the browser, causing authentication state to be lost. o1 pro thought for about 5 minutes before telling me ios has a heuristic with which it decides to close an app on context switch based on available ram, etc., and that i needed to conditionally check for ios and store partial state in local storage on leave, assuming the app could be unloaded without my control.

it was correct. with some more back and forth we fixed the bug.

these are not the kinds of problems that claude and gpt<4 have been able to help with at all.

I also used voice, and voice with video, extensively for translation tasks in Korea, Japan, and Taiwan, and for controlling Japanese interfaces and forms for tax documents and software.

These are very good tools.


o1 is not a general-purpose model, and it's not very good at multi-turn; it should instead be given all the context upfront: https://www.latent.space/p/o1-skill-issue


what exactly do you want the llm to do here? if the ask was so unambiguous and simple that it could be reliably generated, then the interface wouldn't be so complicated to use in the first place!

LLMs are not in any way best suited for one-shot prompt => perfect output, and expectations to that effect are extremely unreasonable. the reason why LLMs are still hard for beginners to use is that the software is hard to use correctly.

as with LLM output goes life itself: the results you get from using a tool can only ever be as good as the (mental) model used to choose that tool & the inputs to begin with. if all the information required to generate the output were contained in the initial prompt, then there would be absolutely no need to use the LLM at all in the first place.


Hate to be that guy, but which LLM was doing the generation? GPT-4 Turbo / Claude 3.x have not really let me down in generating ffmpeg commands - especially for basic requests - with most of their failures resulting from ___domain-specific vagaries that an expert would need to weigh in on.


GPT-4


Fair enough. If you remember what you were testing with, I'd love to try it again to see if things are better now.


You have a fair point. Some LLMs are better at some tasks, and prompts can make a difference no doubt.

Perhaps at some point there will be a triage LLM to slurp up the problem and then decide which secondary LLM is optimal for that query, plus some tertiary LLMs that execute and evaluate it in a virtual machine, etc.

Maybe someday


Oh, I talked to some guys who started a company that does exactly that, at an AI meetup in SF last year. They were mainly focused on making $/token cheaper by directing easy/dumb queries to smaller, dumber models, but it also increases output quality, because some models are just better at certain things. I'm sure all the big companies have implementations of this by now, even if they don't use it everywhere.


I was suggesting optimizing for answer quality, but optimizing for cost might be useful too I suppose for "business innovation" purposes.


Yes they are called routers. One is https://withmartian.com/


Hate to be that guy, but which model works without fail for any task that ffmpeg can do?


"Writing working commands first try for every single ffmpeg feature that exists" is the highest bar I've ever heard of, I love it. I'm gonna start listing it as a requirement on job postings. Like an ffmpeg speedrun.


Yes and every failure of a product turns into a support ticket.


Obligatory xkcd: https://xkcd.com/1168/.


To be fair `tar` is quite easy to use once you understand the grammar of the options.
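For instance, once you know the grammar (one mode letter: c = create, x = extract, t = list, plus modifiers like z = gzip and f = archive file), a full round trip is just:

```shell
# Create, list, and extract a gzipped archive
mkdir -p demo && echo hello > demo/file.txt
tar czf demo.tar.gz demo            # c = create, z = gzip, f = file
tar tzf demo.tar.gz                 # t = list contents
rm -r demo && tar xzf demo.tar.gz   # x = extract
cat demo/file.txt                   # prints "hello"
```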


I don't think there's a single human on or outside of this planet that can meet that requirement, but Claude has been pretty good to me. It's certainly a much better starting point than poring over docs and SO posts.


In my experience you still get a lot of stuff that used to work or stuff that it just makes up.


I know I struggled to get a good command to “simply” make the videos from my Z8 smaller (in file size).

Usually the color was wrong and I don’t care enough to learn about colorspaces to figure out how to fix it and it’s utterly insane how difficult it is even with LLMs.

Just reencode it as is but a little more lossy. Is that so hard?
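In principle that's just bumping the CRF and copying everything else (filenames are placeholders; note this alone won't fix a log/HDR colorspace mismatch, which seems to be the part that goes wrong):

```shell
# Re-encode the video slightly more lossily, leave audio untouched.
# libx264's CRF default is 23; higher means smaller and lossier.
ffmpeg -i input.mov -c:v libx264 -crf 28 -preset slow -c:a copy output.mov
```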


Handbrake may be a better option for you. I find that for some tasks it’s not only simpler but straight up works better than FFmpeg.

https://handbrake.fr/docs/en/latest/cli/cli-options.html
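The one-liner version, if it helps (the preset name is just an example; `HandBrakeCLI --preset-list` shows the built-ins, and filenames are placeholders):

```shell
# Simple "make it smaller" transcode using a built-in preset
HandBrakeCLI -i input.mov -o output.mp4 --preset "Fast 1080p30"
```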


This doesn't exist in reality, so in one sense you could challenge the relevance.


I think in the non LLM world though you at least have the trail of documentation you can unwind once you're in a bind. I don't care for prompt-a-mole fighting.



