
This is really awesome. Tested it on both Nvidia and Mac GPUs.

Interested to know how debugging in a real application would work since WASM is pretty hard to debug and GPU code is pretty hard to debug. I assume WASM GPU is ... very difficult to debug.


You can download the intermediate files fwiw


The process-centric taxonomy in this paper is one of the most structured frameworks I’ve seen for anomaly detection methods. It breaks down approaches into distance-based, density-based, and prediction-based categories. In practice (been doing time series analysis professionally for 8+ years), I’ve found that prediction-based methods (e.g., reconstruction errors in autoencoders) are fantastic for semi-supervised use cases but fall short for streaming data.
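For readers unfamiliar with the prediction-based family, here is a minimal sketch of the idea, using a simple moving-average forecaster instead of an autoencoder; the window size and 3-sigma threshold are illustrative choices, not recommendations from the paper.

  // Prediction-based anomaly detection in miniature: predict each point from a
  // trailing window, then flag points whose prediction error is unusually large.
  // A moving average stands in for a real model (e.g. an autoencoder whose
  // reconstruction error plays the same role).
  function detectAnomalies(series: number[], window = 10, k = 3): number[] {
    const errors: number[] = [];
    for (let i = window; i < series.length; i++) {
      const recent = series.slice(i - window, i);
      const prediction = recent.reduce((a, b) => a + b, 0) / window;
      errors.push(Math.abs(series[i] - prediction));
    }

    const mean = errors.reduce((a, b) => a + b, 0) / errors.length;
    const std = Math.sqrt(errors.reduce((a, e) => a + (e - mean) ** 2, 0) / errors.length);

    const flagged: number[] = [];
    errors.forEach((e, j) => {
      if (e > mean + k * std) flagged.push(j + window); // index into the original series
    });
    return flagged;
  }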


The first rule of programming with LLMs is: don't use them for anything you don't know how to do. If you can look at the solution and immediately know what's wrong with it, they are a time saver; otherwise...

I find chat for search is really helpful (as the article states)


That seems like a wild restriction.

You can give them more latitude for things you know how to check.

I didn't know how to set up the right gnarly TypeScript generic type to solve my problem, but I could easily verify it's correct.


If you merely know how to check, would you also know how to fix it after you find that it's wrong?

If you are lucky to have the LLM fix it for you, great. If you don't know how to fix it yourself and the LLM doesn't either, you've just wasted a lot of time.


It did fix it; I iterated, passing in the type and linter errors, until it passed all the requirements I had.

> If you merely know how to check, would you also know how to fix it after you find that it's wrong?

Probably? I'm capable of reading documentation, learning and asking others.

> If you don't know how to fix it yourself and the LLM doesn't either, you've just wasted a lot of time.

You may be surprised by how little time, but regardless it would have taken more time to hit that point without the tool.

Also sometimes things don't work out, that's OK. As long as overall it improves work, that's all we need.


If you don't understand what the generic is doing, there might be edge cases you don't appreciate. I think TypeScript types are fairly non-essential so it doesn't really matter, but for more important business logic it definitely can make a difference.


I understand what it's doing, and could easily set out the cases I needed.


If you understand what it is doing, you could do it yourself, surely?


Have you never understood the solution to a puzzle much more easily than solving it yourself? I feel there's literally a huge branch of mathematics dedicated to the difference between finding and validating a solution.

More specifically, I didn't know how to solve it, though obviously I could have spent much more time and learned. There were only a small number of possible cases, but I needed certain ones to work and others not to. I was easily able to create the examples but not find the solution. By looping through Claude I could solve it in a few minutes. I then got an explanation, could read the relevant docs, and felt satisfied that everything passed not only the automated checks but my own reasoning.
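To make the "verify by cases" workflow concrete, here is a hypothetical sketch; the actual generic from this exchange isn't shown, so NonEmptyKeys and the cases below are stand-ins. The idea is to encode the cases you need as compile-time assertions and iterate on the generic until they all hold.

  // Hypothetical illustration of verifying a generic by cases; NonEmptyKeys
  // is a stand-in for the actual type discussed in this thread.
  type Expect<T extends true> = T;
  type Equal<A, B> =
    (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2)
      ? true
      : false;

  // Stand-in generic: keys of T whose values are not null or undefined.
  type NonEmptyKeys<T> = {
    [K in keyof T]-?: T[K] extends null | undefined ? never : K;
  }[keyof T];

  // Cases that must hold (they fail to compile if the generic is wrong):
  type _case1 = Expect<Equal<NonEmptyKeys<{ a: string; b: null }>, "a">>;
  type _case2 = Expect<Equal<NonEmptyKeys<{ a: number; b: string }>, "a" | "b">>;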


That's the wrong approach.

I use chat for things I don't know how to do all the time. I might not know how to do it, but I sure know how to test that what I'm being told is correct. And as long as it's not, I iterate with the chat bot.


A better way to phrase it might be don't use it for something that you aren't able to verify or validate.


I agree with this. I keep harping on this, but we are sold automation instead of a power tool. If you have ___domain knowledge in the problem that you are solving, then LLMs can become an extremely valuable aid.


Similar to a developer who copy-pastes sections of code from StackOverflow and puts their faith in it being correct. The bigger issue with LLMs is that it's easier to be tricked into thinking you actually understand the code when your understanding may actually be quite superficial.


I think it's just a broader definition of "know how to do". If you can write a test for it then I'm going to argue you know "how" to do it in a bigger picture sense. As in, you understand the requirements and inherent underlying technical challenges behind what you are asking to be done.

The issue is, there are always subtle aspects to problems that most developers only know by instinct. Like, "how is it doing the Unicode conversion here?" or "what about the case when the buffer is exactly the same size as the message, is there room for the terminating character?". You need the instincts for these to properly construct tests and review the code it produced. If you do have those instincts, I argue you could write the code yourself; it's just a lot of effort. But if you don't, I'd argue you can't test it either, and can't use LLMs to produce (at least) professional-level code.
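The Unicode and buffer questions above are hypothetical, but here is a small illustration of the kind of subtlety they point at, using JavaScript strings: lengths count UTF-16 code units, not characters, so naive truncation can split an emoji in half.

  const msg = "hi 👍";
  console.log(msg.length);                     // 5 -- the emoji is two UTF-16 code units
  console.log([...msg].length);                // 4 -- iterating by code points

  // Naive truncation to "4 characters" cuts the emoji in half:
  console.log(msg.slice(0, 4));                // "hi \ud83d" -- a lone surrogate

  // Truncating by code points keeps it intact:
  console.log([...msg].slice(0, 4).join(""));  // "hi 👍"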


I feel like that's a good option ONLY if the code you are writing will never be deployed to an environment where security is a concern. Many security bugs in code are notoriously difficult to spot and even frequently slip through reviews from humans who are actively looking for exactly those kinds of bugs.

I suppose we could ask the question: Are LLMs better at writing secure code than humans? I'll admit I don't know the answer to that, but given what we know so far, I seriously doubt it.


"Trust but verify" is still useful especially when you ask LLMs to do stuff you don't know. I've used LLMs to help me get started on tasks where I wasn't even sure of what a solution was. I would then inspect the code and review any relevant documentation to see if the proposed solution would work. This has been time consuming but I've learned a lot regardless.


I'd like to rephrase as, "don't deploy LLM generated code if you don't know how it works (or what it does)"

This means it's okay to use an LLM to try something new that you're on the fence about. Learn it, and then once you've learned that concept or idea, you can go ahead and use the same code if it's good enough.


"don't deploy ̶L̶L̶M̶ ̶g̶e̶n̶e̶r̶a̶t̶e̶d̶ code if you don't know how it works (or what it does)"

(Which goes for StackOverflow, etc.)


I've seen a whole flurry of reverts due to exactly this. I've also dabbled in trusting it a little too much, and had the expected pain.

I'm still learning where it's usable and where I'm over-reaching. At present I'm at about break-even on time spent, which bodes well for the next few years as they iron out some of the more obvious issues.


IMO this is a bad take. I use LLMs for things I don’t know how to do myself all the time. Now, I wouldn’t use one to write some new crypto functions because the risk associated with getting it wrong is huge, but if I need to write something like a wrapper around some cloud provider SDK that I’m unfamiliar with, it gets me 90% of the way there. It also is way more likely to know at least _some_ of the best practices where I’ll likely know none. Even for more complex things getting some working hello world examples from an LLM gives me way more threads to pull on and research than web searching ever has.


>It also is way more likely to know at least _some_ of the best practices

What's way more likely to know the best practices is the documentation. A few months ago there was a post that made the rounds about how the Arc browser introduced a really severe security flaw by misconfiguring their Firebase ACLs despite the fact that the correct way to configure them is outlined in the docs.

This, to me, is the sort of thing (although maybe not necessarily in this case) that comes out of LLM programming. 90% isn't good enough; it's the same as Stack Overflow pasting. If you're a serious engineer and you are unsure about something, it is your task to go to the reference material, or at some point you're introducing bugs like this.

In our profession it's not just crypto libraries, one misconfigured line in a yaml file can mean causing millions of dollars of damage or leaking people's most private information. That can't be tackled with a black box chatbot that may or may not be accurate.


> if I need to write something like a wrapper around some cloud provider SDK that I’m unfamiliar with

But "writing a wrapper" is (presumably) a process you're familiar with, you can tell if it's going off the rails.


Writing a wrapper is easier to verify because of the context of the API or SDK you're wrapping. Seems wrong? Check the docs. Doesn't work? Curl it yourself.
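As a concrete (and entirely hypothetical, since the provider and SDK aren't named here) illustration of why a wrapper is easy to verify: the wrapped call maps directly onto an HTTP request you can reproduce by hand.

  // Hypothetical wrapper around a made-up REST endpoint; the point is that
  // if the wrapper misbehaves, the same call is trivial to make by hand.
  interface Bucket {
    name: string;
    createdAt: string;
  }

  async function listBuckets(baseUrl: string, token: string): Promise<Bucket[]> {
    const res = await fetch(`${baseUrl}/v1/buckets`, {
      headers: { Authorization: `Bearer ${token}` },
    });
    if (!res.ok) {
      throw new Error(`listBuckets failed: ${res.status} ${res.statusText}`);
    }
    return (await res.json()) as Bucket[];
  }

  // "Doesn't work? Curl it yourself" -- the equivalent manual check:
  //   curl -H "Authorization: Bearer $TOKEN" "$BASE_URL/v1/buckets"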


> write something like a wrapper around some cloud provider SDK that I’m unfamiliar with

you're equating "unfamiliar" with "don't know how to do", but I will claim you do know how to do it; you would just be slow because you have to reference documentation and learn which functions do what.


You can ask the LLM to teach it to you step by step, validating by doing it yourself as you go; that's still quicker than skipping the learning and not knowing how to debug the result.

Learning how something works is critical or it's far worse than technical debt.


Yes, I have a friend learning their first programming language with much assistance from ChatGPT and it's actually going really well.


Awesome, I wish more people knew about this instead of trying one magic Harry Potter prompt to do everything.


How you use the LLM matters.

Having an LLM do something for you that you don't know how to do is asking for trouble. An expert can likely offload a few things that aren't all that important, but any junior is going to dig themselves into a significant hole with this technique.

But asking an LLM to help you learn how to do something is often an option. Can't one just learn it using other resources? Of course. LLMs shouldn't be a must have. If at any point you have to depend upon the LLM, that is a red flag. It should be a possible tool, used when it saves time, but swapped for other options when they make sense.

For example, I had a library I was new to and asked Copilot how to do some specific task. It gave me the options. I used this output to go to Google and find the matching documentation and gave it a read. I then went back to Copilot, wrote up my understanding of what the documentation said, and checked to see if Copilot had anything to add.

Could I have just read the entire documentation? That is an option, but one that costs more time to give deeper expertise. Sometimes that is the option to go with, but in this case having a more shallow knowledge to get a proof of concept thrown together fit my situation better.

Anyone just copying an AI's output and putting it in a PR without understanding what it does? That's asking for trouble and it will come back to bite them.


I completely agree. In graphics programming, I love having it do things that are annoying but easy to verify (like setting up framebuffers in WebGL). I also ask it to do more ambitious things, like implementing an algorithm in shader code, and it will sometimes give a result that is mostly correct but subtly wrong. I have only been able to catch those subtle errors because I know what to look for.
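For readers outside graphics, this is roughly what the "annoying but easy to verify" framebuffer setup looks like; the texture size and format here are arbitrary.

  // Render-to-texture setup in WebGL2: tedious boilerplate, but the driver
  // reports immediately whether the configuration is valid.
  function createRenderTarget(gl: WebGL2RenderingContext, width: number, height: number) {
    const texture = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, texture);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0, gl.RGBA, gl.UNSIGNED_BYTE, null);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);

    const framebuffer = gl.createFramebuffer();
    gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
    gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0);

    // Easy to verify: an incomplete framebuffer fails loudly right here.
    if (gl.checkFramebufferStatus(gl.FRAMEBUFFER) !== gl.FRAMEBUFFER_COMPLETE) {
      throw new Error("Framebuffer incomplete");
    }
    gl.bindFramebuffer(gl.FRAMEBUFFER, null);
    return { framebuffer, texture };
  }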


>> If you can look at the solution and immediately know what's wrong with it, they are a time saver otherwise...

Indeed getting good at writing code using LLMs demands being very good at reading code.

To that extent it's more like blitz chess than autocomplete. You need to think and verify in trees as it goes.


Exactly, you have to (vaguely) know what you’re looking for and have some basic ideas of what algorithms would work. AI is good at helping with syntax stuff but not really good at thinking.


> ... don't use them for anything you don't know how to do ... I find chat for search is really helpful (as the article states)

Not really. I often use Chat to understand codebases. Instead of trying to navigate mature, large-ish FOSS projects (like, say, the Android Runtime) by looking at them file by file, method by method, field by field (all too laborious), I just ask ... Copilot. It is way, way faster than I am and is mostly directionally correct with its answers.


Don't use them for anything you don't know how to test. If you can write unit tests you understand and the generated code passes them all (or you can visually inspect/test a GUI it generated), you know it's doing well.
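A small illustration of that gate, using Node's built-in test runner; slugify here is a made-up stand-in for whatever the LLM generated, and the tests are the part you write and understand yourself.

  import { test } from "node:test";
  import assert from "node:assert/strict";

  // Stand-in for LLM-generated code under review.
  function slugify(s: string): string {
    return s
      .toLowerCase()
      .trim()
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/^-+|-+$/g, "");
  }

  // Tests you wrote and understand -- the acceptance gate.
  test("collapses punctuation and spaces", () => {
    assert.equal(slugify("Hello,   World!"), "hello-world");
  });

  test("strips leading and trailing separators", () => {
    assert.equal(slugify("--Already Slugged--"), "already-slugged");
  });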


My experience is the opposite. I find them most valuable for helping me do things that would be extremely hard or impossible for me to figure out. To wit, I just used one to decode a pagination cursor format and write a function that takes a datetime and generates a valid cursor. Ain’t nobody got time for that.
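The actual cursor format isn't described in the comment, so purely as an illustration of the shape of that task, here is a sketch assuming a hypothetical scheme: a base64-encoded JSON payload carrying an ISO timestamp.

  // Hypothetical cursor scheme: base64-encoded JSON with a timestamp field.
  interface Cursor {
    createdBefore: string; // ISO 8601 timestamp
    pageSize: number;
  }

  function decodeCursor(cursor: string): Cursor {
    const json = Buffer.from(cursor, "base64").toString("utf8");
    return JSON.parse(json) as Cursor;
  }

  function cursorForDate(date: Date, pageSize = 50): string {
    const payload: Cursor = { createdBefore: date.toISOString(), pageSize };
    return Buffer.from(JSON.stringify(payload), "utf8").toString("base64");
  }

  // Round trip: decodeCursor(cursorForDate(new Date("2024-06-01")))
  //   => { createdBefore: "2024-06-01T00:00:00.000Z", pageSize: 50 }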


At university I once talked to a former Soviet professor about publications, and he mentioned how they have proliferated. It was uncommon to have a lot of publications until recently, and now everyone has a really high count.

I think we came to the conclusion that any metric used for rewards loses its value as a metric (Goodhart's law, essentially). That includes citations, sadly.


I have had a somewhat similar journey but 14 years instead of 25 and I always wonder how it would be different today.

We were lucky enough to grow up with the industry and progressively learn more complexity. The kids out of school today are faced with decades worth of complexity on day one on the job.


Physicists have centuries to catch up on just to get started. I think they will survive. The main issue today is more the saturation of useless information in my opinion. There’s little time for your own thoughts as too much time is spent sorting the thoughts others want you to think.


This is true for every field. Everyone has had to step into a field that was built upon the hard-won experience of others and had to get up to speed, and the easiest way to do so is to recognize that fact and take advantage of the wisdom of those who came before.


This is a great application of various domains of ML. It reminds me of the Vesuvius Challenge. This kind of thing is accessible to beginners too, since the data by definition are pretty limited.


Perhaps you missed it while skimming, but indeed, the Vesuvius Challenge is a primary topic of discussion in the article :)


Hello, fellow Metamate ;)


I have worked with Jetson Orin platform, and honestly Nvidia has something that is really easy to work with there. The Jetsons are basically a full GPU (plus some stuff) at very low power. If I were tasked with building a robot it would likely be the first place I look.


They are OK. If you need advanced vision, then yes, because of CUDA.

But off the shelf mini PCs are much more user friendly for existing software IME.

Thankfully, with ARM being so widespread and continuing to grow, this won't matter as much.


Maybe you've had a different experience with GPU drivers on ARM for Linux than most of the rest of us? (i.e. it's the fact that nVidia actually has Linux support on ARM that is the real appeal)


> But off the shelf mini PCs are much more user friendly for existing software IME.

I'd love you to point me in the direction of an off-the-shelf mini PC that has 64 GB of addressable memory and CUDA support.


Off-the-shelf mini-PCs with 64 GB of addressable memory and reasonably powerful integrated GPUs, i.e. faster than the smaller Ampere GPUs of the cheaper NVIDIA Orin models, are plentiful.

On the other hand, if you force the CUDA support condition and any automatic translation of CUDA programs is not accepted as good enough, then this mandates the use of a discrete NVIDIA GPU, which can be provided only by a mini-ITX mini-PC.

There are mini-ITX boards with laptop Ryzen 7940HX or 7945HX CPUs, at prices between $400 and $550. To such a board you must add 64 GB of DRAM, e.g. @ $175, and a GPU, e.g. a RTX 4060 at slightly more than $300.

Without a discrete GPU, a case for a mini-ITX motherboard has a volume of only 2.5 liters. With a discrete GPU like the RTX 4060, the volume of the case must increase to 5 liters (for cases with PCIe extenders, which allow a smaller volume than typical mini-ITX cases).

So your CUDA condition still allows what can be considered an off-the-shelf mini-PC, but mandating CUDA raises the volume from the 0.5 L of a NUC-like mini-PC to 5 L and also raises the price 2 to 3 times.

This is of course unless you choose an Orin for CUDA support, but that will not give you 64 GB of DRAM, because NVIDIA has never provided enough memory in any of their products unless you accept paying a huge premium.


Yea, people have a really hard time dealing with data leakage especially on data sets as large as LLMs need.

Basically, if something appeared online or was transmitted over the wire, it should no longer be eligible to evaluate on. D. Sculley had a great talk at NeurIPS 2024 (the same conference this paper was in) titled "Empirical Rigor at Scale – or, How Not to Fool Yourself".

Basically no one knows how to properly evaluate LLMs.
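A toy sketch of the "appeared online means ineligible" rule from the paragraph above: drop any evaluation example first published before the model's training cutoff. The fields and the cutoff date are made up for illustration.

  interface EvalExample {
    id: string;
    firstPublished: Date; // earliest date the content appeared online
  }

  // Keep only examples that post-date the training cutoff.
  function filterEvalSet(examples: EvalExample[], trainingCutoff: Date): EvalExample[] {
    return examples.filter((ex) => ex.firstPublished > trainingCutoff);
  }

  // Usage: filterEvalSet(examples, new Date("2024-01-01"));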


No, an absolutely massive number of people do. In fact they have been doing exactly as you recommend, because as you note, it's obvious and required for a basic proper evaluation.


Naive question, do people no longer respect robots.txt?

