Assuming you're serious for a moment, I don't think AI is really a practical tool for working with big data.

- The "hallucination" factor means every result an AI tells you about big data is suspect. I'm sure some of you who really understand AI more than the average person can "um akshually" me on this and tell me how it's possible to configure ChatGPT to absolutely be honest 100% of the time but given the current state of what I've seen from general-purpose AI tools, I just can't trust it. In many ways this is worse than MongoDB just dropping data since at least Mongo won't make up conclusions about data that's not there.

- At the end of the day - and I think we're going to see this happen with a lot of other workflows in the future as well - you're using a heavy, low-performance general-purpose tool to solve a problem that purpose-built data management and analysis tools solve far more performantly. Part of the reason traditional SQL RDBMSes have endured and aren't going anywhere soon is that they've proven to be a very good compromise between general functionality and performance for managing many kinds of data. AI is nowhere near as good a balance for this task in almost all cases.

All that being said, just as Electron has proven to be a popular tool for writing widely used desktop and mobile applications, performance and UI concerns be damned all the way to hell, I'm sure we'll be seeing AI-powered "big data" analysis tools very soon if they're not out there already. They will suck, but people will use them anyway, to everyone's detriment.




> The "hallucination" factor means every result an AI tells you about big data is suspect.

AI / ML means more than just LLM chat output, even if that's where the hype has been for the last couple of years. ML can be used to build a perfectly serviceable classifier, predictor, or outlier detector, as in the sketch below.
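To make that concrete, here's a minimal sketch (synthetic data, purely illustrative) of the kind of non-LLM tool meant here: an outlier detector built with scikit-learn's IsolationForest.

    # Outlier detection with a classical ML model - no LLM involved.
    # The data is synthetic and exists only to illustrate the approach.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    normal = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))  # typical rows
    outliers = rng.normal(loc=8.0, scale=1.0, size=(10, 4))  # anomalous rows
    X = np.vstack([normal, outliers])

    detector = IsolationForest(contamination=0.01, random_state=0)
    labels = detector.fit_predict(X)  # -1 = outlier, 1 = inlier
    print((labels == -1).sum(), "rows flagged as outliers")

Unlike chat output, this gives a deterministic label per row; it can still be wrong, but it can't invent rows that don't exist.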

Classical ML does suffer from the lack of explainability that's always plagued AI / ML, especially as you move to deeper neural networks, where you lean more and more on their ability to approximate arbitrary functions as you add layers.

> you're using a heavy, low-performance general-purpose tool to solve a problem that purpose-built data management and analysis tools solve far more performantly

You are not wrong here, but one challenge is that sometimes even your ___domain experts do not know how to solve the problem, and applying traditional statistical methods without understanding the space is a great way to surface spurious correlations; see the sketch below. (To be fair, this applies in equal measure to ML methods.)
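A minimal sketch of that failure mode, on synthetic data: correlate a random target against 200 pure-noise features, and plain p < 0.05 tests will "discover" roughly ten of them by chance.

    # Spurious correlations from testing many hypotheses on pure noise.
    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 200))  # 200 features of pure noise
    y = rng.normal(size=500)         # a target unrelated to any feature

    # pearsonr returns (statistic, p-value); keep features with p < 0.05
    hits = [j for j in range(X.shape[1]) if pearsonr(X[:, j], y)[1] < 0.05]
    print(len(hits), "features look 'significant' despite being noise")

Without ___domain knowledge or at least a multiple-comparisons correction, nothing distinguishes these "hits" from real signal, and the same trap applies whether the test is a t-test or a fitted model's feature importances.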


A comment from the old post: https://news.ycombinator.com/item?id=34696065

> I used to joke that Data Scientists exist not to uncover insights or provide analysis, but merely to provide factoids that confirm senior management's prior beliefs.

I think AI is used for the same purpose in companies: externally, to signal to the world that the company is using the latest tech, and internally, to support existing political beliefs.

So, same job. Hallucination is not a problem here, since AI conclusions that don't align with existing political beliefs simply aren't used.



