> This is because, in the workflow, that is one of the steps naively applied without considering the possibility of multiple edits.
Unconvinced by that tbh. This could simply be a bias in the encoder/decoder or the model itself; many image generation models have shown behaviour like this. Also unsure why a sepia filter would always be applied if it were a workflow; what would be the point of that?
Personally, I don't believe this is just an agentic workflow. Agentic workflows can't really do anything a human couldn't do manually; they just make the process much faster. I spent 2 years working with image models, specifically around controllability of the output, and there is just no way to get this kind of edit out of a regular diffusion model through smarter prompting or other tricks. So I don't see how an agentic workflow would help.
I think you can only get there via a true multimodal model.
To be fair, it's not "obviously" better, but it opens a new point on the tradeoff curve. For a lot of use cases full autoregression is clearly better, and for some others full diffusion will still be better.
Autoregression gives high-quality outputs but is fairly slow.
Diffusion gives lower-quality outputs but is quite fast.
This lets you sit in the middle: not as high quality as full autoregression and not as fast as full diffusion, but a balance between the two.
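A minimal sketch of what that middle point can look like, in the style of block-wise (semi-autoregressive) decoding. This is a toy illustration under my own assumptions, not any specific model's algorithm; `model_step` is a hypothetical stand-in for one forward pass of a trained model.

    # Generate n_tokens in blocks: autoregressive across blocks,
    # parallel iterative refinement (diffusion-style) within each block.
    def generate(n_tokens, block_size, n_steps, model_step):
        out = []
        while len(out) < n_tokens:
            block = [None] * block_size          # start the block from "noise"
            for _ in range(n_steps):             # refine all positions in parallel
                block = model_step(out, block)   # conditioned on committed prefix
            out.extend(block)                    # commit block, move to the next
        return out[:n_tokens]

    # block_size=1        -> fully autoregressive: n_tokens forward passes
    # block_size=n_tokens -> fully diffusion-style: n_steps forward passes
    # anything in between trades generation speed against output quality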
> It’s a misplaced backlash on the founders or the factories, while the main enablers are the big retailers. Factories are at the mercy of big retailers.
Everyone always thinks the fault lies somewhere else. Go ask big retailers and they will say they are at the mercy of competition and consumer demand.
Ultimately, everyone is trying to extract the maximum profit they can, and that includes factories.
Yes, that’s totally true. I have seen huge orders cancelled at the last minute due to a delay of a few days. Factories lose contracts over minor issues.
The whole mass-production phenomenon is very unhealthy, especially in fashion, where trends change every week; the pressure to churn out production on time is even greater than in other industries.
Not every factory is a sweatshop; even good, humane factories are under tremendous pressure to run efficiently. Running a business efficiently isn’t a crime.
However, I do agree that the demo could have been a little better.
> Wouldn't believe that it's about to replace human intellectual work
Yea idk about that one, chief. I have been working in ML (specifically scaling of large model training) at FAANG for the past 8 years, and I have been using AI for my work since basically the first time it became even slightly usable, and I don’t share your optimism (or pessimism, depending on how you see it).
Yes, it’s still pretty bad, but you have to look at the rate of improvement, not just a static picture of where we are today.
I might still be wrong and you may be right, but claiming that anyone using AI believes what you do is flat-out false. A lot of my colleagues also working in ML research think like me, btw.
It's a figure of speech; obviously it's a spectrum, where some believe AGI is around the corner and others that all this is nothing more than an overblown statistics exercise and that LLMs have nothing to do with actual intelligence.
In my opinion, this generation of AI is amazing, but it isn't *it*.
* 1 - If you manage to do something interesting/beautiful after struggling, doubting yourself, and working through it, you come out with a tremendous sense of accomplishment and satisfaction, even though the day-to-day might have been miserable (in that sense the mountaineering comparison from the article is apt).
* 2 - If you get something interesting/beautiful after little (I don't mean no, I mean little) effort, you still enjoy it somewhat.
* 3 - If you struggle through something for a while, and come out at the end with nothing to show for it that you like or are proud of, it feels pretty terrible.
So it seems in your case you used to do (3) and can now do (2), which must definitely feel nicer.
But for people who can do (1), such as proficient artists, having to move to (2) feels like it completely destroys the reason they like doing this in the first place.
Parent is (I assume) talking about the entire budget to get to DeepSeek V3, not the cost of the final training run.
This includes salaries for ~130 ML people plus the rest of the staff; the company is 2 years old.
They trained DeepSeek V1 and V2, as well as a bunch of other less-known models, before finally training V3 (with R1 and R1-Zero then built on top of V3).
The final run of V3 cost ~$6M (at least officially... [1]), but that does not factor in the cost of all the other failed runs, ablations, etc. that always happen when developing a new model.
You also can't get clusters of this size on a 3-week commitment just to do your training and then stop paying; there is always a multi-month (if not 1-year) commitment because of demand/supply. Or, if it's a private cluster they own, it's already a $200M-$300M+ investment just for the advertised 2000 GPUs for that run.
I don't know if it really is $1B, but it certainly isn't below $100M.
[1] I personally believe they used more GPUs than stated but simply can't be forthcoming about it for obvious reasons. I of course have no proof of that; my belief is just based on the scaling laws we have seen so far plus where the incentives lie when stating the # of GPUs. But even if the 2k-GPU figure is accurate, it's still $100M+.
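To make the arithmetic concrete, a back-of-envelope sketch. The $2/GPU-hour rental price and ~2.788M H800 GPU-hours are the figures DeepSeek themselves report for the V3 run; the one-year commitment and the salary range are this comment's assumptions, not known numbers.

    # Back-of-envelope on the "real" cost of getting to V3.
    gpu_hours = 2_788_000                    # reported H800 GPU-hours, final run
    price_hr  = 2.0                          # $/GPU-hour, reported rental price
    final_run = gpu_hours * price_hr         # ~$5.6M: the headline number

    gpus        = 2_048                      # advertised cluster size
    year_rental = gpus * price_hr * 24 * 365 # ~$36M/yr with a 1-year commitment
    salaries    = 20e6                       # low end of the $20-50M/yr guess above
    two_years   = 2 * (year_rental + salaries)
    print(f"final run ${final_run/1e6:.1f}M, 2-year lower bound ${two_years/1e6:.0f}M")
    # -> final run $5.6M, 2-year lower bound $112M

Even with these charitable assumptions, the two-year total lands above $100M, which is the point.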
> that could be like a side-project for a company like that, whose blood and sweat is literally money.
From the mouth of Liang Wenfeng, co-founder of both High Flyer and DeepSeek, 18 months ago:
"Our large-model project is unrelated to our quant and financial activities. We’ve established an independent company called DeepSeek, to focus on this."
> This kinda does support the 'DeepSeek is the side project of a bunch of quants' angle
Can we stop with this nonsense?
The list of authors of the paper is public; you can just go look it up. There are ~130 people on the ML team, and they have regular ML backgrounds just like you would find at any other large ML lab.
Their infra costs multiple millions of dollars per month to run, and the salaries of such a big team are somewhere in the $20-50M per year range (not very au fait with market rates in China, hence the spread).
This is not a side project.
Edit: Apparently my comment is confusing some people. I am not arguing that ML people are good at security, just that DS is not the side project of a bunch of quant bros.
A bunch of ML researchers who were initially hired to do quant work published their first-ever user-facing project.
So maybe not a side project, but if you have ever worked with ML researchers before, the lack of engineering/security chops shouldn't be that surprising to you.
> A bunch of ML researchers who were initially hired to do quant work
Very interesting! I'm sure you have a source for this claim?
This myth of DS being a side project literally started from one tweet.
DeepSeek the company is funded by a company whose main business is being a hedge fund, but DeepSeek itself has, from day 1, been all about building LLMs to reach AGI, completely independently.
This is like saying SpaceX is the side project of a few carmaking bros, just because Elon funded and manages both. They are unrelated.
Again, you can easily google the names of the authors and look at their backgrounds; you will find people with PhDs in LLMs/multimodal models, internships at Microsoft Research, etc. No trace of a background in quant work, time-series prediction, or any of that.
From the mouth of the CEO himself 2 years ago: "Our large-model project is unrelated to our quant and financial activities. We’ve established an independent company called DeepSeek, to focus on this." [0]
It's really interesting to see how, after 10 years of debating the mythical 10x engineer, we have now overnight created the mythical 100x Chinese quant-bro researcher, who can build models 50x better than the best U.S. people's, after 6pm, while working on his side project.
TL;DR: High-Flyer started very much as an exclusively ML/AI-focused quant investment firm, with a lot of compute for finance AI and mining. Then the CCP cracked down on mining... then finance, so Liang probably decided to pivot to LLMs/AGI, which likely started as a side project, but probably isn't one anymore now that DeepSeek has taken off and Liang just met with the PRC premier a few days ago. DeepSeek being an independent company doesn't mean DeepSeek isn't Liang's side project, built on compute bought with hedge fund money that is primarily used for hedge fund work, and cushioned/allowed to get by on low margins by hedge fund profits.
That's a fair distinction. IMO it should still be categorized as a side project in the sense that it's Liang's pet project, the same way Jeff Bezos spends $$$ on his forever clock through a separate org but ultimately with Amazon resources. DeepSeek / Liang are fixated on AGI, not profit-making (or loss-making, since hardware/capex depreciation is likely eaten by the High-Flyer/quant side). There is no reason to believe DeepSeek spent $100Ms to build out another compute chain separate from High-Flyer's. The myth that seasoned finance quants crushed US researchers in their 20% time is false, but the reality/narrative that a bunch of fresh-out-of-school Gen Z kids from tier-1 PRC universities are destroying US researchers is kind of just as embarrassing.
The carmaking bro predates SpaceX. He had a BMW in college and got a supercar in 1997. While he wasn’t a carmaker yet, he got started with cars earlier.
First ever? Their math, coding, and other models have been making a splash since 2023.
The mythologizing around DeepSeek is just absurd.
"DeepSeek is the tale of one lowly hedge fund manager overcoming the wicked American AI devils." Every day I hear variations of this, and the vast majority of it is based entirely on "vibes" emanating from some unknown place.
What I find amusing is how closely this mirrors the breakout moment OpenAI had with ChatGPT. They had been releasing models for quite some time before slapping a chatbot interface on them, and then it blew up within a few days.
It's fascinating that a couple of years and a few competitors in, the DeepSeek moment parallels it so closely.
Models and security are very different uses of our synapses. Publishing any number of models is no proof of anything beyond models, talented mathematicians and programmers though they may be.
OP means that the public API and app are the side project, which they likely are; the skills required to do ML have little overlap with the skills required to run large, complex workloads securely and at scale for a public-facing app with presumably millions of users.
The latter role also typically requires experience, not just knowledge, to do well, which is why experienced SREs command very good salaries.
> First of all, training off of data generated by another AI is generally a bad idea because you'll end up with a strictly less accurate model (usually).
That is not true at all.
We have known how to solve this for at least 2 years now.
All the latest state of the art models depend heavily on training on synthetic data.
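For what it's worth, the usual trick is not to train on the raw output of another model but on a filtered subset. A minimal sketch of rejection-sampling-style synthetic data; `teacher`, `verify`, and the method names here are hypothetical stand-ins, not any real library's API:

    # Keep only teacher outputs that pass an external check (unit tests,
    # a math checker, a reward model...). Training on this curated set,
    # rather than the raw distribution, is how recent models avoid the
    # "strictly less accurate" degradation.
    def build_synthetic_set(teacher, prompts, verify, k=8):
        dataset = []
        for p in prompts:
            candidates = [teacher.generate(p) for _ in range(k)]
            good = [c for c in candidates if verify(p, c)]
            dataset.extend((p, c) for c in good)
        return dataset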
You want to bet?
The panic around deepseek is getting completely disconnected from reality.
Don’t get me wrong, what DS did is great, but anyone thinking this reshapes the fundamental trend of scaling laws and makes compute irrelevant is dead wrong.
I’m sure OpenAI doesn’t really enjoy the PR right now, but guess what OpenAI/Google/Meta/Anthropic can do if you give them a recipe for 11x more efficient training? They can scale it to their 100k-GPU clusters and still blow everything else away.
This will be textbook Jevons paradox.
Compute is still king, and OpenAI has worked on their training platform longer than anyone.
Of course, as soon as the next best model is released, we can train on its output and catch up at a fraction of the cost, and thus the infinite bunny-hopping will continue.
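The arithmetic behind that point, as a rough sketch (the 100k-GPU and ~2k-GPU figures are the ones floated in this thread, not audited numbers, and a flat 11x multiplier is obviously a simplification):

    # If the efficient recipe transfers, scale still dominates.
    efficiency_gain = 11           # hypothetical 11x-better training recipe
    big_lab_gpus    = 100_000      # figure floated for the large labs
    deepseek_gpus   = 2_048        # DeepSeek's advertised cluster
    print(efficiency_gain * big_lab_gpus / deepseek_gpus)  # ~537x effective compute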
> The panic around deepseek is getting completely disconnected from reality.
This entire hype cycle has long been completely disconnected from reality. I've watched a lot of hype waves, and I've never seen one that oscillates so wildly.
I think you're right that OpenAI isn't as hurt by DeepSeek as the mass panic would lead one to believe, but it's also true that DeepSeek exposes how blown out of proportion the initial hype waves were and how inflated the valuations are for this tech.
Meta has been demonstrating for a while that models are a commodity, not a product you can build a business on. DeepSeek proves that conclusively. OpenAI isn't finished, but they need to continue down the path they've already started and give up the idea that "getting to AGI" is a business model that doesn't require them to think about product.
In a sense it doesn't: if DeepSeek can make OpenAI-type capabilities available at Llama-type infrastructure costs, then applying OpenAI-scale infrastructure to a much more efficient training/evaluation system multiplies everything back up. I think that's where they'll have to head: using their infrastructure moat (such as it is) to apply these efficiency lessons and ship much more capable models at the top end. Yes, they can't sleepwalk into it, but I don't think that was ever the game.
> The panic around deepseek is getting completely disconnected from reality.
Couldn’t agree more! Nobody here read the manual. The last paragraph of DeepSeek’s R1 paper:
> Software Engineering Tasks: Due to the long evaluation times, which impact the efficiency of the RL process, large-scale RL has not been applied extensively in software engineering tasks. As a result, DeepSeek-R1 has not demonstrated a huge improvement over DeepSeek-V3 on software engineering benchmarks. Future versions will address this by implementing rejection sampling on software engineering data or incorporating asynchronous evaluations during the RL process to improve efficiency.
Just based on my evaluations so far, R1 is not even an improvement on V3 for real-world coding problems, because it gets stuck in stupid reasoning loops, like deliberating over whether “write C++ code to …” means it can use a C library or has to find a C++ wrapper that doesn’t exist.
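The "asynchronous evaluations" the paper alludes to would look roughly like this sketch: don't block the RL loop on slow software-engineering evals (compile + run a test suite), score rollouts in the background and consume rewards as they finish. All names here (`evaluate`, `policy_step`) are hypothetical stand-ins, not the paper's actual code.

    import asyncio

    async def evaluate(rollout):
        await asyncio.sleep(1.0)       # stand-in for a slow test-suite run
        return rollout, 1.0            # (rollout, reward)

    async def rl_loop(policy_step, rollouts):
        pending = {asyncio.ensure_future(evaluate(r)) for r in rollouts}
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                rollout, reward = task.result()
                policy_step(rollout, reward)  # update without waiting for the rest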
OpenAI's issue might be that it is extremely inefficient with money (high salaries, high compute costs, high expenses, etc.). This is fine when you have an absolute monopoly, as investors will throw money your way (OpenAI is burning cash), but once an alternative is clearly available, you can no longer do that.
OpenAI doesn't have an advantage in compute over Google, Microsoft, or anyone else with a few billion $$.
Oh wow. I have been using Kagi premium for months and never noticed that their AI assistant now has all the good AIs too. I was using Kagi exclusively for search and Perplexity for AI stuff. I guess I can cut down on my subscriptions too. Thanks for the hint. (Also, I noticed that Kagi has a PWA for their AI assistant, which is also cool.)
Compute is not king; DeepSeek just demonstrated otherwise. And yes, OpenAI will have to reinvent itself to copy DS, but that means throwing away a lot of their investment in existing tech. They might recover, but it is not the minor hiccup you suggest.
I just don't see how this is true. OpenAI has a massive cash and hardware pile; they'll adapt and learn from what DeepSeek has done and be in a position to build and train 10x/50x/100x (or however much) faster and better. They are getting a wake-up call for sure, but I don't think much is going to be thrown away.