> This is because, in the workflow, that is one of the steps naively applied without considering the possibility of multiple edits.
Unconvinced by that tbh. This could simply be a bias in the encoder/decoder or the model itself; many image generation models have shown behaviour like this. Also unsure why a sepia filter would always be applied if it were a workflow; what would be the point of that?
Personally, I don't believe this is just an agentic workflow. Agentic workflows can't really do anything a human couldn't do manually; they just make the process much faster. I spent 2 years working with image models, specifically around controllability of the output, and there is just no way to get this kind of edit out of a regular diffusion model through smarter prompting or other tricks. So I don't see how an agentic workflow would help.
I think you can only get there via a true multimodal model.
To be fair, it's not "obviously" better, but it opens a new point on the tradeoff curve. For a lot of use cases full autoregression is clearly better, and for some others full diffusion will still be better.
Autoregression gives high-quality outputs but is fairly slow.
Diffusion gives lower-quality outputs but is quite fast.
This lets you sit in the middle: not as high quality as full autoregression and not as fast as full diffusion, but a balance between the two.
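A minimal sketch of what that middle point can look like, in the style of block-wise (semi-autoregressive) decoding. This is a toy illustration under my own assumptions, not any specific model's algorithm; `model_step` is a hypothetical stand-in for one forward pass of a trained model.

    # Generate n_tokens in blocks: autoregressive across blocks,
    # parallel iterative refinement (diffusion-style) within each block.
    def generate(n_tokens, block_size, n_steps, model_step):
        out = []
        while len(out) < n_tokens:
            block = [None] * block_size          # start the block from "noise"
            for _ in range(n_steps):             # refine all positions in parallel
                block = model_step(out, block)   # conditioned on committed prefix
            out.extend(block)                    # commit block, move to the next
        return out[:n_tokens]

    # block_size=1        -> fully autoregressive: n_tokens forward passes
    # block_size=n_tokens -> fully diffusion-style: n_steps forward passes
    # anything in between trades generation speed against output quality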
> It’s a misplaced backlash on the founders or the factories, while the main enablers are the big retailers. Factories are at the mercy of big retailers.
Everyone always thinks the fault lies somewhere else. Go ask big retailers and they will say they are at the mercy of competition and consumer demand.
Ultimately, everyone is trying to extract the maximum profit they can, and that includes factories.
Yes, that’s totally true. I have seen huge orders cancelled at the last minute due to a delay of a few days. Factories lose contracts over minor issues.
The whole mass-production phenomenon is very unhealthy, especially in fashion, where trends change every week; the pressure to churn out production on time is even greater than in other industries.
Not every factory is a sweatshop; even good, humane factories are under tremendous pressure to run efficiently. Running a business efficiently isn’t a crime.
However, I do agree that the demo could have been a little better.
> Wouldn't believe that it's about to replace human intellectual work
Yea idk about that one, chief. I have been working in ML (specifically scaling of large model training) at FAANG for the past 8 years, and I have been using AI for my work since basically the first time it became even slightly usable, and I don’t share your optimism (or pessimism, depending on how you see it).
Yes, it’s still pretty bad, but you have to look at the rate of improvement, not just a static picture of where we are today.
I might still be wrong and you may be right, but claiming that anyone using AI believes what you do is flat-out false. A lot of my colleagues also working in ML research think like me, btw.
It's a figure of speech; obviously it's a spectrum, where some believe AGI is around the corner and others that all this is nothing more than an overblown statistics exercise and that LLMs have nothing to do with actual intelligence.
In my opinion, this generation of AI is amazing, but it isn't *it*.
* 1 - If you manage to do something interesting/beautiful after struggling, doubting yourself, and working through it, you come out with a tremendous sense of accomplishment and satisfaction, even though the day-to-day might have been miserable (in that sense the mountaineering comparison from the article is apt).
* 2 - If you get something interesting/beautiful after little (I don't mean no, I mean little) effort, you still enjoy it somewhat.
* 3 - If you struggle through something for a while, and come out at the end with nothing to show for it that you like or are proud of, it feels pretty terrible.
So it seems in your case you used to do (3) and can now do (2), which must definitely feel nicer.
But for people who can do (1), such as proficient artists, having to move to (2) feels like it completely destroys the reason they like doing this in the first place.
Parent is (I assume) talking about the entire budget to get to DeepSeek V3, not the cost of the final training run.
This includes salaries for ~130 ML people plus the rest of the staff; the company is 2 years old.
They trained DeepSeek V1 and V2, as well as a bunch of other less-known models, before finally training V3 (with R1 and R1-Zero then built on top of V3).
The final run of V3 cost ~$6M (at least officially... [1]), but that does not factor in the cost of all the other failed runs, ablations, etc. that always happen when developing a new model.
You also can't get clusters of this size on a 3-week commitment just to do your training and then stop paying; there is always a multi-month (if not 1-year) commitment because of demand/supply. Or, if it's a private cluster they own, it's already a $200M-$300M+ investment just for the advertised 2000 GPUs for that run.
I don't know if it really is $1B, but it certainly isn't below $100M.
[1] I personally believe they used more GPUs than stated but simply can't be forthcoming about it for obvious reasons. I of course have no proof of that; my belief is just based on the scaling laws we have seen so far plus where the incentives lie when stating the # of GPUs. But even if the 2k-GPU figure is accurate, it's still $100M+.
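To make the arithmetic concrete, a back-of-envelope sketch. The $2/GPU-hour rental price and ~2.788M H800 GPU-hours are the figures DeepSeek themselves report for the V3 run; the one-year commitment and the salary range are this comment's assumptions, not known numbers.

    # Back-of-envelope on the "real" cost of getting to V3.
    gpu_hours = 2_788_000                    # reported H800 GPU-hours, final run
    price_hr  = 2.0                          # $/GPU-hour, reported rental price
    final_run = gpu_hours * price_hr         # ~$5.6M: the headline number

    gpus        = 2_048                      # advertised cluster size
    year_rental = gpus * price_hr * 24 * 365 # ~$36M/yr with a 1-year commitment
    salaries    = 20e6                       # low end of the $20-50M/yr guess above
    two_years   = 2 * (year_rental + salaries)
    print(f"final run ${final_run/1e6:.1f}M, 2-year lower bound ${two_years/1e6:.0f}M")
    # -> final run $5.6M, 2-year lower bound $112M

Even with these charitable assumptions, the two-year total lands above $100M, which is the point.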
> that could be like a side-project for a company like that, whose blood and sweat is literally money.
From the mouth of Liang Wenfeng, co-founder of both High Flyer and DeepSeek, 18 months ago:
"Our large-model project is unrelated to our quant and financial activities. We’ve established an independent company called DeepSeek, to focus on this."
> This kinda does support the 'DeepSeek is the side project of a bunch of quants' angle
Can we stop with this nonsense?
The list of authors of the paper is public; you can just go look it up. There are ~130 people on the ML team, and they have regular ML backgrounds just like you would find at any other large ML lab.
Their infra costs multiple millions of dollars per month to run, and the salaries of such a big team are somewhere in the $20-50M per year range (not very au fait with market rates in China, hence the spread).
This is not a side project.
Edit: Apparently my comment is confusing some people. I am not arguing that ML people are good at security, just that DS is not the side project of a bunch of quant bros.
A bunch of ML researchers who were initially hired to do quant work published their first-ever user-facing project.
So maybe not a side project, but if you have ever worked with ML researchers before, the lack of engineering/security chops shouldn't be that surprising to you.
> A bunch of ML researchers who were initially hired to do quant work
Very interesting! I'm sure you have a source for this claim?
This myth of DS being a side project literally started from one tweet.
DeepSeek the company is funded by a company whose main business is being a hedge fund, but DeepSeek itself has, from day 1, been all about building LLMs to reach AGI, completely independently.
This is like saying SpaceX is the side project of a few carmaking bros, just because Elon funded and manages both. They are unrelated.
Again, you can easily google the names of the authors and look at their backgrounds; you will find people with PhDs in LLMs/multimodal models, internships at Microsoft Research, etc. No trace of a background in quant work, time-series prediction, or any of that.
From the mouth of the CEO himself 2 years ago: "Our large-model project is unrelated to our quant and financial activities. We’ve established an independent company called DeepSeek, to focus on this." [0]
It's really interesting to see how, after 10 years of debating the mythical 10x engineer, we have now overnight created the mythical 100x Chinese quant-bro researcher, who can build models 50x better than the best U.S. people's, after 6pm, while working on his side project.
TL;DR: High-Flyer started very much as an exclusively ML/AI-focused quant investment firm, with a lot of compute for finance AI and mining. Then the CCP cracked down on mining... then finance, so Liang probably decided to pivot to LLMs/AGI, which likely started as a side project, but probably isn't one anymore now that DeepSeek has taken off and Liang just met with the PRC premier a few days ago. DeepSeek being an independent company doesn't mean DeepSeek isn't Liang's side project, built on compute bought with hedge fund money that is primarily used for hedge fund work, and cushioned/allowed to get by on low margins by hedge fund profits.
That's a fair distinction. IMO it should still be categorized as a side project in the sense that it's Liang's pet project, the same way Jeff Bezos spends $$$ on his forever clock through a separate org but ultimately with Amazon resources. DeepSeek / Liang are fixated on AGI, not profit-making (or loss-making, since hardware/capex depreciation is likely eaten by the High-Flyer/quant side). There is no reason to believe DeepSeek spent $100Ms to build out another compute chain separate from High-Flyer's. The myth that seasoned finance quants crushed US researchers in their 20% time is false, but the reality/narrative that a bunch of fresh-out-of-school Gen Z kids from tier-1 PRC universities are destroying US researchers is kind of just as embarrassing.
The carmaking bro predates SpaceX. He had a BMW in college and got a supercar in 1997. While he wasn’t a carmaker yet, he got started with cars earlier.
First ever? Their math, coding, and other models have been making a splash since 2023.
The mythologizing around DeepSeek is just absurd.
"DeepSeek is the tale of one lowly hedge fund manager overcoming the wicked American AI devils." Every day I hear variations of this, and the vast majority of it is based entirely on "vibes" emanating from some unknown place.
What I find amusing is how closely this mirrors the breakout moment OpenAI had with ChatGPT. They had been releasing models for quite some time before slapping a chatbot interface on them, and then it blew up within a few days.
It's fascinating that a couple of years and a few competitors in, the DeepSeek moment parallels it so closely.
Models and security are very different uses of our synapses. Publishing any number of models is no proof of anything beyond models, talented mathematicians and programmers though they may be.
OP means that the public API and app are the side project, which they likely are; the skills required to do ML have little overlap with the skills required to run large, complex workloads securely and at scale for a public-facing app with presumably millions of users.
The latter role also typically requires experience, not just knowledge, to do well, which is why experienced SREs command very good salaries.
> First of all, training off of data generated by another AI is generally a bad idea because you'll end up with a strictly less accurate model (usually).
That is not true at all.
We have known how to solve this for at least 2 years now.
All the latest state of the art models depend heavily on training on synthetic data.
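For what it's worth, the usual trick is not to train on the raw output of another model but on a filtered subset. A minimal sketch of rejection-sampling-style synthetic data; `teacher`, `verify`, and the method names here are hypothetical stand-ins, not any real library's API:

    # Keep only teacher outputs that pass an external check (unit tests,
    # a math checker, a reward model...). Training on this curated set,
    # rather than the raw distribution, is how recent models avoid the
    # "strictly less accurate" degradation.
    def build_synthetic_set(teacher, prompts, verify, k=8):
        dataset = []
        for p in prompts:
            candidates = [teacher.generate(p) for _ in range(k)]
            good = [c for c in candidates if verify(p, c)]
            dataset.extend((p, c) for c in good)
        return dataset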
You want to bet?
The panic around deepseek is getting completely disconnected from reality.
Don’t get me wrong, what DS did is great, but anyone thinking this reshapes the fundamental trend of scaling laws and makes compute irrelevant is dead wrong.
I’m sure OpenAI doesn’t really enjoy the PR right now, but guess what OpenAI/Google/Meta/Anthropic can do if you give them a recipe for 11x more efficient training? They can scale it to their 100k-GPU clusters and still blow everything else away.
This will be textbook Jevons paradox.
Compute is still king, and OpenAI has worked on their training platform longer than anyone.
Of course, as soon as the next best model is released, we can train on its output and catch up at a fraction of the cost, and thus the infinite bunny-hopping will continue.
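The arithmetic behind that point, as a rough sketch (the 100k-GPU and ~2k-GPU figures are the ones floated in this thread, not audited numbers, and a flat 11x multiplier is obviously a simplification):

    # If the efficient recipe transfers, scale still dominates.
    efficiency_gain = 11           # hypothetical 11x-better training recipe
    big_lab_gpus    = 100_000      # figure floated for the large labs
    deepseek_gpus   = 2_048        # DeepSeek's advertised cluster
    print(efficiency_gain * big_lab_gpus / deepseek_gpus)  # ~537x effective compute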
> The panic around deepseek is getting completely disconnected from reality.
This entire hype cycle has long been completely disconnected from reality. I've watched a lot of hype waves, and I've never seen one that oscillates so wildly.
I think you're right that OpenAI isn't as hurt by DeepSeek as the mass panic would lead one to believe, but it's also true that DeepSeek exposes how blown out of proportion the initial hype waves were and how inflated the valuations are for this tech.
Meta has been demonstrating for a while that models are a commodity, not a product you can build a business on. DeepSeek proves that conclusively. OpenAI isn't finished, but they need to continue down the path they've already started and give up the idea that "getting to AGI" is a business model that doesn't require them to think about product.
In a sense it doesn't: if DeepSeek can make OpenAI-type capabilities available at Llama-type infrastructure costs, then applying OpenAI-scale infrastructure to a much more efficient training/evaluation system multiplies everything back up. I think that's where they'll have to head: using their infrastructure moat (such as it is) to apply these efficiency lessons and ship much more capable models at the top end. Yes, they can't sleepwalk into it, but I don't think that was ever the game.
> The panic around deepseek is getting completely disconnected from reality.
Couldn’t agree more! Nobody here read the manual. The last paragraph of DeepSeek’s R1 paper:
> Software Engineering Tasks: Due to the long evaluation times, which impact the efficiency of the RL process, large-scale RL has not been applied extensively in software engineering tasks. As a result, DeepSeek-R1 has not demonstrated a huge improvement over DeepSeek-V3 on software engineering benchmarks. Future versions will address this by implementing rejection sampling on software engineering data or incorporating asynchronous evaluations during the RL process to improve efficiency.
Just based on my evaluations so far, R1 is not even an improvement on V3 for real-world coding problems, because it gets stuck in stupid reasoning loops, like deliberating over whether “write C++ code to …” means it can use a C library or has to find a C++ wrapper that doesn’t exist.
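The "asynchronous evaluations" the paper alludes to would look roughly like this sketch: don't block the RL loop on slow software-engineering evals (compile + run a test suite), score rollouts in the background and consume rewards as they finish. All names here (`evaluate`, `policy_step`) are hypothetical stand-ins, not the paper's actual code.

    import asyncio

    async def evaluate(rollout):
        await asyncio.sleep(1.0)       # stand-in for a slow test-suite run
        return rollout, 1.0            # (rollout, reward)

    async def rl_loop(policy_step, rollouts):
        pending = {asyncio.ensure_future(evaluate(r)) for r in rollouts}
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                rollout, reward = task.result()
                policy_step(rollout, reward)  # update without waiting for the rest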
OpenAI's issue might be that it is extremely inefficient with money (high salaries, high compute costs, high expenses, etc.). This is fine when you have an absolute monopoly, as investors will throw money your way (OpenAI is burning cash), but once an alternative is clearly available, you can no longer do that.
OpenAI doesn't have an advantage in compute over Google, Microsoft, or anyone else with a few billion $$.
Oh wow. I have been using Kagi premium for months and never noticed that their AI assistant now has all the good AIs too. I was using Kagi exclusively for search and Perplexity for AI stuff. I guess I can cut down on my subscriptions too. Thanks for the hint. (Also, I noticed that Kagi has a PWA for their AI assistant, which is also cool.)
Compute is not king; DeepSeek just demonstrated otherwise. And yes, OpenAI will have to reinvent itself to copy DS, but that means throwing away a lot of their investment in existing tech. They might recover, but it is not the minor hiccup you suggest.
I just don't see how this is true. OpenAI has a massive cash and hardware pile; they'll adapt and learn from what DeepSeek has done and be in a position to build and train 10x/50x/100x (or however much) faster and better. They are getting a wake-up call for sure, but I don't think much is going to be thrown away.