> DeepSeek spends $6 million in old H800 hardware to develop open source model t...

nabla9 · 2025-01-27T12:01:33 1737979293

Huggingface is currently replicating it.

Replications of small models indicate that they aren't lying to any significant degree. The architecture is cheap to train.

Berkeley Researchers Replicate DeepSeek R1's Core Tech for Just $30: A Small Model RL Revolution https://xyzlabs.substack.com/p/berkeley-researchers-replicat...

benterix · 2025-01-27T12:45:44 1737981944

Whoa, that's incredible!

I remember a year ago I was hoping that a decade from now I'd be able to run GPT4-class models on my own hardware. The reality seems to be far more exciting.

moritzwarhier · 2025-01-27T23:37:07 1738021027

I first sneered at the idea of LLM generated LLM training sets, but is this what might be driving the big efficiency leap?

Asking as someone who has honestly only superficially followed the developments since the end of 2023 or so.

bufferoverflow · 2025-01-27T15:28:14 1737991694

You call R1 a small model? It's a 671-billion parameter model.

elorant · 2025-01-27T16:42:39 1737996159

There are multiple variations of the model starting from 1.5B parameters.

bufferoverflow · 2025-01-27T16:59:37 1737997177

Those are distillations of the model.

rsanek · 2025-01-27T17:27:38 1737998858

have you used those? in my experience even the 70B distillation is far worse than what you can expect from o1 / the R1 available on the web

elorant · 2025-01-27T19:03:22 1738004602

No, I haven't. I've used Perplexity's R1 but I don't know how many parameters it has. It's quite good, although too slow.

lmm · 2025-01-27T12:44:59 1737981899

All of the western AI companies trained on illegally obtained data, they barely even bother to deny it. This is an industry where lies are normalised. (Not to contradict your point about this specific number)

pas · 2025-01-27T15:58:45 1737993525

It's legally a grey area. It might even be fair use. Facts themselves are not protected by copyright. If there's no unauthorized reproduction/copying then it's not a copyright issue. (Maybe it's a violation of terms of service, of course.)

tivert · 2025-01-27T17:47:46 1738000066

> Facts themselves are not protected by copyright.

But don't LLMs encode language, not facts?

> If there's no unauthorized reproduction/copying then it's not a copyright issue.

I'm pretty sure copyright holders have gotten the models to regurgitate their copyrighted works verbatim, or nearly so.

MRtecno98 · 2025-01-27T20:18:04 1738009084

We don't know what LLMs encode because we don't know what the model weights represent.

On the second point, it depends on how the models were made to reproduce text verbatim. If I copy-paste someone's article into MS Word, I technically made Word reproduce the text verbatim; obviously that's not Word's fault. If I asked an LLM explicitly to list the entire Bee Movie script it would probably do it, which means it was trained on it, but that's through a direct and clear request to copy the original verbatim.

lmm · 2025-01-28T01:16:50 1738027010

> If I copy-paste someone's article into MS Word, I technically made Word reproduce the text verbatim; obviously that's not Word's fault. If I asked an LLM explicitly to list the entire Bee Movie script it would probably do it, which means it was trained on it, but that's through a direct and clear request to copy the original verbatim.

But that clearly means that the LLM already has the Bee Movie script inside it (somehow), which would be a copyright violation. If MS Word came with an "open movie script" button that let you pick a movie and get the script for it, that would clearly be a copyright violation. Of course if the user inputs something then that's different: that's not the software shipping whatever it is.

cdblades · 2025-01-27T21:49:31 1738014571

That's not a fair comparison. The user in the Word example already had access to the infringing content to copy it, and then paste it into Word.

But it has to have that copy, verbatim, to produce it, as you acknowledge.

If Dropbox was hosting and serving IP from Paramount, Paramount would be able to submit a DMCA request to get that data removed.

Not only can you not submit a DMCA request to ChatGPT, they can't actually obey one.

tivert · 2025-01-28T05:08:13 1738040893

> If I asked an LLM explicitly to list the entire Bee Movie script it would probably do it, which means it was trained on it, but that's through a direct and clear request to copy the original verbatim.

Huh? The "request" part doesn't matter. What you describe is exactly like if someone ships me a hard drive with a file containing "the entire Bee Movie script" that they were not authorized to copy: it's copyright infringement before and after I request the disk to read out the blocks with the file.

bee_rider · 2025-01-27T21:43:55 1738014235

I mean, it is IP law, this stuff was all invented to help big corps support their business models. So, it is impossible to predict what any of it means until we see who is willing to pay more to get their desired laws enforced. We’ll have to wait for more precedent to be purchased before us little people can figure out what the laws are.

hackingonempty · 2025-01-27T21:47:32 1738014452

Copies are made in the formation of the training corpus and in the memory of the computers during training so there's definitely a copyright issue. Could be fair use though.

icedchai · 2025-01-27T22:13:34 1738016014

Is there also a copyright issue with search engines?

hackingonempty · 2025-01-27T22:49:29 1738018169

No, the DMCA amended the law to give search engines (and automated caches and user generated content sites) safe harbor from infringement if they follow the takedown protocol.

maxglute · 2025-01-27T12:32:13 1737981133

> been obtained illegally.

PRC companies breaking US export control laws is legal (for PRC companies). Maybe they're trying to avoid US entity listing; lots of PRC companies keep mum about growing capabilities to do so. But the mere fact DeepSeek is publicizing means they're unlikely to care about the political heat that is coming and the ramifications. If anything, getting on the US entity list probably locks their employees with DeepSeek on their resume into the PRC.

mytailorisrich · 2025-01-27T17:01:56 1737997316

Depending on how the law is written this may be legal even under US law.

For instance, if the law bans US companies from exporting/selling some chips to Chinese companies and that's it, then it is unclear to me whether a Chinese company would be doing anything illegal under US law by buying such chips, since it would be up to the American seller to refuse.

Anyway, usually this sort of thing takes place through intermediaries in third countries so it is difficult to track, but obviously it would be stupid to brag about it if that happened.

esperent · 2025-01-27T13:49:04 1737985744

> PRC companies breaking US export control laws is legal

So long as they don't plan to do any business with the US or any of their allies I guess.

surgical_fire · 2025-01-27T15:31:56 1737991916

Which allies? The ones the current US president is threatening in all sorts of ways?

I actually hope he doubles down. I would love for EU to rely less on the US. It would also reduce the reach of the silly embargoes that benefit no one but the US.

eagleislandsong · 2025-01-28T09:23:41 1738056221

The USA does not have allies. It has hostages.

CamperBob2 · 2025-01-28T00:37:39 1738024659

Destabilizing world trade and international relations isn't something that anyone not named Trump or Putin should be hoping for.

surgical_fire · 2025-01-28T01:39:55 1738028395

Depends on how you think this would all play out.

maxglute · 2025-01-27T15:08:53 1737990533

Hard to think they plan to; PRC strategic companies that get competitive get entity-listed anyway. And the CEO seems mission-driven for AGI: if the US is inevitably going to limit hardware, then there's nothing to do but go gloves off and try to dunk on the competition. At this point the US can take DeepSeek off app stores, but what's the point except to look petty? Either way, more technical people have pointed out that some of the R1 optimizations _only_ make sense if DeepSeek was constrained to older hardware, i.e. engineering at the PTX level to circumvent H800 limitations so they perform more like H100s.

Throwing this model out also gives US allies' sovereign AI a launchpad... reducing US dependency is step 1 to not being US allies.

esperent · 2025-01-27T22:55:28 1738018528

> Hard to think they plan to

They already are. You can make a paid account and use their API from most countries around the world. This is what doing business looks like.

sundaeofshock · 2025-01-27T14:29:01 1737988141

This may not be that much of a moat, as Trump seems committed to turning US current allies into former allies.

bee_rider · 2025-01-27T17:21:48 1737998508

If they sell software and build devices in China and then people from the US or our allies have to break our laws to import it, it seems like an us problem.

senko · 2025-01-27T11:39:01 1737977941

There are already some (limited) reproductions that suggest they're not completely lying (i.e. that there are indeed perf benefits).

belter · 2025-01-27T11:38:46 1737977926

They have a lot of H100: https://www.reddit.com/r/NVDA_Stock/comments/1iadc0s/evidenc...

msoad · 2025-01-27T11:53:12 1737978792

four GPUs are very convincing indeed! :D

bayindirh · 2025-01-27T12:19:42 1737980382

That's 8 (not 4), on an NVIDIA platform board to start with.

You can't buy them as "GPU"s and integrate them into your system. NVIDIA sells you the platform (GPUs + platform board which includes switches and all the support infra), and you integrate that behemoth of a board into your server, as a single unit.

So that open server and the wrapped ones at the back are more telling than it looks.

belter · 2025-01-27T12:00:36 1737979236

You missed the black unwrapped boxes in the background....