Personally I think trained models are derived works of all the training data.

Just like a translation of a book is a derived work of the original, or a compiled binary is a derived work of its source code.




You're using these words without their legal context. Legal definitions don't map 1:1 onto our colloquial usage.

A translation of a book is non-transformative and retains the original author's artistic expression.

As a counterexample: if you write an essay about Picasso's Guernica, it is derivative in the colloquial sense of the term, but legally it's an original work.


Wikipedia:

> In copyright law, a derivative work is an expressive creation that includes major copyrightable elements of ... the underlying work

A trained model fails that on two counts, doesn't it? Both the "includes" part, and the fact that a model is itself not an expressive work of authorship.


I'm not sure. If it fails, then I reckon a binary compiled from source code fails too.

There's nothing creative about what a compiler does; it is automatic, just like the training run of an LLM.

And no part of the original source code is in the binary output.

And yet, binaries are derived works of the source code that went into them.

So something is up! I am not a lawyer though.


> And no part of the original source code is in the binary output.

It's not about whether the binary includes the raw text of the source, but whether it copies the expressive content. Anything expressive (i.e. copyrightable) in a compiled binary must have come from the source code, and that's what makes it a derived work.

But the same isn't true of LLMs, which are more like "data about their inputs", than "a transformed version of their inputs".


Curating training data is an exercise in editorial judgement.


If a trained model doesn't meet the definition of a derivative work, it doesn't matter whether the data it was trained on was curated.
