You're trying to use words without the legal context here. The legal definition of words isn't 1:1 with our colloquial usage.
Translation of a book is non-transformative and retains the original author's artistic expression.
As a counterexample: if you write an essay about Picasso's Guernica painting, it is derivative according to our colloquial use of the term, but legally it's an original work.
> In copyright law, a derivative work is an expressive creation that includes major copyrightable elements of ... the underlying work
A trained model fails that on two counts, doesn't it? Both the "includes" part, and the fact that a model is itself not an expressive work of authorship.
> And no part of the original source code is in the binary output.
It's not about whether the binary includes the raw text of the source, but whether it copies the expressive content. Anything expressive (i.e. copyrightable) in a compiled binary must have come from the source code, so that's what makes it a derived work.
But the same isn't true of LLMs, which are more like "data about their inputs" than "a transformed version of their inputs".
If a trained model doesn't meet the definition of a derivative work, then whether its training data was curated doesn't matter.
Just like a translation of a book is a derived work of the original, or a compiled binary is a derived work of some source code.