According to plaintiffs' counsel, Meta engineer Nikolay Bashlykov, who works on the Llama research team, wrote a script to remove copyright info, including the words "copyright" and "acknowledgments," from e-books in LibGen. Separately, Meta allegedly stripped copyright markers from science journal articles and "source metadata" in the training data it used for Llama.
~Zucc
How the frick do we not have a zuck Marsey???
So what? All works get copyright. If you're licensed to use them, then what does it matter?
So what? Who includes the copyright info about the books they use, under any use case, ever? If you quote a book in your essay, you don't do that.
You don't quote the entire book. Reproductions have to include copyright statements, the same way you distribute a license file with FOSS software. Intentionally stripping copyright helps them establish damages because of 17 USC 506.
They will be arguing this occurs every time someone uses a model, so the damages should be in the trillions.
The legal question no one has an answer to yet is whether encoding a book into a model via training counts as reproduction. I doubt very much any court can answer that in a sensible way because judges are all r-slurred.
It's a legitimate question where the line is. I can easily extract the book from a vector database, but with a model that has no such database it would be basically impossible to reverse the book back out; it would be a derivative work.
Does it really just say that?
You'd think there would be a "fails to include" in there too.