You wouldn't steal every book that ever existed, would you? :marseyninja:

https://old.reddit.com/r/technology/comments/1hxjr7y/mark_zuckerberg_gave_metas_llama_team_the_ok_to/

According to plaintiffs' counsel, Meta engineer Nikolay Bashlykov, who works on the Llama research team, wrote a script to remove copyright info, including the word "copyright" and "acknowledgments," from e-books in LibGen. Separately, Meta allegedly stripped copyright markers from science journal articles and "source metadata" in the training data it used for Llama.

:marseyemojilaugh:

>yeah just sanitize the training data and strip out that crap

~Zucc

How the frick do we not have a zuck Marsey???

>Mark Zuckerberg gave Meta's Llama team the OK to train on copyrighted works, filing claims

So what? All works get copyright. If you're licensed to use them, then what does it matter?

>According to plaintiffs' counsel, Meta engineer Nikolay Bashlykov, who works on the Llama research team, wrote a script to remove copyright info, including the word "copyright" and "acknowledgments," from e-books in LibGen. Separately, Meta allegedly stripped copyright markers from science journal articles and "source metadata" in the training data it used for Llama.

So what? Who includes the copyright info about the books they use under any use case ever? If you quote a book in your essay, you don't do that.

:marseyindignant: The copyright info is my favorite part of the book. If you skip it, you're missing out on a lot of subtext.

You don't quote the entire book. Reproductions have to include copyright statements, the same way you distribute a license file with FOSS software. Intentionally stripping copyright helps the plaintiffs establish damages because of 17 USC 506(d):

>Any person who, with fraudulent intent, removes or alters any notice of copyright appearing on a copy of a copyrighted work shall be fined not more than $2,500

They will be arguing this occurs every time someone uses a model so the damages should be in the trillions.

The legal question no one has an answer to yet is whether encoding a book into a model via training counts as reproduction. I doubt very much any court can answer that in a sensible way because judges are all r-slurred.

It's a legitimate question where the line is. I can easily extract the book from a vector database, but with a model that doesn't have one it would be basically impossible to reverse the book back out, so the model would be a derivative work.
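
To make the vector database half of that concrete, here's a minimal sketch, assuming a toy in-memory store with a bag-of-words stand-in for a real embedding model. `ToyVectorStore`, `embed`, and `chunk_words` are made-up names for illustration, not any real library's API. The only point is that this kind of store keeps the original chunks next to their vectors, so the text comes back verbatim:

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical store: each row keeps the vector AND the original chunk."""

    def __init__(self):
        self.rows = []  # (position, embedding, original chunk text)

    def add(self, position, chunk):
        self.rows.append((position, embed(chunk), chunk))

    def search(self, query, k=1):
        # Rank stored chunks by similarity to the query, return the raw text.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[1]), reverse=True)
        return [chunk for _, _, chunk in ranked[:k]]

    def dump(self):
        # Walk the rows in original order and stitch the chunks back together.
        ordered = sorted(self.rows, key=lambda r: r[0])
        return " ".join(chunk for _, _, chunk in ordered)


def chunk_words(text, size):
    """Split text into consecutive chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


book = "It was the best of times it was the worst of times it was the age of wisdom"
store = ToyVectorStore()
for i, chunk in enumerate(chunk_words(book, 6)):
    store.add(i, chunk)

reconstructed = store.dump()
print(reconstructed == book)
print(store.search("worst of times"))
```

Model weights have no equivalent of `dump()`, which is where the line gets blurry.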

>removes or alters

Does it really just say that?

You'd think there would be a "fails to include" in there too.
