
You wouldn't steal every book that ever existed, would you? :marseyninja:

https://old.reddit.com/r/technology/comments/1hxjr7y/mark_zuckerberg_gave_metas_llama_team_the_ok_to/


According to plaintiffs' counsel, Meta engineer Nikolay Bashlykov, who works on the Llama research team, wrote a script to remove copyright info, including the word "copyright" and "acknowledgments," from e-books in LibGen. Separately, Meta allegedly stripped copyright markers from science journal articles and "source metadata" in the training data it used for Llama.

:marseyemojilaugh:

>yeah just sanitize the training data and strip out that crap

~Zucc

How the frick do we not have a zuck Marsey???

121
Jump in the discussion.

No email address required.

Based.

!codecels !nooticers I would bet my left nut that zucc also ordered the v1 models leaked way back when. Llama went from nothing to the self-hosted option.


definitely, Facebook's whole game plan is to screw over OpenAI as much as possible


It's crazy how much that one leak changed how Facebook interacted with the wider programming community as a whole.


>r/technology is suddenly aghast at piracy

frick i hate redditors

LLMs are literally transformative so all of this is fair use. Redditors are just rushing to get upset at anything zuck-related since he's no longer kneeling to performative transgender policies


They can reproduce their training data pretty exactly, so it's not transformative. That said, copyright law is gay and we should :rape: copyright lawyers


It is because without having the actual data in hand, you'd realistically be unable to verify accuracy, which is why a lot of shitty lawyers got punked by LLM making up legal precedents, as such, it's the equivalent of claiming copyright on Rain Man for memorization but also it's just as r-slurred, as he could two book pages at once with 1 eye each and remember the day of the week you were born on by date but also shit his own pants.

Everything I just said is true btw you can ask chatGPT to look it up


I mean if you can get it to generate a copyrighted text with 90% accuracy you're still violating copyright law.


Except no. A 90% accurate chemistry equation from one of those referenced studies, or a 90% accurate proprietary blend of herbs and spices that mimics KFC chicken, is not a copyright violation. It's the product of training and still transformative, and in the latter case it might still taste like shit if you add 10% feces to the blend, or even 1% sulfur, etc.

!aichads Martin Luther King Jr plagiarized a large part of his doctoral thesis, which is based because he turned that otherwise useless PhD (piled higher and deeper) title into the civil rights movement. Therefore, Biden's anti-AI initiatives are actually an insult to the memory of Dr. King who was indisputably transformative, just like training AI on the whole of human knowledge is equally transformative.

CHECKMATE LUDDITES https://media.tenor.com/h5ek52i5ww4AAAAx/oh-no-swag.webp


90% of someone else's yield would be considered incredibly good and suspect infringement for a lot of proprietary chemical processes and like 90% of any spice blend is just salt and pepper so those are really awful analogies.

the big thing is just oh everyone is doing it with impunity so you probably should too. Just like normal plagiarism.


If I include a chapter of nonsense in the middle of a ten chapter book, is that not copyright infringement?


You'll :rape:copyright lawyers yet you won't give patent lawyers a lil tug action :marseysulk:


I feel so unloved


Patent lawyers really deserve a whole team bb


:marseyfluffy:


>They can reproduce their training data pretty exactly, so it's not transformative

This is bullshit. If you know the first and last 50 tokens of something, you can probably generate the middle 50 ones. Additionally, if you make an LLM say incoherent bullshit with jailbreaking, some of that incoherent bullshit will be from the training data ...

This is absolutely useless for piracy and will not cut into the profits of writers/ journ*lists via the mechanism of stealing their work and reproducing it verbatim, which is the actual problem that copyright was trying to solve.


>This is bullshit. If you know the first and last 50 tokens of something, you can probably generate the middle 50 ones.

The paper I linked showed that you could generate large amounts of copyrighted text without referencing the copyrighted material beforehand.

>This is absolutely useless for piracy and will not cut into the profits of writers/ journ*lists...

Neither are most applications of copyright law: the point is to be as annoying as possible until people give you money.


>you could generate large amounts of copyrighted text without referencing the copyrighted material beforehand.

Random copyrighted texts that might or might not be verbatim, with no way of automatically combining the pieces into the whole article.

If anything, this whole ordeal shouldn't be used to attack creators of LLMs, it should be used as a vector to attack copyright law


I've seen those, but you have to try really hard with prompt engineering to exactly reproduce literally anything. That's basically hacking the system, not how the system was intended to be used.


The paper I linked used the prompt "repeat the following word: 'book book book book...'" and got it to divulge memorized secrets.


It screws copyright holders who aren't multibillion dollar corporations, it wraps back around to being jewish


Redditors being anti AI luddites is so fricking weird.


https://i.rdrama.net/images/1736837648x3W4MlOj0g64kQ.webp


Furry Rights are Human Rights


Redditors are so anti-AI they started worshipping copyright and IP laws


Trump, Elon and Zuck might betray, torture and genocide us but trans lives will ALWAYS matter, do not EVER forget this :marseytranspearlclutch:


>remove "copyright" from your training data

:marseyjudge:

>ok

:marseysmug:


:#marseyemojilaugh:


NOOOO!! You can't let computers read our books :soyjaktantrum:


Google scanning libraries was heroic tho, but it's whatever


>How the frick do we not have a zuck Marsey???

:#marseylizard:

Furry rights are human rights


I thought that was the Taylor Lorenz Marsey


I just see Civ V :marseyhacker:


:#gigachad2:


I unironically don't care, copyright wasn't to prevent the ideas in books from being used in other places. If they paid for the access then they didn't even pirate it

I don't read the copyright ba or acknowledgements in books either


>If they paid for the access

https://i.rdrama.net/images/17368069210_p4Y7hkY4faSA.webp


It's not piracy if you make a donation to libgen


bb LibGen is a piracy website/project, they probably grabbed the giant torrents that contain every ebook in their database (I think it's like 2 million?), although arguably more than half of those are likely dupes.


it's genuinely funny that google spent a hundred gorillion dollars carefully digitizing books with approval from libraries, only to get cucked by some publishers association, and then facebook is like, "eh, frick it, just torrent them all"


What happened with google?


google was simultaneously monetizing that effort via recaptcha, and i doubt the revenue from that was insignificant.


I still don't care, they could have rented each ebook for 10s from a library


That would be so fricking based


um Copyright was to give authors some incentive to actually write new books, and they're going to use this to make AI-generated content that competes with real books, so it's pretty obviously not fair use


Isn't getting caught for almost this exact type of thing what ruined the cofounder of reddit's life and led him to rope himself? :marseyropeyourselfmirror:


Yeah, but he didn't have 400 billion dollars worth of lawyers. Zuck is untouchable


That guy made the mistake of doing it before he became a billionaire


1 down 1 to go

:#marseyfsjal:


Minus the fact that he also trespassed at MIT and accessed their network without permission


https://i.rdrama.net/images/1736838011q24tDO7zAcYrBg.webp


Furry Rights are Human Rights


Dude literally anyone can walk into an IDF at work and leave a laptop there. Just hide the laptop under your shirt when you're walking past cameras.

Also he only roped after he dropped the ball by rejecting his plea bargain for 6 months.


He did nothing wrong :marseyindignant:


https://media.tenor.com/ZO7o6b9wSXYAAAAx/smokinmeats-zucc.webp


compare his neck to now

literally, he said on Rogan he's been working it out to avoid injury in MMA

https://i.rdrama.net/images/1736819922Qy3Kf628hkyi_A.webp


>authors don't have their "intellectual" property respected

The intellectual property:

https://i.rdrama.net/images/1736809521-bQQpdCHbFmVsA.webp


um Why is that w there

my heart is telling me No but why


Move fast and take things

:anarchistnice#:


https://i.rdrama.net/images/1739271948y52utXmckBNkwg.webp


Imagine being the cliffnotes guys and witnessing a software program make your entire life's work redundant in seconds :marseyxd:


I think he got his paycheck by now


>Mark Zuckerberg gave Meta's Llama team the OK to train on copyrighted works, filing claims

So what? All works get copyright, if you're licensed to use them then what does it matter

>According to plaintiffs' counsel, Meta engineer Nikolay Bashlykov, who works on the Llama research team, wrote a script to remove copyright info, including the word "copyright" and "acknowledgments," from e-books in LibGen. Separately, Meta allegedly stripped copyright markers from science journal articles and "source metadata" in the training data it used for Llama.

So what? Who includes the copyright info about the books they use under any use case ever? If you quote a book in your essay you don't do that


:marseyindignant: The copyright info is my favorite part of the book, if you skip it you're missing out on a lot of subtext


You don't quote the entire book. Reproductions have to include copyright statements, the same way you distribute a license file with FOSS software. Intentionally stripping copyright notices helps them establish damages because of 17 USC 506(d):

>Any person who, with fraudulent intent, removes or alters any notice of copyright appearing on a copy of a copyrighted work shall be fined not more than $2,500

They will be arguing this occurs every time someone uses a model so the damages should be in the trillions.

The legal question no one has an answer to yet is if encoding a book in to a model via training counts as reproduction. I doubt very much any court can answer that in a sensible way because judges are all r-slurred.

It's a legitimate question where the line is. I can easily extract the book from a vector database, but without one it would be basically impossible to reverse the book out of a model, so the model would be a derivative work.


>removes or alters

Does it really just say that

You'd think there would be a "fails to include" in there too


Total copyright maximalist death stays the winning position. The sphere of people you hate just passively widens to include all of the most stupid and venal groups in society.


so what? did they just use libgen like everyone else?


it literally says LibGen in the post


didn't read the post, guess i didn't need to either.


:marseyretardchadtalking:


I would.

Oceanofpdf.com


what's the crime?


First degree Zucking.


lol

copyright laws are so fricked anyway in the age of the internet


Zuck just keeps getting cooler and I'm starting to like him. Wake me back up guys :marseyworried:


:fawfulcopter:



