LAION (German nonprofit AI model and dataset producer which sued Stable Diffusion) dataset found to contain 3,200 images of child abuse | Nobody learned from the incident where AIDungeon was found to be trained on stories containing the same thing. Just sanitize your dataset instead of directly uploading the result of Common Crawl
What, like game of thrones?
The Quran
aisha being nine is from a hadith that's rated probably authentic. it's still awful though.
I'm not entirely sure how k-nearest neighbors queries work.... What exactly makes another link similar enough?
They used government-funded file servers full of CP!!!
They use k-nearest neighbours on the image embeddings, not the links. Image embeddings already cluster by similarity in the embedding space when they're created, so the nearest neighbours are visually similar. Visual similarity can mean a lot of things though. It's maybe useful for finding other instances of the same image with different compression artifacts, but I don't know if CSAM would be most visually similar to other CSAM rather than to legal pictures with similar color distributions and visual components.
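To make the nearest-neighbour idea concrete, here's a minimal sketch of a k-nearest-neighbours lookup over embedding vectors using cosine similarity. Everything in it is an assumption for illustration: the function names, the toy 3-dimensional vectors, and the choice of cosine similarity are mine, not taken from the report. Real image embeddings have hundreds of dimensions, and large datasets use an approximate index instead of a brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def k_nearest(query, embeddings, k):
    """Return the names of the k embeddings most similar to the query."""
    scored = [
        (cosine_similarity(query, vec), name)
        for name, vec in embeddings.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical toy embeddings (made-up values, for illustration only).
toy_embeddings = {
    "beach_photo": [0.9, 0.1, 0.0],
    "beach_photo_recompressed": [0.88, 0.12, 0.01],
    "sunset_photo": [0.7, 0.3, 0.1],
    "circuit_diagram": [0.0, 0.2, 0.95],
}

# Query with the embedding of a known image; its recompressed copy ranks next.
neighbours = k_nearest([0.9, 0.1, 0.0], toy_embeddings, k=2)
print(neighbours)
```

In this toy setup the recompressed copy lands right next to the original, which is the "same image, different compression artifacts" case. Whether unrelated images of the same category also end up close depends entirely on what the embedding model learned.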
Thank you, my neighbour!
@Geralt_of_Greenland
Ayy lemme just get my 99.99999999% accurate cp remover, that I'm supposed to have trained somehow.
The research by the Stanford Internet Observatory showed it's possible, although inefficient. Besides, scraping the entire internet (which is how the dataset was created) is likely to pull in many more images that you don't want for other reasons, such as watermarked, blurry, or mislabeled images, which could be avoided if any care were taken when creating the dataset. This has happened many times before. One example is Latitude's dataset used to train AI Dungeon (which was only 29 MB): it contained similar material in text form, yet Latitude claimed the dataset was clean and that players were at fault.
Confirmed pedos.
How much? Prohibitively so? So much that the West will lose its lead to China, who won't care about this stuff anyway?
China has their own set of no-no's they have to filter out, and Chinese people are way more clever than westerners at throwing shade on their government.
Even theirs is pretty shitty. Probably took 120 hours to get it right, and I'd be surprised if they shared their source code.
PDF
material depicting adults may commonly have ambiguous indicators of youth (teen, schoolgirl, twink, etc). The text descriptions for the majority of initial PhotoDNA hits used generic captions that could apply to either legal or illegal material; therefore we conclude that at least for English language material, text descriptions are of limited utility for identifying CSAM.
!anime bros, the lolis remain undetected. They're also onto "twinks" but "femboy" remains safe.
Thank goodness, it's only links to child porn...
Officer, officer, I only used the dataset to generate mature catgirls. I didn't know it was trained on loli catgirls!
They found about 200 links to CSAM, out of the gigatons of data it was trained on.
It's amusing how their method with a 99%+ probability still has a false positive rate of 97%.
I wonder if the FBI uses similar cowtools and happens to find "CP" everywhere.
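One way a detector that is right 99% of the time can still produce mostly false alarms is the base-rate effect: when the thing being searched for is extremely rare, even a small error rate on the huge clean majority swamps the true hits. Here's a minimal sketch of that arithmetic. The numbers are hypothetical, picked only to show the effect, and are not taken from the report. I also don't know that this is how the 97% figure above actually arose.

```python
def precision(sensitivity, false_positive_rate, prevalence):
    """Share of flagged items that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs: a detector that catches 99% of real cases,
# wrongly flags 1% of clean items, and a pool where 0.03% of items are real.
p = precision(sensitivity=0.99, false_positive_rate=0.01, prevalence=0.0003)

# Fraction of all flags that turn out to be false alarms.
share_false = 1.0 - p
print(f"{share_false:.1%} of flags are false alarms")
```

With these made-up inputs, roughly 97 out of every 100 flags are false alarms even though the detector itself is "99% accurate", which is why flagged items still need a second verification step.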
Why would they bother? The FBI has literal tons of actual CP. A small USB-drive inserted into a device of an undesirable once they confiscate them is much easier for them.
The FBI doesn't need to find anything, when they can just put a Windows95 laptop full of it in the house of someone who speaks up against a Federal Agent shooting 200 people at a concert
Neighbor there's literally 5.8 BILLION images in that dataset. Less than 0.00001% are cp
The bar is zero.
<200 of 5.8 billion.
But that's after using LAION's previous method for removing CP, I think.
Of course they're German.
That was part of the joke in the title, as well as including the logo in my post (It looks like something furries would make)
A company produces paintings outside, and out of every few million paintings a bird shits on one.
What is more reasonable? Hire a birdshit detector to manually inspect every painting one by one, or quickly kill the pigeons in the park that shit on the paintings?
i remember playing a wizard in ai dungeon and it just making me wake up as a child about to be abused. that shit was fricked.
I'm not sure why you agree to be the face of an article with a title like that lol
Don't forget to turn off signatures in settings!
this should not surprise you. every german is a p-dophile. they have sent orphans to live with p-dophiles. they are subhuman trash. each and every germoid is complicit in state sanctioned pedophillia and should be shot.
|Death to germoids|
It's actually very difficult to sanitise these datasets and this is a complicated subj... just kidding, immediate for all involved
Snapshots:
ghostarchive.org
archive.org
archive.ph (click to archive)
um what did he mean by this??
unplug AI