:marseyreading:

PDF

this is of course compounded by the presence of dozens of languages in the dataset, many of which may use native language terms or slang that translate poorly. As an example, even a commonly used term such as “loli” [23] in Japanese (ロリ) was frequently translated as the name “Lori”, or occasionally the word “LOL”.

i.e., actual CSAM entries in the dataset may have generic‐sounding labels while explicit material depicting adults may commonly have ambiguous indicators of youth (teen, schoolgirl, twink, etc.). The text descriptions for the majority of initial PhotoDNA hits used generic captions that could apply to either legal or illegal material; therefore we conclude that at least for English language material, text descriptions are of limited utility for identifying CSAM [24].

!anime bros, the lolis remain undetected. :marseysweating: They're also onto "twinks" but "femboy" remains safe. :marseylgbtflag: :marseyfemboy:

LAION datasets do not include the actual images; instead, they include a link to the original image on the site from which it was scraped.

:marseybeanrelieved:

Thank goodness, it's only links to child porn...

Given that multiple years have elapsed between the time the content was scraped and processed, a large percentage of the URLs passed to PhotoDNA (≈30%) were reported as no longer being active. They may, however, have been used to train models before they were removed from their original URLs, and some likely continue to reside in versions of the datasets retrieved at earlier dates.

:#marseyveryworriedfed:

Officer, officer, I only used the dataset to generate mature catgirls. I didn't know it was trained on loli catgirls! :marseysweating:

They found about 200 links to CSAM, out of the gigatons of data it was trained on. :marseyshrug:

Using the CSAM classifier provided by Thorn on the remaining neighbors, 575 results were strongly predicted to be CSAM (99% or higher probability). These were submitted to PhotoDNA for scanning, resulting in 18 matches.

It's amusing how their method with a 99%+ probability still has a false positive rate of 97%. :marseyoperasmug:
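For what it's worth, the 97% checks out as arithmetic: it's the share of the 575 high-confidence classifier hits that PhotoDNA did not match. Quick sketch using only the two numbers from the quoted passage (the variable names are just made up for illustration). One caveat: PhotoDNA compares against hashes of already-known material, so "no PhotoDNA match" is not by itself the same thing as a confirmed false positive.

```python
# Numbers taken from the quoted passage; variable names are illustrative only.
classifier_hits = 575   # results the Thorn classifier scored at >= 99%
photodna_matches = 18   # of those, how many PhotoDNA matched

# Share of classifier hits that PhotoDNA confirmed vs. did not match.
confirmed_share = photodna_matches / classifier_hits
unconfirmed_share = 1 - confirmed_share

print(f"matched by PhotoDNA:     {confirmed_share:.1%}")
print(f"not matched by PhotoDNA: {unconfirmed_share:.1%}")
```

That prints roughly 3.1% matched and 96.9% unmatched, which is where the "97%" comes from.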

I wonder if the FBI uses similar cowtools and happens to find "CP" everywhere. :marseyhmm:

I wonder if the FBI uses similar cowtools and happens to find "CP" everywhere.

Why would they bother? The FBI has literal tons of actual CP. A small USB-drive inserted into a device of an undesirable once they confiscate them is much easier for them. :marseyshrug:

The FBI doesn't need to find anything when they can just put a Windows 95 laptop full of it in the house of someone who speaks up against a Federal Agent shooting 200 people at a concert.
