>Just sanitize your 6B pictures dataset

Ayy lemme just get my 99.99999999% accurate cp remover, that I'm supposed to have trained somehow.

The research by the Stanford Internet Observatory showed it's possible, although inefficient. Besides, scraping the entire internet (which is how the dataset was created) is likely to pull in many more images that you don't want for other reasons, such as watermarked, blurry, or mislabeled ones, which could be avoided if any care were taken when creating the dataset. This has happened many times before. One example is Latitude, whose database used to train AI Dungeon (which was only 29 MB) contained similar material in text form, yet the company claimed its dataset was clean and that players were at fault.

>Stanford Internet Observatory showed it's possible

Confirmed pedos.

>although inefficient

How much? Prohibitively so? So much that the West will lose its lead to China, which won't care about this stuff anyway?

China has its own set of no-nos it has to filter out, and Chinese people are way more clever than Westerners at throwing shade on their government.

Even theirs is pretty shitty. Probably took 120 hours to get it right, and I'd be surprised if they shared their source code.
