Hugging Face codecel makes a Bluesky post dataset for ML training and posts it on Bluesky, causes an absolute seethefest from the AIphobic, and is bullied by :marseytrain2:s into taking it down and apologizing :marseyxd:

https://bsky.app/profile/danielvanstrien.bsky.social/post/3lbvih4luvk23

I've removed the Bluesky data from the repo. While I wanted to support tool development for the platform, I recognize this approach violated principles of transparency and consent in data collection. I apologize for this mistake.

Daniel van Strien (@danielvanstrien.bsky.social) 2024-11-27T02:19:57.958Z

Do these people realize there is a public firehose API where you can collect every post? I collected about 20M myself before I got bored and stopped.
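
For anyone wondering what collecting from the firehose roughly involves, here is a minimal sketch of the filtering step only. It assumes events arrive as JSON text shaped like the ones Bluesky's Jetstream service emits (a `kind` of `commit`, with `commit.operation`, `commit.collection`, and `commit.record`); that shape is my assumption, so check it against the current docs. The `extract_post` and `collect_posts` names are made up for illustration, and the network connection itself is left out.

```python
import json
from typing import Iterable, List, Optional

# Collection name used for ordinary posts in the AT Protocol lexicon.
POST_COLLECTION = "app.bsky.feed.post"


def extract_post(raw_event: str) -> Optional[dict]:
    """Return a small dict for a newly created post, or None for anything else.

    Assumes a Jetstream-style JSON event; this is an illustrative guess at the
    shape, not an official schema.
    """
    try:
        event = json.loads(raw_event)
    except json.JSONDecodeError:
        return None  # skip malformed messages instead of crashing the collector
    if not isinstance(event, dict) or event.get("kind") != "commit":
        return None
    commit = event.get("commit")
    if not isinstance(commit, dict):
        return None
    if commit.get("operation") != "create":
        return None  # ignore updates and deletes
    if commit.get("collection") != POST_COLLECTION:
        return None  # ignore likes, follows, reposts, and so on
    record = commit.get("record")
    if not isinstance(record, dict):
        return None
    return {
        "did": event.get("did"),
        "text": record.get("text", ""),
        "created_at": record.get("createdAt"),
    }


def collect_posts(raw_events: Iterable[str]) -> List[dict]:
    """Keep only the new-post events from a stream of raw JSON messages."""
    posts = []
    for raw in raw_events:
        post = extract_post(raw)
        if post is not None:
            posts.append(post)
    return posts
```

In a real collector, `raw_events` would be the messages read off a WebSocket subscription; here it can be any iterable of strings, which keeps the sketch easy to try offline.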

Why would Hugging Face care about the transparency and consent of Bluesky :!marseytrain:s when they have been doing this everywhere else?
