I'm logging every single Bluesky post.
Their API is designed like dogshit; don't let the people who wrote it tell you otherwise.
It's like 10-15k posts per minute.
My log file is growing by like 100-200MB/hr of just text lol.
I don't get how they think Bluesky won't be used for AI training when there's an unauthenticated stream that lets you log absolutely everything.
I don't know if I'm violating the ToS because I don't care if I am.
Tell me if you want me to grep anything juicy.
!codecels behold this obscure secret hack https://docs.bsky.app/blog/jetstream
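For anyone who wants to try it, here's a rough sketch of what logging that stream could look like. Assumptions, none of them from the OP: the hostname is one of the public Jetstream instances, the event layout (`kind`, `commit.operation`, `commit.record.text`) follows what the Jetstream docs describe, and `extract_post_text` / `log_posts` are made-up names. It also needs the third-party `websockets` package. This is not necessarily how OP is doing it.

```python
import json
from typing import Optional

# Assumption: a public Jetstream instance; the hostname may change over time.
JETSTREAM_URL = (
    "wss://jetstream2.us-east.bsky.network/subscribe"
    "?wantedCollections=app.bsky.feed.post"
)


def extract_post_text(raw_event: str) -> Optional[str]:
    """Return the text of a newly created post, or None for any other event."""
    try:
        event = json.loads(raw_event)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict) or event.get("kind") != "commit":
        return None
    commit = event.get("commit")
    if not isinstance(commit, dict):
        return None
    if commit.get("operation") != "create":
        return None
    if commit.get("collection") != "app.bsky.feed.post":
        return None
    record = commit.get("record")
    if not isinstance(record, dict):
        return None
    text = record.get("text")
    return text if isinstance(text, str) else None


async def log_posts(path: str = "posts.log") -> None:
    """Append each new post's text to a file, one JSON-encoded string per line."""
    import websockets  # third-party: pip install websockets

    async with websockets.connect(JETSTREAM_URL) as ws:
        with open(path, "a", encoding="utf-8") as log:
            async for message in ws:
                text = extract_post_text(message)
                if text is not None:
                    # JSON-encode so newlines inside a post stay on one log line.
                    log.write(json.dumps(text, ensure_ascii=False) + "\n")
```

Run it with `asyncio.run(log_posts())`. No login or token is involved anywhere, which is the whole point of the thread.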
Reached out to the guy behind https://pullpush.io/ - @pullpush-actual
He's up for setting up something similar for bluesky if you can find someone to cover hosting costs
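For a ballpark on what hosting would have to cover, here's a quick projection from the 100-200MB/hr figure in the OP. It assumes the rate stays constant, counts text only (no images or indexes), and uses 1 GB = 1000 MB; `projected_log_size_gb` is just an illustrative helper.

```python
def projected_log_size_gb(mb_per_hour: float, hours: float) -> float:
    """Project log growth in GB (1 GB = 1000 MB) at a constant hourly rate."""
    return mb_per_hour * hours / 1000


# Low and high ends of the growth rate quoted in the original post.
for rate in (100, 200):
    per_day = projected_log_size_gb(rate, 24)
    per_month = projected_log_size_gb(rate, 24 * 30)
    print(f"{rate} MB/hr -> {per_day:.1f} GB/day, {per_month:.0f} GB per 30 days")
```

So roughly 2.4 to 4.8 GB a day, or 72 to 144 GB per 30 days, before compression.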
Google didn't show me this
Twitter had a firehose API that only fancy people could use at high cost.
There's no way they maintain this long term.
It's ripe for extreme abuse. And the userbase is going to seethe hard when AI scrapers are using it lmao.
Every piece of text and image getting scraped and turned into an LLM in real time.
Additionally, the data coming in is pre-jannied, so even if posts get censored afterwards, somebody is going to do a report on what users are actually trying to post, like fedposts etc.
Oh no their posts that they're posting online on a public website are going to be used for AI