I did it. I fixed @automeme's hashtag algorithm.

It's late as frick and I just deployed it, so I don't want to fully explain how it works, but the gist is this: I have been secretly collecting ~500,000 rdrama comments, and I wrote a script that figures out which terms correlate with which other terms based on those comments. I also look at Wikipedia's top trending articles to see what people are talking about.
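
I'm not writing the whole thing up tonight, but the general shape is something like this (a simplified sketch, not the actual bot code; the names are made up):

```python
from collections import Counter, defaultdict
import re

# crude tokenizer: lowercase word-ish tokens only (the real bot's tokenization may differ)
def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

# build a co-occurrence table from the scraped comments:
# for every pair of terms that show up in the same comment, bump a counter
def build_cooccurrence(comments):
    cooc = defaultdict(Counter)
    for comment in comments:
        terms = set(tokenize(comment))
        for a in terms:
            for b in terms:
                if a != b:
                    cooc[a][b] += 1
    return cooc

# score candidate hashtag terms for a new post: terms that co-occur with the
# post's own words score high, with a boost if they're also trending on wikipedia
def suggest_hashtag(post_text, cooc, trending_terms, top_n=3):
    scores = Counter()
    for term in set(tokenize(post_text)):
        for other, count in cooc.get(term, {}).items():
            boost = 2.0 if other in trending_terms else 1.0
            scores[other] += count * boost
    top = [w for w, _ in scores.most_common(top_n)]
    return "#" + "".join(w.capitalize() for w in top)
```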

Hilariously, because of rdrama's fixation on trans people, and the fact that a prominent trans activist recently died, almost every post I tried ended up having "#FrickingTransWomen" appended to the end of it. That was funny enough that I knew I had to get the bot running before it went away.

Heymoon it won't fix anything, your bot sux, keep yourself safe

i dont care

A regex does not constitute an "algorithm".

Please stop smartwashing yourself.

Lol it's quite a bit more involved than a regex

![](https://media.giphy.com/media/1tHzw9PZCB3gY/giphy.webp)

Do an rdrama wordcloud.

I did a /r/politics and /r/conservative wordcloud from comments and titles the other day in jupyter for something I'm lazily fiddling with. It's quite effective.

I wanna make it clickable, but I'm using the wrong wordcloud lib for that and need to investigate another one.

You need to slice off the most common words to be left with relevant topics, while ideally leaving in the nouns. Unfortunately I haven't got a free dataset of common words that includes word-type columns (you have to pay for that), so I'm just slicing off the 20k most common words.

https://www.kaggle.com/datasets/rtatman/english-word-frequency
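
Roughly what the slicing looks like, assuming the kaggle csv's `word`/`count` columns and the standard `wordcloud` package (a sketch, not my exact notebook cells):

```python
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# frequency list from the kaggle dataset linked above (assumed columns: word, count)
freq = pd.read_csv("unigram_freq.csv")
common = set(freq.nlargest(20_000, "count")["word"])

# the scraped comment/title strings go here (collection code not shown)
texts = ["replace these with the actual /r/politics and /r/conservative strings"]

# drop anything in the 20k most common words, count the rest
counts = {}
for text in texts:
    for w in text.lower().split():
        if w.isalpha() and w not in common:
            counts[w] = counts.get(w, 0) + 1

wc = WordCloud(width=1200, height=800, background_color="white")
wc.generate_from_frequencies(counts)

plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```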

![](/images/16739756640568073.webp) ![](/images/16739756837808378.webp)

A quirk of my implementation is that I never actually counted the tokens, I just determined their relationships to each other, so it's kind of a "word graph": "jesus" and "christ" are connected, for instance.

If there is interest I will see about making a visualization of it (or maybe just a small part of it, because it will def kill my computer if I try to render the entire graph lol).
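
If I do, it'll probably be something like pulling one term's strongest neighbours into networkx rather than the whole thing (a hypothetical sketch, with `cooc` being the co-occurrence table from the post):

```python
import networkx as nx
import matplotlib.pyplot as plt

# cooc[term][other] = how often the two terms appeared together
def plot_neighborhood(cooc, seed_term, max_edges=25):
    G = nx.Graph()
    # keep only the seed term's strongest connections; the full graph is way too big
    strongest = sorted(cooc[seed_term].items(), key=lambda kv: kv[1], reverse=True)
    for other, weight in strongest[:max_edges]:
        G.add_edge(seed_term, other, weight=weight)
    pos = nx.spring_layout(G, seed=42)
    nx.draw_networkx(G, pos, node_color="lightblue", font_size=8)
    plt.axis("off")
    plt.show()

# e.g. plot_neighborhood(cooc, "jesus") should put "christ" right next to it
```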

I don't know what this bootleg wordcloud lib is doing behind the scenes but I know that simple statistical analysis is going to produce garbage results compared to an actual AI model of some type, even if it is a simple one.

Some sort of backwards-looking analysis would help, i.e. you build a 30-day average corpus of the subject bodies/titles and then compare only the last day against it, to separate that day's popular topics from the general background vocabulary. Exactly how best to do that is a bit beyond me.
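
A crude version would just be a frequency ratio against the trailing corpus, something like this (a made-up helper, not from any library):

```python
from collections import Counter

def trending_terms(today_tokens, background_tokens, min_count=5, top_n=20):
    """Score each word by how much more often it appears today than in the
    trailing 30-day background corpus (plain frequency ratio with +1 smoothing)."""
    today = Counter(today_tokens)
    background = Counter(background_tokens)
    today_total = sum(today.values()) or 1
    bg_total = sum(background.values()) or 1

    scores = {}
    for word, count in today.items():
        if count < min_count:
            continue  # skip words too rare today to mean anything
        today_rate = count / today_total
        bg_rate = (background[word] + 1) / (bg_total + 1)
        scores[word] = today_rate / bg_rate

    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```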

Another one I can think of is to simply drop all common words and all words shorter than 4 letters and then analyse word pairs, ideally with the help of some sort of 'synonym condensing' AI that works like a reverse thesaurus lookup: it scans the corpus and converts words to their root sentiment word, e.g. useless, hopeless, incompetent, morons, idiots, stupid, and fools all get rendered down to one token for the analysis.
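
For the pair idea, even a hand-rolled synonym map gets you part of the way before any AI is involved (a sketch; the map itself is obviously made up):

```python
from collections import Counter

# stand-in for the 'reverse thesaurus' model: collapse variants to one root token
SYNONYMS = {
    "useless": "stupid", "hopeless": "stupid", "incompetent": "stupid",
    "morons": "stupid", "idiots": "stupid", "fools": "stupid",
}

def top_word_pairs(texts, common_words, top_n=20):
    pairs = Counter()
    for text in texts:
        words = [SYNONYMS.get(w, w) for w in text.lower().split()
                 if w.isalpha() and len(w) >= 4 and w not in common_words]
        # count adjacent pairs after dropping common/short words and condensing synonyms
        pairs.update(zip(words, words[1:]))
    return pairs.most_common(top_n)
```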

But where to start, lol
