Context
For my recent project to add hashtags to automeme, I created a database that contains the strength of the relationship between every pair of tokens that appeared in rdrama comments since August of last year. (Two tokens are said to be "related" if they appear in the same comment.)
This allows me to find keywords that might be related to other keywords. For instance, "train" relates to these words (excluding "uninteresting" words, ordered from most to least powerful):
- trains
- women
- trans
- drama
- ai
- kids
- posts
- online
- men
- dude
- retarded
- thread
- rdrama
- ones
- weird
- crazy
- funny
- gay
- chud
"rdrama" relates to:
- drama
- rightoids
- thread
- marsey
- website
- banned
- comments
- internet
- funny
- jannies
- carp
- fun
- rightoid
- sub
- bait
- online
- retarded
- chud
- net
- dot
- content
Okay, so it isn't perfect. But it is cheap! Running this system takes only a few seconds (and I do a lot of additional processing, including some graph searches for adjacent topics).
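For anyone curious how a lookup like this can work, here is a minimal sketch. It is not the actual automeme code. It assumes that "strength" is simply the number of comments in which two tokens both appear, and the stop list `UNINTERESTING`, the tokenizer regex, and the function names are all made up for illustration.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical stop list; the real list of "uninteresting" words is not shown in the post.
UNINTERESTING = {"the", "a", "is", "are", "and", "to", "of", "on", "in", "i"}

def tokenize(comment):
    """Lowercase a comment and return its set of interesting tokens."""
    words = re.findall(r"[a-z0-9']+", comment.lower())
    return {w for w in words if w not in UNINTERESTING}

def build_counts(comments):
    """Count how many comments each token and each unordered token pair appears in."""
    token_counts = Counter()
    pair_counts = Counter()
    for comment in comments:
        # Sorting gives every pair one canonical order, e.g. ("late", "train").
        tokens = sorted(tokenize(comment))
        token_counts.update(tokens)
        pair_counts.update(combinations(tokens, 2))
    return token_counts, pair_counts

def related(word, pair_counts, limit=20):
    """Return the tokens that co-occur with `word`, strongest first."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == word:
            scores[b] += n
        elif b == word:
            scores[a] += n
    return [w for w, _ in scores.most_common(limit)]

# Toy comments, invented for the example.
comments = [
    "the train is late",
    "trains are late and the train is crowded",
    "i like trains",
]
token_counts, pair_counts = build_counts(comments)
print(related("train", pair_counts))
```

Scanning every pair in `related` is fine for a toy example. A real database would more likely index the pairs by token so that a lookup does not touch unrelated rows.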
Most Powerful
The most used (INTERESTING) words are:
women - 7291
based - 5466
retarded - 5045
reddit - 5030
true - 4211
rdrama - 4158
funny - 4097
gay - 3836
twitter - 3775
men - 3762
god - 3746
fat - 3672
kids - 3621
fun - 3500
drama - 3368
trans - 3333
dude - 3256
internet - 3227
chud - 3214
foid - 3187
tbh - 3030
marsey - 2958
foids - 2898
unironically - 2806
sounds - 2799
retard - 2720
ones - 2698
cool - 2675
thread - 2673
ass - 2648
days - 2603
tho - 2602
times - 2599
weird - 2588
posts - 2528
stupid - 2385
comments - 2279
carp - 2263
Yes, dramatards are obsessed with women. Straggot alert!
What are the most powerful relationships? Well...
(women, men) 1208
(women, trans) 449
(ukraine, russia) 419
(jesus, christ) 416
(mental, illness) 343
(twink, cute) 281
(harry, potter) 263
(rightoids, leftoids) 253
(russia, russian) 249
(twitter, elon) 243
(ukraine, russian) 238
(men, gay) 236
(kids, parents) 229
(male, female) 229
(trans, rights) 215
(women, male) 214
(foids, foid) 208
(trans, lives) 205
(twitter, reddit) 200
(times, multiple) 195
(women, foids) 195
(foids, moids) 195
(women, fat) 194
(women, gay) 191
(cope, seethe) 191
(sub, bait) 189
(sub, reddit) 188
(reddit, banned) 187
(musk, elon) 184
(chud, award) 183
(rdrama, reddit) 183
(men, foids) 176
(women, female) 174
(reddit, comments) 170
(women, children) 168
(russian, ukrainian) 167
(kids, children) 166
(users, rdrama) 155
(posts, comments) 155
(reddit, subs) 153
(kiwi, farms) 153
(posts, reddit) 153
(tate, andrew) 150
(cute, twinks) 144
(women, foid) 143
(men, trans) 142
(women, attractive) 142
(online, internet) 141
(women, funny) 139
(filter, slur) 137
(youtube, videos) 137
(men, male) 137
(china, russia) 136
(thread, reddit) 135
(women, true) 135
(ai, artists) 133
(alex, jones) 131
(reddit, jannies) 130
(ukraine, ukrainian) 129
(rdrama, user) 129
(god, bless) 128
(chad, virgin) 128
(foid, moid) 128
(kids, women) 128
(women, rape) 128
(democratic, collapse) 127
(heckin, valid) 126
(retarded, women) 126
(women, ones) 125
(ugly, fat) 125
Pretty Graphs
The image above is a graph that links the most common words together. The darker the line connecting them, the more powerful the connection.
Here's a slightly more frantic one, with a lot more nodes...
Here's one with around a thousand nodes
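The post does not say which tool drew these graphs. As a rough sketch of the "darker line means stronger connection" idea, the snippet below turns pair counts into Graphviz DOT text, with edge color scaled from black (strongest) to light gray (weakest). The function name `to_dot` and the 0 to 200 gray range are assumptions. The three counts are taken from the list of most powerful relationships.

```python
def to_dot(pair_counts):
    """Render weighted pairs as an undirected Graphviz graph in DOT syntax."""
    strongest = max(pair_counts.values())
    lines = ["graph words {"]
    for (a, b), n in sorted(pair_counts.items(), key=lambda kv: -kv[1]):
        # Map the count to a gray level: 0 (black) for the strongest pair,
        # approaching 200 (light gray) as the count approaches zero.
        level = round(200 * (1 - n / strongest))
        color = f"#{level:02x}{level:02x}{level:02x}"
        lines.append(f'  "{a}" -- "{b}" [color="{color}"];')
    lines.append("}")
    return "\n".join(lines)

pairs = {
    ("women", "men"): 1208,
    ("ukraine", "russia"): 419,
    ("jesus", "christ"): 416,
}
dot = to_dot(pairs)
print(dot)
```

Saving the output to a file such as `words.dot` and running `dot -Tpng words.dot -o words.png` should produce an image, assuming Graphviz is installed.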
Future
The algorithm as I implemented it does not understand n-grams, and I have been brainstorming ways to add n-gram support. Also, a lot of the tokens are the same word in different tenses, so perhaps I could consolidate those tokens.
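Both ideas could start out very simply. The sketch below is only one possible direction, not a plan from the post. It counts adjacent word pairs (bigrams) as candidate phrases, and it folds a token ending in "s" into its singular only when the singular is already a known token. The names `bigrams` and `fold_plural` are hypothetical, and real tense handling would need a proper stemmer or lemmatizer.

```python
from collections import Counter
import re

def bigrams(comment):
    """Return the adjacent word pairs of a comment, in order."""
    words = re.findall(r"[a-z0-9']+", comment.lower())
    return list(zip(words, words[1:]))

def fold_plural(token, vocabulary):
    """Map a token ending in 's' to its singular, but only if the singular is a known token."""
    if token.endswith("s") and token[:-1] in vocabulary:
        return token[:-1]
    return token

# Toy comments, invented for the example.
comments = ["Harry Potter is fine", "I read harry potter", "foids and a foid"]
counts = Counter(bg for c in comments for bg in bigrams(c))
print(counts.most_common(1))

vocabulary = {"foid", "foids", "train", "trains", "ones"}
print([fold_plural(t, vocabulary) for t in ["foids", "trains", "ones"]])
```

Checking the vocabulary before stripping the "s" keeps words like "ones" from being mangled into a different token.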
Wow enlightening
omg
Awww it's Aevann