
Comments search is back and faster than ever

tl;dr: Comment search works again, and it is much, much faster. Also, comment search keywords now match whole words only.

This all began forty-eight hours ago. The WPD server was down, I'd already broken out the bourbon (as is customary when your server is down), and someone was trying to search for comments containing the word china. Comment search was always slow because we used to do an exhaustive search through the full text of all 2.2 million comments on the site. But, man, china was even slower than slow: it was grinding the site to a halt for minutes at a time. We don't normally log search queries, but china was so slow it was crashing the server, so it showed up in the crash error message.

It was 2 AM, we were busy dealing with WPD, and I wasn't exactly sober enough to debug, so we disabled comment search until we had time to look at it. The next day, a dozen of you let us know comment search wasn't working. A dozen-minus-one of you didn't scroll through the bugs thread and notice that's what everyone else was reporting and that we'd already explained why.

Anyway, it's re-enabled, and it's a lot faster: china takes half a second, not two minutes. This comes with some minor changes in functionality. First, word substring searches don't work on comments now: carp only finds the exact word carp (or Carp), not carpathianflorist or escarpment (this probably breaks nwordcountbot; sorry @geese_suck). Also, I have no idea what "exact search" syntax with quotation marks does anymore; it probably just guarantees you get zero results. Report weird search results here and we'll iron them out soon. I just wanted to get comment search back online.
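To make the carp vs. escarpment change concrete, here is a minimal Python sketch contrasting an exhaustive substring scan with a whole-word inverted index. It is not the site's code (a later comment in this thread says the real fix leans on Postgres's language-aware text vectorizing); the sample comments and the function names `substring_search`, `tokenize`, `build_index`, and `word_search` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample data: comment id -> comment text
COMMENTS = {
    1: "Carp are freshwater fish.",
    2: "Greetings from carpathianflorist!",
    3: "The escarpment collapsed.",
    4: "I caught a carp today",
}

def substring_search(comments, term):
    """Exhaustive scan: reads every comment on every query and
    matches the term anywhere, even in the middle of a word."""
    term = term.lower()
    return sorted(cid for cid, text in comments.items() if term in text.lower())

def tokenize(text):
    """Split text into lowercase words (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())

def build_index(comments):
    """Inverted index: word -> set of comment ids, built once up front."""
    index = defaultdict(set)
    for cid, text in comments.items():
        for word in tokenize(text):
            index[word].add(cid)
    return index

def word_search(index, term):
    """Whole-word, case-insensitive lookup: one dict access, no full scan."""
    return sorted(index.get(term.lower(), set()))

index = build_index(COMMENTS)
print(substring_search(COMMENTS, "carp"))  # [1, 2, 3, 4]
print(word_search(index, "carp"))          # [1, 4]
```

The trade-off matches the post: the index answers in roughly constant time per query, but only for whole words.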

by the way, we still love you for reporting it as many times as you did

even though it had already been reported thirty times first <3

I reported several times pls give bug report badges

![](/images/16647120777456915.webp)

There are seriously users here who need to search for and reread a comment made by another r-slurred user months ago? Why? Do you really need to see that Cirno post again from 6 months ago?

:marseygigaretard:


![](https://files.catbox.moe/y2zrro.png)

I used marseysearch trying to find a post I made on reddit years ago. Turns out I imagined making that post, but I found that I'd generated so much content that I've already forgotten. I can just peddle it as being new here as long as nobody checks the expiration date. It'll be like cleaning out my refrigerator by unloading everything from 2019 onto a stupid neighbor.

>Turns out I imagined making that post

:marseymeds:

That's just being a lazy dramatard imho. I don't want post-dated, expired drama; the lolcows have long since moved on or died from AIDS by now.


![](https://files.catbox.moe/y2zrro.png)

I used it to find the "escape from Predditor Mansion" art post so I could post it on PCM, and then the "Marsey in her room" and "Marsey the country cat girl" posts when they asked if there were any similar pictures.

I used it to find a post about Japanese tweets I didn't get to finish reading the other day

>Comments search always was slow because we used to do an exhaustive search through the full text of all 2.2 million comments on the site.

I don't know anything about search algorithms but I'm shocked it took this long for this to become a problem.

Apparently some people didn't have a father who would beat them at age 9 if they failed to optimize their database indexes. :marseyshrug:

avoids thinking about a 60 GB unoptimized SQLite db that I have with random json fields strewn about

#justtransgirlthings :marseytrans2:

yeah it's super dumb but i do code things sometimes

I'll let LLM know

He already knows

![](https://media.giphy.com/media/X05U0gOPkQ4G4/giphy.webp)

marseyschizos finds :marseyschizosal: comments

marseyschizosa or marseyschizosal finds nothing

it's not a length thing, marseyace finds nothing

:marseycapysorenjump#2:

:#marseycapymad:

:#marseywise:

Actual footage of @TwoLargeSnakesMating while he was making this post.

search doesn't really work? searching for marseybeanquestion gives no results for example

Should be fixed now :marseythumbsup: There was a minor issue with long words and composite words like marsey names. (cc: @grizzly — same bug you reported)

it looks like it works modulo english affixes, so schizos also yields schizo and vice versa

:marseyneat:

Postgres has this neat way of vectorizing text that's language-aware. On one hand, I'm not sure the lexeme analyzer knows how to really parse some agglutination like marseyschizogetogetolove, but I think it can at least do it consistently. … Also means I don't need to set up a real search service. I worked with Solr once years ago, and I hope it can be many more years before I have to touch it again.
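For what it's worth, Postgres's `english` text search configuration reduces words to lexemes with a Snowball stemmer, which fits the schizos/schizo behavior reported above. The Python sketch below imitates only the general idea with a deliberately crude, hypothetical rule (drop one trailing "s" from longer words); it is not the Snowball algorithm and will get plenty of words wrong. `crude_stem`, `lexemes`, and `matches` are illustrative names.

```python
import re

def crude_stem(word):
    """Hypothetical toy stemmer: lowercase, then drop one trailing 's'
    (but not from short words or words ending in 'ss').
    Real stemmers such as Snowball use far more careful rules."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def lexemes(text):
    """Normalize text to a set of stemmed words, a bit like a tiny tsvector."""
    return {crude_stem(w) for w in re.findall(r"\w+", text)}

def matches(query, text):
    """The query and the document are stemmed the same way before comparing."""
    return crude_stem(query) in lexemes(text)

print(matches("schizos", "such a schizo"))       # True
print(matches("schizo", "the schizos are out"))  # True
```

The key design point is consistency: as long as documents and queries go through the same normalization, a blob like marseyschizogetogetolove is at least mangled the same way on both sides.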

Shit is useless for people who know what they're looking for, I hate it

Neither does searching for "the" (without quotes)

Ah, yeah, that was another casualty of the fix. Insignificant words like "the" and "a" get optimized out now.
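Stop words are dropped from both the indexed text and the query, so a query made only of stop words has nothing left to look up. A minimal Python sketch, assuming a tiny hypothetical stop list (Postgres's `english` configuration ships its own, much longer one) and an index that was built with the same filter, so "the" never appears in it:

```python
import re

# Hypothetical, tiny stop list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}

def query_terms(query):
    """Lowercase the query, split it into words, and drop stop words."""
    return [w for w in re.findall(r"\w+", query.lower()) if w not in STOP_WORDS]

def search(index, query):
    """Return ids of comments containing every remaining query term."""
    terms = query_terms(query)
    if not terms:
        return []  # the query was all stop words: nothing to search for
    result = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(result)

# Hypothetical index (word -> comment ids), built with the same stop list.
index = {"batman": {1, 2}, "robin": {2}}
print(search(index, "the"))         # []
print(search(index, "The Batman"))  # [1, 2]
```

This is also why "Batman" and "The Batman" return the same results.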

But how can I distinguish between "Batman" and "The Batman"????? :soymad:

Ok! :marseyschizosnakeslove:

>word substring searches don't work on comments now: carp only finds the exact word carp (or Carp), not carpathianflorist or escarpment

I think that's acceptable here. I only need this capability when I'm searching for pirated stuff on P2P networks :marseyboomer: where they've got weird naming conventions or when I'm doing some kind of repetitive data entry where I only need to type in 3 characters for the computer to know what I'm saying.

This makes it much harder to dig up months old comments that I vaguely remember, but anything in the name of :marseysonic: :marseyclapping:

Oh boy can’t wait to see if the minetest server is back up :marseyaware:

I swear it will be back this weekend :marseycrying2:

hope so :marseybegging: I need my minecraft fix and I'm not about to join some zoomer server

chinagate

>this probably breaks nwordcountbot; sorry @geese_suck

good thing no one uses it lol

tl;dr search is still broken

You need to build a prefix table for all the words in your comments table so that the search will work correctly as well as fast. I'll tell you how for DC
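The comment doesn't spell out the scheme, so here is one hypothetical reading of a "prefix table" in Python: keep the distinct indexed words sorted, binary-search to the first word with the given prefix, and union the comment ids of every word that shares it. The data and the names `prefix_words` and `prefix_search` are illustrative only. (Postgres's own tsquery syntax also supports prefix matching with `:*`, as in `carp:*`.)

```python
import bisect

def prefix_words(sorted_vocab, prefix):
    """Return every word in sorted_vocab that starts with prefix.
    Binary search finds the first candidate; all matches sit next to it."""
    i = bisect.bisect_left(sorted_vocab, prefix)
    out = []
    while i < len(sorted_vocab) and sorted_vocab[i].startswith(prefix):
        out.append(sorted_vocab[i])
        i += 1
    return out

def prefix_search(index, sorted_vocab, prefix):
    """Union the comment ids of all indexed words sharing the prefix."""
    ids = set()
    for word in prefix_words(sorted_vocab, prefix.lower()):
        ids |= index[word]
    return sorted(ids)

# Hypothetical word -> comment-id index and its sorted vocabulary.
index = {"carp": {1, 4}, "carpathianflorist": {2}, "escarpment": {3}, "cat": {5}}
vocab = sorted(index)
print(prefix_words(vocab, "carp"))          # ['carp', 'carpathianflorist']
print(prefix_search(index, vocab, "carp"))  # [1, 2, 4]
```

A prefix lookup like this brings back carpathianflorist, but it still won't find escarpment from carp; that needs true substring matching.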

can't you just use elasticsearch?

if it doesn't have substrings or any kind of stemming i don't think you get to sound too pleased with yourself

even postgres's builtin full-text indexing gives you stemming and the like. are you using mysql or something lol
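Substring matching is the part a word index can't give you. One common approach, and the idea behind Postgres's `pg_trgm` extension, is to index three-character chunks (trigrams), use them to narrow down candidate comments, and then confirm each candidate with a real substring check. A minimal Python sketch of that idea, with hypothetical names and data, and without `pg_trgm`'s padding and similarity scoring:

```python
from collections import defaultdict

def trigrams(text):
    """All 3-character substrings of the lowercased text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_trigram_index(comments):
    """Map each trigram to the ids of comments containing it."""
    index = defaultdict(set)
    for cid, text in comments.items():
        for tri in trigrams(text):
            index[tri].add(cid)
    return index

def trigram_substring_search(index, comments, term):
    """Narrow candidates via trigrams, then verify with a substring check."""
    grams = trigrams(term)
    if not grams:  # terms shorter than 3 chars: fall back to a full scan
        candidates = set(comments)
    else:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    term = term.lower()
    return sorted(cid for cid in candidates if term in comments[cid].lower())

comments = {1: "I caught a carp", 2: "carpathianflorist", 3: "The escarpment", 4: "a cat"}
idx = build_trigram_index(comments)
print(trigram_substring_search(idx, comments, "carp"))  # [1, 2, 3]
```

The verification step matters: sharing every trigram with the term doesn't guarantee the term appears contiguously, so candidates are always rechecked.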
