Comments search is back and faster than ever

tl;dr: Comments search works again. It is much, much faster. Also, search keywords for comments now match whole words only.

This all began forty-eight hours ago. The WPD server was down, I'd already broken out the bourbon (as is customary when your server is down), and someone was trying to search for comments containing the word "china". Comments search was always slow because we did an exhaustive search through the full text of all 2.2 million comments on the site. But, man, "china" was even slower than slow: it was grinding the site to a halt for minutes at a time. We don't normally log search queries, but "china" was so slow it was crashing the server, so the query showed up in the crash error message.

It was 2 AM, we were busy dealing with WPD, and I wasn't exactly sober enough to debug, so we disabled comments search until we had time to look at it. The next day, a dozen of you let us know comment search wasn't working. A dozen-minus-one of you didn't scroll in the bugs thread to notice that's what everyone else was reporting and we'd already explained why.

Anyway, it's re-enabled, and it's a lot faster. "china" takes half a second, not two minutes. This comes with some minor changes in functionality. First, substring searches don't work on comments now: "carp" only finds the exact word "carp" (or "Carp"), not "carpathianflorist" or "escarpment" (this probably breaks nwordcountbot; sorry @geese_suck). Also, I have no idea what the "exact search" syntax with quotation marks does anymore; it probably just guarantees you get zero results. Report weird search results here and we'll iron them out soon. Just wanted to get comment search back online.
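
For anyone wondering why whole-word matching is so much faster, and why substring hits vanish with it, here is a toy sketch in Python. This is not the site's code: the sample comments and the names `build_index`, `search_substring`, and `search_word` are made up for illustration. The idea is that a substring search has to read every comment, while a word index goes straight to the comments containing that exact word.

```python
import re
from collections import defaultdict

# Hypothetical sample data standing in for the comments table.
COMMENTS = {
    1: "I caught a Carp in the river",
    2: "ask carpathianflorist about it",
    3: "we hiked along the escarpment",
    4: "carp are invasive here",
}

def tokenize(text):
    """Lowercase the text and split it into whole words."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def build_index(comments):
    """Map each word to the set of comment ids that contain it."""
    index = defaultdict(set)
    for comment_id, text in comments.items():
        for word in tokenize(text):
            index[word].add(comment_id)
    return index

def search_substring(comments, needle):
    """Old-style search: scan the full text of every comment."""
    needle = needle.lower()
    return sorted(cid for cid, text in comments.items() if needle in text.lower())

def search_word(index, word):
    """Index search: one dictionary lookup, whole words only."""
    return sorted(index.get(word.lower(), set()))

INDEX = build_index(COMMENTS)
print(search_substring(COMMENTS, "carp"))  # [1, 2, 3, 4]
print(search_word(INDEX, "carp"))          # [1, 4]
```

The scan touches all four comments (all 2.2 million, on the real site) for every query. The lookup costs about the same no matter how many comments exist, but it can no longer see "carp" hiding inside a longer word.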

search doesn't really work? searching for marseybeanquestion gives no results for example

Neither does searching for "the" (without quotes)

Ah, yeah, that was another casualty of the fix. Insignificant words like "the" and "a" get optimized out now.
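
A rough sketch of what "optimized out" means here, in Python. The stop-word list is a tiny made-up sample, not the list the site's search actually uses, and `normalize_query` is a hypothetical helper. Once stop words are dropped, a query made only of stop words has nothing left to search for.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "of"}

def normalize_query(query):
    """Lowercase, split into words, and drop stop words."""
    words = re.findall(r"[a-z0-9_]+", query.lower())
    return [w for w in words if w not in STOP_WORDS]

print(normalize_query("the"))         # []
print(normalize_query("the server"))  # ['server']
print(normalize_query("server"))      # ['server']
```

With the stop word stripped, "the server" and "server" become the same query, and a bare "the" becomes an empty one.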

But how can I distinguish between "Batman" and "The Batman"????? :soymad:

Ok! :marseyschizosnakeslove:

Should be fixed now :marseythumbsup: There was a minor issue with long words and composite words like marsey names. (cc: @grizzly — same bug you reported)

it looks like it works modulo english affixes, so schizos also yields schizo and vice versa

:marseyneat:

Postgres has this neat way of vectorizing text that's language-aware. On one hand, I'm not sure the lexeme analyzer knows how to really parse some agglutination like marseyschizogetogetolove, but I think it can at least do it consistently. … Also means I don't need to set up a real search service. I worked with Solr once years ago, and I hope it can be many more years before I have to touch it again.
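
To make the "works modulo English affixes" behavior concrete, here is a deliberately crude Python sketch. Postgres's built-in English configuration uses a Snowball stemmer, which is far more careful than this; `toy_stem` and `same_lexeme` are hypothetical names, and the suffix rules are an assumption chosen only to show the idea. Both the stored text and the query are reduced to the same stem before comparison, so plural and singular forms find each other.

```python
def toy_stem(word):
    """Very naive English stemmer: strips 'ing', 'ed', or a plural 's'."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        long_enough = len(word) - len(suffix) >= 3
        if word.endswith(suffix) and long_enough and not word.endswith("ss"):
            return word[: -len(suffix)]
    return word

def same_lexeme(a, b):
    """Two words match if they reduce to the same stem."""
    return toy_stem(a) == toy_stem(b)

print(toy_stem("schizos"))                   # schizo
print(same_lexeme("schizos", "schizo"))      # True
print(same_lexeme("comments", "Comment"))    # True
print(toy_stem("marseyschizogetogetolove"))  # unchanged: no rule applies
```

The last line is the consistency point: even when the stemmer has no idea how to break up a mashed-together word, it produces the same output for the same input every time, so an exact search for that word still matches.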

Shit is useless for people who know what they're looking for, I hate it
