emoji-award-marseywholesome
Unable to load image
Reported by:

need help with "simps" and "simps for"

so yesterday I restored "simps" and "simps for" but made them only count post votes

the reason is because the queries that would include comment votes take way too long when they run on powerusers which causes timeouts that slow down the whole site

here's 2 examples that use my user id


QUERY 1 (for my simps): 25 seconds

explain analyze SELECT commentvotes.user_id AS commentvotes_user_id, count(commentvotes.user_id) AS count_1
        FROM commentvotes JOIN comments ON comments.id = commentvotes.comment_id
        WHERE comments.ghost = false AND comments.is_banned = false AND comments.deleted_utc = 0 AND commentvotes.vote_type = 1 AND comments.author_id = 1 GROUP BY commentvotes.user_id ORDER BY count(commentvotes.user_id) DESC;
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=101326.43..101331.71 rows=2111 width=12) (actual time=25292.671..25371.754 rows=1844 loops=1)
   Sort Key: (count(commentvotes.user_id)) DESC
   Sort Method: quicksort  Memory: 135kB
   ->  Finalize GroupAggregate  (cost=100675.05..101209.87 rows=2111 width=12) (actual time=25266.703..25370.338 rows=1844 loops=1)
         Group Key: commentvotes.user_id
         ->  Gather Merge  (cost=100675.05..101167.65 rows=4222 width=12) (actual time=25266.210..25368.234 rows=4089 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Sort  (cost=99675.02..99680.30 rows=2111 width=12) (actual time=25174.301..25174.732 rows=1363 loops=3)
                     Sort Key: commentvotes.user_id
                     Sort Method: quicksort  Memory: 113kB
                     Worker 0:  Sort Method: quicksort  Memory: 113kB
                     Worker 1:  Sort Method: quicksort  Memory: 111kB
                     ->  Partial HashAggregate  (cost=99537.35..99558.46 rows=2111 width=12) (actual time=25172.482..25173.130 rows=1363 loops=3)
                           Group Key: commentvotes.user_id
                           Batches: 1  Memory Usage: 241kB
                           Worker 0:  Batches: 1  Memory Usage: 241kB
                           Worker 1:  Batches: 1  Memory Usage: 241kB
                           ->  Nested Loop  (cost=119.48..99291.83 rows=49104 width=4) (actual time=70.552..25064.475 rows=25093 loops=3)
                                 ->  Parallel Bitmap Heap Scan on comments  (cost=119.04..12311.25 rows=4943 width=4) (actual time=57.412..246.591 rows=4380 loops=3)
                                       Recheck Cond: (author_id = 1)
                                       Filter: ((NOT ghost) AND (NOT is_banned) AND (deleted_utc = 0))
                                       Rows Removed by Filter: 263
                                       Heap Blocks: exact=3882
                                       ->  Bitmap Index Scan on comments_user_index  (cost=0.00..116.08 rows=12353 width=0) (actual time=36.276..36.277 rows=14518 loops=1)
                                             Index Cond: (author_id = 1)
                                 ->  Index Scan using commentvotes_pkey on commentvotes  (cost=0.44..17.28 rows=32 width=8) (actual time=1.447..5.654 rows=6 loops=13141)
                                       Index Cond: (comment_id = comments.id)
                                       Filter: (vote_type = 1)
                                       Rows Removed by Filter: 0
 Planning Time: 0.764 ms
 JIT:
   Functions: 60
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 19.575 ms, Inlining 0.000 ms, Optimization 6.942 ms, Emission 122.562 ms, Total 149.079 ms
 Execution Time: 25453.921 ms
(36 rows)






QUERY 2 (for the ppl i simp for): 13 seconds

explain analyze SELECT comments.author_id AS comments_author_id, count(comments.author_id) AS count_1 
        FROM comments JOIN commentvotes ON comments.id = commentvotes.comment_id 
        WHERE comments.ghost = false AND comments.is_banned = false AND comments.deleted_utc = 0 AND commentvotes.vote_type = 1 AND commentvotes.user_id = 1 GROUP BY comments.author_id ORDER BY count(comments.author_id) DESC;

                                                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=234059.08..234064.81 rows=2293 width=12) (actual time=13511.566..13632.448 rows=3074 loops=1)
   Sort Key: (count(comments.author_id)) DESC
   Sort Method: quicksort  Memory: 241kB
   ->  Finalize GroupAggregate  (cost=233350.16..233931.10 rows=2293 width=12) (actual time=13502.703..13630.725 rows=3074 loops=1)
         Group Key: comments.author_id
         ->  Gather Merge  (cost=233350.16..233885.24 rows=4586 width=12) (actual time=13502.611..13628.135 rows=6597 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Sort  (cost=232350.14..232355.87 rows=2293 width=12) (actual time=13404.805..13405.287 rows=2199 loops=3)
                     Sort Key: comments.author_id
                     Sort Method: quicksort  Memory: 203kB
                     Worker 0:  Sort Method: quicksort  Memory: 199kB
                     Worker 1:  Sort Method: quicksort  Memory: 197kB
                     ->  Partial HashAggregate  (cost=232199.23..232222.16 rows=2293 width=12) (actual time=13398.724..13399.556 rows=2199 loops=3)
                           Group Key: comments.author_id
                           Batches: 1  Memory Usage: 369kB
                           Worker 0:  Batches: 1  Memory Usage: 369kB
                           Worker 1:  Batches: 1  Memory Usage: 369kB
                           ->  Nested Loop  (cost=0.87..231821.34 rows=75577 width=4) (actual time=81.541..13275.534 rows=52615 loops=3)
                                 ->  Parallel Index Scan using cvote_user_index on commentvotes  (cost=0.44..129890.24 rows=78687 width=4) (actual time=64.216..9154.821 rows=55000 loops=3)
                                       Index Cond: (user_id = 1)
                                       Filter: (vote_type = 1)
                                       Rows Removed by Filter: 90
                                 ->  Index Scan using comments_pkey on comments  (cost=0.43..1.30 rows=1 width=8) (actual time=0.072..0.072 rows=1 loops=165000)
                                       Index Cond: (id = commentvotes.comment_id)
                                       Filter: ((NOT ghost) AND (NOT is_banned) AND (deleted_utc = 0))
                                       Rows Removed by Filter: 0
 Planning Time: 0.721 ms
 JIT:
   Functions: 60
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 8.837 ms, Inlining 0.000 ms, Optimization 11.450 ms, Emission 167.933 ms, Total 188.220 ms
 Execution Time: 13635.714 ms
(33 rows)

so i need help, how do I make them run faster or what indexes do I make, i didnt actually read the explain analyze outputs I posted here cuz im too lazy and need to take a shower rn so the answer might be pretty obvious (correct answer gets 50k marseybuckerinos and a code contributer badge)

115
Jump in the discussion.

No email address required.

Really should not take that long for the amount of data shown in the query plan. A quick look at the schema suggests you need more indexes, composite indexes, to cover this query. It's relying on the comments.author_id index, and looping for the rest.

Fact table would work as people suggest, but a lot of overhead to achieve something that could be fixed with better DB architecture.

Jump in the discussion.

No email address required.

what indexes would u make

Jump in the discussion.

No email address required.

I'll take a stab at it later today. Been a few years since I've used Postgres.

Jump in the discussion.

No email address required.

My test data is not very valid, hard to tell without actual data, but inserting hundreds of thousands of comments and about 70 votes each from random users.

-- See if this new index alone fixes issue


CREATE INDEX commentvotes_commentid_userid_votetype_idx ON public.commentvotes USING btree(comment_id ASC, user_id ASC, vote_type ASC)


-- Alternative way to write query, but didn't see a difference in query plan with my sample data


SELECT
    cv.user_id
    ,count(cv.comment_id) votes
FROM public.commentvotes cv
WHERE 
    cv.vote_type = 1
AND cv.comment_id IN (
    SELECT id 
    FROM public.comments
    WHERE author_id = 1
        AND deleted_utc = 0
        AND is_banned = false
        AND ghost = false
)
GROUP BY cv.user_id
ORDER BY count(cv.comment_id) DESC
Jump in the discussion.

No email address required.

CREATE INDEX commentvotes_commentid_userid_votetype_idx ON public.commentvotes USING btree(comment_id ASC, user_id ASC, vote_type ASC)

i really idk what i'm talking about,

but this doesn't create a btree entry for every permutation of (comment_id, user_id, and vote_type)?

Jump in the discussion.

No email address required.

This is the right answer. If any query is too slow the answer is always to add indexes until you don't see any more table scans

Jump in the discussion.

No email address required.

Link copied to clipboard
Action successful!
Error, please refresh the page and try again.