Performance problem with table containing a lot of text (blog)

From: Kari Lavikka <tuner(at)bdb(dot)fi>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance problem with table containing a lot of text (blog)
Date: 2007-08-28 18:53:37
Message-ID: Pine.HPX.4.62.0708282016250.3324@purple.bdb.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello!

Some background info.. We have a blog table that contains about eight
million blog entries. Average length of an entry is 1200 letters. Because
each 8k page can accommodate only a few entries, every query that involves
several entries causes several random seeks to disk. We are having
problems with queries like:

1) give me a list of months when I have written someting
2) give me id's of entries I have written on month X year X
3) give me the number of blog entries my friends have written since last
time

Clustering would probably decrease random seeks but it is not an option.
It locks the table and operation would take "some" time. It should also be
done periodically to maintain clustering.

I guess that file system cache gets filled with text contents of blog
entries although they are totally useless for queries like these. Contents
of individual blog entries are cached to memcached on application level
anyway. There's rarely any need to fetch them from database.

It would be nice if I could flag a column to be toasted always, regardless
of it's length.

Because there isn't such option maybe I should create a separate table for
blog text content. Does anybody have better ideas for this? :)

Thanks!

P.S. Here's a plan for query #3. Users can have several bookmark groups
they are following. User can limit visibility of an entry to some of
his/her bookmark group. Those are not any kind of bottlenecks anyway...

Sort (cost=34112.60..34117.94 rows=2138 width=14)
Sort Key: count(*), upper((u.nick)::text)
-> HashAggregate (cost=33962.28..33994.35 rows=2138 width=14)
-> Nested Loop (cost=8399.95..33946.24 rows=2138 width=14)
-> Nested Loop (cost=8399.95..9133.16 rows=90 width=22)
-> HashAggregate (cost=8399.95..8402.32 rows=237 width=8)
-> Nested Loop (cost=0.00..8395.99 rows=792 width=8)
-> Index Scan using user_bookmark_uid on user_bookmark ub (cost=0.00..541.39 rows=2368 width=12)
Index Cond: (uid = 256979)
-> Index Scan using user_bookmark_group_pkey on user_bookmark_group bg (cost=0.00..3.30 rows=1 width=4)
Index Cond: ("outer".bookmark_group_id = bg.bookmark_group_id)
Filter: (("type" >= 0) AND ("type" <= 1) AND (trace_blog = 'y'::bpchar))
-> Index Scan using users_uid_accepted_only on users u (cost=0.00..3.06 rows=1 width=14)
Index Cond: (u.uid = "outer".marked_uid)
-> Index Scan using blog_entry_uid_beid on blog_entry be (cost=0.00..275.34 rows=24 width=8)
Index Cond: ((be.uid = "outer".marked_uid) AND (COALESCE("outer".last_seen_blog_entry_id, 0) < be.blog_entry_id))
Filter: ((visibility = 'p'::bpchar) AND ((status = 'p'::bpchar) OR (status = 'l'::bpchar)) AND ((bookmark_group_id IS NULL) OR (subplan)))
SubPlan
-> Index Scan using user_bookmark_pkey on user_bookmark fub (cost=0.00..3.42 rows=1 width=0)
Index Cond: ((bookmark_group_id = $0) AND (marked_uid = 256979))

P.S. That particular user has quite many unread entries though...

|\__/|
( oo ) Kari Lavikka - tuner(at)bdb(dot)fi - (050) 380 3808
__ooO( )Ooo_______ _____ ___ _ _ _ _ _ _ _
""

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Dan Harris 2007-08-28 19:04:31 Re: Performance problem with table containing a lot of text (blog)
Previous Message Tom Lane 2007-08-28 18:48:00 Re: Fwd: 8.2 Query 10 times slower than 8.1 (view-heavy)