Re: Problems with FTS

Lists: pgsql-performance
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <rauan(at)maemirov(dot)com>,<pgsql-performance(at)postgresql(dot)org>
Subject: Re: Problems with FTS
Date: 2010-12-18 17:56:57
Message-ID: 4D0CA1890200002500038774@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Rauan Maemirov wrote:

> EXPLAIN SELECT [...]

Please show us the results of EXPLAIN ANALYZE SELECT ...

Also, please show us the table layout (including indexes), and
details about your hardware and PostgreSQL configuration. See this
page for details:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

> As you can see the query doesn't use index.

That means that either the optimizer thinks that the index isn't
usable for this query (due to type mismatch or some such) or that it
thinks a plan without the index costs less to run (i.e., it will
generally run faster). You haven't told us enough to know whether
that is actually true, much less how to allow PostgreSQL to develop
more accurate costing estimates in your environment if it's currently
wrong about this.

-Kevin


From: Rauan Maemirov <rauan(at)maemirov(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Problems with FTS
Date: 2011-01-11 08:16:10
Message-ID: AANLkTim4vLhMB5Ty6hTN4XdNoJP0BWYxguqEXt0+r920@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi, Kevin.

Sorry for long delay.

EXPLAIN ANALYZE SELECT "v"."id", "v"."title" FROM "video" AS "v"
WHERE (v.active) AND (v.fts @@
'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery and
v.id <> 500563 )
ORDER BY COALESCE(ts_rank_cd( '{0.1, 0.2, 0.7, 1.0}', v.fts,
'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery),
1) DESC, v.views DESC
LIMIT 6

"Limit (cost=103975.50..103975.52 rows=6 width=280) (actual
time=2893.193..2893.199 rows=6 loops=1)"
" -> Sort (cost=103975.50..104206.07 rows=92228 width=280) (actual
time=2893.189..2893.193 rows=6 loops=1)"
" Sort Key: (COALESCE(ts_rank_cd('{0.1,0.2,0.7,1}'::real[], fts, '( (
( ( ( ''dexter'':A | ''season'':A ) | ''seri'':A ) | ''декстер'':A ) |
''качество'':A ) | ''сезон'':A ) | ''серия'':A'::tsquery), 1::real)), views"
" Sort Method: top-N heapsort Memory: 25kB"
" -> Seq Scan on video v (cost=0.00..102322.34 rows=92228
width=280) (actual time=0.100..2846.639 rows=54509 loops=1)"
" Filter: (active AND (fts @@ '( ( ( ( ( ''dexter'':A |
''season'':A ) | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) |
''сезон'':A ) | ''серия'':A'::tsquery) AND (id <> 500563))"
"Total runtime: 2893.264 ms"

Table scheme:

CREATE TABLE video
(
id bigserial NOT NULL,
hash character varying(12),
account_id bigint NOT NULL,
category_id smallint NOT NULL,
converted boolean NOT NULL DEFAULT false,
active boolean NOT NULL DEFAULT true,
title character varying(255),
description text,
tags character varying(1000),
authorized boolean NOT NULL DEFAULT false,
adult boolean NOT NULL DEFAULT false,
views bigint DEFAULT 0,
rating real NOT NULL DEFAULT 0,
screen smallint DEFAULT 2,
duration smallint,
"type" smallint DEFAULT 0,
mp4 smallint NOT NULL DEFAULT 0,
size bigint,
size_high bigint DEFAULT 0,
source character varying(255),
storage_id smallint NOT NULL DEFAULT 1,
rule_watching smallint,
rule_commenting smallint,
count_comments integer NOT NULL DEFAULT 0,
count_likes integer NOT NULL DEFAULT 0,
count_faves integer NOT NULL DEFAULT 0,
fts tsvector,
modified timestamp without time zone NOT NULL DEFAULT now(),
created timestamp without time zone DEFAULT now(),
CONSTRAINT video_pkey PRIMARY KEY (id),
CONSTRAINT video_hash_key UNIQUE (hash)
)
WITH (
OIDS=FALSE
);

Indexes:

CREATE INDEX idx_video_account_id ON video USING btree (account_id);
CREATE INDEX idx_video_created ON video USING btree (created);
CREATE INDEX idx_video_fts ON video USING gin (fts);
CREATE INDEX idx_video_hash ON video USING hash (hash);

(here I tried both gist and gin indexes)

I have 32Gb ram and 2 core quad E5520, 2.27GHz (8Mb cache).

Pgsql conf:
max_connections = 200
shared_buffers = 7680MB
work_mem = 128MB
maintenance_work_mem = 1GB
effective_cache_size = 22GB
default_statistics_target = 100

Anything else?

2010/12/18 Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>

> Rauan Maemirov wrote:
>
> > EXPLAIN SELECT [...]
>
> Please show us the results of EXPLAIN ANALYZE SELECT ...
>
> Also, please show us the table layout (including indexes), and
> details about your hardware and PostgreSQL configuration. See this
> page for details:
>
> http://wiki.postgresql.org/wiki/SlowQueryQuestions
>
> > As you can see the query doesn't use index.
>
> That means that either the optimizer thinks that the index isn't
> usable for this query (due to type mismatch or some such) or that it
> thinks a plan without the index costs less to run (i.e., it will
> generally run faster). You haven't told us enough to know whether
> that is actually true, much less how to allow PostgreSQL to develop
> more accurate costing estimates in your environment if it's currently
> wrong about this.
>
> -Kevin
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Rauan Maemirov <rauan(at)maemirov(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Problems with FTS
Date: 2011-01-14 19:03:47
Message-ID: AANLkTinAKawNNfs8VKpqrCJzyfn=EQQh=6XqcF4iy4ur@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, Jan 11, 2011 at 3:16 AM, Rauan Maemirov <rauan(at)maemirov(dot)com> wrote:
> Hi, Kevin.
> Sorry for long delay.
> EXPLAIN ANALYZE SELECT "v"."id", "v"."title" FROM "video" AS "v"
> WHERE (v.active) AND (v.fts @@
> 'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery and
> v.id <> 500563 )
> ORDER BY COALESCE(ts_rank_cd( '{0.1, 0.2, 0.7, 1.0}', v.fts,
> 'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery),
> 1) DESC, v.views DESC
> LIMIT 6
> "Limit  (cost=103975.50..103975.52 rows=6 width=280) (actual
> time=2893.193..2893.199 rows=6 loops=1)"
> "  ->  Sort  (cost=103975.50..104206.07 rows=92228 width=280) (actual
> time=2893.189..2893.193 rows=6 loops=1)"
> "        Sort Key: (COALESCE(ts_rank_cd('{0.1,0.2,0.7,1}'::real[], fts, '( (
> ( ( ( ''dexter'':A | ''season'':A ) | ''seri'':A ) | ''декстер'':A ) |
> ''качество'':A ) | ''сезон'':A ) | ''серия'':A'::tsquery), 1::real)), views"
> "        Sort Method:  top-N heapsort  Memory: 25kB"
> "        ->  Seq Scan on video v  (cost=0.00..102322.34 rows=92228
> width=280) (actual time=0.100..2846.639 rows=54509 loops=1)"
> "              Filter: (active AND (fts @@ '( ( ( ( ( ''dexter'':A |
> ''season'':A ) | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) |
> ''сезон'':A ) | ''серия'':A'::tsquery) AND (id <> 500563))"
> "Total runtime: 2893.264 ms"
> Table scheme:
> CREATE TABLE video
> (
>   id bigserial NOT NULL,
>   hash character varying(12),
>   account_id bigint NOT NULL,
>   category_id smallint NOT NULL,
>   converted boolean NOT NULL DEFAULT false,
>   active boolean NOT NULL DEFAULT true,
>   title character varying(255),
>   description text,
>   tags character varying(1000),
>   authorized boolean NOT NULL DEFAULT false,
>   adult boolean NOT NULL DEFAULT false,
>   views bigint DEFAULT 0,
>   rating real NOT NULL DEFAULT 0,
>   screen smallint DEFAULT 2,
>   duration smallint,
>   "type" smallint DEFAULT 0,
>   mp4 smallint NOT NULL DEFAULT 0,
>   size bigint,
>   size_high bigint DEFAULT 0,
>   source character varying(255),
>   storage_id smallint NOT NULL DEFAULT 1,
>   rule_watching smallint,
>   rule_commenting smallint,
>   count_comments integer NOT NULL DEFAULT 0,
>   count_likes integer NOT NULL DEFAULT 0,
>   count_faves integer NOT NULL DEFAULT 0,
>   fts tsvector,
>   modified timestamp without time zone NOT NULL DEFAULT now(),
>   created timestamp without time zone DEFAULT now(),
>   CONSTRAINT video_pkey PRIMARY KEY (id),
>   CONSTRAINT video_hash_key UNIQUE (hash)
> )
> WITH (
>   OIDS=FALSE
> );
> Indexes:
> CREATE INDEX idx_video_account_id  ON video  USING btree  (account_id);
> CREATE INDEX idx_video_created  ON video  USING btree  (created);
> CREATE INDEX idx_video_fts  ON video  USING gin  (fts);
> CREATE INDEX idx_video_hash  ON video  USING hash  (hash);
> (here I tried both gist and gin indexes)
> I have 32Gb ram and 2 core quad E5520, 2.27GHz (8Mb cache).
> Pgsql conf:
> max_connections = 200
> shared_buffers = 7680MB
> work_mem = 128MB
> maintenance_work_mem = 1GB
> effective_cache_size = 22GB
> default_statistics_target = 100
> Anything else?

For returning that many rows, an index scan might actually be slower.
Maybe it's worth testing. Try:

SET enable_seqscan=off;
EXPLAIN ANALYZE ...

and see what you get. If it's slower, well, then be happy it didn't
use the index (maybe the question is... what index should you have
instead?). If it's faster, post the results...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Rauan Maemirov <rauan(at)maemirov(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Problems with FTS
Date: 2011-11-21 05:53:47
Message-ID: CAFw_bqOdQqE0fUVntCX-aV+oZnzEgpmGqce=ODktJkNktFEEBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

The problem has returned back, and here's the results, as you've said it's
faster now:

SET enable_seqscan=off;
EXPLAIN ANALYZE SELECT "v"."id", "v"."title" FROM "video" AS "v"
WHERE (v.active) AND (v.fts @@
'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery
and v.id <> 500563 )
ORDER BY COALESCE(ts_rank_cd( '{0.1, 0.2, 0.7, 1.0}', v.fts,
'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery),
1) DESC, v.views DESC
LIMIT 6

Limit (cost=219631.83..219631.85 rows=6 width=287) (actual
time=1850.567..1850.570 rows=6 loops=1)
-> Sort (cost=219631.83..220059.05 rows=170886 width=287) (actual
time=1850.565..1850.566 rows=6 loops=1)
Sort Key: (COALESCE(ts_rank_cd('{0.1,0.2,0.7,1}'::real[], fts, '( (
( ( ( ''dexter'':A | ''season'':A ) | ''seri'':A ) | ''декстер'':A ) |
''качество'':A ) | ''сезон'':A ) | ''серия'':A'::tsquery), 1::real)), views
Sort Method: top-N heapsort Memory: 26kB
-> Bitmap Heap Scan on video v (cost=41180.92..216568.73
rows=170886 width=287) (actual time=214.842..1778.830 rows=103087 loops=1)
Recheck Cond: (fts @@ '( ( ( ( ( ''dexter'':A | ''season'':A
) | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) | ''сезон'':A ) |
''серия'':A'::tsquery)
Filter: (active AND (id <> 500563))
-> Bitmap Index Scan on idx_video_fts (cost=0.00..41138.20
rows=218543 width=0) (actual time=170.206..170.206 rows=171945 loops=1)
Index Cond: (fts @@ '( ( ( ( ( ''dexter'':A |
''season'':A ) | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) |
''сезон'':A ) | ''серия'':A'::tsquery)
Total runtime: 1850.632 ms

Should I use this instead?

2011/1/15 Robert Haas <robertmhaas(at)gmail(dot)com>

> On Tue, Jan 11, 2011 at 3:16 AM, Rauan Maemirov <rauan(at)maemirov(dot)com>
> wrote:
> > Hi, Kevin.
> > Sorry for long delay.
> > EXPLAIN ANALYZE SELECT "v"."id", "v"."title" FROM "video" AS "v"
> > WHERE (v.active) AND (v.fts @@
> > 'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery
> and
> > v.id <> 500563 )
> > ORDER BY COALESCE(ts_rank_cd( '{0.1, 0.2, 0.7, 1.0}', v.fts,
> >
> 'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery),
> > 1) DESC, v.views DESC
> > LIMIT 6
> > "Limit (cost=103975.50..103975.52 rows=6 width=280) (actual
> > time=2893.193..2893.199 rows=6 loops=1)"
> > " -> Sort (cost=103975.50..104206.07 rows=92228 width=280) (actual
> > time=2893.189..2893.193 rows=6 loops=1)"
> > " Sort Key: (COALESCE(ts_rank_cd('{0.1,0.2,0.7,1}'::real[], fts,
> '( (
> > ( ( ( ''dexter'':A | ''season'':A ) | ''seri'':A ) | ''декстер'':A ) |
> > ''качество'':A ) | ''сезон'':A ) | ''серия'':A'::tsquery), 1::real)),
> views"
> > " Sort Method: top-N heapsort Memory: 25kB"
> > " -> Seq Scan on video v (cost=0.00..102322.34 rows=92228
> > width=280) (actual time=0.100..2846.639 rows=54509 loops=1)"
> > " Filter: (active AND (fts @@ '( ( ( ( ( ''dexter'':A |
> > ''season'':A ) | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) |
> > ''сезон'':A ) | ''серия'':A'::tsquery) AND (id <> 500563))"
> > "Total runtime: 2893.264 ms"
> > Table scheme:
> > CREATE TABLE video
> > (
> > id bigserial NOT NULL,
> > hash character varying(12),
> > account_id bigint NOT NULL,
> > category_id smallint NOT NULL,
> > converted boolean NOT NULL DEFAULT false,
> > active boolean NOT NULL DEFAULT true,
> > title character varying(255),
> > description text,
> > tags character varying(1000),
> > authorized boolean NOT NULL DEFAULT false,
> > adult boolean NOT NULL DEFAULT false,
> > views bigint DEFAULT 0,
> > rating real NOT NULL DEFAULT 0,
> > screen smallint DEFAULT 2,
> > duration smallint,
> > "type" smallint DEFAULT 0,
> > mp4 smallint NOT NULL DEFAULT 0,
> > size bigint,
> > size_high bigint DEFAULT 0,
> > source character varying(255),
> > storage_id smallint NOT NULL DEFAULT 1,
> > rule_watching smallint,
> > rule_commenting smallint,
> > count_comments integer NOT NULL DEFAULT 0,
> > count_likes integer NOT NULL DEFAULT 0,
> > count_faves integer NOT NULL DEFAULT 0,
> > fts tsvector,
> > modified timestamp without time zone NOT NULL DEFAULT now(),
> > created timestamp without time zone DEFAULT now(),
> > CONSTRAINT video_pkey PRIMARY KEY (id),
> > CONSTRAINT video_hash_key UNIQUE (hash)
> > )
> > WITH (
> > OIDS=FALSE
> > );
> > Indexes:
> > CREATE INDEX idx_video_account_id ON video USING btree (account_id);
> > CREATE INDEX idx_video_created ON video USING btree (created);
> > CREATE INDEX idx_video_fts ON video USING gin (fts);
> > CREATE INDEX idx_video_hash ON video USING hash (hash);
> > (here I tried both gist and gin indexes)
> > I have 32Gb ram and 2 core quad E5520, 2.27GHz (8Mb cache).
> > Pgsql conf:
> > max_connections = 200
> > shared_buffers = 7680MB
> > work_mem = 128MB
> > maintenance_work_mem = 1GB
> > effective_cache_size = 22GB
> > default_statistics_target = 100
> > Anything else?
>
> For returning that many rows, an index scan might actually be slower.
> Maybe it's worth testing. Try:
>
> SET enable_seqscan=off;
> EXPLAIN ANALYZE ...
>
> and see what you get. If it's slower, well, then be happy it didn't
> use the index (maybe the question is... what index should you have
> instead?). If it's faster, post the results...
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Rauan Maemirov <rauan(at)maemirov(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Problems with FTS
Date: 2011-11-30 20:58:28
Message-ID: CA+Tgmoaqy-O9OWYnO-Xc0ei_mytB6C2rMAarjwuM+581TGRaLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, Nov 21, 2011 at 12:53 AM, Rauan Maemirov <rauan(at)maemirov(dot)com> wrote:
> The problem has returned back, and here's the results, as you've said it's
> faster now:
>
> SET enable_seqscan=off;
> EXPLAIN ANALYZE SELECT "v"."id", "v"."title" FROM "video" AS "v"
> WHERE (v.active) AND (v.fts @@
> 'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery and
> v.id <> 500563 )
> ORDER BY COALESCE(ts_rank_cd( '{0.1, 0.2, 0.7, 1.0}', v.fts,
> 'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery),
> 1) DESC, v.views DESC
> LIMIT 6
>
> Limit  (cost=219631.83..219631.85 rows=6 width=287) (actual
> time=1850.567..1850.570 rows=6 loops=1)
>   ->  Sort  (cost=219631.83..220059.05 rows=170886 width=287) (actual
> time=1850.565..1850.566 rows=6 loops=1)
>         Sort Key: (COALESCE(ts_rank_cd('{0.1,0.2,0.7,1}'::real[], fts, '( (
> ( ( ( ''dexter'':A | ''season'':A ) | ''seri'':A ) | ''декстер'':A ) |
> ''качество'':A ) | ''сезон'':A ) | ''серия'':A'::tsquery), 1::real)), views
>         Sort Method:  top-N heapsort  Memory: 26kB
>         ->  Bitmap Heap Scan on video v  (cost=41180.92..216568.73
> rows=170886 width=287) (actual time=214.842..1778.830 rows=103087 loops=1)
>               Recheck Cond: (fts @@ '( ( ( ( ( ''dexter'':A | ''season'':A )
> | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) | ''сезон'':A ) |
> ''серия'':A'::tsquery)
>               Filter: (active AND (id <> 500563))
>               ->  Bitmap Index Scan on idx_video_fts  (cost=0.00..41138.20
> rows=218543 width=0) (actual time=170.206..170.206 rows=171945 loops=1)
>                     Index Cond: (fts @@ '( ( ( ( ( ''dexter'':A |
> ''season'':A ) | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) |
> ''сезон'':A ) | ''серия'':A'::tsquery)
> Total runtime: 1850.632 ms
>
>
> Should I use this instead?

Can you also provide EXPLAIN ANALYZE output for the query with
enable_seqscan=on?

The row-count estimates look reasonably accurate, so there's some
other problem here. What do you have random_page_cost, seq_page_cost,
and effective_cache_size set to? You might try "SET
random_page_cost=2" or even "SET random_page_cost=0.5; SET
seq_page_cost=0.3" and see if those settings help.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rauan Maemirov <rauan(at)maemirov(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Problems with FTS
Date: 2011-12-01 06:11:40
Message-ID: 4ED71A9C.5030603@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 2011-11-30 21:58, Robert Haas wrote:
> The row-count estimates look reasonably accurate, so there's some
> other problem here. What do you have random_page_cost, seq_page_cost,
> and effective_cache_size set to? You might try "SET
> random_page_cost=2" or even "SET random_page_cost=0.5; SET
> seq_page_cost=0.3" and see if those settings help
I may be seing ghosts here, since I've encountered
the same problem. But the Query-planner does not
take toast into account, so a Sequential Scan + filter
only cost what it takes to scan the main table, but fts-fields
are typically large enough to be toasted so the cost should
be main+toast (amount of pages) + filtering cost.

I posted about it yesterday:

http://archives.postgresql.org/pgsql-hackers/2011-11/msg01754.php

If above problem is on <9.1 a patch to proper account of gin-estimates
have been added to 9.1 which also may benefit the planning:
http://www.postgresql.org/docs/9.1/static/release-9-1.html

Improve GIN index scan cost estimation (Teodor Sigaev)

Jesper
--
Jesper