Re: Qual evaluation cost estimates for GIN indexes

From: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
To: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgreSQL(dot)org, jesper(at)krogh(dot)cc
Subject: Re: Qual evaluation cost estimates for GIN indexes
Date: 2012-02-20 14:24:11
Message-ID: 20120220142411.GA9564@aart.rice.edu
Lists: pgsql-hackers

On Mon, Feb 20, 2012 at 10:18:31AM +0100, Marc Mamin wrote:
> > I looked into the complaint here of poor estimation for GIN
> > indexscans:
> > http://archives.postgresql.org/pgsql-performance/2012-02/msg00028.php
> > At first glance it sounds like a mistake in selectivity estimation,
> > but it isn't: the rowcount estimates are pretty nearly dead on.
> > The problem is in the planner's estimate of the cost of executing the
> > @@ operator. We have pg_proc.procost set to 1 for ts_match_vq, but
> > actually it's a good deal more expensive than that. Some
> > experimentation suggests that @@ might be about 500 times as expensive
> > as a simple integer comparison. I don't propose pushing its procost
> > up that much, but surely at least 10 would be appropriate, maybe even
> > 100.
> >
> > However ... if you just alter pg_proc.procost in Marc's example, the
> > planner *still* picks a seqscan, even though its estimate of the
> > seqscan cost surely does go up. The reason is that its estimate of
> > the GIN indexscan cost goes up just as much, since we charge one qual
> > eval cost per returned tuple in gincostestimate. It is easy to tell
> > from the actual runtimes that that is not what's happening in a GIN
> > indexscan; we are not re-executing the @@ operator for every tuple.
> > But the planner's cost model doesn't know that.
>
> Hello,
>
> many thanks for your feedback.
>
> I've repeated my test with a table using plain storage, which halved
> the query time. This confirms that detoasting is the major issue for
> cost estimation, but even with plain storage the table scan remains
> about 30% slower than the index scan.
>
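For reference, the two knobs discussed above can be tried directly in SQL. The table and column names (docs, tsv) are placeholders, and COST 100 is just the upper end of the range Tom suggests, not a recommended value:

```sql
-- Raise the planner's per-call cost estimate for the tsvector @@ tsquery
-- support function (default procost is 1; 10..100 is the range mooted above):
ALTER FUNCTION ts_match_vq(tsvector, tsquery) COST 100;

-- Take detoasting out of the picture for a test: force plain (uncompressed,
-- non-toasted) storage for the tsvector column. Note this only affects
-- newly stored rows, so reload the data afterwards to see the effect.
ALTER TABLE docs ALTER COLUMN tsv SET STORAGE PLAIN;
```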

Hi Marc,

Do you happen to know in which function the extra time for the toast
storage is spent -- zlib compression? I saw a mention of the LZ4 compression
algorithm, which is BSD licensed, as a Google Summer of Code project:

http://code.google.com/p/lz4/

It compresses almost 7X faster than zlib (-1) and decompresses about 6X faster.
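A quick standalone way to get a feel for how much time zlib itself costs on a detoasted-value-sized payload (this uses Python's zlib on a synthetic repetitive buffer, not PostgreSQL's internal compressor, so it is only a rough sanity check):

```python
import time
import zlib

# Synthetic, repetitive payload roughly the size of a large toasted value.
data = b"the quick brown fox jumps over the lazy dog " * 20000  # ~880 kB

t0 = time.perf_counter()
compressed = zlib.compress(data, 1)   # level 1, i.e. the "zlib (-1)" above
t1 = time.perf_counter()
restored = zlib.decompress(compressed)
t2 = time.perf_counter()

assert restored == data
print(f"ratio {len(data) / len(compressed):.1f}x, "
      f"compress {t1 - t0:.4f}s, decompress {t2 - t1:.4f}s")
```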

Regards,
Ken
