Re: gistchoose vs. bloat

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2013-01-24 19:44:59
Message-ID: 23619.1359056699@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> I did some experimenting with that. I used the same test case Alexander
> did, with geonames data, and compared the unpatched version, the original
> patch, and the attached patch that biases towards the first "best" tuple
> found, but still sometimes chooses one of the other equally good ones.

>     testname    | initsize | finalsize | idx_blks_read | idx_blks_hit
> ----------------+----------+-----------+---------------+--------------
>  patched-10-4mb | 75497472 |  90202112 |       5853604 |     10178331
>  unpatched-4mb  | 75145216 |  94863360 |       5880676 |     10185647
>  unpatched-4mb  | 75587584 |  97165312 |       5903107 |     10183759
>  patched-2-4mb  | 74768384 |  81403904 |       5768124 |     10193738
>  origpatch-4mb  | 74883072 |  82182144 |       5783412 |     10185373

> I think the conclusion is that all of these patches are effective. The
> 1/10 variant is less effective, as expected, as it behaves more like
> the unpatched code than the others do. The 1/2 variant seems as good
> as the original patch.

At least on this example, it seems a tad better, if you look at index
size.

> A table full of duplicates isn't very realistic, but overall, I'm
> leaning towards my version of this patch (gistchoose-2.patch). It has
> less potential for causing a regression in existing applications, but is
> just as effective in the original scenario of repeated delete+insert.

+1 for this patch, but I think the comments could use more work. I was
convinced it was wrong on first examination, mainly because it's hard to
follow the underdocumented look_further_on_equal logic. I propose the
attached, which is the same logic with better comments (I also chose to
rename and invert the sense of the state variable, because it seemed
easier to follow this way ... YMMV on that though).
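
For readers without the attachment, the tie-breaking idea under
discussion can be sketched roughly as follows. This is a single-column
simplification written for illustration only, not the code from
gistchoose-3.patch: the names choose_tuple, coin_fn, random_coin,
never_move and always_move are all hypothetical, and the real code
compares penalties column by column. A denominator of 2 or 10
corresponds to the "1/2" and "1/10" variants in the table above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical coin-flip source: returns nonzero when we should abandon
 * the current best tuple in favor of an equally good one.  Passed in as
 * a callback so the choice logic can be exercised deterministically.
 */
typedef int (*coin_fn) (int denominator, void *state);

/* Returns nonzero with probability of roughly 1/denominator. */
int
random_coin(int denominator, void *state)
{
	(void) state;
	return rand() % denominator == 0;
}

/* Deterministic stand-ins, useful only for demonstrating the two extremes. */
int
never_move(int denominator, void *state)
{
	(void) denominator;
	(void) state;
	return 0;
}

int
always_move(int denominator, void *state)
{
	(void) denominator;
	(void) state;
	return 1;
}

/*
 * Pick the index of a tuple with the lowest penalty.  Among equally good
 * tuples the first one found is favored, but it is sometimes replaced by
 * a later one, so that inserts do not always land on the same page.
 */
size_t
choose_tuple(const double *penalty, size_t ntuples,
			 int denominator, coin_fn coin, void *state)
{
	size_t		best = 0;
	int			keep_best = -1; /* -1 = undecided, 0 = replace, 1 = keep */
	size_t		i;

	assert(ntuples > 0);

	for (i = 1; i < ntuples; i++)
	{
		if (penalty[i] < penalty[best])
		{
			/* Strictly better: always take it, and forget any decision. */
			best = i;
			keep_best = -1;
		}
		else if (penalty[i] == penalty[best])
		{
			/* Equally good: decide once per "best" whether to stick. */
			if (keep_best == -1)
				keep_best = coin(denominator, state) ? 0 : 1;
			if (keep_best == 0)
			{
				best = i;
				keep_best = -1; /* new best, so decide afresh next time */
			}
		}
	}
	return best;
}
```

With never_move this degenerates to always picking the first of the
equally good tuples, and with always_move to always picking the last;
random_coin with a small denominator sits in between, still biased
towards the first.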

regards, tom lane

Attachment Content-Type Size
gistchoose-3.patch text/x-patch 4.0 KB
