Re: Index-only scans for GIST

From: Thom Brown <thom(at)linux(dot)com>
To: Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index-only scans for GIST
Date: 2014-10-27 11:33:50
Message-ID: CAA-aLv5cArNV6PtAfkqpro3prV0Y4_r1L47Yie2KMK_C=5K2Kg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 18 August 2014 09:05, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> On 08/17/2014 07:15 PM, Anastasia Lubennikova wrote:
>
>> 2014-08-07 0:30 GMT+04:00 Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>:
>>
>> * I'm getting two regression failures with this (opr_sanity and join).
>>
>>>
>>>
>> opr_sanity failure is corrected.
>> But there is remain question with join.
>> I check the latest version of my github repo and there's no fail in join
>> regression test
>> All 145 tests passed.
>> To tell the truth, I don't understand which changes could led to this
>> failure.
>> Could you show me regression diffs?
>>
>
> Sure, here you go. It seems like a change in a plan. At a quick glance it
> seems harmless: the new plan is identical except that the left and right
> side of a join have been reversed. But I don't understand either why this
> patch would change that, so it needs to be investigated.
>
> * The regression test queries that use LIMIT are not guaranteed to always
>>
>>> return the same rows, hence they're not very good regression test cases.
>>> I'd suggest using more restricting WHERE clauses, so that each query only
>>> returns a handful of rows.
>>>
>>
>> Thank you for comment, I rewrote wrong queries. But could you explain why
>> LIMIT queries may return different results? Is it happens because of
>> different query optimization?
>>
>
> Imagine that you have a table with two rows, A and B. If you run a query
> like "SELECT * FROM table LIMIT 1", the system can legally return either
> row A or B, because there's no ORDER BY.
>
> Now, admittedly you have a similar problem even without the LIMIT - the
> system can legally return the rows in either order - but it's less of an
> issue because at least you can quickly see from the diff that the result
> set is in fact the same, the rows are just in different order. You could
> fix that by adding an ORDER BY to all test queries, but we haven't done
> that in the regression suite because then we would not have any test
> coverage for cases where you don't have an ORDER BY. As a compromise, test
> cases are usually written without an ORDER BY, but if e.g. the buildfarm
> starts failing because of differences in the result set order across
> platforms, then we add an ORDER BY to make it stable.
>
> * I think it's leaking memory, in GIST scan context. I tested this with a
>>
>>> variant of the regression tests:
>>>
>>> insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i,
>>> 0.05*i)),
>>> point(0.05*i, 0.05*i) FROM generate_series(0,
>>> 10000000) as i;
>>> CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);
>>>
>>> set enable_seqscan=off;
>>> set enable_bitmapscan=off;
>>>
>>> explain analyze select p from gist_tbl where p <@ box(point(0,0),
>>> point(9999999,9999999)) and length(p::text) < 10;
>>>
>>> while the final query runs, 'top' shows constantly increasing memory
>>> usage.
>>>
>>
>> I don't think it's memory leak. After some increasing, memory using remain
>> the same. It works similar without using indexonlyscan.
>>
>
> No, it's definitely a leak caused by the patch. Test with the attached
> patch, which prints out to the server log the amount of memory used by the
> GiST scan memory context every 10000 rows. It clearly shows increasing
> memory usage all the way to the end of the query.
>
> It's cleaned up at the end of the query, but that's not good enough
> because for a large query you might accumulate gigabytes of leaked memory
> until the query has finished. If you (manually) apply the same patch to git
> master, you'll see that the memory usage stays consistent and small.
>

Hi Anastasia,

Do you have time to address the issues brought up in Heikki's review? It
would be good if we could your work into PostgreSQL 9.5.

Thanks

Thom

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thom Brown 2014-10-27 11:37:17 Re: KNN searches support for SP-GiST [GSOC'14]
Previous Message Stephen Frost 2014-10-27 11:27:38 Re: Directory/File Access Permissions for COPY and Generic File Access Functions