Quick Links

Re: Index-only scans for GIST

From:	Thom Brown <thom(at)linux(dot)com>
To:	Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Index-only scans for GIST
Date:	2014-10-27 11:33:50
Message-ID:	CAA-aLv5cArNV6PtAfkqpro3prV0Y4_r1L47Yie2KMK_C=5K2Kg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 18 August 2014 09:05, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> On 08/17/2014 07:15 PM, Anastasia Lubennikova wrote:
>
>> 2014-08-07 0:30 GMT+04:00 Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>:
>>
>> * I'm getting two regression failures with this (opr_sanity and join).
>>
>>>
>>>
>> opr_sanity failure is corrected.
>> But there is remain question with join.
>> I check the latest version of my github repo and there's no fail in join
>> regression test
>> All 145 tests passed.
>> To tell the truth, I don't understand which changes could led to this
>> failure.
>> Could you show me regression diffs?
>>
>
> Sure, here you go. It seems like a change in a plan. At a quick glance it
> seems harmless: the new plan is identical except that the left and right
> side of a join have been reversed. But I don't understand either why this
> patch would change that, so it needs to be investigated.
>
> * The regression test queries that use LIMIT are not guaranteed to always
>>
>>> return the same rows, hence they're not very good regression test cases.
>>> I'd suggest using more restricting WHERE clauses, so that each query only
>>> returns a handful of rows.
>>>
>>
>> Thank you for comment, I rewrote wrong queries. But could you explain why
>> LIMIT queries may return different results? Is it happens because of
>> different query optimization?
>>
>
> Imagine that you have a table with two rows, A and B. If you run a query
> like "SELECT * FROM table LIMIT 1", the system can legally return either
> row A or B, because there's no ORDER BY.
>
> Now, admittedly you have a similar problem even without the LIMIT - the
> system can legally return the rows in either order - but it's less of an
> issue because at least you can quickly see from the diff that the result
> set is in fact the same, the rows are just in different order. You could
> fix that by adding an ORDER BY to all test queries, but we haven't done
> that in the regression suite because then we would not have any test
> coverage for cases where you don't have an ORDER BY. As a compromise, test
> cases are usually written without an ORDER BY, but if e.g. the buildfarm
> starts failing because of differences in the result set order across
> platforms, then we add an ORDER BY to make it stable.
>
> * I think it's leaking memory, in GIST scan context. I tested this with a
>>
>>> variant of the regression tests:
>>>
>>> insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i,
>>> 0.05*i)),
>>> point(0.05*i, 0.05*i) FROM generate_series(0,
>>> 10000000) as i;
>>> CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);
>>>
>>> set enable_seqscan=off;
>>> set enable_bitmapscan=off;
>>>
>>> explain analyze select p from gist_tbl where p <@ box(point(0,0),
>>> point(9999999,9999999)) and length(p::text) < 10;
>>>
>>> while the final query runs, 'top' shows constantly increasing memory
>>> usage.
>>>
>>
>> I don't think it's memory leak. After some increasing, memory using remain
>> the same. It works similar without using indexonlyscan.
>>
>
> No, it's definitely a leak caused by the patch. Test with the attached
> patch, which prints out to the server log the amount of memory used by the
> GiST scan memory context every 10000 rows. It clearly shows increasing
> memory usage all the way to the end of the query.
>
> It's cleaned up at the end of the query, but that's not good enough
> because for a large query you might accumulate gigabytes of leaked memory
> until the query has finished. If you (manually) apply the same patch to git
> master, you'll see that the memory usage stays consistent and small.
>

Hi Anastasia,

Do you have time to address the issues brought up in Heikki's review? It
would be good if we could your work into PostgreSQL 9.5.

Thanks

Thom

In response to

Re: Index-only scans for GIST at 2014-08-18 08:05:25 from Heikki Linnakangas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Thom Brown	2014-10-27 11:37:17	Re: KNN searches support for SP-GiST [GSOC'14]
Previous Message	Stephen Frost	2014-10-27 11:27:38	Re: Directory/File Access Permissions for COPY and Generic File Access Functions