Re: Fix picksplit with nan values

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix picksplit with nan values
Date: 2013-11-08 18:38:28
Message-ID: 22995.1383935908@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> I wrote attached patch by following principles:
> 1) NaN coordinates shouldn't crash or hang GiST.
> 2) NaN coordinates should be processed in GiST index scan like in
> sequential scan.
> 3) NaN coordinates shouldn't lead to significant slowdown.

I looked at this patch for awhile. It seems like there's still an awful
lot of places in gistproc.c that are not worrying about NANs, and it's not
clear to me that they don't need to. For instance, despite the changes in
adjustBox(), it'll still be possible to have boxes with NAN boundaries if
all the contained values are all-NAN boxes. It doesn't seem like
gist_box_penalty will behave very sanely for that; it'll return a NAN
penalty which seems unhelpful. The static box_penalty function doesn't
work sanely for NANs either, and if it can return NAN then you also have
to worry about NAN deltas in common_entry_cmp. And isn't it still
possible for the Assert in gist_box_picksplit to fire?

> That could be illustrated on following test-case:

> create table test1 as (select point(random(), random()) as p from
> generate_series(1,1000000));
> create index test1_idx on test1 using gist(p);
> create table temp as (select * from test1);
> insert into temp (select point('nan'::float8, 'nan'::float8) from
> generate_series(1,1000));
> insert into temp (select point('nan'::float8, random()) from
> generate_series(1,1000));
> insert into temp (select point(random(), 'nan'::float8) from
> generate_series(1,1000));
> create table test2 as (select * from temp order by random());
> create index test2_idx on test2 using gist(p);
> drop table temp;

I think this test case is unlikely to generate any all-NAN index entry
boxes, because almost certainly the initial entries will be non-NANs, and
you've got it set up to keep incoming NANs from adjusting the boundaries
of an existing box. You'd get better code coverage if you started by
inserting some NANs into an empty index.

Also, as a stylistic matter, I thought your previous solution of
copying float8_cmp_internal was a better idea than manually inlining it
(sans comments!) in multiple places as this version does.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-11-08 18:43:14 Re: pgsql: Fix blatantly broken record_image_cmp() logic for pass-by-value
Previous Message Joshua D. Drake 2013-11-08 18:33:15 Re: backup.sgml-cmd-v003.patch