Cause of intermittent rangetypes regression test failures

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Cause of intermittent rangetypes regression test failures
Date: 2011-11-13 20:38:52
Message-ID: 23765.1321216732@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Well, I was overthinking the question of why rangetypes sometimes fails
with

select count(*) from test_range_gist where ir << int4range(100,500);
! ERROR: input range is empty

Turns out that happens whenever auto-analyze has managed to process
test_range_gist before we get to this part of the test. jaguar
is more likely to see this because CLOBBER_CACHE_ALWAYS slows down the
rangetypes code to a really staggering extent, but obviously it can
happen anywhere. If the table has been analyzed, then the
most_common_values array for column ir will consist of
{empty}
which is entirely correct since that value accounts for 16% of the
table. And then, when mcv_selectivity tries to estimate the selectivity
of the << condition, it applies range_before to the empty range along
with the int4range(100,500) value, and range_before spits up.

I think this demonstrates that the current definition of range_before is
broken. It is not reasonable for it to throw an error on a perfectly
valid input ... at least, not unless you'd like to mark it VOLATILE so
that the planner will not risk calling it.

What shall we have it do instead?

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cause of intermittent rangetypes regression test failures
Date: 2011-11-14 07:32:18
Message-ID: 1321255938.25053.12.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 2011-11-13 at 15:38 -0500, Tom Lane wrote:
> If the table has been analyzed, then the
> most_common_values array for column ir will consist of
> {empty}
> which is entirely correct since that value accounts for 16% of the
> table. And then, when mcv_selectivity tries to estimate the selectivity
> of the << condition, it applies range_before to the empty range along
> with the int4range(100,500) value, and range_before spits up.
>
> I think this demonstrates that the current definition of range_before is
> broken. It is not reasonable for it to throw an error on a perfectly
> valid input ... at least, not unless you'd like to mark it VOLATILE so
> that the planner will not risk calling it.
>
> What shall we have it do instead?

We could have it return NULL, I suppose. I was worried that that would
lead to confusion between NULL and the empty range, but it might be
better than marking it VOLATILE.

Thoughts, other ideas?

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cause of intermittent rangetypes regression test failures
Date: 2011-11-14 13:11:16
Message-ID: 24134.1321276276@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Sun, 2011-11-13 at 15:38 -0500, Tom Lane wrote:
>> I think this demonstrates that the current definition of range_before is
>> broken. It is not reasonable for it to throw an error on a perfectly
>> valid input ... at least, not unless you'd like to mark it VOLATILE so
>> that the planner will not risk calling it.
>>
>> What shall we have it do instead?

> We could have it return NULL, I suppose. I was worried that that would
> lead to confusion between NULL and the empty range, but it might be
> better than marking it VOLATILE.

It needs to return FALSE, actually. After further reading I realized
that you have that behavior hard-wired into the range GiST routines,
and it's silly to make the stand-alone versions of the function act
differently.

This doesn't seem terribly unreasonable: we just have to document
that the empty range is neither before nor after any other range.

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cause of intermittent rangetypes regression test failures
Date: 2011-11-14 18:06:01
Message-ID: 1321293961.25053.13.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2011-11-14 at 08:11 -0500, Tom Lane wrote:
> It needs to return FALSE, actually. After further reading I realized
> that you have that behavior hard-wired into the range GiST routines,
> and it's silly to make the stand-alone versions of the function act
> differently.

Good point. That makes sense to me.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cause of intermittent rangetypes regression test failures
Date: 2011-11-14 18:43:20
Message-ID: 18945.1321296200@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Mon, 2011-11-14 at 08:11 -0500, Tom Lane wrote:
>> It needs to return FALSE, actually. After further reading I realized
>> that you have that behavior hard-wired into the range GiST routines,
>> and it's silly to make the stand-alone versions of the function act
>> differently.

> Good point. That makes sense to me.

While thinking about this ... would it be sensible for range_lower and
range_upper to return NULL instead of throwing an exception for empty or
infinite ranges? As with these comparison functions, throwing an error
seems like a fairly unpleasant definition to work with in practice.

regards, tom lane


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cause of intermittent rangetypes regression test failures
Date: 2011-11-14 18:54:57
Message-ID: e1b5e16ececf727967d38842df0e011c.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, November 14, 2011 19:43, Tom Lane wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> writes:
>> On Mon, 2011-11-14 at 08:11 -0500, Tom Lane wrote:
>
> While thinking about this ... would it be sensible for range_lower and
> range_upper to return NULL instead of throwing an exception for empty or
> infinite ranges? As with these comparison functions, throwing an error
> seems like a fairly unpleasant definition to work with in practice.
>

+1

much better, IMHO.

Erik Rijkers