bug in join?

Lists: pgsql-hackers
From: Laurette Cisneros <laurette(at)nextbus(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: bug in join?
Date: 2002-01-02 21:40:32
Message-ID: Pine.LNX.4.33.0201021337310.16385-100000@visor.corp.nextbus.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


We are running the 7.2b4 beta server.

This join worked last week and today it gets and error:

select * from b, d
where b.address = d.address;

It now fails with the following error:
ERROR: join_selectivity: bad value -0.121693

all that is in the pgsql.log file is the same error.

I am working on a reproduction I can give you, but in the mean time I was
wondering if this is a current problem with the server or me?

Thanks,

--
Laurette Cisneros
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
Passenger Information Everywhere


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Laurette Cisneros <laurette(at)nextbus(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bug in join?
Date: 2002-01-02 22:02:18
Message-ID: 7117.1010008938@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Laurette Cisneros <laurette(at)nextbus(dot)com> writes:
> This join worked last week and today it gets and error:
> select * from b, d
> where b.address = d.address;
> It now fails with the following error:
> ERROR: join_selectivity: bad value -0.121693

Probably what has changed is the pg_statistic data (VACUUM ANALYZE
results). Please send the results of

select * from pg_stats where tablename = 'b';
select * from pg_stats where tablename = 'd';

regards, tom lane


From: Laurette Cisneros <laurette(at)nextbus(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bug in join?
Date: 2002-01-02 22:25:35
Message-ID: Pine.LNX.4.33.0201021414390.17257-100000@visor.corp.nextbus.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Yeah, the culprit appears to be vacuum analyze (vacuum alone doesn't do it).

The problem is that I fixed the original database by dropping and
recreating the tables populating them with backed up data. And, now it
won't recreate (the values in pg_stats for them is lost).

Ugh.

I will keep trying to recreate it for you.

L.
On Wed, 2 Jan 2002, Tom Lane wrote:

> Laurette Cisneros <laurette(at)nextbus(dot)com> writes:
> > This join worked last week and today it gets and error:
> > select * from b, d
> > where b.address = d.address;
> > It now fails with the following error:
> > ERROR: join_selectivity: bad value -0.121693
>
> Probably what has changed is the pg_statistic data (VACUUM ANALYZE
> results). Please send the results of
>
> select * from pg_stats where tablename = 'b';
> select * from pg_stats where tablename = 'd';
>
> regards, tom lane
>

--
Laurette Cisneros
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
Passenger Information Everywhere


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Laurette Cisneros <laurette(at)nextbus(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bug in join?
Date: 2002-01-02 22:29:56
Message-ID: 7726.1010010596@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Laurette Cisneros <laurette(at)nextbus(dot)com> writes:
> I will keep trying to recreate it for you.

You could just try

analyze b;
analyze d;
explain select * from b,d where b.address = d.address;

and repeat until you see the error from EXPLAIN. Since ANALYZE takes
a random sampling these days, successive loops will in fact produce
slightly different results, and you may be able to recreate the
erroneous state eventually.

The math in eqjoinsel() is not entirely trivial, but I thought I had
convinced myself it was okay. I need to see a failing example to
figure out what's wrong with it.

regards, tom lane


From: Laurette Cisneros <laurette(at)nextbus(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bug in join?
Date: 2002-01-02 22:31:45
Message-ID: Pine.LNX.4.33.0201021431080.17257-100000@visor.corp.nextbus.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Will keep trying. I also noticed that several triggers were dropped when I
dropped the table. I need to add these back and see if they make a
difference.

More as I have it...

L.
On Wed, 2 Jan 2002, Tom Lane wrote:

> Laurette Cisneros <laurette(at)nextbus(dot)com> writes:
> > I will keep trying to recreate it for you.
>
> You could just try
>
> analyze b;
> analyze d;
> explain select * from b,d where b.address = d.address;
>
> and repeat until you see the error from EXPLAIN. Since ANALYZE takes
> a random sampling these days, successive loops will in fact produce
> slightly different results, and you may be able to recreate the
> erroneous state eventually.
>
> The math in eqjoinsel() is not entirely trivial, but I thought I had
> convinced myself it was okay. I need to see a failing example to
> figure out what's wrong with it.
>
> regards, tom lane
>

--
Laurette Cisneros
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
Passenger Information Everywhere


From: Laurette Cisneros <laurette(at)nextbus(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bug in join?
Date: 2002-01-02 22:32:52
Message-ID: Pine.LNX.4.33.0201021432330.17882-100000@visor.corp.nextbus.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

OOps, never mind about the triggers. But, I will keep trying to induce the
problem for you.

L.
On Wed, 2 Jan 2002, Tom Lane wrote:

> Laurette Cisneros <laurette(at)nextbus(dot)com> writes:
> > I will keep trying to recreate it for you.
>
> You could just try
>
> analyze b;
> analyze d;
> explain select * from b,d where b.address = d.address;
>
> and repeat until you see the error from EXPLAIN. Since ANALYZE takes
> a random sampling these days, successive loops will in fact produce
> slightly different results, and you may be able to recreate the
> erroneous state eventually.
>
> The math in eqjoinsel() is not entirely trivial, but I thought I had
> convinced myself it was okay. I need to see a failing example to
> figure out what's wrong with it.
>
> regards, tom lane
>

--
Laurette Cisneros
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
Passenger Information Everywhere


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Laurette Cisneros <laurette(at)nextbus(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bug in join?
Date: 2002-01-02 22:36:51
Message-ID: 7787.1010011011@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Laurette Cisneros <laurette(at)nextbus(dot)com> writes:
> Will keep trying. I also noticed that several triggers were dropped when I
> dropped the table. I need to add these back and see if they make a
> difference.

No, join_selectivity isn't going to care about triggers. What's failing
is the estimation of the fraction of rows that will match on address
between the two tables (join_selectivity is rejecting the result as
obviously bogus, which it is). That doesn't depend on anything except
the ANALYZE statistics.

regards, tom lane


From: Laurette Cisneros <laurette(at)nextbus(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bug in join?
Date: 2002-01-02 23:32:43
Message-ID: Pine.LNX.4.33.0201021521210.18529-100000@visor.corp.nextbus.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

OK, I reproduced it again. I had restored the data for the tables from the
wrong backup. I restored the two tables from last night's backup (BTW, we use
pg_dump -a -O -Fc...) and ran vacuum analyze and it reproduces (each and
every time).

We've turned off vacuum analyze (we do it every night *after* the backup)
for now. We would love for this to get fixed asap (of course ;.)

Here's the info. you asked for:

select * from pg_stats where tablesname ='b':

tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
-----------+--------------+-----------+-----------+------------+---------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------+------------------+-------------
b | rev | 0 | 4 | -0.142857 | {"0","1001","1002"} | {"0.333333","0.333333","0.333333"}
| | 1
b | b_tag | 0 | 7 | -0.333333 | {"S001","S002","S003","S004","S005","VT1","VT2"} | {"0.142857","0.142857","0.142857","0.142857","0.142857","0.142857","0.142857"} | | 0.454545
b | input_tag | 0 | 16 | 1 | {"AirLinkInput"} | {"1"} | | 1
b | address | 0 | 19 | -0.333333 | {"166.128.052.237","166.128.053.084","166.128.054.017","166.128.054.018","166.128.057.250","166.128.058.202","166.128.058.203"} | {"0.142857","0.142857","0.142857","0.142857","0.142857","0.142857","0.142857"} | | 0.454545
b | b_distance | 0 | 4 | 1 | {"200"} | {"1"} | | 1
b | b_time | 0 | 4 | 1 | {"90"} | {"1"} | | 1
(6 rows)

select * from pg_status where tablename = 'd';
tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
-----------+-------------+-----------+-----------+------------+---------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
d | rev | 0 | 4 | 3 | {"0","1002","1001"} | {"0.45679","0.45679","0.0864198"} | | 0.912263
d | address | 0 | 19 | -0.45679 | {"166.128.052.237","166.128.053.084","166.128.054.017","166.128.054.018","166.128.057.250","166.128.058.202","166.128.058.203"} | {"0.037037","0.037037","0.037037","0.037037","0.037037","0.037037","0.037037"} | {"166.128.169.189","166.128.169.199","166.128.169.236","166.128.171.002","166.129.108.003","166.132.126.184","166.133.174.093","166.133.174.098","166.204.012.171","166.204.045.135","166.204.066.001"} | 0.451197
d | d_passwd | 0 | 9 | 1 | {"Czech"} | {"1"} | | 1
d | d_port | 0 | 4 | 1 | {"22335"} | {"1"} |
| 1
d | d_type | 0 | 9 | 2 | {"signs","buses"} | {"0.740741","0.259259"} | | 0.912263
d | d_status | 0 | 4 | 1 | {"0"} | {"1"} | | 1
(6 rows)

Thanks!

Laurette
On Wed, 2 Jan 2002, Tom Lane wrote:

> Laurette Cisneros <laurette(at)nextbus(dot)com> writes:
> > This join worked last week and today it gets and error:
> > select * from b, d
> > where b.address = d.address;
> > It now fails with the following error:
> > ERROR: join_selectivity: bad value -0.121693
>
> Probably what has changed is the pg_statistic data (VACUUM ANALYZE
> results). Please send the results of
>
> select * from pg_stats where tablename = 'b';
> select * from pg_stats where tablename = 'd';
>
> regards, tom lane
>

--
Laurette Cisneros
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
Passenger Information Everywhere