Re: select statement against pg_stats returns inconsistent data

Lists: pgsql-generalpgsql-hackers
From: Shelby Cain <alyandon(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: select statement against pg_stats returns inconsistent data
Date: 2004-02-24 19:58:56
Message-ID: 20040224195856.9237.qmail@web41602.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

The select statements return different data for
most_commons_vals depending on whether n_distinct is
included in the select clause or not.

I only seem to get the behavior below against int8
columns - but I haven't interated through every
conceivable data type either.

Is this expected behavior or perhaps a bug?

Regards,

Shelby Cain

=========================================================

c1scain=# select version();
version

---------------------------------------------------------------------------------------
PostgreSQL 7.4.1 on i686-pc-cygwin, compiled by GCC
gcc (GCC) 3.3.1 (cygming special)
(1 row)

c1scain=# create table test_table (lastname
varchar(20), firstname varchar(20), userid int8,
testid int8);
CREATE TABLE
c1scain=# insert into test_table values ('cain',
'shelby', random()*10000, random()*10000);
INSERT 13015123 1
c1scain=# insert into test_table values ('cain',
'shelby', random()*10000, random()*10000);
INSERT 13015124 1
c1scain=# insert into test_table values ('cain',
'shelby', random()*10000, random()*10000);
INSERT 13015125 1
c1scain=# insert into test_table values ('cain',
'shelby', random()*10000, random()*10000);
INSERT 13015126 1
c1scain=# insert into test_table values ('cain',
'shelby', random()*10000, random()*10000);
INSERT 13015127 1
c1scain=# insert into test_table (select * from
test_table);
INSERT 0 5
c1scain=# insert into test_table (select * from
test_table);
INSERT 0 10
c1scain=# insert into test_table (select * from
test_table);
INSERT 0 20
c1scain=# insert into test_table (select * from
test_table);
INSERT 0 40
c1scain=# analyze test_table;
ANALYZE
c1scain=# select distinct userid from test_table;
userid
--------
211
2641
4333
7642
8053
(5 rows)
c1scain=# select distinct testid from test_table;
testid
--------
73
834
1399
2315
4511
(5 rows)
c1scain=# select tablename, attname, most_common_vals
from pg_stats where tablename = 'test_table';
tablename | attname | most_common_vals
------------+-----------+---------------------------
test_table | lastname | {cain}
test_table | firstname | {shelby}
test_table | userid | {211,2641,4333,7642,8053}
test_table | testid | {73,834,1399,2315,4511}
(4 rows)
c1scain=# select tablename, attname, n_distinct,
most_common_vals from pg_stats where tablename =
'test_table';
tablename | attname | n_distinct |
most_common_vals
------------+-----------+------------+------------------------------------------------------
test_table | lastname | 1 | {cain}
test_table | firstname | 1 | {shelby}
test_table | userid | 5 |
{211,18610093293568,32822140076032,34587371634688,0}
test_table | testid | 5 |
{73,6008659247104,9942849290240,19374597472256,0}

__________________________________
Do you Yahoo!?
Yahoo! Mail SpamGuard - Read only the mail you want.
http://antispam.yahoo.com/tools


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Shelby Cain <alyandon(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: select statement against pg_stats returns inconsistent data
Date: 2004-02-24 23:28:36
Message-ID: 28810.1077665316@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Shelby Cain <alyandon(at)yahoo(dot)com> writes:
> The select statements return different data for
> most_commons_vals depending on whether n_distinct is
> included in the select clause or not.
> I only seem to get the behavior below against int8
> columns - but I haven't interated through every
> conceivable data type either.

Hoo, I'm surprised no one noticed this during 7.4 development/testing.
The problem applies for any datatype that requires double alignment,
which includes int8, float8, and timestamp as well as most of the
geometric types. pg_statistic is declared as using type "anyarray",
and this type really needs to be marked as requiring double alignment
so that arrays of double-aligned datatypes will come out correctly.

The correct source fix is a one-line change in pg_type.h, but this will
not propagate into existing databases without an initdb. It looks like
what you'd need to do to fix an existing database is

-- clear out broken data in pg_statistic
DELETE FROM pg_statistic;
-- this should update 1 row:
UPDATE pg_type SET typalign = 'd' WHERE oid = 2277;
-- this should update 6 rows:
UPDATE pg_attribute SET attalign = 'd' WHERE atttypid = 2277;
-- might be a good idea to start a fresh backend at this point
-- repopulate pg_statistic
ANALYZE;

Ugh :-(

regards, tom lane


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Shelby Cain <alyandon(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: select statement against pg_stats returns inconsistent
Date: 2004-02-25 00:38:31
Message-ID: 403BEE87.20703@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
> Hoo, I'm surprised no one noticed this during 7.4 development/testing.
> The problem applies for any datatype that requires double alignment,
> which includes int8, float8, and timestamp as well as most of the
> geometric types. pg_statistic is declared as using type "anyarray",
> and this type really needs to be marked as requiring double alignment
> so that arrays of double-aligned datatypes will come out correctly.

anyarray has been defined this way since 7.3 -- any concerns there? I
see that back then pg_statistic used text[] instead of anyarray, so
perhaps not.

Joe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Shelby Cain <alyandon(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: select statement against pg_stats returns inconsistent data
Date: 2004-02-25 03:35:04
Message-ID: 106.1077680104@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Joe Conway <mail(at)joeconway(dot)com> writes:
> anyarray has been defined this way since 7.3 -- any concerns there?

I don't think so --- we weren't trying to use it as an actual column
datatype back then.

7.4 has a problem though :-( ... this is one of the "damn I wish we'd
caught that before release" ones, since it can't easily be fixed without
initdb. Reminds me that I need to get to work on making pg_upgrade
viable again.

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Shelby Cain <alyandon(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] select statement against pg_stats returns
Date: 2004-02-25 04:07:20
Message-ID: 403C1F78.9070308@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

> I don't think so --- we weren't trying to use it as an actual column
> datatype back then.
>
> 7.4 has a problem though :-( ... this is one of the "damn I wish we'd
> caught that before release" ones, since it can't easily be fixed without
> initdb. Reminds me that I need to get to work on making pg_upgrade
> viable again.

Has anyone given any thought as to whether dumping and restoring
pg_statistic is worthwhile?

eg. some sort of ALTER TABLE..SET STATISTICS (1.0, 3.3, 'asdf',....)
command?

Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Shelby Cain <alyandon(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] select statement against pg_stats returns inconsistent data
Date: 2004-02-25 04:15:26
Message-ID: 328.1077682526@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
> Has anyone given any thought as to whether dumping and restoring
> pg_statistic is worthwhile?

Why? You can reconstruct it with a simple "ANALYZE" command. Dumping
and restoring would mean nailing down cross-version assumptions about
what it contains, which doesn't seem real forward-looking...

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Shelby Cain <alyandon(at)yahoo(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] select statement against pg_stats returns
Date: 2004-02-25 04:37:11
Message-ID: 403C2677.6050808@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

> Why? You can reconstruct it with a simple "ANALYZE" command. Dumping
> and restoring would mean nailing down cross-version assumptions about
> what it contains, which doesn't seem real forward-looking...

I seem to recall that people like that kind of thing so that the dump is
really the current state of the database.

Also, I believe big db's like DB2 and Oracle do such a thing.

I just recall it being discussed some time ago...

Chris


From: "V i s h a l Kashyap (at) [Sai Hertz And Control Systems]" <sank89(at)sancharnet(dot)in>
To: pgsql-hackers(at)postgresql(dot)org
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: select statement against pg_stats returns inconsistent
Date: 2004-02-25 10:05:11
Message-ID: 403C7357.2080309@sancharnet.in
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Dear Shelby Cain ,

>Is this expected behavior or perhaps a bug?
>
>
For a novice like me can anyone please tell me
1. Will this effect my application developed on PostgreSQL
2. Will my Application break at some point I heavly use the type of
queries defined in the post.

Would be greatfull for any kinda answers.

--
Best Regards,
Vishal Kashyap
Director / Lead Developer,
Sai Hertz And Control Systems Pvt Ltd,
http://saihertz.rediffblogs.com
Jabber IM: vishalkashyap(at)jabber(dot)org
ICQ : 264360076
-----------------------------------------------
You yourself, as much as anybody in the entire
universe, deserve your love and affection.
- Buddha
---------------
I am usually called as Vishal Kashyap
and my Girlfriend calls me Vishal CASH UP.
Because everyone loves me as Vishal Kashyap
and my Girlfriend loves me as CASH.
___
//\\\
( 0_0 )
----------------o0o-----o0o---------------------


From: Shelby Cain <alyandon(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: select statement against pg_stats returns inconsistent data
Date: 2004-02-26 04:18:59
Message-ID: 20040226041859.97252.qmail@web41602.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


--- Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hoo, I'm surprised no one noticed this during 7.4
> development/testing.
> The problem applies for any datatype that requires
> double alignment,
> which includes int8, float8, and timestamp as well
> as most of the
> geometric types. pg_statistic is declared as using
> type "anyarray",
> and this type really needs to be marked as requiring
> double alignment
> so that arrays of double-aligned datatypes will come
> out correctly.
>
> The correct source fix is a one-line change in
> pg_type.h, but this will
> not propagate into existing databases without an
> initdb. It looks like
> what you'd need to do to fix an existing database is
>
> -- clear out broken data in pg_statistic
> DELETE FROM pg_statistic;
> -- this should update 1 row:
> UPDATE pg_type SET typalign = 'd' WHERE oid = 2277;
> -- this should update 6 rows:
> UPDATE pg_attribute SET attalign = 'd' WHERE
> atttypid = 2277;
> -- might be a good idea to start a fresh backend at
> this point
> -- repopulate pg_statistic
> ANALYZE;
>
> Ugh :-(
>
> regards, tom lane

Works for me. Thanks!

__________________________________
Do you Yahoo!?
Get better spam protection with Yahoo! Mail.
http://antispam.yahoo.com/tools


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, Shelby Cain <alyandon(at)yahoo(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] select statement against pg_stats returns
Date: 2004-03-02 18:45:21
Message-ID: 200403021845.i22IjLO17409@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Christopher Kings-Lynne wrote:
> > Why? You can reconstruct it with a simple "ANALYZE" command. Dumping
> > and restoring would mean nailing down cross-version assumptions about
> > what it contains, which doesn't seem real forward-looking...
>
> I seem to recall that people like that kind of thing so that the dump is
> really the current state of the database.
>
> Also, I believe big db's like DB2 and Oracle do such a thing.
>
> I just recall it being discussed some time ago...

I have heard of dumping stats so you are sure your production db has the
same stats as your test database, but with ANALYZE so fast, and our
optimizer so good, I don't see a use case for us.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073