Re: ALTER SET DISTINCT vs. Oracle-like DBMS_STATS

Lists: pgsql-hackers
From: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: ALTER SET DISTINCT vs. Oracle-like DBMS_STATS
Date: 2009-07-06 10:28:57
Message-ID: 20090706185545.9DF3.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

A new feature "ALTER TABLE ... ALTER COLUMN ... SET DISTINCT" is
submitted to the next commetfest:
http://archives.postgresql.org/message-id/603c8f070905041913r667b3f32oa068d758ba5f17e1@mail.gmail.com

but I have another approach for the plan stability issues. It might conflict
ALTER SET DISTINCT patch in terms of duplicated functionality, so I want to
discuss them.

It is just similar to Oracle's DBMS_STATS package.
http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/d_stats.htm
If it were, "ALTER TABLE tablename ALTER COLUMN 3rd-column SET DISITNCT 100"
could be written as:

INSERT INTO dbms_stats.columns(starelid, ataattnum, stadistinct)
VALUES ('tablename'::regclass, 3, 100);

Of course ALTER SET DISTINCT is easy-to-use, so it could be an alias for
the above INSERT command.

The "DBMS_STATS for Postgres" is based on new statstics hooks in 8.4 --
get_relation_info_hook, get_attavgwidth_hook, get_relation_stats_hook
and get_index_stats_hook. The module has dbms_stats.relations and
dbms_stats.columns tables and hides pg_class and pg_statistics when enabled.
So, if once you set a value to dbms_stats.columns.stadistinct, the value
hides pg_statistics.stadistinct and planner always uses it for planning.

You can modify statistics of your tables by inserting values directly
to relations and columns tables. Also lock() or unlock() functions
are useful to use a bit customized stats based on existing values.

- TABLE dbms_stats.relations : hide pg_class.relpages, reltuples.
- TABLE dbms_stats.columns : hide pg_statistic.
- FUNCTION dbms_stats.lock() : copy pg_class and pg_statistic to the above tables.
- FUNCTION dbms_stats.unlock(): delete some rows from the above tables.

The module also supports backup-statstics feature.

- TABLE dbms_stats.backup, relations_backup, columns_backup
- FUNCTION dbms_stats.backup() : backup statistics to the above tables.
- FUNCTION dbms_stats.restore() : restore statistics from
- FUNCTION dbms_stats.export() : export statistics to external text file.
- FUNCTION dbms_stats.import() : import statistics from external text file.

If acceptable, I'd like to submit DBMS_STATS for Postgres module
to September commitfest. I'm not sure the feature should be in core,
in contrib, or in pgFoundry... Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: ALTER SET DISTINCT vs. Oracle-like DBMS_STATS
Date: 2009-07-06 15:36:03
Message-ID: 20090706153603.GC5861@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Itagaki Takahiro escribió:

> It is just similar to Oracle's DBMS_STATS package.
> http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/d_stats.htm
> If it were, "ALTER TABLE tablename ALTER COLUMN 3rd-column SET DISITNCT 100"
> could be written as:
>
> INSERT INTO dbms_stats.columns(starelid, ataattnum, stadistinct)
> VALUES ('tablename'::regclass, 3, 100);
>
> Of course ALTER SET DISTINCT is easy-to-use, so it could be an alias for
> the above INSERT command.

Why wouldn't you implement this through reloptions? (I ask because the
syntax you propose above is awfully similar to what we used for
pg_autovacuum, which ended up being replaced by reloptions)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: ALTER SET DISTINCT vs. Oracle-like DBMS_STATS
Date: 2009-07-07 02:06:24
Message-ID: 20090707105852.946E.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Euler Taveira de Oliveira <euler(at)timbira(dot)com> wrote:

> >> INSERT INTO dbms_stats.columns(starelid, ataattnum, stadistinct)
> >> VALUES ('tablename'::regclass, 3, 100);
> >
> > Why wouldn't you implement this through reloptions?
> >
> Because it is column-based and not table-based? In this case, we need to store
> and array value like {attnum, stadistinct}. If it is not ugly in your POV, +1
> for this approach.

Yes, column-based storage is needed. However, when we drop tables,
dangling stat settings might remain. I want core-support for the module,
for example, "TRIGGER ON DROP TABLE" or some drop-relation-hooks.

There might be another approach that we add pg_attribute.attoptions for
generic column-based options, like pg_class.reloptions.

Which approach is better, or something else?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center