Re: Materialized views WIP patch

Lists: pgsql-committerspgsql-hackers
From: "Kevin Grittner" <kgrittn(at)mail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Marko Tiikkaja" <pgmail(at)joh(dot)to>,"Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-01-16 16:48:17
Message-ID: 20130116164817.97290@gmx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Tom Lane wrote:
> "Kevin Grittner" <kgrittn(at)mail(dot)com> writes:
>> I've been struggling with two areas:
>> - pg_dump sorting for MVs which depend on other MVs
>
> Surely that should fall out automatically given that the
> dependency is properly expressed in pg_depend?
>
> If you mean you're trying to get it to cope with circular
> dependencies between MVs, it might take some work on the pg_dump
> side, but plain ordering shouldn't require new code.

The *definitions* sort properly, but what I'm trying to do is
define them WITH NO DATA and load data after all the COPY
statements for tables. If mva is referenced by mvb, the goal is the
REFRESH mva, build its indexes before running REFRESH for mvb and
building its indexes. To do things in any other order does't seem
to me to leave things after restore in the same state they were in
at the time of the dump.

So I should have been a little more verbose describing the problem:
pg_dump sorting of REFRESH and CREATE INDEX steps for MVs which
depend on other MVs.

Last night I found why my previous attempts had been failing -- I
was trying to build the dependencies at the wrong point in the dump
process, after the sorts had already been done.  Now that I've
spotted that fundamental flaw, I think I can get this out of the
way without too much more fanfare. I kept thinking I had something
wrong in the detail of my approach, while the problem was at a much
higher level.

Where I really need someone to hit me upside the head with a
clue-stick is the code I added to the bottom of RelationBuildDesc()
in relcache.c. The idea is that on first access to an unlogged MV,
to detect that the heap has been replaced by the init fork, set
relisvalid to false, and make the heap look normal again. I
couldn't see any way to do that which wasn't a kludge, and I can't
figure out how to deal with relcache properly in implementing that
kludge. Either a tip about the right way to work the kludge, or a
suggestion for a less kludgy alternative would be welcome.

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <kgrittn(at)mail(dot)com>
Cc: "Marko Tiikkaja" <pgmail(at)joh(dot)to>, "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-01-16 18:38:57
Message-ID: 29211.1358361537@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

"Kevin Grittner" <kgrittn(at)mail(dot)com> writes:
> Tom Lane wrote:
>> Surely that should fall out automatically given that the
>> dependency is properly expressed in pg_depend?

> The *definitions* sort properly, but what I'm trying to do is
> define them WITH NO DATA and load data after all the COPY
> statements for tables. If mva is referenced by mvb, the goal is the
> REFRESH mva, build its indexes before running REFRESH for mvb and
> building its indexes. To do things in any other order does't seem
> to me to leave things after restore in the same state they were in
> at the time of the dump.

Ah. Can't you treat this using the same pg_dump infrastructure as
for the data for an ordinary table? The dependencies made for the
TableDataInfo object might be a bit different, but after that it
seems like the sort logic ought to be happy.

> Where I really need someone to hit me upside the head with a
> clue-stick is the code I added to the bottom of RelationBuildDesc()
> in relcache.c. The idea is that on first access to an unlogged MV,
> to detect that the heap has been replaced by the init fork, set
> relisvalid to false, and make the heap look normal again.

Hmm. I agree that relcache.c has absolutely no business doing that,
but not sure what else to do instead. Seems like it might be better
done at first touch of the MV in the parser, rewriter, or planner ---
but the fact that I can't immediately decide which of those is right
makes me feel that it's still too squishy.

I'm also wondering about locking issues there. Obviously you don't
want more than one backend trying to rebuild the MV.

Do we really need unlogged MVs in the first iteration? Seems like
that's adding a whole bunch of new issues, when you have quite enough
already without that.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2013-01-16 19:12:07
Message-ID: 50F6FB87.7040405@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers


> Do we really need unlogged MVs in the first iteration? Seems like
> that's adding a whole bunch of new issues, when you have quite enough
> already without that.

While I think there is strong user demand for unlogged MVs, if we can
get MVs without unlogged ones for 9.3, I say go for that. We'll add
unlogged in 9.4.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <kgrittn(at)mail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-01-17 12:54:55
Message-ID: CA+TgmoapxXwynTTboA=PAugb7y8=8YfJwNk3Ux9wBkJeuRuA6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Jan 16, 2013 at 1:38 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Where I really need someone to hit me upside the head with a
>> clue-stick is the code I added to the bottom of RelationBuildDesc()
>> in relcache.c. The idea is that on first access to an unlogged MV,
>> to detect that the heap has been replaced by the init fork, set
>> relisvalid to false, and make the heap look normal again.
>
> Hmm. I agree that relcache.c has absolutely no business doing that,
> but not sure what else to do instead. Seems like it might be better
> done at first touch of the MV in the parser, rewriter, or planner ---
> but the fact that I can't immediately decide which of those is right
> makes me feel that it's still too squishy.

I think we shouldn't be doing that at all. The whole business of
transferring the relation-is-invalid information from the relation to
a pg_class flag seems like a Rube Goldberg device to me. I'm still
not convinced that we should have a relation-is-invalid flag at all,
but can we at least not have two?

It seems perfectly adequate to detect that the MV is invalid when we
actually try to execute a plan - that is, when we first access the
heap or one of its indexes. So the bit can just live in the
file-on-disk, and there's no need to have a second copy of it in
pg_class.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Noah Misch <noah(at)leadboat(dot)com>
To: Kevin Grittner <kgrittn(at)mail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-01-24 18:07:54
Message-ID: 20130124180754.GB2448@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Thu, Jan 17, 2013 at 07:54:55AM -0500, Robert Haas wrote:
> On Wed, Jan 16, 2013 at 1:38 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Where I really need someone to hit me upside the head with a
> >> clue-stick is the code I added to the bottom of RelationBuildDesc()
> >> in relcache.c. The idea is that on first access to an unlogged MV,
> >> to detect that the heap has been replaced by the init fork, set
> >> relisvalid to false, and make the heap look normal again.
> >
> > Hmm. I agree that relcache.c has absolutely no business doing that,
> > but not sure what else to do instead. Seems like it might be better
> > done at first touch of the MV in the parser, rewriter, or planner ---
> > but the fact that I can't immediately decide which of those is right
> > makes me feel that it's still too squishy.
>
> I think we shouldn't be doing that at all. The whole business of
> transferring the relation-is-invalid information from the relation to
> a pg_class flag seems like a Rube Goldberg device to me. I'm still
> not convinced that we should have a relation-is-invalid flag at all,
> but can we at least not have two?
>
> It seems perfectly adequate to detect that the MV is invalid when we
> actually try to execute a plan - that is, when we first access the
> heap or one of its indexes. So the bit can just live in the
> file-on-disk, and there's no need to have a second copy of it in
> pg_class.

Like Kevin, I want a way to distinguish unpopulated MVs from MVs that
genuinely yielded the empty set at last refresh. I agree that there's no
particular need to store that fact in pg_class, and I would much prefer only
storing it one way in any case. A user-visible disadvantage of the current
implementation is that relisvalid remains stale until something opens the rel.
That's fine for the system itself, but it can deceive user-initiated catalog
queries. Imagine a check_postgres action that looks for invalid MVs to
complain about. It couldn't just scan pg_class; it would need to first do
something that opens every MV.

I suggest the following:

1. Let an invalid MV have a zero-length heap. Distinguish a valid, empty MV
by giving it a page with no tuples. This entails VACUUM[1] not truncating
MVs below one page and the refresh operation, where necessary, extending
the relation from zero pages to one.
2. Remove pg_class.relisvalid.
3. Add a bool field to RelationData. The word "valid" is used in that context
to refer to the validity of the structure itself, so perhaps call the new
field rd_scannable. RelationIsFlaggedAsValid() can become a macro;
consider changing its name for consistency with the field name.
4. During relcache build, set the field to "RelationGetNumberBlocks(rel) != 0"
for MVs, fixed "true" for everyone else. Any operation that changes the
field must, and probably would anyway, instigate a relcache invalidation.
5. Expose a database function, say pg_relation_scannable(), for querying the
current state of a relation. This supports user-level monitoring.

Does that seem reasonable? One semantic difference to keep in mind is that
unlogged MVs will be considered invalid on the standby while valid on the
master. That's essentially an accurate report, so I won't mind it.

For the benefit of the archives, I note that we almost need not truncate an
unlogged materialized view during crash recovery. MVs are refreshed in a
VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's
pg_class to that relfilenode. When a crash occurs with no refresh in flight,
the latest refresh had been safely synced. When a crash cuts short a refresh,
the pg_class update will not stick, and the durability of the old heap is not
in doubt. However, non-btree index builds don't have the same property; we
would need to force an immediate sync of the indexes to be safe here. It
would remain necessary to truncate unlogged MVs when recovering a base backup,
which may contain a partially-written refresh that did eventually commit.
Future MV variants that modify the MV in place would also need the usual
truncate on crash.

I'm going to follow this with a review covering a broader range of topics.

Thanks,
nm

[1] For the time being, it's unfortunate to VACUUM materialized views at all;
they only ever bear frozen tuples.


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-16 01:01:10
Message-ID: 1360976470.79081.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> wrote:

> Like Kevin, I want a way to distinguish unpopulated MVs from MVs that
> genuinely yielded the empty set at last refresh.  I agree that there's no
> particular need to store that fact in pg_class, and I would much prefer only
> storing it one way in any case.  A user-visible disadvantage of the current
> implementation is that relisvalid remains stale until something opens the rel.
> That's fine for the system itself, but it can deceive user-initiated catalog
> queries.  Imagine a check_postgres action that looks for invalid MVs to
> complain about.  It couldn't just scan pg_class; it would need to first do
> something that opens every MV.
>
> I suggest the following:
>
> 1. Let an invalid MV have a zero-length heap.  Distinguish a valid, empty MV
>    by giving it a page with no tuples.  This entails VACUUM[1] not truncating
>    MVs below one page and the refresh operation, where necessary, extending
>    the relation from zero pages to one.
> 2. Remove pg_class.relisvalid.
> 3. Add a bool field to RelationData.  The word "valid" is used in that context
>    to refer to the validity of the structure itself, so perhaps call the new
>    field rd_scannable.  RelationIsFlaggedAsValid() can become a macro;
>    consider changing its name for consistency with the field name.
> 4. During relcache build, set the field to "RelationGetNumberBlocks(rel) !=> 0"
>    for MVs, fixed "true" for everyone else.  Any operation that changes the
>    field must, and probably would anyway, instigate a relcache invalidation.
> 5. Expose a database function, say pg_relation_scannable(), for querying the
>    current state of a relation.  This supports user-level monitoring.
>
> Does that seem reasonable?  One semantic difference to keep in mind is that
> unlogged MVs will be considered invalid on the standby while valid on the
> master.  That's essentially an accurate report, so I won't mind it.

Changed to work pretty much as you suggested.

> I'm going to follow this with a review covering a broader range of topics.

Those issues addressed, too.  That includes the most egregious doc
problems you pointed out, but there still needs to be a thorough
review, and I expect to find a few more doc cleanup issues.

There was one minor syntax issue not addressed by Noah, nor much
discussed in general that I didn't want to just unilaterally
choose; but given that nobody seems to care that much I will put
forward a proposal and do it that way tomorrow if nobody objects.
Before this patch tables were the only things subject to
truncation, but now materialized views can also be truncated.  So
far we have been treating TABLE as a noise word in the truncate
command.  I assume we still want to allow tables to be truncated
with or without the word.  The question is what to do about
materialized views, and wheter both can be specified on a single
TRUNCATE statement.  I propose that we allow TABLE or MATERIALIZED
VIEW to be specified, or that part of the statement to be left out.
I propose that either type of object be allowed unless one or the
other is specified and the object to be truncated is not of that
kind.  So you could mix both kinds on one statement, so long as you
didn't specify either kind.

There is one odd aspect to pg_dump, but I think the way it is
behaving is the best way to handle it, although I invite other
opinions.  If you load from pg_dump output, it will try to
populated materialized views which were populated on dump, and
leave the ones which were not scannable because the contents had
not been generated in an empty and unscannable state on restore.
That much seems pretty obvious.  Where it gets  a little tricky is
if mva is generated with data, and mvb is generated based on mva.
Then mva is truncated.  Then you dump.  mvb was populated at the
time of the dump, but its contents can't be regenerated on restore
because mva is not scannable.  As the patch currently stands, you
get an error on the attempt to REFRESH mvb.  I think that's a good
thing, but I'm open to arguments to the contrary.

I don't have any handling in pg_dump for circular references among
materialized views, because I couldn't see how that could happen.
I'm not 100% sure that isn't just a failure of imagination on my
part, though.

The only other comment I know of that hasn't been addressed is
Simon's suggestion that I put in syntax for features which we might
implement in future releases.  I don't want to do that without the
usual community design and bike-shedding process, so syntax is only
implemented for implemented features.

I'm still waiting for final word (and a small patch?) from KaiGai
Kohei for the sepgsql part.

This patch does require an initdb because of a new function.

Unless something else comes up in review or I get feedback to the
contrary I plan to deal with the above-mentioned issues and commit
this within a week or two.

Thanks to Marko Tiikkaja, Robert Haas, Thom Brown, Simon Riggs,
KaiGai Kohei, and Noah Misch for the reviews and suggestions so
far, thanks to Robert for the initial cut at the docs, and big
thanks to Noah for helping me track down an elusive bug.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
matview-v4.patch.gz application/x-gzip 52.5 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-16 01:24:16
Message-ID: CA+TgmobM04k=OK1rUuEkK4i26_JSsiksmnEx5R4BnhN42=Xoow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> There is one odd aspect to pg_dump, but I think the way it is
> behaving is the best way to handle it, although I invite other
> opinions. If you load from pg_dump output, it will try to
> populated materialized views which were populated on dump, and
> leave the ones which were not scannable because the contents had
> not been generated in an empty and unscannable state on restore.
> That much seems pretty obvious. Where it gets a little tricky is
> if mva is generated with data, and mvb is generated based on mva.
> Then mva is truncated. Then you dump. mvb was populated at the
> time of the dump, but its contents can't be regenerated on restore
> because mva is not scannable. As the patch currently stands, you
> get an error on the attempt to REFRESH mvb. I think that's a good
> thing, but I'm open to arguments to the contrary.

Hmm, anything that means a dump-and-restore can fail seems like a bad
thing to me. There's nothing outrageous about that scenario. It's
arguable what state dump-and-restore should leave the new database in,
but I don't see why it shouldn't work. I predict we'll end up with
unhappy users if we leave it like this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-16 01:57:25
Message-ID: 20130216015725.GJ12030@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Fri, Feb 15, 2013 at 08:24:16PM -0500, Robert Haas wrote:
> On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> > There is one odd aspect to pg_dump, but I think the way it is
> > behaving is the best way to handle it, although I invite other
> > opinions. If you load from pg_dump output, it will try to
> > populated materialized views which were populated on dump, and
> > leave the ones which were not scannable because the contents had
> > not been generated in an empty and unscannable state on restore.
> > That much seems pretty obvious. Where it gets a little tricky is
> > if mva is generated with data, and mvb is generated based on mva.
> > Then mva is truncated. Then you dump. mvb was populated at the
> > time of the dump, but its contents can't be regenerated on restore
> > because mva is not scannable. As the patch currently stands, you
> > get an error on the attempt to REFRESH mvb. I think that's a good
> > thing, but I'm open to arguments to the contrary.
>
> Hmm, anything that means a dump-and-restore can fail seems like a bad
> thing to me. There's nothing outrageous about that scenario. It's
> arguable what state dump-and-restore should leave the new database in,
> but I don't see why it shouldn't work. I predict we'll end up with
> unhappy users if we leave it like this.

pg_upgrade is going to fail on that pg_restore error. :-(

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-16 17:53:14
Message-ID: 1361037194.53579.YahooMailNeo@web162902.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:

>> There is one odd aspect to pg_dump, but I think the way it is
>> behaving is the best way to handle it, although I invite other
>> opinions.  If you load from pg_dump output, it will try to
>> populated materialized views which were populated on dump, and
>> leave the ones which were not scannable because the contents had
>> not been generated in an empty and unscannable state on restore.
>> That much seems pretty obvious.  Where it gets  a little tricky is
>> if mva is generated with data, and mvb is generated based on mva.
>> Then mva is truncated.  Then you dump.  mvb was populated at the
>> time of the dump, but its contents can't be regenerated on restore
>> because mva is not scannable.  As the patch currently stands, you
>> get an error on the attempt to REFRESH mvb.  I think that's a good
>> thing, but I'm open to arguments to the contrary.
>
> Hmm, anything that means a dump-and-restore can fail seems like a bad
> thing to me.  There's nothing outrageous about that scenario.  It's
> arguable what state dump-and-restore should leave the new database in,
> but I don't see why it shouldn't work.  I predict we'll end up with
> unhappy users if we leave it like this.

Keeping in mind that mva may take hours to refresh, and mvb may
take only minutes once you have the data from mva, what behavior do
you think is preferable?

The alternatives I can think of are:

(1)  Force mva to refresh on restore, even though it was empty and
not scannable on dump.  This may delay completion of the restore
for an extended time.  It would leave both mva and mvb populated.

(2)  Populate mvb by using mva's query as a regular view.  This
would leave things in the same state as they were on dump, and
might possibly optimized to something faster than generating mva
and then mvb; but probably would not be much faster in most cases.

(3)  Change the failure to generate data for mvb in this case as a
WARNING rather than an ERROR.

(4)  Actually dump and restore data with COPY statements for
materialized views, rather than having the dump create REFRESH
statements.  The main down side of this, it seems to me, is that it
opens up materialized views to direct tinkering of contents via SQL
statements, which I was hoping to avoid.  Perhaps this can be
mitigated in some way.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-16 18:00:34
Message-ID: 1361037634.67590.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Fri, Feb 15, 2013 at 08:24:16PM -0500, Robert Haas wrote:
>> On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>>> There is one odd aspect to pg_dump, but I think the way it is
>>> behaving is the best way to handle it, although I invite other
>>> opinions.  If you load from pg_dump output, it will try to
>>> populated materialized views which were populated on dump, and
>>> leave the ones which were not scannable because the contents had
>>> not been generated in an empty and unscannable state on restore.
>>> That much seems pretty obvious.  Where it gets  a little tricky is
>>> if mva is generated with data, and mvb is generated based on mva.
>>> Then mva is truncated.  Then you dump.  mvb was populated at the
>>> time of the dump, but its contents can't be regenerated on restore
>>> because mva is not scannable.  As the patch currently stands, you
>>> get an error on the attempt to REFRESH mvb.  I think that's a good
>>> thing, but I'm open to arguments to the contrary.
>>
>> Hmm, anything that means a dump-and-restore can fail seems like a bad
>> thing to me.  There's nothing outrageous about that scenario.  It's
>> arguable what state dump-and-restore should leave the new database in,
>> but I don't see why it shouldn't work.  I predict we'll end up with
>> unhappy users if we leave it like this.
>
> pg_upgrade is going to fail on that pg_restore error.  :-(

With the hard link option it should succeed, I would think.  If we
arranged for the check option, when run without the hard link
option, to report such cases so that the user could choose to
either truncate mvb or refresh mva before the upgrade, would that
satisfy this concern?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Noah Misch <noah(at)leadboat(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-16 19:46:45
Message-ID: 20130216194645.GA31264@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Sat, Feb 16, 2013 at 09:53:14AM -0800, Kevin Grittner wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> >> There is one odd aspect to pg_dump, but I think the way it is
> >> behaving is the best way to handle it, although I invite other
> >> opinions.  If you load from pg_dump output, it will try to
> >> populated materialized views which were populated on dump, and
> >> leave the ones which were not scannable because the contents had
> >> not been generated in an empty and unscannable state on restore.
> >> That much seems pretty obvious.  Where it gets  a little tricky is
> >> if mva is generated with data, and mvb is generated based on mva.
> >> Then mva is truncated.  Then you dump.  mvb was populated at the
> >> time of the dump, but its contents can't be regenerated on restore
> >> because mva is not scannable.  As the patch currently stands, you
> >> get an error on the attempt to REFRESH mvb.  I think that's a good
> >> thing, but I'm open to arguments to the contrary.
> >
> > Hmm, anything that means a dump-and-restore can fail seems like a bad
> > thing to me.  There's nothing outrageous about that scenario.  It's
> > arguable what state dump-and-restore should leave the new database in,
> > but I don't see why it shouldn't work.  I predict we'll end up with
> > unhappy users if we leave it like this.

I agree that making the dump fail on this account is bad.

> Keeping in mind that mva may take hours to refresh, and mvb may
> take only minutes once you have the data from mva, what behavior do
> you think is preferable?
>
> The alternatives I can think of are:
>
> (1)  Force mva to refresh on restore, even though it was empty and
> not scannable on dump.  This may delay completion of the restore
> for an extended time.  It would leave both mva and mvb populated.

This is reasonable. If the user doesn't like it, he can presumably use an
edited dump list to remove the REFRESHes.

> (2)  Populate mvb by using mva's query as a regular view.  This
> would leave things in the same state as they were on dump, and
> might possibly optimized to something faster than generating mva
> and then mvb; but probably would not be much faster in most cases.

Interesting idea, but I don't think adding novel server behavior is justified
to achieve this.

> (3)  Change the failure to generate data for mvb in this case as a
> WARNING rather than an ERROR.

This is also fair. However, I think it's better to restore more valid MVs
(option 1) than fewer.

> (4)  Actually dump and restore data with COPY statements for
> materialized views, rather than having the dump create REFRESH
> statements.  The main down side of this, it seems to me, is that it
> opens up materialized views to direct tinkering of contents via SQL
> statements, which I was hoping to avoid.  Perhaps this can be
> mitigated in some way.

This is a door better left closed.

Overall, I recommend option 1. None of the options will furnish the desire of
every database, but the DBA can always tailor the outcome by replacing the
dumped REFRESH statements with his own. I'm not envisioning that MVs left
invalid for the long term will be a typical thing, anyway.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-17 15:13:27
Message-ID: 1361114007.83939.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Sat, Feb 16, 2013 at 09:53:14AM -0800, Kevin Grittner wrote:

> I agree that making the dump fail on this account is bad.

I would argue that this is an overstatement of the issue except for
restores that use the single-transaction switch and pg_upgrade
without the hard link option.  In all other cases, one could just
issue REFRESH statements after the dump successfully completed all
the other work.  But those two cases are important enough that the
issue must be seriously considered.

>> (1)  Force mva to refresh on restore, even though it was empty
>> and not scannable on dump.  This may delay completion of the
>> restore for an extended time.  It would leave both mva and mvb
>> populated.
>
> This is reasonable.  If the user doesn't like it, he can
> presumably use an edited dump list to remove the REFRESHes.

> Overall, I recommend option 1.

I'm OK with that approach, and in the absence of anyone pushing for
another direction, will make that change to pg_dump.  I'm thinking
I would only do this for materialized views which were not
scannable, but which cause REFRESH failures on other materialized
views if not refreshed first (recursively evaluated), rather than
just automatically refreshing all MVs on restore.  The reason this
seems important is that some MVs may take a long time to refresh,
and a user might want a dump/restore to get to a point where they
can use the rest of the database while building the contents of
some MVs in the background or during off hours.

> None of the options will furnish the desire of every database,

Agreed.

> but the DBA can always tailor the outcome by replacing the dumped
> REFRESH statements with his own.

... or by issuing TRUNCATE or REFRESH statements before the dump to
avoid the issue.

> I'm not envisioning that MVs left invalid for the long term will
> be a typical thing, anyway.

Agreed.  I think this will be an infrequent issue caused by unusual
user actions; but it would be bound to come up occasionally.

-Kevin


From: Thom Brown <thom(at)linux(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-17 20:23:57
Message-ID: CAA-aLv5K9LkdTjYbrB=QxqKDE4TA4nyvg_t1oc2ZF_yAfq7s3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 16 February 2013 01:01, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Unless something else comes up in review or I get feedback to the
> contrary I plan to deal with the above-mentioned issues and commit
> this within a week or two.

At the moment it's not possible to rename a column without using ALTER
TABLE on an MV.

Also, shouldn't we have the ability to set the collation on a column of the MV?

And the inconsistency between regular views and MVs is still present,
where MVs always dump with column parameters in their definition, and
views never do. Not a show-stopper, but curious nonetheless.

--
Thom


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-18 13:02:10
Message-ID: 20130218130210.GA4739@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner escribió:

> I'm OK with that approach, and in the absence of anyone pushing for
> another direction, will make that change to pg_dump.  I'm thinking
> I would only do this for materialized views which were not
> scannable, but which cause REFRESH failures on other materialized
> views if not refreshed first (recursively evaluated), rather than
> just automatically refreshing all MVs on restore.  The reason this
> seems important is that some MVs may take a long time to refresh,
> and a user might want a dump/restore to get to a point where they
> can use the rest of the database while building the contents of
> some MVs in the background or during off hours.

Maybe it would be a good idea to try to put such commands at the very
end of the dump, if possible.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-18 14:49:14
Message-ID: 1361198954.52351.YahooMailNeo@web162905.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> Kevin Grittner escribió:
>
>> I'm OK with that approach, and in the absence of anyone pushing
>> for another direction, will make that change to pg_dump.  I'm
>> thinking I would only do this for materialized views which were
>> not scannable, but which cause REFRESH failures on other
>> materialized views if not refreshed first (recursively
>> evaluated), rather than just automatically refreshing all MVs on
>> restore.  The reason this seems important is that some MVs may
>> take a long time to refresh, and a user might want a
>> dump/restore to get to a point where they can use the rest of
>> the database while building the contents of some MVs in the
>> background or during off hours.
>
> Maybe it would be a good idea to try to put such commands at the
> very end of the dump, if possible.

Here is the dump order as currently implemented in that patch.  MVs
are created at the same priority as tables and views.  MV REFRESH
and MV index builds obviously need to follow population of table
data. These are at the same priority because it makes the most
sense to populated an MV without any indexes and then build them
before the MV is used to populate some other MV.  Dependency
information is used to get that to sort properly within the
priority level.

    1,                            /* DO_NAMESPACE */
    2,                            /* DO_PROCLANG */
    3,                            /* DO_COLLATION */
    4,                            /* DO_EXTENSION */
    5,                            /* DO_TYPE */
    5,                            /* DO_SHELL_TYPE */
    6,                            /* DO_FUNC */
    7,                            /* DO_AGG */
    8,                            /* DO_OPERATOR */
    9,                            /* DO_OPCLASS */
    9,                            /* DO_OPFAMILY */
    10,                            /* DO_CAST */
    11,                            /* DO_CONVERSION */
    12,                            /* DO_TSPARSER */
    13,                            /* DO_TSTEMPLATE */
    14,                            /* DO_TSDICT */
    15,                            /* DO_TSCONFIG */
    16,                            /* DO_FDW */
    17,                            /* DO_FOREIGN_SERVER */
    18,                            /* DO_TABLE */
    19,                            /* DO_DUMMY_TYPE */
    20,                            /* DO_ATTRDEF */
    21,                            /* DO_BLOB */
    22,                            /* DO_PRE_DATA_BOUNDARY */
    23,                            /* DO_TABLE_DATA */
    24,                            /* DO_BLOB_DATA */
    25,                            /* DO_POST_DATA_BOUNDARY */
    26,                            /* DO_CONSTRAINT */
    27,                            /* DO_INDEX */
    28,                            /* DO_REFRESH_MATVIEW */
    28                             /* DO_MATVIEW_INDEX */
    29,                            /* DO_RULE */
    30,                            /* DO_TRIGGER */
    31,                            /* DO_FK_CONSTRAINT */
    32,                            /* DO_DEFAULT_ACL */
    33,                            /* DO_EVENT_TRIGGER */

I don't think that pushing MV refreshes and index creation farther
down the list should require anything beyond adjusting the priority
numbers.  I don't see a problem pushing them to the end.  Does
anyone else see anything past priority 28 that MV population should
*not* follow?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-18 15:18:34
Message-ID: 1361200714.37038.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Thom Brown <thom(at)linux(dot)com> wrote:
> On 16 February 2013 01:01, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>> Unless something else comes up in review or I get feedback to
>> the contrary I plan to deal with the above-mentioned issues and
>> commit this within a week or two.
>
> At the moment it's not possible to rename a column without using
> ALTER TABLE on an MV.
>
> Also, shouldn't we have the ability to set the collation on a
> column of the MV?

Will fix.

> And the inconsistency between regular views and MVs is still
> present, where MVs always dump with column parameters in their
> definition, and views never do.  Not a show-stopper, but curious
> nonetheless.

I haven't worried about this because current behavior generates
correct results -- this seems like a micro-optimization.  The
explanation for why it wound up that way is that creating a
materialized view is in many ways more like creating a table than
like creating a view -- it seemed safer and less invasive to modify
the CREATE TABLE code than the CREATE VIEW code, and specifying
column names just fell out of that as part of the minimal change.
In looking at the pg_dump output, though, I see that the CMV AS
clause also is getting the names right with column-level aliases,
so it should be pretty simple and safe to leave off the
column-list section for MVs.  I guess it's worth it just to
forestall further questions on the topic.

Thanks!

-Kevin


From: Noah Misch <noah(at)leadboat(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-18 20:28:43
Message-ID: 20130218202843.GA28767@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Mon, Feb 18, 2013 at 06:49:14AM -0800, Kevin Grittner wrote:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> > Maybe it would be a good idea to try to put such commands at the
> > very end of the dump, if possible.

>     25,                            /* DO_POST_DATA_BOUNDARY */
>     26,                            /* DO_CONSTRAINT */
>     27,                            /* DO_INDEX */
>     28,                            /* DO_REFRESH_MATVIEW */
>     28                             /* DO_MATVIEW_INDEX */
>     29,                            /* DO_RULE */
>     30,                            /* DO_TRIGGER */
>     31,                            /* DO_FK_CONSTRAINT */
>     32,                            /* DO_DEFAULT_ACL */
>     33,                            /* DO_EVENT_TRIGGER */
>
> I don't think that pushing MV refreshes and index creation farther
> down the list should require anything beyond adjusting the priority
> numbers.  I don't see a problem pushing them to the end.  Does
> anyone else see anything past priority 28 that MV population should
> *not* follow?

DO_EVENT_TRIGGER should remain last; it may change the behavior of nearly any
other command.

Moving DO_REFRESH_MATVIEW past DO_TRIGGER would affect the outcome when the MV
calls functions that ultimately trip triggers or rules. Currently, the
behavior will be the same as for CHECK constraints: the rules and triggers
don't exist yet. This may also affect, for the better, MVs referencing views
that need the CREATE TABLE ... CREATE RULE _RETURN restoration pathway. It
looks like a positive change. On the flip side, I wonder if there's some case
I'm not considering where it's important to delay restoring rules and/or
triggers until after restoring objects for which restoration can entail calls
to arbitrary user functions.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-18 21:48:05
Message-ID: 1361224085.82835.YahooMailNeo@web162903.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Mon, Feb 18, 2013 at 06:49:14AM -0800, Kevin Grittner wrote:
>> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>>> Maybe it would be a good idea to try to put such commands at
>>> the very end of the dump, if possible.
>
>>     25,                            /* DO_POST_DATA_BOUNDARY */
>>     26,                            /* DO_CONSTRAINT */
>>     27,                            /* DO_INDEX */
>>     28,                            /* DO_REFRESH_MATVIEW */
>>     28                             /* DO_MATVIEW_INDEX */
>>     29,                            /* DO_RULE */
>>     30,                            /* DO_TRIGGER */
>>     31,                            /* DO_FK_CONSTRAINT */
>>     32,                            /* DO_DEFAULT_ACL */
>>     33,                            /* DO_EVENT_TRIGGER */
>>
>> I don't think that pushing MV refreshes and index creation
>> farther down the list should require anything beyond adjusting
>> the priority numbers.  I don't see a problem pushing them to the
>> end.  Does anyone else see anything past priority 28 that MV
>> population should *not* follow?
>
> DO_EVENT_TRIGGER should remain last; it may change the behavior
> of nearly any other command.
>
> Moving DO_REFRESH_MATVIEW past DO_TRIGGER would affect the
> outcome when the MV calls functions that ultimately trip triggers
> or rules. Currently, the behavior will be the same as for CHECK
> constraints: the rules and triggers don't exist yet.  This may
> also affect, for the better, MVs referencing views that need the
> CREATE TABLE ... CREATE RULE _RETURN restoration pathway.  It
> looks like a positive change.  On the flip side, I wonder if
> there's some case I'm not considering where it's important to
> delay restoring rules and/or triggers until after restoring
> objects for which restoration can entail calls to arbitrary user
> functions.

I didn't quite follow all of Noah's points or their implications,
so we chatted off-list.  He made a couple additional observations
which allow some simplification of the patch, and allow MV REFRESH
to be moved to the very end of the priority list without ill
effect.

(1)  While it might be incorrect for the CREATE INDEX on a
materialized view to come after event triggers are set up, REFRESH
can be expected to be a routine action in the presence of such
triggers, and it might actually be incorrect to REFRESH when the
triggers are not present.

(2)  REFRESH MATERIALIZED VIEW creates and builds a new heap, and
reindexes it after the data has been loaded, so the timing of the
CREATE INDEX statements on MVs is not critical, as long as they are
done after the CREATE and before the REFRESH.  We could drop them
into the same priority as the other CREATE INDEX statements, and it
would not be a big deal because the MVs would be empty.

This should allow me to simplify the code a little bit and move the
RMV step to the very end.  That may have some advantages when users
want to start using the database while MVs are being populated.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-19 13:54:34
Message-ID: CA+TgmoahDmiGRSFVbTv=_ZyMDJD4a2DeixRV-fbiAmqdfO91SA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Mon, Feb 18, 2013 at 4:48 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> This should allow me to simplify the code a little bit and move the
> RMV step to the very end. That may have some advantages when users
> want to start using the database while MVs are being populated.

In the department of crazy ideas, what about having pg_dump NEVER
refresh ANY materialized views?

It's true that the job of pg_dump and pg_restore is to put the new
database in the same state that the old database was in, but I think
you could make a reasonable case that materialized views ought to be
an exception. After all, even with all of this infrastructure,
chances are pretty good that the new MV contents won't end up being
the same as the old MV contents on the old server - because the old
MVs could easily have been stale. So why not just get the restore
over with as fast as possible, and then let the user refresh the MVs
that they think need refreshing (perhaps after getting the portions of
their system that don't rely on MVs up and running)?

At the very least, I think we ought to have an option for this
behavior. But the more I think about it, the more I think maybe it
ought to be the default.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-19 14:24:28
Message-ID: CAP-rdTY1XZKEv1_zfAHP3cK4AVYOz+pAtd3Xmi2Cb-GPWvniBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

2013/2/19 Robert Haas <robertmhaas(at)gmail(dot)com>:

> In the department of crazy ideas, what about having pg_dump NEVER
> refresh ANY materialized views?
>
> It's true that the job of pg_dump and pg_restore is to put the new
> database in the same state that the old database was in, but I think
> you could make a reasonable case that materialized views ought to be
> an exception. After all, even with all of this infrastructure,
> chances are pretty good that the new MV contents won't end up being
> the same as the old MV contents on the old server - because the old
> MVs could easily have been stale. So why not just get the restore
> over with as fast as possible, and then let the user refresh the MVs
> that they think need refreshing (perhaps after getting the portions of
> their system that don't rely on MVs up and running)?
>
> At the very least, I think we ought to have an option for this
> behavior. But the more I think about it, the more I think maybe it
> ought to be the default.

+1 from me from a minimalist point of view.

I think of a matview of the manually refreshed kind as “can contain
stale contents (or be invalid) unless someone manually makes sure it
is up to date (or valid)”. Making any matviews invalid by default upon
restoring (itself being a manual action) would be consistent with that
definition. Additionally, ISTM to be the least arbitrary (and hence
most elegant) choice, and even more so in the context of
matviews-depending-on-matviews.

Spamming some more craziness:

Another (more elaborate) suggestion could be: Store for each matview
whether it is to be rebuilt upon restore or not. Using this setting
would intuitively mean something like “I consider this matview being
valid a precondition for considering the database state valid.”
Setting this to true for a matview would only be allowed when any
other matviews on which it depends also have this setting set to true.

Just my €0.02 of course.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-19 16:01:36
Message-ID: 5123A1E0.70405@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2/19/13 8:54 AM, Robert Haas wrote:
> In the department of crazy ideas, what about having pg_dump NEVER
> refresh ANY materialized views?

It might be useful to have an option for this, but I don't think it
should be the default. The default should be that the new database is
"ready to go".

Then again, when would you ever actually use that option?

This might be different if there were a command to refresh all
materialized views, because you don't want to have to go around and type
separate commands 47 times after a restore.


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Kevin Grittner" <kgrittn(at)ymail(dot)com>
Cc: "Noah Misch" <noah(at)leadboat(dot)com>, "Kevin Grittner" <kgrittn(at)mail(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Marko Tiikkaja" <pgmail(at)joh(dot)to>, "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-19 22:09:13
Message-ID: 25965310c10b4d07bab2fd3c880135a3.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Sat, February 16, 2013 02:01, Kevin Grittner wrote:
> matview-v4.patch.gz

Hi,

I was wondering if material views should not go into information_schema. I was thinking either
.views or .tables. Have you considered this?

I ask because as far as I can see querying for mv's has to go like this:

SELECT n.nspname, c.relname
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('m','')
and n.nspname = 'myschema'

which seems rather ugly.

Also, some documentation typos: please see attached.

Thanks,

Erik Rijkers

Attachment Content-Type Size
matviewtypos.diff application/octet-stream 1.7 KB

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2013-02-19 22:18:24
Message-ID: 5123FA30.8080006@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 02/19/2013 02:09 PM, Erik Rijkers wrote:
> On Sat, February 16, 2013 02:01, Kevin Grittner wrote:
>> matview-v4.patch.gz
>
> Hi,
>
> I was wondering if material views should not go into information_schema. I was thinking either
> .views or .tables. Have you considered this?
>
> I ask because as far as I can see querying for mv's has to go like this:
>
> SELECT n.nspname, c.relname
> FROM pg_catalog.pg_class c
> LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
> WHERE c.relkind IN ('m','')
> and n.nspname = 'myschema'

Well, I'm not sure about information_schema, but we'll definitely want a
pg_matviews system view. Also a \dM. That could wait until 9.4, though.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: David Fetter <david(at)fetter(dot)org>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-19 22:22:26
Message-ID: 20130219222226.GD5670@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Tue, Feb 19, 2013 at 11:09:13PM +0100, Erik Rijkers wrote:
> On Sat, February 16, 2013 02:01, Kevin Grittner wrote:
> > matview-v4.patch.gz
>
> Hi,
>
> I was wondering if material views should not go into information_schema. I was thinking either
> .views or .tables. Have you considered this?

I'm guessing it'd be .views if anything. Haven't been able to
decipher from section 11 of the standard (Schemata) whether the
standard has anything to say on the matter.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-19 22:30:52
Message-ID: 1361313052.97060.YahooMailNeo@web162901.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Erik Rijkers <er(at)xs4all(dot)nl> wrote:

> I was wondering if material views should not go into
> information_schema.  I was thinking either .views or .tables.
> Have you considered this?

I had not considered this to be a good idea because
information_schema is defined by the standard, and materialized
views are an extension to the standard.  Someone using these views
to identify either tables or views might make a bad choice based on
this.  I'm open to arguments for inclusion, if you think it would
not violate the standard.  Which would be safe?

> Also, some documentation typos: please see attached.

Will apply.  Thanks.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-19 22:38:24
Message-ID: 1361313504.17714.YahooMailNeo@web162901.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> Well, I'm not sure about information_schema, but we'll definitely
> want a pg_matviews system view.

> That could wait until 9.4, though.

That I could probably do.  Do you think they should have a separate
pg_stat_user_matviews table, etc., or do you think it would be
better to include them in with tables there?

> Also a \dM.

I already added it as \dm in the current patch.  Does that conflict
with something else that's pending?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-19 22:45:10
Message-ID: 51240076.3060108@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers


> That I could probably do. Do you think they should have a separate
> pg_stat_user_matviews table, etc., or do you think it would be
> better to include them in with tables there?

Well, ideally pg_matviews would have matview definitions, and
pg_stat_matviews would have stats on matview usage and rows. But see
what you can get done; I imagine we'll overhaul it for 9.4 anyway once
we've had a chance to use the feature.

>
>> Also a \dM.
>
> I already added it as \dm in the current patch. Does that conflict
> with something else that's pending?

Oh, no, I thought \dm was *already* in use, but apparently not.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-19 22:47:24
Message-ID: 1361314044.27824.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com>

> There was one minor syntax issue not addressed by Noah, nor much
> discussed in general that I didn't want to just unilaterally
> choose; but given that nobody seems to care that much I will put
> forward a proposal and do it that way tomorrow if nobody objects.
> Before this patch tables were the only things subject to
> truncation, but now materialized views can also be truncated.  So
> far we have been treating TABLE as a noise word in the truncate
> command.  I assume we still want to allow tables to be truncated
> with or without the word.  The question is what to do about
> materialized views, and wheter both can be specified on a single
> TRUNCATE statement.  I propose that we allow TABLE or MATERIALIZED
> VIEW to be specified, or that part of the statement to be left out.
> I propose that either type of object be allowed unless one or the
> other is specified and the object to be truncated is not of that
> kind.  So you could mix both kinds on one statement, so long as you
> didn't specify either kind.

When I went to do this, I hit a shift/reduce conflict, because with
TABLE being optional it couldn't tell whether:

TRUNCATE MATERIALIZED VIEW x, y, z;

... was looking for five relations or three.  That goes away with
MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that OK?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-19 23:32:51
Message-ID: 1361316771.19892.YahooMailNeo@web162905.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> wrote:

>> That I could probably do.  Do you think they should have a separate
>> pg_stat_user_matviews table, etc., or do you think it would be
>> better to include them in with tables there?
>
> Well, ideally pg_matviews would have matview definitions, and
> pg_stat_matviews would have stats on matview usage and rows.  But see
> what you can get done; I imagine we'll overhaul it for 9.4 anyway once
> we've had a chance to use the feature.

I agree on pg_matviews, but after looking over the existing views
and thinking about what I would use them for as a DBA, I'm inclined
to fold the backing tables for MVs into the _stat_ and _statio_
views -- especially since we already include the backing tables and
indexes for TOAST.  There is a precident for including
implementation details at that level.  The only difference from
TOAST, is that I include the heap and indexes for MVs in the _user_
views.  I'm attaching the patch for just the system_views.sql file
for discussion before I go write docs for this part.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
matview-system_views-v1.diff text/x-patch 2.7 KB

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-19 23:41:16
Message-ID: 1361317276.13105.YahooMailNeo@web162905.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:

> I'm attaching the patch for just the system_views.sql file
> for discussion before I go write docs for this part.

Meh.  If I'm gonna have pg_matviews I might as well include an
isscannable column.  v2 attached.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
matview-system_views-v2.diff text/x-patch 2.7 KB

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 00:07:08
Message-ID: 512413AC.7040901@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 02/19/2013 03:41 PM, Kevin Grittner wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>
>> I'm attaching the patch for just the system_views.sql file
>> for discussion before I go write docs for this part.
>
> Meh. If I'm gonna have pg_matviews I might as well include an
> isscannable column. v2 attached.

pg_get_viewdef() will work on Matviews? Great.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 08:11:18
Message-ID: 23244.1361347878@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> When I went to do this, I hit a shift/reduce conflict, because with
> TABLE being optional it couldn't tell whether:

> TRUNCATE MATERIALIZED VIEW x, y, z;

> ... was looking for five relations or three. That goes away with
> MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD. Is that OK?

Not really. I would much rather see us not bother with this pedantic
syntax than introduce an even-partially-reserved word.

Having said that, I don't think I believe your analysis of why this
doesn't work. The presence or absence of commas ought to make the
syntax non-ambiguous, I would think. Maybe you just factored the
grammar wrong.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 11:13:49
Message-ID: CA+TgmoYXSEEpfUMKytf1a_AJyN1jPD3ZFfZ+faXYn+PwOtw1=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Tue, Feb 19, 2013 at 11:01 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On 2/19/13 8:54 AM, Robert Haas wrote:
>> In the department of crazy ideas, what about having pg_dump NEVER
>> refresh ANY materialized views?
>
> It might be useful to have an option for this, but I don't think it
> should be the default. The default should be that the new database is
> "ready to go".
>
> Then again, when would you ever actually use that option?

You'd use that option if you'd rather get the database mostly-up as
soon as possible, and then worry about the materialized views
afterwards.

> This might be different if there were a command to refresh all
> materialized views, because you don't want to have to go around and type
> separate commands 47 times after a restore.

Well, it's pretty easy to do:

SELECT 'LOAD MATERIALIZED VIEW ' || p.oid::regclass || ';' FROM
pg_class WHERE relkind = 'm';

...but we could also add explicit syntax for it, perhaps along the
lines of what we have for CLUSTER and VACUUM.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 11:43:42
Message-ID: 27998.1361360622@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Feb 19, 2013 at 11:01 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> This might be different if there were a command to refresh all
>> materialized views, because you don't want to have to go around and type
>> separate commands 47 times after a restore.

> Well, it's pretty easy to do:

> SELECT 'LOAD MATERIALIZED VIEW ' || p.oid::regclass || ';' FROM
> pg_class WHERE relkind = 'm';

> ...but we could also add explicit syntax for it, perhaps along the
> lines of what we have for CLUSTER and VACUUM.

It's not really that easy, because of the likelihood that MVs have to be
refreshed in a specific order. The SELECT you suggest definitely seems
too simplistic. A dedicated command could perhaps be built to pay
attention to dependencies ... but if we're still coding such things now,
it seems a bit late for 9.3.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: David Fetter <david(at)fetter(dot)org>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 14:29:45
Message-ID: 5124DDD9.1040800@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2/19/13 5:22 PM, David Fetter wrote:
> On Tue, Feb 19, 2013 at 11:09:13PM +0100, Erik Rijkers wrote:
>> On Sat, February 16, 2013 02:01, Kevin Grittner wrote:
>>> matview-v4.patch.gz
>>
>> Hi,
>>
>> I was wondering if material views should not go into information_schema. I was thinking either
>> .views or .tables. Have you considered this?
>
> I'm guessing it'd be .views if anything. Haven't been able to
> decipher from section 11 of the standard (Schemata) whether the
> standard has anything to say on the matter.

I suppose one should be able to expect that if one finds a view in the
information schema, then one should be able to use DROP VIEW to remove
it. Which in this case wouldn't work. So I don't think including a
materialized view under views or tables is appropriate.


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 15:24:14
Message-ID: 1361373854.33741.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
>> When I went to do this, I hit a shift/reduce conflict, because
>> with TABLE being optional it couldn't tell whether:
>
>> TRUNCATE MATERIALIZED VIEW x, y, z;
>
>> ... was looking for five relations or three.  That goes away
>> with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that
>> OK?
>
> Not really.  I would much rather see us not bother with this
> pedantic syntax than introduce an even-partially-reserved word.

I'm not sure it's worth it either; but two people requested it and
I didn't forsee this shift/reduce conflict, so I took a shot at it.
 If we can't eliminate the conflict, I'm fine with leaving things
as they are in the latest posted patch.

> Having said that, I don't think I believe your analysis of why
> this doesn't work.  The presence or absence of commas ought to
> make the syntax non-ambiguous, I would think.  Maybe you just
> factored the grammar wrong.

Well, it wouldn't be the first time you've seen a better way to do
something in flex than I was able to see.  Taking just the gram.y
part of the change which implemented this, and omitting the change
in reservedness of MATERIALIZED, I have:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 820cb41..1d393c5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -394,6 +394,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);

 %type <ival>   opt_column event cursor_options opt_hold opt_set_data
 %type <objtype>    reindex_type drop_type comment_type security_label_type
+               trunc_type

 %type <node>   fetch_args limit_clause select_limit_value
                offset_clause select_offset_value
@@ -5172,9 +5173,10 @@ attrs:       '.' attr_name
  *****************************************************************************/

 TruncateStmt:
-           TRUNCATE opt_table relation_expr_list opt_restart_seqs opt_drop_behavior
+           TRUNCATE trunc_type relation_expr_list opt_restart_seqs opt_drop_behavior
                {
                    TruncateStmt *n = makeNode(TruncateStmt);
+                   n->objtype = $2;
                    n->relations = $3;
                    n->restart_seqs = $4;
                    n->behavior = $5;
@@ -5182,6 +5184,12 @@ TruncateStmt:
                }
        ;

+trunc_type:
+           TABLE                       { $$ = OBJECT_TABLE; }
+           | MATERIALIZED VIEW         { $$ = OBJECT_MATVIEW; }
+           | /*EMPTY*/                 { $$ = OBJECT_UNSPECIFIED; }
+       ;
+
 opt_restart_seqs:
            CONTINUE_P IDENTITY_P       { $$ = false; }
            | RESTART IDENTITY_P        { $$ = true; }

I'm open to suggestions on a better way.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 15:28:25
Message-ID: 1361374105.73929.YahooMailNeo@web162903.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:

> I suppose one should be able to expect that if one finds a view
> in the information schema, then one should be able to use DROP
> VIEW to remove it.  Which in this case wouldn't work.  So I don't
> think including a materialized view under views or tables is
> appropriate.

Right.  I think adding pg_matviews covers the stated use-case
enough to answer Erik's concern.  I'm not going to mess with adding
non-standard stuff to the standard views.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 16:20:41
Message-ID: 7366.1361377241@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Having said that, I don't think I believe your analysis of why
>> this doesn't work.

> Well, it wouldn't be the first time you've seen a better way to do
> something in flex than I was able to see. Taking just the gram.y
> part of the change which implemented this, and omitting the change
> in reservedness of MATERIALIZED, I have:

> - TRUNCATE opt_table relation_expr_list opt_restart_seqs opt_drop_behavior
> + TRUNCATE trunc_type relation_expr_list opt_restart_seqs opt_drop_behavior

> +trunc_type:
> + TABLE { $$ = OBJECT_TABLE; }
> + | MATERIALIZED VIEW { $$ = OBJECT_MATVIEW; }
> + | /*EMPTY*/ { $$ = OBJECT_UNSPECIFIED; }
> + ;

Yeah, this is a standard gotcha when working with unreserved keywords.
You can't factor it like that because then the parser is required to
make a shift-reduce decision (on whether to reduce trunc_type to empty)
before it can "see past" the first word. So for instance given

TRUNCATE MATERIALIZED ...
^

the parser has to make that decision when it can't see past the word
"MATERIALIZED" and so doesn't know what comes after it.

The way to fix it is to not try to use the sub-production but spell it
all out:

TRUNCATE TABLE relation_expr_list ...
| TRUNCATE MATERIALIZED VIEW relation_expr_list ...
| TRUNCATE relation_expr_list ...

Now the parser doesn't have to make any shift-reduce decision until
after it can "see past" the first identifier. It's a bit tedious
but beats making a word more reserved than it has to be.

regards, tom lane


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 16:26:10
Message-ID: 1361377570.36972.YahooMailNeo@web162901.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> The way to fix it is to not try to use the sub-production but spell it
> all out:
>
>       TRUNCATE TABLE relation_expr_list ...
>     | TRUNCATE MATERIALIZED VIEW relation_expr_list ...
>     | TRUNCATE relation_expr_list ...
>
> Now the parser doesn't have to make any shift-reduce decision until
> after it can "see past" the first identifier.  It's a bit tedious
> but beats making a word more reserved than it has to be.

Thanks!  Will do.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 16:58:15
Message-ID: 512500A7.6000200@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2/20/13 6:13 AM, Robert Haas wrote:
>> It might be useful to have an option for this, but I don't think it
>> > should be the default. The default should be that the new database is
>> > "ready to go".
>> >
>> > Then again, when would you ever actually use that option?
> You'd use that option if you'd rather get the database mostly-up as
> soon as possible, and then worry about the materialized views
> afterwards.

Since the proposed materialized views are not available for implicit use
in query optimization, the only way an application would make use of
them is to access them directly. And if it accesses an unpopulated
materialized view, it would fail. So I don't think in the current state
a database is mostly-up without the materialized views filled in.

I can see the value in having a restore mode that postpones certain
nonessential operations, such as creating indexes or certain constraints
or even materialized views. But I think the boundaries and expectations
for that need to be defined more precisely. For example, a database
without constraints might be considered "ready for read-only use",
without secondary indexes it might be "ready for use but slow".


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Kevin Grittner" <kgrittn(at)ymail(dot)com>
Cc: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "David Fetter" <david(at)fetter(dot)org>, "Noah Misch" <noah(at)leadboat(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Marko Tiikkaja" <pgmail(at)joh(dot)to>, "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 17:14:48
Message-ID: 7eaefd1edb4db28832310b8013d5477b.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, February 20, 2013 16:28, Kevin Grittner wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>
>> I suppose one should be able to expect that if one finds a view
>> in the information schema, then one should be able to use DROP
>> VIEW to remove it.  Which in this case wouldn't work.  So I don't
>> think including a materialized view under views or tables is
>> appropriate.
>
> Right.  I think adding pg_matviews covers the stated use-case
> enough to answer Erik's concern. 

Absolutely - I agree pg_matviews is much better than adding deviating information_schema stuff.

Thank you,

Erik Rijkers


From: Greg Stark <stark(at)mit(dot)edu>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 17:59:59
Message-ID: CAM-w4HNwDzC+f9RapEw6-Su9BmtGi6WUaaJP54cN1J3gyapMFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Feb 20, 2013 at 4:26 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>
>> The way to fix it is to not try to use the sub-production but spell it
>> all out:
>>
>> TRUNCATE TABLE relation_expr_list ...
>> | TRUNCATE MATERIALIZED VIEW relation_expr_list ...
>> | TRUNCATE relation_expr_list ...
>>
>> Now the parser doesn't have to make any shift-reduce decision until
>> after it can "see past" the first identifier. It's a bit tedious
>> but beats making a word more reserved than it has to be.
>
> Thanks! Will do.

Fwiw I think worrying about stuff like this at this point is probably
a waste of time. There'll be a period of bike-shedding where people
debate what the command should be called so worrying about parser
conflicts before there's a consensus is kind pointless.

I would like to know what operations you plan to support independently
of the command names. I may have missed much earlier in the discussion
but then I suspect things have evolved since then.

It sounds like you want to support:

1) Selecting from materialized viws
2) Manually refreshing materialized views
3) Manually truncating materialized views

And explicitly not support

1) Automatically rewriting queries to select from matching views
2) Incrementally refreshing materialized views
3) Manual DML against data in materialized views (except truncate
which is kind of DDL)
4) Keeping track of whether the data in the materialized view is up to date

I have to say I find this model a bit odd. It seems the UI you're
presenting is that they're basically read-only tables that the
database will fill in the data for automatically. My mental model of
materialized views is that they're basically views that the database
guarantees a different performance characteristic for.

I would expect a materialized view to be up to date all the time. If
we don't support incremental updates (which seems like a fine thing
not to support in a first cut) then I would expect any DML against the
table to mark the view invalid and any queries against it to produce
an error (or possibly go to the source tables using the view
definition but that's probably a bad idea for most use cases). Ie.
they should behave like a view at all times and have up to date
information or fail entirely.

I would expect a command like TRUNCATE MATERIALIZED VIEW to exist but
I would expect it to be called something like INVALIDATE rather than
TRUNCATE and dropping the storage is a side effect of simply telling
the database that it doesn't need to maintain this materialized view.
Though I could be convinced "truncate" is a good name as long as it's
documented well.

--
greg


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 18:28:23
Message-ID: 1361384903.52749.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>>> The way to fix it is to not try to use the sub-production but
>>> spell it all out:
>>>
>>>       TRUNCATE TABLE relation_expr_list ...
>>>     | TRUNCATE MATERIALIZED VIEW relation_expr_list ...
>>>     | TRUNCATE relation_expr_list ...
>>>
>>> Now the parser doesn't have to make any shift-reduce decision
>>> until after it can "see past" the first identifier.  It's a bit
>>> tedious but beats making a word more reserved than it has to
>>> be.
>>
>> Thanks!  Will do.
>
> Fwiw I think worrying about stuff like this at this point is
> probably a waste of time. There'll be a period of bike-shedding
> where people debate what the command should be called so worrying
> about parser conflicts before there's a consensus is kind
> pointless.

That sort of bikeshedding already happened three months ago.  Too
late now.

> I would like to know what operations you plan to support
> independently of the command names. I may have missed much
> earlier in the discussion but then I suspect things have evolved
> since then.
>
> It sounds like you want to support:
>
> 1) Selecting from materialized viws
> 2) Manually refreshing materialized views
> 3) Manually truncating materialized views
>
> And explicitly not support
>
> 1) Automatically rewriting queries to select from matching views
> 2) Incrementally refreshing materialized views

Those are material for later releases, building on the base of what
goes into this release.

> 3) Manual DML against data in materialized views (except truncate
> which is kind of DDL)

There is quite a lot of DML allowed -- changing tablespace,
changing schema, changing name of the MV or of individual columns
in it, changing statistics targets, creating indexes, and other
operations are supported.

> 4) Keeping track of whether the data in the materialized view is
> up to date

Only keeping track of whether data has been populated or not, for
now.  There has been agreement that one or more timestamps relating
to freshness will make sense, but these are not in the initial
patch.

> I have to say I find this model a bit odd.

It's not a model, it's a starting point.  Several people have
already said that even this much is useful and they expect to take
advantage of it.  I'm doing what I can to not paint us into a
corner where it's hard to extend to all the features everyone
dreams of, but if we waited for that to commit something, it will
never happen.

> I would expect a materialized view to be up to date all the time.

I expect that this will eventually be an option, but I expect that
is will be a seldom-used one.  Most cases that I've seen, people
want summary data that is reasonably up-to-date without unduly
affecting the performance of incremental changes to the underlying
data.  I've sketched out the roadmap from this patch to all of
these options in a vauge, handwavy fashion, and don't have a lot of
interest in taking it farther until we're past 9.3 beta.

> If we don't support incremental updates (which seems like a fine
> thing not to support in a first cut) then I would expect any DML
> against the table to mark the view invalid and any queries
> against it to produce an error (or possibly go to the source
> tables using the view definition but that's probably a bad idea
> for most use cases). Ie. they should behave like a view at all
> times and have up to date information or fail entirely.

That would render them completely useless for the use-cases I've
seen.  If you want to offer a patch to do that as an option, feel
free, but I will strongly argue against that as unconditional
behavior.

> I would expect a command like TRUNCATE MATERIALIZED VIEW to exist
> but I would expect it to be called something like INVALIDATE
> rather than TRUNCATE and dropping the storage is a side effect of
> simply telling the database that it doesn't need to maintain this
> materialized view. Though I could be convinced "truncate" is a
> good name as long as it's documented well.

I'm trying to minimize the number of new keywords.  The initial
patch only added MATERIALIZED.  I added REFRESH due to
near-universal demand for something other than the LOAD
MATERIALIZED VIEW I initially used.  Have you seen the statistics
Tom gave out on how much the size of the executable bloats with
every new keyword?  Until now nobody has expressed concern about
TRUNCATE MATERIALIZED VIEW, so it would take quite a groundswell of
concern at this point to even consider a new keyword for this
functionality this late in the game.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 18:33:20
Message-ID: 512516F0.5040501@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers


> And explicitly not support
>
> 1) Automatically rewriting queries to select from matching views
> 2) Incrementally refreshing materialized views
> 3) Manual DML against data in materialized views (except truncate
> which is kind of DDL)
> 4) Keeping track of whether the data in the materialized view is up to date

The idea is to add the above features over the next few versions of
Postgres.

> I have to say I find this model a bit odd. It seems the UI you're
> presenting is that they're basically read-only tables that the
> database will fill in the data for automatically.

This is what matviews are in other DBMSes.

> My mental model of
> materialized views is that they're basically views that the database
> guarantees a different performance characteristic for.

How would we do that, exactly? That would be lovely, but unless you
have a way to accomplish it ...

> I would expect a materialized view to be up to date all the time.

Actually, there's a huge use case for asynchronously updated matviews,
so we would not want an implementation which ruled them out. Also
there's the argument that synchronously updated matviews have little
actual performance advantage over regular dynamic views.

Or to put it another way: I could use this feature, as it is, in about 8
different projects I'm currently supporting. I personally can't think
of a single project where I need synchronously updated matviews,
currently. I have in the past, but it's a LOT less frequent that the
desire for async, just as the desire for async replication is more
common than the desire for syncrep.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 18:57:36
Message-ID: 51251CA0.9070203@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2/19/13 5:47 PM, Kevin Grittner wrote:
> When I went to do this, I hit a shift/reduce conflict, because with
> TABLE being optional it couldn't tell whether:
>
> TRUNCATE MATERIALIZED VIEW x, y, z;
>
> ... was looking for five relations or three. That goes away with
> MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD. Is that OK?

Is TRUNCATE even the right command here? For regular tables TRUNCATE is
a fast DELETE, which logically empties the table. For materialized
views, there is no deleting, so this command (I suppose?) just
invalidates the materalized view. That's not the same thing.

Are there TRUNCATE triggers on materialized views?


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 19:30:05
Message-ID: 1361388605.11473.YahooMailNeo@web162903.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On 2/19/13 5:47 PM, Kevin Grittner wrote:
>> When I went to do this, I hit a shift/reduce conflict, because
>> with TABLE being optional it couldn't tell whether:
>>
>> TRUNCATE MATERIALIZED VIEW x, y, z;
>>
>> ... was looking for five relations or three.  That goes away
>> with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that
>> OK?
>
> Is TRUNCATE even the right command here?  For regular tables
> TRUNCATE is a fast DELETE, which logically empties the table.
> For materialized views, there is no deleting, so this command (I
> suppose?) just invalidates the materalized view.  That's not the
> same thing.

Hmm.  That's what Greg Stark just said, and I replied that nobody
else had raised the issue in over three months.  With Greg, that's
two now.

TRUNCATE MATERIALIZED VIEW discards any data which has been loaded
into the MV, rendering it unavailable for scanning.  Internally, it
does do a truncate, exactly as truncate table.  The resulting
zero-length heap file is what is used to determine whether a
materialized view is "scannable".  When a CREATE WITH DATA or a
REFRESH generates zero rows, an empty single page is created to
indicate that it is scannable (valid to use in queries) but
contains no rows.

I agree that INVALIDATE is probably more descriptive, although it
seems that there might be some even better word if we bikeshed
enough.  The question is, is it worth creating a new keyword to
call the internal truncate function for materialized views, versus
documenting that truncating a materialized view renders it invalid?
Again, given the numbers that Tom presented a while back about the
space requirements of every new keyword, I don't think this is
enough of a gain to justify that.  I still squirm a little about
having used REFRESH, even though demand for that was overwhelming.

> Are there TRUNCATE triggers on materialized views?

No.  Nor SELECT, INSERT, UPDATE, or DELETE triggers.  You can't
create a trigger of any type on a materialized view.  I don't think
that would interfere with event triggers, though.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 19:35:35
Message-ID: 20130220193535.GL16142@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

* Kevin Grittner (kgrittn(at)ymail(dot)com) wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> > Is TRUNCATE even the right command here?  For regular tables
> > TRUNCATE is a fast DELETE, which logically empties the table.
> > For materialized views, there is no deleting, so this command (I
> > suppose?) just invalidates the materalized view.  That's not the
> > same thing.
>
> Hmm.  That's what Greg Stark just said, and I replied that nobody
> else had raised the issue in over three months.  With Greg, that's
> two now.

TRUNCATE MAT VIEW seems like the right command to me. Just my 2c.

Thanks,

Stephen


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 21:25:50
Message-ID: 51253F5E.7000703@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2/20/13 2:30 PM, Kevin Grittner wrote:
>> Are there TRUNCATE triggers on materialized views?
> No. Nor SELECT, INSERT, UPDATE, or DELETE triggers. You can't
> create a trigger of any type on a materialized view. I don't think
> that would interfere with event triggers, though.

More generally, I would consider the invalidation of a materialized view
a DDL command, whereas truncating a table is a DML command. This has
various implications with triggers, logging, permissions. I think it's
not good to mix those two.

Also note that un-invalidating==refreshing a materialized view is
already a DDL command.


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-20 22:03:36
Message-ID: 1361397816.6744.YahooMailNeo@web162903.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On 2/20/13 2:30 PM, Kevin Grittner wrote:
>>> Are there TRUNCATE triggers on materialized views?
>> No.  Nor SELECT, INSERT, UPDATE, or DELETE triggers.  You can't
>> create a trigger of any type on a materialized view.  I don't
>> think that would interfere with event triggers, though.
>
> More generally, I would consider the invalidation of a
> materialized view a DDL command, whereas truncating a table is a
> DML command.

The force of that assertion is somewhat undercut by the fact that
the ExecuteTruncate() function does exactly what needs to be done
to discard the data in a materialized view and make it appear as
non-scannable.  Even if we dress it up with different syntax, it's
not clear that we wouldn't build a TruncateStmt in the parser and
pass it through exactly the same execution path.  We would just
need to look at the relkind to generate a different command tag.

> This has various implications with triggers, logging,
> permissions.  I think it's not good to mix those two.

Could you give a more concrete example of where you see a problem?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: stark(at)mit(dot)edu
Cc: kgrittn(at)ymail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, noah(at)leadboat(dot)com, robertmhaas(at)gmail(dot)com, pgmail(at)joh(dot)to, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 00:45:01
Message-ID: 20130221.094501.857828658682083612.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

> I would like to know what operations you plan to support independently
> of the command names. I may have missed much earlier in the discussion
> but then I suspect things have evolved since then.
>
> It sounds like you want to support:
>
> 1) Selecting from materialized viws
> 2) Manually refreshing materialized views
> 3) Manually truncating materialized views

Maybe plus?

4) Automatically dropping materialized views if underlying table(s)
are dropped/altered

Or this has to be done manually?

> And explicitly not support
>
> 1) Automatically rewriting queries to select from matching views
> 2) Incrementally refreshing materialized views
> 3) Manual DML against data in materialized views (except truncate
> which is kind of DDL)
> 4) Keeping track of whether the data in the materialized view is up to date
>
> I have to say I find this model a bit odd. It seems the UI you're
> presenting is that they're basically read-only tables that the
> database will fill in the data for automatically. My mental model of
> materialized views is that they're basically views that the database
> guarantees a different performance characteristic for.
>
> I would expect a materialized view to be up to date all the time. If
> we don't support incremental updates (which seems like a fine thing
> not to support in a first cut) then I would expect any DML against the
> table to mark the view invalid and any queries against it to produce
> an error (or possibly go to the source tables using the view
> definition but that's probably a bad idea for most use cases). Ie.
> they should behave like a view at all times and have up to date
> information or fail entirely.
>
> I would expect a command like TRUNCATE MATERIALIZED VIEW to exist but
> I would expect it to be called something like INVALIDATE rather than
> TRUNCATE and dropping the storage is a side effect of simply telling
> the database that it doesn't need to maintain this materialized view.
> Though I could be convinced "truncate" is a good name as long as it's
> documented well.
>
> --
> greg
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


From: Greg Stark <stark(at)mit(dot)edu>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 04:14:09
Message-ID: CAM-w4HOL2wM9CzHKQKYb=UkTsrj3-pjpPBJje5mi2O5QFQONkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> More generally, I would consider the invalidation of a materialized view
> a DDL command, whereas truncating a table is a DML command.

That's not entirely true. From the database's point of view, TRUNCATE
is in many ways actually DDL.

I actually don't really dislike using "TRUNCATE" for this command. I
was more asking about whether this meant people were thinking of the
view as a thing where you could control the data in it by hand and
could have the view be "empty" rather than just "not valid".

The way I was thinking about it, whatever the command is named, you
might be able to tell the database to drop the storage associated with
the view but that would make the view invalid until it was refreshed.
It wouldn't make it appear to be empty.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 08:31:17
Message-ID: 28841.1361435477@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> writes:
> The way I was thinking about it, whatever the command is named, you
> might be able to tell the database to drop the storage associated with
> the view but that would make the view invalid until it was refreshed.
> It wouldn't make it appear to be empty.

Actually, that seems like a pretty key point to me. TRUNCATE TABLE
results in a table that is perfectly valid, you just deleted all the
rows that used to be in it. Throwing away an MV's contents should
not result in an MV that is considered valid. That being the case,
lumping them as being the "same" operation feels like the wrong thing,
and so we should choose a different name for the MV operation.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 08:54:06
Message-ID: 20130221085406.GA14586@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2013-02-21 04:14:09 +0000, Greg Stark wrote:
> On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> > More generally, I would consider the invalidation of a materialized view
> > a DDL command, whereas truncating a table is a DML command.
>
> That's not entirely true. From the database's point of view, TRUNCATE
> is in many ways actually DDL.
>
> I actually don't really dislike using "TRUNCATE" for this command. I
> was more asking about whether this meant people were thinking of the
> view as a thing where you could control the data in it by hand and
> could have the view be "empty" rather than just "not valid".

It also might get confusing when we get materialized views that are
auto-updateable. I am not suggesting to forward TRUNCATE to the internal
storage in that case but giving an error so its an easy to find
distinction to a normal table seems like a good idea.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 13:05:29
Message-ID: CA+TgmoajyWDF=+8p1=R51soMqtJ+p25xwTfOC_zgg_UMLCreSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Feb 20, 2013 at 11:14 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
> On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> More generally, I would consider the invalidation of a materialized view
>> a DDL command, whereas truncating a table is a DML command.
>
> That's not entirely true. From the database's point of view, TRUNCATE
> is in many ways actually DDL.
>
> I actually don't really dislike using "TRUNCATE" for this command.

Me neither. I'm astonished that we're seriously considering adding
new keywords for this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 13:20:11
Message-ID: 51261F0B.3040805@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2/20/13 5:03 PM, Kevin Grittner wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> On 2/20/13 2:30 PM, Kevin Grittner wrote:
>>>> Are there TRUNCATE triggers on materialized views?
>>> No. Nor SELECT, INSERT, UPDATE, or DELETE triggers. You can't
>>> create a trigger of any type on a materialized view. I don't
>>> think that would interfere with event triggers, though.
>>
>> More generally, I would consider the invalidation of a
>> materialized view a DDL command, whereas truncating a table is a
>> DML command.
>
> The force of that assertion is somewhat undercut by the fact that
> the ExecuteTruncate() function does exactly what needs to be done
> to discard the data in a materialized view and make it appear as
> non-scannable. Even if we dress it up with different syntax, it's
> not clear that we wouldn't build a TruncateStmt in the parser and
> pass it through exactly the same execution path. We would just
> need to look at the relkind to generate a different command tag.

This is a fall-out of the implementation, and that's fine (although I'd
personally still be in favor of putting that state in the catalog, not
into the block count on disk, effectively), but I'm talking about the
external interfaces we present.

>> This has various implications with triggers, logging,
>> permissions. I think it's not good to mix those two.
>
> Could you give a more concrete example of where you see a problem?

* Logging: You can set things to log DDL commands only. I would want a
MV invalidation to be logged.

* Permissions: There is a TRUNCATE permission, would that apply here?
There is no refresh permission.

* Triggers: There are TRUNCATE triggers, but they don't apply here.

* Triggers: I don't know how event triggers work, but I'd like
materialized view events to be grouped together somehow.

* Don't know the opinion of sepgsql on all this.

I think what this all comes down to, as I've mentioned before, is that
the opposite of this proposed truncate operation is the refresh
operation, and that is a DDL command under ALTER MATERIALIZED VIEW.
Both of these fundamental operations -- truncate/refresh,
invalidate/validate, empty/refill, whatever -- should be grouped
together somehow, as far as syntax, as well logging, permissions,
trigger handling, and so on are concerned. You don't need a new command
or key word for this. How about ALTER MATERIALIZED VIEW DISCARD?


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 13:22:16
Message-ID: 51261F88.9060101@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2/20/13 11:14 PM, Greg Stark wrote:
> That's not entirely true. From the database's point of view, TRUNCATE
> is in many ways actually DDL.

Whether something is DDL or DML or a read operation (query) is not an
implementation detail, it's a user-exposed category. Since TRUNCATE is
logically equivalent to DELETE, it's a DML operation, as far as the user
is concerned.


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Stark <stark(at)mit(dot)edu>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 14:25:50
Message-ID: 1361456750.11476.YahooMailNeo@web162902.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On 2/20/13 11:14 PM, Greg Stark wrote:
>> That's not entirely true. From the database's point of view,
>> TRUNCATE is in many ways actually DDL.
>
> Whether something is DDL or DML or a read operation (query) is
> not an implementation detail, it's a user-exposed category.
> Since TRUNCATE is logically equivalent to DELETE, it's a DML
> operation, as far as the user is concerned.

Not really.  It doesn't follow the same MVCC behavior as DML.  This
is user-visible, documented behavior.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Greg Stark <stark(at)mit(dot)edu>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 14:38:57
Message-ID: 1361457537.68228.YahooMailNeo@web162903.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Greg Stark <stark(at)mit(dot)edu> writes:
>> The way I was thinking about it, whatever the command is named, you
>> might be able to tell the database to drop the storage associated with
>> the view but that would make the view invalid until it was refreshed.
>> It wouldn't make it appear to be empty.
>
> Actually, that seems like a pretty key point to me.  TRUNCATE TABLE
> results in a table that is perfectly valid, you just deleted all the
> rows that used to be in it.  Throwing away an MV's contents should
> not result in an MV that is considered valid.

It doesn't.  That was one of the more contentious points in the
earlier bikeshedding phases.  Some felt that throwing away the
contents was a form of making the MV "out of date" and as such
didn't require any special handling.  Others, including myself,
felt that "data not present" was a distinct state from "generated
zero rows" and that attempting to scan a materialized view for
which data had not been generated must result in an error.  The
latter property has been maintained from the beginning -- or at
least that has been the intent.

test=# CREATE TABLE t (id int NOT NULL PRIMARY KEY, type text NOT NULL, amt numeric NOT NULL);
CREATE TABLE
test=# CREATE MATERIALIZED VIEW tm AS SELECT type, sum(amt) AS totamt FROM t GROUP BY type WITH NO DATA;
SELECT 0
test=# SELECT pg_relation_is_scannable('tm'::regclass);
 pg_relation_is_scannable
--------------------------
 f
(1 row)

test=# SELECT * FROM tm;
ERROR:  materialized view "tm" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.
test=# REFRESH MATERIALIZED VIEW tm;
REFRESH MATERIALIZED VIEW
test=# SELECT pg_relation_is_scannable('tm'::regclass);
 pg_relation_is_scannable
--------------------------
 t
(1 row)

test=# TRUNCATE MATERIALIZED VIEW tm;
TRUNCATE TABLE
test=# SELECT * FROM tm;
ERROR:  materialized view "tm" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.
test=# SELECT pg_relation_is_scannable('tm'::regclass);
 pg_relation_is_scannable
--------------------------
 f
(1 row)

> That being the case, lumping them as being the "same" operation
> feels like the wrong thing, and so we should choose a different
> name for the MV operation.

There is currently no truncation of MV data without rendering the
MV unscannable.  Do you still feel it needs a different command
name?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 14:47:33
Message-ID: 1361458053.91387.YahooMailNeo@web162905.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> wrote:

> I actually don't really dislike using "TRUNCATE" for this
> command.  I was more asking about whether this meant people were
> thinking of the view as a thing where you could control the data
> in it by hand and could have the view be "empty" rather than just
> "not valid".

You can either populate the MV in the CREATE command or by REFRESH,
and it will be scannable.  If it is created WITH NO DATA or
TRUNCATEd it is not scannable, generating an error on an attempt to
reference it.

test=# select * from tm;
 type | totamt
------+--------
 y    |     12
 z    |     24
 x    |      5
(3 rows)

test=# truncate tm;
TRUNCATE TABLE
test=# select * from tm;
ERROR:  materialized view "tm" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.

> The way I was thinking about it, whatever the command is named,
> you might be able to tell the database to drop the storage
> associated with the view but that would make the view invalid
> until it was refreshed.  It wouldn't make it appear to be empty.

I think we're on the same page after all.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Greg Stark <stark(at)mit(dot)edu>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 15:01:15
Message-ID: 512636BB.8030203@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 21.02.2013 16:38, Kevin Grittner wrote:
> Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Greg Stark<stark(at)mit(dot)edu> writes:
>>> The way I was thinking about it, whatever the command is named, you
>>> might be able to tell the database to drop the storage associated with
>>> the view but that would make the view invalid until it was refreshed.
>>> It wouldn't make it appear to be empty.
>>
>> Actually, that seems like a pretty key point to me. TRUNCATE TABLE
>> results in a table that is perfectly valid, you just deleted all the
>> rows that used to be in it. Throwing away an MV's contents should
>> not result in an MV that is considered valid.
>
> It doesn't. That was one of the more contentious points in the
> earlier bikeshedding phases. Some felt that throwing away the
> contents was a form of making the MV "out of date" and as such
> didn't require any special handling. Others, including myself,
> felt that "data not present" was a distinct state from "generated
> zero rows" and that attempting to scan a materialized view for
> which data had not been generated must result in an error. The
> latter property has been maintained from the beginning -- or at
> least that has been the intent.

Yeah, "data not present" is clearly different from "0 rows". I'm
surprised there isn't an explicit boolean column somewhere for that, but
I guess you can use the size of the heap for that too, as long as you're
careful to not truncate it to 0 blocks when it's empty but scannable.

There's at least one bug left in that area:

postgres=# create table t (id int4);
CREATE TABLE
postgres=# create materialized view tm as select * from t where id <
0;SELECT 0
postgres=# select * from tm;
id
----
(0 rows)

postgres=# create index i_tm on tm(id);CREATE INDEX
postgres=# cluster tm using i_tm;
CLUSTER
postgres=# select * from tm;
ERROR: materialized view "tm" has not been populated
HINT: Use the REFRESH MATERIALIZED VIEW command.

Clustering a materialized view invalidates it.

- Heikki


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 15:02:53
Message-ID: 1361458973.58816.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:

I'll need more time to ponder your other points, but...

> You don't need a new command or key word for this.  How about
> ALTER MATERIALIZED VIEW DISCARD?

I don't like this because we don't have ALTER TABLE REINDEX.  But
the fact that DISCARD is a keyword does open up some interesting
syntax possibilities without adding more keywords.  Maybe:

DISCARD MATERIALIZED VIEW DATA tm;

Or something like that.  Paint buckets are over there by the
bikeshed.  Have at it.  :-)
 
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 15:10:09
Message-ID: 1361459409.1344.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> giving an error so its an easy to find distinction to a normal
> table seems like a good idea.

I'm not sure I understood your concerns entirely, but wonder
whether this helps?:

test=# \d
              List of relations
 Schema | Name  |       Type        |  Owner
--------+-------+-------------------+---------
 public | bb    | materialized view | kgrittn
 public | t     | table             | kgrittn
 public | tm    | materialized view | kgrittn
 public | tmm   | materialized view | kgrittn
 public | tv    | view              | kgrittn
 public | tvmm  | materialized view | kgrittn
 public | tvv   | view              | kgrittn
 public | tvvm  | materialized view | kgrittn
 public | tvvmv | view              | kgrittn
(9 rows)

test=# truncate table tm;
ERROR:  "tm" is not a table
test=# truncate materialized view t;
ERROR:  "t" is not a materialized view
test=# truncate materialized view tm;
TRUNCATE TABLE
test=# truncate table t;
TRUNCATE TABLE

Well, maybe those command tags could use a tweak.

Then there's this, if you don't specify an object type:

test=# truncate t, tm;
TRUNCATE TABLE

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Greg Stark <stark(at)mit(dot)edu>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 15:10:15
Message-ID: 25497.1361459415@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> That being the case, lumping them as being the "same" operation
>> feels like the wrong thing, and so we should choose a different
>> name for the MV operation.

> There is currently no truncation of MV data without rendering the
> MV unscannable. Do you still feel it needs a different command
> name?

You didn't say anything that changed my opinion: it doesn't feel like
a TRUNCATE to me. It's not changing the object to a different but
entirely valid state, which is what TRUNCATE does.

Peter claimed upthread that REFRESH is a subcommand of ALTER MATERIALIZE
VIEW and that this operation should be another one. That sounds pretty
reasonable from here.

regards, tom lane


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Greg Stark <stark(at)mit(dot)edu>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 15:18:42
Message-ID: 1361459922.34945.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> On 21.02.2013 16:38, Kevin Grittner wrote:
>> Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us>  wrote:
>>> Greg Stark<stark(at)mit(dot)edu>  writes:
>>>> The way I was thinking about it, whatever the command is named, you
>>>> might be able to tell the database to drop the storage associated with
>>>> the view but that would make the view invalid until it was refreshed.
>>>> It wouldn't make it appear to be empty.
>>>
>>> Actually, that seems like a pretty key point to me.  TRUNCATE TABLE
>>> results in a table that is perfectly valid, you just deleted all the
>>> rows that used to be in it.  Throwing away an MV's contents should
>>> not result in an MV that is considered valid.
>>
>> It doesn't.  That was one of the more contentious points in the
>> earlier bikeshedding phases.  Some felt that throwing away the
>> contents was a form of making the MV "out of date" and as such
>> didn't require any special handling.  Others, including myself,
>> felt that "data not present" was a distinct state from "generated
>> zero rows" and that attempting to scan a materialized view for
>> which data had not been generated must result in an error.  The
>> latter property has been maintained from the beginning -- or at
>> least that has been the intent.
>
> Yeah, "data not present" is clearly different from "0 rows".
> I'm surprised there isn't an explicit boolean column somewhere
> for that,

There was, in earlier versions of the patch: pg_class.relisvald.
The problem is that we needed some way to determine from the heap
that it was invalid to support UNLOGGED MVs.  Several people were
offended by my attempt to use relisvald as the primary indicator
and transfer the information from the heap state to pg_class and
relcache.  There were some pretty big technical challenges to that.
So I caved on that one and went with the pg_relation_is_scannable()
function based on the heap as reported by relcache.  That being one
of the newer parts of the patch, it is probably not as solid as the
parts which haven't changed much in the last three months.

> but I guess you can use the size of the heap for that too, as
> long as you're careful to not truncate it to 0 blocks when it's
> empty but scannable.
>
> There's at least one bug left in that area:
>
> postgres=# create table t (id int4);
> CREATE TABLE
> postgres=# create materialized view tm as select * from t where id < 0;SELECT
> 0
> postgres=# select * from tm;
> id
> ----
> (0 rows)
>
> postgres=# create index i_tm on tm(id);CREATE INDEX
> postgres=# cluster tm using i_tm;
> CLUSTER
> postgres=# select * from tm;
> ERROR:  materialized view "tm" has not been populated
> HINT:  Use the REFRESH MATERIALIZED VIEW command.
>
> Clustering a materialized view invalidates it.

Good spot.  That should be easy enough to fix.

Thanks.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 15:19:57
Message-ID: 20130221151957.GA23876@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2013-02-21 07:10:09 -0800, Kevin Grittner wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > giving an error so its an easy to find distinction to a normal
> > table seems like a good idea.
>
> I'm not sure I understood your concerns entirely, but wonder
> whether this helps?:

To explain it a bit:

I assume that at some point matviews will get (auto-)updateable, just as
normal views recently got. In that case application programmers might
not be aware anymore that something is a view either because they just
don't know or because a table got converted into a matview after the
code was written.

Because of the potential wish for transparency (with security views as a
potential user) at least normal views might get the capability to be
TRUNCATEd directly, so it might be that matviews do as well.

> test=# \d
>               List of relations
>  Schema | Name  |       Type        |  Owner
> --------+-------+-------------------+---------
>  public | bb    | materialized view | kgrittn
>  public | t     | table             | kgrittn
>  public | tm    | materialized view | kgrittn
>  public | tmm   | materialized view | kgrittn
>  public | tv    | view              | kgrittn
>  public | tvmm  | materialized view | kgrittn
>  public | tvv   | view              | kgrittn
>  public | tvvm  | materialized view | kgrittn
>  public | tvvmv | view              | kgrittn
> (9 rows)
>
> test=# truncate table tm;
> ERROR:  "tm" is not a table
> test=# truncate materialized view t;
> ERROR:  "t" is not a materialized view
> test=# truncate materialized view tm;
> TRUNCATE TABLE
> test=# truncate table t;
> TRUNCATE TABLE

Thats not bad.

But what if we allow TRUNCATE on views someday (possibly only if a
truncate trigger is defined). For consistency we might also want that on
matvies. Having a difference between TRUNCATE view; and TRUNCATE
MATERIALIZED VIEW; in that case sounds ugly to me.

What about DISABLE? DISCARD or DEALLOCATE would also be nice but it
seems hard to fit that into existing syntax.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Greg Stark <stark(at)mit(dot)edu>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 15:26:06
Message-ID: 1361460366.71283.YahooMailNeo@web162901.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> That being the case, lumping them as being the "same"
>>> operation feels like the wrong thing, and so we should choose a
>>> different name for the MV operation.
>>
>> There is currently no truncation of MV data without rendering
>> the MV unscannable.  Do you still feel it needs a different
>> command name?
>
> You didn't say anything that changed my opinion: it doesn't feel
> like a TRUNCATE to me.  It's not changing the object to a
> different but entirely valid state, which is what TRUNCATE does.
>
> Peter claimed upthread that REFRESH is a subcommand of ALTER
> MATERIALIZE VIEW

It's not, nor do I think it should be.

> and that this operation should be another one.  That sounds
> pretty reasonable from here.

That feels completely wrong to me.  For one thing, I can't think of
any ALTER commands to populate or remove data.  What did you think
of the idea of something like DISCARD MATERIALIZED VIEW DATA as a
new statment?  Or maybe RESET MATERIALIZED VIEW?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 15:35:58
Message-ID: 1361460958.27538.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> I assume that at some point matviews will get (auto-)updateable,
> just as normal views recently got.

I'm dubious about that.  Every use case I've seen for MVs involves
aggregation, although they are a generalized feature, so that won't
always be true.  But if you have a view like:

CREATE MATERIALIZED VIEW tm AS
 SELECT t.type,
    sum(t.amt) AS totamt
   FROM t
  GROUP BY t.type;

... I don't see how that can be updateable.  If I add 5 to totamt
for some row, what do you do?  I expect that 99% of MVs will be
updated asynchronously from changes to the underlying data -- what
do you do if someone updates a row that no longer exists in the
underlying data.  This are just seems fraught with peril and out of
sync with the usual uses of MVs.

> What about DISABLE? DISCARD or DEALLOCATE would also be nice but
> it seems hard to fit that into existing syntax.

Thanks for the suggestions.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Greg Stark <stark(at)mit(dot)edu>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 15:42:00
Message-ID: 26479.1361461320@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Peter claimed upthread that REFRESH is a subcommand of ALTER
>> MATERIALIZE VIEW

> It's not, nor do I think it should be.

Oh, never mind then.

>> and that this operation should be another one. That sounds
>> pretty reasonable from here.

> That feels completely wrong to me. For one thing, I can't think of
> any ALTER commands to populate or remove data. What did you think
> of the idea of something like DISCARD MATERIALIZED VIEW DATA as a
> new statment? Or maybe RESET MATERIALIZED VIEW?

I could live with either DISCARD or RESET.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 21:48:07
Message-ID: 51269617.7070401@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2/21/13 9:25 AM, Kevin Grittner wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> On 2/20/13 11:14 PM, Greg Stark wrote:
>>> That's not entirely true. From the database's point of view,
>>> TRUNCATE is in many ways actually DDL.
>>
>> Whether something is DDL or DML or a read operation (query) is
>> not an implementation detail, it's a user-exposed category.
>> Since TRUNCATE is logically equivalent to DELETE, it's a DML
>> operation, as far as the user is concerned.
>
> Not really. It doesn't follow the same MVCC behavior as DML. This
> is user-visible, documented behavior.

MVCC behavior does not determine whether something is considered DDL or DML.


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 22:11:10
Message-ID: 1361484670.6474.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> writes:

>> What did you think of the idea of something like DISCARD
>> MATERIALIZED VIEW DATA as a new statment?  Or maybe RESET
>> MATERIALIZED VIEW?
>
> I could live with either DISCARD or RESET.

I figured this was worth a pass through the keyword list to look
for all imperative verbs suitable for this, which could support the
needed syntax without adding a new keyword.  Here are the
possibilities I came up with, along with a note about why they are
keywords already.

DISABLE MATERIALIZED VIEW mv;  -- ALTER clause for constraints
DISCARD MATERIALIZED VIEW DATA mv;  -- session state
RELEASE MATERIALIZED VIEW DATA mv;  -- savepoint
RESET MATERIALIZED VIEW DATA mv;  -- run-time parameter

I think any of these could work.  I'm personally most inclined
toward DISABLE MATERIALIZED VIEW.  It seems to convey the semantics
better, especially if you leave out DATA as an additonal word.
Given that a materialized view will retain its query, tablespace,
indexes, statistics targets, etc. with this operation, and will
just not be available for scanning, some of the above seem
downright misleading without DATA thrown in.

Opinions?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Stark <stark(at)mit(dot)edu>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 22:16:46
Message-ID: CAM-w4HPnJGYNDTuhQ_4wr_EZ+CkfKsX5WR+-+B+gngczbUizpg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Thu, Feb 21, 2013 at 2:38 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> It doesn't. That was one of the more contentious points in the
> earlier bikeshedding phases. Some felt that throwing away the
> contents was a form of making the MV "out of date" and as such
> didn't require any special handling. Others, including myself,
> felt that "data not present" was a distinct state from "generated
> zero rows" and that attempting to scan a materialized view for
> which data had not been generated must result in an error. The
> latter property has been maintained from the beginning -- or at
> least that has been the intent.

Actually this sounds like exactly what I was saying. I withdraw my
concern entirely.

--
greg


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 22:46:40
Message-ID: 1361486800.70889.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>> It doesn't.  That was one of the more contentious points in the
>> earlier bikeshedding phases.  Some felt that throwing away the
>> contents was a form of making the MV "out of date" and as such
>> didn't require any special handling.  Others, including myself,
>> felt that "data not present" was a distinct state from
>> "generated zero rows" and that attempting to scan a materialized
>> view for which data had not been generated must result in an
>> error.  The latter property has been maintained from the
>> beginning -- or at least that has been the intent.
>
> Actually this sounds like exactly what I was saying. I withdraw
> my concern entirely.

Reviewing your concerns and discussions of "freshness" in general
got me thinking -- while it is clear that not having generated
values in the MV based on its query clearly should make the view
non-scannable, and will be the only criterion for that in this
initial patch; later versions will almost certainly support
age-based conditions for whether the MV is scannable.  So in the
next release the MV may become non-scannable based on the passage
of time since DML was run on a source table without the MV having
been refreshed or incrementally updated to reflect that DML.  Which
makes me wonder why DML making the MV non-scannable is such a bad
thing in the case of TRUNCATE.  Granted there is a difference in
that it is run *on the MV* rather than *on a source relation*; but
still, I'm having some second thoughts about that being a problem.

The problem with TRUNCATE MATERIALIZED VIEW from a logical
perspective doesn't seem to me so much that it makes the MV
non-scannable, as that it is the only DML which would be allowed
directly on an MV -- which is kind of a weird exception.  It is
pretty much a given that when we can get to implementing it, DML
statements will render MVs unscannable under various conditions.
Josh Berkus and Greg Stark have been the most explicit about that,
but I think most of us who are interested in the feature take it as
a given.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Materialized views WIP patch
Date: 2013-02-21 23:32:08
Message-ID: 20130221233208.GA19114@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2013-02-21 14:11:10 -0800, Kevin Grittner wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
>
> >> What did you think of the idea of something like DISCARD
> >> MATERIALIZED VIEW DATA as a new statment?  Or maybe RESET
> >> MATERIALIZED VIEW?
> >
> > I could live with either DISCARD or RESET.
>
> I figured this was worth a pass through the keyword list to look
> for all imperative verbs suitable for this, which could support the
> needed syntax without adding a new keyword.  Here are the
> possibilities I came up with, along with a note about why they are
> keywords already.
>
> DISABLE MATERIALIZED VIEW mv;  -- ALTER clause for constraints
> DISCARD MATERIALIZED VIEW DATA mv;  -- session state
> RELEASE MATERIALIZED VIEW DATA mv;  -- savepoint
> RESET MATERIALIZED VIEW DATA mv;  -- run-time parameter
>
> I think any of these could work.  I'm personally most inclined
> toward DISABLE MATERIALIZED VIEW.  It seems to convey the semantics
> better, especially if you leave out DATA as an additonal word.

> Given that a materialized view will retain its query, tablespace,
> indexes, statistics targets, etc. with this operation, and will
> just not be available for scanning, some of the above seem
> downright misleading without DATA thrown in.

I vote for RESET or DISCARD. DISABLE sounds more like you disable
automatic refreshes or somesuch.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Materialized views WIP patch
Date: 2013-02-22 08:10:12
Message-ID: 18535.1361520612@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2013-02-21 14:11:10 -0800, Kevin Grittner wrote:
>> DISABLE MATERIALIZED VIEW mv; -- ALTER clause for constraints
>> DISCARD MATERIALIZED VIEW DATA mv; -- session state
>> RELEASE MATERIALIZED VIEW DATA mv; -- savepoint
>> RESET MATERIALIZED VIEW DATA mv; -- run-time parameter
>>
>> I think any of these could work. I'm personally most inclined
>> toward DISABLE MATERIALIZED VIEW. It seems to convey the semantics
>> better, especially if you leave out DATA as an additonal word.

> I vote for RESET or DISCARD. DISABLE sounds more like you disable
> automatic refreshes or somesuch.

Yeah, I don't much like DISABLE either. I'm also concerned about
overloading RESET this way --- that statement has complicated-enough
syntax already, not to mention way too many shades of meaning. So that
leaves me voting for DISCARD M.V. DATA, which seems pretty precise.
It's a bit verbose, but since when has SQL been succinct?

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2013-02-23 01:00:02
Message-ID: 51281492.6060501@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers


> That feels completely wrong to me. For one thing, I can't think of
> any ALTER commands to populate or remove data. What did you think
> of the idea of something like DISCARD MATERIALIZED VIEW DATA as a
> new statment? Or maybe RESET MATERIALIZED VIEW?

I prefer RESET, especially since it could eventually support RESET ALL
MATERIALIZED VIEWS if that turns out to be useful. How does the parser
like that?

BTW, to contradict Peter E., for my part I would NOT want matview resets
to be logged as DDL. I would only want matview definitition changes to
be so logged.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Greg Stark <stark(at)mit(dot)edu>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2013-02-23 12:55:29
Message-ID: CAM-w4HPZiQbqyxeD+y6X1HKuPgbwAX3QOT_gk2e2uxmyiEe2hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Sat, Feb 23, 2013 at 1:00 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> I prefer RESET, especially since it could eventually support RESET ALL
> MATERIALIZED VIEWS if that turns out to be useful. How does the parser
> like that?

Isn't reset currently only used for GUCs? I think that makes for a
strange crossover.

Really, I'm sorry for bringing this up. I really don't think
"truncate" is a bad way to spell it.

--
greg


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2013-02-23 13:00:14
Message-ID: CAB7nPqQyhwMHeDExqKy03N7=XbGEcS0GQrdZPtNkeDPSje4PPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Sat, Feb 23, 2013 at 9:55 PM, Greg Stark <stark(at)mit(dot)edu> wrote:

> On Sat, Feb 23, 2013 at 1:00 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> > I prefer RESET, especially since it could eventually support RESET ALL
> > MATERIALIZED VIEWS if that turns out to be useful. How does the parser
> > like that?
>
> Isn't reset currently only used for GUCs? I think that makes for a
> strange crossover.
>
it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html
DISCARD would be better.
My 2c.
--
Michael


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Kevin Grittner" <kgrittn(at)ymail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2013-02-25 22:07:27
Message-ID: 18f4b33d7cd8d5d914e3f007515dcbc5.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

2013-02-19 Kevin Grittner wrote:
[matview-system_views-v2.diff]

I assumed the patches matview-v4.patch and matview-system_views-v2.diff
were to be applied together.

They do apply correctly but during tests, the "test rules ... FAILED".

Perhaps it is solved already but I thought I'd mention it in case it is overlooked.

Thanks,

Erik Rijkers


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2013-02-28 14:55:18
Message-ID: CA+Tgmoab4xcBnsVmxj3_hEiKj7FqesX6ET_nOZgvpUgF+_OKEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html
> DISCARD would be better.

Well, personally, I'm in favor of either TRUNCATE or ALTER
MATERIALIZED VIEW ... DISCARD. I think it's a dangerous precedent to
suppose that we're going to start using DISCARD for things that have
nothing to do with the existing meanings of DISCARD. Number one, I
think it's confusing. Number two, it's currently possible to
determine whether something is DDL, DML, or other by looking at the
first word of the command. If we throw that out the window we may
cause performance issues for connection pooling software that tries to
be clever like that. Mind you, I'm not aware of any connection
pooling software that actually does this today, because I think pgpool
includes a full parser and pgbouncer includes no parser at all, but it
still seems like a possibly-useful trick. And number three, the
possibility of grammar conflicts with things we might want to do in
the future seems unnecessarily high. If we use ALTER MATERIALIZED
VIEW ... WHATEVER, we only have to worry about grammar conflicts with
other ALTER MATERIALIZED VIEW commands; if we reuse DISCARD or RESET,
we've got potential conflicts with completely unrelated syntax. That
consideration alone would be sufficient reason for me to choose to
stick everything under ALTER MATERIALIZED VIEW, no matter how poor a
semantic fit it seems otherwise. Of course, if we stick with
TRUNCATE, this becomes a non-issue.

All that having been said, I'm not in favor of pushing this patch out
to 9.4 because we can't agree on minor syntax details. In the absence
of consensus, my feeling is that Kevin should exercise committer's
prerogative and commit this in the way that seems best to him. If a
clear contrary consensus subsequently emerges, we can always change
it. It is not as if the particular choice of SQL syntax is hard to
whack around. Once we release it we're stuck with it, but certainly
between now and beta we can change it whenever we like. I'd rather
have the core part of the feature committed and tinker with the syntax
than wait longer to land the patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2013-02-28 16:00:17
Message-ID: 6175.1362067217@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html
>> DISCARD would be better.

> Well, personally, I'm in favor of either TRUNCATE or ALTER
> MATERIALIZED VIEW ... DISCARD. I think it's a dangerous precedent to
> suppose that we're going to start using DISCARD for things that have
> nothing to do with the existing meanings of DISCARD.

Yeah, there's actually a serious problem with choosing DISCARD:
surely we are not going to include "trash all MVs" in the behavior
of DISCARD ALL. So unless you would like to say that DISCARD ALL
doesn't mean what it appears to mean, we can't make MV reset be
one of the sub-flavors of DISCARD.

So that seems to leave us with either TRUNCATE or an ALTER sub-syntax.
Personally I'd prefer the latter but it's surely debatable.

regards, tom lane


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2013-02-28 16:08:50
Message-ID: 512F8112.3000607@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 28.02.2013 16:55, Robert Haas wrote:
> On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html
>> DISCARD would be better.
>
> Well, personally, I'm in favor of either TRUNCATE or ALTER
> MATERIALIZED VIEW ... DISCARD. I think it's a dangerous precedent to
> suppose that we're going to start using DISCARD for things that have
> nothing to do with the existing meanings of DISCARD. Number one, I
> think it's confusing. Number two, it's currently possible to
> determine whether something is DDL, DML, or other by looking at the
> first word of the command. If we throw that out the window we may
> cause performance issues for connection pooling software that tries to
> be clever like that.

FWIW, I totally agree with that. From that point of view, the best thing
would be to tack this onto the REFRESH command, perhaps something like:

REFRESH matview INVALIDATE;
REFRESH matview UNREFRESH;
REFRESH matview DISCARD;

It's a bit weird that the command is called REFRESH, if the effect is
the exact opposite of refreshing it. And we usually do have two separate
commands for doing something and undoing the same; CREATE - DROP,
PREPARE - DEALLOCATE, LISTEN - UNLISTEN, and so forth.

I think we're being too hung up on avoiding new (unreserved) keywords.
Yes, the grammar is large because of so many keywords, but surely
there's some better solution to that than adopt syntax that sucks. Let's
invent a new keyword (INVALIDATE? UNREFRESH?), and deal with the grammar
bloat separately.

- Heikki


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2013-02-28 16:54:38
Message-ID: CA+TgmoaNEZuXf54xKnbjJdjwg0TWNfhyhedo_CWMuR+Px4EvPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Thu, Feb 28, 2013 at 11:00 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Well, personally, I'm in favor of either TRUNCATE or ALTER
>> MATERIALIZED VIEW ... DISCARD. I think it's a dangerous precedent to
>> suppose that we're going to start using DISCARD for things that have
>> nothing to do with the existing meanings of DISCARD.
>
> Yeah, there's actually a serious problem with choosing DISCARD:
> surely we are not going to include "trash all MVs" in the behavior
> of DISCARD ALL. So unless you would like to say that DISCARD ALL
> doesn't mean what it appears to mean, we can't make MV reset be
> one of the sub-flavors of DISCARD.

Good point.

> So that seems to leave us with either TRUNCATE or an ALTER sub-syntax.
> Personally I'd prefer the latter but it's surely debatable.

I agree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-28 17:52:58
Message-ID: 1362073978.4025.YahooMailNeo@web162905.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> On 28.02.2013 16:55, Robert Haas wrote:

>> Well, personally, I'm in favor of either TRUNCATE or ALTER
>> MATERIALIZED VIEW ... DISCARD.  I think it's a dangerous
>> precedent to suppose that we're going to start using DISCARD for
>> things that have nothing to do with the existing meanings of
>> DISCARD.  Number one, I think it's confusing.  Number two, it's
>> currently possible to determine whether something is DDL, DML,
>> or other by looking at the first word of the command.  If we
>> throw that out the window we may cause performance issues for
>> connection pooling software that tries to be clever like that.
>
> FWIW, I totally agree with that. From that point of view, the
> best thing would be to tack this onto the REFRESH command,
> perhaps something like:
>
> REFRESH matview INVALIDATE;
> REFRESH matview UNREFRESH;
> REFRESH matview DISCARD;
>
> It's a bit weird that the command is called REFRESH, if the
> effect is the exact opposite of refreshing it. And we usually do
> have two separate commands for doing something and undoing the
> same; CREATE - DROP, PREPARE - DEALLOCATE, LISTEN - UNLISTEN, and
> so forth.
>
> I think we're being too hung up on avoiding new (unreserved)
> keywords. Yes, the grammar is large because of so many keywords,
> but surely there's some better solution to that than adopt syntax
> that sucks. Let's invent a new keyword (INVALIDATE? UNREFRESH?),
> and deal with the grammar bloat separately.

I'm OK with any grammar that we can reach consensus on, but that
seems elusive and I don't want to hold up getting the meat of the
patch committed while we haggle out this syntax detail.  Votes have
shifted frequently, but as I make out the latest opinion of people
making a statement that looks like an explicit vote, dividing the
value of a split vote between the choices, I get:

Kevin Grittner: TRUNCATE
Stephen Frost: TRUNCATE
Peter Eisentraut: ALTER
Andres Freund: RESET or DISCARD
Josh Berkus: RESET
Greg Stark: TRUNCATE
Michael Paquier: DISCARD
Robert Haas: TRUNCATE or ALTER
Tom Lane: TRUNCATE or ALTER
Heikki Linnakangas: REFRESH?

Vote totals:

TRUNCATE:  4.0
ALTER:     2.0
DISCARD:   1.5
RESET:     1.5
REFRESH    1.0?

Barring a sudden confluence of opinion, I will go with TRUNCATE for
the initial spelling.  I tend to favor that spelling for several
reasons.  One was the size of the patch needed to add the opposite
of REFRESH to the backend code:

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2a55e02..eb7a14f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1234,11 +1234,12 @@ truncate_check_rel(Relation rel)
 {
    AclResult   aclresult;
 
-   /* Only allow truncate on regular tables */
-   if (rel->rd_rel->relkind != RELKIND_RELATION)
+   /* Only allow truncate on regular tables and materialized views. */
+   if (rel->rd_rel->relkind != RELKIND_RELATION &&
+       rel->rd_rel->relkind != RELKIND_MATVIEW)
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
-                errmsg("\"%s\" is not a table",
+                errmsg("\"%s\" is not a table or materialized view",
                        RelationGetRelationName(rel))));
 
    /* Permissions checks */

That's it.  That takes it from "no way to release the space held by
the current contents of a materialized view and render it
unscannable until its rule's query is used to populate it again" to
"working".  Now, there are docs and psql support needed on top of
that, and the regression tests use the verb, and bikeshedding led
to a minor tweak of TRUNCATE so that you could specify TRUNCATE
MATERIALIZED VIEW -- but to get working backend code, the above is
sufficient.  That strikes me a prima facie evidence that it's not a
horribly off-target verb.

In terms of things to consider in choosing a verb are that, while
complete absence of data in the MV's backing table is the only
thing which will render it unscannable in this initial version,
there is clear demand to eventually track information on how
up-to-date data is, and treat the MV as unscannable for less
extreme conditions, such as the asynchronous application of changes
from backing tables having fallen behind some configurable
threshold.  In essence, people want the error instead of scanning
the relation if the generated data is not within a range of time
that is "fresh" enough, and truncating the backing relation (or
creating the MV WITH NO DATA) is a special case of that since the
data is not representing the results of the MV's query as of *any*
known point in time.  (As previously discussed, that is distinct
from an empty relation which is known to represent a fresh view of
the results of that query.)

If we pick a new pair of verbs, the connotations I think we should
be looking for include the ability to deal with at least these:

* Make the MV represent the result set of its query as of this
moment.

* Declare the MV to be too stale for the contents of its backing
table to be useful, and free the space held by the current
contents.

Other operations will be needed, for which syntax is not settled.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Ants Aasma <ants(at)cybertec(dot)at>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-01 13:56:42
Message-ID: CA+CSw_sZfuDGEZZS+sChDq2NG4X_C5Bx1NNh43dsspeD580tOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Thu, Feb 28, 2013 at 7:52 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Barring a sudden confluence of opinion, I will go with TRUNCATE for
> the initial spelling. I tend to favor that spelling for several
> reasons. One was the size of the patch needed to add the opposite
> of REFRESH to the backend code:

FWIW, I found Andres's point about closing the door on updatable views
quite convincing. If at any point we want to add updatable
materialized views, it seems like a bad inconsistency to have TRUNCATE
mean something completely different from DELETE. While update
capability for materialized views might not be a common use case, I
don't think it's fair to completely shut the door on it to have easier
implementation and shorter syntax. Especially as the shorter syntax is
semantically inconsistent - normal truncate removes the data,
materialized view just makes the data inaccessible until the next
refresh.

Sorry for weighing in late, but it seemed to me that this point didn't
get enough consideration.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-01 14:18:33
Message-ID: 1362147513.84526.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Ants Aasma <ants(at)cybertec(dot)at> wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>> Barring a sudden confluence of opinion, I will go with TRUNCATE
>> for the initial spelling.  I tend to favor that spelling for
>> several reasons.  One was the size of the patch needed to add
>> the opposite of REFRESH to the backend code:
>
> FWIW, I found Andres's point about closing the door on updatable
> views quite convincing. If at any point we want to add updatable
> materialized views, it seems like a bad inconsistency to have
> TRUNCATE mean something completely different from DELETE. While
> update capability for materialized views might not be a common
> use case, I don't think it's fair to completely shut the door on
> it to have easier implementation and shorter syntax. Especially
> as the shorter syntax is semantically inconsistent - normal
> truncate removes the data, materialized view just makes the data
> inaccessible until the next refresh.
>
> Sorry for weighing in late, but it seemed to me that this point
> didn't get enough consideration.

Personally, I don't understand why anyone would want updateable
materialized views.  That's probably because 99% of the cases where
I've seen that someone wanted them, they wanted them updated to
match the underlying data using some technique that didn't require
the modification or commit of the underlying data to carry the
overhead of maintaining the MV.  In other words, they do not want
the MV to be up-to-date for performance reasons.  That is a big
part of the reason for *wanting* to use an MV.  How do you make an
asynchronously-maintained view updateable?

In addtion, at least 80% of the cases I've seen where people want
an MV it is summary information, which does not tie a single MV row
to a single underlying row.  If someone updates an aggregate number
like an average, I see no reasonable way to map that to the
underlying data in a meaningful way.

I see the contract of a materialized view as providing a
table-backed relation which shows the result set of a query as of
some point in time.  Perhaps it is a failure of imagination, but I
don't see where modifying that relation directly is compatible with
that contract.

Can you describe a meaningful use cases for an udpateable
materialized view?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Ants Aasma <ants(at)cybertec(dot)at>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-01 15:01:27
Message-ID: CA+CSw_v8ZXKADJsjKkq8eVa12CH5hB7cgk3WimsbFiVLEdnTxA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Fri, Mar 1, 2013 at 4:18 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Personally, I don't understand why anyone would want updateable
> materialized views. That's probably because 99% of the cases where
> I've seen that someone wanted them, they wanted them updated to
> match the underlying data using some technique that didn't require
> the modification or commit of the underlying data to carry the
> overhead of maintaining the MV. In other words, they do not want
> the MV to be up-to-date for performance reasons. That is a big
> part of the reason for *wanting* to use an MV. How do you make an
> asynchronously-maintained view updateable?
>
> In addtion, at least 80% of the cases I've seen where people want
> an MV it is summary information, which does not tie a single MV row
> to a single underlying row. If someone updates an aggregate number
> like an average, I see no reasonable way to map that to the
> underlying data in a meaningful way.
>
> I see the contract of a materialized view as providing a
> table-backed relation which shows the result set of a query as of
> some point in time. Perhaps it is a failure of imagination, but I
> don't see where modifying that relation directly is compatible with
> that contract.
>
> Can you describe a meaningful use cases for an udpateable
> materialized view?

I actually agree that overwhelming majority of users don't need or
want updateable materialized views. My point was that we can't at this
point rule out that people will think of a good use for this. I don't
have any real use cases for this, but I can imagine a few situations
where updateable materialized views wouldn't be nonsensical.

One case would be if the underlying data is bulkloaded and is
subsetted into smaller materialized views for processing using
off-the-shelf tools that expect tables. One might want to propagate
changes from those applications to the base data.

The other case would be the theoretical future where materialized
views can be incrementally and transactionally maintained, in that
case being able to express modifications on the views could actually
make sense.

I understand that the examples are completely hypothetical and could
be solved by using regular tables. I just have feeling that will
regret conflating TRUNCATE semantics for slight implementation and
notation convenience. To give another example of potential future
update semantics, if we were to allow users manually maintaining
materialized view contents using DML commands, one would expect
TRUNCATE to mean "make this matview empty", not "make this matview
unavailable".

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


From: Greg Stark <stark(at)mit(dot)edu>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-02 14:36:28
Message-ID: CAM-w4HPF1RNcHpsCkTgg0uXiSs8eOAcky_UA8UYsyNvTvFjHqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Fri, Mar 1, 2013 at 3:01 PM, Ants Aasma <ants(at)cybertec(dot)at> wrote:
> . To give another example of potential future
> update semantics, if we were to allow users manually maintaining
> materialized view contents using DML commands, one would expect
> TRUNCATE to mean "make this matview empty", not "make this matview
> unavailable".

Wouldn't that just be a regular table then though? How is that a
materialized view?

If anything someone might expect truncate to delete any rows from the
source table that appear in the view. But I think it's likely that
even if materialized views were updateable truncate wouldn't be one of
the updateable operations.

--
greg


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-02 15:06:18
Message-ID: 1362236778.96330.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> wrote:
> Ants Aasma <ants(at)cybertec(dot)at> wrote:

>> To give another example of potential future update semantics, if
>> we were to allow users manually maintaining materialized view
>> contents using DML commands, one would expect TRUNCATE to mean
>> "make this matview empty", not "make this matview unavailable".
>
> Wouldn't that just be a regular table then though? How is that a
> materialized view?
>
> If anything someone might expect truncate to delete any rows from
> the source table that appear in the view. But I think it's likely
> that even if materialized views were updateable truncate wouldn't
> be one of the updateable operations.

Yeah, the only way it would make sense to truncate MV contents from
a user-written maintenance trigger (assuming we might have such a
thing some day) would be if you decided that the change to the
underlying data was so severe that you effectively needed to
REFRESH, and there would probably be a better way to go about
dealing with that.

Two other reasons that this might not be a problem are:

(1)  Any DML against the MV would need to be limited to some
context fired by the underlying changes.  If we allow changes to
the MV outside of that without it being part of some "updateable
MV" feature (reversing the direction of flow of changes), the MV
could not be trusted at all.  If you're going to do that, just use
a table.

(2)  I can think of a couple not-too-horrible syntax tricks we
could use to escape from the corner we're worried about painting
ourselves into.

All of that said, some combination of Heikki's previous suggestion
that maybe REFRESH could be used and my noticing that both TRUNCATE
and REFRESH create a new heap for an MV, it's just a question of
whether we then run the MV's query to fill it with data, led to
this thought:

REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA

This sort of mirrors the CREATE MATERIALIZED VIEW style (which was
based on CREATE TABLE AS) and WITH NO DATA puts the MV into the
unscannable state either way.  I can change the parser to make this
literally just the new spelling of TRUNCATE MATERIALIZED VIEW
without dashing my hopes of pushing the patch tomorrow.  (The patch
has been ready to go for weeks other than this syntax issue and
documentation which needs to refer to whatever syntax is chosen.)

Barring objections, I will use the above and push tomorrow.

The only issues which have been raised which will not be addressed
at that point are:

(1)  The suggestion that ALTER MATERIALIZED VIEW name ALTER column
support changing the collation isn't something I can see how to do
without complication and risk beyond what is appropriate at this
point in the release cycle.  It will be left off, at least for now.
 To get a new collation, you will need to drop the MV and re-create
it with a query which specifies the collation for the result
column.

(2)  The sepgsql changes are still waiting for a decision from
security focused folks.  I have two patches for that contrib module
ready based on my best reading of things -- one which uses table
security labels and one which instead uses new matview labels.
When we get a call on which is preferred, I suspect that one patch
or the other will be good as-is or with minimal change.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-03 08:48:03
Message-ID: CAEZATCXqytH9iLzq8JL8xoCDocFQHCrA577J8aWxdEqNm+YzCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2 March 2013 15:06, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> [ ... ] led to
> this thought:
>
> REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA
>

[Sorry to join this discussion so late]

FWIW I had a quick look at other DBs to see if there were any other
precedents out there. Oracle was the only one I could find with
anything similar. They use the same creation syntax:

CREATE MATERIALIZED VIEW name [options] AS SELECT ...

and they use ALTER for everything else, such as refreshing the MV:

ALTER MATERIALIZED VIEW name REFRESH [options];

AFAICT the nearest thing they have to TRUNCATE/DISCARD is:

ALTER MATERIALIZED VIEW name CONSIDER FRESH;

They do also support updateable materialized views with standard DML,
but it doesn't look as though they allow TRUNCATE to operate directly
on a MV (although it can be made to propagate from a base table to a
MV, in which case allowing TRUNCATE on the MV itself with a different
meaning would likely be confusing).

Oracle's MVs have lots of options, all of which hang off the 2 basic
CREATE and ALTER commands. There's a certain appeal to that, rather
than inventing or overloading a bunch of other commands as more
options get added. The proposed REFRESH command is OK for today's
options, but I think it might be overly limiting in the future.

Of course, since this isn't in the SQL standard, we are free to use
any syntax we like. We don't have to follow Oracle, but having a
common syntax might make some people's lives easier, and I haven't
seen a convincing argument as to why any alternative syntax is better.

Regards,
Dean


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-03 13:12:12
Message-ID: 1362316332.75385.YahooMailNeo@web162905.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:

>> [ ... ] led to this thought:
>>
>> REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA
>
> [Sorry to join this discussion so late]
>
> FWIW I had a quick look at other DBs to see if there were any
> other precedents out there. Oracle was the only one I could find
> with anything similar. They use the same creation syntax:
>
>   CREATE MATERIALIZED VIEW name [options] AS SELECT ...

It is a pretty obvious choice when you look at other SQL
statements.

> and they use ALTER for everything else, such as refreshing the
> MV:
>
>   ALTER MATERIALIZED VIEW name REFRESH [options];

No, that is for specifiying when and under what conditions an
automatic refresh is done.  To do an immediate action which is
equivalent to what I have for the REFRESH statement, they use a
REFRESH() function.  That seemed too incompatible with how we've
done everything else in PostgreSQL -- I felt that a statement would
make more sense.  Consider REINDEX, CLUSTER, and VACUUM FULL for
example.

> AFAICT the nearest thing they have to TRUNCATE/DISCARD is:
>
>   ALTER MATERIALIZED VIEW name CONSIDER FRESH;

No, that doesn't rebuild or discard data -- if the MV is
out-of-date and therefore unscannable according to the how the MV
has been set up, this overrides that indication and allows scanning
in spite of that.

> They do also support updateable materialized views with standard
> DML, but it doesn't look as though they allow TRUNCATE to operate
> directly on a MV (although it can be made to propagate from a
> base table to a MV, in which case allowing TRUNCATE on the MV
> itself with a different meaning would likely be confusing).

They allow DML on the MV in order to update it.  The default
REFRESH() function executes a TRUNCATE statement followed by INSERT
/ SELECT using the MV's query.

> Oracle's MVs have lots of options, all of which hang off the 2
> basic CREATE and ALTER commands. There's a certain appeal to
> that, rather than inventing or overloading a bunch of other
> commands as more options get added. The proposed REFRESH command
> is OK for today's options, but I think it might be overly
> limiting in the future.

For what ALTER MATERIALIZED VIEW in Oracle does, I think it makes
sense to use ALTER.  I don't think this feature should use
functions for REFRESH.  Why Oracle can get away with functions for
it is that they allow DML on an MV, which seems to me to compromise
the integrity of the feature, at least as default behavior.

I see us supporting automatic incremental updates of progressively
more complex queries, and we may at some point want to add a
trigger-based maintenance option; but the functionality available
with a trigger-based approach is almost entirely availaable in
PostgreSQL today without this feature.  Rewriting queries using
expressions which match the MV's query to pull from the MV instead
of the underlying tables is the exception.  While that is a "sexy"
feature, and I'm sure one can construct examples where it helps
performance, it seems to me unlikely to be very generally useful.
I suspect that it exists mostly so that people who want to write an
RFP to pick a particular product can include that as a requirement.
 In other words, I think the main benefit of automatic rewrite
using an MV is marketing, not technical or performance.  That's
important, too; but let's focus first on getting what is most
useful.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-03 14:33:08
Message-ID: 1362321188.12641.YahooMailNeo@web162905.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:

> REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA

Given the short time, I left out the "[, ...]".  If people think
that is important to get into this release, a follow-on patch might
be possible.

> Barring objections, I will use the above and push tomorrow.

I'm still working on docs, and the changes related to the syntax
change are still only lightly tested, but as far as I know, all is
complete except for the docs.  I'm still working on those and
expect to have them completed late today.  I'm posting this patch
to allow a chance for final review of the code changes before I
push.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
matview-v5.patch.gz application/x-gzip 56.3 KB

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-03 17:06:36
Message-ID: CAEZATCWcV8WeZ5OHYcg02kH1_EGQf3+1y9LnxFy7=axehT43Cw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 3 March 2013 13:12, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>> Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>
>>> [ ... ] led to this thought:
>>>
>>> REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA
>>
>> [Sorry to join this discussion so late]
>>
>> FWIW I had a quick look at other DBs to see if there were any
>> other precedents out there. Oracle was the only one I could find
>> with anything similar. They use the same creation syntax:
>>
>> CREATE MATERIALIZED VIEW name [options] AS SELECT ...
>
> It is a pretty obvious choice when you look at other SQL
> statements.
>
>> and they use ALTER for everything else, such as refreshing the
>> MV:
>>
>> ALTER MATERIALIZED VIEW name REFRESH [options];
>
> No, that is for specifiying when and under what conditions an
> automatic refresh is done. To do an immediate action which is
> equivalent to what I have for the REFRESH statement, they use a
> REFRESH() function. That seemed too incompatible with how we've
> done everything else in PostgreSQL -- I felt that a statement would
> make more sense. Consider REINDEX, CLUSTER, and VACUUM FULL for
> example.
>
>> AFAICT the nearest thing they have to TRUNCATE/DISCARD is:
>>
>> ALTER MATERIALIZED VIEW name CONSIDER FRESH;
>
> No, that doesn't rebuild or discard data -- if the MV is
> out-of-date and therefore unscannable according to the how the MV
> has been set up, this overrides that indication and allows scanning
> in spite of that.
>

Ah, OK I see.

I misunderstood what the Oracle docs were saying. ALTER only changes
the MV's definition, whereas their REFRESH() function and your REFRESH
statement updates the data in the MV. That makes much more sense.

Regards,
Dean


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-03 20:13:31
Message-ID: CAP-rdTYYR4QA9AotruR1-WUZVj3FvocvpDdUpHS=Nka=sXUZtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

2013/3/3 Kevin Grittner <kgrittn(at)ymail(dot)com>:

> Rewriting queries using
> expressions which match the MV's query to pull from the MV instead
> of the underlying tables is the exception. While that is a "sexy"
> feature, and I'm sure one can construct examples where it helps
> performance, it seems to me unlikely to be very generally useful.
> I suspect that it exists mostly so that people who want to write an
> RFP to pick a particular product can include that as a requirement.
> In other words, I think the main benefit of automatic rewrite
> using an MV is marketing, not technical or performance.

I think that automatically using materialized views even when the
query doesn’t mention them directly, is akin to automatically using
indexes without having to mention them in the query. That way, queries
can be written the natural way, and “creating materialized views” is
an optimization that can be applied by a DBA without having to update
the application queries to use them.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-03 21:06:56
Message-ID: 1362344816.73277.YahooMailNeo@web162902.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com> wrote:
> 2013/3/3 Kevin Grittner <kgrittn(at)ymail(dot)com>:

>> Rewriting queries using expressions which match the MV's query
>> to pull from the MV instead of the underlying tables is the
>> exception.  While that is a "sexy" feature, and I'm sure one can
>> construct examples where it helps performance, it seems to me
>> unlikely to be very generally useful.  I suspect that it exists
>> mostly so that people who want to write an RFP to pick a
>> particular product can include that as a requirement.  In other
>> words, I think the main benefit of automatic rewrite using an MV
>> is marketing, not technical or performance.
>
> I think that automatically using materialized views even when the
> query doesn’t mention them directly, is akin to automatically
> using indexes without having to mention them in the query. That
> way, queries can be written the natural way, and “creating
> materialized views” is an optimization that can be applied by a
> DBA without having to update the application queries to use them.

Oh, I understand that concept perfectly well, I just wonder how
often it is useful in practice.  The cost of planning with indexes
tends to go up dramatically the planner needs to evaluate all
possible combinations of access paths.  We've devoted quite a bit
of energy keeping that from being something like the factorial of
the number of indexes.  If you now need to find all materialized
views which could substitute for parts of a query, and look at all
permutations of how those could be used, and which indexes can be
used for each of those combinations, you have planning time which
can explode to extreme levels.

Where the number of database objects are small and their sizes are
large (like some data warehouse situations), you could come out
ahead; and if I wanted to showcase the capability you describe
that's what I would use.  With a large number of database objects
with only a few tens of millions of rows per table, I doubt you
will come out ahead.

Granted, you could say the same thing about indexes, and they are
very often useful.  I'm saying that I expect the usefulness of the
technique you describe is generally very low, but not zero.  Except
for marketing, where it's a flashy feature.  I would be interested
in seeing information to show where it works well, though.  There
is probably something to be learned by looking at the details of
the environment and workload of such a site.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Stark <stark(at)mit(dot)edu>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-03 22:47:50
Message-ID: CAM-w4HOJXy=kh8O=_t7BE9Ea_6Oj6bH2x_L74eV=GcJcJDL0Cw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Sat, Mar 2, 2013 at 3:06 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>
> (1) Any DML against the MV would need to be limited to some
> context fired by the underlying changes. If we allow changes to
> the MV outside of that without it being part of some "updateable
> MV" feature (reversing the direction of flow of changes), the MV
> could not be trusted at all. If you're going to do that, just use
> a table.

Oh! I misunderstood what you were suggesting. I think we were talking
at cross-purposes.

You're imagining a user issues truncate against the underlying
table(s) and the code that handles updating the materialized view will
need to issue a truncate against the MV to update it.

I was imagining that you wanted to be able to issue DML against the MV
just as one can against an updateable view. That DML should propagate
to the underlying table(s) through various magic.

It's a pretty theoretical fear now but one day it may be important to
avoid confusion between these two.

--
greg


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-03 23:01:18
Message-ID: CAP-rdTZaD=qP1T0uw=S6BgG-CwTBNc5bFid7mt30x+Q0uzvf5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

2013/3/3 Kevin Grittner <kgrittn(at)ymail(dot)com>:

> Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com> wrote:
>
>> I think that automatically using materialized views even when the
>> query doesn’t mention them directly, is akin to automatically
>> using indexes without having to mention them in the query. That
>> way, queries can be written the natural way, and “creating
>> materialized views” is an optimization that can be applied by a
>> DBA without having to update the application queries to use them.
>
> Oh, I understand that concept perfectly well, I just wonder how
> often it is useful in practice. The cost of planning with indexes
> tends to go up dramatically the planner needs to evaluate all
> possible combinations of access paths. We've devoted quite a bit
> of energy keeping that from being something like the factorial of
> the number of indexes. If you now need to find all materialized
> views which could substitute for parts of a query, and look at all
> permutations of how those could be used, and which indexes can be
> used for each of those combinations, you have planning time which
> can explode to extreme levels.

I guess that a basic version of such a feature would do something like this:

(1) Check for each matview whether it simply consists of an optional
bunch of joins + optional aggregation + optional general filter
conditions (to support something akin to a partial index). If not, the
optimization doesn’t take this matview into account. This step can be
done beforehand.
(2) Check for each (sub)query in the query-to-optimize whether it does
the following (at a smart point in the optimization phase). If any of
these conditions are not met, don’t use this matview:
- Joins at least the tables that are joined in the matview.
- Contains join conditions and general filter conditions that are at
least as strict.
- Doesn’t refer elsewhere to any attributes that the matview doesn’t contain.
(3) Always replace the corresponding query parts with the matview
(i.e., assume that the cost will always be lower than performing the
original query).
(4) If multiple matviews fit in step 3, try them all (and use the one
that yields the lower total cost).
(5) Always replace any aggregation with the corresponding
aggregation-results (if they exist) from the matview.

That doesn’t sound as if it would make planning time explode that much
(except, because of step 4, when there are many matviews that contain
overlapping sets of joined tables, and a query joins over the union of
those sets, *and* the replaceable-by-matviews parts are in
subqueries). It could even decrease it significantly (e.g., in the
case where a bunch of joins would be replaced with a scan of a
matview).

Also, I suppose that once such functionality exists, application
writers would be more inclined to write “heavy” queries that do lots
of aggregation even in an OLTP environment, of the kind that is these
days typically only done in OLAP environments.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-03 23:39:08
Message-ID: 17034.1362353948@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com> writes:
> 2013/3/3 Kevin Grittner <kgrittn(at)ymail(dot)com>:
>> Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com> wrote:
>>> I think that automatically using materialized views even when the
>>> query doesnt mention them directly, is akin to automatically
>>> using indexes without having to mention them in the query.

>> Oh, I understand that concept perfectly well, I just wonder how
>> often it is useful in practice.

There's a much more fundamental reason why this will never happen, which
is that the query planner is not licensed to decide that you only want
an approximate and not an exact answer to your query.

If MVs were guaranteed always up-to-date, maybe we could think about
automatic use of them --- but that's a far different feature from what
Kevin has here.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-04 00:27:02
Message-ID: 5133EA56.2050103@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers


> There's a much more fundamental reason why this will never happen, which
> is that the query planner is not licensed to decide that you only want
> an approximate and not an exact answer to your query.

I think it would be worth talking about when someone wants to implement
it. I'd imagine it would require setting a GUC, though, which would be
off by default for obvious reasosn.

> If MVs were guaranteed always up-to-date, maybe we could think about
> automatic use of them --- but that's a far different feature from what
> Kevin has here.

And of limited utility, as mentioned.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-04 00:34:44
Message-ID: 1362357284.73921.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:

> I'm still working on docs, and the changes related to the syntax
> change are still only lightly tested, but as far as I know, all is
> complete except for the docs.  I'm still working on those and
> expect to have them completed late today.  I'm posting this patch
> to allow a chance for final review of the code changes before I
> push.

Pushed.

I'll be keeping an eye on the buildfarm.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: "pgsql-committers(at)postgresql(dot)org" <pgsql-committers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Materialized views WIP patch
Date: 2013-03-04 00:57:41
Message-ID: 1362358661.13222.YahooMailNeo@web162903.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:

> Pushed.

My first attempt to push it failed because of a concurrent commit
by Tom, and then when I went to redo it I accidentally included a
file with the .orig suffix.  I think I've undone my error, but
github doesn't seem to be updating, so I fear I did some damage
somehow. If someone who knows what they are doing with git better
than I do could check the server to make sure there is no residual
problem, I would appreciate it.

Sorry for the trouble.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-04 04:04:07
Message-ID: 51341D37.5020506@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 03/04/2013 08:27 AM, Josh Berkus wrote:
>> There's a much more fundamental reason why this will never happen, which
>> is that the query planner is not licensed to decide that you only want
>> an approximate and not an exact answer to your query.
> I think it would be worth talking about when someone wants to implement
> it. I'd imagine it would require setting a GUC, though, which would be
> off by default for obvious reasosn.
I'm not a fan of this, even with a GUC. Imagine doing remote debugging
by email/phone. There are enough things to check already ("does your
application REALLY commit that transaction?") without also having to
deal with settings that can cause a potentially out of date view of the
data to be used without it being visible in the query its self.

I hate to even say it, but this is where a per-query [redacted] would be
good, so we could say in the query text that this query may use matviews
that are not perfectly up to date.

At this point it's all hand-waving anyway, since no feature to allow the
planner to automatically rewrite a subtree of a query to use a matview
instead exists.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Euler Taveira <euler(at)timbira(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-04 13:37:32
Message-ID: 5134A39C.70800@timbira.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 03-03-2013 21:27, Josh Berkus wrote:
> I think it would be worth talking about when someone wants to implement
> it. I'd imagine it would require setting a GUC, though, which would be
> off by default for obvious reasosn.
>
-1. Why not adding another storage_parameter, say auto_refresh=on? Also,
that's another feature.

> And of limited utility, as mentioned.
>
For a first release, that is fine as is. Let's not complicate a feature that
has been widely discussed in this development cycle.

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: "pgsql-committers(at)postgresql(dot)org" <pgsql-committers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Materialized views WIP patch
Date: 2013-03-04 15:39:37
Message-ID: 20130304153937.GD9507@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>
> > Pushed.
>
> My first attempt to push it failed because of a concurrent commit
> by Tom, and then when I went to redo it I accidentally included a
> file with the .orig suffix.  I think I've undone my error, but
> github doesn't seem to be updating, so I fear I did some damage
> somehow. If someone who knows what they are doing with git better
> than I do could check the server to make sure there is no residual
> problem, I would appreciate it.

FWIW this email was stuck in the moderation queue, I just released it.
We might have realized earlier that there was a problem ...

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-05 08:50:39
Message-ID: CA+U5nM+QnFTZs7aQ0ekGi5ptKjy=zW29g4Y=12BF+KetCA5LoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 3 March 2013 23:39, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com> writes:
>> 2013/3/3 Kevin Grittner <kgrittn(at)ymail(dot)com>:
>>> Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com> wrote:
>>>> I think that automatically using materialized views even when the
>>>> query doesn’t mention them directly, is akin to automatically
>>>> using indexes without having to mention them in the query.
>
>>> Oh, I understand that concept perfectly well, I just wonder how
>>> often it is useful in practice.
>
> There's a much more fundamental reason why this will never happen, which
> is that the query planner is not licensed to decide that you only want
> an approximate and not an exact answer to your query.
>
> If MVs were guaranteed always up-to-date, maybe we could think about
> automatic use of them --- but that's a far different feature from what
> Kevin has here.

Its not a different feature, its what most people expect a feature
called MV to deliver. That's not a matter of opinion, its simply how
every other database works currently - Oracle, Teradata, SQLServer at
least. The fact that we don't allow MVs to automatically optimize
queries is acceptable, as long as that is clearly marked in some way
to avoid confusion, and I don't mean buried on p5 of the docs. What we
have here is a partial implementation that can be improved upon over
next few releases. I hope anyone isn't going to claim that
"Materialized Views" have been implemented in the release notes in
this release, because unqualified that would be seriously misleading
and might even stifle further funding to improve things to the level
already implemented elsewhere. Just to reiterate, I fully support the
committing of this partial feature into Postgres in this release
because it will be a long haul to complete the full feature and what
we have here is a reasonable stepping stone to get there.

Transactionally up-yo-date MVs can be used like indexes in the
planner. The idea that this is impossible because of the permutations
involved is somewhat ridiculous; there is much published work on
optimising that and some obvious optimisations. Clearly that varies
according to the number of MVs and the number of tables they touch,
not the overall complexity of the query. The overhead is probably same
or less as partial indexes, which we currently think is acceptable. In
any case, if you don't wish that overhead, don't use MVs.

Non-transactionally up-to-date MVs could also be used like indexes in
the planner, if we gave the planner the "licence" it (clearly) lacks.
If using MV makes a two-hour query return in 1 minute, then using an
MV that is 15 minutes out of date is likely to be a win. The "licence"
is some kind of user parameter/option that specifies how stale an
answer a query can return. For many queries that involve averages and
sums, a stale or perhaps an approximate answer would hardly differ
anyway. So I think there is room somewhere there for a "staleness"
time specification by the user, allowing approximation.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-05 12:15:55
Message-ID: 1362485755.5466.YahooMailNeo@web162902.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
> On 3 March 2013 23:39, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com> writes:
>>> 2013/3/3 Kevin Grittner <kgrittn(at)ymail(dot)com>:
>>>> Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com> wrote:
>>>>> I think that automatically using materialized views even when
>>>>> the query doesn’t mention them directly, is akin to
>>>>> automatically using indexes without having to mention them in
>>>>> the query.
>>
>>>> Oh, I understand that concept perfectly well, I just wonder
>>>> how often it is useful in practice.
>>
>> There's a much more fundamental reason why this will never
>> happen, which is that the query planner is not licensed to
>> decide that you only want an approximate and not an exact answer
>> to your query.
>>
>> If MVs were guaranteed always up-to-date, maybe we could think
>> about automatic use of them --- but that's a far different
>> feature from what Kevin has here.
>
> Its not a different feature, its what most people expect a
> feature called MV to deliver. That's not a matter of opinion, its
> simply how every other database works currently - Oracle,
> Teradata, SQLServer at least. The fact that we don't allow MVs to
> automatically optimize queries is acceptable, as long as that is
> clearly marked in some way to avoid confusion, and I don't mean
> buried on p5 of the docs. What we have here is a partial
> implementation that can be improved upon over next few releases.
> I hope anyone isn't going to claim that "Materialized Views" have
> been implemented in the release notes in this release, because
> unqualified that would be seriously misleading and might even
> stifle further funding to improve things to the level already
> implemented elsewhere. Just to reiterate, I fully support the
> committing of this partial feature into Postgres in this release
> because it will be a long haul to complete the full feature and
> what we have here is a reasonable stepping stone to get there.
>
> Transactionally up-yo-date MVs can be used like indexes in the
> planner. The idea that this is impossible because of the
> permutations involved is somewhat ridiculous; there is much
> published work on optimising that and some obvious optimisations.
> Clearly that varies according to the number of MVs and the number
> of tables they touch, not the overall complexity of the query.
> The overhead is probably same or less as partial indexes, which
> we currently think is acceptable. In any case, if you don't wish
> that overhead, don't use MVs.
>
> Non-transactionally up-to-date MVs could also be used like
> indexes in the planner, if we gave the planner the "licence" it
> (clearly) lacks. If using MV makes a two-hour query return in 1
> minute, then using an MV that is 15 minutes out of date is likely
> to be a win. The "licence" is some kind of user parameter/option
> that specifies how stale an answer a query can return. For many
> queries that involve averages and sums, a stale or perhaps an
> approximate answer would hardly differ anyway. So I think there
> is room somewhere there for a "staleness" time specification by
> the user, allowing approximation.

I don't think I disagree with any of what Simon says other than his
feelings about the planning cost.  Imagine that there are ten MVs
that might apply to a complex query, but some of them are mutually
exclusive, so there are a large number of permutations of MVs which
could be used to replace parts of the original query.  And maybe
some of base tables have indexes which could reduce execution cost
which aren't present in some or all of the MVs.  And each MV has a
number of indexes.  The combinatorial explosion of possible plans
would make it hard to constrain plan time without resorting to some
crude rules about what to choose.  That's not an unsolvable
problem, but I see it as a pretty big problem.

I have no doubt that someone could take a big data warehouse and
add one or two MVs and show a dramatic improvement in the run time
of a query.  Almost as big as if the query were rewritten to usee
the MV directly.  It would make for a very nice presentation, and
as long as they are used sparingly this could be a useful tool for
a data warehouse environment which is playing with alternative ways
to optimize slow queries which pass a lot of data.  In other
environments, I feel that it gets a lot harder to show a big win.

The good news is that it sounds like we agree on the ideal
long-term feature set.  I'm just a lot more excited, based on the
use-cases I've seen, about the addition of incremental updates than
substituting MVs into query plans which reference the underlying
tables.  Perhaps that indicates a chance to the final feature set
sooner, through everyone scratching their own itches.  :-)

And we both seem to feel that some system for managing acceptable
levels of MV "freshness" is a necessary feature in order to go very
much further.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-05 14:44:38
Message-ID: CA+U5nMKSrF=j54fAns4pYQsrszK7L=Vkbqt2xxVT5pc738qegw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 5 March 2013 12:15, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:

> I don't think I disagree with any of what Simon says other than his
> feelings about the planning cost. Imagine that there are ten MVs
> that might apply to a complex query, but some of them are mutually
> exclusive, so there are a large number of permutations of MVs which
> could be used to replace parts of the original query. And maybe
> some of base tables have indexes which could reduce execution cost
> which aren't present in some or all of the MVs. And each MV has a
> number of indexes. The combinatorial explosion of possible plans
> would make it hard to constrain plan time without resorting to some
> crude rules about what to choose. That's not an unsolvable
> problem, but I see it as a pretty big problem.

If we are proposing that we shouldn't try to optimise because its not
usefully solvable, then I would disagree.

If we are saying that more plans are possible with MVs, then I'd say,
yes there *could* be - that's the one of the purposes. That represents
more options for optimisation and we should be happy, not sad about
that. Yes, we would need some thought to how those potential
optimisations can be applied without additional planning cost, but I
see that as a long term task, not a problem. The question is will the
potential for additional planning time actually materialise into a
planning problem? (See below).

> I have no doubt that someone could take a big data warehouse and
> add one or two MVs and show a dramatic improvement in the run time
> of a query. Almost as big as if the query were rewritten to usee
> the MV directly. It would make for a very nice presentation, and
> as long as they are used sparingly this could be a useful tool for
> a data warehouse environment which is playing with alternative ways
> to optimize slow queries which pass a lot of data. In other
> environments, I feel that it gets a lot harder to show a big win.

Are there realistically going to be more options to consider? In
practice, no, because in most cases we won't be considering both MVs
and indexes.

Splatting MVs on randomly won't show any more improvement than
splatting indexes on randomly helps. Specific optimisations help in
specific cases only. And of course, adding extra data structures that
have no value certainly does increase planning time. Presumably we'd
need some way of seeing how frequently MVs were picked, so we could
drop unused MVs just like we can indexes.

* Indexes are great at speeding up various kinds of search queries. If
you don't have any queries like that, they help very little.

* MVs help in specific and restricted use cases, but can otherwise be
thought of as a new kind of index structure. MVs help with joins and
aggregations, so if you don't do much of that, you'll see no benefit.

That knowledge also allows us to develop heuristics for sane
optimisation. If the MV has a GROUP BY in it, and a query doesn't,
then it probably won't help much to improve query times. If it
involves a join you aren't using, then that won't help either. My
first guess would be that we don't even bother looking for MV plans
unless it has an aggregated result, or a large scale join. We do
something similar when we look for plans that use indexes when we have
appropriate quals - no quals, no indexes.

As a result, I don't see MVs increasing planning times for requests
that would make little or no use of them. There will be more planning
time on queries that could make use of them and that is good because
we really care about that.

Sure, a badly written planner might cost more time than it saves. All
of this work requires investment from someone with the time and/or
experience to make a good go at it. I'm not pushing Tom towards it,
nor anyone else, but I do want to see the door kept open for someone
to do this when possible (i.e. not GSoC).

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-05 20:03:56
Message-ID: 51364FAC.7020703@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Simon, Kevin, all:

Actually, there was already an attempt at automated MV query planning as
a prior university project. We could mine that for ideas.

Hmmm. I thought it was on pgfoundry, but it's not. Does anyone have
access to ACM databases etc. so they could search for this?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-05 21:08:05
Message-ID: CA+TgmoazPwB1eMW3gRnYJE9trJ1RjS_r9yBOF-eMOoGpZBn73Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Tue, Mar 5, 2013 at 7:15 AM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> I don't think I disagree with any of what Simon says other than his
> feelings about the planning cost. Imagine that there are ten MVs
> that might apply to a complex query, but some of them are mutually
> exclusive, so there are a large number of permutations of MVs which
> could be used to replace parts of the original query. And maybe
> some of base tables have indexes which could reduce execution cost
> which aren't present in some or all of the MVs. And each MV has a
> number of indexes. The combinatorial explosion of possible plans
> would make it hard to constrain plan time without resorting to some
> crude rules about what to choose. That's not an unsolvable
> problem, but I see it as a pretty big problem.

I'm not sure I agree. Suppose you have a query like SELECT * FROM a
INNER JOIN b ON a.x = b.x INNER JOIN c ON a.y = c.y WHERE <some
stuff>. The query planner will construct paths for scans on a, b, and
c. Then it will construct joinrels for (a b), (a c), (b c), and
eventually (a b c) and calculate a set of promising paths for each of
them. If there is a materialized view available for one of those
joinrels, all we really need to do is add the possible paths for
scanning that materialized view to the joinrel. They'll either be
better than the existing paths, or they won't. If they're better, the
paths that otherwise would have gotten chosen will get kicked out. If
they're worse, the materialized-view paths will get kicked out.
Either way, we don't have any more paths than we would have otherwise
- so no combinatorial explosion.

There is one case where we might end up with more paths than we had
before. Suppose there's a materialized view on the query SELECT a.x,
a.y, a.z, b.t FROM a INNER JOIN b ON a.x = b.x ORDER BY a.z and the
users enters just that query. Suppose further that the materialized
view has an index on column z, but table a does not. In that case,
the best paths for the joinrel (a b) not involving the materialized
view will be an unordered path, but we could scan the materialized
view using the index and so will have a path that is ordered by a.z to
add to the joinrel. This path will stick around even if it's more
expensive than the unordered path because we know it avoids a final
sort. So in that case we do have more paths, but they are potentially
useful paths, so I don't see that as a problem.

It seems to me that the tricky part of this is not that it might add a
lot more paths (because I don't think it will, and if it does I think
they're useful paths), but that figuring out whether or not a
materialized view matches any particular joinrel might be expensive.
I think we're going to need a pretty accurate heuristic for quickly
discarding materialized views that can't possibly be relevant to the
query as a whole, or to particular joinrels. There are a couple of
possible ways to approach that. The most manual of these is probably
to have a command like ALTER TABLE x {ENABLE | DISABLE } REWRITE
MATERIALIZED y, where the user has to explicitly ask for materialized
views to be considered, or else they aren't. That sort of
fine-grained control might have other benefits as well.

I think a more automated solution is also possible, if we want it.
For a materialized view to match a query, all of the baserels in the
materialized view must also be present in the query. (Actually, there
are situations where this isn't true; e.g. the materialized view has
an extra table, but it's joined in a way that could be pruned away by
the join removal code, but I think we could ignore such
somewhat-pathological cases at least initially.) It seems to me that
if we could figure out a very-cheap way to throw away all of the
materialized views that don't pass that basic test, we'd be reasonably
close to a workable solution. A database with tons of materialized
views defined on the same set of target relations might still have
some planning time problems, but so might a database with tons of
indexes.

In that regard, what I was thinking about is to use something sort of
like a Bloom filter. Essentially, produce a "fingerprint" for each
materialized view. For the sake of argument, let's say that the
fingerprint is a 64-bit integer, although it could be a bit string of
any length. To construct the fingerprint, hash the OID of each
relation involved in the view onto a bit position between 0 and 63.
Set the bits for all relations involved in the materialized view.
Then, construct a fingerprint for the query in the same way. Any
materialized view where (matview_fingerprint & query_fingerprint) !=
matview_fingerprint needn't be considered; moreover, for a given
joinrel, any materialized view where matview_fingerprint !=
joinrel_fingerprint (computed using the same scheme) needn't be
considered. Of course, a matching fingerprint doesn't mean that the
materialized view matches, or even necessarily that it involves the
correct set of relations, so there's a lot more double-checking that
has to be done afterwards before you can decide that the view is in
fact a match, but at least it's a fast way to throw away some large
percentage of materialized views that are definitely NOT relevant to
the query, and only do more detailed evaluation on the ones that might
be matches. The chances of false positives can be reduced if you care
to make the fingerprints bigger (and thus more expensive to compare).

All that having been said, it's hard for me to imagine that anyone
really cares about any of this until we have an incremental update
feature, which right now we don't. Actually, I'm betting that's going
to be significantly harder than automatic-query-rewrite, when all is
said and done. Automatic-query-rewrite, if and when we get it, will
not be easy and will require a bunch of work from someone with a good
understanding of the planner, but it strikes me as the sort of thing
that might work out to one large project and then it's done. Whereas,
incremental update sounds to me like a series of projects over a
series of releases targeting various special cases, where we can
always point to some improvements vs. release N-1 but we're never
actually done and able to move on to the next thing. As a roadmap
goes, I think that's OK. Even a reasonably simplistic and partial
implementation of incremental update will benefit a lot of users. But
in terms of relative difficulty, it's not at all obvious to me that
that's the easier part of the project.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-05 21:09:08
Message-ID: 1362517748.34869.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> wrote:

There is no shortage of literature on the topic, although any
papers from the ACM could certainly be of interest due to the
general high quality of papers published there.  Adding anything
like this to 9.3 is clearly out of the question, though, so I
really don't want to spend time researching this now, or
encouraging others to do so until after we have a 9.3 release
candidate.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-05 21:51:52
Message-ID: 1362520312.80777.YahooMailNeo@web162903.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> All that having been said, it's hard for me to imagine that
> anyone really cares about any of this until we have an
> incremental update feature, which right now we don't.

These are actually independent of one another, as long as we nail
down how we're determining "freshness" -- which is probably needed
for either.  Someone who's immersed in tuning long-running DW
queries might be interested in this before incremental update.
(They might load the data once per month, so refreshing the MVs as
a step in that process might be cheaper than incrementally
maintaining them.)  Someone could base "freshness" on
pg_relation_is_scannable() and start working on automatic query
rewrite right now, if they wanted to.

> Actually, I'm betting that's going to be significantly harder
> than automatic-query-rewrite, when all is said and done.
> Automatic-query-rewrite, if and when we get it, will not be easy
> and will require a bunch of work from someone with a good
> understanding of the planner, but it strikes me as the sort of
> thing that might work out to one large project and then it's
> done.

I still think we're going to hit the wall on planning time under
certain circumstances and need to tweak that over the course of
several releases, but now is not the time to get into the details
of why I think that.  We've spent way too much time on it already
for the point we're at in the 9.3 cycle.  I've kept my concerns
hand-wavy on purpose, and am trying hard to resist the temptation
to spend a lot of time demonstrating the problems.

> Whereas, incremental update sounds to me like a series of
> projects over a series of releases targeting various special
> cases, where we can always point to some improvements vs. release
> N-1 but we're never actually done

Exactly.  I predict that we will eventually have some special sort
of trigger for maintaining MVs based on base table changes to
handle the ones that are just too expensive (in developer time or
run time) to fully automate.  But there is a lot of low-hanging
fruit for automation.

> Even a reasonably simplistic and partial implementation of
> incremental update will benefit a lot of users.

Agreed.

> But in terms of relative difficulty, it's not at all obvious to
> me that that's the easier part of the project.

I totally agree that getting something working to use MVs in place
of underlying tables is not all that different or more difficult
than using partial indexes.  I just predict that we'll get a lot of
complaints about cases where it results in worse performance and
we'll need to deal with those issues.  I don't seem that as being
brain surgery; just a messy matter of trying to get this pretty
theory to work well in the real world -- probably using a bunch of
not-so-elegant heuristics.  And in the end, the best you can hope
for is performance not noticeably worse than you would get if you
modified your query to explicitly use the MV(s) -- you're just
saving yourself the rewrite.  Well, OK, there is the point that,
(like indexes) if you run the query which hits the base tables with
different parameters, and a new plan is generated each time, it
might pick different MVs or exclude them as is most efficient for
the given parameters.  That's the Holy Grail of all this.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-05 21:58:44
Message-ID: CAP-rdTaTQ5qHEBkuAOfpLTBjf6=VcWMFgQH9XXt633CU5s5pSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

2013/3/5 Robert Haas <robertmhaas(at)gmail(dot)com>:

> All that having been said, it's hard for me to imagine that anyone
> really cares about any of this until we have an incremental update
> feature, which right now we don't. Actually, I'm betting that's going
> to be significantly harder than automatic-query-rewrite, when all is
> said and done.

I agree.

E.g., things such as keeping a matview consistent relative to changes
applied to the base tables during the same transaction, might be
mightily difficult to implement in a performant way. OTOH, matviews
that can only be used for optimization if their base tables were not
changed “too recently” (e.g., by transactions that are still in
flight, including the current transaction), are probably kind of
useful in themselves as long as those base tables are not updated all
the time.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-05 22:02:46
Message-ID: 16818.1362520966@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Mar 5, 2013 at 7:15 AM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>> I don't think I disagree with any of what Simon says other than his
>> feelings about the planning cost.

> I'm not sure I agree. Suppose you have a query like SELECT * FROM a
> INNER JOIN b ON a.x = b.x INNER JOIN c ON a.y = c.y WHERE <some
> stuff>. The query planner will construct paths for scans on a, b, and
> c. Then it will construct joinrels for (a b), (a c), (b c), and
> eventually (a b c) and calculate a set of promising paths for each of
> them. If there is a materialized view available for one of those
> joinrels, all we really need to do is add the possible paths for
> scanning that materialized view to the joinrel.

That only works to the extent that a materialized view can be described
by a path. My impression is that most of the use-cases for MVs will
involve aggregates or similar data reduction operators, and we don't
currently implement anything about aggregates at the Path level.
Arguably it would be useful to do so; in particular, we could get rid
of the currently hard-wired mechanism for choosing between sorted and
hashed aggregation, and perhaps there'd be a less grotty way to deal
with index-optimized MIN/MAX aggregates. But there's a great deal to do
to make that happen, and up to now I haven't seen any indication that it
would do much except add overhead.

FWIW, my opinion is that doing anything like this in the planner is
going to be enormously expensive. Index matching is already pretty
expensive, and that has the saving grace that we only do it once per
base relation. Your sketch above implies trying to match to MVs once
per considered join relation, which will be combinatorially worse.
Even with a lot of sweat over reducing the cost of the matching, it
will hurt.

> All that having been said, it's hard for me to imagine that anyone
> really cares about any of this until we have an incremental update
> feature, which right now we don't.

Agreed. Even if we're willing to have an "approximate results are OK"
GUC (which frankly strikes me as a horrid idea), people would certainly
not be willing to turn it on without some guarantee as to how stale the
results could be.

regards, tom lane


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-05 22:04:13
Message-ID: CAP-rdTZSV7_tzgv8244NE8sx9dNjteJQTiQhMsNG_aBKuKgMpg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

2013/3/5 Kevin Grittner <kgrittn(at)ymail(dot)com>:

> Exactly. I predict that we will eventually have some special sort
> of trigger for maintaining MVs based on base table changes to
> handle the ones that are just too expensive (in developer time or
> run time) to fully automate. But there is a lot of low-hanging
> fruit for automation.

I think it would be totally OK to restrict the possible definitions
for matviews that can be maintained fully incrementally to something
like:

SELECT attributes and aggregations FROM trivial joins WHERE trivial
condition GROUP BY attributes;

Those definitions are the most useful for optimizing the things that
matviews are good at (joins and aggregation).

Nicolas

PS. Sorry for having fired off this discussion that obviously doesn’t
really relate to the current patch.

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-05 22:15:03
Message-ID: 1362521703.55760.YahooMailNeo@web162901.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com> wrote:

> PS. Sorry for having fired off this discussion that obviously
> doesn’t really relate to the current patch.

I know it's hard to resist.  While I think there will be a number
of people for whom the current patch will be a convenience and will
therefore use it, it is hard to look at what's there and *not* go
"if only it also..."

Perhaps it would be worth looking for anything in the patch that
you think might be painting us into a corner where it would be hard
to do all the other cool things.  While it's late enough in the
process that changing anything like that which you find would be
painful, it might be a lot more painful later if we release without
doing something about it.  My hope, of course, is that you won't
find any such thing.  With this patch I've tried to provide a
minimal framework onto which these other things can be bolted.
I've tried hard not to do anything which would make it hard to
extend, but new eyes may see something I missed.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-05 22:37:41
Message-ID: 513673B5.8040508@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 03/05/2013 01:09 PM, Kevin Grittner wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
> There is no shortage of literature on the topic, although any
> papers from the ACM could certainly be of interest due to the
> general high quality of papers published there. Adding anything
> like this to 9.3 is clearly out of the question, though, so I
> really don't want to spend time researching this now, or
> encouraging others to do so until after we have a 9.3 release
> candidate.

Good point.

Just FYI: once we start work on 9.4, some university team got planner
stuff for matviews working with postgres, using version 8.0 or something.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: kgrittn(at)ymail(dot)com
Cc: stark(at)mit(dot)edu, ants(at)cybertec(dot)at, hlinnakangas(at)vmware(dot)com, robertmhaas(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2013-03-06 04:00:31
Message-ID: 20130306.130031.1925699785249303273.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

> Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>
>> REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA
>
> Given the short time, I left out the "[, ...]".  If people think
> that is important to get into this release, a follow-on patch might
> be possible.
>
>> Barring objections, I will use the above and push tomorrow.
>
> I'm still working on docs, and the changes related to the syntax
> change are still only lightly tested, but as far as I know, all is
> complete except for the docs.  I'm still working on those and
> expect to have them completed late today.  I'm posting this patch
> to allow a chance for final review of the code changes before I
> push.

Was the remaining work on docs done? I would like to test MVs and am
waiting for the docs completed.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-06 08:28:10
Message-ID: CA+U5nMJrOZPqGi_srsZMz0tX5NphKTsDLbyci+kMZ1h2MM5=qw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 5 March 2013 22:02, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> FWIW, my opinion is that doing anything like this in the planner is
> going to be enormously expensive. Index matching is already pretty
> expensive, and that has the saving grace that we only do it once per
> base relation. Your sketch above implies trying to match to MVs once
> per considered join relation, which will be combinatorially worse.
> Even with a lot of sweat over reducing the cost of the matching, it
> will hurt.

As we already said: no MVs => zero overhead => no problem. It costs in
the cases where time savings are possible and not otherwise. The type
of queries and their typical run times are different to the OLTP case,
so any additional planning time is likely to be acceptable. I'm sure
we can have a deep disussion about how to optimise the planner for
this; I'm pretty sure there are reasonable answers to the not-small
difficulties along that road.

Most importantly, I want to make sure we don't swing the door shut on
improvements here, especially if you (Tom) are not personally
convinced enough to do the work yourself.

Making transactional MVs work would be in part justified by the
possible existence of automatic lookaside planning, so yes, the two
things are not linked but certainly closely related.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-06 14:16:59
Message-ID: 14403.1362579419@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On 5 March 2013 22:02, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> FWIW, my opinion is that doing anything like this in the planner is
>> going to be enormously expensive.

> As we already said: no MVs => zero overhead => no problem.

Well, in the first place that statement is false on its face: we'll
still spend cycles looking for relevant MVs, or at least maintaining a
complexly-indexed cache that helps us find out that there are none in
a reasonable amount of time. In the second place, even if it were
approximately true it wouldn't help the people who were using MVs.

> It costs in
> the cases where time savings are possible and not otherwise.

And that is just complete nonsense: matching costs whether you find a
match or not. Could we have a little less Pollyanna-ish optimism and
a bit more realism about the likely cost of such a feature?

regards, tom lane


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: "stark(at)mit(dot)edu" <stark(at)mit(dot)edu>, "ants(at)cybertec(dot)at" <ants(at)cybertec(dot)at>, "hlinnakangas(at)vmware(dot)com" <hlinnakangas(at)vmware(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "michael(dot)paquier(at)gmail(dot)com" <michael(dot)paquier(at)gmail(dot)com>, "josh(at)agliodbs(dot)com" <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-06 14:53:20
Message-ID: 1362581600.96752.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:

> Was the remaining work on docs done? I would like to test MVs and
> am waiting for the docs completed.

I think they are done.  If you notice anything missing or in need
of clarification please let me know.  At this point missing docs
would be a bug in need of a fix.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Stark <stark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-06 16:13:05
Message-ID: CAM-w4HMDEx1FSckQicTr+oRDKMKaivE-dqQDwGmndBd8i7BstA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Tue, Mar 5, 2013 at 9:08 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> All that having been said, it's hard for me to imagine that anyone
> really cares about any of this until we have an incremental update
> feature, which right now we don't. Actually, I'm betting that's going
> to be significantly harder than automatic-query-rewrite, when all is
> said and done. Automatic-query-rewrite, if and when we get it, will
> not be easy and will require a bunch of work from someone with a good
> understanding of the planner, but it strikes me as the sort of thing
> that might work out to one large project and then it's done. Whereas,
> incremental update sounds to me like a series of projects over a
> series of releases targeting various special cases, where we can
> always point to some improvements vs. release N-1 but we're never
> actually done and able to move on to the next thing. As a roadmap
> goes, I think that's OK. Even a reasonably simplistic and partial
> implementation of incremental update will benefit a lot of users. But
> in terms of relative difficulty, it's not at all obvious to me that
> that's the easier part of the project.

While true that's true for a lot of Postgres features. The only ones
that are one-shot projects are buried deep in the internals. Anything
with UI implications inevitably has limitations and then other people
come along and and work on removing or extending those features.

I do agree with Tom though -- the most frequently asked for
materialized view in the past has always been "select count(*) from
tab". People assume we already do this and are surprised when we
don't. The cookie cutter solution for it is basically exactly what a
incrementally updated materialized view solution would look like
(with the queue of updates with transacion information that are
periodically flattened into the aggregate). Rewriting this might be a
bit tricky and require heuristics to determine just how much work to
expend trying to match materialized views, this type of view would be
where most of the win would be.

I also can't see implementing query rewriting for
non-transactionally-accurate materialized views. If people want a
snapshot of the data that may be out of date that's great. I can tons
of use cases for that. But then surely they won't be surprised to have
to query the snapshot explicitly. If can't see going to all this
trouble to implement transactions and snapshots and wal logging and so
on and then silently rewriting queries to produce data that is not up
to date. I think users would be surprised to find bog-standard SQL
occasionally producing "incorrect" results.

That said, there are cases where snapshots might be up to date even
though we don't implement any incremental updates. If the underlying
data is read-only or hasn't received any update commits since the
snapshot was taken then it might still be useful. There are tons of
ETL applications where you load the data once and then build MVs for
it and never touch the underlying data again.

--
greg


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: "stark(at)mit(dot)edu" <stark(at)mit(dot)edu>, "ants(at)cybertec(dot)at" <ants(at)cybertec(dot)at>, "hlinnakangas(at)vmware(dot)com" <hlinnakangas(at)vmware(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "michael(dot)paquier(at)gmail(dot)com" <michael(dot)paquier(at)gmail(dot)com>, "josh(at)agliodbs(dot)com" <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-06 21:51:20
Message-ID: 1362606680.53999.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>
>> Was the remaining work on docs done? I would like to test MVs and
>> am waiting for the docs completed.
>
> I think they are done.  If you notice anything missing or in need
> of clarification please let me know.  At this point missing docs
> would be a bug in need of a fix.

I decided to take another pass through to try to spot anyplace I
might have missed.  I found that I had missed documenting the new
pg_matviews system view, so I have just pushed that.

I also think that something should be done about the documentation
for indexes.  Right now that always refers to a "table".  It would
clearly be awkward to change that to "table or materialized view"
everywhere.  I wonder if most of thosse should be changed to
"relation" with a few mentions that the relation could be a table
or a materialized view, or whether some less intrusive change would
be better.  Opinions welcome.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, "stark(at)mit(dot)edu" <stark(at)mit(dot)edu>, "ants(at)cybertec(dot)at" <ants(at)cybertec(dot)at>, "hlinnakangas(at)vmware(dot)com" <hlinnakangas(at)vmware(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "michael(dot)paquier(at)gmail(dot)com" <michael(dot)paquier(at)gmail(dot)com>, "josh(at)agliodbs(dot)com" <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-06 22:28:29
Message-ID: AFF73C23-9DA9-44D2-BB81-48DAB6441496@justatheory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Mar 6, 2013, at 1:51 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:

> I also think that something should be done about the documentation
> for indexes. Right now that always refers to a "table". It would
> clearly be awkward to change that to "table or materialized view"
> everywhere. I wonder if most of thosse should be changed to
> "relation" with a few mentions that the relation could be a table
> or a materialized view, or whether some less intrusive change would
> be better. Opinions welcome.

Isn’t a materialized view really just a table that gets updated periodically? And isn’t a non-matierialized view also thought of as a “relation”?

If the answer to both those questions is “yes,” I think the term should remain “table,” with a few mentions that the term includes materialized views (and excludes foreign tables).

Best,

David


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: kgrittn(at)ymail(dot)com
Cc: ishii(at)postgresql(dot)org, stark(at)mit(dot)edu, ants(at)cybertec(dot)at, hlinnakangas(at)vmware(dot)com, robertmhaas(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2013-03-06 22:45:22
Message-ID: 20130307.074522.1755169086862119106.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

> Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>> Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>
>>> Was the remaining work on docs done? I would like to test MVs and
>>> am waiting for the docs completed.
>>
>> I think they are done.  If you notice anything missing or in need
>> of clarification please let me know.  At this point missing docs
>> would be a bug in need of a fix.

Ok.

> I decided to take another pass through to try to spot anyplace I
> might have missed.  I found that I had missed documenting the new
> pg_matviews system view, so I have just pushed that.
>
> I also think that something should be done about the documentation
> for indexes.  Right now that always refers to a "table".  It would
> clearly be awkward to change that to "table or materialized view"
> everywhere.  I wonder if most of thosse should be changed to
> "relation" with a few mentions that the relation could be a table
> or a materialized view, or whether some less intrusive change would
> be better.  Opinions welcome.

You might want to add small description about MVs to Tutorial
documentation "3.2 Views". Here is the current description of views in
the doc.

---------------------------------------------------------------------
3.2. Views

Refer back to the queries in Section 2.6. Suppose the combined listing
of weather records and city location is of particular interest to your
application, but you do not want to type the query each time you need
it. You can create a view over the query, which gives a name to the
query that you can refer to like an ordinary table:

CREATE VIEW myview AS
SELECT city, temp_lo, temp_hi, prcp, date, location
FROM weather, cities
WHERE city = name;

SELECT * FROM myview;

Making liberal use of views is a key aspect of good SQL database
design. Views allow you to encapsulate the details of the structure of
your tables, which might change as your application evolves, behind
consistent interfaces.

Views can be used in almost any place a real table can be
used. Building views upon other views is not uncommon.
---------------------------------------------------------------------
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-07 08:55:04
Message-ID: CA+U5nMKEMU=JZrmbs3-jg=OFtB7hg14JtmtsyEjwXuj-L_ai-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 6 March 2013 14:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On 5 March 2013 22:02, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> FWIW, my opinion is that doing anything like this in the planner is
>>> going to be enormously expensive.
>
>> As we already said: no MVs => zero overhead => no problem.
>
> Well, in the first place that statement is false on its face: we'll
> still spend cycles looking for relevant MVs, or at least maintaining a
> complexly-indexed cache that helps us find out that there are none in
> a reasonable amount of time. In the second place, even if it were
> approximately true it wouldn't help the people who were using MVs.

We can store info in the relcache, and reduce such a lookup to a
simple if test in the planner. Populating the cache would be easy
enough, approx same overhead as deriving list of constraints for the
relcache.

If you were using MVs, there are further heuristics to apply. MVs come
in various shapes, so we can assess whether they use aggregates,
joins, filters etc and use that for a general match against a query. I
don't see the need for complex assessments in every case.

>> It costs in
>> the cases where time savings are possible and not otherwise.
>
> And that is just complete nonsense: matching costs whether you find a
> match or not. Could we have a little less Pollyanna-ish optimism and
> a bit more realism about the likely cost of such a feature?

It's not a trivial feature; this is a lot of work. But it can be done
efficiently, without significant effect on other workloads. If that
really were to be true, then enable_lookaside = off can be the
default, just as we have for another costly planning feature,
constraint_exclusion.

Matview lookaside is the next-best-action for further work on the
planner, AFAICS. Correctly optimised query parallelism is harder,
IMHO.

What I'm hearing at the moment is "please don't make any changes in my
area" or "don't climb the North face". Considering the rather high bar
to being able to do this effectively, I do understand your interest in
not having your/our time wasted by casual attempts to approach the
problem, but I don't want to slam the door on a serious attempt (2
year project, 1+ man year effort).

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, "stark(at)mit(dot)edu" <stark(at)mit(dot)edu>, "ants(at)cybertec(dot)at" <ants(at)cybertec(dot)at>, "hlinnakangas(at)vmware(dot)com" <hlinnakangas(at)vmware(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "michael(dot)paquier(at)gmail(dot)com" <michael(dot)paquier(at)gmail(dot)com>, "josh(at)agliodbs(dot)com" <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-07 15:55:29
Message-ID: 1362671729.11638.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

David E. Wheeler <david(at)justatheory(dot)com> wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>
>> I also think that something should be done about the
>> documentation for indexes.  Right now that always refers to a
>> "table".  It would clearly be awkward to change that to "table
>> or materialized view" everywhere.  I wonder if most of thosse
>> should be changed to "relation" with a few mentions that the
>> relation could be a table or a materialized view, or whether
>> some less intrusive change would be better.  Opinions welcome.
>
> Isn’t a materialized view really just a table that gets updated
> periodically?

Not exactly.  It is a relation which has some characteristics of a
view (including an entry in pg_rewrite exactly like that for a
view) and some characteristics of a table (including a heap and
optional indexes).  Whether it looks more like a table or more like
a view depends on how you tilt your head.  You could just as easily
say that it is really just a view which periodically caches its
results on disk.  They really are "their own thing".  As future
releases add more subtle "freshness" concepts, incremental updates,
and query rewrite that unique identity will become even more
conspicuous, I think.

> And isn’t a non-matierialized view also thought of as a
> “relation”?

Yes.  Tables, views, and materialized views are all relations.

> If the answer to both those questions is “yes,” I think the term
> should remain “table,” with a few mentions that the term includes
> materialized views (and excludes foreign tables).

And if the answers are "not exactly" and "yes"?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, "stark(at)mit(dot)edu" <stark(at)mit(dot)edu>, "ants(at)cybertec(dot)at" <ants(at)cybertec(dot)at>, "hlinnakangas(at)vmware(dot)com" <hlinnakangas(at)vmware(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "michael(dot)paquier(at)gmail(dot)com" <michael(dot)paquier(at)gmail(dot)com>, "josh(at)agliodbs(dot)com" <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-07 17:14:57
Message-ID: 72AFFA40-91AF-496A-91C1-64594956082D@justatheory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Mar 7, 2013, at 7:55 AM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:

>> If the answer to both those questions is “yes,” I think the term
>> should remain “table,” with a few mentions that the term includes
>> materialized views (and excludes foreign tables).
>
> And if the answers are "not exactly" and "yes"?

I still tend to think that the term should remain “table,” with brief mentions at the top of pages when the term should be assumed to represent tables and matviews, and otherwise required disambiguations.

Trying to make the least possible work for you here. :-)

David


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-07 23:28:05
Message-ID: CAP-rdTY+VZ8T_8JBvsksDRdz-_wr5ef80M-_UKgu5bHw+rkjpg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

2013/3/5 Kevin Grittner <kgrittn(at)ymail(dot)com>:

> Perhaps it would be worth looking for anything in the patch that
> you think might be painting us into a corner where it would be hard
> to do all the other cool things. While it's late enough in the
> process that changing anything like that which you find would be
> painful, it might be a lot more painful later if we release without
> doing something about it. My hope, of course, is that you won't
> find any such thing. With this patch I've tried to provide a
> minimal framework onto which these other things can be bolted.
> I've tried hard not to do anything which would make it hard to
> extend, but new eyes may see something I missed.

(Without having looked at the patch, or even the documentation :-/.)

I think that something that might prove useful is the following:
Keeping in mind the possibility of storing something in the matview’s
heap that doesn’t correspond to what a SELECT * FROM matview would
yield (i.e., the “logical content”). The transformation could be
performed by an INSTEAD rule (similar to how a view is expanded to its
definition, a reference to a matview would expand to its heap content
transformed to the “logical content”).

(Note that I don’t have any reason to believe that the current
implementation would make this more difficult than it should be.)

<ridiculously long rationale for the previous>

(All the following requires making matviews (inter- and intra-)
transactionally up-to-date w.r.t. their base tables at the moment of
querying them. I don’t deal with approximate results, however useful
that might be.)

I think that the possibility of optimizing COUNT(*) (see mail by Greg
Stark in this thread with “the queue of updates with transacion
information that are periodically flattened into the aggregate”) can
be generalized to generic aggregation that way. The idea would be that
a transaction that adds (or deletes or updates) a row in a base table
causes a “delta” row version in the matview. Selecting from the
matview then merges these deltas into one value (for each row that is
logically present in the matview). Every once in a while (or rather
quite often, if the base tables change often), a VACUUM-like clean-up
operations must be run to merge all rows that are “old enough” (i.e.,
whose transactions are not in flight anymore).

Example of trivial aggregation matview weight_per_kind defined as:

SELECT kind, SUM(weight) FROM fruit GROUP BY kind;

The matview would then physically contain rows such as:

xmin, xmax, kind, weight
1000, 0, 'banana', 123
1000, 0, 'apple', 1
1001, 0, 'banana', 2
1002, 0, 'banana', -3

Which means:

* tx 1000 probably performed a clean-up operation and merged a bunch
of banana rows together to yield 123; it also inserted an apple of
weight 1.
* tx 1001 inserted a banana of weight 2. Any clean-up operation coming
by could not merge the 2 into the first row, as long as tx 1000 is in
flight. Otherwise, it would yield 125; physically this would mean
adding a 125 row, marking the 123 and 2 rows as deleted, and then
waiting for VACUUM to remove them).
* tx 1002 deleted a banana with weight 3.

The result of a SELECT * FROM weight_per_kind; would actually execute
SELECT kind, SUM_merge(weight) FROM heap_of_weight_per_kind GROUP BY
kind;

This would result, for tx 1001 (assuming tx 1000 committed and our
snapshot can see it), in:

kind, weight
'banana', 125
'apple', 1

(The -3 is not taken into account, because it is not visible to tx 1001.)

The operator to use at the location of SUM_merge is something that
merges multiple aggregation results (plus results that represent some
kind of “negation”) together. For SUM, it would be SUM itself and the
negation would be numerical negation. There might also be some kind of
“difference” concept used for UPDATE: When updating a weight from 4 to
3, the difference would be -1. Those additional properties could
optionally be added to the definition of each aggregation function; It
must be done for each function that you want to use in such a way.

Other aggregation functions such as AVG would require storing the SUM
+ number of rows in the matview (otherwise two AVGs could not be
merged); again a difference between the heap and the logical content.
Functions such as MIN and MAX are more difficult to fit in this
framework: I can only see how it would work if row deletion were not
allowed (which might still be a valid use-case); luckily, I think MIN
and MAX are not the typical things for which you would want to use
matviews, because quick computation can typically be done directly
using the base tables.

This whole thing would result in incremental updates of
aggregation-matviews that don’t require physical serialization of the
transactions that update the base tables and that query the matview,
which other models (that depend on correspondence between the heap and
logical content of matviews) would probably require.

And that’s where I stop rambling because nobody gets this far anyway,
and I urgently need some sleep :-).

</ridiculously long rationale>

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-08 14:11:55
Message-ID: 20130308141155.GE25013@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Mar 6, 2013 at 09:16:59AM -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On 5 March 2013 22:02, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> FWIW, my opinion is that doing anything like this in the planner is
> >> going to be enormously expensive.
>
> > As we already said: no MVs => zero overhead => no problem.
>
> Well, in the first place that statement is false on its face: we'll
> still spend cycles looking for relevant MVs, or at least maintaining a
> complexly-indexed cache that helps us find out that there are none in
> a reasonable amount of time. In the second place, even if it were
> approximately true it wouldn't help the people who were using MVs.
>
> > It costs in
> > the cases where time savings are possible and not otherwise.
>
> And that is just complete nonsense: matching costs whether you find a
> match or not. Could we have a little less Pollyanna-ish optimism and
> a bit more realism about the likely cost of such a feature?

Should we add this to the TODO list as a possibility?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Ants Aasma <ants(at)cybertec(dot)at>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-08 14:45:59
Message-ID: 20130308144559.GF25013@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Tue, Mar 5, 2013 at 08:50:39AM +0000, Simon Riggs wrote:
> Its not a different feature, its what most people expect a feature
> called MV to deliver. That's not a matter of opinion, its simply how
> every other database works currently - Oracle, Teradata, SQLServer at
> least. The fact that we don't allow MVs to automatically optimize

Good points.

> queries is acceptable, as long as that is clearly marked in some way
> to avoid confusion, and I don't mean buried on p5 of the docs. What we
> have here is a partial implementation that can be improved upon over
> next few releases. I hope anyone isn't going to claim that
> "Materialized Views" have been implemented in the release notes in
> this release, because unqualified that would be seriously misleading
> and might even stifle further funding to improve things to the level
> already implemented elsewhere. Just to reiterate, I fully support the
> committing of this partial feature into Postgres in this release
> because it will be a long haul to complete the full feature and what
> we have here is a reasonable stepping stone to get there.
>
> Transactionally up-yo-date MVs can be used like indexes in the
> planner. The idea that this is impossible because of the permutations
> involved is somewhat ridiculous; there is much published work on
> optimising that and some obvious optimisations. Clearly that varies
> according to the number of MVs and the number of tables they touch,
> not the overall complexity of the query. The overhead is probably same
> or less as partial indexes, which we currently think is acceptable. In
> any case, if you don't wish that overhead, don't use MVs.

While you are right that automatically using materialized views is like
the optimizer choosing partial indexes, we actually already have
auto-selection of row-level materialized views with expression indexes
and index-only scans. When you do the insert or update, the indexed
function is called and the value stored in the index. If you later
query the function call, we can pull the value right from the index.
This, of course, is a very crude definition of materialized view, but it
seems relevant.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, "stark(at)mit(dot)edu" <stark(at)mit(dot)edu>, "ants(at)cybertec(dot)at" <ants(at)cybertec(dot)at>, "hlinnakangas(at)vmware(dot)com" <hlinnakangas(at)vmware(dot)com>, "michael(dot)paquier(at)gmail(dot)com" <michael(dot)paquier(at)gmail(dot)com>, "josh(at)agliodbs(dot)com" <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-08 15:31:30
Message-ID: CA+TgmoYHzsUDUebuQbR8ugvfz+dL1uRYdYEZ+ABbZJcMbmtKiA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Thu, Mar 7, 2013 at 12:14 PM, David E. Wheeler <david(at)justatheory(dot)com> wrote:
> On Mar 7, 2013, at 7:55 AM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>>> If the answer to both those questions is “yes,” I think the term
>>> should remain “table,” with a few mentions that the term includes
>>> materialized views (and excludes foreign tables).
>>
>> And if the answers are "not exactly" and "yes"?
>
> I still tend to think that the term should remain “table,” with brief mentions at the top of pages when the term should be assumed to represent tables and matviews, and otherwise required disambiguations.

This ship has already sailed. There are plenty of places where
operations apply to a subset of the relation types that exist today,
and we either list them out or refer to "relations" generically.
Changing that would require widespread changes to both the
documentation and existing error message text. We cannot decide that
"table" now means "table or materialized view" any more than we can
decide that it means "table or foreign table", as was proposed around
the time those changes went in. Yeah, it's more work, and it's a
little annoying, but it's also clear. Nothing else is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company