Re: Materialized views WIP patch

Lists: pgsql-hackers
From: "Kevin Grittner" <kgrittn(at)mail(dot)com>
To: "Noah Misch" <noah(at)leadboat(dot)com>
Cc: "Marko Tiikkaja" <pgmail(at)joh(dot)to>,"Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-01-25 06:00:59
Message-ID: 20130125060059.119070@gmx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Noah Misch wrote:

> The patch conflicts with git master; I tested against master(at){2013-01-20}(dot)

New patch rebased, fixes issues raised by Thom Brown, and addresses
some of your points.

> PostgreSQL 8.3 clusters won't contain materialized views, so it doesn't really
> matter whether this change happens or not. I suggest adding a comment,
> whether or not you keep the code change.

Reverted code changes to version_old_8_3.c; added comments.

> > *** a/contrib/sepgsql/sepgsql.h
> > --- b/contrib/sepgsql/sepgsql.h
> > ***************
> > *** 32,37 ****
> > --- 32,39 ----
> >
> > /*
> > * Internally used code of object classes
> > + *
> > + * NOTE: Materialized views are treated as tables for now.
>
> This smells like a bypass of mandatory access control. Unless you've
> determined that this is correct within the sepgsql security model, I suggest
> starting with a draconian policy, like simply crippling MVs. Even if you have
> determined that, separating out the nontrivial sepgsql support might be good.
> The set of ideal reviewers is quite different.

Robert suggested this way of coping for now. Will post just the
sepgsql separately to try to attract the right crowd to confirm.

> It concerns me slightly that older vacuumlo could silently remove large
> objects still referenced by MVs. Only slightly, though, because the next MV
> refresh would remove those references anyway. Is there anything we should do
> to help that situation? If nothing else, perhaps backpatch this patch hunk.

Defensive backpatching of this code sounds like a good idea to me.
I'm open to other opinions on whether we need to defend 9.3 and
later against earler versions of vacuumlo being run against them.

> Let's not support OIDs on MVs. They'll be regenerated on every refresh.

Do they have any value for people who might want to use cursors? If
nobody speaks up for them, I will drop OID support for materialized
views.

> If I'm reading this right, you always overwrite the passed-in dest without
> looking at it. What's the intent here?

Let me get back to you on that one.

> > + /*
> > + * Kludge here to allow refresh of a materialized view which is invalid
> > + * (that is, it was created WITH NO DATA or was TRUNCATED). We flag the
> > + * first two RangeTblEntry list elements, which were added to the front
> > + * of the rewritten Query to keep the rules system happy, with the
> > + * isResultRel flag to indicate that it is OK if they are flagged as
> > + * invalid.
> > + */
> > + rtable = dataQuery->rtable;
> > + ((RangeTblEntry *) linitial(rtable))->isResultRel = true;
>> + ((RangeTblEntry *) lsecond(rtable))->isResultRel = true;
>
> Is it safe to assume that the first two RTEs are the correct ones to flag?

I'm trying to play along with UpdateRangeTableOfViewParse() in
view.c. See the comment in front of that function for details.

>> + finish_heap_swap(matviewOid, OIDNewHeap, false, false, false, RecentXmin);
>
> The check_constraints argument should be "true", because the refresh could
> have invalidated a UNIQUE index.

Fixed.

> If the user desires an actually-clustered MV, he must re-CLUSTER it after each
> refresh. That deserves a documentation mention.

That point had not occurred to me. Let me see if I can fix that before changing docs.

> > + * Ensure that all referrenced relations are flagged as valid.
>
> Typo.

Fixed.

>> + ExecCheckRelationsValid(rangeTable);
>
> I believe this ought to happen after the executor lock acquisitions, perhaps
> right in ExecOpenScanRelation(). Since you'll then have an open Relation,
> RelationIsFlaggedAsValid() can use the relcache.

That would break MVs entirely. This probably deserves more
comments. It's a little fragile, but was the best way I found to
handle things. An MV has a rule associated with it, just like a
"regular" view, which is parse-analyzed but not rewritten or
planned. We need to allow the rewrite and planning for statements
which populate the view, but suppress expansion of the rule for
simple references. It is OK for an MV to be invalid if it is being
populated, but not if it is being referenced. Long story short,
this call helps determine which relations will be opened.

If someone can suggest a better alternative, I'll see what I can
do; otherwise I guess I should add comments around the key places.

>> ***************
>> *** 1591,1596 **** fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
>> --- 1592,1607 ----
>> rel = heap_open(rte->relid, NoLock);
>>
>> /*
>> + * Skip materialized view expansion when resultRelation is set.
>> + */
>> + if (rel->rd_rel->relkind == RELKIND_MATVIEW &&
>> + rel->rd_rel->relisvalid)
>> + {
>> + heap_close(rel, NoLock);
>> + break;
>> + }
>
> Would you elaborate on this?

It's diretly related to the point immediately preceding. At this
point we have thrown an error if the MV is invalid and being used
as a source of data. If it is the target of data it is flagged as
invalid so that it will not be expanded. Maybe we need a better way
to determine this, but I'm not sure just what to use.

>> + /* Strip off the trailing semicolon so that other things may follow. */
>> + appendBinaryPQExpBuffer(result, PQgetvalue(res, 0, 0), len - 1);
>
> I suggest verifying that the last character is indeed a semicolon.

How about if I have it exit_horribly if the semicolon added 21
lines up has disappeared? Or use Assert if we have that for the
frontend now?

> > + static void
> > + dumpMatViewIndex(Archive *fout, IndxInfo *indxinfo)
>
> This is so similar to dumpIndex(); can we avoid this level of duplication?

It is identical except for name. I can only assume that I thought I
needed a modified version and changed my mind. Removed.

> Please retain an interesting sample of materialized views in the regression
> database. Among other benefits, the pg_upgrade test suite exercises pg_dump
> and pg_upgrade for all object types retained in the regression database.

OK

> The regression tests should probably include a few other wrinkles, like an
> index on a MV.

Yeah. Will do.

> Creating a RULE on an MV succeeds, but refreshing the view then fails:

Fixed by prohibiting CREATE RULE on an MV.

> The documentation is a good start. I would expect a brief introduction in
> Tutorial -> Advanced Features and possibly a deeper discussion under The SQL
> Language. I suggest updating Explicit Locking to mention the new commands;
> users will be interested in the lock level of a refresh.

Yeah, the docs need another pass. It seemed prudent to make sure of
what I was documenting first.

> You have chosen to make pg_dump preserve the valid-or-invalid state of each
> MV. That seems reasonable, though I'm slightly concerned about the case of a
> dump taken from a standby.

I'm not clear on the problem. Could you explain?

> Some of the ALTER TABLE variants would make plenty of sense for MVs:
>
>  ALTER [ COLUMN ] column_name SET STATISTICS integer
>  ALTER [ COLUMN ] column_name SET ( attribute_option = value [, ... ] )
>  ALTER [ COLUMN ] column_name RESET ( attribute_option [, ... ] )
>  ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
>
> It wouldn't be a problem to skip those for the first patch, though.
> Conversely, this syntax is accepted:
>
>  ALTER MATERIALIZED VIEW [ IF EXISTS ] name SET ( view_option_name [= view_option_value] [, ... ] )
>
> But there are no available options. The only option accepted for regular
> views, security_barrier, is rejected. MVs always have security_barrier
> semantics, in any event.

I think those are doc problems, not implementation of the
functionality. Will double-check and fix where needed.

> Overall, I recommend auditing all the ALTER TABLE and ALTER VIEW options to
> determine which ones make sense for MVs. For each one in the sensible set,
> either allow it or add a comment indicating that it could reasonably be
> allowed in the future. For each one outside the set, forbid it. Verify that
> the documentation, the results of your evaluation, and the actual allowed
> operations are all consistent.

I have already tried to do that in the coding, although maybe you
think more comments are needed there? The docs definitely need to
catch up. This part isn't in flux, so I'll fix that part of the
docs in the next day or two.

Thanks for the review!

-Kevin

Attachment Content-Type Size
matview-v3.patch.gz application/x-gzip 51.6 KB

From: Thom Brown <thom(at)linux(dot)com>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-01-25 10:42:45
Message-ID: CAA-aLv5oq9SXJqcERdPQnR8epM=rOMUVJrJeousdfrxOVzw==w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 25 January 2013 06:00, Kevin Grittner <kgrittn(at)mail(dot)com> wrote:
> Noah Misch wrote:
>
>> The patch conflicts with git master; I tested against master(at){2013-01-20}(dot)
>
> New patch rebased, fixes issues raised by Thom Brown, and addresses
> some of your points.

Thanks for the new version. All previous issues I raised have been resolved.

I have an inconsistency to note between VIEWs and MATERIALIZED VIEWs:

CREATE VIEW v_test8 (meow, "?column?") AS SELECT 1 bark, 2;
CREATE MATERIALIZED VIEW mv_test8 (meow, "?column?") AS SELECT 1 bark, 2;

pg_dump output:

CREATE VIEW v_test8 AS
SELECT 1 AS meow, 2;

CREATE MATERIALIZED VIEW mv_test8 (
meow,
"?column?"
) AS
SELECT 1 AS meow, 2
WITH NO DATA;

The materialized view adds in column name parameters, whereas the
standard view doesn't. But it seems to add column parameters
regardless:

CREATE VIEW v_test9 AS SELECT 1;
CREATE MATERIALIZED VIEW v_test9 AS SELECT 1;

CREATE VIEW v_test9 AS
SELECT 1;

CREATE MATERIALIZED VIEW mv_test9 (
"?column?"
) AS
SELECT 1
WITH NO DATA;

VIEWs never seem to use column parameters, MATERIALIZED VIEWs always
appear to use them.

--
Thom


From: Noah Misch <noah(at)leadboat(dot)com>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-01-26 22:09:22
Message-ID: 20130126220922.GA22856@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 25, 2013 at 01:00:59AM -0500, Kevin Grittner wrote:
> Noah Misch wrote:
> > > *** a/contrib/sepgsql/sepgsql.h
> > > --- b/contrib/sepgsql/sepgsql.h
> > > ***************
> > > *** 32,37 ****
> > > --- 32,39 ----
> > >
> > > /*
> > > * Internally used code of object classes
> > > + *
> > > + * NOTE: Materialized views are treated as tables for now.
> >
> > This smells like a bypass of mandatory access control. Unless you've
> > determined that this is correct within the sepgsql security model, I suggest
> > starting with a draconian policy, like simply crippling MVs. Even if you have
> > determined that, separating out the nontrivial sepgsql support might be good.
> > The set of ideal reviewers is quite different.
>
> Robert suggested this way of coping for now. Will post just the
> sepgsql separately to try to attract the right crowd to confirm.

Sounds good.

> > Let's not support OIDs on MVs. They'll be regenerated on every refresh.
>
> Do they have any value for people who might want to use cursors?

Not that I have heard, for whatever that's worth.

> > > + /*
> > > + * Kludge here to allow refresh of a materialized view which is invalid
> > > + * (that is, it was created WITH NO DATA or was TRUNCATED). We flag the
> > > + * first two RangeTblEntry list elements, which were added to the front
> > > + * of the rewritten Query to keep the rules system happy, with the
> > > + * isResultRel flag to indicate that it is OK if they are flagged as
> > > + * invalid.
> > > + */
> > > + rtable = dataQuery->rtable;
> > > + ((RangeTblEntry *) linitial(rtable))->isResultRel = true;
> >> + ((RangeTblEntry *) lsecond(rtable))->isResultRel = true;
> >
> > Is it safe to assume that the first two RTEs are the correct ones to flag?
>
> I'm trying to play along with UpdateRangeTableOfViewParse() in
> view.c. See the comment in front of that function for details.

Ah. Perhaps assert that those RTEs have the aliases "old" and "new"?

> >> + ExecCheckRelationsValid(rangeTable);
> >
> > I believe this ought to happen after the executor lock acquisitions, perhaps
> > right in ExecOpenScanRelation(). Since you'll then have an open Relation,
> > RelationIsFlaggedAsValid() can use the relcache.
>
> That would break MVs entirely. This probably deserves more
> comments. It's a little fragile, but was the best way I found to
> handle things. An MV has a rule associated with it, just like a
> "regular" view, which is parse-analyzed but not rewritten or
> planned. We need to allow the rewrite and planning for statements
> which populate the view, but suppress expansion of the rule for
> simple references. It is OK for an MV to be invalid if it is being
> populated, but not if it is being referenced. Long story short,
> this call helps determine which relations will be opened.
>
> If someone can suggest a better alternative, I'll see what I can
> do; otherwise I guess I should add comments around the key places.

I see. It seems wrong to check MV validity before the executor locks a table;
if the executor lock were in fact the first lock on the relation, then the
view could become invalid again before we lock it. I don't know a way to
actually make the executor's lock be the first lock; some parser or planner
lock invariably seems to precede it. Is that proper to rely on?

> >> + /* Strip off the trailing semicolon so that other things may follow. */
> >> + appendBinaryPQExpBuffer(result, PQgetvalue(res, 0, 0), len - 1);
> >
> > I suggest verifying that the last character is indeed a semicolon.
>
> How about if I have it exit_horribly if the semicolon added 21
> lines up has disappeared? Or use Assert if we have that for the
> frontend now?

The code 21 lines back is adding a semicolon to a query being sent to the
server to retrieve the view's definition. Here you're removing the trailing
semicolon from a column of the server's response. Granted, there's not much
reason we'd ever change the server to omit the trailing semicolon, and the
breakage would be relatively obvious even without an explicit test here.

> > You have chosen to make pg_dump preserve the valid-or-invalid state of each
> > MV. That seems reasonable, though I'm slightly concerned about the case of a
> > dump taken from a standby.
>
> I'm not clear on the problem. Could you explain?

Currently none. If you took my suggestion regarding relisvalid, then MVs
would be considered invalid on the standby. That is one disadvantage of the
suggestion, perhaps.

> > Some of the ALTER TABLE variants would make plenty of sense for MVs:
> >
> > ??ALTER [ COLUMN ] column_name SET STATISTICS integer
> > ??ALTER [ COLUMN ] column_name SET ( attribute_option = value [, ... ] )
> > ??ALTER [ COLUMN ] column_name RESET ( attribute_option [, ... ] )
> > ??ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
> >
> > It wouldn't be a problem to skip those for the first patch, though.
> > Conversely, this syntax is accepted:
> >
> > ??ALTER MATERIALIZED VIEW [ IF EXISTS ] name SET ( view_option_name [= view_option_value] [, ... ] )
> >
> > But there are no available options. The only option accepted for regular
> > views, security_barrier, is rejected. MVs always have security_barrier
> > semantics, in any event.
>
> I think those are doc problems, not implementation of the
> functionality. Will double-check and fix where needed.

Looks so.

> > Overall, I recommend auditing all the ALTER TABLE and ALTER VIEW options to
> > determine which ones make sense for MVs. For each one in the sensible set,
> > either allow it or add a comment indicating that it could reasonably be
> > allowed in the future. For each one outside the set, forbid it. Verify that
> > the documentation, the results of your evaluation, and the actual allowed
> > operations are all consistent.
>
> I have already tried to do that in the coding, although maybe you
> think more comments are needed there? The docs definitely need to
> catch up. This part isn't in flux, so I'll fix that part of the
> docs in the next day or two.

I won't worry about such comments for now; it looks like you're targeting most
of the reasonable-to-support ALTER operations. I just didn't realize that the
docs were out of date in this respect.

By the way, your mailer omits References: and In-Reply-To: headers, breaking
the thread.

nm


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-02-15 17:44:30
Message-ID: 511E73FE.60104@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1/25/13 1:00 AM, Kevin Grittner wrote:
> New patch rebased, fixes issues raised by Thom Brown, and addresses
> some of your points.

This patch doesn't apply anymore, so I just took a superficial look. I
think the intended functionality and the interfaces look pretty good.
Documentation looks complete, tests are there.

I have a couple of notes:

* What you call WITH [NO] DATA, Oracle calls BUILD IMMEDIATE/DEFERRED.
It might be better to use that as well then.

* You use fields named relkind in the parse nodes, but they don't
actually contain relkind values, which is confusing. I'd just name the
field is_matview or something.

* More generally, I wouldn't be so fond of combining the parse handling
of CREATE TABLE AS and CREATE MATERIALIZED VIEW. They are similar, but
then again so are a lot of other things.

* Some of the terminology is inconsistent. A materialized view is
sometimes called valid, populated, or built, with approximately the same
meaning. Personally, I would settle on "built", as per above, but it
should be one term only.

* I find the name of the relisvalid column a bit confusing. Especially
because it only applies to materialized views, and there is already a
meaning of "valid" for indexes. (Recall that indexes are also stored in
pg_class, but they are concerned about indisvalid.) I would name it
something like relmvbuilt.

Btw., half of the patch seems to consist of updating places referring to
relkind. Is something wrong with the meaning of relkind that this sort
of thing is required? Maybe these places should be operating in terms
of features, not accessing relkind directly.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-03-06 16:11:58
Message-ID: 51376ACE.50409@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kevin,

I haven't seen a reply to this. Were you able to give my notes below
any consideration?

On 2/15/13 12:44 PM, Peter Eisentraut wrote:
> On 1/25/13 1:00 AM, Kevin Grittner wrote:
>> New patch rebased, fixes issues raised by Thom Brown, and addresses
>> some of your points.
>
> This patch doesn't apply anymore, so I just took a superficial look. I
> think the intended functionality and the interfaces look pretty good.
> Documentation looks complete, tests are there.
>
> I have a couple of notes:
>
> * What you call WITH [NO] DATA, Oracle calls BUILD IMMEDIATE/DEFERRED.
> It might be better to use that as well then.
>
> * You use fields named relkind in the parse nodes, but they don't
> actually contain relkind values, which is confusing. I'd just name the
> field is_matview or something.
>
> * More generally, I wouldn't be so fond of combining the parse handling
> of CREATE TABLE AS and CREATE MATERIALIZED VIEW. They are similar, but
> then again so are a lot of other things.
>
> * Some of the terminology is inconsistent. A materialized view is
> sometimes called valid, populated, or built, with approximately the same
> meaning. Personally, I would settle on "built", as per above, but it
> should be one term only.
>
> * I find the name of the relisvalid column a bit confusing. Especially
> because it only applies to materialized views, and there is already a
> meaning of "valid" for indexes. (Recall that indexes are also stored in
> pg_class, but they are concerned about indisvalid.) I would name it
> something like relmvbuilt.
>
>
> Btw., half of the patch seems to consist of updating places referring to
> relkind. Is something wrong with the meaning of relkind that this sort
> of thing is required? Maybe these places should be operating in terms
> of features, not accessing relkind directly.
>
>
>