Re: Materialized views WIP patch

Lists: pgsql-hackers
From: "Kevin Grittner" <kgrittn(at)mail(dot)com>
To: "Kevin Grittner" <kgrittn(at)mail(dot)com>,"Marko Tiikkaja" <pgmail(at)joh(dot)to>
Cc: "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-01-16 05:40:55
Message-ID: 20130116054055.324960@gmx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is a new version of the patch, with most issues discussed in
previous posts fixed.

I've been struggling with two areas:

- pg_dump sorting for MVs which depend on other MVs
- proper handling of the relisvalid flag for unlogged MVs after recovery

I've been hacking at the code in those areas without success;
what's here is the least broken form I have, but work is still
needed for these cases. Any other problems are news to me.

In addition, the docs need another pass, and there is an open
question about what is the right thing to use for TRUNCATE syntax.

-Kevin

Attachment Content-Type Size
matview-v2.patch text/x-patch 248.3 KB

From: Thom Brown <thom(at)linux(dot)com>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-01-16 17:07:29
Message-ID: CAA-aLv52bEa58aYUGCO4hZY+kiL6RGi-+_eJEz-FNzznVpbNMQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16 January 2013 05:40, Kevin Grittner <kgrittn(at)mail(dot)com> wrote:

> Here is a new version of the patch, with most issues discussed in
> previous posts fixed.
>
> I've been struggling with two areas:
>
> - pg_dump sorting for MVs which depend on other MVs
> - proper handling of the relisvalid flag for unlogged MVs after recovery
>

Some weirdness:

postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
CREATE VIEW
postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM
v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
SELECT 2
postgres=# \d+ mv_test2
Materialized view "public.mv_test2"
Column | Type | Modifiers | Storage | Stats target | Description
----------+---------+-----------+---------+--------------+-------------
moo | integer | | plain | |
?column? | integer | | plain | |
View definition:
SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";
Has OIDs: no

The "weirdness" I refer you to is the view definition. This does not occur
with a straightforward UNION.

This does not occur with a regular view:

postgres=# CREATE VIEW v_test3 AS SELECT moo, 2*moo FROM v_test2 UNION ALL
SELECT moo, 3*moo FROM v_test2;
CREATE VIEW
postgres=# \d+ v_test3
View "public.v_test3"
Column | Type | Modifiers | Storage | Description
----------+---------+-----------+---------+-------------
moo | integer | | plain |
?column? | integer | | plain |
View definition:
SELECT v_test2.moo, 2 * v_test2.moo
FROM v_test2
UNION ALL
SELECT v_test2.moo, 3 * v_test2.moo
FROM v_test2;

--
Thom


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-01-17 16:46:32
Message-ID: CA+U5nM+3LBi1NcgVkQgd5=Mq2FRj+2Je1HF+QBq-nv3ArkuWMA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16 January 2013 05:40, Kevin Grittner <kgrittn(at)mail(dot)com> wrote:

> Here is a new version of the patch, with most issues discussed in
> previous posts fixed.

Looks good.

The patch implements one kind of MV. In the future, we hope to have
other features or other kinds of MV alongside this:
* Snapshot MV - built once at start and then refreshed by explicit command only
* Snapshot MV with fast refresh
* Maintained MV (lazy) - changes trickle continuously into lazy MVs
* Maintained MV (transactional) - changes applied as part of write transactions
and or others

So I think we should agree now some aspects of those other options so
we can decide syntax. Otherwise we'll be left in the situation that
what we implement in 9.3 becomes the default for all time and/or we
have difficulties adding things later. e.g.
REFRESH ON COMMAND

Also, since there is no optimizer linkage between these MVs and the
tables they cover, I think we need to have that explicitly as a
command option, e.g.
DISABLE OPTIMIZATION

That way in the future we can implement "ENABLE OPTIMIZATION" mode and
REFRESH TRANSACTIONAL mode as separate items.

So all I am requesting is that we add additional syntax now, so that
future additional features are clear.

Please suggest syntax, not wedded to those... and we may want to use
more compatible syntax also.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Noah Misch <noah(at)leadboat(dot)com>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Marko Tiikkaja <pgmail(at)joh(dot)to>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-01-24 18:09:28
Message-ID: 20130124180928.GC2448@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Kevin,

The patch conflicts with git master; I tested against master(at){2013-01-20}(dot)

On Wed, Jan 16, 2013 at 12:40:55AM -0500, Kevin Grittner wrote:
> I've been struggling with two areas:
>
> - pg_dump sorting for MVs which depend on other MVs

From your later messages, I understand that you have a way forward on this.

> - proper handling of the relisvalid flag for unlogged MVs after recovery

I have discussed this in a separate email. While reading the patch to assess
that topic, I found a few more things:

> *** a/contrib/pg_upgrade/version_old_8_3.c
> --- b/contrib/pg_upgrade/version_old_8_3.c
> ***************
> *** 145,151 **** old_8_3_check_for_tsquery_usage(ClusterInfo *cluster)
> "FROM pg_catalog.pg_class c, "
> " pg_catalog.pg_namespace n, "
> " pg_catalog.pg_attribute a "
> ! "WHERE c.relkind = 'r' AND "
> " c.oid = a.attrelid AND "
> " NOT a.attisdropped AND "
> " a.atttypid = 'pg_catalog.tsquery'::pg_catalog.regtype AND "
> --- 145,151 ----
> "FROM pg_catalog.pg_class c, "
> " pg_catalog.pg_namespace n, "
> " pg_catalog.pg_attribute a "
> ! "WHERE c.relkind in ('r', 'm') AND "
> " c.oid = a.attrelid AND "
> " NOT a.attisdropped AND "
> " a.atttypid = 'pg_catalog.tsquery'::pg_catalog.regtype AND "

PostgreSQL 8.3 clusters won't contain materialized views, so it doesn't really
matter whether this change happens or not. I suggest adding a comment,
whether or not you keep the code change.

> *** a/contrib/sepgsql/sepgsql.h
> --- b/contrib/sepgsql/sepgsql.h
> ***************
> *** 32,37 ****
> --- 32,39 ----
>
> /*
> * Internally used code of object classes
> + *
> + * NOTE: Materialized views are treated as tables for now.

This smells like a bypass of mandatory access control. Unless you've
determined that this is correct within the sepgsql security model, I suggest
starting with a draconian policy, like simply crippling MVs. Even if you have
determined that, separating out the nontrivial sepgsql support might be good.
The set of ideal reviewers is quite different.

> */
> #define SEPG_CLASS_PROCESS 0
> #define SEPG_CLASS_FILE 1
> *** a/contrib/vacuumlo/vacuumlo.c
> --- b/contrib/vacuumlo/vacuumlo.c
> ***************
> *** 209,215 **** vacuumlo(const char *database, const struct _param * param)
> strcat(buf, " AND a.atttypid = t.oid ");
> strcat(buf, " AND c.relnamespace = s.oid ");
> strcat(buf, " AND t.typname in ('oid', 'lo') ");
> ! strcat(buf, " AND c.relkind = 'r'");
> strcat(buf, " AND s.nspname !~ '^pg_'");
> res = PQexec(conn, buf);
> if (PQresultStatus(res) != PGRES_TUPLES_OK)
> --- 209,215 ----
> strcat(buf, " AND a.atttypid = t.oid ");
> strcat(buf, " AND c.relnamespace = s.oid ");
> strcat(buf, " AND t.typname in ('oid', 'lo') ");
> ! strcat(buf, " AND c.relkind in ('r', 'm')");

It concerns me slightly that older vacuumlo could silently remove large
objects still referenced by MVs. Only slightly, though, because the next MV
refresh would remove those references anyway. Is there anything we should do
to help that situation? If nothing else, perhaps backpatch this patch hunk.

> + <varlistentry>
> + <term><literal>WITH OIDS</></term>
> + <term><literal>WITHOUT OIDS</></term>
> + <listitem>
> + <para>
> + These are obsolescent syntaxes equivalent to <literal>WITH (OIDS)</>
> + and <literal>WITH (OIDS=FALSE)</>, respectively. If you wish to give
> + both an <literal>OIDS</> setting and storage parameters, you must use
> + the <literal>WITH ( ... )</> syntax; see above.
> + </para>
> + </listitem>
> + </varlistentry>

Let's not support OIDs on MVs. They'll be regenerated on every refresh.

> ***************
> *** 336,342 **** ExplainOneQuery(Query *query, IntoClause *into, ExplainState *es,
> */
> void
> ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
> ! const char *queryString, ParamListInfo params)
> {
> if (utilityStmt == NULL)
> return;
> --- 338,345 ----
> */
> void
> ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
> ! const char *queryString, DestReceiver *dest,
> ! ParamListInfo params)
> {
> if (utilityStmt == NULL)
> return;
> ***************
> *** 349,361 **** ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
> * contained parsetree another time, but let's be safe.
> */
> CreateTableAsStmt *ctas = (CreateTableAsStmt *) utilityStmt;
> ! List *rewritten;
>
> Assert(IsA(ctas->query, Query));
> ! rewritten = QueryRewrite((Query *) copyObject(ctas->query));
> ! Assert(list_length(rewritten) == 1);
> ! ExplainOneQuery((Query *) linitial(rewritten), ctas->into, es,
> ! queryString, params);
> }
> else if (IsA(utilityStmt, ExecuteStmt))
> ExplainExecuteQuery((ExecuteStmt *) utilityStmt, into, es,
> --- 352,366 ----
> * contained parsetree another time, but let's be safe.
> */
> CreateTableAsStmt *ctas = (CreateTableAsStmt *) utilityStmt;
> ! Query *query = (Query *) ctas->query;
> !
> ! dest = CreateIntoRelDestReceiver(into);
>
> Assert(IsA(ctas->query, Query));
> !
> ! query = SetupForCreateTableAs(query, ctas->into, queryString, params, dest);
> !
> ! ExplainOneQuery(query, ctas->into, es, queryString, dest, params);
> }
> else if (IsA(utilityStmt, ExecuteStmt))
> ExplainExecuteQuery((ExecuteStmt *) utilityStmt, into, es,

If I'm reading this right, you always overwrite the passed-in dest without
looking at it. What's the intent here?

> + /*
> + * Kludge here to allow refresh of a materialized view which is invalid
> + * (that is, it was created WITH NO DATA or was TRUNCATED). We flag the
> + * first two RangeTblEntry list elements, which were added to the front
> + * of the rewritten Query to keep the rules system happy, with the
> + * isResultRel flag to indicate that it is OK if they are flagged as
> + * invalid.
> + */
> + rtable = dataQuery->rtable;
> + ((RangeTblEntry *) linitial(rtable))->isResultRel = true;
> + ((RangeTblEntry *) lsecond(rtable))->isResultRel = true;

Is it safe to assume that the first two RTEs are the correct ones to flag?

> + /*
> + * Swap the physical files of the target and transient tables, then
> + * rebuild the target's indexes and throw away the transient table.
> + */
> + finish_heap_swap(matviewOid, OIDNewHeap, false, false, false, RecentXmin);

The check_constraints argument should be "true", because the refresh could
have invalidated a UNIQUE index.

> ***************
> *** 3049,3055 **** ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
> break;
> case AT_ClusterOn: /* CLUSTER ON */
> case AT_DropCluster: /* SET WITHOUT CLUSTER */
> ! ATSimplePermissions(rel, ATT_TABLE);
> /* These commands never recurse */
> /* No command-specific prep needed */
> pass = AT_PASS_MISC;
> --- 3104,3110 ----
> break;
> case AT_ClusterOn: /* CLUSTER ON */
> case AT_DropCluster: /* SET WITHOUT CLUSTER */
> ! ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW);

If the user desires an actually-clustered MV, he must re-CLUSTER it after each
refresh. That deserves a documentation mention.

> ***************
> *** 724,729 **** InitPlan(QueryDesc *queryDesc, int eflags)
> --- 765,775 ----
> ExecCheckRTPerms(rangeTable, true);
>
> /*
> + * Ensure that all referrenced relations are flagged as valid.

Typo.

> + */
> + ExecCheckRelationsValid(rangeTable);

I believe this ought to happen after the executor lock acquisitions, perhaps
right in ExecOpenScanRelation(). Since you'll then have an open Relation,
RelationIsFlaggedAsValid() can use the relcache.

> ***************
> *** 1591,1596 **** fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
> --- 1592,1607 ----
> rel = heap_open(rte->relid, NoLock);
>
> /*
> + * Skip materialized view expansion when resultRelation is set.
> + */
> + if (rel->rd_rel->relkind == RELKIND_MATVIEW &&
> + rel->rd_rel->relisvalid)
> + {
> + heap_close(rel, NoLock);
> + break;
> + }

Would you elaborate on this?

> + /* Strip off the trailing semicolon so that other things may follow. */
> + appendBinaryPQExpBuffer(result, PQgetvalue(res, 0, 0), len - 1);

I suggest verifying that the last character is indeed a semicolon.

> /*
> + * dumpMatViewIndex
> + * write out to fout a user-defined index
> + */
> + static void
> + dumpMatViewIndex(Archive *fout, IndxInfo *indxinfo)

This is so similar to dumpIndex(); can we avoid this level of duplication?

> *** /dev/null
> --- b/src/test/regress/sql/matview.sql

> + -- test diemv when the mv does exist
> + DROP MATERIALIZED VIEW IF EXISTS tum;
> +
> + -- make sure that dependencies are reported properly when they block the drop
> + DROP TABLE t;
> +
> + -- make sure dependencies are dropped and reported
> + DROP TABLE t CASCADE;

Please retain an interesting sample of materialized views in the regression
database. Among other benefits, the pg_upgrade test suite exercises pg_dump
and pg_upgrade for all object types retained in the regression database.

The regression tests should probably include a few other wrinkles, like an
index on a MV.

Creating a RULE on an MV succeeds, but refreshing the view then fails:

[local] test=# create rule mvrule as on insert to mymv where 1 = 0 do also select 1;
CREATE RULE
[local] test=# REFRESH MATERIALIZED VIEW mymv;
ERROR: materialized view "mymv" has too many rules

The documentation is a good start. I would expect a brief introduction in
Tutorial -> Advanced Features and possibly a deeper discussion under The SQL
Language. I suggest updating Explicit Locking to mention the new commands;
users will be interested in the lock level of a refresh.

You have chosen to make pg_dump preserve the valid-or-invalid state of each
MV. That seems reasonable, though I'm slightly concerned about the case of a
dump taken from a standby.

We support ALTER TABLE against regular views for historical reasons. When we
added foreign tables, we did not extend that permissiveness; one can only use
ALTER FOREIGN TABLE on foreign tables. Please do the same for materialized
views. See RangeVarCallbackForAlterRelation(). Note that "ALTER TABLE
... RENAME colname TO newname" and "ALTER TABLE ... RENAME CONSTRAINT" are
currently supported for MVs by ALTER TABLE but not by ALTER MATERIALIZED VIEW.

There's no documented support for table constraints on MVs, but UNIQUE
constraints are permitted:

[local] test=# alter materialized view mymv add unique (c);
ALTER MATERIALIZED VIEW
[local] test=# alter materialized view mymv add check (c > 0);
ERROR: "mymv" is not a table
[local] test=# alter materialized view mymv add primary key (c);
ERROR: "mymv" is not a table or foreign table

Some of the ALTER TABLE variants would make plenty of sense for MVs:

ALTER [ COLUMN ] column_name SET STATISTICS integer
ALTER [ COLUMN ] column_name SET ( attribute_option = value [, ... ] )
ALTER [ COLUMN ] column_name RESET ( attribute_option [, ... ] )
ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

It wouldn't be a problem to skip those for the first patch, though.
Conversely, this syntax is accepted:

ALTER MATERIALIZED VIEW [ IF EXISTS ] name SET ( view_option_name [= view_option_value] [, ... ] )

But there are no available options. The only option accepted for regular
views, security_barrier, is rejected. MVs always have security_barrier
semantics, in any event.

Overall, I recommend auditing all the ALTER TABLE and ALTER VIEW options to
determine which ones make sense for MVs. For each one in the sensible set,
either allow it or add a comment indicating that it could reasonably be
allowed in the future. For each one outside the set, forbid it. Verify that
the documentation, the results of your evaluation, and the actual allowed
operations are all consistent.

Thanks,
nm


From: Noah Misch <noah(at)leadboat(dot)com>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-07-14 20:45:26
Message-ID: 20130714204526.GB11636@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

While doing some post-commit review of the 9.3 materialized view feature, I
noticed a few loose ends:

On Thu, Jan 24, 2013 at 01:09:28PM -0500, Noah Misch wrote:
> Note that [...] "ALTER TABLE ... RENAME CONSTRAINT" [is]
> currently supported for MVs by ALTER TABLE but not by ALTER MATERIALIZED VIEW.
>
> There's no documented support for table constraints on MVs, but UNIQUE
> constraints are permitted:
>
> [local] test=# alter materialized view mymv add unique (c);
> ALTER MATERIALIZED VIEW
> [local] test=# alter materialized view mymv add check (c > 0);
> ERROR: "mymv" is not a table
> [local] test=# alter materialized view mymv add primary key (c);
> ERROR: "mymv" is not a table or foreign table

The above points still apply.

Also, could you explain the use of RelationCacheInvalidateEntry() in
ExecRefreshMatView()? CacheInvalidateRelcacheByRelid() followed by
CommandCounterIncrement() is the typical pattern; this is novel. I suspect,
though, neither is necessary now that the relcache does not maintain populated
status based on a fork size reading.

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-08-15 18:37:14
Message-ID: 1376591834.59985.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Apologies, but this sub-thread got lost when I changed email
accounts.  I found it in a final review to make sure nothing had
fallen through the cracks.

Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Thu, Jan 24, 2013 at 01:09:28PM -0500, Noah Misch wrote:

>> There's no documented support for table constraints on MVs, but
>> UNIQUE constraints are permitted:
>>
>> [local] test=# alter materialized view mymv add unique (c);
>> ALTER MATERIALIZED VIEW

Fix pushed.

> Also, could you explain the use of RelationCacheInvalidateEntry()
> in ExecRefreshMatView()?  CacheInvalidateRelcacheByRelid()
> followed by CommandCounterIncrement() is the typical pattern;
> this is novel. I suspect, though, neither is necessary now that
> the relcache does not maintain populated status based on a fork
> size reading.

Yeah, that was part of the attempt to support unlogged materialized
views while also not returning bogus results if the view had not
been populated, using heap file size.  I agree that this line can
just come out.  If there are no objections real soon now, I will
remove it in master and the 9.3 branch before the release
candidate.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views WIP patch
Date: 2013-08-18 21:28:35
Message-ID: 1376861315.25123.YahooMailNeo@web162902.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Noah Misch <noah(at)leadboat(dot)com> wrote:

>> Also, could you explain the use of RelationCacheInvalidateEntry()
>> in ExecRefreshMatView()?  CacheInvalidateRelcacheByRelid()
>> followed by CommandCounterIncrement() is the typical pattern;
>> this is novel. I suspect, though, neither is necessary now that
>> the relcache does not maintain populated status based on a fork
>> size reading.
>
> Yeah, that was part of the attempt to support unlogged materialized
> views while also not returning bogus results if the view had not
> been populated, using heap file size.  I agree that this line can
> just come out.  If there are no objections real soon now, I will
> remove it in master and the 9.3 branch before the release
> candidate.

Done.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company