Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
Date: 2013-08-01 04:22:16
Message-ID: 20130801042216.GA331932@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Wed, Jul 31, 2013 at 03:39:14AM +0200, Andres Freund wrote:
> On 2013-07-23 09:27:34 -0700, Kevin Grittner wrote:
> > Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > > * Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance
> > >   be done a good bit earlier? We're executing queries before
> > >   that.
> >
> > This can't be in effect while we are creating temp tables.  If
> > we're going to support a differential REFRESH technique, it must be
> > done more "surgically" than to wrap the whole REFRESH statement
> > execution.

That InSecurityRestrictedOperation() check happens in DefineRelation(); I
advise bypassing it by calling heap_create_with_catalog() directly.
make_new_heap() is a comparable example. It's a good thing for more reasons
than the one prompting this. For example, the current implementation will
fail if the caller lacks TEMP permission on the database. Though it's
probably a rare thing to deny TEMP but allow creating permanent objects such
as a MV, that's a wart.

> Sure, it cannot be active when creating the temp table. But it's
> currently inactive while you SPI_execute queries. That can't be right.

It may be okay once the operator search issues Andres notes later are fixed.
The queries that now run outside a security restricted context _should_ call
only functions associated with default B-tree or hash equality operators.
Since only superusers can define operator classes, we can assume that those
operators never do something unseemly. But if the code will indeed be safe
for that reason, I would hope for a comment explaining as much. That would
also make the SetUserIdAndSecContext() calls later in refresh_by_match_merge()
unnecessary. All that being said, the implementation would be more robust if
we remove the SQL-level use of CREATE TEMP TABLE and return to running the
entire operation as the MV owner.

> > > * All not explicitly looked up operators should be qualified
> > >   using OPERATOR(). It's easy to define your own = operator for
> > >   tid and set the search_path to public,pg_catalog. Now, the
> > >   whole thing is restricted to talbe owners afaics, making this
> > >   decidedly less dicey, but still. But if anyobdy actually uses a
> > >   search_path like the above you can easily hurt them.
> > > * Same seems to hold true for the y = x and y.* IS DISTINCT FROM
> > >   x.* operations.
> > > * (y.*) = (x.*) also needs to do proper operator lookups.

A few supplements to Andres's comments:

1. ri_triggers.c is the model for constructing queries in a manner that
guarantees uniform semantics in the face of overloading and search_path
variations. Check out how it handles these issues.

2. DISTINCT FROM and IN syntax cannot be used in this kind of query at all,
because they always depend on the search path. However, you could construct
your own DistinctExpr node referencing the operator you want.

Here's an SQL statement generated by this code that currently uses "rowvar IS
DISTINCT FROM rowvar":

CREATE TEMP TABLE pg_temp_1.pg_temp_41108_2 AS
SELECT x.ctid AS tid, y
FROM public.mv x FULL JOIN pg_temp_1.pg_temp_41108 y ON
(y.key OPERATOR(pg_catalog.=) x.key AND y = x)
WHERE (y.*) IS DISTINCT FROM (x.*)
ORDER BY tid

The point of the "(y.*) IS DISTINCT FROM (x.*)" is just to filter out the
inner portion of the full join, right? Could write that a different way.

> I think for the cases where you're comparing tids it's fine to just to
> hardcode the operator to OPERATOR(pg_catalog.=).

Yes, that's fine whenever you're dealing with a known data type that uses "="
as its default B-tree equality operator (e.g. nearly every built-in type). So
"(x.*) OPERATOR(pg_catalog.=) (y.*)" for records is fine, too.

Kevin, this code is aware that "rowvar = rowvar" uses DISTINCT FROM semantics
for individual columns, right? The block comment at refresh_by_match_merge()
talks about NULL comparisons, but I couldn't tell definitively whether it
recognizes that fact.

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-committers by date

  From Date Subject
Next Message Stephen Frost 2013-08-01 05:13:37 pgsql: Allow a context to be passed in for error handling
Previous Message Alvaro Herrera 2013-07-31 22:05:35 pgsql: Fix mis-indented lines

Browse pgsql-hackers by date

  From Date Subject
Next Message Gibheer 2013-08-01 05:19:49 Re: Backup throttling
Previous Message Fujii Masao 2013-08-01 03:37:47 Re: [9.3 bug] disk space in pg_xlog increases during archive recovery