Re: Remaining beta blockers

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Remaining beta blockers
Date: 2013-04-30 13:47:23
Message-ID: 20130430134723.GB25261@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Could we please stop the ad-hominem stuff from all sides? We want to
solve the issue not to make it bigger.

On 2013-04-30 04:29:26 -0700, Kevin Grittner wrote:
> Let's look at the "corner" this supposedly paints us into.  If a
> later major release creates a better mechanism, current pg_dump and
> load will already use it, based on the way matviews are created
> empty and REFRESHed by pg_dump.  Worst case, we need to modify the
> behavior of pg_dump running with the switch used by pg_upgrade to
> use a new ALTER MATERIALIZED VIEW SET (populated); (or whatever
> syntax is chosen) -- a command we would probably want at that point
> anyway.  I'm not seeing cause for panic here.

I don't think panic is appropriate either, but I think there are some
valid concerns around this.

1) vacuum can truncate the table to an empty relation already if there is
no data returned by the view's query which then leads to the wrong
scannability state.

S1: CREATE MATERIALIZED VIEW matview_empty AS SELECT false WHERE random() < -1;
S2: S2: SELECT * FROM matview_empty; -- works
S1: VACUUM matview_empty;
S2: SELECT * FROM matview_empty; -- still works
S3: SELECT * FROM matview_empty; -- errors out

So we need to make vacuum skip cleaning out the last page. Once we
get incrementally updated matviews there are more situations to get
into this than just a query not returning anything.
I remember this being discussed somewhere already, but couldn't find
it quickly enough.

Imo this is quite an indicator that the idea of using the filesize is
just plain wrong. Adding logic to vacuum not to truncate data because
its a flag for matview scannability is quite the layer violation and
a sign for a bad design decision. Such a hack has already been added
to copy_heap_data(), while not as bad, shows that it is hard to find
all the places where we need to add it.

2) Since we don't have a metapage to represent scannability in 9.3 we
cannot easily use one in 9.4+ without pg_upgrade emptying all
matviews unless we only rely on the catalogs which we currently
cannot. This will either slow down development or make users
unhappy. Alternatively we can add yet another fork, but that has its
price (quite a bit more open files during normal operation, slower
CREATE DATABASE).

This is actually an argument for not releasing matviews without such
an external state. Going from disk-based state to catalog is easy,
the other way round: not so much.

3) Using the filesize as a flag will make other stuff like preallocating
on-disk data in bigger chunks and related things more complicated.

4) doing the check for scannability in the executor imo is a rather bad
idea. Note that we e.g. don't see an error about a matview which
won't be scanned because of constraint exclusion or one-time
filters.

S1: CREATE MATERIALIZED VIEW matview_unit_false AS SELECT false WHERE false WITH NO DATA;
S1: SELECT * FROM matview_unit_false;

You can get there in far more reasonable cases than WHERE false.

5) I have to agree with Kevin that the scannability is an important thing
to track though.

a) we cannot remove support for WITH NO DATA because of pg_dump order
and unlogged relations. So even without unlogged relations the
problem exists although its easier to represent.
b) Just giving wrong responses is bad [tm]. Outdated data is something
completely different (it has existed in that state in the (near)
past) than giving an empty response (might never have been a visible
state, and more likely not so in any reasonably near
past). Consider an application behind a pooler suddently getting
an empty response from a SELECT * FROM unlogged_matview; It won't
notice anything without a unscannable state since it probably
won't even notice the database restart.

Not sure what the consequence of this is. The most reasonable solution
seems to be to introduce a metapage somewhere *now* which sucks, but ...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-04-30 13:57:02 Re: The missing pg_get_*def functions
Previous Message Stephen Frost 2013-04-30 13:43:02 Re: The missing pg_get_*def functions