GSoC - Materialized Views - is stale or fresh?

Lists: pgsql-hackers
From: Pavel Baros <baros(dot)p(at)seznam(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: GSoC - Materialized Views - is stale or fresh?
Date: 2010-06-12 14:17:08
Message-ID: 4C1396E4.7000308@seznam.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I am curious how could I solve the problem:

During refreshing I would like to know, if MV is stale or fresh? And I
had an idea:

In fact, MV need to know if its last refresh (transaction id) is older
than any INSERT, UPDATE, DELETE transaction launched against source
tables. So if MV has information about last (highest) xmin in source
tables, it could simply compare its own xmin to xmins (xmax for deleted
rows) from source tables and decide, if is stale or fresh.

Whole realization could look like this:
1. Make new column in pg_class (or somewhere in pg_stat* ?):
pg_class.rellastxid (of type xid)

2. After each INSERT, UPDATE, DELETE statement (transaction)
pg_class.rellastxid would be updated. That should not be time- or
memory- consuming (not so much) since pg_class is cached, I guess.

3. When refreshing, as described above, MV rellastxid compared to
source tables rellastxid could answer if MV is stale or still fresh.
Decision, if to run refreshing, would be as simple as it can.

a) Is the idea right?

b) Could appear some cases when it is not true? (except xid wraparound).

c) I was looking for some help with it in pg_stat*, but there is no
information about transactions, last changes in relations or anything.

d) or there are other mechanisms or ideas how to check if MV source
tables are changed from last refresh?

Thanks for replies

Pavel Baros


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Pavel Baros <baros(dot)p(at)seznam(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: GSoC - Materialized Views - is stale or fresh?
Date: 2010-06-14 05:54:21
Message-ID: 4C15C40D.1060702@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Baros wrote:
> After each INSERT, UPDATE, DELETE statement (transaction)
> pg_class.rellastxid would be updated. That should not be time- or
> memory- consuming (not so much) since pg_class is cached, I guess.

An update in PostgreSQL is essentially an INSERT followed a later DELETE
when VACUUM gets to the dead row no longer visible. The problem with
this approach is that it will leave behind so many dead rows in pg_class
due to the heavy updates that the whole database could grind to a halt,
as so many operations will have to sort through all that garbage. It
could potentially double the total write volume on the system, and
you'll completely kill people who don't have autovacuum running during
some periods of the day.

The basic idea of saving the last update time for each relation is not
unreasonable, but you can't store the results by updating pg_class. My
first thought would be to send this information as a message to the
statistics collector. It's already being sent updates at the point
you're interested in for the counters of how many INSERT/UPDATE/DELETE
statements are executing against the table. You might bundle your last
update information into that existing message with minimal overhead.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Pavel Baros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: GSoC - Materialized Views - is stale or fresh?
Date: 2010-06-14 09:00:32
Message-ID: AANLkTimnF0Cl0WNjaLxRpWTpwd2O1fQl5dh8xSCuZ2xm@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/6/14 Greg Smith <greg(at)2ndquadrant(dot)com>:
> Pavel Baros wrote:
>>
>> After each INSERT, UPDATE, DELETE statement (transaction)
>> pg_class.rellastxid would be updated. That should not be time- or memory-
>> consuming (not so much) since pg_class is cached, I guess.
>
> An update in PostgreSQL is essentially an INSERT followed a later DELETE
> when VACUUM gets to the dead row no longer visible.  The problem with this
> approach is that it will leave behind so many dead rows in pg_class due to
> the heavy updates that the whole database could grind to a halt, as so many
> operations will have to sort through all that garbage.  It could potentially
> double the total write volume on the system, and you'll completely kill
> people who don't have autovacuum running during some periods of the day.
>
> The basic idea of saving the last update time for each relation is not
> unreasonable, but you can't store the results by updating pg_class.  My
> first thought would be to send this information as a message to the
> statistics collector.  It's already being sent updates at the point you're
> interested in for the counters of how many INSERT/UPDATE/DELETE statements
> are executing against the table.  You might bundle your last update
> information into that existing message with minimal overhead.

Right. Do remember that the stats collector is designed to be lossy,
though, so you're not guaranteed that the information reaches the
other end. In reality it tends to do that, but there needs to be some
sort of recovery path for the case when it doesn't.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Pavel Baros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GSoC - Materialized Views - is stale or fresh?
Date: 2010-06-14 11:46:19
Message-ID: AANLkTikLr_rFSPi3SUVVDicutgasuZaeQda4aRPetKEx@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 14, 2010 at 5:00 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> 2010/6/14 Greg Smith <greg(at)2ndquadrant(dot)com>:
>> Pavel Baros wrote:
>>>
>>> After each INSERT, UPDATE, DELETE statement (transaction)
>>> pg_class.rellastxid would be updated. That should not be time- or memory-
>>> consuming (not so much) since pg_class is cached, I guess.
>>
>> An update in PostgreSQL is essentially an INSERT followed a later DELETE
>> when VACUUM gets to the dead row no longer visible.  The problem with this
>> approach is that it will leave behind so many dead rows in pg_class due to
>> the heavy updates that the whole database could grind to a halt, as so many
>> operations will have to sort through all that garbage.  It could potentially
>> double the total write volume on the system, and you'll completely kill
>> people who don't have autovacuum running during some periods of the day.
>>
>> The basic idea of saving the last update time for each relation is not
>> unreasonable, but you can't store the results by updating pg_class.  My
>> first thought would be to send this information as a message to the
>> statistics collector.  It's already being sent updates at the point you're
>> interested in for the counters of how many INSERT/UPDATE/DELETE statements
>> are executing against the table.  You might bundle your last update
>> information into that existing message with minimal overhead.
>
> Right. Do remember that the stats collector is designed to be lossy,
> though, so you're not guaranteed that the information reaches the
> other end. In reality it tends to do that, but there needs to be some
> sort of recovery path for the case when it doesn't.

What Pavel's trying to do here is be smart about figuring out when an
MV needs to be refreshed. I'm pretty sure this is the wrong way to go
about it, but it seems entirely premature considering that we don't
have a working implementation of a *manually* refreshed MV.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Magnus Hagander" <magnus(at)hagander(dot)net>
Cc: "Greg Smith" <greg(at)2ndquadrant(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Pavel Baros" <baros(dot)p(at)seznam(dot)cz>
Subject: Re: GSoC - Materialized Views - is stale or fresh?
Date: 2010-06-14 15:17:55
Message-ID: 4C1601D30200002500032296@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> What Pavel's trying to do here is be smart about figuring out when
> an MV needs to be refreshed. I'm pretty sure this is the wrong
> way to go about it, but it seems entirely premature considering
> that we don't have a working implementation of a *manually*
> refreshed MV.

Agreed all around.

At the risk of sounding obsessed, this is an area where predicate
locks might be usefully extended, if and when the serializable patch
makes it in.

-Kevin


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Pavel Baros <baros(dot)p(at)seznam(dot)cz>
Subject: Re: GSoC - Materialized Views - is stale or fresh?
Date: 2010-06-14 18:23:34
Message-ID: 4C1673A6.8090908@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> At the risk of sounding obsessed, this is an area where predicate
> locks might be usefully extended, if and when the serializable patch
> makes it in.

Yes, we see your patch in 9.1-first. ;-)

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com