Re: GSoC - proposal - Materialized Views in PostgreSQL

Lists: pgsql-hackers
From: pavelbaros <baros(dot)p(at)seznam(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-09 20:36:04
Message-ID: hpo33i$7am$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

I am sending my proposal on GSoC. Details are listed below. Please, if
you have ideas, tips, or if you only want to say you opinion about my
project, go ahead.

thanks,
Pavel Baros

Abstract:

It is effort to implement snapshot materialized view (are only updated
when refreshed) in PostgreSQL. In this time I finished some part of it
and I am trying to publish my present work on git.postgresql.org

Benefits to the PostgreSQL Community

First of all, it would be the best if my work is helpful to everybody
who misses materialized views in PostgreSQL, because PostgreSQL do not
have still implemented materialized views. In addition, MV is mentioned
as feature in TODO list.

Deliverables

First of all, at the end of whole my project is not only writing
bachelors thesis, but finish it as patch and if possible, get patch into
next PostgrSQL release, or keep git repository actual to last PosgreSQL
version. I have also personal goals. Arouse the interest about
implementing MV in PostgreSQL, or at least arouse discussion about it.

Project Schedule

My work goes quite good, I am on good way to finish main parts on
backend in few weeks. After that I will make and run tests and implement
related changes to PosgreSQL tools (psql, pg_dump, etc.). I am also
waiting for approval for my repository named "materialized_view" on
git.postgresql.org, so I could publish completed parts. For now next
step will be to discuss implementation on postgresql.hackers.

Bio

I am from Czech Republic and I am studying on Faculty of Electrical
Engineering on Czech Technical University in Prague www.fel.cvut.cz/en/.
My bachelor thesis is based on this project, implementing MV in PostgreSQL.

I've experienced many different jobs. The best experience for me was,
when I've worked as tester and software engineer in C/C++ and C# for
Radiant Systems Inc. for more than year. After that I've worked as Web
developer with Internet technologies (PHP, HTML, CSS, ...), where the
goal was to make an internal system for an advertising agency. Finally
my recent job was as Windows Mobile Developer. Except the first
experience, others lasts only few months mainly because those were
temporary projects. For now I am looking for some part time job, of
course, preferably something closer to database systems.

Implementation: could be divided to few steps:

1) create materialized view
- modify grammar (parser): CREATE MATERIALIZED VIEW mv_name AS SELECT ...
- change executor, so that it will create physical table defined by
select statement

2) change rewriter
- usually, view is relation with defined rule and when rewriting, rule
is fired and relation (view) is replaced by definition of view. If
relation do not have rule, planner and executor behave to it as physical
table (relation). In case of materialized view we want to rewrite select
statement only in case when we refreshing MV. In other cases rewriter
should skip rewriting and pick up physical relation. Exclude situation
when other rewrite rules which are not related to MV definition are
specified.

3) create command that takes snapshot (refresh MV)
- modify grammar: ALTER MATERIALIZED VIEW mv_name REFRESH;
- taking snapshot (refreshing) is similar to command "SELECT INTO ..."
and I decided to follow the way it works. After parsing query and before
transformation is MANUALLY created tree representation of "SELECT * INTO
..." with flag IntoClause->isrefresh set true, indicating it is
refreshing materialized view. Everithing acts as it would be regular
"SELECT INTO ..." except functions OpenIntoRel() and CloseIntoRel(). In
function OpenIntoRel is created temp table (without catalog) and set as
destination for result of select. In function CloseIntoRel executor swap
relfilenode's of temp table and original table and finally delete temp
table. Behavior of CloseIntoRel function is inspired by CLUSTER statement.

Contacts: baros(dot)p(at)seznam(dot)cz


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "pavelbaros" <baros(dot)p(at)seznam(dot)cz>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-09 20:53:58
Message-ID: 4BBF4D96020000250003069C@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

pavelbaros <baros(dot)p(at)seznam(dot)cz> wrote:

> I am also waiting for approval for my repository named
> "materialized_view" on git.postgresql.org

They seem to prefer that you get a repository under your name and
use materialized_view as a branch name. See my account on
git.postgresql.org and its serializable branch for an example.

I learned by putting in a request similar to your pending one.
;-)

-Kevin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pavelbaros <baros(dot)p(at)seznam(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-09 22:05:32
Message-ID: x2g603c8f071004091505re8a50a07m5aa465fb6a9a1660@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/4/9 pavelbaros <baros(dot)p(at)seznam(dot)cz>:
> Implementation:  could be divided to few steps:
>
> 1) create materialized view
> - modify grammar (parser): CREATE MATERIALIZED VIEW mv_name AS SELECT ...
> - change executor, so that it will create physical table defined by select
> statement

This basically needs to work the same was as CREATE TABLE ... AS
SELECT ... - save that it should also stuff the rewritten query
someplace, so that it can be re-executed. I think one of the
important design questions here is figuring out exactly where that
"someplace" should be.

I also suspect that we want to block any write access to the relation
except for view refreshes. IOW, INSERT, UPDATE, and DELETE on the
underlying relation should be rejected (though perhaps rewrite rules
redirecting such operations to other tables could be allowed).

> 2) change rewriter
> - usually, view is relation with defined rule and when rewriting, rule is
> fired and relation (view) is replaced by definition of view. If relation do
> not have rule, planner and executor behave to it as physical table
> (relation). In case of materialized view we want to rewrite select statement
> only in case when we refreshing MV. In other cases rewriter should skip
> rewriting and pick up physical relation. Exclude situation when other
> rewrite rules which are not related to MV definition are specified.
>
> 3) create command that takes snapshot (refresh MV)
> - modify grammar: ALTER MATERIALIZED VIEW mv_name REFRESH;
> - taking snapshot (refreshing) is similar to command "SELECT INTO ..." and I
> decided to follow the way it works. After parsing query and before
> transformation is MANUALLY created tree representation of "SELECT * INTO
> ..." with flag IntoClause->isrefresh set true, indicating it is refreshing
> materialized view. Everithing acts as it would be regular "SELECT INTO ..."
> except functions OpenIntoRel() and CloseIntoRel(). In function OpenIntoRel
> is created temp table (without catalog) and set as destination for result of
> select. In function CloseIntoRel executor swap relfilenode's of temp table
> and original table and finally delete temp table. Behavior of CloseIntoRel
> function is inspired by CLUSTER statement.

I'll have to read the code before I can comment on the rest of this in detail.

...Robert


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: pavelbaros <baros(dot)p(at)seznam(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-10 03:53:29
Message-ID: 4BBFF639.6010305@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

pavelbaros wrote:
> I am also waiting for approval for my repository named
> "materialized_view" on git.postgresql.org, so I could publish
> completed parts.

Presuming that you're going to wander there and get assigned what looks
like an official repo name for this project is a bit...optimistic. I
would recommend that you publish to something like github instead (you
can fork http://github.com/postgres/postgres ), and if the work looks
good enough that it gets picked up by the community maybe you migrate it
onto the main site eventually. git.postgresql.org is really not setup
to be general hosting space for everyone who has a PostgreSQL related
project; almost every repo on there belongs to someone who has already
been a steady project contributor for a number of years.

(Switching to boilerplate mode for a paragraph...) You have picked a
PostgreSQL feature that is dramatically more difficult than it appears
to be, and I wouldn't expect you'll actually finish even a fraction of
your goals in a summer of work. You're at least in plentiful
company--most students do the same. As a rule, if you see a feature on
our TODO list that looks really useful and fun to work on, it's only
still there because people have tried multiple times to build it
completely but not managed to do so because it's harder than it
appears. This is certainly the case with materialized views.

You've outlined a reasonable way to build a prototype that does a
limited implementation here. The issue is what it will take to extend
that into being production quality for the real-world uses of
materialized views. How useful your prototype is depends on how well it
implements a subset of that in a way that will get used by the final design.

The main hidden complexity in this particular project relates to
handling view refreshes. The non-obvious problem is that when the view
updates, you need something like a SQL MERGE to really handle that in a
robust way that doesn't conflict with concurrent access to queries
against the materialized view. And work on MERGE support is itself
blocked behind the fact that PostgreSQL doesn't have a good way to lock
access to a key value that doesn't exist yet--what other databases call
key range locking. See the notes for "Add SQL-standard
MERGE/REPLACE/UPSERT command" at http://wiki.postgresql.org/wiki/Todo
for more information.

You can work around that to build a prototype by grabbing a full table
lock on the materialized view when updating it, but that's not a
production quality solution. Solving that little detail is actually
more work than the entire project you've outlined. Your suggested
implementation--"In function CloseIntoRel executor swap relfilenode's of
temp table and original table and finally delete temp table"--is where
the full table lock is going to end up at. The exact use cases that
need materialized views cannot handle a CLUSTER-style table recreation
each time that needs an exclusive lock to switchover, so that whole part
of your design is going to be a prototype that doesn't work at all like
what needs to get built to make this feature committable. It's also not
a reasonable assumption that you have enough disk space to hold a second
copy of the MV in a production system.

Once there's a good way to merge updates, how to efficiently generate
them against the sort of large data sets that need materalized views--so
you just write out the updates rather than a whole new copy--is itself a
large project with a significant quantity of academic research to absorb
before starting. Dan Colish at Portland State has been playing around
with prototypes for the specific problem of finding a good algorithm for
view refreshing that is compatible with PostgreSQL's execution model.
He's already recognized the table lock issue here and for the moment is
ignoring that part. I don't have a good feel yet for how long the
targeted update code will take to mature, but based on what I do know I
suspect that little detail is also a larger effort than the entire scope
you're envisioning. There's a reason why the MIT Press compendium
"Materialized Views: Techniques, Implementations, and Applications" is
over 600 pages long--I hope you've already started digging through that
material.

Now, with all that said, that doesn't mean there's not a useful project
for you buried in this mess. The first two steps in your plan:

1) create materialized view
2) change rewriter

Include building a prototype grammer, doing an initial executor
implementation, and getting some sort of rewriter working. That is
potentially good groundwork to lay here. I would suggest that you
completely drop your step 3:

3) create command that takes snapshot (refresh MV)

Because you cannot built that in a way that will be useful (and by that
I mean committable quality) until there's a better way to handle updates
than writing a whole new table and grabbing a full relation lock to
switch to it. To do a good job just on the first two steps should take
at least a whole summer anyway--there's a whole stack of background
research needed I haven't seen anyone do yet, and that isn't on your
plan yet. There is a precedent for taking this approach. After getting
stalled trying to add the entirety of easy partitioning to PostgreSQL,
the current scope has been scaled back to just trying to get the syntax
and on-disk structure right, then finish off the implementation. See
http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how
that's been broken into those two major chunks.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-10 04:32:38
Message-ID: v2y603c8f071004092132rd2624525n79a3c7253958ede0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/4/9 Greg Smith <greg(at)2ndquadrant(dot)com>:
> The main hidden complexity in this particular project relates to handling
> view refreshes.  The non-obvious problem is that when the view updates, you
> need something like a SQL MERGE to really handle that in a robust way that
> doesn't conflict with concurrent access to queries against the materialized
> view.  And work on MERGE support is itself blocked behind the fact that
> PostgreSQL doesn't have a good way to lock access to a key value that
> doesn't exist yet--what other databases call key range locking.  See the
> notes for "Add SQL-standard MERGE/REPLACE/UPSERT command" at
> http://wiki.postgresql.org/wiki/Todo for more information.
>
> You can work around that to build a prototype by grabbing a full table lock
> on the materialized view when updating it, but that's not a production
> quality solution.  Solving that little detail is actually more work than the

Hmm... I am not sure you're right about this. It's not obvious to me
that a brief full-table lock wouldn't be acceptable for an initial
implementation. Obviously it wouldn't be suitable for every use case
but since we're talking about manually refreshed views that was bound
to be true anyway.

...Robert


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: pavelbaros <baros(dot)p(at)seznam(dot)cz>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-10 07:07:35
Message-ID: 4BC023B7.9040402@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Smith wrote:
> The main hidden complexity in this particular project relates to
> handling view refreshes. The non-obvious problem is that when the view
> updates, you need something like a SQL MERGE to really handle that in a
> robust way that doesn't conflict with concurrent access to queries
> against the materialized view. And work on MERGE support is itself
> blocked behind the fact that PostgreSQL doesn't have a good way to lock
> access to a key value that doesn't exist yet--what other databases call
> key range locking. See the notes for "Add SQL-standard
> MERGE/REPLACE/UPSERT command" at http://wiki.postgresql.org/wiki/Todo
> for more information.
>
> You can work around that to build a prototype by grabbing a full table
> lock on the materialized view when updating it, but that's not a
> production quality solution.

It would still be useful for many applications. And it would provide a
basis to extend later. You don't need to solve all problems at once, as
long as what you implement is a useful subset.

> Now, with all that said, that doesn't mean there's not a useful project
> for you buried in this mess. The first two steps in your plan:
>
> 1) create materialized view
> 2) change rewriter
>
> Include building a prototype grammer, doing an initial executor
> implementation, and getting some sort of rewriter working. That is
> potentially good groundwork to lay here. I would suggest that you
> completely drop your step 3:
>
> 3) create command that takes snapshot (refresh MV)
>
> Because you cannot built that in a way that will be useful (and by that
> I mean committable quality) until there's a better way to handle updates
> than writing a whole new table and grabbing a full relation lock to
> switch to it. To do a good job just on the first two steps should take
> at least a whole summer anyway--there's a whole stack of background
> research needed I haven't seen anyone do yet, and that isn't on your
> plan yet. There is a precedent for taking this approach. After getting
> stalled trying to add the entirety of easy partitioning to PostgreSQL,
> the current scope has been scaled back to just trying to get the syntax
> and on-disk structure right, then finish off the implementation. See
> http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how
> that's been broken into those two major chunks.

The good thing about this subject for GSoC is that it can be divided
into many small steps. There's two largely independent main parts:

1. Keep the materialized view up-to-date when the base tables change.
This can be further divided into many steps, you can begin by supporting
automatic updates only on very simple views with e.g a single table and
a where clause. Then extend that to support joins, aggregates,
subqueries etc. Keeping it really limited, you could even require the
user to write the required triggers himself.

2. Teach the planner to use materialized views automatically when a
query references the base tables. So if you issue the query "SELECT *
FROM table WHERE foo > 10 AND bar = 10", and there's a materialized view
on "SELECT * FROM table WHERE bar = 10", the planner can transform the
original query into "SELECT * FROM materializedview WHERE foo > 10".
This largely depends on 1, although some DBMSs offer the option to use
manually refreshed materialized views too, knowing that they might not
be completely up-to-date.

There's a lot room to choose which problems you want to tackle, which is
good for a summer-of-code project. Your proposal basically describes
doing 1, in a limited fashion where the view is not updated
automatically, but only when the DBA runs a command to refresh it. I'm
not sure if that's useful enough on its own, writing "CREATE
MATERIALIZED VIEW ... SELECT ..." doesn't seem any easier than just
writing "CREATE TABLA AS ...". But if you can do something about 2, or
even a very limited part of 1, keeping the view up-to-date
automatically, it becomes much more useful.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-10 13:15:25
Message-ID: 4BC079ED.1070906@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
>
> 1. Keep the materialized view up-to-date when the base tables change.
> This can be further divided into many steps, you can begin by supporting
> automatic updates only on very simple views with e.g a single table and
> a where clause. Then extend that to support joins, aggregates,
> subqueries etc. Keeping it really limited, you could even require the
> user to write the required triggers himself.
>
>

That last bit doesn't strike me as much of an advance. Isn't the whole
point of this to automate it? Creating greedy materialized views is
usually not terribly difficult now, but you do have to write the triggers.

The other thing that could be interesting about this would be some
scheme for lazy refresh that didn't involve re-extracting the whole data
set.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pavelbaros <baros(dot)p(at)seznam(dot)cz>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-10 13:18:34
Message-ID: z2p603c8f071004100618t8638ca5dh65111c95cb0cf67f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/4/10 Andrew Dunstan <andrew(at)dunslane(dot)net>:
> Heikki Linnakangas wrote:
>>
>> 1. Keep the materialized view up-to-date when the base tables change.
>> This can be further divided into many steps, you can begin by supporting
>> automatic updates only on very simple views with e.g a single table and
>> a where clause. Then extend that to support joins, aggregates,
>> subqueries etc. Keeping it really limited, you could even require the
>> user to write the required triggers himself.
>
> That last bit doesn't strike me as much of an advance. Isn't the whole point
> of this to automate it? Creating greedy materialized views is usually not
> terribly difficult now, but you do have to write the triggers.

Yeah, I agree.

> The other thing that could be interesting about this would be some scheme
> for lazy refresh that didn't involve re-extracting the whole data set.

One way to do this would be to infer a primary key for the result set
based on the input query. But I think we don't really have the
infrastructure to do this right now, so not really a project for a
beginner.

...Robert


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-11 03:03:31
Message-ID: 4BC13C03.90706@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Your proposal basically describes
> doing 1, in a limited fashion where the view is not updated
> automatically, but only when the DBA runs a command to refresh it. I'm
> not sure if that's useful enough on its own, writing "CREATE
> MATERIALIZED VIEW ... SELECT ..." doesn't seem any easier than just
> writing "CREATE TABLA AS ...". But if you can do something about 2, or
> even a very limited part of 1, keeping the view up-to-date
> automatically, it becomes much more useful.
>

You've hit upon the core issue here. You can build materialized views
right now using "CREATE TABLE AS". You can even update them by creating
a new table the same way, with a new name, and doing the
LOCK/RENAME/DROP shuffle--what people used to do for rewriting bloated
tables before there was CLUSTER. The first step in the proposal here is
essentially syntax to give an easier UI for that. That's an interesting
step, but recognize that it doesn't actually provide anything you can't
do already.

If you then note that doing any sort of incremental update to the view
is a hard problem, and that a lot of the useful cases for materialized
views involve tables where it's impractical to recreate the whole thing
anyway, you'll inevitably find yourself deeply lost in the minutia of
how to handle the updates. It's really the core problem in building
what people expect from a materialized view implementation in a serious
database. Chipping away at the other pieces around it doesn't move the
feature that far forward, even if you get every single one of them
except incremental updates finished, because everything else combined is
still not that much work in comparison to the issues around updates.

There certainly are a fair number of subproblems you can break out of
here. I just think it's important to recognize that the path that leads
to a useful GSoC project and the one that gives a production quality
materialized view implementation may not have that much in common, and
to manage expectations on both sides accordingly. If Pavel thinks he's
going to end up being able to say "I added materialized views to
PostgreSQL" at the end of the summer, that's going to end in
disappointment. And if people think this project plan will lead to
being able to claim PostgreSQL now has this feature, that's also not
going to go well. If the scope is "add initial grammar and rewriting
moving toward a future materialized view feature", which the underlying
implementation noted as a stub prototype, that might work out OK. This
is why I likened it to the work on "Syntax for partitioning", which has
a similarly focused subgoal structure.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-11 03:40:48
Message-ID: 4BC144C0.3020909@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> It's not obvious to me
> that a brief full-table lock wouldn't be acceptable for an initial
> implementation. Obviously it wouldn't be suitable for every use case
> but since we're talking about manually refreshed views that was bound
> to be true anyway.
>

There already is an initial implementation of sorts. There are a couple
of ways you can build these right now, so any new development has to
look like it will end with good odds of being an improvement over what's
already available before it's worth putting development resources into.

As a rough idea of what people want these for in the field, based on
what I've seen requests for, imagine that someone has a 1TB table
they're materializing a view on in order to get at least a 10:1, and
hopefully close to a 100:1, speedup on viewing summary data. Now,
picture what happens if you have someone doing a sequential scan on the
MV, which is still quite big, the updater process lines up to grab an
exclusive lock when it's done, and now a second user wanting to read a
single row quickly comes along behind it. Given a full-table lock
implementation, that scenario is unlikely to play out with the second
user getting a fast response. They'll likely sit in a lock queue for
some potentially long period of time instead, waiting for the active seq
scan to finish then the update to happen. You have to build it that way
or a steady stream of people reading could block out updates forever.

To be frank, that makes for a materalized view implementation of little
value over what you can currently do as far as I'm concerned. It might
be interesting as a prototype, but that's not necessarily going to look
like what's needed to do this for real at all. I'm not a big fan of
dumping work into projects when you can see exactly how it's going to
fail before you even get started. As I see if, if you know where it's
going to fall down, you don't need to build a prototype as an exercise
to show you how to build it--you should work on that part first instead.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-11 05:08:45
Message-ID: g2w603c8f071004102208na5f81e34u95c8f178fd9376f0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Apr 10, 2010 at 11:40 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> To be frank, that makes for a materalized view implementation of little
> value over what you can currently do as far as I'm concerned.  It might be
> interesting as a prototype, but that's not necessarily going to look like
> what's needed to do this for real at all.  I'm not a big fan of dumping work
> into projects when you can see exactly how it's going to fail before you
> even get started.  As I see if, if you know where it's going to fall down,
> you don't need to build a prototype as an exercise to show you how to build
> it--you should work on that part first instead.

Hopefully, you're already aware that I have enormous respect for your
opinions on a wide variety of topics; if not, let me publicly say that
I absolutely do.

Having said that, I disagree with your conclusions in this instance.
I see nothing but upside from this work. It is vastly easier to write
a patch that builds on existing functionality than it is to write
something new from scratch. If there's any value in having manually
refreshed materialized views, then having the simplest possible
implementation of what those can look like committed will make it far
easier to plan out next steps. While the proposed implementation may
not solve a huge number of real-world problems, I think there's a good
argument that some people will get good use of it. Not everyone has
1TB tables with continuous access patterns. And, provided that it
doesn't conflict with anything we want to do in the future, being
useful to some people is a good enough reason to put it in.

I also think that you're underestimating the number of problems that
will have to be solved to get this done. It's going to take some
significant work - both design work and coding work - to figure out
how this should integrate into the rest of the system. (What should
be the value of pg_class.relkind? Where should the node
representation of the snapshot query be stored? And did we handle all
of those OID dependencies correctly?)

Where I can see this possibly falling down (other than being just too
much work for a relative PostgreSQL novice to get it done in one
summer) is if there are concerns about it being incompatible with
incrementally-updated views. I imagine that we're going to want to
eventually support both, so we need to make sure that this
implementation doesn't box us into a corner. But as far as snapshot
views go, complaining that the proposed locking is too strong doesn't
seem quite fair. Fixing that, AFAICS, is a very hard project,
possibly involving significant planner support and an implementation
of MERGE, and I would much rather try to land a fundamentals patch
like this first and then deal with the gyrations that will be involved
in making this work than try to land the whole thing all at once.

Of course, if I'm missing something, and there's a SIMPLE way to get
materialized views that can be refreshed without a full-table lock,
that's another story altogether - maybe you have an idea?

Finally, even if we decided NOT to merge this patch because of the
limitations you mention (and right now that doesn't seem to be the
consensus), having this part of it completed as a starting point for
future work might be reason enough by itself.

In short: I think you may be letting the perfect be the enemy of the good.

...Robert


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-11 09:24:07
Message-ID: 4BC19537.9030109@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> I also think that you're underestimating the number of problems that
> will have to be solved to get this done. It's going to take some
> significant work - both design work and coding work - to figure out
> how this should integrate into the rest of the system. (What should
> be the value of pg_class.relkind? Where should the node
> representation of the snapshot query be stored? And did we handle all
> of those OID dependencies correctly?)
>

I don't think I'm underestimating all that, but I suspect Pavel is by a
considerable amount. This is why I've been suggesting that a GSoC scope
here might just be wrestling with this area of the problem for the whole
summer--not even getting into updates beyond a completely trivial
implementation, if any at all. Things like "handle OID dependencies"
are definitely not on the fun side of the development work that people
tend to think about in advance.

> Where I can see this possibly falling down (other than being just too
> much work for a relative PostgreSQL novice to get it done in one
> summer) is if there are concerns about it being incompatible with
> incrementally-updated views. I imagine that we're going to want to
> eventually support both, so we need to make sure that this
> implementation doesn't box us into a corner.

Exactly my concern; comitting this part without knowing how that's later
going to fit into place strikes me the sort of the thing this project
doesn't like to do. The alternate approach of starting with the update
machinery is less likely IMHO to get stuck wondering if there's a future
blind spot coming or not, since you'd be building from the bottom up
starting with the hardest parts.

From the rest of your comments, I'm comfortable that you're in sync
with the not necessarily obvious risky spots here I wanted to raise
awareness of. It's unreasonable to expect we'll have exactly the same
priorities here, and I doubt it's useful to debate how I perceive the
merit of various development subsets here compared to yourself. I don't
think it's really important whether anyone agrees with me or not about
exactly the value of a full table lock implementation. The main thing
I'm concerned about is just that it's noted as a known risky part, one
that could end up blocking the project's ability to commit even a subset
of the proposed patch here.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pavelbaros <baros(dot)p(at)seznam(dot)cz>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-11 14:26:03
Message-ID: 4BC1DBFB.8050700@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> 2010/4/10 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>> Heikki Linnakangas wrote:
>>> 1. Keep the materialized view up-to-date when the base tables change.
>>> This can be further divided into many steps, you can begin by supporting
>>> automatic updates only on very simple views with e.g a single table and
>>> a where clause. Then extend that to support joins, aggregates,
>>> subqueries etc. Keeping it really limited, you could even require the
>>> user to write the required triggers himself.
>> That last bit doesn't strike me as much of an advance. Isn't the whole point
>> of this to automate it? Creating greedy materialized views is usually not
>> terribly difficult now, but you do have to write the triggers.
>
> Yeah, I agree.

It doesn't accomplish anything interesting on its own. But if you do the
planner changes to automatically use the materialized view to satisfy
queries (item 2. in my previous email), it's useful.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pavelbaros <baros(dot)p(at)seznam(dot)cz>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-11 18:47:36
Message-ID: u2k603c8f071004111147g3e66e561o1fdaa60e526afe3d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Robert Haas wrote:
>> 2010/4/10 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>>> Heikki Linnakangas wrote:
>>>> 1. Keep the materialized view up-to-date when the base tables change.
>>>> This can be further divided into many steps, you can begin by supporting
>>>> automatic updates only on very simple views with e.g a single table and
>>>> a where clause. Then extend that to support joins, aggregates,
>>>> subqueries etc. Keeping it really limited, you could even require the
>>>> user to write the required triggers himself.
>>> That last bit doesn't strike me as much of an advance. Isn't the whole point
>>> of this to automate it? Creating greedy materialized views is usually not
>>> terribly difficult now, but you do have to write the triggers.
>>
>> Yeah, I agree.
>
> It doesn't accomplish anything interesting on its own. But if you do the
> planner changes to automatically use the materialized view to satisfy
> queries (item 2. in my previous email), it's useful.

But you can't do that with a snapshot view, only a continuous updated one.

...Robert


From: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pavelbaros <baros(dot)p(at)seznam(dot)cz>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-12 02:13:34
Message-ID: 4BC281CE.2080909@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11.04.10 20:47 , Robert Haas wrote:
> On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Robert Haas wrote:
>>> 2010/4/10 Andrew Dunstan<andrew(at)dunslane(dot)net>:
>>>> Heikki Linnakangas wrote:
>>>>> 1. Keep the materialized view up-to-date when the base tables
>>>>> change. This can be further divided into many steps, you can
>>>>> begin by supporting automatic updates only on very simple
>>>>> views with e.g a single table and a where clause. Then extend
>>>>> that to support joins, aggregates, subqueries etc. Keeping it
>>>>> really limited, you could even require the user to write the
>>>>> required triggers himself.
>>>> That last bit doesn't strike me as much of an advance. Isn't
>>>> the whole point of this to automate it? Creating greedy
>>>> materialized views is usually not terribly difficult now, but
>>>> you do have to write the triggers.
>>>
>>> Yeah, I agree.
>>
>> It doesn't accomplish anything interesting on its own. But if you
>> do the planner changes to automatically use the materialized view
>> to satisfy queries (item 2. in my previous email), it's useful.
>
> But you can't do that with a snapshot view, only a continuous updated
> one.

If continuous updates prove to be too hard initially, you could instead
update the view on select if it's outdated. Such a materialized view
would be a kind of inter-session cache for subselects.

The hard part would probably be to figure out how to decide whether the
view is outdated or not, and to deal with two concurrent transactions
trying to use an outdates view (and both trying to refresh it). What
makes the second problem hard is that you wouldn't want one of the
transactions to wait for the other to complete, because this is not how
SELECTs traditionally behave.

best regards, Florian Pflug


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pavelbaros <baros(dot)p(at)seznam(dot)cz>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-12 03:00:57
Message-ID: g2t603c8f071004112000o6028510l3e97aeb58bb00bec@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Apr 11, 2010 at 10:13 PM, Florian G. Pflug <fgp(at)phlo(dot)org> wrote:
> If continuous updates prove to be too hard initially, you could instead
> update the view on select if it's outdated. Such a materialized view
> would be a kind of inter-session cache for subselects.
>
> The hard part would probably be to figure out how to decide whether the
> view is outdated or not, and to deal with two concurrent transactions
> trying to use an outdates view (and both trying to refresh it). What
> makes the second problem hard is that you wouldn't want one of the
> transactions to wait for the other to complete, because this is not how
> SELECTs traditionally behave.

Well, the proposed project is to create views that only get refreshed manually.

...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-12 03:09:45
Message-ID: t2o603c8f071004112009mb5e82ca4o73f51885c6c6f06d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Apr 11, 2010 at 5:24 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> From the rest of your comments, I'm comfortable that you're in sync with the
> not necessarily obvious risky spots here I wanted to raise awareness of.
>  It's unreasonable to expect we'll have exactly the same priorities  here,
> and I doubt it's useful to debate how I perceive the merit of various
> development subsets here compared to yourself.  I don't think it's really
> important whether anyone agrees with me or not about exactly the value of a
> full table lock implementation.  The main thing I'm concerned about is just
> that it's noted as a known risky part, one that could end up blocking the
> project's ability to commit even a subset of the proposed patch here.

I think that one of the things that we need to get our hands around is
how we're going to distinguish the "snapshot" flavor of materialized
view from the "continuous update" flavor. By definition, the latter
will only ever be supportable for a fairly restricted subset of all
possible queries, and I am assuming that we will not want to decide
what the behavior is going to be based on the query but rather based
on what the user specifies. Anything else seems like it would be have
the potential for severe POLA violations. So we need to think now
about how we'll distinguish between the two flavors. I imagine some
sort of syntactic marker would be appropriate; not sure what.

Reading this thread, I'm starting to grow concerned that some people
may feel that manually refreshed materialized views are not even worth
bothering with, because (the argument goes) you could just use some
table and write a function that updates it. There's probably some
truth to that, but I guess my thought is that it would have some value
as a convenience feature; and eventually we might optimize it to the
point where it would make more sense to use the built-in feature
rather than rolling your own. However, if we're going to have
complaints that manually refreshed materialized views suck and we
should only ever support materialized views to the extent that we can
make them automatically update on-the-fly, then let's have those
complaints now before someone spends several months of their life on
the project only to be told that we don't want it. Let's be clear: I
think it's useful, but, if other people disagree, we need to iron that
out now.

...Robert


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-12 05:13:20
Message-ID: 4BC2ABF0.1040402@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Robert Haas wrote:
>>> 2010/4/10 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>>>> Heikki Linnakangas wrote:
>>>>> 1. Keep the materialized view up-to-date when the base tables change.
>>>>> This can be further divided into many steps, you can begin by supporting
>>>>> automatic updates only on very simple views with e.g a single table and
>>>>> a where clause. Then extend that to support joins, aggregates,
>>>>> subqueries etc. Keeping it really limited, you could even require the
>>>>> user to write the required triggers himself.
>>>> That last bit doesn't strike me as much of an advance. Isn't the whole point
>>>> of this to automate it? Creating greedy materialized views is usually not
>>>> terribly difficult now, but you do have to write the triggers.
>>> Yeah, I agree.
>> It doesn't accomplish anything interesting on its own. But if you do the
>> planner changes to automatically use the materialized view to satisfy
>> queries (item 2. in my previous email), it's useful.
>
> But you can't do that with a snapshot view, only a continuous updated one.

A materialized view with manually-defined triggers to keep it up-to-date
is a continuously updated one.

Other DBMSs allow that with snapshot views too, you just don't get
totally up-to-date results, but I not sure we want to go there.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-12 06:16:16
Message-ID: u2n162867791004112316v39e949c4ic731594bc2094293@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/4/12 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Sun, Apr 11, 2010 at 5:24 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>> From the rest of your comments, I'm comfortable that you're in sync with the
>> not necessarily obvious risky spots here I wanted to raise awareness of.
>>  It's unreasonable to expect we'll have exactly the same priorities  here,
>> and I doubt it's useful to debate how I perceive the merit of various
>> development subsets here compared to yourself.  I don't think it's really
>> important whether anyone agrees with me or not about exactly the value of a
>> full table lock implementation.  The main thing I'm concerned about is just
>> that it's noted as a known risky part, one that could end up blocking the
>> project's ability to commit even a subset of the proposed patch here.
>
> I think that one of the things that we need to get our hands around is
> how we're going to distinguish the "snapshot" flavor of materialized
> view from the "continuous update" flavor.  By definition, the latter
> will only ever be supportable for a fairly restricted subset of all
> possible queries, and I am assuming that we will not want to decide
> what the behavior is going to be based on the query but rather based
> on what the user specifies.  Anything else seems like it would be have
> the potential for severe POLA violations.  So we need to think now
> about how we'll distinguish between the two flavors.  I imagine some
> sort of syntactic marker would be appropriate; not sure what.
>
> Reading this thread, I'm starting to grow concerned that some people
> may feel that manually refreshed materialized views are not even worth
> bothering with, because (the argument goes) you could just use some
> table and write a function that updates it.  There's probably some
> truth to that, but I guess my thought is that it would have some value
> as a convenience feature; and eventually we might optimize it to the
> point where it would make more sense to use the built-in feature
> rather than rolling your own.  However, if we're going to have
> complaints that manually refreshed materialized views suck and we
> should only ever support materialized views to the extent that we can
> make them automatically update on-the-fly, then let's have those
> complaints now before someone spends several months of their life on
> the project only to be told that we don't want it.  Let's be clear: I
> think it's useful, but, if other people disagree, we need to iron that
> out now.
>
> ...Robert

I thing so manually refreshed materialized views has sense. It is
similar to replication - there was replications like slony, but for
some people is more important integrated replication in 9.0. More -
manually refreshed (periodically refreshed) views can share lot if
infrastructure with dynamically actualised views. I am sure so
dynamical materialised views is bad task for GSoC - it is too large,
too complex. Manually refreshed views is adequate to two months work
and it has sense.

Regards
Pavel Stehule

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-12 10:52:05
Message-ID: l2o603c8f071004120352y99c679fbz9a772fa2622e2fde@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Apr 12, 2010 at 2:16 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> I am sure so
> dynamical materialised views is bad task for GSoC - it is too large,
> too complex. Manually refreshed views is adequate to two months work
> and it has sense.

That is my feeling also - though I fear that even the simplest
possible implementation of this feature may be a stretch. Anyway we
agree: keep it simple.

...Robert


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pavelbaros <baros(dot)p(at)seznam(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-12 17:50:46
Message-ID: 4BC35D76.6000101@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 4/9/10 1:36 PM, pavelbaros wrote:
> 2) change rewriter
> - usually, view is relation with defined rule and when rewriting, rule
> is fired and relation (view) is replaced by definition of view. If
> relation do not have rule, planner and executor behave to it as physical
> table (relation). In case of materialized view we want to rewrite select
> statement only in case when we refreshing MV. In other cases rewriter
> should skip rewriting and pick up physical relation. Exclude situation
> when other rewrite rules which are not related to MV definition are
> specified.

This was done (although not completed) against PostgreSQL 7.1 by
students in Georgia, USA, I believe. It might be worthwhile looking at
their work if I can find it (if nowhere else, it should be in the ACM).

There are basically 2 major parts for materialized views:

A) Planner: Getting the query planner to swap in the MatView for part of
a query automatically for query plan portions which the MatView supports;

B) Maintenance: maintaining the MatView data according to the programmed
scheme (synch, asynch, periodic).

I do not believe it is possible to do both of the above in one summer.
Of the two, (A) would be more useful since it is possible to manually
implement (B) using triggers, queues and cron jobs today.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-12 18:04:21
Message-ID: z2j603c8f071004121104zff2eacc2gafd7e2d8881af93b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Apr 12, 2010 at 1:50 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 4/9/10 1:36 PM, pavelbaros wrote:
>> 2) change rewriter
>> - usually, view is relation with defined rule and when rewriting, rule
>> is fired and relation (view) is replaced by definition of view. If
>> relation do not have rule, planner and executor behave to it as physical
>> table (relation). In case of materialized view we want to rewrite select
>> statement only in case when we refreshing MV. In other cases rewriter
>> should skip rewriting and pick up physical relation. Exclude situation
>> when other rewrite rules which are not related to MV definition are
>> specified.
>
> This was done (although not completed) against PostgreSQL 7.1 by
> students in Georgia, USA, I believe.  It might be worthwhile looking at
> their work if I can find it (if nowhere else, it should be in the ACM).
>
> There are basically 2 major parts for materialized views:
>
> A) Planner: Getting the query planner to swap in the MatView for part of
> a query automatically for query plan portions which the MatView supports;
>
> B) Maintenance: maintaining the MatView data according to the programmed
> scheme (synch, asynch, periodic).
>
> I do not believe it is possible to do both of the above in one summer.
> Of the two, (A) would be more useful since it is possible to manually
> implement (B) using triggers, queues and cron jobs today.

I don't believe that it's possible to do EITHER of those things in one
summer. I believe that a basic implementation that has NO bells and
whistles at all, as originally proposed, is going to be a Very Hard
Project.

...Robert


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-12 19:43:51
Message-ID: 4BC377F7.8010600@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> There are basically 2 major parts for materialized views:
> A) Planner: Getting the query planner to swap in the MatView for part of
> a query automatically for query plan portions which the MatView supports;
> B) Maintenance: maintaining the MatView data according to the programmed
> scheme (synch, asynch, periodic).
>

I'm run more into problems where it's perfectly fine to specify using
the materialized view directly in the query, but keeping that view up to
date usefully was the real problem. The whole idea of getting a MV used
automatically is valuable, but far down the roadmap as I see it.

Not everyone would agree of course, and your description does suggest a
better way to organize a high-level summary though; here's a first cut:

1) Creation of materalized view
Current state: using "CREATE TABLE AS" or similar mechanism, maintain
manually
Optimal: "CREATE MATERIALIZED VIEW" grammar, metadata to store MV data,
dump/reload support

2) Updating materialized views
Current state: periodically create new snapshots, or maintain using
triggers
Optimal: Built-in refresh via multiple strategies, with minimal locking
as to improve concurrent access

3) Using materialized views in the planner
Current state: specify the manually created MV in queries that can use it
Optimal: Automatically accelerate queries that could be satisfied by
substituting available MVs

With (1) being what I think is the only GSoC sized subset here.

I'm not saying someone can't jump right into (3), using the current
implementations for (1) and (2) that are floating around out there. I
just think it would end up wasting a fair amount of work on prototypes
that don't work quite the same way as the eventual fully integrated
version. You certainly can start working on (3) without a fully fleshed
out implementation of (2), I don't know that it makes sense to work on
before (1) though.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-12 20:03:09
Message-ID: n2z603c8f071004121303if6fa85ddg9217d2c62603ce2f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Apr 12, 2010 at 3:43 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Josh Berkus wrote:
>>
>> There are basically 2 major parts for materialized views:
>> A) Planner: Getting the query planner to swap in the MatView for part of
>> a query automatically for query plan portions which the MatView supports;
>> B) Maintenance: maintaining the MatView data according to the programmed
>> scheme (synch, asynch, periodic).
>>
>
> I'm run more into problems where it's perfectly fine to specify using the
> materialized view directly in the query, but keeping that view up to date
> usefully was the real problem.  The whole idea of getting a MV used
> automatically is valuable, but far down the roadmap as I see it.
>
> Not everyone would agree of course, and your description does suggest a
> better way to organize a high-level summary though; here's a first cut:
>
> 1) Creation of materalized view
> Current state:  using "CREATE TABLE AS" or similar mechanism, maintain
> manually
> Optimal:  "CREATE MATERIALIZED VIEW" grammar, metadata to store MV data,
> dump/reload support
>
> 2) Updating materialized views
> Current state:  periodically create new snapshots, or maintain using
> triggers
> Optimal:  Built-in refresh via multiple strategies, with minimal locking as
> to improve concurrent access
>
> 3) Using materialized views in the planner
> Current state:  specify the manually created MV in queries that can use it
> Optimal:  Automatically accelerate queries that could be satisfied by
> substituting available MVs
>
> With (1) being what I think is the only GSoC sized subset here.
>
> I'm not saying someone can't jump right into (3), using the current
> implementations for (1) and (2) that are floating around out there.  I just
> think it would end up wasting a fair amount of work on prototypes that don't
> work quite the same way as the eventual fully integrated version.  You
> certainly can start working on (3) without a fully fleshed out
> implementation of (2), I don't know that it makes sense to work on before
> (1) though.

Good summary.

...Robert


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-12 21:46:20
Message-ID: 4BC394AC.4060508@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg,

> I'm not saying someone can't jump right into (3), using the current
> implementations for (1) and (2) that are floating around out there. I
> just think it would end up wasting a fair amount of work on prototypes
> that don't work quite the same way as the eventual fully integrated
> version. You certainly can start working on (3) without a fully fleshed
> out implementation of (2), I don't know that it makes sense to work on
> before (1) though.

What would be the use case for (1) by itself?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-12 22:46:46
Message-ID: 4BC3A2D6.1000501@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> What would be the use case for (1) by itself?
>

There isn't any use case for just working on the infrastructure, just
like there's no use case for "Syntax for partitioning" on its own. That
why people rarely work on that part of these problems--it's boring and
produces no feature of value on its own. I believe that in both cases,
attempts to build the more complicated parts, ones that don't first
address some of the core infrastructure first, will continue to produce
only prototypes.

I don't want to see Materialized Views wander down the same path as
partitioning, where lots of people produce "fun parts" patches, while
ignoring the grunt work of things like production quality catalog
support for the feature. I think Pavel's proposal got that part right
by starting with the grammar and executor setup trivia. And Robert's
comments about the details in that area it's easy to forget about hit
the mark too.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-13 01:31:07
Message-ID: 4BC3C95B.6050400@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> I don't want to see Materialized Views wander down the same path as
> partitioning, where lots of people produce "fun parts" patches, while
> ignoring the grunt work of things like production quality catalog
> support for the feature. I think Pavel's proposal got that part right
> by starting with the grammar and executor setup trivia. And Robert's
> comments about the details in that area it's easy to forget about hit
> the mark too.

Good point. And GSoC may be one of the few times we can get people to
do that kind of work. Other than Simon, of course. ;-)

I just worry about any feature which doesn't get as far as a
user-visible implementation. If someone doesn't do the rest of the
parts soon, such features tend to atrophy because nobody is using them.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-13 05:20:08
Message-ID: 4BC3FF08.2050401@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> I just worry about any feature which doesn't get as far as a
> user-visible implementation. If someone doesn't do the rest of the
> parts soon, such features tend to atrophy because nobody is using them.
>

While they're limited, there are complexly viable prototype quality
implementations possible here without a large amount of work to get them
started. I'm not worried too much about this feature being unused. As
I was just reminded when assembling an page on the wiki about it:
http://wiki.postgresql.org/wiki/Materalized_Views it's currently ranked
#1--by a large margin--on the UserVoice feature request survey that
Peter kicked off.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pavelbaros <baros(dot)p(at)seznam(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-13 20:00:57
Message-ID: 9244.1271188857@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> There are basically 2 major parts for materialized views:

> A) Planner: Getting the query planner to swap in the MatView for part of
> a query automatically for query plan portions which the MatView supports;

> B) Maintenance: maintaining the MatView data according to the programmed
> scheme (synch, asynch, periodic).

> I do not believe it is possible to do both of the above in one summer.
> Of the two, (A) would be more useful since it is possible to manually
> implement (B) using triggers, queues and cron jobs today.

A patch that implements only (A) will be DOA. The reason is that the
planner can *never* "swap in a MatView" on its own authority, unless it
can prove that this does not change the semantics of the query. Which
it obviously will be unable to do unless there's a fully transparent
continuous-update scheme in place.

So the correct approach is to work on (B) first. When and if we get to
a sufficiently transparent update implementation, we can think about
changing the planner.

regards, tom lane


From: Pavel <baros(dot)p(at)seznam(dot)cz>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-20 17:59:52
Message-ID: 4BCDEB98.8010608@seznam.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Smith wrote:

> pavelbaros wrote:
>> I am also waiting for approval for my repository named
>> "materialized_view" on git.postgresql.org, so I could publish
>> completed parts.
>
> Presuming that you're going to wander there and get assigned what
> looks like an official repo name for this project is a
> bit...optimistic. I would recommend that you publish to something
> like github instead (you can fork http://github.com/postgres/postgres
> ), and if the work looks good enough that it gets picked up by the
> community maybe you migrate it onto the main site eventually.
> git.postgresql.org is really not setup to be general hosting space for
> everyone who has a PostgreSQL related project; almost every repo on
> there belongs to someone who has already been a steady project
> contributor for a number of years.

Yes, you're true, I'm kind of newbe in this kind of project and
specially in PostgreSQL. But I think it is best way to get into
PostgreSQL. When I chose my bachelor thesis I did not know I could
participate GSoC or try to make it commitable. Anyway I will make repo
on github, so everybody could look at it, as soon as posible.

<http://github.com/pbaros/postgres>
>
> (Switching to boilerplate mode for a paragraph...) You have picked a
> PostgreSQL feature that is dramatically more difficult than it appears
> to be, and I wouldn't expect you'll actually finish even a fraction of
> your goals in a summer of work. You're at least in plentiful
> company--most students do the same. As a rule, if you see a feature
> on our TODO list that looks really useful and fun to work on, it's
> only still there because people have tried multiple times to build it
> completely but not managed to do so because it's harder than it
> appears. This is certainly the case with materialized views.
>
> You've outlined a reasonable way to build a prototype that does a
> limited implementation here. The issue is what it will take to extend
> that into being production quality for the real-world uses of
> materialized views. How useful your prototype is depends on how well
> it implements a subset of that in a way that will get used by the
> final design.
>
> The main hidden complexity in this particular project relates to
> handling view refreshes. The non-obvious problem is that when the
> view updates, you need something like a SQL MERGE to really handle
> that in a robust way that doesn't conflict with concurrent access to
> queries against the materialized view. And work on MERGE support is
> itself blocked behind the fact that PostgreSQL doesn't have a good way
> to lock access to a key value that doesn't exist yet--what other
> databases call key range locking. See the notes for "Add SQL-standard
> MERGE/REPLACE/UPSERT command" at http://wiki.postgresql.org/wiki/Todo
> for more information.
>
> You can work around that to build a prototype by grabbing a full table
> lock on the materialized view when updating it, but that's not a
> production quality solution. Solving that little detail is actually
> more work than the entire project you've outlined. Your suggested
> implementation--"In function CloseIntoRel executor swap relfilenode's
> of temp table and original table and finally delete temp table"--is
> where the full table lock is going to end up at. The exact use cases
> that need materialized views cannot handle a CLUSTER-style table
> recreation each time that needs an exclusive lock to switchover, so
> that whole part of your design is going to be a prototype that doesn't
> work at all like what needs to get built to make this feature
> committable. It's also not a reasonable assumption that you have
> enough disk space to hold a second copy of the MV in a production system.

For now I know it is not commitable in actual state, but for my thesis
it is enough and I know it will not be commitable with this design at
all. In case of GSoC it will depends on the time I will be able to spend
on it, if I will consider some other design.

>
> Once there's a good way to merge updates, how to efficiently generate
> them against the sort of large data sets that need materalized
> views--so you just write out the updates rather than a whole new
> copy--is itself a large project with a significant quantity of
> academic research to absorb before starting. Dan Colish at Portland
> State has been playing around with prototypes for the specific problem
> of finding a good algorithm for view refreshing that is compatible
> with PostgreSQL's execution model. He's already recognized the table
> lock issue here and for the moment is ignoring that part. I don't
> have a good feel yet for how long the targeted update code will take
> to mature, but based on what I do know I suspect that little detail is
> also a larger effort than the entire scope you're envisioning.
> There's a reason why the MIT Press compendium "Materialized Views:
> Techniques, Implementations, and Applications" is over 600 pages
> long--I hope you've already started digging through that material.

I would like to start to dig through that, but I'm in a hurry now. I
already have made a small research on MV as part of my thesis. I also
plan to continue study PostgreSQL and Materialized Views more into the
depth, preferably as my master thesis. But I realize MV feature
commitable to PostgreSQL is not project for one person, of course.

>
> Now, with all that said, that doesn't mean there's not a useful
> project for you buried in this mess. The first two steps in your plan:
>
> 1) create materialized view
> 2) change rewriter
>
> Include building a prototype grammer, doing an initial executor
> implementation, and getting some sort of rewriter working. That is
> potentially good groundwork to lay here. I would suggest that you
> completely drop your step 3:
>
> 3) create command that takes snapshot (refresh MV)
>
> Because you cannot built that in a way that will be useful (and by
> that I mean committable quality) until there's a better way to handle
> updates than writing a whole new table and grabbing a full relation
> lock to switch to it. To do a good job just on the first two steps
> should take at least a whole summer anyway--there's a whole stack of
> background research needed I haven't seen anyone do yet, and that
> isn't on your plan yet. There is a precedent for taking this
> approach. After getting stalled trying to add the entirety of easy
> partitioning to PostgreSQL, the current scope has been scaled back to
> just trying to get the syntax and on-disk structure right, then finish
> off the implementation. See
> http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how
> that's been broken into those two major chunks.
>

Anyway thanks for all of your advices and help.

best regards,
Pavel Baros


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel <baros(dot)p(at)seznam(dot)cz>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-21 15:59:10
Message-ID: y2n603c8f071004210859r21446528v8fffebba1408012f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/4/20 Pavel <baros(dot)p(at)seznam(dot)cz>:
> For now I know it is not commitable in actual state, but for my thesis it is
> enough and I know it will not be commitable with this design at all. In case
> of GSoC it will depends on the time I will be able to spend on it, if I will
> consider some other design.

I am not sure about this, but I would think we would not want to
accept the project unless you intend to try to make it committable. I
haven't looked at your actual code to see how much work I think that
would take.

...Robert