Re: GSoC - proposal - Materialized Views in PostgreSQL

From: Pavel <baros(dot)p(at)seznam(dot)cz>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-20 17:59:52
Message-ID: 4BCDEB98.8010608@seznam.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greg Smith wrote:

> pavelbaros wrote:
>> I am also waiting for approval for my repository named
>> "materialized_view" on git.postgresql.org, so I could publish
>> completed parts.
>
> Presuming that you're going to wander there and get assigned what
> looks like an official repo name for this project is a
> bit...optimistic. I would recommend that you publish to something
> like github instead (you can fork http://github.com/postgres/postgres
> ), and if the work looks good enough that it gets picked up by the
> community maybe you migrate it onto the main site eventually.
> git.postgresql.org is really not setup to be general hosting space for
> everyone who has a PostgreSQL related project; almost every repo on
> there belongs to someone who has already been a steady project
> contributor for a number of years.

Yes, you're true, I'm kind of newbe in this kind of project and
specially in PostgreSQL. But I think it is best way to get into
PostgreSQL. When I chose my bachelor thesis I did not know I could
participate GSoC or try to make it commitable. Anyway I will make repo
on github, so everybody could look at it, as soon as posible.

<http://github.com/pbaros/postgres>
>
> (Switching to boilerplate mode for a paragraph...) You have picked a
> PostgreSQL feature that is dramatically more difficult than it appears
> to be, and I wouldn't expect you'll actually finish even a fraction of
> your goals in a summer of work. You're at least in plentiful
> company--most students do the same. As a rule, if you see a feature
> on our TODO list that looks really useful and fun to work on, it's
> only still there because people have tried multiple times to build it
> completely but not managed to do so because it's harder than it
> appears. This is certainly the case with materialized views.
>
> You've outlined a reasonable way to build a prototype that does a
> limited implementation here. The issue is what it will take to extend
> that into being production quality for the real-world uses of
> materialized views. How useful your prototype is depends on how well
> it implements a subset of that in a way that will get used by the
> final design.
>
> The main hidden complexity in this particular project relates to
> handling view refreshes. The non-obvious problem is that when the
> view updates, you need something like a SQL MERGE to really handle
> that in a robust way that doesn't conflict with concurrent access to
> queries against the materialized view. And work on MERGE support is
> itself blocked behind the fact that PostgreSQL doesn't have a good way
> to lock access to a key value that doesn't exist yet--what other
> databases call key range locking. See the notes for "Add SQL-standard
> MERGE/REPLACE/UPSERT command" at http://wiki.postgresql.org/wiki/Todo
> for more information.
>
> You can work around that to build a prototype by grabbing a full table
> lock on the materialized view when updating it, but that's not a
> production quality solution. Solving that little detail is actually
> more work than the entire project you've outlined. Your suggested
> implementation--"In function CloseIntoRel executor swap relfilenode's
> of temp table and original table and finally delete temp table"--is
> where the full table lock is going to end up at. The exact use cases
> that need materialized views cannot handle a CLUSTER-style table
> recreation each time that needs an exclusive lock to switchover, so
> that whole part of your design is going to be a prototype that doesn't
> work at all like what needs to get built to make this feature
> committable. It's also not a reasonable assumption that you have
> enough disk space to hold a second copy of the MV in a production system.

For now I know it is not commitable in actual state, but for my thesis
it is enough and I know it will not be commitable with this design at
all. In case of GSoC it will depends on the time I will be able to spend
on it, if I will consider some other design.

>
> Once there's a good way to merge updates, how to efficiently generate
> them against the sort of large data sets that need materalized
> views--so you just write out the updates rather than a whole new
> copy--is itself a large project with a significant quantity of
> academic research to absorb before starting. Dan Colish at Portland
> State has been playing around with prototypes for the specific problem
> of finding a good algorithm for view refreshing that is compatible
> with PostgreSQL's execution model. He's already recognized the table
> lock issue here and for the moment is ignoring that part. I don't
> have a good feel yet for how long the targeted update code will take
> to mature, but based on what I do know I suspect that little detail is
> also a larger effort than the entire scope you're envisioning.
> There's a reason why the MIT Press compendium "Materialized Views:
> Techniques, Implementations, and Applications" is over 600 pages
> long--I hope you've already started digging through that material.

I would like to start to dig through that, but I'm in a hurry now. I
already have made a small research on MV as part of my thesis. I also
plan to continue study PostgreSQL and Materialized Views more into the
depth, preferably as my master thesis. But I realize MV feature
commitable to PostgreSQL is not project for one person, of course.

>
> Now, with all that said, that doesn't mean there's not a useful
> project for you buried in this mess. The first two steps in your plan:
>
> 1) create materialized view
> 2) change rewriter
>
> Include building a prototype grammer, doing an initial executor
> implementation, and getting some sort of rewriter working. That is
> potentially good groundwork to lay here. I would suggest that you
> completely drop your step 3:
>
> 3) create command that takes snapshot (refresh MV)
>
> Because you cannot built that in a way that will be useful (and by
> that I mean committable quality) until there's a better way to handle
> updates than writing a whole new table and grabbing a full relation
> lock to switch to it. To do a good job just on the first two steps
> should take at least a whole summer anyway--there's a whole stack of
> background research needed I haven't seen anyone do yet, and that
> isn't on your plan yet. There is a precedent for taking this
> approach. After getting stalled trying to add the entirety of easy
> partitioning to PostgreSQL, the current scope has been scaled back to
> just trying to get the syntax and on-disk structure right, then finish
> off the implementation. See
> http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how
> that's been broken into those two major chunks.
>

Anyway thanks for all of your advices and help.

best regards,
Pavel Baros

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-04-20 18:24:35 Re: Thoughts on pg_hba.conf rejection
Previous Message Marc G. Fournier 2010-04-20 17:53:49 Re: BETA