- GSoC - snapshot materialized view (work-in-progress) patch

Lists: pgsql-hackers
From: Pavel Baroš <baros(dot)p(at)seznam(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: - GSoC - snapshot materialized view (work-in-progress) patch
Date: 2010-07-08 22:29:23
Message-ID: 4C365143.9070706@seznam.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

here is my patch trying to implement snapshot materialized views. On
http://github.com/pbaros/postgres can be seen too (or downloaded source
code). But it is still in progress. Together with patch I enclosed sql
script that shows what can be done with MV.

Description of patch:
1) can create MV, and is created uninitialized with data
CREATE MATERIALIZED VIEW mvname AS SELECT ...

2) can refresh MV
ALTER MATERIALIZED VIEW mvname REFRESH

3) MV cannot be modified by DML commands (INSERT, UPDATE and DELETE are
not permitted)

4) index can be created and used with MV

5) pg_dump is repaired, in previous patch dump threw error, now dont,
but it is sort of dummy, I want to reach state, where refreshing command
will be posed after all COPY statements (when all data are in tables).
In this patch REFRESH command is right behind CREATE MV command.

6) psql works too, new command \dm[S+] was added to the list
\d[S+] [PATTERN] - lists all db objects like tables, view,
materialized view and sequences
\dm[S+] [PATTERN] - lists all materialized views

7) there are some docs too, but I guess it is not enough, at least my
english will need to correct

8) some ALTER TABLE commands works, ie. RENAME TO, OWNER TO, SET SCHEMA,
SET TABLESPACE

9) MV and columns can be commented

10) also some functions behave as expected, but if you know about some I
did not mention and could fail when used with MV, I appreciate your hints
pg_get_viewdef()
pg_get_ruledef()
pg_relation_filenode()
pg_relation_filepath()
pg_table_size()

In progress:
- regression tests
- behavior of various ALTER commands, ie SET STATISTIC, CLUSTER ON,
ENABLE/DISABLE RULE, etc.

thanks for comments

Pavel Baros

Attachment Content-Type Size
mv_v1.1.patch text/x-patch 73.6 KB
test_create.sql text/x-sql 2.7 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Baroš <baros(dot)p(at)seznam(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: - GSoC - snapshot materialized view (work-in-progress) patch
Date: 2010-07-09 19:33:49
Message-ID: AANLkTin97-WCEY7gkaMMn6AGi3wvBMQNZHyWJkNS0Syc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/8 Pavel Baroš <baros(dot)p(at)seznam(dot)cz>:
> Description of patch:
> 1) can create MV, and is created uninitialized with data
>   CREATE MATERIALIZED VIEW mvname AS SELECT ...

This doesn't seem acceptable. It should populate it on creation.

> 2) can refresh MV
>   ALTER MATERIALIZED VIEW mvname REFRESH
>
> 3) MV cannot be modified by DML commands (INSERT, UPDATE and DELETE are not
> permitted)
>
> 4) index can be created and used with MV
>
> 5) pg_dump is repaired, in previous patch dump threw error, now dont, but it
> is sort of dummy, I want to reach state, where refreshing command will be
> posed after all COPY statements (when all data are in tables). In this patch
> REFRESH command is right behind CREATE MV command.

Hmm... ISTM that you probably need some kind of dependency stuff in
here to make the materialized view get created after the tables it
depends on have been populated with data. It needs to work with
parallel restore, too. I'm not sure exactly how the dependency stuff
in pg_dump works, though.

A subtle point here is that if you dump and restore a database
containing a materialized view, the new database might not be quite
the same as the old one, because the materialized view might have been
out of date before, and when you recreate it, it'll get refreshed.
I'm not sure there's much we can/should do about that, though.

> 6) psql works too, new command \dm[S+] was added to the list
>  \d[S+] [PATTERN]   - lists all db objects like tables, view, materialized
> view and sequences
>  \dm[S+] [PATTERN]  - lists all materialized views
>
> 7) there are some docs too, but I guess it is not enough, at least my
> english will need to correct

If we're going to treat materialized views as a separate object type,
you probably need to break out the docs for CREATE MATERIALIZED VIEW,
ALTER MATERIALIZED VIEW, and DROP MATERIALIZED VIEW into their own
pages, rather than having then mixed up with corresponding pages for
regular views.

> 8) some ALTER TABLE commands works, ie. RENAME TO, OWNER TO, SET SCHEMA, SET
> TABLESPACE
>
> 9) MV and columns can be commented
>
> 10) also some functions behave as expected, but if you know about some I did
> not mention and could fail when used with MV, I appreciate your hints
>     pg_get_viewdef()
>     pg_get_ruledef()
>     pg_relation_filenode()
>     pg_relation_filepath()
>     pg_table_size()
>
>
> In progress:
> - regression tests
> - behavior of various ALTER commands, ie SET STATISTIC, CLUSTER ON,
> ENABLE/DISABLE RULE, etc.

This isn't right:

rhaas=# create view v as select * from t;
CREATE VIEW
rhaas=# alter view v refresh;
ERROR: unrecognized alter table type: 41

Please add your patch here, so that it will be reviewed during the
about-to-begin CommitFest.

https://commitfest.postgresql.org/action/commitfest_view/open

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Pavel Baroš <baros(dot)p(at)seznam(dot)cz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: - GSoC - snapshot materialized view (work-in-progress) patch
Date: 2010-07-12 13:05:47
Message-ID: 4C3B132B.5060200@seznam.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dne 9.7.2010 21:33, Robert Haas napsal(a):
> 2010/7/8 Pavel Baroš<baros(dot)p(at)seznam(dot)cz>:
>> Description of patch:
>> 1) can create MV, and is created uninitialized with data
>> CREATE MATERIALIZED VIEW mvname AS SELECT ...
>
> This doesn't seem acceptable. It should populate it on creation.
>

Yes, it would be better, in addition, true is, this behavior will be
required if is expected to implement incremental MV in the close future.

>> 2) can refresh MV
>> ALTER MATERIALIZED VIEW mvname REFRESH
>>
>> 3) MV cannot be modified by DML commands (INSERT, UPDATE and DELETE are not
>> permitted)
>>
>> 4) index can be created and used with MV
>>
>> 5) pg_dump is repaired, in previous patch dump threw error, now dont, but it
>> is sort of dummy, I want to reach state, where refreshing command will be
>> posed after all COPY statements (when all data are in tables). In this patch
>> REFRESH command is right behind CREATE MV command.
>
> Hmm... ISTM that you probably need some kind of dependency stuff in
> here to make the materialized view get created after the tables it
> depends on have been populated with data. It needs to work with
> parallel restore, too. I'm not sure exactly how the dependency stuff
> in pg_dump works, though.
>

never mind in case MV will be populated on creation.

> A subtle point here is that if you dump and restore a database
> containing a materialized view, the new database might not be quite
> the same as the old one, because the materialized view might have been
> out of date before, and when you recreate it, it'll get refreshed.
> I'm not sure there's much we can/should do about that, though.
>

yes, it is interesting, of course, there can be real-life example, where
population on creating is needed and is not, and I'm thinking of
solution similar to Oracle or DB2. Add some option to creating MV, that
enable/disable population on creating:

http://www.ibm.com/developerworks/data/library/techarticle/dm-0708khatri/

Oracle:
CREATE MATERIALIZED VIEW mvname
[ BUILD [IMMEDIATE | DEFERRED] ]
AS SELECT ..

DB2:
CREATE TABLE mvname
AS SELECT ...
[ INITIALLY DEFERRED | IMMEDIATE ]

>> 6) psql works too, new command \dm[S+] was added to the list
>> \d[S+] [PATTERN] - lists all db objects like tables, view, materialized
>> view and sequences
>> \dm[S+] [PATTERN] - lists all materialized views
>>

I also noticed I forgot handle options \dp and \dpp, this should be OK
in next version of patch.

>> 7) there are some docs too, but I guess it is not enough, at least my
>> english will need to correct
>
> If we're going to treat materialized views as a separate object type,
> you probably need to break out the docs for CREATE MATERIALIZED VIEW,
> ALTER MATERIALIZED VIEW, and DROP MATERIALIZED VIEW into their own
> pages, rather than having then mixed up with corresponding pages for
> regular views.
>

Yeah, that was problem I just solved like that here, but I confess this
would be better.

>> In progress:
>> - regression tests
>> - behavior of various ALTER commands, ie SET STATISTIC, CLUSTER ON,
>> ENABLE/DISABLE RULE, etc.
>
> This isn't right:
>
> rhaas=# create view v as select * from t;
> CREATE VIEW
> rhaas=# alter view v refresh;
> ERROR: unrecognized alter table type: 41
>

I know, cases like that will be more than that. Thats why I work on good
tests now.

> Please add your patch here, so that it will be reviewed during the
> about-to-begin CommitFest.
>
> https://commitfest.postgresql.org/action/commitfest_view/open
>

OK, but will you help me with that form? Do you think I can fill it like
that? I'm not sure about few fields ..

Name: Snapshot materialized views
CommitFest Topic: [ Miscellaneous | SQL Features ] ???
Patch Status: Needs review
Author: me
Reviewers: You?
Commiters: who?

and I quess fields 'Date Closed' and 'Message-ID for Original Patch'
will be filled later.

thanks a lot

Pavel Baros


From: Pavel Baroš <baros(dot)p(at)seznam(dot)cz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: - GSoC - snapshot materialized view (work-in-progress) patch
Date: 2010-07-12 14:16:11
Message-ID: 4C3B23AB.5040600@seznam.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dne 9.7.2010 21:33, Robert Haas napsal(a):
> 2010/7/8 Pavel Baroš<baros(dot)p(at)seznam(dot)cz>:
>> Description of patch:
>> 1) can create MV, and is created uninitialized with data
>> CREATE MATERIALIZED VIEW mvname AS SELECT ...
>
> This doesn't seem acceptable. It should populate it on creation.
>

Yes, it would be better, in addition, true is, this behavior will be
required if is expected to implement incremental MV in the close future.

>> 2) can refresh MV
>> ALTER MATERIALIZED VIEW mvname REFRESH
>>
>> 3) MV cannot be modified by DML commands (INSERT, UPDATE and DELETE
>> are not
>> permitted)
>>
>> 4) index can be created and used with MV
>>
>> 5) pg_dump is repaired, in previous patch dump threw error, now dont,
>> but it
>> is sort of dummy, I want to reach state, where refreshing command
>> will be
>> posed after all COPY statements (when all data are in tables). In
>> this patch
>> REFRESH command is right behind CREATE MV command.
>
> Hmm... ISTM that you probably need some kind of dependency stuff in
> here to make the materialized view get created after the tables it
> depends on have been populated with data. It needs to work with
> parallel restore, too. I'm not sure exactly how the dependency stuff
> in pg_dump works, though.
>

never mind in case MV will be populated on creation.

> A subtle point here is that if you dump and restore a database
> containing a materialized view, the new database might not be quite
> the same as the old one, because the materialized view might have been
> out of date before, and when you recreate it, it'll get refreshed.
> I'm not sure there's much we can/should do about that, though.
>

yes, it is interesting, of course, there can be real-life example, where
population on creating is needed and is not, and I'm thinking of
solution similar to Oracle or DB2. Add some option to creating MV, that
enable/disable population on creating:

http://www.ibm.com/developerworks/data/library/techarticle/dm-0708khatri/

Oracle:
CREATE MATERIALIZED VIEW mvname
[ BUILD [IMMEDIATE | DEFERRED] ]
AS SELECT ..

DB2:
CREATE TABLE mvname
AS SELECT ...
[ INITIALLY DEFERRED | IMMEDIATE ]

>> 6) psql works too, new command \dm[S+] was added to the list
>> \d[S+] [PATTERN] - lists all db objects like tables, view,
>> materialized
>> view and sequences
>> \dm[S+] [PATTERN] - lists all materialized views
>>

I also noticed I forgot handle options \dp and \dpp, this should be OK
in next version of patch.

>> 7) there are some docs too, but I guess it is not enough, at least my
>> english will need to correct
>
> If we're going to treat materialized views as a separate object type,
> you probably need to break out the docs for CREATE MATERIALIZED VIEW,
> ALTER MATERIALIZED VIEW, and DROP MATERIALIZED VIEW into their own
> pages, rather than having then mixed up with corresponding pages for
> regular views.
>

Yeah, that was problem I just solved like that here, but I confess this
would be better.

>> In progress:
>> - regression tests
>> - behavior of various ALTER commands, ie SET STATISTIC, CLUSTER ON,
>> ENABLE/DISABLE RULE, etc.
>
> This isn't right:
>
> rhaas=# create view v as select * from t;
> CREATE VIEW
> rhaas=# alter view v refresh;
> ERROR: unrecognized alter table type: 41
>

I know, cases like that will be more than that. Thats why I work on good
tests now.

> Please add your patch here, so that it will be reviewed during the
> about-to-begin CommitFest.
>
> https://commitfest.postgresql.org/action/commitfest_view/open
>

OK, but will you help me with that form? Do you think I can fill it like
that? I'm not sure about few fields ..

Name: Snapshot materialized views
CommitFest Topic: [ Miscellaneous | SQL Features ] ???
Patch Status: Needs review
Author: me
Reviewers: You?
Commiters: who?

and I quess fields 'Date Closed' and 'Message-ID for Original Patch'
will be filled later.

thanks a lot

Pavel Baros


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, Pavel Baroš <baros(dot)p(at)seznam(dot)cz>
Subject: Re: - GSoC - snapshot materialized view (work-in-progress) patch
Date: 2010-07-12 15:06:16
Message-ID: 4C3AE9180200002500033434@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Baroš<baros(dot)p(at)seznam(dot)cz> wrote:
> Dne 9.7.2010 21:33, Robert Haas napsal(a):

>> Please add your patch here, so that it will be reviewed during
>> the about-to-begin CommitFest.
>>
>> https://commitfest.postgresql.org/action/commitfest_view/open
>>
>
> OK, but will you help me with that form? Do you think I can fill
> it like that? I'm not sure about few fields ..
>
> Name: Snapshot materialized views
> CommitFest Topic: [ Miscellaneous | SQL Features ] ???

SQL Features seems reasonable to me.

> Patch Status: Needs review
> Author: me
> Reviewers: You?

Leave empty. Reviewers will sign up or be assigned.

> Commiters: who?

That comes much later -- when the patch is complete and has a
favorable review, then a committer will pick it up.

> and I quess fields 'Date Closed' and 'Message-ID for Original
> Patch' will be filled later.

Date closed is only set for patches which are committed, returned
with feedback (for a later CommitFest), or rejected. When you make
an entry which references a post to the lists, you should fill in
the Message-ID from the email header of the post. You may be able
to get this from your email software as soon as you send the post;
if not, you can find it on the archive page for the post.

-Kevin


From: Thom Brown <thom(at)linux(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Pavel Baroš <baros(dot)p(at)seznam(dot)cz>
Subject: Re: - GSoC - snapshot materialized view (work-in-progress) patch
Date: 2011-06-17 18:31:00
Message-ID: BANLkTinHXgfPRFjeT=6yneAdPpTSLORL1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/12 Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>:
> Pavel Baroš<baros(dot)p(at)seznam(dot)cz> wrote:
>> Dne 9.7.2010 21:33, Robert Haas napsal(a):
>
>>> Please add your patch here, so that it will be reviewed during
>>> the about-to-begin CommitFest.
>>>
>>> https://commitfest.postgresql.org/action/commitfest_view/open
>>>
>>
>> OK, but will you help me with that form? Do you think I can fill
>> it like that? I'm not sure about few fields ..
>>
>> Name:             Snapshot materialized views
>> CommitFest Topic: [ Miscellaneous | SQL Features ] ???
>
> SQL Features seems reasonable to me.
>
>> Patch Status:     Needs review
>> Author:           me
>> Reviewers:        You?
>
> Leave empty.  Reviewers will sign up or be assigned.
>
>> Commiters:        who?
>
> That comes much later -- when the patch is complete and has a
> favorable review, then a committer will pick it up.
>
>> and I quess fields 'Date Closed' and 'Message-ID for Original
>> Patch' will be filled later.
>
> Date closed is only set for patches which are committed, returned
> with feedback (for a later CommitFest), or rejected.  When you make
> an entry which references a post to the lists, you should fill in
> the Message-ID from the email header of the post.  You may be able
> to get this from your email software as soon as you send the post;
> if not, you can find it on the archive page for the post.

This topic hasn't been touched on in nearly a year, but is the work
that's been done so far salvageable? I'm not sure what happens to
GSoC project work that doesn't get finished in time.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company