refresh materialized view concurrently

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: refresh materialized view concurrently
Date: 2013-06-14 16:05:29
Message-ID: 1371225929.28496.YahooMailNeo@web162905.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for
9.4 CF1.  The goal of this patch is to allow a refresh without
interfering with concurrent reads, using transactional semantics.

It is my hope to get this committed during this CF to allow me to
focus on incremental maintenance for the rest of the release cycle.

I didn't need to touch very much outside of matview-specific files
for this.  My biggest concern is that I needed two small functions
which did *exactly* what some static functions in ri_triggers.c
were doing and couldn't see where the best place to share them from
was.  For the moment I just duplicated them, but my hope would be
that they could be put in a suitable location and called from both
places, rather than duplicating the 30-some lines of code.  The
function signatures are:

void quoteOneName(char *buffer, const char *name)
void quoteRelationName(char *buffer, Relation rel)

Comments in the patch describe the technique used for the
transactional refresh, but I'm not sure how easy it is to
understand the technique from the comments.  Here is a
demonstration of the basic technique, using a table to mock the
materialized view so it can be run directly.

-------------------------------------------------------------------

--
-- Setup
--
drop table if exists n, nt, nd cascade;
drop table if exists nm;

create table n (id int not null primary key, val text);
insert into n values
  (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four'), (5, 'five'),
  (6, null), (7, null), (8, null), (9, null);
-- We use a table to mock this materialized view definition:
--   create materialized view nm as select * from n;
create table nm as select * from n;
insert into n values (10, 'ten'), (11, null);
update n set val = 'zwei' where id = 2;
update n set val = null where id = 3;
update n set id = 44, val = 'forty-four' where id = 4;
update n set val = 'seven' where id = 7;
delete from n where id = 5;
delete from n where id = 8;

vacuum analyze;

--
-- Sample of internal processing for REFRESH MV CONCURRENTLY.
--
begin;
create temp table nt as select * from n;
analyze nt;
create temp table nd as
  SELECT x.ctid as tid, y
    FROM nm x
    FULL JOIN n y ON (y.id OPERATOR(pg_catalog.=) x.id)
    WHERE (y.*) IS DISTINCT FROM (x.*)
    ORDER BY tid;
analyze nd;

delete from nm where ctid in
  (select tid from nd
    where tid is not null and y is not distinct from null);
update nm x set id = (d.y).id, val = (d.y).val from nd d
  where d.tid is not null and x.ctid = d.tid;
insert into nm select (y).* from nd where tid is null;
commit;

--
-- Check that results match.
--
select * from n order by id;
select * from nm order by id;

-------------------------------------------------------------------

I also tried a million-row materialized view with the patch to see
what the performace was like on a large table with just a few
changes.  I was surprised that a small change-set like this was
actually faster than replacing the heap, at least on my machine.
Obviously, when a larger number of rows are affected the
transactional CONCURRENTLY option will be slower, and this is not
intended in any way as a performace-enhancing feature, that was
just a happy surprise in testing.

-------------------------------------------------------------------

-- drop from previous test
drop table if exists testv cascade;

-- create and populate permanent table
create table testv (id int primary key, val text);
insert into testv
  select n, cash_words((floor(random() * 100000000) / 100)::text::money)
  from (select generate_series(1, 2000000, 2)) s(n);
update testv
  set val = NULL
  where id = 547345;

create materialized view matv as select * from testv;
create unique index matv_id on matv (id);
vacuum analyze matv;

delete from testv where id = 16405;
insert into testv
  values (393466, cash_words((floor(random() * 100000000) / 100)::text::money));
update testv
  set val = cash_words((floor(random() * 100000000) / 100)::text::money)
  where id = 1947141;

refresh materialized view concurrently matv;

-------------------------------------------------------------------

People may be surprised to see this using SPI even more than
ri_triggers.c does.  I think this is the safest and most
maintainable approach, although I welcome alternative suggestions.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
refresh-concurrently-v1.patch text/x-diff 36.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-06-14 16:07:36 Re: pg_filedump 9.3: checksums (and a few other fixes)
Previous Message Tom Lane 2013-06-14 15:59:04 Re: pg_filedump 9.3: checksums (and a few other fixes)