record identical operator

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: record identical operator
Date: 2013-09-12 22:27:27
Message-ID: 1379024847.48294.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached is a patch for a bit of infrastructure I believe to be
necessary for correct behavior of REFRESH MATERIALIZED VIEW
CONCURRENTLY as well as incremental maintenance of matviews.

The idea is that after RMVC or incremental maintenance, the matview
should not be visibly different that it would have been at creation
or on a non-concurrent REFRESH.  The issue is easy to demonstrate
with citext, but anywhere that the = operator allows user-visible
differences between "equal" values it can be an issue.

test=# CREATE TABLE citext_table (
test(#   id serial primary key,
test(#   name citext
test(# );
CREATE TABLE
test=# INSERT INTO citext_table (name)
test-#   VALUES ('one'), ('two'), ('three'), (NULL), (NULL);
INSERT 0 5
test=# CREATE MATERIALIZED VIEW citext_matview AS
test-#   SELECT * FROM citext_table;
SELECT 5
test=# CREATE UNIQUE INDEX citext_matview_id
test-#   ON citext_matview (id);
CREATE INDEX
test=# UPDATE citext_table SET name = 'Two' WHERE name = 'TWO';
UPDATE 1

At this point, the table and the matview have visibly different
values, yet without the patch the query used to find differences
for RMVC would be essentially like this (slightly simplified for
readability):

test=# SELECT *
  FROM citext_matview m
  FULL JOIN citext_table t ON (t.id = m.id AND t = m)
  WHERE t IS NULL OR m IS NULL;
 id | name | id | name
----+------+----+------
(0 rows)

No differences were found, so without this patch, the matview would
remain visibly different from the results generated by a run of its
defining query.

The patch adds an "identical" operator (===) for the record type:

test=# SELECT *
  FROM citext_matview m
  FULL JOIN citext_table t ON (t.id = m.id AND t === m)
  WHERE t IS NULL OR m IS NULL;
 id | name | id | name
----+------+----+------
    |      |  2 | Two
  2 | two  |    |
(2 rows)

The difference is now found, so RMVC makes the appropriate change.

test=# REFRESH MATERIALIZED VIEW CONCURRENTLY citext_matview;
REFRESH MATERIALIZED VIEW
test=# SELECT * FROM citext_matview ORDER BY id;
 id | name  
----+-------
  1 | one
  2 | Two
  3 | three
  4 |
  5 |
(5 rows)

The patch adds all of the functions, operators, and catalog
information to support merge joins using the "identical" operator.

The new operator is logically similar to IS NOT DISTINCT FROM for a
record, although its implementation is very different.  For one
thing, it doesn't replace the operation with column level operators
in the parser.  For another thing, it doesn't look up operators for
each type, so the "identical" operator does not need to be
implemented for each type to use it as shown above.  It compares
values byte-for-byte, after detoasting.  The test for identical
records can avoid the detoasting altogether for any values with
different lengths, and it stops when it finds the first column with
a difference.

I toyed with the idea of supporting hashing of records using this
operator, but could not see how that would be a performance win.

The identical (===) and not identical (!==) operator names were
chosen because of a vague similarity to the "exactly equals"
concepts in JavaScript and PHP, which use that name.  The semantics
aren't quite the same, but it seemed close enough not to be too
surprising.  The additional operator names seemed natural to me
based on the first two, but I'm not really that attached to these
names for the operators if someone has a better idea.

Since the comparison of record values is not documented (only
comparisons involving row value constructors), it doesn't seem like
we should document this special case.  It is intended primarily for
support of matview refresh and maintenance, and it seems likely
that record comparison was not documented on the basis that it is
intended primarily for support of such things as indexing and merge
joins -- so leaving the new operators undocumented seems consistent
with existing policy.  I'm open to arguments that the policy should
change.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
record-identical-v1.patch text/x-diff 28.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2013-09-13 00:13:59 9.4 HEAD: select() failed in postmaster
Previous Message Peter Eisentraut 2013-09-12 21:23:49 Re: ENABLE/DISABLE CONSTRAINT NAME