Optimising Foreign Key checks

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Optimising Foreign Key checks
Date: 2013-06-01 08:41:13
Message-ID: CA+U5nMLM1DaHBC6JXtUMfcG6f7FgV5mPSpufO7GRnbFKkF2f7g@mail.gmail.com
Lists: pgsql-hackers

FK checks can be expensive, especially when loading large volumes of
data into an existing table or partition. A couple of ideas for
improving performance are discussed here:

1. Use Case: Bulk loading
COPY pgbench_accounts; --> references pgbench_branches with many
repeated values

Proposal: Transactions that need multiple checks can be optimised by
simply LOCKing the whole referenced table, once. We can then hold the
referenced table as a Hash, like we do with a Hash Join (it's almost
exactly the same thing). This works in two ways: it speeds up checks
and it also reduces the locking overhead.

This would require explicit permission of the user, which would be
given by a new table parameter, set on the referenced table.

WITH (foreign_key_lock_level = row | table)

Setting this would lock out changes on that table, so it would only be
suitable for read-mostly tables. But read-mostly "reference tables" are
exactly the tables most frequently referenced by FKs anyway, so the
optimisation is appropriate in probably the majority of cases.
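For illustration, enabling the proposed behaviour might look like this
(hypothetical syntax; no such reloption exists today):

ALTER TABLE pgbench_branches SET (foreign_key_lock_level = 'table');
-- bulk loads into referencing tables could then take one table-level
-- lock on pgbench_branches and build the in-memory hash of its keys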

2. Use Case: Transactional repetition
BEGIN;
INSERT INTO order VALUES (ordid, ....)
INSERT INTO order_line VALUES (ordid, 1, .....)
INSERT INTO order_line VALUES (ordid, 2, .....)
INSERT INTO order_line VALUES (ordid, 3, .....)
INSERT INTO order_line VALUES (ordid, 4, .....)
...
COMMIT;
The inserts into order_line repeatedly execute checks against the same
ordid. Deferring and then de-duplicating the checks would optimise the
transaction.

Proposal: De-duplicate multiple checks against same value. This would
be implemented by keeping a hash of rows that we had already either
inserted and/or locked as the transaction progresses, so we can use
the hash to avoid queuing up after triggers.

We could also use this technique to de-duplicate checks within a
single statement.
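For reference, each queued check currently ends up running SQL of this
shape (the same query Noah quotes verbatim downthread):

SELECT 1 FROM ONLY pktable WHERE pkcol = $1 FOR KEY SHARE OF x;

Under this proposal, a hit in the per-transaction hash of
already-locked values would let us skip queuing that trigger entirely.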

In both cases we are building up a local hash table with values and
then using those values to avoid queuing constraint triggers. So code
is similar for both.

Thoughts?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Noah Misch <noah(at)leadboat(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-01 20:27:35
Message-ID: 20130601202735.GA256029@tornado.leadboat.com
Lists: pgsql-hackers

On Sat, Jun 01, 2013 at 09:41:13AM +0100, Simon Riggs wrote:
> FK checks can be expensive, especially when loading large volumes of
> data into an existing table or partition. A couple of ideas for
> improving performance are discussed here:
>
> 1. Use Case: Bulk loading
> COPY pgbench_accounts; --> references pgbench_branches with many
> repeated values
>
> Proposal: Transactions that need multiple checks can be optimised by
> simply LOCKing the whole referenced table, once. We can then hold the
> referenced table as a Hash, like we do with a Hash Join (its almost
> exactly the same thing). This works in two ways: it speeds up checks
> and it also reduces the locking overhead.

> 2. Use Case: Transactional repetition
> BEGIN;
> INSERT INTO order VALUES (ordid, ....)
> INSERT INTO order_line VALUES (ordid, 1, .....)
> INSERT INTO order_line VALUES (ordid, 2, .....)
> INSERT INTO order_line VALUES (ordid, 3, .....)
> INSERT INTO order_line VALUES (ordid, 4, .....)

> Proposal: De-duplicate multiple checks against same value. This would
> be implemented by keeping a hash of rows that we had already either
> inserted and/or locked as the transaction progresses, so we can use
> the hash to avoid queuing up after triggers.

I find (2) more promising than (1). It helps automatically, and it helps in
more cases. The main advantage of (1) is avoiding the writes of tuple locks
onto the PK table. Therefore, I expect (1) to distinguish itself over (2)
when the referenced values are _not_ repeated too much. If referenced values
are repeated, tuple locking costs would tend to disappear into the noise after
the deduplication of (2).

> In both cases we are building up a local hash table with values and
> then using those values to avoid queuing constraint triggers. So code
> is similar for both.
>
> Thoughts?

Will this add too much cost where it doesn't help? I don't know what to
predict there. There's the obvious case of trivial transactions with no more
than one referential integrity check per FK, but there's also the case of a
transaction with many FK checks all searching different keys. If the hash hit
rate (key duplication rate) is low, the hash can consume considerably more
memory than the trigger queue without preventing many RI queries. What sort
of heuristic could we use to avoid pessimizing such cases?

Same-transaction UPDATE or DELETE of the PK table, as well as subtransaction
abort, potentially invalidates hash entries. I recommend thinking relatively
early about how best to handle that.

Before deciding what to think overall, I needed to see a benchmark. I ran
this simple one based on your scenarios:

BEGIN;
CREATE TABLE "order" (orderid int PRIMARY KEY);
CREATE TABLE order_line (
    orderid int,
    lineno int,
    PRIMARY KEY (orderid, lineno),
    FOREIGN KEY (orderid) REFERENCES "order"
);
INSERT INTO "order" VALUES (1);
INSERT INTO order_line SELECT 1, n FROM generate_series(1,1000000) t(n);
ROLLBACK;

See attached output from "perf report -s parent -g graph,5,caller"; I suggest
browsing under "less -S". It confirms that the expensive part is something
your proposals would address.
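For comparison, the same load with the constraint omitted isolates the
FK overhead when timed in psql (a variant of the script above, not part
of the original benchmark):

\timing on
BEGIN;
CREATE TABLE order_nofk (orderid int PRIMARY KEY);
CREATE TABLE order_line_nofk (
    orderid int,
    lineno int,
    PRIMARY KEY (orderid, lineno)  -- same schema, FK omitted
);
INSERT INTO order_nofk VALUES (1);
INSERT INTO order_line_nofk SELECT 1, n FROM generate_series(1,1000000) t(n);
ROLLBACK;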

A different way to help the bulk loading case would be to lock more keys with
a single query. Currently, we issue a query like this for every INSERTed row:

SELECT 1 FROM ONLY pktable WHERE pkcol = $1 FOR KEY SHARE OF x

We could instead consider queued tuples in batches:

SELECT 1 FROM ONLY pktable WHERE pkcol = ANY (ARRAY[$1,$2,...,$100]) FOR KEY SHARE OF x

This would have the advantage of not adding a long-lived, potentially-large
data structure and not depending on the rate of referenced key duplication.
But it would not help non-bulk scenarios. (However, the user could make your
example for (2) become a bulk scenario by deferring the FK constraint.)
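For reference, that deferral looks like this, assuming the FK from the
benchmark schema had been declared DEFERRABLE:

CREATE TABLE order_line (
    orderid int,
    lineno int,
    PRIMARY KEY (orderid, lineno),
    FOREIGN KEY (orderid) REFERENCES "order" DEFERRABLE INITIALLY IMMEDIATE
);

BEGIN;
SET CONSTRAINTS ALL DEFERRED;
INSERT INTO order_line VALUES (1, 1);
INSERT INTO order_line VALUES (1, 2);
COMMIT;  -- the queued FK checks all fire here, where batching could apply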

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
fk-insert-profile.txt text/plain 10.0 KB

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-02 09:45:21
Message-ID: CA+U5nML6BwWtVcfkg3nzr=k7fe8dx7VGKRM8Npk8NjsrYbatcQ@mail.gmail.com
Lists: pgsql-hackers

On 1 June 2013 21:27, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Sat, Jun 01, 2013 at 09:41:13AM +0100, Simon Riggs wrote:
>> FK checks can be expensive, especially when loading large volumes of
>> data into an existing table or partition. A couple of ideas for
>> improving performance are discussed here:
>>
>> 1. Use Case: Bulk loading
>> COPY pgbench_accounts; --> references pgbench_branches with many
>> repeated values
>>
>> Proposal: Transactions that need multiple checks can be optimised by
>> simply LOCKing the whole referenced table, once. We can then hold the
>> referenced table as a Hash, like we do with a Hash Join (its almost
>> exactly the same thing). This works in two ways: it speeds up checks
>> and it also reduces the locking overhead.
>
>> 2. Use Case: Transactional repetition
>> BEGIN;
>> INSERT INTO order VALUES (ordid, ....)
>> INSERT INTO order_line VALUES (ordid, 1, .....)
>> INSERT INTO order_line VALUES (ordid, 2, .....)
>> INSERT INTO order_line VALUES (ordid, 3, .....)
>> INSERT INTO order_line VALUES (ordid, 4, .....)
>
>> Proposal: De-duplicate multiple checks against same value. This would
>> be implemented by keeping a hash of rows that we had already either
>> inserted and/or locked as the transaction progresses, so we can use
>> the hash to avoid queuing up after triggers.
>
> I find (2) more promising than (1). It helps automatically, and it helps in
> more cases. The main advantage of (1) is avoiding the writes of tuple locks
> onto the PK table. Therefore, I expect (1) to distinguish itself over (2)
> when the referenced values are _not_ repeated too much. If referenced values
> are repeated, tuple locking costs would tend to disappear into the noise after
> the deduplication of (2).
>
>> In both cases we are building up a local hash table with values and
>> then using those values to avoid queuing constraint triggers. So code
>> is similar for both.
>>
>> Thoughts?

Thanks for your detailed reply.

> Will this add too much cost where it doesn't help? I don't know what to
> predict there. There's the obvious case of trivial transactions with no more
> than one referential integrity check per FK, but there's also the case of a
> transaction with many FK checks all searching different keys. If the hash hit
> rate (key duplication rate) is low, the hash can consume considerably more
> memory than the trigger queue without preventing many RI queries. What sort
> of heuristic could we use to avoid pessimizing such cases?

I've struggled with that for a while now. Probably all we can say is
that there might be one, and if there is not, then manual decoration
of the transaction will be the way to go.

> Same-transaction UPDATE or DELETE of the PK table, as well as subtransaction
> abort, potentially invalidates hash entries. I recommend thinking relatively
> early about how best to handle that.

Thanks, good point.

> Before deciding what to think overall, I needed to see a benchmark. I ran
> this simple one based on your scenarios:
>
> BEGIN;
> CREATE TABLE "order" (orderid int PRIMARY KEY);
> CREATE TABLE order_line (
> orderid int,
> lineno int,
> PRIMARY KEY (orderid, lineno),
> FOREIGN KEY (orderid) REFERENCES "order"
> );
> INSERT INTO "order" VALUES (1);
> INSERT INTO order_line SELECT 1, n FROM generate_series(1,1000000) t(n);
> ROLLBACK;
>
> See attached output from "perf report -s parent -g graph,5,caller"; I suggest
> browsing under "less -S". It confirms that the expensive part is something
> your proposals would address.

Thanks for posting this. It shows nicely that there is a problem with
repeated execution of SQL in constraint triggers. However, that isn't
the only problem since we must also consider memory usage and memory
scrolling. Currently, the after trigger queue builds up in memory and
doesn't scroll to disk, so overall memory usage is a problem in many
cases. Memory scrolling is also a problem since the after trigger
queue records the tids against which we need to execute triggers;
this causes us to re-access blocks that had scrolled out of memory,
thus defeating the bulk access strategy, as well as spoiling cache.

For clarity the 4 problems are
1. SQL execution overhead
2. Memory usage
3. Memory scrolling
4. Locking overhead, specifically FPWs and WAL records from FK checks
probably in that order or thereabouts.

The above is why I went for a technique that avoided SQL execution
entirely, as well as conserving memory by de-duplicating the values in
a hash table as we go, which avoids all three problems. The fourth was
solved by the more extreme approach to locking.

> A different way to help the bulk loading case would be to lock more keys with
> a single query. Currently, we issue a query like this for every INSERTed row:
>
> SELECT 1 FROM ONLY pktable WHERE pkcol = $1 FOR KEY SHARE OF x
>
> We could instead consider queued tuples in batches:
>
> SELECT 1 FROM ONLY pktable WHERE pkcol = ANY (ARRAY[$1,$2,...,$100]) FOR KEY SHARE OF x
>
> This would have the advantage of not adding a long-lived, potentially-large
> data structure and not depending on the rate of referenced key duplication.
> But it would not help non-bulk scenarios. (However, the user could make your
> example for (2) become a bulk scenario by deferring the FK constraint.)

At the moment we queue the after triggers first, then execute later.
So we can run out of memory before we execute any SQL. Which means we
need to do something to prevent the after trigger queue growing,
rather than simply optimise the way we execute things on the queue.

In terms of formal trigger behaviour, I am suggesting we redefine FK
triggers as if they were executing a WHEN clause that looks like this:

WHEN check_not_already_queued()

where the function would have the side effect of maintaining an
in-memory hash that grows up to a certain size. I suggest work_mem for
most statements, but maintenance_work_mem for COPY.

Thus the WHEN clause gets executed before we queue anything and so we
are able to minimise problems 2 and 3.

Having that formal definition also puts us neatly into an
implementation route in that we are simply adding a WHEN clause to FK
triggers.
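Spelled out as DDL, the redefined trigger might look roughly like this
(a sketch only; check_not_already_queued() is the hypothetical function
named above, here passed the key value it must remember):

CREATE CONSTRAINT TRIGGER order_line_orderid_fkey_check
    AFTER INSERT ON order_line FROM "order"
    NOT DEFERRABLE INITIALLY IMMEDIATE
    FOR EACH ROW
    WHEN (check_not_already_queued(NEW.orderid))
    EXECUTE PROCEDURE "RI_FKey_check_ins"();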

The overall execution plan then looks a lot like a hash join, with
skew avoidance. We build up a hash table in memory, up to a certain
size, while letting non-matched rows spill to the after trigger queue
for later execution. Since we skip many of the checks via the WHEN
clause, the after trigger queue should be more manageable in size.

I think it might be worth considering joining the after trigger queue
directly to the referenced table(s), something like this...

CREATE OR REPLACE FUNCTION after_trigger_queue() RETURNS SETOF tid AS $$
...
$$ LANGUAGE SQL;

EXPLAIN
SELECT 1 FROM ONLY "order"
WHERE orderid IN
    (SELECT orderid FROM ONLY order_line
     WHERE ctid IN (SELECT after_trigger_queue FROM after_trigger_queue()))
FOR KEY SHARE;

But we could optimise that even further if we had a "BlockScan", which
would be a block-oriented version of the tid scan where we simply
provide a bitmap of blocks needing to be scanned, just like the output
of a BitmapIndexScan. The reason for mentioning that here is that
parallel query will eventually need the ability to do a scan of a
subset of blocks, as does tablesample. So I can see 3 callers of such
a Scan type.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jim Nasby <jim(at)nasby(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-03 18:41:17
Message-ID: 51ACE34D.20409@nasby.net
Lists: pgsql-hackers

On 6/2/13 4:45 AM, Simon Riggs wrote:
>> >Will this add too much cost where it doesn't help? I don't know what to
>> >predict there. There's the obvious case of trivial transactions with no more
>> >than one referential integrity check per FK, but there's also the case of a
>> >transaction with many FK checks all searching different keys. If the hash hit
>> >rate (key duplication rate) is low, the hash can consume considerably more
>> >memory than the trigger queue without preventing many RI queries. What sort
>> >of heuristic could we use to avoid pessimizing such cases?
> I've struggled with that for a while now. Probably all we can say is
> that there might be one, and if there is not, then manual decoration
> of the transaction will be the way to go.

Just an idea... each backend could keep a store that indicates what FKs this would help with. For example, any time we hit a transaction that exercises the same FK more than once, we stick the OID of the FK constraint (or maybe of the two tables) into a hash that's in that backend's top memory context. (Or if we want to be real fancy, shared mem).
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-03 19:00:19
Message-ID: CA+U5nMK=e4PwH2BZCVHjRw_429+YLaPDFG+c6NX9oFEM1W-0WQ@mail.gmail.com
Lists: pgsql-hackers

On 3 June 2013 19:41, Jim Nasby <jim(at)nasby(dot)net> wrote:
> On 6/2/13 4:45 AM, Simon Riggs wrote:
>>>
>>> >Will this add too much cost where it doesn't help? I don't know what to
>>> >predict there. There's the obvious case of trivial transactions with no
>>> > more
>>> >than one referential integrity check per FK, but there's also the case
>>> > of a
>>> >transaction with many FK checks all searching different keys. If the
>>> > hash hit
>>> >rate (key duplication rate) is low, the hash can consume considerably
>>> > more
>>> >memory than the trigger queue without preventing many RI queries. What
>>> > sort
>>> >of heuristic could we use to avoid pessimizing such cases?
>>
>> I've struggled with that for a while now. Probably all we can say is
>> that there might be one, and if there is not, then manual decoration
>> of the transaction will be the way to go.
>
>
> Just an idea... each backend could keep a store that indicates what FKs this
> would help with. For example, any time we hit a transaction that exercises
> the same FK more than once, we stick the OID of the FK constraint (or maybe
> of the two tables) into a hash that's in that backend's top memory context.
> (Or if we want to be real fancy, shared mem).

Yes, that principle would work. We could just store that on the
relcache entry for a table.

It requires a little bookkeeping to implement that heuristic. I'm sure
other ways exist as well.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Noah Misch <noah(at)leadboat(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-04 00:54:02
Message-ID: 20130604005402.GA362718@tornado.leadboat.com
Lists: pgsql-hackers

On Sun, Jun 02, 2013 at 10:45:21AM +0100, Simon Riggs wrote:
> For clarity the 4 problems are
> 1. SQL execution overhead
> 2. Memory usage
> 3. Memory scrolling
> 4. Locking overhead, specifically FPWs and WAL records from FK checks
> probably in that order or thereabouts.
>
> The above is why I went for a technique that avoided SQL execution
> entirely, as well as conserving memory by de-duplicating the values in
> a hash table as we go, which avoids all three problems. The fourth was
> solved by the more extreme approach to locking.

That nicely frames the benefits of your proposals. Makes sense.

> I think it might be worth considering joining the after trigger queue
> directly to the referenced table(s), something like this...
>
> CREATE OR REPLACE FUNCTION after_trigger_queue() RETURNS SETOF tid AS $$
> ...
> $$ LANGUAGE SQL;
>
> EXPLAIN
> SELECT 1 FROM ONLY "order"
> WHERE orderid IN
> (SELECT orderid FROM ONLY order_line WHERE ctid IN (SELECT
> after_trigger_queue FROM after_trigger_queue() ))
> FOR KEY SHARE;

Agreed.

> But we could optimise that even further if we had a "BlockScan", which
> would be a block-oriented version of the tid scan where we simply
> provide a bitmap of blocks needing to be scanned, just like the output
> of an BitmapIndexScan. The reason for mentioning that here is that
> parallel query will eventually need the ability to do a scan of a
> subset of blocks, as does tablesample. So I can see 3 callers of such
> a Scan type.

Interesting. I was going to say that could lock more keys than needed, but
perhaps you would afterward filter by xmin/cmin.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-04 13:45:17
Message-ID: CA+U5nMKQde1YqbqVy5h00pOpRzMLe3DmerFxHQ8a2aRGd9WAqA@mail.gmail.com
Lists: pgsql-hackers

On 4 June 2013 01:54, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Sun, Jun 02, 2013 at 10:45:21AM +0100, Simon Riggs wrote:
>> For clarity the 4 problems are
>> 1. SQL execution overhead
>> 2. Memory usage
>> 3. Memory scrolling
>> 4. Locking overhead, specifically FPWs and WAL records from FK checks
>> probably in that order or thereabouts.
>>
>> The above is why I went for a technique that avoided SQL execution
>> entirely, as well as conserving memory by de-duplicating the values in
>> a hash table as we go, which avoids all three problems. The fourth was
>> solved by the more extreme approach to locking.
>
> That nicely frames the benefits of your proposals. Makes sense.
>
>> I think it might be worth considering joining the after trigger queue
>> directly to the referenced table(s), something like this...
>>
>> CREATE OR REPLACE FUNCTION after_trigger_queue() RETURNS SETOF tid AS $$
>> ...
>> $$ LANGUAGE SQL;
>>
>> EXPLAIN
>> SELECT 1 FROM ONLY "order"
>> WHERE orderid IN
>> (SELECT orderid FROM ONLY order_line WHERE ctid IN (SELECT
>> after_trigger_queue FROM after_trigger_queue() ))
>> FOR KEY SHARE;
>
> Agreed.

Hmm, although the above is cute, it still falls down by requiring lots
of memory (problem 2) and churning memory again (problem 3).

Let's rethink things to put a few options on the table and see what we get...

1. Store all the after trigger events in a big queue, then execute
later. That requires us to scroll it to disk to solve problem 2, but it
still falls foul of problem 3, which becomes severe on big data.

Implementation: Implement scrolling to disk for the after trigger
queue. Then implement event batching in the RI handler code, similar
to what is suggested above, or directly as Noah suggested in his
previous post.

2. Don't store FK events in the after trigger queue at all, but apply
them as we go. That solves problems 2 and 3. That could be considered
to be in violation of the SQL standard, which requires us to apply the
checks at the end of the statement. We already violate the standard
with regard to uniqueness checks, so doing it here doesn't seem
unacceptable.

Implementation: Given we know that COPY uses a ring buffer, and given
your earlier thoughts on use of batched SQL, I have a new
suggestion. Every time the ring buffer fills, we go through the last
buffers accessed, pulling out all the PKs and then issue them as a
single SQL statement (as above). We can do that manually, or using the
block scan mentioned previously. This uses batched SQL to solve
problem1. It doesn't build up a large data structure in memory,
problem2, and it also solves problem3 by accessing data blocks before
they fall out of the ring buffer. If there are no duplicates in the
referenced table, then this behavious will do as much as possible to
make accesses to the referenced table also use a small working set.
(We may even wish to consider making the batched SQL use a separate
ring buffer for RI accesses). That approach doesn't make any
assumptions about duplicates.

3. Strip the PK values from the rows at access time and store them in a
hash table, de-duplicating as we go. If that gets too big, write
seldom accessed values to a temporary table which will automatically
scroll to disk when it exceeds temp_buffers. We don't create the temp
table until we hit work_mem, so we only create that for larger
statements. Then at the end of the statement, copy the hash table into
the temp table and then join to the referenced table directly. We
don't guarantee the list of PK values is completely unique, but we
will have significantly reduced the number of duplicates to check.
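A rough shape for that end-of-statement pass (table name hypothetical;
a real implementation would also have to verify that every distinct
value was actually found):

CREATE TEMP TABLE fk_spill (orderid int) ON COMMIT DROP;
-- ... during the statement, values evicted from the hash are appended ...
SELECT 1 FROM ONLY "order"
WHERE orderid IN (SELECT DISTINCT orderid FROM fk_spill)
FOR KEY SHARE;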

Comparison:
Approach 1 suffers from memory scrolling, problem 3, but it follows the
SQL standard.
Approach 2 solves all performance problems but doesn't follow the
letter of the standard (AIUI - anyone see differently?).
Approach 3 solves all performance problems and follows the SQL standard.

Perhaps another way would be to avoid very large COPY statements
altogether, breaking down loads into smaller pieces. For example,
pg_dump could issue COPY in 10000-row chunks rather than one big COPY.
That would mostly avoid problems 2 and 3 and is more realistic with
"wild" data.

Based on that thought, implementation option 1 looks most reasonable.
In short summary that is: a) scroll the after trigger queue to disk,
and b) batch SQL values together as Noah originally suggested, or as in
the SQL above.

Let me know what you think of that analysis.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Greg Stark <stark(at)mit(dot)edu>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-05 09:37:51
Message-ID: CAM-w4HPsgKt9eUag8yB44sPnSwS2rUtyN_pF-212e5QxP-2j0A@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jun 1, 2013 at 9:41 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> COMMIT;
> The inserts into order_line repeatedly execute checks against the same
> ordid. Deferring and then de-duplicating the checks would optimise the
> transaction.
>
> Proposal: De-duplicate multiple checks against same value. This would
> be implemented by keeping a hash of rows that we had already either
> inserted and/or locked as the transaction progresses, so we can use
> the hash to avoid queuing up after triggers.

Fwiw the reason we don't do that now is that the rows might be later
deleted within the same transaction (or even the same statement I
think). If they are then the trigger needs to be skipped for that row
but still needs to happen for other rows. So you need to do some kind
of book-keeping to keep track of that. The easiest way was just to do
the check independently for each row. I think there's a comment about
this in the code.

I think you're right that this should be optimized because in the vast
majority of cases you don't end up deleting rows and we're currently
doing lots of redundant checks. But you need to make sure you don't
break the unusual case entirely.

--
greg


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-05 09:47:45
Message-ID: 51AF0941.5000704@2ndQuadrant.com
Lists: pgsql-hackers

On 06/05/2013 11:37 AM, Greg Stark wrote:
> On Sat, Jun 1, 2013 at 9:41 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> COMMIT;
>> The inserts into order_line repeatedly execute checks against the same
>> ordid. Deferring and then de-duplicating the checks would optimise the
>> transaction.
>>
>> Proposal: De-duplicate multiple checks against same value. This would
>> be implemented by keeping a hash of rows that we had already either
>> inserted and/or locked as the transaction progresses, so we can use
>> the hash to avoid queuing up after triggers.
>
> Fwiw the reason we don't do that now is that the rows might be later
> deleted within the same transaction (or even the same statement I
> think). If they are then the trigger needs to be skipped for that row
> but still needs to happen for other rows. So you need to do some kind
> of book-keeping to keep track of that. The easiest way was just to do
> the check independently for each row. I think there's a comment about
> this in the code.
A simple counter on each value should also solve this.
Increment for each row, decrement for each deletion,
then run the tests on values where the counter is > 0.
> I think you're right that this should be optimized because in the vast
> majority of cases you don't end up deleting rows and we're currently
> doing lots of redundant checks. But you need to make sure you don't
> break the unusual case entirely.
>

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: Noah Misch <noah(at)leadboat(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-08 14:30:32
Message-ID: 20130608143032.GA413109@tornado.leadboat.com
Lists: pgsql-hackers

On Tue, Jun 04, 2013 at 02:45:17PM +0100, Simon Riggs wrote:
> > On Sun, Jun 02, 2013 at 10:45:21AM +0100, Simon Riggs wrote:
> >> For clarity the 4 problems are
> >> 1. SQL execution overhead
> >> 2. Memory usage
> >> 3. Memory scrolling
> >> 4. Locking overhead, specifically FPWs and WAL records from FK checks
> >> probably in that order or thereabouts.

> Lets rethink things to put a few options on the table and see what we get...

> 2. Don't store FK events in the after trigger queue at all, but apply
> them as we go. That solves problems2 and 3. That could be considered
> to be in violation of the SQL standard, which requires us to apply the
> checks at the end of the statement. We already violate the standard
> with regard to uniqueness checks, so doing it here doesn't seem
> unacceptable.

I wouldn't like to see that compliance bug propagate to other constraint
types. What clauses in the standard demand end-of-statement timing, anyway?

What if we followed the example of deferred UNIQUE: attempt FK checks as we go
and enqueue an after-trigger recheck when such an initial test fails?
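That deferred-UNIQUE behaviour can be seen today (a small
demonstration, added here for illustration):

CREATE TABLE t (x int UNIQUE DEFERRABLE INITIALLY IMMEDIATE);
INSERT INTO t SELECT generate_series(1, 3);
-- Each row is checked immediately; a transient duplicate merely
-- enqueues an end-of-statement recheck, so this succeeds even though
-- x=2 is momentarily duplicated mid-statement.
UPDATE t SET x = x + 1;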

> Implementation: Given we know that COPY uses a ring buffer, and given
> your earlier thoughts on use of a batched SQL, I have a new
> suggestion. Every time the ring buffer fills, we go through the last
> buffers accessed, pulling out all the PKs and then issue them as a
> single SQL statement (as above). We can do that manually, or using the
> block scan mentioned previously. This uses batched SQL to solve
> problem1. It doesn't build up a large data structure in memory,
> problem2, and it also solves problem3 by accessing data blocks before
> they fall out of the ring buffer. If there are no duplicates in the
> referenced table, then this behavious will do as much as possible to
> make accesses to the referenced table also use a small working set.
> (We may even wish to consider making the batched SQL use a separate
> ring buffer for RI accesses). That approach doesn't make any
> assumptions about duplicates.

If this can be made standard-compliant, it sounds most fruitful.

> Perhaps another way would be to avoid very large COPY statements
> altogether, breaking down loads into smaller pieces.

True. It would be nice for the system to not need such hand-holding, but
that's a largely-adequate tool for coping in the field.

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-08 20:39:14
Message-ID: CA+U5nMKuTzoArtjkx4mSgdrpuPkLigyie3Q+oeBKHvc2-WWknA@mail.gmail.com
Lists: pgsql-hackers

On 8 June 2013 15:30, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Tue, Jun 04, 2013 at 02:45:17PM +0100, Simon Riggs wrote:
>> > On Sun, Jun 02, 2013 at 10:45:21AM +0100, Simon Riggs wrote:
>> >> For clarity the 4 problems are
>> >> 1. SQL execution overhead
>> >> 2. Memory usage
>> >> 3. Memory scrolling
>> >> 4. Locking overhead, specifically FPWs and WAL records from FK checks
>> >> probably in that order or thereabouts.
>
>> Lets rethink things to put a few options on the table and see what we get...
>
>> 2. Don't store FK events in the after trigger queue at all, but apply
>> them as we go. That solves problems2 and 3. That could be considered
>> to be in violation of the SQL standard, which requires us to apply the
>> checks at the end of the statement. We already violate the standard
>> with regard to uniqueness checks, so doing it here doesn't seem
>> unacceptable.
>
> I wouldn't like to see that compliance bug propagate to other constraint
> types. What clauses in the standard demand end-of-statement timing, anyway?
>
> What if we followed the example of deferred UNIQUE: attempt FK checks as we go
> and enqueue an after-trigger recheck when such an initial test fails?

The copy I have access to (2008 draft), 4.17.2 Checking of constraints, says:

"The checking of a constraint depends on its constraint mode within
the current SQL-transaction. Whenever an SQL-statement is executed,
every constraint whose mode is immediate is checked, at a certain
point after any changes to SQL-data and schemas resulting from that
execution have been effected, to see if it is satisfied. A constraint
is satisfied if and only if the applicable <search condition> included
in its descriptor evaluates to True or Unknown. If any constraint is
not satisfied, then any changes to SQL-data or schemas resulting from
executing that statement are canceled. (See the General Rules of
Subclause 13.5, “<SQL procedure statement>”.)

NOTE 31 — This includes SQL-statements that are executed as a direct
result or an indirect result of executing a different SQL-statement.
It also includes statements whose effects explicitly or implicitly
include setting the constraint mode to immediate. "

I can't see anything there that stops me applying locks as we go, but
I feel like someone will object...

This point seems crucial to the particular approach we take, so I need
wider input.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Noah Misch <noah(at)leadboat(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-08 21:41:33
Message-ID: 20130608214133.GA445736@tornado.leadboat.com
Lists: pgsql-hackers

On Sat, Jun 08, 2013 at 09:39:14PM +0100, Simon Riggs wrote:
> On 8 June 2013 15:30, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > On Tue, Jun 04, 2013 at 02:45:17PM +0100, Simon Riggs wrote:

> >> 2. Don't store FK events in the after trigger queue at all, but apply
> >> them as we go. That solves problems2 and 3. That could be considered
> >> to be in violation of the SQL standard, which requires us to apply the
> >> checks at the end of the statement. We already violate the standard
> >> with regard to uniqueness checks, so doing it here doesn't seem
> >> unacceptable.
> >
> > I wouldn't like to see that compliance bug propagate to other constraint
> > types. What clauses in the standard demand end-of-statement timing, anyway?
> >
> > What if we followed the example of deferred UNIQUE: attempt FK checks as we go
> > and enqueue an after-trigger recheck when such an initial test fails?
>
> The copy I have access to (2008 draft), 4.17.2 Checking of constraints
>
> "The checking of a constraint depends on its constraint mode within
> the current SQL-transaction. Whenever an SQL-statement is executed,
> every constraint whose mode is immediate is checked, at a certain
> point after any changes to SQL-data and schemas resulting from that
> execution have been effected, to see if it is satisfied. A constraint
> is satisfied if and only if the applicable <search condition> included
> in its descriptor evaluates to True or Unknown. If any constraint is
> not satisfied, then any changes to SQL-data or schemas resulting from
> executing that statement are canceled. (See the General Rules of
> Subclause 13.5, “<SQL procedure statement>”.
>
> NOTE 31 — This includes SQL-statements that are executed as a direct
> result or an indirect result of executing a different SQL-statement.
> It also includes statements whose effects explicitly or implicitly
> include setting the constraint mode to immediate. "

This does appear to specify FK timing semantics like PostgreSQL gives today.
Namely, it does not permit a FK-induced error when later actions of the query
that prompted the check could possibly remedy the violation.

> I can't see anything there that stops me applying locks as we go, but

Likewise; I don't see why we couldn't perform an optimistic check ASAP and
schedule a final after-statement check when an early check fails. That
changes performance characteristics without changing semantics.

> I feel like someone will object...
>
> This point seems crucial to the particular approach we take, so I need
> wider input.

Agreed.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-09 00:20:42
Message-ID: CA+TgmobwWxrnP2-AfLv9pXsRr0ye=mWdT6cw-a_6rphORFvKZw@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jun 8, 2013 at 5:41 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> This does appear to specify FK timing semantics like PostgreSQL gives today.
> Namely, it does not permit a FK-induced error when later actions of the query
> that prompted the check could possibly remedy the violation.

Yeah. Standard or no standard, I think we'd have unhappy users if we
broke that.

>> I can't see anything there that stops me applying locks as we go, but

Not sure I follow that bit but...

> Likewise; I don't see why we couldn't perform an optimistic check ASAP and
> schedule a final after-statement check when an early check fails. That
> changes performance characteristics without changing semantics.

...this seems like it might have some promise; but what if the action
we're performing isn't idempotent? And how do we know?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-09 01:12:59
Message-ID: 20130609011259.GB445736@tornado.leadboat.com
Lists: pgsql-hackers

On Sat, Jun 08, 2013 at 08:20:42PM -0400, Robert Haas wrote:
> On Sat, Jun 8, 2013 at 5:41 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > Likewise; I don't see why we couldn't perform an optimistic check ASAP and
> > schedule a final after-statement check when an early check fails. That
> > changes performance characteristics without changing semantics.
>
> ...this seems like it might have some promise; but what if the action
> we're performing isn't idempotent? And how do we know?

The action discussed so far is RI_FKey_check_ins(). It acquires a KEY SHARE
lock (idempotent by nature) on a row that it finds using B-tree equality
(presumed IMMUTABLE, thus idempotent). RI_FKey_check_upd() is nearly the same
action, so the same argument holds. Before treating any other operation in
the same way, one would need to conduct similar analysis.
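Concretely, with the benchmark schema from upthread, repeating the RI
lock query changes nothing the second time:

BEGIN;
SELECT 1 FROM ONLY "order" x WHERE orderid = 1 FOR KEY SHARE OF x;
SELECT 1 FROM ONLY "order" x WHERE orderid = 1 FOR KEY SHARE OF x;  -- same lock, no new effect
COMMIT;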

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-09 09:51:43
Message-ID: CA+U5nMKLOauuC8jXzQUH=oHgyAGEU76s7j4NoBLhv0nK5AwOZA@mail.gmail.com
Lists: pgsql-hackers

On 9 June 2013 02:12, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Sat, Jun 08, 2013 at 08:20:42PM -0400, Robert Haas wrote:
>> On Sat, Jun 8, 2013 at 5:41 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> > Likewise; I don't see why we couldn't perform an optimistic check ASAP and
>> > schedule a final after-statement check when an early check fails. That
>> > changes performance characteristics without changing semantics.
>>
>> ...this seems like it might have some promise; but what if the action
>> we're performing isn't idempotent? And how do we know?
>
> The action discussed so far is RI_FKey_check_ins(). It acquires a KEY SHARE
> lock (idempotent by nature) on a row that it finds using B-tree equality
> (presumed IMMUTABLE, thus idempotent). RI_FKey_check_upd() is nearly the same
> action, so the same argument holds. Before treating any other operation in
> the same way, one would need to conduct similar analysis.

As long as we are talking about FKs only, then this approach can work.
All we are doing is applying the locks slightly earlier than before.
Once locked they will prevent any later violations, so we are safe
from anybody except *ourselves* making changes that would invalidate
the earlier check. Trouble is, I can see various ways that could
happen, so making a check early doesn't allow us to avoid making the
check later as well.

AFAICS there are weird cases where changing the way FKs execute will
change the way complex trigger applications will execute. I don't see
a way to avoid that other than "do nothing". Currently, we execute the
checks following the normal order of execution rules for triggers.
Every idea we've had so far changes that in some way; variously in
major or minor ways, but changed nonetheless.

Even the approach of deferring checks to allow them to be applied in a
batch means we might change the way applications execute in detail.
However, since the only possible change there would be to decrease the
number of self-induced failures that seems OK.

So the question is how much change do we want to introduce? I'll guess
"not much", rather than "lots" or "none".

Proposal: Have a WHEN clause that accumulates values to be checked in
a hash table up to work_mem in size, allowing us to eliminate the most
common duplicates (just not *all* duplicates). If the value isn't a
duplicate (or at least the first seen tuple with that value), we will
queue up a check for later. That approach gives us *exactly* what we
have now and works with the two common cases: i) few, mostly
duplicated values, ii) many values, but clustered together. Then apply
changes in batches at end of statement.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-09 12:47:55
Message-ID: 20130609124755.GA24178@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-06-01 09:41:13 +0100, Simon Riggs wrote:
> FK checks can be expensive, especially when loading large volumes of
> data into an existing table or partition. A couple of ideas for
> improving performance are discussed here:

Another idea would be to optimize away the row level locks if we have a
table level lock that already provides more protection than the
asked-for row level lock. In bulk data loading scenarios that's not
uncommon, and it can matter quite a bit, especially if loading in
parallel, since both the WAL traffic and the dirtying of pages are
considerably reduced.

Should be implementable relatively easily in heap_lock_tuple without
being visible to the outside.
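For illustration, the scenario in question (file path hypothetical):

BEGIN;
LOCK TABLE pgbench_branches IN SHARE MODE;  -- blocks all writers to the PK table
COPY pgbench_accounts FROM '/path/to/accounts.dat';  -- per-row KEY SHARE locks now add no protection
COMMIT;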

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Greg Stark <stark(at)mit(dot)edu>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-09 13:59:21
Message-ID: CAM-w4HNNoUp0PR10bzmj1=pbUvAioGqwJ3QTztT6wkvt2iLjKA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 9, 2013 at 10:51 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> AFAICS there are weird cases where changing the way FKs execute will
> change the way complex trigger applications will execute. I don't see
> a way to avoid that other than "do nothing". Currently, we execute the
> checks following the normal order of execution rules for triggers.
> Every idea we've had so far changes that in some way; variously in
> major or minor ways, but changed nonetheless.

The obvious case to handle would be if someone has a trigger to
automatically create any missing references.

--
greg


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-09 17:13:15
Message-ID: CA+U5nMJSnKghxg_pt6e3Txn0a+3nc9j0wt0XQPoHvmiUYUSFuw@mail.gmail.com
Lists: pgsql-hackers

On 9 June 2013 14:59, Greg Stark <stark(at)mit(dot)edu> wrote:
> On Sun, Jun 9, 2013 at 10:51 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> AFAICS there are weird cases where changing the way FKs execute will
>> change the way complex trigger applications will execute. I don't see
>> a way to avoid that other than "do nothing". Currently, we execute the
>> checks following the normal order of execution rules for triggers.
>> Every idea we've had so far changes that in some way; variously in
>> major or minor ways, but changed nonetheless.
>
> The obvious case to handle would be if someone has a trigger to
> automatically create any missing references.

Exactly my thoughts.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Noah Misch <noah(at)leadboat(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-10 06:06:05
Message-ID: 20130610060605.GD491289@tornado.leadboat.com
Lists: pgsql-hackers

On Sun, Jun 09, 2013 at 10:51:43AM +0100, Simon Riggs wrote:
> On 9 June 2013 02:12, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > On Sat, Jun 08, 2013 at 08:20:42PM -0400, Robert Haas wrote:
> >> On Sat, Jun 8, 2013 at 5:41 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> >> > Likewise; I don't see why we couldn't perform an optimistic check ASAP and
> >> > schedule a final after-statement check when an early check fails. That
> >> > changes performance characteristics without changing semantics.
> >>
> >> ...this seems like it might have some promise; but what if the action
> >> we're performing isn't idempotent? And how do we know?
> >
> > The action discussed so far is RI_FKey_check_ins(). It acquires a KEY SHARE
> > lock (idempotent by nature) on a row that it finds using B-tree equality
> > (presumed IMMUTABLE, thus idempotent). RI_FKey_check_upd() is nearly the same
> > action, so the same argument holds. Before treating any other operation in
> > the same way, one would need to conduct similar analysis.
>
> As long as we are talking about FKs only, then this approach can work.
> All we are doing is applying the locks slightly earlier than before.
> Once locked they will prevent any later violations, so we are safe
> from anybody except *ourselves* from making changes that would
> invalidate the earlier check. Trouble is, there are various ways I
> can see that as possible, so making a check early doesn't allow you to
> avoid making the check later as well.

This UPDATE or DELETE that invalidates the check by modifying the PK row will
fire the usual RI_FKey_*_{upd,del} trigger on the PK table. That will (a)
fail the transaction, (b) CASCADE to delete the new FK row, or (c) update the
new FK row's key column to NULL/DEFAULT. If (a) happens we're of course fine.
If (b) or (c) happens, the FK's AFTER check already today becomes a no-op due
to the visibility test in RI_FKey_check(). Consequently, I don't think later
actions of the SQL statement can put us in a position to need a second check.
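Case (b), concretely (a sketch assuming the FK were declared ON DELETE
CASCADE):

BEGIN;
INSERT INTO "order" VALUES (1);
INSERT INTO order_line VALUES (1, 1);   -- suppose an early optimistic check ran here
DELETE FROM "order" WHERE orderid = 1;  -- cascade deletes the new order_line row, so any
                                        -- queued AFTER check sees no live row and is a no-op
COMMIT;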

> AFAICS there are weird cases where changing the way FKs execute will
> change the way complex trigger applications will execute. I don't see
> a way to avoid that other than "do nothing". Currently, we execute the
> checks following the normal order of execution rules for triggers.
> Every idea we've had so far changes that in some way; variously in
> major or minor ways, but changed nonetheless.

I've tried to envision a trigger-creates-missing-references scenario that
would notice the difference. The trigger in question would, I'm presuming, be
an AFTER ROW INSERT trigger named such that it fires before the FK trigger.
The initial optimistic FK check would fail, so we would queue a traditional FK
AFTER trigger. From that point, the scenario proceeds exactly as it proceeds
today. Could you detail a problem scenario you have in mind?

> Even the approach of deferring checks to allow them to be applied in a
> batch mean we might change the way applications execute in detail.
> However, since the only possible change there would be to decrease the
> number of self-induced failures that seems OK.
>
> So the question is how much change do we want to introduce? I'll guess
> "not much", rather than "lots" or "none".

The batch would need to fire at the trigger firing position of the *last*
queue entry it covers. If you run a final FK check earlier than that, other
AFTER triggers that expect to run before the FK check and affect its outcome
may not yet have run. In contrast, an FK check running later than usual is
mostly fine; whether a tuple has been locked does not affect the semantics of
later ordinary actions in the transaction. (I say "ordinary" to exclude a
function like pgrowlocks() that makes a point to discern. Also note that the
reasoning about timing only applies to definitive FK checks that can throw
errors; a tentative, soft-failure check is acceptable anytime after the new
row is in place.)

One can, however, at least construct problem cases. When a query calls
functions that perform non-transactional actions, changing the timing of an
ERROR with respect to those calls changes application behavior. Taking locks
in a different order affects the incidence of deadlocks. Does compatibility
to that degree have much value? I'd be happy to file those in the "not much
change" category.

> Proposal: Have a WHEN clause that accumulates values to be checked in
> a hash table up to work_mem in size, allowing us to eliminate the most
> common duplicates (just not *all* duplicates). If the value isn't a
> duplicate (or at least the first seen tuple with that value), we will
> queue up a check for later. That approach gives us *exactly* what we
> have now and works with the two common cases: i) few, mostly
> duplicated values, ii) many values, but clustered together. Then apply
> changes in batches at end of statement.

I'm still fine with this proposal, but it does not dramatically sidestep these
sorts of tricky situations. Suppose a COPY inserts rows with fkcol=1 at TIDs
(0,3), (0,5) and (0,6). You queue the (0,3) check and omit the rest. This
works great so long as (0,3) still satisfies SnapshotSelf by the time the
check actually happens. If (0,3) is dead to SnapshotSelf, the check should
instead happen at the timing of (0,5). If that's dead, at the timing of
(0,6). If that too is dead, the check must not happen at all. By the time
you store enough logistics data to rigorously mirror current behavior, you've
recreated today's trigger queue.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-10 08:05:40
Message-ID: CA+U5nMJE4t8uADDwyP72nS0iE1jh8AjfQte4L9-34nK+GA8CUg@mail.gmail.com
Lists: pgsql-hackers

On 10 June 2013 07:06, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Sun, Jun 09, 2013 at 10:51:43AM +0100, Simon Riggs wrote:
>> On 9 June 2013 02:12, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> > On Sat, Jun 08, 2013 at 08:20:42PM -0400, Robert Haas wrote:
>> >> On Sat, Jun 8, 2013 at 5:41 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> >> > Likewise; I don't see why we couldn't perform an optimistic check ASAP and
>> >> > schedule a final after-statement check when an early check fails. That
>> >> > changes performance characteristics without changing semantics.
>> >>
>> >> ...this seems like it might have some promise; but what if the action
>> >> we're performing isn't idempotent? And how do we know?
>> >
>> > The action discussed so far is RI_FKey_check_ins(). It acquires a KEY SHARE
>> > lock (idempotent by nature) on a row that it finds using B-tree equality
>> > (presumed IMMUTABLE, thus idempotent). RI_FKey_check_upd() is nearly the same
>> > action, so the same argument holds. Before treating any other operation in
>> > the same way, one would need to conduct similar analysis.
>>
>> As long as we are talking about FKs only, then this approach can work.
>> All we are doing is applying the locks slightly earlier than before.
>> Once locked they will prevent any later violations, so we are safe
>> from anybody except *ourselves* from making changes that would
>> invalidate the earlier check. Trouble is, there are various ways I
>> can see that as possible, so making a check early doesn't allow you to
>> avoid making the check later as well.
>
> This UPDATE or DELETE that invalidates the check by modifying the PK row will
> fire the usual RI_FKey_*_{upd,del} trigger on the PK table. That will (a)
> fail the transaction, (b) CASCADE to delete the new FK row, or (c) update the
> new FK row's key column to NULL/DEFAULT. If (a) happens we're of course fine.
> If (b) or (c) happens, the FK's AFTER check already today becomes a no-op due
> to the visibility test in RI_FKey_check(). Consequently, I don't think later
> actions of the SQL statement can put us in a position to need a second check.
>
>> AFAICS there are weird cases where changing the way FKs execute will
>> change the way complex trigger applications will execute. I don't see
>> a way to avoid that other than "do nothing". Currently, we execute the
>> checks following the normal order of execution rules for triggers.
>> Every idea we've had so far changes that in some way; variously in
>> major or minor ways, but changed nonetheless.
>
> I've tried to envision a trigger-creates-missing-references scenario that
> would notice the difference. The trigger in question would, I'm presuming, be
> an AFTER ROW INSERT trigger named such that it fires before the FK trigger.
> The initial optimistic FK check would fail, so we would queue a traditional FK
> AFTER trigger. From that point, the scenario proceeds exactly as it proceeds
> today. Could you detail a problem scenario you have in mind?
>
>> Even the approach of deferring checks to allow them to be applied in a
>> batch mean we might change the way applications execute in detail.
>> However, since the only possible change there would be to decrease the
>> number of self-induced failures that seems OK.
>>
>> So the question is how much change do we want to introduce? I'll guess
>> "not much", rather than "lots" or "none".
>
> The batch would need to fire at the trigger firing position of the *last*
> queue entry it covers. If you run a final FK check earlier than that, other
> AFTER triggers that expect to run before the FK check and affect its outcome
> may not yet have run. In contrast, an FK check running later than usual is
> mostly fine; whether a tuple has been locked does not affect the semantics of
> later ordinary actions in the transaction. (I say "ordinary" to exclude a
> function like pgrowlocks() that makes a point to discern. Also note that the
> reasoning about timing only applies to definitive FK checks that can throw
> errors; a tentative, soft-failure check is acceptable anytime after the new
> row is in place.)
>
> One can, however, at least construct problem cases. When a query calls
> functions that perform non-transactional actions, changing the timing of an
> ERROR with respect to those calls changes application behavior. Taking locks
> in a different order affects the incidence of deadlocks. Does compatibility
> to that degree have much value? I'd be happy to file those in the "not much
> change" category.
>
>> Proposal: Have a WHEN clause that accumulates values to be checked in
>> a hash table up to work_mem in size, allowing us to eliminate the most
>> common duplicates (just not *all* duplicates). If the value isn't a
>> duplicate (or at least the first seen tuple with that value), we will
>> queue up a check for later. That approach gives us *exactly* what we
>> have now and works with the two common cases: i) few, mostly
>> duplicated values, ii) many values, but clustered together. Then apply
>> changes in batches at end of statement.
>
> I'm still fine with this proposal, but it does not dramatically sidestep these
> sorts of tricky situations. Suppose a COPY inserts rows with fkcol=1 at TIDs
> (0,3), (0,5) and (0,6). You queue the (0,3) check and omit the rest. This
> works great so long as (0,3) still satisfies SnapshotSelf by the time the
> check actually happens. If (0,3) is dead to SnapshotSelf, the check should
> instead happen at the timing of (0,5). If that's dead, at the timing of
> (0,6). If that too is dead, the check must not happen at all. By the time
> you store enough logistics data to rigorously mirror current behavior, you've
> recreated today's trigger queue.

Your earlier comments argue that it is OK to make an early check. The
above seems to argue the opposite; I'm not sure.

IIUYC we can do this:

* search hash table for a value, if found, skip check and continue
* if entry in hash not found make an immediate FK check
* if the check passes, store value in hash table, if it fits
* if check does not pass or value doesn't fit, queue up an after
trigger queue entry

except we want to batch things a little, so same algo just with a
little batching.

* search hash table for a value, if found, skip check and continue
* if entry in hash not found add to next batch of checks and continue
* when batch full make immediate FK checks for whole batch in one SQL stmt
* if a check passes, store value in hash table, if it fits
* if check does not pass or value doesn't fit, queue up an after
trigger queue entry
* when executing queue, use batches to reduce number of SQL stmts

Which hopefully is the amalgam of all the good ideas so far.
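As SQL, the batched immediate check might look like this (a sketch;
batch values shown inline):

SELECT pk.orderid
FROM ONLY "order" pk
WHERE pk.orderid = ANY (ARRAY[1, 7, 42])  -- current batch of not-yet-seen values
FOR KEY SHARE;
-- values returned: enter them into the hash table (check passed)
-- values missing from the result: queue an after trigger entry instead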

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Noah Misch <noah(at)leadboat(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-11 04:09:34
Message-ID: 20130611040934.GA570619@tornado.leadboat.com
Lists: pgsql-hackers

On Mon, Jun 10, 2013 at 09:05:40AM +0100, Simon Riggs wrote:
> Your earlier comments argue that it is OK to make an early check. The
> above seems to argue the opposite, not sure.

I'll attempt to summarize. If we execute a traditional error-throwing FK
check any earlier than we execute it today, applications with certain triggers
will notice a behavior change (probably not OK). However, we *can* safely
execute an optimistic FK check as early as just after ExecInsertIndexTuples().
If the optimistic check is successful, later activity cannot invalidate its
success as concerns that particular inserted tuple.

> IIUYC we can do this:
>
> * search hash table for a value, if found, skip check and continue
> * if entry in hash not found make an immediate FK check
> * if the check passes, store value in hash table, if it fits
> * if check does not pass or value doesn't fit, queue up an after
> trigger queue entry

Why should doesn't-fit prompt an after-statement recheck? At that point
the immediate check has already passed.

You do need a mechanism to invalidate table entries or the entire table. As a
first cut at that, perhaps have parent table RI triggers empty any local hash
tables of the same FK relationship. Note that invalidating table entries does
not invalidate skips already done on the strength of those entries.

> except we want to batch things a little, so same algo just with a
> little batching.
>
> * search hash table for a value, if found, skip check and continue
> * if entry in hash not found add to next batch of checks and continue
> * when batch full make immediate FK checks for whole batch in one SQL stmt
> * if a check passes, store value in hash table, if it fits
> * if check does not pass or value doesn't fit, queue up an after
> trigger queue entry
> * when executing queue, use batches to reduce number of SQL stmts

I think this all can be made to work, too.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com