Partitioning performance: cache stringToNode() of pg_constraint.ccbin

From: Noah Misch <noah(at)leadboat(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Partitioning performance: cache stringToNode() of pg_constraint.ccbin
Date: 2013-06-03 19:07:27
Message-ID: 20130603190727.GA360354@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

A colleague, Korry Douglas, observed a table partitioning scenario where
deserializing pg_constraint.ccbin is a hot spot. The following test case, a
simplification of a typical partitioning setup, spends 28% of its time in
stringToNode() and callees thereof:

\timing on
\set n 600000

BEGIN;

CREATE TABLE bench_check_constr_parent (c int);
CREATE TABLE bench_check_constr_child (
CHECK (c > 0 AND c <= 100000000)
) INHERITS (bench_check_constr_parent);
CREATE FUNCTION trig() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
INSERT INTO bench_check_constr_child VALUES (NEW.*);
RETURN NULL;
END
$$;
CREATE TRIGGER redir BEFORE INSERT ON bench_check_constr_parent
FOR EACH ROW EXECUTE PROCEDURE trig();

-- Main benchmark
INSERT INTO bench_check_constr_parent SELECT * FROM generate_series(1, :n);
TRUNCATE bench_check_constr_parent;
INSERT INTO bench_check_constr_parent SELECT * FROM generate_series(1, :n);
TRUNCATE bench_check_constr_parent;
INSERT INTO bench_check_constr_parent SELECT * FROM generate_series(1, :n);
TRUNCATE bench_check_constr_parent;

-- Compare direct insert performance @ 10x volume
INSERT INTO bench_check_constr_child SELECT * FROM generate_series(1, :n * 10);
TRUNCATE bench_check_constr_parent;
INSERT INTO bench_check_constr_child SELECT * FROM generate_series(1, :n * 10);
TRUNCATE bench_check_constr_parent;
INSERT INTO bench_check_constr_child SELECT * FROM generate_series(1, :n * 10);
TRUNCATE bench_check_constr_parent;

ROLLBACK;

The executor caches each CHECK constraint in ResultRelInfo as a planned
expression. That cache is highly effectively for long-running statements, but
the trivial INSERTs effectively work without a cache. Korry devised this
patch to cache the stringToNode() form of the constraint in the relcache. It
improves the benchmark's partitioned scenario by 33%:

-- Timings (seconds) --
master, INSERT parent: 14.2, 14.4, 14.4
patched, INSERT parent: 9.6, 9.7, 9.7

master, INSERT*10 child: 9.9, 9.9, 10.2
patched, INSERT*10 child: 10.0, 10.2, 10.2

There's still not much to like about that tenfold overhead from use of the
partition routing trigger, but this patch makes a nice cut into that overhead
without doing anything aggressive. The profile no longer shows low-hanging
fruit; running an entire SQL statement per row piles on the runtime from a
wide range of sources. For anyone curious, I've attached output from "perf
report -s parent -g graph,1,caller" with the patch applied; I suggest browsing
under "less -S".

Some call sites need to modify the node tree, so the patch has them do
copyObject(). I ran a microbenchmark of copyObject() on the cached node tree
vs. redoing stringToNode(), and copyObject() still won by a factor of four.

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
checkconstr-cache-v1.patch text/plain 7.5 KB
partition-trigger-cost.txt text/plain 56.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2013-06-03 19:48:26 Re: Vacuum, Freeze and Analyze: the big picture
Previous Message Simon Riggs 2013-06-03 19:00:19 Re: Optimising Foreign Key checks