domain type smashing is expensive

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
Subject: domain type smashing is expensive
Date: 2017-09-12 17:17:58
Message-ID: CA+Tgmobj72E_tG6w98H0oUbCCUmoC4uRmjocYPbnWC2RxYACeg@mail.gmail.com
Lists: pgsql-hackers

On short-running queries that return a lot of columns,
SendRowDescriptionMessage's calls to getBaseTypeAndTypmod() are a
noticeable expense. The following change improves performance on a
query returning 100 columns by about 6% when prepared queries are in
use (Mithun Cy and I both tested and got similar results; his testing
was more rigorous than mine).

--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2295,6 +2295,9 @@ getBaseTypeAndTypmod(Oid typid, int32 *typmod)
         HeapTuple   tup;
         Form_pg_type typTup;
 
+        if (typid < FirstBootstrapObjectId)
+            break;
+
         tup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
         if (!HeapTupleIsValid(tup))
             elog(ERROR, "cache lookup failed for type %u", typid);

I cannot claim to love that hack, but the performance improvement is
nice. There's no real problem with the hack -- it just embeds an
assumption that pg_type.h will never define a domain type, which
doesn't seem like a terribly problematic assumption, and we could
always add a comment someplace to document it, or even teach the
script that processes the catalog headers to enforce it. But it's
not, like, the most elegant thing anybody's ever done.
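
As a rough illustration of what the fast path buys, here is a minimal,
self-contained sketch of the same loop against a toy in-memory catalog.
Everything except the shape of the loop is an assumption made for the
example: TypeRow, toy_catalog, lookup_type() and
toy_get_base_type_and_typmod() are hypothetical stand-ins, the OIDs and
typmod in the table are arbitrary, and the value given to
FirstBootstrapObjectId is assumed rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Assumed value for illustration; the real constant comes from the
 * PostgreSQL headers. */
#define FirstBootstrapObjectId 10000

/* Hypothetical stand-in for a pg_type row. */
typedef struct TypeRow
{
    Oid     oid;
    char    typtype;        /* 'b' = base type, 'd' = domain */
    Oid     typbasetype;    /* only meaningful for domains */
    int32_t typtypmod;      /* only meaningful for domains */
} TypeRow;

/* Toy catalog: two low-OID base types and two high-OID domains. */
static const TypeRow toy_catalog[] = {
    {23,    'b', 0,     -1},
    {1043,  'b', 0,     -1},
    {16384, 'd', 1043,  36},    /* domain over a base type */
    {16385, 'd', 16384, -1},    /* domain over another domain */
};

/* Counts how often the (expensive, in real life) lookup is performed. */
static int catalog_lookups = 0;

static const TypeRow *
lookup_type(Oid typid)
{
    size_t  i;

    catalog_lookups++;
    for (i = 0; i < sizeof(toy_catalog) / sizeof(toy_catalog[0]); i++)
    {
        if (toy_catalog[i].oid == typid)
            return &toy_catalog[i];
    }
    return NULL;
}

/*
 * Resolve a possibly-domain type to its base type, skipping the lookup
 * for any OID below FirstBootstrapObjectId on the assumption that no
 * such type is a domain.  Unlike the real function, an unknown OID is
 * returned unchanged instead of raising an error.
 */
static Oid
toy_get_base_type_and_typmod(Oid typid, int32_t *typmod)
{
    assert(typmod != NULL);

    for (;;)
    {
        const TypeRow *row;

        if (typid < FirstBootstrapObjectId)
            break;              /* fast path: no lookup at all */

        row = lookup_type(typid);
        if (row == NULL || row->typtype != 'd')
            break;              /* not a domain, so done */

        typid = row->typbasetype;
        *typmod = row->typtypmod;
    }
    return typid;
}
```

In this toy, resolving a low-numbered type performs no lookups, and
resolving a domain stacked on another domain stops as soon as the chain
reaches a low-numbered OID, saving the final lookup that would only
confirm the type is not a domain.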

I see two other options:

1. Revisit the decision to smash domain types to base types here.
That change was made by Tom Lane back in 2003
(d9b679c13a820eb7b464a1eeb1f177c3fea13ece) but the commit message only
says *that* we decided to do it, not *why* we decided to do it, and
the one-line comment added by that commit doesn't do any better.

2. Precompute the list of types to be sent to the client during
planning instead of during execution. The point of prepared
statements is supposed to be to do as much of the work as possible at
prepare time so that bind/execute is as fast as possible, but we're
not really adhering to that design philosophy here. However, I don't
have a clear idea of exactly how to do that.
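
To make the idea behind option 2 concrete, here is a toy sketch of
resolving the wire types once at prepare time and reusing them on every
execute. It is only an illustration of the principle, not a proposal
for how to do it inside PostgreSQL: ToyPreparedStmt, toy_prepare(),
toy_describe_row(), toy_free() and resolve_base_type() are all
hypothetical names, the resolver is a stub that maps any OID at or
above 16384 to 25, and the sketch ignores the question of invalidating
the cached types when the catalog changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;

/* Counts calls to the (expensive, in real life) resolver. */
static int resolve_calls = 0;

/* Hypothetical stub standing in for domain-to-base-type resolution. */
static Oid
resolve_base_type(Oid typid)
{
    resolve_calls++;
    return (typid >= 16384) ? 25 : typid;
}

/* Hypothetical prepared statement that caches the resolved types. */
typedef struct ToyPreparedStmt
{
    int     ncols;
    Oid    *wire_types;     /* base types to report, resolved once */
} ToyPreparedStmt;

/* Prepare: pay the resolution cost once per column. */
static ToyPreparedStmt *
toy_prepare(const Oid *coltypes, int ncols)
{
    ToyPreparedStmt *stmt;
    int     i;

    assert(coltypes != NULL && ncols > 0);

    stmt = malloc(sizeof(*stmt));
    if (stmt == NULL)
        return NULL;
    stmt->wire_types = malloc(sizeof(Oid) * (size_t) ncols);
    if (stmt->wire_types == NULL)
    {
        free(stmt);
        return NULL;
    }
    stmt->ncols = ncols;
    for (i = 0; i < ncols; i++)
        stmt->wire_types[i] = resolve_base_type(coltypes[i]);
    return stmt;
}

/* Execute: copy the cached types; no resolution happens here. */
static void
toy_describe_row(const ToyPreparedStmt *stmt, Oid *out)
{
    int     i;

    for (i = 0; i < stmt->ncols; i++)
        out[i] = stmt->wire_types[i];
}

static void
toy_free(ToyPreparedStmt *stmt)
{
    free(stmt->wire_types);
    free(stmt);
}
```

The resolver runs once per column in toy_prepare() and never again, no
matter how many times toy_describe_row() is called; the hard part that
the sketch leaves out is deciding where such a cache would live and
when it must be rebuilt.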

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
