Improve catcache/syscache performance.

Lists: pgsql-hackers
From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Improve catcache/syscache performance.
Date: 2017-09-14 06:12:07
Message-ID: 20170914061207.zxotvyopetm7lrrp@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

There's plenty workloads where SearchSysCache()/SearchCatCache() shows
up as a major runtime factor. Primarily in workloads with very fast
queries.

A fair while ago, before I had my commit bit, I'd posted [1]. Looking at
the profiles/benchmarks I was easily able to confirm that it still
helps, but that there's also still a lot left on the table.

Attached is a patch that tries to improve sys/catcache performance,
going further than the patch referenced earlier.

This primarily includes four pieces:

1) Avoidance of FunctionCallInfo based function calls, replaced by
more efficient functions with a native C argument interface.
2) Only initializing the ScanKey when necessary, i.e. catcache misses,
reduces cache unnecessary cpu cache misses.
3) Allowing the compiler to specialize critical SearchCatCache for a
specific number of attributes allows to unroll loops and avoid
other nkeys dependant initialization.
4) Split of the heap lookup from the hash lookup, reducing stack
allocations etc in the common case.

There's further potential:
- replace open coded hash with simplehash - the list walk right now
shows up in profiles.
- As oid is the only system column supported, avoid the use of
heap_getsysattr(), by adding an explicit branch for
ObjectIdAttributeNumber. This shows up in profiles.
- move cache initialization out of the search path
- add more proper functions, rather than macros for
SearchSysCacheCopyN etc., but right now they don't show up in profiles.

The reason the macro wrapper for syscache.c/h have to be changed,
rather than just catcache, is that doing otherwise would require
exposing the SysCache array to the outside. That might be a good idea
anyway, but it's for another day.

This patch gives me roughly 8% speedup in a workload that consists out
of a fast query that returns a lot of columns. If I apply a few
other performance patches, this patch itself starts to make a bigger
difference, of around 11%.

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/20130905191323.GC490889@alap2.anarazel.de

Attachment Content-Type Size
0004-Add-inline-murmurhash32-int32-function.patch text/x-diff 2.3 KB
0006-Add-pg_noinline-macro-to-c.h.patch text/x-diff 1.4 KB
0007-Improve-sys-catcache-performance.patch text/x-diff 26.1 KB

From: amul sul <sulamul(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve catcache/syscache performance.
Date: 2017-09-20 12:56:50
Message-ID: CAAJ_b97qe-djtnrMb6O-K0Q7ebtpGSWEXxLZ-2vSvby_K44CxQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Patch 0007:

1:
400 + /*
401 + * XXX: might be worthwhile to only handle oid sysattr, to
reduce
402 + * overhead - it's the most common key.
403 + */

IMHO, let fix that as well. I tested this by fixing (see the attach patch)
but does
not found much gain on my local centos vm (of course, the pgbench load
wasn't big enough).

2: How about have wrapping following condition in SearchCatCacheMiss() by
unlikely():

if (IsBootstrapProcessingMode())
return NULL;

3: Can we have following assert in SearchCatCacheN() instead
SearchSysCacheN(), so that we'll assert direct SearchCatCacheN() call as
well?

Assert(SysCache[cacheId]->cc_nkeys == <N>);

Other than these concern, patch looks pretty reasonable to me.

Regards,
Amul

Attachment Content-Type Size
0007_ex_handle_oid.patch application/octet-stream 969 bytes

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improve catcache/syscache performance.
Date: 2017-09-22 06:15:45
Message-ID: 20170922061545.wkbzt7b6p6x47bzg@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-09-13 23:12:07 -0700, Andres Freund wrote:
> Attached is a patch that tries to improve sys/catcache performance,
> going further than the patch referenced earlier.

Here's a variant that cleans up the previous changes a bit, and adds
some further improvements:

Here's the main commit message:

Improve sys/catcache performance.

The following are the individual improvements:
1) Avoidance of FunctionCallInfo based function calls, replaced by
more efficient functions with a native C argument interface.
2) Don't extract columns from a cache entry's tuple whenever matching
entries - instead store them as a Datum array. This also allows to
get rid of having to build dummy tuples for negative & list
entries, and of a hack for dealing with cstring vs. text weirdness.
3) Reorder members of catcache.h struct, so imortant entries are more
likely to be on one cacheline.
4) Allowing the compiler to specialize critical SearchCatCache for a
specific number of attributes allows to unroll loops and avoid
other nkeys dependant initialization.
5) Only initializing the ScanKey when necessary, i.e. catcache misses,
greatly reduces cache unnecessary cpu cache misses.
6) Split of the cache-miss case from the hash lookup, reducing stack
allocations etc in the common case.
7) CatCTup and their corresponding heaptuple are allocated in one
piece.

This results in making cache lookups themselves roughly three times as
fast - full-system benchmarks obviously improve less than that.

I've also evaluated further techniques:
- replace open coded hash with simplehash - the list walk right now
shows up in profiles. Unfortunately it's not easy to do so safely as
an entry's memory location can change at various times, which
doesn't work well with the refcounting and cache invalidation.
- Cacheline-aligning CatCTup entries - helps some with performance,
but the win isn't big and the code for it is ugly, because the
tuples have to be freed as well.
- add more proper functions, rather than macros for
SearchSysCacheCopyN etc., but right now they don't show up in
profiles.

The reason the macro wrapper for syscache.c/h have to be changed,
rather than just catcache, is that doing otherwise would require
exposing the SysCache array to the outside. That might be a good idea
anyway, but it's for another day.

With the attached benchmark for wide tuples and simple queries I get:

pgbench -M prepared -f ~/tmp/pgbench-many-cols.sql

master:
tps = 16112.117859 (excluding connections establishing)
tps = 16192.186504 (excluding connections establishing)
tps = 16091.257399 (excluding connections establishing)

patch:
tps = 18616.116993 (excluding connections establishing)
tps = 18584.036276 (excluding connections establishing)
tps = 18843.246281 (excluding connections establishing)

~17% gain

pgbench -M prepared -f ~/tmp/pgbench-many-cols.sql -c -j 16:
master:
tps = 73277.282455 (excluding connections establishing)
tps = 73078.408303 (excluding connections establishing)
tps = 73432.476550 (excluding connections establishing)

patch:
tps = 89424.043728 (excluding connections establishing)
tps = 89223.731307 (excluding connections establishing)
tps = 87830.665009 (excluding connections establishing)

~21% gain

standard pgbench readonly:
1 client:
master:
tps = 41662.984894 (excluding connections establishing)
tps = 40965.435121 (excluding connections establishing)
tps = 41438.197117 (excluding connections establishing)

patch:
tps = 42657.455818 (excluding connections establishing)
tps = 42834.812173 (excluding connections establishing)
tps = 42784.306987 (excluding connections establishing)

So roughly ~2.3%, much smaller, as expected, because the syscache is
much less of a bottleneck here.

-cj 16:
master:
tps = 204642.558752 (excluding connections establishing)
tps = 205834.493312 (excluding connections establishing)
tps = 207781.943687 (excluding connections establishing)

dev:
tps = 211459.087649 (excluding connections establishing)
tps = 214890.093976 (excluding connections establishing)
tps = 214526.773530 (excluding connections establishing)

So ~3.3%.

I personally find these numbers quite convincing for a fairly localized
microoptimization.

For the attached benchmark, here's the difference in profiles:
before:
single function overhead:
+ 8.10% postgres postgres [.] SearchCatCache
- 7.26% postgres libc-2.24.so [.] __memmove_avx_unaligned_erms
- __memmove_avx_unaligned_erms
+ 59.29% SearchCatCache
+ 23.51% appendBinaryStringInfo
+ 5.56% pgstat_report_activity
+ 4.05% socket_putmessage
+ 2.86% pstrdup
+ 2.65% AllocSetRealloc
+ 0.73% hash_search_with_hash_value
+ 0.68% btrescan
0.67% 0x55c02baea83f
+ 4.97% postgres postgres [.] appendBinaryStringInfo
+ 2.92% postgres postgres [.] ExecBuildProjectionInfo
+ 2.60% postgres libc-2.24.so [.] __strncpy_sse2_unaligned
+ 2.27% postgres postgres [.] hashoid
+ 2.18% postgres postgres [.] fmgr_info
+ 2.02% postgres libc-2.24.so [.] strlen

hierarchical / include child costs:
+ 21.35% 8.86% postgres postgres [.] SearchCatCache

after:
single function overhead:
+ 6.34% postgres postgres [.] appendBinaryStringInfo
+ 5.12% postgres postgres [.] SearchCatCache1
- 4.44% postgres libc-2.24.so [.] __memmove_avx_unaligned_erms
- __memmove_avx_unaligned_erms
+ 60.08% appendBinaryStringInfo
+ 13.88% AllocSetRealloc
+ 11.58% socket_putmessage
+ 6.54% pstrdup
+ 4.67% pgstat_report_activity
+ 1.20% pq_getbytes
+ 1.03% btrescan
1.03% 0x560d35168dab
+ 4.02% postgres postgres [.] fmgr_info
+ 3.18% postgres postgres [.] ExecBuildProjectionInfo
+ 2.43% postgres libc-2.24.so [.] strlen

hierarchical / include child costs:
+ 6.63% 5.12% postgres postgres [.] SearchCatCache1
+ 0.49% 0.49% postgres postgres [.] SearchSysCache1
+ 0.10% 0.10% postgres postgres [.] SearchCatCache3

(Most of the other top entries here are addressed in neirby threads)

- Andres

Attachment Content-Type Size
0001-Add-pg_noinline-macro-to-c.hv2.patch text/x-diff 1.5 KB
0002-Add-inline-murmurhash32-int32-functionv2.patch text/x-diff 2.4 KB
0003-Improve-sys-catcache-performancev2.patch text/x-diff 44.6 KB
pgbench-many-cols.sql application/x-sql 4.0 KB
create_many_cols.sql application/x-sql 6.7 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: amul sul <sulamul(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve catcache/syscache performance.
Date: 2017-09-22 06:17:36
Message-ID: 20170922061736.lvvsuh7awuivfmjq@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-09-20 18:26:50 +0530, amul sul wrote:
> Patch 0007:

Thanks for looking!

> 1:
> 400 + /*
> 401 + * XXX: might be worthwhile to only handle oid sysattr, to
> reduce
> 402 + * overhead - it's the most common key.
> 403 + */
>
> IMHO, let fix that as well. I tested this by fixing (see the attach patch)
> but does
> not found much gain on my local centos vm (of course, the pgbench load
> wasn't big enough).

I ended up with a bigger patch, that removes all extractions from
tuples, by storing the extracted column in an array.

> 2: How about have wrapping following condition in SearchCatCacheMiss() by
> unlikely():
>
> if (IsBootstrapProcessingMode())
> return NULL;

Given this is the cache miss case, I can't get excited about it -
there's several 100ks of cycles to access the heap via an indexscan...

> 3: Can we have following assert in SearchCatCacheN() instead
> SearchSysCacheN(), so that we'll assert direct SearchCatCacheN() call
> as well?
>
> Assert(SysCache[cacheId]->cc_nkeys == <N>);

Done, although I kept the others too.

> Other than these concern, patch looks pretty reasonable to me.

I'd appreciate if you could have a look at the new version as well.

Regards,

Andres


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve catcache/syscache performance.
Date: 2017-09-22 11:49:42
Message-ID: CA+Tgmoa01m9cdpztu8QCFkrjkGWX5RJP5e=pSgquZ_p0Eh31hA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 22, 2017 at 2:15 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> This results in making cache lookups themselves roughly three times as
> fast - full-system benchmarks obviously improve less than that.

Wow. That's really good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improve catcache/syscache performance.
Date: 2017-09-26 10:41:58
Message-ID: e58c4c27-b69e-2ec4-5307-e0130010250f@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/22/2017 11:45 AM, Andres Freund wrote:
> Here's a variant that cleans up the previous changes a bit, and adds
> some further improvements:

I tested with different pgbench options with master v/s patch and found an improvement. I have applied 001 and 003 patch on PG Head ,patch 0002 was already committed.

Virtual Machine configuration - Centos 6.5 x64 / 16 GB RAM / 8 VCPU core processor

Scaling factor=30

pgbench -M prepared -T 200 postgres

PG Head - tps = 902.225954 (excluding connections establishing).
PG HEAD+patch - tps = 1001.896381 (10.97+% vs. head)

pgbench -M prepared -T 300 postgres

PG Head - tps = 920.108333 (excluding connections establishing).
PG HEAD+patch - tps = 1023.89542 (11.19+% vs. head)

pgbench -M prepared -T 500 postgres

PG Head - tps = 995.178227 (excluding connections establishing)
PG HEAD+patch - tps = 1078.32222 (+8.34% vs. head)

Later I modified the create_many_cols.sql file (previously attached) and instead of
only using int , I mixed it with varchar/int4/numeric/float and run pgbench
with different time duration

pgbench -M prepared -f /tmp/pgbench-many-cols.sql -T 300 postgres

PG Head - tps = 5540.143877 (excluding connections establishing).
PG HEAD+patch - tps = 5679.713493 (2.50+% vs. head)

pgbench -M prepared -f /tmp/pgbench-many-cols.sql -T 500 postgres

PG Head - tps = 5519.212709 (excluding connections establishing).
PG HEAD+patch - tps = 5967.059155 (8.11+% vs. head)

pgbench -M prepared -f /tmp/pgbench-many-cols.sql -T 700 postgres

PG Head - tps = 5640.314495(excluding connections establishing).
PG HEAD+patch - tps = 6012.223147 (6.59+% vs. head)

-- regards,tushar
EnterpriseDBhttps://www.enterprisedb.com/
The Enterprise PostgreSQL Company


From: amul sul <sulamul(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve catcache/syscache performance.
Date: 2017-09-26 11:30:28
Message-ID: CAAJ_b97yLQu6xYV+fHcOac-fy9i1ELPrVrVGe1bBE-hRxedi2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 22, 2017 at 11:47 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2017-09-20 18:26:50 +0530, amul sul wrote:
> > Patch 0007:
>

> Other than these concern, patch looks pretty reasonable to me.
>
> I'd appreciate if you could have a look at the new version as well.
>
>
​I have reviewed

the newer version
​, looks good to me​.


Here are the few cosmetic comments ​for the 0003 patch​​:

1
​.
982 + ct->tuple.t_data = (HeapTupleHeader)
983 + MAXALIGN(((char *) ct) + sizeof(CatCTup));

It would be nice if you add a comment for this address alignment​​
​.​

2
​.
947 /*
948 - * If there are any out-of-line toasted fields in the tuple,
expand them
949 - * in-line. This saves cycles during later use of the catcache
entry, and
950 - * also protects us against the possibility of the toast tuples
being
951 - * freed before we attempt to fetch them, in case of something
using a
952 - * slightly stale catcache entry.
953 */

Empty comment block left in the CatalogCacheCreateEntry().

3
​.
​ ​
411 +/*
412 + * CatalogCacheCompareTuple
413 + *
414 + * Compare a tuple to the passed arguments.
415 + */
416 +static inline bool
417 +CatalogCacheCompareTuple(const CatCache *cache, int nkeys,
418 + const Datum *cachekeys,
419 + const Datum *searchkeys)

​Need an adjust to the p
rolog comment
​.​

Regards,
Amul


From: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Cc: tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>
Subject: Re: Improve catcache/syscache performance.
Date: 2017-09-26 13:24:14
Message-ID: 6921cef2-4c69-d764-d376-406c41e73e92@redhat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/26/2017 06:41 AM, tushar wrote:
> On 09/22/2017 11:45 AM, Andres Freund wrote:
>> Here's a variant that cleans up the previous changes a bit, and adds
>> some further improvements:
>
> I tested with different pgbench options with  master v/s patch and found
> an improvement.  I have applied 001 and 003 patch on PG Head ,patch 0002
> was already committed.
>
> Virtual Machine configuration - Centos 6.5 x64 / 16 GB RAM / 8 VCPU core
> processor
>
> Scaling factor=30
>
> pgbench -M prepared -T 200 postgres
>
> PG Head       -  tps = 902.225954 (excluding connections establishing).
> PG HEAD+patch -  tps = 1001.896381 (10.97+% vs. head)
>
>
> pgbench -M prepared -T 300 postgres
>
> PG Head       -  tps = 920.108333 (excluding connections establishing).
> PG HEAD+patch -  tps = 1023.89542 (11.19+% vs. head)
>
> pgbench -M prepared -T 500 postgres
>
> PG Head       -  tps = 995.178227 (excluding connections establishing)
> PG HEAD+patch -  tps = 1078.32222 (+8.34% vs. head)
>
>
> Later I modified the create_many_cols.sql file (previously attached) and
> instead of
> only using int  , I mixed it with varchar/int4/numeric/float and run
> pgbench
> with different time duration
>
>
> pgbench -M prepared -f /tmp/pgbench-many-cols.sql -T 300  postgres
>
> PG Head       -  tps =  5540.143877 (excluding connections establishing).
> PG HEAD+patch -  tps =  5679.713493 (2.50+% vs. head)
>
>
> pgbench -M prepared -f /tmp/pgbench-many-cols.sql -T 500  postgres
>
> PG Head       -  tps = 5519.212709 (excluding connections establishing).
> PG HEAD+patch -  tps = 5967.059155 (8.11+% vs. head)
>
>
> pgbench -M prepared -f /tmp/pgbench-many-cols.sql -T 700  postgres
>
> PG Head       -  tps = 5640.314495(excluding connections establishing).
> PG HEAD+patch -  tps = 6012.223147 (6.59+% vs. head)
>

I'm also seeing a speedup on a 2S/28C/56T/256Gb + 2 x RAID10 SSD machine
using -M prepared, -M prepared -N and -M prepared -S scenarios with
various scale factors, and custom queries.

Small typo in 0002- / commit 791961:

diff --git a/src/include/utils/hashutils.h b/src/include/utils/hashutils.h
index 35281689e8..366bd0e78b 100644
--- a/src/include/utils/hashutils.h
+++ b/src/include/utils/hashutils.h
@@ -22,7 +22,7 @@ hash_combine(uint32 a, uint32 b)

/*
- * Simple inline murmur hash implementation hashing a 32 bit ingeger, for
+ * Simple inline murmur hash implementation hashing a 32 bit integer, for
* performance.
*/
static inline uint32

Best regards,
Jesper


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve catcache/syscache performance.
Date: 2017-10-13 17:06:41
Message-ID: CA+TgmoY=C9t4sRm6U3dtTyQA3T1_ULFC9+RXwEwSXSCnf-9Rsg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 14, 2017 at 2:12 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> This patch gives me roughly 8% speedup in a workload that consists out
> of a fast query that returns a lot of columns. If I apply a few
> other performance patches, this patch itself starts to make a bigger
> difference, of around 11%.

I did a read-through of this patch today. I don't see anything really
serious to complain about here. Somebody might question the use of
the no-inline stuff, but it seems sensible to me in this context.

+ /* not as performance critical & "complicated" */

This comment kinda sucks. I don't think it will be clear to somebody
in 3 years what this means. It's clear enough in context but later I
think it won't be. I suggest dumping this altogether and expanding
the comment up above to encompass this:

Hash and equality functions for system types that are used as cache
key fields. In some cases, we just call the regular SQL-callable
functions for the appropriate data type, but that tends to be a little
slow, and the speed of these functions is performance-critical.
Therefore, for data types that frequently occur as catcache keys, we
hard-code the logic here. Avoiding the overhead of
DirectFunctionCallN(...) is a substantial win, and in certain cases
(like int4) we can adopt a faster hash algorithm as well.

+ {
+ return false;
+ }

Excess braces.

+ * The use of argument specific numbers is encouraged, they're faster, and
+ * insulates the caller from changes in the maximum number of keys.

s/, they're faster/. They're faster/

- if (cache->cc_tupdesc == NULL)
+ if (unlikely(cache->cc_tupdesc == NULL))
CatalogCacheInitializeCache(cache);

I don't think it's this patch's job to do it, but it seems like we
ought to just invent some early-initialization step where things like
this can happen, so that we don't have to branch here at all.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve catcache/syscache performance.
Date: 2017-10-13 17:38:47
Message-ID: 20171013173847.tdcfxolwcfrf2jp7@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-10-13 13:06:41 -0400, Robert Haas wrote:
> On Thu, Sep 14, 2017 at 2:12 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > This patch gives me roughly 8% speedup in a workload that consists out
> > of a fast query that returns a lot of columns. If I apply a few
> > other performance patches, this patch itself starts to make a bigger
> > difference, of around 11%.
>
> I did a read-through of this patch today.

Thanks!

> + /* not as performance critical & "complicated" */
>
> This comment kinda sucks. I don't think it will be clear to somebody
> in 3 years what this means. It's clear enough in context but later I
> think it won't be. I suggest dumping this altogether and expanding
> the comment up above to encompass this:
>
> Hash and equality functions for system types that are used as cache
> key fields. In some cases, we just call the regular SQL-callable
> functions for the appropriate data type, but that tends to be a little
> slow, and the speed of these functions is performance-critical.
> Therefore, for data types that frequently occur as catcache keys, we
> hard-code the logic here. Avoiding the overhead of
> DirectFunctionCallN(...) is a substantial win, and in certain cases
> (like int4) we can adopt a faster hash algorithm as well.

K.

> - if (cache->cc_tupdesc == NULL)
> + if (unlikely(cache->cc_tupdesc == NULL))
> CatalogCacheInitializeCache(cache);
>
> I don't think it's this patch's job to do it, but it seems like we
> ought to just invent some early-initialization step where things like
> this can happen, so that we don't have to branch here at all.

Yea, that'd be nice - it does show up in profiles. I'd tried to do
that, but it's not exactly trivial, so I decided to delay it. The
ordering when syscache stuff gets initialized is, uh, fragile. During
InitCatalogCache() the catalog is not ready for access. After that, we
access catcaches while not all catalogs are quite ready to be accessed.

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve catcache/syscache performance.
Date: 2017-10-13 18:07:54
Message-ID: 1295.1507918074@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2017-10-13 13:06:41 -0400, Robert Haas wrote:
>> I don't think it's this patch's job to do it, but it seems like we
>> ought to just invent some early-initialization step where things like
>> this can happen, so that we don't have to branch here at all.

> Yea, that'd be nice - it does show up in profiles. I'd tried to do
> that, but it's not exactly trivial, so I decided to delay it. The
> ordering when syscache stuff gets initialized is, uh, fragile. During
> InitCatalogCache() the catalog is not ready for access. After that, we
> access catcaches while not all catalogs are quite ready to be accessed.

Yeah, I think the fragility vs. benefit tradeoff here is not very
promising.

One idea might be to see if we can precalculate all the control data
needed for the caches and set it up as compile-time constants,
a la Gen_fmgrtab.pl, rather than reading it from the catalogs during
startup. That would make the code less dependent on initialization
order rather than more so.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve catcache/syscache performance.
Date: 2017-10-13 18:17:19
Message-ID: 20171013181719.m6f45x6aqs7yxjpl@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-10-13 14:07:54 -0400, Tom Lane wrote:
> One idea might be to see if we can precalculate all the control data
> needed for the caches and set it up as compile-time constants,
> a la Gen_fmgrtab.pl, rather than reading it from the catalogs during
> startup. That would make the code less dependent on initialization
> order rather than more so.

Hm. That sounds somewhat enticing. You're thinking of doing so for
catcaches alone, or something grander, including the relcaches? I'd
assume the former?

For catcaches the hardest part probably is that we need a TupleDesc. Per
type function lookups, oids, should be fairly easy in contrast.

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve catcache/syscache performance.
Date: 2017-10-13 18:27:45
Message-ID: 2202.1507919265@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2017-10-13 14:07:54 -0400, Tom Lane wrote:
>> One idea might be to see if we can precalculate all the control data
>> needed for the caches and set it up as compile-time constants,
>> a la Gen_fmgrtab.pl, rather than reading it from the catalogs during
>> startup. That would make the code less dependent on initialization
>> order rather than more so.

> Hm. That sounds somewhat enticing. You're thinking of doing so for
> catcaches alone, or something grander, including the relcaches? I'd
> assume the former?

Yeah. The relcaches are basically impossible to precalculate as constants
because they contain run-time variable data such as relfilenode. I might
be wrong because I didn't go look, but offhand I think there is nothing in
the catcache control data that couldn't be predetermined.

> For catcaches the hardest part probably is that we need a TupleDesc. Per
> type function lookups, oids, should be fairly easy in contrast.

We already have a mechanism for precalculated TupleDescs for system
catalogs, cf src/backend/catalog/schemapg.h. Right now we only apply
that for a few "bootstrap" system catalogs, but I don't really see
a reason we couldn't use it for every catalog that has a catcache.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve catcache/syscache performance.
Date: 2017-10-13 21:36:45
Message-ID: 20171013213645.fuolk62xaedj3og7@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-10-13 10:38:47 -0700, Andres Freund wrote:
> On 2017-10-13 13:06:41 -0400, Robert Haas wrote:
> > On Thu, Sep 14, 2017 at 2:12 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > This patch gives me roughly 8% speedup in a workload that consists out
> > > of a fast query that returns a lot of columns. If I apply a few
> > > other performance patches, this patch itself starts to make a bigger
> > > difference, of around 11%.
> >
> > I did a read-through of this patch today.
>
> Thanks!

Pushed after making the adaptions you suggested, pgindenting, and fixing
one bug (cstring lookups + datumCopy() = not good).

Greetings,

Andres Freund


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: andres(at)anarazel(dot)de
Cc: robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Improve catcache/syscache performance.
Date: 2017-11-17 04:13:49
Message-ID: 20171117.131349.243586836.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

At Fri, 13 Oct 2017 14:36:45 -0700, Andres Freund <andres(at)anarazel(dot)de> wrote in <20171013213645(dot)fuolk62xaedj3og7(at)alap3(dot)anarazel(dot)de>
> Pushed after making the adaptions you suggested, pgindenting, and fixing
> one bug (cstring lookups + datumCopy() = not good).
>
> Greetings,
>
> Andres Freund

The folloing CF entry (in Needs review state) is of this thread
and this is already pushed. May I mark this as committed or
anyone else does it?

https://commitfest.postgresql.org/15/1300/

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Improve catcache/syscache performance.
Date: 2017-11-17 04:16:57
Message-ID: CAB7nPqSTPSDPQFWNe0NdGp3E2oLXdHpyg1UXeEVerVi-UHjc5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 17, 2017 at 1:13 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Fri, 13 Oct 2017 14:36:45 -0700, Andres Freund <andres(at)anarazel(dot)de> wrote in <20171013213645(dot)fuolk62xaedj3og7(at)alap3(dot)anarazel(dot)de>
>> Pushed after making the adaptions you suggested, pgindenting, and fixing
>> one bug (cstring lookups + datumCopy() = not good).
>>
>> Greetings,
>>
>> Andres Freund
>
> The folloing CF entry (in Needs review state) is of this thread
> and this is already pushed. May I mark this as committed or
> anyone else does it?
>
> https://commitfest.postgresql.org/15/1300/

Done. Please make sure to mark as committed patches that are
registered once they are committed, this makes the life of a CFM a bit
easier.
--
Michael