Improvement of search for a binary operator

Lists: pgsql-patches
From: Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>
To: pgsql-patches(at)postgresql(dot)org
Subject: Improvement of search for a binary operator
Date: 2006-04-27 12:27:54
Message-ID: 4450B8CA.8070704@hi-ho.ne.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Here is a gprof result of the pgbench.

% cumulative self self total
time seconds seconds calls s/call s/call name
4.77 4.04 4.04 14000 0.00 0.00 OpernameGetCandidates
4.70 8.02 3.98 2007008 0.00 0.00
HeapTupleSatisfiesSnapshot
4.50 11.83 3.81 14001 0.00 0.00 base_yyparse
3.78 15.03 3.20 2838312 0.00 0.00 AllocSetAlloc
3.73 18.18 3.15 40001 0.00 0.00 heapgetpage

The OpernameGetCandidates called from oper. The function of oper is
search for a binary operator. It does the following processing:

(1)Create candidates of operator that matches operator name and
operator kind by OpernameGetCandidates.
(2)Find an operator that matches exactly ltypeId and rtypeId from
the candidates of operator by binary_oper_exact.
(3)If not found, find an operator from the candidates of operator by
oper_select_candidate. The oper_select_candidate accepts coerce type
and resolve the conflict.

I think that we can search the system catalog cache instead of
retrieval from the candidates of operator in the binary_oper_exact,
and reverse the order of executing (1) and (2) for performance
improvement. Because for a general operator such as '=', the number
of the candidates of operator exceeds 50. And in common cases the
typeIds matches exactly.

I tested following commands:

pgbench -i
pgbench -t 4000 (5 times)

results (tps):
1 2 3 4 5 average
-----------------------------------------------------------
original: 214.33 184.60 192.52 158.62 170.04 184.02
patched : 223.86 198.72 207.48 165.70 179.67 195.09

regards,

---
Atsushi Ogawa

Attachment Content-Type Size
oper.patch text/plain 5.6 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Improvement of search for a binary operator
Date: 2006-04-27 16:22:47
Message-ID: 11938.1146154967@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> writes:
> The OpernameGetCandidates called from oper. The function of oper is
> search for a binary operator. It does the following processing:

> (1)Create candidates of operator that matches operator name and
> operator kind by OpernameGetCandidates.
> (2)Find an operator that matches exactly ltypeId and rtypeId from
> the candidates of operator by binary_oper_exact.
> (3)If not found, find an operator from the candidates of operator by
> oper_select_candidate. The oper_select_candidate accepts coerce type
> and resolve the conflict.

> I think that we can search the system catalog cache instead of
> retrieval from the candidates of operator in the binary_oper_exact,
> and reverse the order of executing (1) and (2) for performance
> improvement.

AFAICT, this will make things a bit faster if there is an exact match,
and quite a bit slower if there is not (especially if the search path
is long). I've known for awhile that OpernameGetCandidates is a
bottleneck, but I don't want a solution that optimizes some cases at the
price of making others worse. pgbench is not a good model of the real
world for such tradeoffs.

Whatever solution we find, it should be applied to the unary operator
paths as well. It's not appropriate to introduce gratuitous differences
between the binary and unary operator paths. (Again, pgbench is a poor
model of the real world ... I don't think it even uses any unary
operators.)

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Improvement of search for a binary operator
Date: 2006-05-01 20:47:19
Message-ID: 8127.1146516439@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> writes:
> I think that we can search the system catalog cache instead of
> retrieval from the candidates of operator in the binary_oper_exact,
> and reverse the order of executing (1) and (2) for performance
> improvement.

I've been thinking more about how this might be improved. In a test
involving the same trivial query repeated many times ("select * from tab
where foo = 'constant'") I see oprofile results like this:

samples % symbol name
95234 5.5458 AllocSetAlloc
89622 5.2190 hash_search
86038 5.0103 OpernameGetCandidates
76747 4.4692 SearchCatCache
64042 3.7294 base_yyparse
39473 2.2987 LWLockAcquire
32195 1.8748 base_yylex
25024 1.4572 nocachegetattr
21913 1.2761 MemoryContextAllocZeroAligned
21268 1.2385 _bt_compare
19665 1.1452 fmgr_info_cxt_security
18430 1.0732 LWLockRelease
17312 1.0081 expression_tree_walker
...
7787 0.4535 SearchCatCacheList
...

The problem with the patch you proposed is that when there *isn't* an
exact match, it costs N extra SearchCatCache probes, where N is the
length of the search_path. We need a solution that limits the damage
in that case.

The solution I'm considering is to add an additional namespace.c routine
defined like
Oid OpnameGetOprid(List *opname, Oid oprleft, Oid oprright)
that returns the first exact match in the search path, or 0 if none.
(parse_oper.c would try this before OpernameGetCandidates.) The secret
is that it should use a SearchCatCacheList lookup instead of repeated
SearchCatCache probles. If the list lookup is successful, it will
normally produce a very short result (typically only one member) and so
finding the first visible member if any is cheap.

With this approach the penalty for failure is just one extra
SearchCatCacheList lookup (which has cost comparable to SearchCatCache,
I believe, once the cache is set up) instead of N SearchCatCache
lookups. In the example above this should add at most about half a
percent of total system runtime. The potential of making a large dent
in OpernameGetCandidates' five percent makes that look like a good trade.

If this works out we might want to think about converting some of the
other namespace.c routines to use similar tactics. I'm not sure I've
ever seen them high in a profile though.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Improvement of search for a binary operator
Date: 2006-05-01 23:09:23
Message-ID: 20720.1146524963@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

I wrote:
> The solution I'm considering is to add an additional namespace.c routine
> defined like
> Oid OpnameGetOprid(List *opname, Oid oprleft, Oid oprright)

I coded this up, and it seems to be a win just on code cleanliness
grounds, because there are several places that want to do a search for
an operator with exact input types given; this routine fits their
needs exactly while OpernameGetCandidates was a poor fit. It seems to
behave as advertised in terms of getting rid of OpernameGetCandidates
overhead in the exact-match case --- this only makes for a few percent
overall improvement in the test case I'm looking at, but that's about
what's expected.

I made a non-exact-match test case by changing char(N) to varchar(N)
in the table definitions, and what I see is

samples % symbol name
189998 11.9853 SearchCatCache
68888 4.3455 AllocSetAlloc
66793 4.2134 hash_search
49284 3.1089 OpernameGetCandidates
46472 2.9315 base_yyparse
28268 1.7832 LWLockAcquire
27487 1.7339 DirectFunctionCall1
27071 1.7077 DLMoveToFront
26375 1.6638 nocachegetattr
24956 1.5743 SearchSysCache
21461 1.3538 base_yylex
20176 1.2727 heap_getsysattr
19466 1.2279 FunctionCall2
18148 1.1448 CatalogCacheComputeHashValue
17608 1.1107 _bt_compare
16606 1.0475 MemoryContextAllocZeroAligned
...
6870 0.4334 SearchCatCacheList

This case is significantly slower than the other one anyway, and the
reason is evidently a lot more SearchCatCache calls. AFAICT those
can all be blamed on the getBaseType() calls that are sprinkled through
parse_func.c and parse_oper.c. I thought those would probably come back
to haunt us from a performance standpoint someday :-(

Anyway, I can't measure any performance difference in the
non-exact-match case. Maybe if we got rid of the getBaseType calls
it'd be worth checking again, but for now this seems like a win.
I'll go ahead and commit it.

regards, tom lane