char() overhead on read-only workloads not so insignifcant as the docs claim it is...

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: char() overhead on read-only workloads not so insignifcant as the docs claim it is...
Date: 2009-06-13 10:14:29
Message-ID: 4A337C05.7000008@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm currently doing some benchmarking on a Nehalem
box(http://www.kaltenbrunner.cc/blog/index.php?/archives/26-Benchmarking-8.4-Chapter-1Read-Only-workloads.html)
with 8.4 and while investigating what looks like issues in pgbench I
also noticed that using char() has more than a negligable overhead on
some (very special) readonly(!) workloads.

for example running sysbench in read-only mode against 8.4 results in a
profile(for the full run) that looks similiar to:

samples % symbol name
981690 11.0656 bcTruelen
359183 4.0487 index_getnext
311128 3.5070 AllocSetAlloc
272330 3.0697 hash_search_with_hash_value
258157 2.9099 LWLockAcquire
195673 2.2056 _bt_compare
190303 2.1451 slot_deform_tuple
168101 1.8948 PostgresMain
164191 1.8508 _bt_checkkeys
126110 1.4215 FunctionCall2
123965 1.3973 SearchCatCache
120629 1.3597 LWLockRelease

the default sysbench mode actually uses a number of different queries
and the ones dealing with char() are actually only a small part of the
full set of queries sent.
The specific query is causing bcTruelen to show up in the profile is:

"SELECT c from sbtest where id between $1 and $2 order by c" where the
parameters are for example
$1 = '5009559', $2 = '5009658' - ie ranges of 100.

benchmarking only that query results in:

samples % symbol name
2148182 23.5861 bcTruelen
369463 4.0565 index_getnext
362784 3.9832 AllocSetAlloc
284198 3.1204 slot_deform_tuple
185279 2.0343 _bt_checkkeys
180119 1.9776 LWLockAcquire
172733 1.8965 appendBinaryStringInfo
144158 1.5828 internal_putbytes
141040 1.5486 AllocSetFree
138093 1.5162 printtup
124255 1.3643 hash_search_with_hash_value
117054 1.2852 heap_form_minimal_tuple

at around 46000 queries/s

changing the fault sysbench schema from:

Table "public.sbtest"
Column | Type | Modifiers

--------+----------------+-----------------------------------------------------
id | integer | not null default
nextval('sbtest_id_seq'::regclass)
k | integer | not null default 0
c | character(120) | not null default ''::bpchar
pad | character(60) | not null default ''::bpchar
Indexes:
"sbtest_pkey" PRIMARY KEY, btree (id)
"k" btree (k)

to
Table "public.sbtest"
Column | Type | Modifiers

--------+-------------------+-----------------------------------------------------
id | integer | not null default
nextval('sbtest_id_seq'::regclass)
k | integer | not null default 0
c | character varying | not null default ''::character varying
pad | character(60) | not null default ''::bpchar
Indexes:
"sbtest_pkey" PRIMARY KEY, btree (id)
"k" btree (k)

results in a near 50%(!) speedup in terms of tps to around 67000
queries/s. This is however an extreme case because the c column actually
contains no data at all (except for an empty string).

the profile for the changed testcase looks like:
430797 5.2222 index_getnext
396750 4.8095 AllocSetAlloc
345508 4.1883 slot_deform_tuple
228222 2.7666 appendBinaryStringInfo
227766 2.7610 _bt_checkkeys
193818 2.3495 LWLockAcquire
179925 2.1811 internal_putbytes
168871 2.0471 printtup
152026 1.8429 AllocSetFree
146333 1.7739 heap_form_minimal_tuple
144305 1.7493 FunctionCall2
128320 1.5555 hash_search_with_hash_value

at the very least we should reconsider this part of our docs:

" There is no performance difference between these three types, apart
from increased storage space when using the blank-padded type, and a few
extra CPU cycles to check the length when storing into a
length-constrained column."

from http://www.postgresql.org/docs/8.4/static/datatype-character.html

regards

Stefan

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2009-06-13 13:08:44 Re: machine-readable explain output
Previous Message David Fetter 2009-06-13 08:33:59 Re: [GENERAL] Using results from DELETE ... RETURNING