Re: [PATCH] backend: compare word-at-a-time in bcTruelen

Lists: pgsql-hackers
From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: char() overhead on read-only workloads not so insignifcant as the docs claim it is...
Date: 2009-06-13 10:14:29
Message-ID: 4A337C05.7000008@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm currently doing some benchmarking on a Nehalem
box(http://www.kaltenbrunner.cc/blog/index.php?/archives/26-Benchmarking-8.4-Chapter-1Read-Only-workloads.html)
with 8.4 and while investigating what looks like issues in pgbench I
also noticed that using char() has more than a negligable overhead on
some (very special) readonly(!) workloads.

for example running sysbench in read-only mode against 8.4 results in a
profile(for the full run) that looks similiar to:

samples % symbol name
981690 11.0656 bcTruelen
359183 4.0487 index_getnext
311128 3.5070 AllocSetAlloc
272330 3.0697 hash_search_with_hash_value
258157 2.9099 LWLockAcquire
195673 2.2056 _bt_compare
190303 2.1451 slot_deform_tuple
168101 1.8948 PostgresMain
164191 1.8508 _bt_checkkeys
126110 1.4215 FunctionCall2
123965 1.3973 SearchCatCache
120629 1.3597 LWLockRelease

the default sysbench mode actually uses a number of different queries
and the ones dealing with char() are actually only a small part of the
full set of queries sent.
The specific query is causing bcTruelen to show up in the profile is:

"SELECT c from sbtest where id between $1 and $2 order by c" where the
parameters are for example
$1 = '5009559', $2 = '5009658' - ie ranges of 100.

benchmarking only that query results in:

samples % symbol name
2148182 23.5861 bcTruelen
369463 4.0565 index_getnext
362784 3.9832 AllocSetAlloc
284198 3.1204 slot_deform_tuple
185279 2.0343 _bt_checkkeys
180119 1.9776 LWLockAcquire
172733 1.8965 appendBinaryStringInfo
144158 1.5828 internal_putbytes
141040 1.5486 AllocSetFree
138093 1.5162 printtup
124255 1.3643 hash_search_with_hash_value
117054 1.2852 heap_form_minimal_tuple

at around 46000 queries/s

changing the fault sysbench schema from:

Table "public.sbtest"
Column | Type | Modifiers

--------+----------------+-----------------------------------------------------
id | integer | not null default
nextval('sbtest_id_seq'::regclass)
k | integer | not null default 0
c | character(120) | not null default ''::bpchar
pad | character(60) | not null default ''::bpchar
Indexes:
"sbtest_pkey" PRIMARY KEY, btree (id)
"k" btree (k)

to
Table "public.sbtest"
Column | Type | Modifiers

--------+-------------------+-----------------------------------------------------
id | integer | not null default
nextval('sbtest_id_seq'::regclass)
k | integer | not null default 0
c | character varying | not null default ''::character varying
pad | character(60) | not null default ''::bpchar
Indexes:
"sbtest_pkey" PRIMARY KEY, btree (id)
"k" btree (k)

results in a near 50%(!) speedup in terms of tps to around 67000
queries/s. This is however an extreme case because the c column actually
contains no data at all (except for an empty string).

the profile for the changed testcase looks like:
430797 5.2222 index_getnext
396750 4.8095 AllocSetAlloc
345508 4.1883 slot_deform_tuple
228222 2.7666 appendBinaryStringInfo
227766 2.7610 _bt_checkkeys
193818 2.3495 LWLockAcquire
179925 2.1811 internal_putbytes
168871 2.0471 printtup
152026 1.8429 AllocSetFree
146333 1.7739 heap_form_minimal_tuple
144305 1.7493 FunctionCall2
128320 1.5555 hash_search_with_hash_value

at the very least we should reconsider this part of our docs:

" There is no performance difference between these three types, apart
from increased storage space when using the blank-padded type, and a few
extra CPU cycles to check the length when storing into a
length-constrained column."

from http://www.postgresql.org/docs/8.4/static/datatype-character.html

regards

Stefan


From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Subject: Re: char() overhead on read-only workloads not so insignifcant as the docs claim it is...
Date: 2009-06-15 12:28:07
Message-ID: 65937bea0906150528h497b8c4ax98876789f713181e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Comments?

On Sat, Jun 13, 2009 at 3:44 PM, Stefan Kaltenbrunner
<stefan(at)kaltenbrunner(dot)cc> wrote:

> I'm currently doing some benchmarking on a Nehalem box(
> http://www.kaltenbrunner.cc/blog/index.php?/archives/26-Benchmarking-8.4-Chapter-1Read-Only-workloads.html)
> with 8.4 and while investigating what looks like issues in pgbench I also
> noticed that using char() has more than a negligable overhead on some (very
> special) readonly(!) workloads.
>
> for example running sysbench in read-only mode against 8.4 results in a
> profile(for the full run) that looks similiar to:
>
> samples % symbol name
> 981690 11.0656 bcTruelen
> 359183 4.0487 index_getnext
> 311128 3.5070 AllocSetAlloc
> 272330 3.0697 hash_search_with_hash_value
> 258157 2.9099 LWLockAcquire
> 195673 2.2056 _bt_compare
> 190303 2.1451 slot_deform_tuple
> 168101 1.8948 PostgresMain
> 164191 1.8508 _bt_checkkeys
> 126110 1.4215 FunctionCall2
> 123965 1.3973 SearchCatCache
> 120629 1.3597 LWLockRelease
>
> the default sysbench mode actually uses a number of different queries and
> the ones dealing with char() are actually only a small part of the full set
> of queries sent.
> The specific query is causing bcTruelen to show up in the profile is:
>
> "SELECT c from sbtest where id between $1 and $2 order by c" where the
> parameters are for example
> $1 = '5009559', $2 = '5009658' - ie ranges of 100.
>
>
> benchmarking only that query results in:
>
> samples % symbol name
> 2148182 23.5861 bcTruelen
> 369463 4.0565 index_getnext
> 362784 3.9832 AllocSetAlloc
> 284198 3.1204 slot_deform_tuple
> 185279 2.0343 _bt_checkkeys
> 180119 1.9776 LWLockAcquire
> 172733 1.8965 appendBinaryStringInfo
> 144158 1.5828 internal_putbytes
> 141040 1.5486 AllocSetFree
> 138093 1.5162 printtup
> 124255 1.3643 hash_search_with_hash_value
> 117054 1.2852 heap_form_minimal_tuple
>
> at around 46000 queries/s
>
> changing the fault sysbench schema from:
>
> Table "public.sbtest"
> Column | Type | Modifiers
>
> --------+----------------+-----------------------------------------------------
> id | integer | not null default
> nextval('sbtest_id_seq'::regclass)
> k | integer | not null default 0
> c | character(120) | not null default ''::bpchar
> pad | character(60) | not null default ''::bpchar
> Indexes:
> "sbtest_pkey" PRIMARY KEY, btree (id)
> "k" btree (k)
>
>
> to
> Table "public.sbtest"
> Column | Type | Modifiers
>
> --------+-------------------+-----------------------------------------------------
> id | integer | not null default
> nextval('sbtest_id_seq'::regclass)
> k | integer | not null default 0
> c | character varying | not null default ''::character varying
> pad | character(60) | not null default ''::bpchar
> Indexes:
> "sbtest_pkey" PRIMARY KEY, btree (id)
> "k" btree (k)
>
> results in a near 50%(!) speedup in terms of tps to around 67000 queries/s.
> This is however an extreme case because the c column actually contains no
> data at all (except for an empty string).
>
> the profile for the changed testcase looks like:
> 430797 5.2222 index_getnext
> 396750 4.8095 AllocSetAlloc
> 345508 4.1883 slot_deform_tuple
> 228222 2.7666 appendBinaryStringInfo
> 227766 2.7610 _bt_checkkeys
> 193818 2.3495 LWLockAcquire
> 179925 2.1811 internal_putbytes
> 168871 2.0471 printtup
> 152026 1.8429 AllocSetFree
> 146333 1.7739 heap_form_minimal_tuple
> 144305 1.7493 FunctionCall2
> 128320 1.5555 hash_search_with_hash_value
>
>
> at the very least we should reconsider this part of our docs:
>
> " There is no performance difference between these three types, apart from
> increased storage space when using the blank-padded type, and a few extra
> CPU cycles to check the length when storing into a length-constrained
> column."
>
> from http://www.postgresql.org/docs/8.4/static/datatype-character.html
>
>
>
> regards
>
> Stefan
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Lets call it Postgres

EnterpriseDB http://www.enterprisedb.com

gurjeet[(dot)singh](at)EnterpriseDB(dot)com
singh(dot)gurjeet(at){ gmail | hotmail | indiatimes | yahoo }.com
Mail sent from my BlackLaptop device


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Subject: Re: char() overhead on read-only workloads not so insignifcant as the docs claim it is...
Date: 2009-06-15 23:58:16
Message-ID: 20090615235816.GO7285@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On Sat, Jun 13, 2009 at 3:44 PM, Stefan Kaltenbrunner
> <stefan(at)kaltenbrunner(dot)cc> wrote:

> > The specific query is causing bcTruelen to show up in the profile is:
> >
> > "SELECT c from sbtest where id between $1 and $2 order by c" where the
> > parameters are for example
> > $1 = '5009559', $2 = '5009658' - ie ranges of 100.
> >
> >
> > benchmarking only that query results in:
> >
> > samples % symbol name
> > 2148182 23.5861 bcTruelen
> > 369463 4.0565 index_getnext
> > 362784 3.9832 AllocSetAlloc

Gurjeet Singh escribió:
> Comments?

Maybe bcTruelen could be optimized to step on one word at a time
(perhaps by using XOR against a precomputed word filled with ' '),
instead of one byte at a time ...

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Jeremy Kerr <jk(at)ozlabs(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Subject: Re: char() overhead on read-only workloads not so insignifcant as the docs claim it is...
Date: 2009-06-16 00:43:39
Message-ID: 200906160843.39560.jk@ozlabs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro,

> Maybe bcTruelen could be optimized to step on one word at a time
> (perhaps by using XOR against a precomputed word filled with ' '),
> instead of one byte at a time ...

I have a patch for this, will send soon.

Regards,

Jeremy


From: Jeremy Kerr <jk(at)ozlabs(dot)org>
To: <pgsql-hackers(at)postgresql(dot)org>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 01:04:37
Message-ID: 1245114277.71574.860187444713.1.gpush@pingu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Signed-off-by: Jeremy Kerr <jk(at)ozlabs(dot)org>

---
src/backend/utils/adt/varchar.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 5f3c658..6889dff 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -624,16 +624,34 @@ varchartypmodout(PG_FUNCTION_ARGS)
static int
bcTruelen(BpChar *arg)
{
+ const unsigned int spaces = 0x20202020;
+ const int wordsize = sizeof(spaces);
char *s = VARDATA_ANY(arg);
int i;
- int len;

- len = VARSIZE_ANY_EXHDR(arg);
- for (i = len - 1; i >= 0; i--)
+ i = VARSIZE_ANY_EXHDR(arg) - 1;
+
+ /* compare down to an aligned boundary */
+ for (; i >= 0 && i % wordsize != wordsize - 1; i--)
{
if (s[i] != ' ')
+ return i + 1;
+ }
+
+ /* now that we're aligned, compare word at a time */
+ for (; i >= wordsize - 1; i -= wordsize)
+ {
+ if (*(unsigned int *)(s + i - (wordsize - 1)) != spaces)
break;
}
+
+ /* check within the last non-matching word */
+ for (; i >= 0; i--)
+ {
+ if (s[i] != ' ')
+ break;
+ }
+
return i + 1;
}


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 01:28:48
Message-ID: BFF0DC24-9074-40F9-80B7-8C6976A1B776@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun 15, 2009, at 9:04 PM, Jeremy Kerr <jk(at)ozlabs(dot)org> wrote:

> Signed-off-by: Jeremy Kerr <jk(at)ozlabs(dot)org>
>
> ---
> src/backend/utils/adt/varchar.c | 24 +++++++++++++++++++++---
> 1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/
> varchar.c
> index 5f3c658..6889dff 100644
> --- a/src/backend/utils/adt/varchar.c
> +++ b/src/backend/utils/adt/varchar.c
> @@ -624,16 +624,34 @@ varchartypmodout(PG_FUNCTION_ARGS)
> static int
> bcTruelen(BpChar *arg)
> {
> + const unsigned int spaces = 0x20202020;
> + const int wordsize = sizeof(spaces);

This looks very non-portable to me.

...Robert


From: Jeremy Kerr <jk(at)ozlabs(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 01:51:16
Message-ID: 200906160951.17386.jk@ozlabs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert,

> This looks very non-portable to me.

Unsurprisingly, I'm new to postgres hacking and the large number of
supported platforms :)

I was considering something like:

unsigned int spaces;
const unsigned int wordsize = sizeof(unsigned int);

memset(&spaces, ' ', wordsize);

In most cases, the compiler should be able to optimise the memset out,
but it may introduce overhead where this is not possible.

However, are there any supported platforms where sizeof(unsigned int) !=
4 ?

Cheers,

Jeremy


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 02:43:44
Message-ID: 20090616024344.GE20436@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeremy,

* Jeremy Kerr (jk(at)ozlabs(dot)org) wrote:
> Signed-off-by: Jeremy Kerr <jk(at)ozlabs(dot)org>
>
> ---
> src/backend/utils/adt/varchar.c | 24 +++++++++++++++++++++---
> 1 file changed, 21 insertions(+), 3 deletions(-)

Thanks for the contribution. A couple of comments:

The documentation for submitting a patch to PostgreSQL is here:
http://wiki.postgresql.org/wiki/Submitting_a_Patch

There is also a Developer FAQ available here:
http://wiki.postgresql.org/wiki/Developer_FAQ

The PostgreSQL core folks prefer context diffs (it's not my preference,
but I'm not part of core, nor am I a committer :).

There are a number of things requested to be included with a patch, but
in particular I would point out:

----
Which CVS branch the patch is against (ordinarily this will be HEAD).

Whether it compiles and tests successfully, so we know nothing obvious
is broken.

Whether it contains any platform-specific items and if so, has it been
tested on other platforms.

Describe the effect your patch has on performance, if any. If the patch
is intended to improve performance, it's a good idea to include some
reproducible tests to demonstrate the improvement.
----

You might check out sections 3 & 6 of src/include/c.h. Section 3
defines standard system types, while section 6 defines some widely
useful macros; in particular our custom MemSet and MemSetAligned, which
work on aligned memory structures for improved performance.

Thanks,

Stephen


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 04:24:46
Message-ID: 603c8f070906152124u2b14fc10gec2f3a56a31710d4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 15, 2009 at 9:51 PM, Jeremy Kerr<jk(at)ozlabs(dot)org> wrote:
> I was considering something like:
>
>        unsigned int spaces;
>        const unsigned int wordsize = sizeof(unsigned int);
>
>        memset(&spaces, ' ', wordsize);
>
> In most cases, the compiler should be able to optimise the memset out,
> but it may introduce overhead where this is not possible.

What about just:

static char spaces[4] = { ' ', ' ', ' ', ' ' };

and then ... * (uint32 *) spaces?

There's not much point taking the length of the word when you've
initialized it to contain exactly 4 bytes. What you want to do is
pick the flavor of integer that will be the same length as what you've
initialized, and it turns out we already have that (see
src/include/c.h).

As I look at this, another problem is that it seems to me that you're
assuming that VARDATA_ANY() will return an aligned pointer, which
isn't necessarily the case (see src/include/postgres.h).

The advice in Stephen's email is also very good - in particular,
whatever you come up with, you should submit performance results.
Note that while --enable-profiling is very useful and profiling
numbers are good to submit, you'll also want to make sure you do a
build that is optimized for speed (i.e. no profiling, no casserts, no
debug) and do timings on that.

...Robert


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 06:12:18
Message-ID: 4A3737C2.8090401@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> The advice in Stephen's email is also very good - in particular,
> whatever you come up with, you should submit performance results.
> Note that while --enable-profiling is very useful and profiling
> numbers are good to submit, you'll also want to make sure you do a
> build that is optimized for speed (i.e. no profiling, no casserts, no
> debug) and do timings on that.

I'm particularly interested to see how much the patch hurts performance
in the cases where it's not helping.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 10:30:50
Message-ID: 20090616103050.GF20436@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> As I look at this, another problem is that it seems to me that you're
> assuming that VARDATA_ANY() will return an aligned pointer, which
> isn't necessarily the case (see src/include/postgres.h).

I believe you need to look at it more carefully. I don't think it's
making any such assumption. Specifically, it has three loops; an "until
we're aligned" loop, then a "while we're aligned", and a "when we've
done all the aligned we could do".

On the flip side, I am curious as to if the arguments to a stored
procedure are always aligned or not. Never had a case to care before,
but if palloc() is always going to return an aligned chunk of memory
(per MemSetAligned in c.h) it makes me wonder.

Thanks,

Stephen


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 12:03:24
Message-ID: 603c8f070906160503n42df8bafo26317e93ee5b5751@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 16, 2009 at 6:30 AM, Stephen Frost<sfrost(at)snowman(dot)net> wrote:
> * Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
>> As I look at this, another problem is that it seems to me that you're
>> assuming that VARDATA_ANY() will return an aligned pointer, which
>> isn't necessarily the case (see src/include/postgres.h).
>
> I believe you need to look at it more carefully.  I don't think it's
> making any such assumption.  Specifically, it has three loops; an "until
> we're aligned" loop, then a "while we're aligned", and a "when we've
> done all the aligned we could do".

I see that... but I don't think the test in the first loop is correct.
It's based on the value of i % 4, but I'm not convinced that you know
anything about the alignment at the point where i == 0.

I might be all wet here, I haven't looked at this area of the code in detail.

> On the flip side, I am curious as to if the arguments to a stored
> procedure are always aligned or not.  Never had a case to care before,
> but if palloc() is always going to return an aligned chunk of memory
> (per MemSetAligned in c.h) it makes me wonder.

Well, if it's char(n) for n <~ 126, it's going to have a 1-byte
varlena header...

...Robert


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 12:38:17
Message-ID: 407d949e0906160538i69444dafh45d16fec9987423e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 16, 2009 at 1:03 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
> I see that... but I don't think the test in the first loop is correct.
>  It's based on the value of i % 4, but I'm not convinced that you know
> anything about the alignment at the point where i == 0.

That's correct. To check the alignment you would have to look at the
actual pointer. I would suggest using the existing macros to handle
alignment. Hm, though the only one I see offhand which is relevant is
the moderately silly PointerIsAligned(). Still it would make the code
clearer even if it's pretty simple.

Incidentally, the char foo[4] = {' ',' ',' ',' '} suggestion is, I
think, bogus. There would be no alignment guarantee on that array.
Personally I'm find with 0x20202020 with a comment explaining what it
is.

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 12:41:52
Message-ID: 20090616124152.GG20436@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> I see that... but I don't think the test in the first loop is correct.
> It's based on the value of i % 4, but I'm not convinced that you know
> anything about the alignment at the point where i == 0.

Ah, you may be half right there (see below). It does appear to be
assuming that char *s (or s[i == 0]) is aligned, which isn't a
guarentee (in fact, it might never be right..). If having it actually
aligned is an important bit (as opposed to just doing the comparisons in
larger but possibly unaligned blocks) then that'd make a difference.

If the code as-is showed performance improvment even when it's working
on less-than-aligned blocks, I'd be curious what would happen if it was
actually aligned. Unfortunately, the results of such would probably be
heavily architecture dependent..

> > On the flip side, I am curious as to if the arguments to a stored
> > procedure are always aligned or not.  Never had a case to care before,
> > but if palloc() is always going to return an aligned chunk of memory
> > (per MemSetAligned in c.h) it makes me wonder.
>
> Well, if it's char(n) for n <~ 126, it's going to have a 1-byte
> varlena header...

Right, but I'm talking about the base of the argument itself, not
the start of the data. If every variable length argument to a stored
procedure is palloc()'d independently, and palloc()'s always return
aligned memory, we'd at least know that the base of the argument is
aligned and could figure out the header size and then do the
comparisons accordingly. This would also mean, of course, that we'd
almost(?) never have s[i == 0] on an aligned boundary due to the
header.

Thanks,

Stephen


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 12:42:50
Message-ID: 407d949e0906160542r76d49529rf7a75f6e0e0dedc1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 16, 2009 at 1:03 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>
>> On the flip side, I am curious as to if the arguments to a stored
>> procedure are always aligned or not.  Never had a case to care before,
>> but if palloc() is always going to return an aligned chunk of memory
>> (per MemSetAligned in c.h) it makes me wonder.
>
> Well, if it's char(n) for n <~ 126, it's going to have a 1-byte
> varlena header...

There are two points here that kind of cancel each other out :)

If the data is in fact returned from a palloc because it was the
result of some other function call then it will almost certainly have
a 4-byte header and that'll be aligned. There are some exceptions
where functions are just returning copies and copy the whole datum
though, but the point is we normally don't toast or pack varlenas
unless they're being stored on disk.

However that's all irrelevant because there's no guarantee the data
being passed will have been palloced at all. You could get a pointer
to data in a shared buffer. Ie, data on disk. That will be aligned
based on how tuples are packed on disk which is precisely where we go
out of our way to avoid wasting space on alignment.

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 12:46:30
Message-ID: 20090616124630.GH20436@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Greg Stark (gsstark(at)mit(dot)edu) wrote:
> There are two points here that kind of cancel each other out :)

Thanks for the insight. :)

Stephen


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 12:47:08
Message-ID: 407d949e0906160547w28578245m8e937ce791ad2b83@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 16, 2009 at 1:41 PM, Stephen Frost<sfrost(at)snowman(dot)net> wrote:
>
> Ah, you may be half right there (see below).  It does appear to be
> assuming that char *s (or s[i == 0]) is aligned, which isn't a
> guarentee (in fact, it might never be right..).  If having it actually
> aligned is an important bit (as opposed to just doing the comparisons in
> larger but possibly unaligned blocks) then that'd make a difference.

On some architectures like intel accessing unaligned ints is just
slow. On others (Alpha and PPC iirc?) it is an immediate bus error.

I would actually be more curious whether we can do th e comparison
without having to pre-scan for the spaces at the end than trying to
opimize that prescan.

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 13:04:39
Message-ID: 603c8f070906160604u3b9d6102y4b952c9e47926fef@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 16, 2009 at 8:38 AM, Greg Stark<gsstark(at)mit(dot)edu> wrote:
> On Tue, Jun 16, 2009 at 1:03 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>> I see that... but I don't think the test in the first loop is correct.
>>  It's based on the value of i % 4, but I'm not convinced that you know
>> anything about the alignment at the point where i == 0.
>
> That's correct. To check the alignment you would have to look at the
> actual pointer. I would suggest using the existing macros to handle
> alignment. Hm, though the only one I see offhand which is relevant is
> the moderately silly PointerIsAligned(). Still it would make the code
> clearer even if it's pretty simple.
>
> Incidentally, the char foo[4] = {' ',' ',' ',' '} suggestion is, I
> think, bogus. There would be no alignment guarantee on that array.
> Personally I'm find with 0x20202020 with a comment explaining what it
> is.

Ooh, good point. I still don't like the 0x20 thing, but using uint32
instead of int or long is the main point, unless we support any
platforms where 0x20 != ' '.

...Robert


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Stephen Frost <sfrost(at)snowman(dot)net>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 13:09:46
Message-ID: 4A37999A.5020805@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> Ooh, good point. I still don't like the 0x20 thing, but using uint32
> instead of int or long is the main point, unless we support any
> platforms where 0x20 != ' '.
>
>
>

All our server encodings are strictly ASCII supersets. So 0x20 is always
the space character.

cheers

andrew


From: Jeremy Kerr <jk(at)ozlabs(dot)org>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 13:23:41
Message-ID: 200906162123.42513.jk@ozlabs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

> That's correct. To check the alignment you would have to look at the
> actual pointer. I would suggest using the existing macros to handle
> alignment. Hm, though the only one I see offhand which is relevant is
> the moderately silly PointerIsAligned(). Still it would make the code
> clearer even if it's pretty simple.

Yes, the code (incorrectly) assumes that any multiple-of-4 index into
the char array is aligned, and so the array itself must be aligned for
this to work.

I'll rework the patch, testing the pointer alignment directly instead.

> Incidentally, the char foo[4] = {' ',' ',' ',' '} suggestion is, I
> think, bogus. There would be no alignment guarantee on that array.
> Personally I'm find with 0x20202020 with a comment explaining what it
> is.

The variable is called 'spaces', but I can add extra comments if
preferred.

Cheers,

Jeremy


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 14:23:56
Message-ID: 13985.1245162236@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On some architectures like intel accessing unaligned ints is just
> slow. On others (Alpha and PPC iirc?) it is an immediate bus error.

To a first approximation, Intel is the *only* popular architecture that
doesn't bus-error on unaligned accesses. (And I'm sure their chip
designers rue the day that their predecessors chose to allow that.)

There are some systems where the kernel trap handler then proceeds to
emulate the unaligned access for you, but that gives new meaning to the
word "slow". You definitely don't want to be doing it in a patch that's
alleged to give a performance improvement.

Speaking of which, what about some performance numbers? Like Heikki,
I'm quite suspicious of whether there is any real-world gain to be had
from this approach.

regards, tom lane


From: Jeremy Kerr <jk(at)ozlabs(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 14:53:35
Message-ID: 200906162253.36182.jk@ozlabs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Tom,

> Speaking of which, what about some performance numbers? Like Heikki,
> I'm quite suspicious of whether there is any real-world gain to be
> had from this approach.

Will send numbers tomorrow, with the reworked patch.

Cheers,

Jeremy


From: <genie(dot)japo(at)gmail(dot)com>
To: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Uninstallation error
Date: 2009-06-16 15:16:58
Message-ID: 4a37b770.25bc720a.2eef.ffffb044@mx.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

I've found the uninstallation error...

# make uninstall

:

n/man7/truncate.7 /usr/local/pgsql/share/man/man7/unlisten.7
/usr/local/pgsql/share/man/man7/update.7
/usr/local/pgsql/share/man/man7/vacuum.7
/usr/local/pgsql/share/man/man7/values.7
rm: cannot remove `/usr/local/pgsql/share/man/man1/': Is a directory
rm: cannot remove `/usr/local/pgsql/share/man/man7/': Is a directory
make[1]: *** [uninstall] Error 1
make[1]: Leaving directory `/usr/local/src/postgresql-8.4rc1/doc'
make: *** [uninstall] Error 2

Maybe, it is solved by the change in the following. (adding -r option)

doc/Makefile
100c100
< rm -f $(addprefix $(DESTDIR)$(mandir)/, $(shell gunzip -c
$(srcdir)/man.tar.gz | tar tf - | sed -e 's,man7/,man$(sqlmansectnum)/,' -e
's/.7$$/.$(sqlmansect)/'))
---
> rm -rf $(addprefix $(DESTDIR)$(mandir)/, $(shell gunzip -c
$(srcdir)/man.tar.gz | tar tf - | sed -e 's,man7/,man$(sqlmansectnum)/,' -e
's/.7$$/.$(sqlmansect)/'))

Regards,
Genie Japo


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 15:49:32
Message-ID: 4A37BF0C.2070802@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeremy Kerr wrote:
> Hi Tom,
>
>> Speaking of which, what about some performance numbers? Like Heikki,
>> I'm quite suspicious of whether there is any real-world gain to be
>> had from this approach.
>
> Will send numbers tomorrow, with the reworked patch.

I can easily redo my testing as well if required.

Stefan


From: Chuck McDevitt <cmcdevitt(at)greenplum(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-16 17:57:48
Message-ID: 2106D8DC89010842BABA5CD03FEA4061CA7CFA16@EXVMBX018-10.exch018.msoutlookonline.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Stephen Frost
> Sent: Tuesday, June 16, 2009 5:47 AM
> To: Greg Stark
> Cc: Robert Haas; Jeremy Kerr; <pgsql-hackers(at)postgresql(dot)org>; Alvaro
> Herrera; Stefan Kaltenbrunner; Gurjeet Singh
> Subject: Re: [HACKERS] [PATCH] backend: compare word-at-a-time in
> bcTruelen
>

On 64-bit machines, the native word size is 64-bits (obviously), and comparing 32 bits at a time is much slower than comparing 64 bits at a time.

You might want to consider this.


From: Jeremy Kerr <jk(at)ozlabs(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-17 09:31:06
Message-ID: 200906171731.07149.jk@ozlabs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

> Speaking of which, what about some performance numbers?

OK, benchmarks done:

http://ozlabs.org/~jk/projects/db/data/postgres.bcTruelen/

Summary: small increase in performance (~1-2% on my machine), at about
1.5 standard deviations from the mean. Profiles show a decent drop in
hits within bcTruelen.

However: Sysbench seems to be quite heavy with the fixed-width char
types, so may end up calling bcTruelen more than most workloads. Would
be nice to get some x86 numbers too, but I don't have a suitable machine
here.

So: The increase in performance is positive on this workload, albeit
fairly minor. Downside is increased code complexity.

Will re-send the patch once I work out how to get git to create a
context diff...

Cheers,

Jeremy


From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-17 09:42:12
Message-ID: 65937bea0906170242y75de9f44vdb56c50463690674@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jun 17, 2009 at 3:01 PM, Jeremy Kerr <jk(at)ozlabs(dot)org> wrote:

> Will re-send the patch once I work out how to get git to create a
> context diff...
>

http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git

--
Lets call it Postgres

EnterpriseDB http://www.enterprisedb.com

gurjeet[(dot)singh](at)EnterpriseDB(dot)com
singh(dot)gurjeet(at){ gmail | hotmail | indiatimes | yahoo }.com
Mail sent from my BlackLaptop device


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-17 09:51:29
Message-ID: 4A38BCA1.1020701@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeremy Kerr wrote:
>> Speaking of which, what about some performance numbers?
>
> OK, benchmarks done:
>
> http://ozlabs.org/~jk/projects/db/data/postgres.bcTruelen/
>
> Summary: small increase in performance (~1-2% on my machine), at about
> 1.5 standard deviations from the mean. Profiles show a decent drop in
> hits within bcTruelen.
>
> However: Sysbench seems to be quite heavy with the fixed-width char
> types, so may end up calling bcTruelen more than most workloads. Would
> be nice to get some x86 numbers too, but I don't have a suitable machine
> here.
>
> So: The increase in performance is positive on this workload, albeit
> fairly minor. Downside is increased code complexity.

Based on this benchmark, I don't think this patch is worth it..

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-17 10:00:25
Message-ID: 65937bea0906170300w5c5e84d0g8e44b40c0e4b1ab7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jun 17, 2009 at 3:26 PM, Jeremy Kerr <jk(at)ozlabs(dot)org> wrote:

> Hurjeet,
>
> > http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_G
> >it
>
> Awesome, thanks.
>
> I'm going to wait for a decision before reformatting and sending the
> diff.
>

You might want to submit the patch (if modified from previous submission,
which I think it is), if you want others to benchmark on other archs.

Best regards,
--
Lets call it Postgres

EnterpriseDB http://www.enterprisedb.com

gurjeet[(dot)singh](at)EnterpriseDB(dot)com
singh(dot)gurjeet(at){ gmail | hotmail | indiatimes | yahoo }.com
Mail sent from my BlackLaptop device


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Jeremy Kerr <jk(at)ozlabs(dot)org>, pgsql-hackers(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-17 10:55:31
Message-ID: 4A38CBA3.6000504@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Jeremy Kerr wrote:
>>> Speaking of which, what about some performance numbers?
>>
>> OK, benchmarks done:
>>
>> http://ozlabs.org/~jk/projects/db/data/postgres.bcTruelen/
>>
>> Summary: small increase in performance (~1-2% on my machine), at about
>> 1.5 standard deviations from the mean. Profiles show a decent drop in
>> hits within .
>>
>> However: Sysbench seems to be quite heavy with the fixed-width char
>> types, so may end up calling bcTruelen more than most workloads. Would
>> be nice to get some x86 numbers too, but I don't have a suitable
>> machine here.
>>
>> So: The increase in performance is positive on this workload, albeit
>> fairly minor. Downside is increased code complexity.
>
> Based on this benchmark, I don't think this patch is worth it..

what exactly is getting graphed here - is the what sysbench reports as
transactions/s or the operations/queries per second? if yes it is
important to notice that sysnbench by default executes multiple queries
per transaction and and several different kind of queries. Only some of
those queries have bcTruelen() showing up on top of the profile.

Stefan


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-17 12:41:56
Message-ID: 4A38E494.1020303@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeremy Kerr wrote:
> Hi all,
>
>> Speaking of which, what about some performance numbers?
>
> OK, benchmarks done:
>
> http://ozlabs.org/~jk/projects/db/data/postgres.bcTruelen/
>
> Summary: small increase in performance (~1-2% on my machine), at about
> 1.5 standard deviations from the mean. Profiles show a decent drop in
> hits within bcTruelen.
>
> However: Sysbench seems to be quite heavy with the fixed-width char
> types, so may end up calling bcTruelen more than most workloads. Would
> be nice to get some x86 numbers too, but I don't have a suitable machine
> here.

This improves my numbers from 46000 queries/s to 52000 queries/s for:

--num-threads=16 --test=oltp --db-ps-mode=auto --pgsql-user=postgres
--test=oltp --oltp-read-only=on --oltp-sum-ranges=0
--oltp-simple-ranges=0 --oltp-order-ranges=10 --oltp-point-selects=0
--oltp-distinct-ranges=0 --oltp-skip-trx=off run

with a profile 16 connections like:

samples % symbol name
1925976 23.7136 bcTruelen
329453 4.0564 AllocSetAlloc
292176 3.5974 slot_deform_tuple
278996 3.4351 index_getnext
165397 2.0365 _bt_checkkeys
155621 1.9161 appendBinaryStringInfo
141295 1.7397 LWLockAcquire
136197 1.6769 internal_putbytes
130474 1.6065 AllocSetFree
121341 1.4940 printtup
116413 1.4333 hash_search_with_hash_value
109573 1.3491 FunctionCall2
95101 1.1709 heap_form_minimal_tuple
91475 1.1263 enlargeStringInfo
90841 1.1185 heap_fill_tuple
89926 1.1072 ExecProcNode
86112 1.0603 varstr_cmp

and after the patch applied:

samples % symbol name
2225769 7.8378 bcTruelen
1335050 4.7013 index_getnext
1296272 4.5647 AllocSetAlloc
1132026 3.9863 slot_deform_tuple
702219 2.4728 _bt_checkkeys
640675 2.2561 appendBinaryStringInfo
611783 2.1543 LWLockAcquire
599402 2.1107 AllocSetFree
590557 2.0796 internal_putbytes
525526 1.8506 printtup
476077 1.6765 hash_search_with_hash_value
459660 1.6186 FunctionCall2
423569 1.4916 ExecProcNode
391476 1.3785 heap_form_minimal_tuple
385689 1.3582 heap_fill_tuple
383231 1.3495 enlargeStringInfo
368390 1.2973 comparetup_heap
363481 1.2800 varstr_cmp
360261 1.2686 ExecProject
356696 1.2561 pq_putmessage
354193 1.2473 MemoryContextAlloc
341262 1.2017 LWLockRelease
337970 1.1901 pq_begintypsend
321069 1.1306 pfree
310806 1.0945 tuplesort_gettuple_common

Stefan


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 07:33:08
Message-ID: 1245310388.3895.163.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Tue, 2009-06-16 at 00:24 -0400, Robert Haas wrote:

> There's not much point taking the length of the word

Not sure why we need to be calculating the length here anyway. ISTM that
there is no need to reconfirm the length of the data, since it is
already checked to be that length at insert.

Why is bcTruelen being called so many *more* times?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 11:12:55
Message-ID: 1245323575.3895.194.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Tue, 2009-06-16 at 10:23 -0400, Tom Lane wrote:

> Speaking of which, what about some performance numbers? Like Heikki,
> I'm quite suspicious of whether there is any real-world gain to be had
> from this approach.

It has been "lore" for some time that VARCHAR is cheaper than
VARCHAR(n), so I'm looking forward to this improvement as a real-world
gain. (If done right).

I've looked at the code and the thing that bothers me is that I can't
see where or why bcTruelen would be called more often for VARCHAR(n)
than it would be for VARCHAR, on a Select/Sort only workload.

Are we tuning the right thing? Is there some code we can completely
avoid?

If not, does this mean it is a generic effect? Does this imply that
NUMERIC(n) is somehow worse than NUMERIC? etc..

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Marko Kreen <markokr(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 12:14:41
Message-ID: e51f66da0906180514i1de3b6cdvc04e5134d416c200@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 6/18/09, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Tue, 2009-06-16 at 10:23 -0400, Tom Lane wrote:
> > Speaking of which, what about some performance numbers? Like Heikki,
> > I'm quite suspicious of whether there is any real-world gain to be had
> > from this approach.
>
>
> It has been "lore" for some time that VARCHAR is cheaper than
> VARCHAR(n), so I'm looking forward to this improvement as a real-world
> gain. (If done right).
>
> I've looked at the code and the thing that bothers me is that I can't
> see where or why bcTruelen would be called more often for VARCHAR(n)
> than it would be for VARCHAR, on a Select/Sort only workload.

I'd guess plain VARCHAR simply does not have blanks at the end,
so Truelen is cheap.

> Are we tuning the right thing? Is there some code we can completely
> avoid?
>
> If not, does this mean it is a generic effect? Does this imply that
> NUMERIC(n) is somehow worse than NUMERIC? etc..

Probably not. For numeric the (n) seems to be only checked input time.

--
marko


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 13:08:05
Message-ID: 1245330485.3895.207.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Thu, 2009-06-18 at 15:14 +0300, Marko Kreen wrote:
> On 6/18/09, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On Tue, 2009-06-16 at 10:23 -0400, Tom Lane wrote:
> > > Speaking of which, what about some performance numbers? Like Heikki,
> > > I'm quite suspicious of whether there is any real-world gain to be had
> > > from this approach.
> >
> >
> > It has been "lore" for some time that VARCHAR is cheaper than
> > VARCHAR(n), so I'm looking forward to this improvement as a real-world
> > gain. (If done right).
> >
> > I've looked at the code and the thing that bothers me is that I can't
> > see where or why bcTruelen would be called more often for VARCHAR(n)
> > than it would be for VARCHAR, on a Select/Sort only workload.
>
> I'd guess plain VARCHAR simply does not have blanks at the end,
> so Truelen is cheap.

If we were comparing CHAR(n) with VARCHAR then I could agree. But we are
comparing VARCHAR(n) and VARCHAR. There is no blank padding with
VARCHAR, and the two have identical on-disk representations so the cost
of bcTruelen looks like it should be identical in each case.

Is bcTruelen being called too many times, or is there, as Marko
suggests, an explanation as to why calling bcTruelen costs more when we
have a typmod set?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 13:09:58
Message-ID: 4A3A3CA6.2080106@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Thu, 2009-06-18 at 15:14 +0300, Marko Kreen wrote:
>> On 6/18/09, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> On Tue, 2009-06-16 at 10:23 -0400, Tom Lane wrote:
>>> > Speaking of which, what about some performance numbers? Like Heikki,
>>> > I'm quite suspicious of whether there is any real-world gain to be had
>>> > from this approach.
>>>
>>>
>>> It has been "lore" for some time that VARCHAR is cheaper than
>>> VARCHAR(n), so I'm looking forward to this improvement as a real-world
>>> gain. (If done right).
>>>
>>> I've looked at the code and the thing that bothers me is that I can't
>>> see where or why bcTruelen would be called more often for VARCHAR(n)
>>> than it would be for VARCHAR, on a Select/Sort only workload.
>> I'd guess plain VARCHAR simply does not have blanks at the end,
>> so Truelen is cheap.
>
> If we were comparing CHAR(n) with VARCHAR then I could agree. But we are
> comparing VARCHAR(n) and VARCHAR. There is no blank padding with
> VARCHAR, and the two have identical on-disk representations so the cost
> of bcTruelen looks like it should be identical in each case.

the testcase discusses here is indeed CHAR(n) vs. VARCHAR. To reiterate
- my numbers(8core/16 thread Nehalem xeon with 16 processes) are
46000qps for CHAR(n), 52000 qps for CHAR(n) with Jeremys patch(thout
bcTrueLen is still on top in the profile) and 67000 qps for VARCHAR.

Stefan


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 13:21:39
Message-ID: 1245331299.3895.210.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Thu, 2009-06-18 at 15:09 +0200, Stefan Kaltenbrunner wrote:

> the testcase discusses here is indeed CHAR(n) vs. VARCHAR.

OK, thanks for pointing out my error.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Florian Weimer <fweimer(at)bfk(dot)de>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "\<pgsql-hackers\(at)postgresql(dot)org\>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 13:58:09
Message-ID: 82prd1k5ji.fsf@mid.bfk.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Simon Riggs:

> On Thu, 2009-06-18 at 15:09 +0200, Stefan Kaltenbrunner wrote:
>
>> the testcase discusses here is indeed CHAR(n) vs. VARCHAR.
>
> OK, thanks for pointing out my error.

But I think your point still makes sense. Is it really necessary to
determine the unpadded length for a query such as this one?

"SELECT c from sbtest where id between $1 and $2 order by c"

(See the start of the thread.)

--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 13:59:57
Message-ID: 6829.1245333597@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> Why is bcTruelen being called so many *more* times?

I think you have misunderstood the context. The char(n) code is defined
to store trailing blanks (up to n) but to disregard the trailing blanks
during comparisons. bcTrueLen is invoked during comparisons (not during
storage) to figure out what the "valid" string length is for comparing.

varchar considers any trailing blanks to be real, comparable data,
so it simply hasn't got any equivalent code.

It would be way nicer if we could strip trailing blanks on storage,
and then figure a way to either reconstitute them on output (problem
here is the output function doesn't have access to typmod) or
language-lawyer our way to deciding we don't have to.

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com>, "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc>, "Jeremy Kerr" <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 14:07:10
Message-ID: 4A3A03BE0200002500027D07@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> It would be way nicer if we could strip trailing blanks on storage,
> and then figure a way to either reconstitute them on output

How about pushing it even farther back -- always keep them with
trimmed trailing spaces and add trailing spaces as required in
operator functions?

-Kevin


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 14:44:52
Message-ID: 1245336292.3895.221.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Thu, 2009-06-18 at 09:59 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > Why is bcTruelen being called so many *more* times?
>
> I think you have misunderstood the context.

err, no, I just misread the original text. Possibly a worse error :-?

> It would be way nicer if we could strip trailing blanks on storage,
> and then figure a way to either reconstitute them on output (problem
> here is the output function doesn't have access to typmod) or
> language-lawyer our way to deciding we don't have to.

Is there a spare bit on the varhdr that can be set by datatypes? It
would be useful to have a bit meaning "there is a typmod stored on this
Datum", that would allow the output function to do some special
processing.

I notice we lose on tuple access also. CHAR(n) is fixed length, but is
treated as variable length for offsets.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 14:58:54
Message-ID: 20090618145854.GB5102@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs escribió:

> I notice we lose on tuple access also. CHAR(n) is fixed length, but is
> treated as variable length for offsets.

Fixed character length != fixed byte length

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: genie(dot)japo(at)gmail(dot)com
Subject: Re: Uninstallation error
Date: 2009-06-18 15:11:50
Message-ID: 200906181811.51014.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tuesday 16 June 2009 18:16:58 genie(dot)japo(at)gmail(dot)com wrote:
> Hi,
>
> I've found the uninstallation error...
>
>
> # make uninstall
>
>
>
> n/man7/truncate.7 /usr/local/pgsql/share/man/man7/unlisten.7
> /usr/local/pgsql/share/man/man7/update.7
> /usr/local/pgsql/share/man/man7/vacuum.7
> /usr/local/pgsql/share/man/man7/values.7
> rm: cannot remove `/usr/local/pgsql/share/man/man1/': Is a directory
> rm: cannot remove `/usr/local/pgsql/share/man/man7/': Is a directory
> make[1]: *** [uninstall] Error 1
> make[1]: Leaving directory `/usr/local/src/postgresql-8.4rc1/doc'
> make: *** [uninstall] Error 2
>
>
> Maybe, it is solved by the change in the following. (adding -r option)
>
> doc/Makefile
> 100c100
> < rm -f $(addprefix $(DESTDIR)$(mandir)/, $(shell gunzip -c
> $(srcdir)/man.tar.gz | tar tf - | sed -e 's,man7/,man$(sqlmansectnum)/,' -e
> 's/.7$$/.$(sqlmansect)/'))
> ---
>
> > rm -rf $(addprefix $(DESTDIR)$(mandir)/, $(shell gunzip -c
>
> $(srcdir)/man.tar.gz | tar tf - | sed -e 's,man7/,man$(sqlmansectnum)/,' -e
> 's/.7$$/.$(sqlmansect)/'))

I have fixed this problem.

Next time, please start a new thread for a new issue, so your issue doesn't
get lost.


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, genie(dot)japo(at)gmail(dot)com
Subject: Re: Uninstallation error
Date: 2009-06-18 15:26:53
Message-ID: 20090618152653.GE5102@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut escribió:

> Next time, please start a new thread for a new issue, so your issue doesn't
> get lost.

Gmail has gotten the idiotic idea that changing the subject makes the
message appear as a new thread :-(

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: <genie(dot)japo(at)gmail(dot)com>
To: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Uninstallation error
Date: 2009-06-18 16:54:29
Message-ID: 4a3a7147.23bb720a.762e.ffff8110@mx.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> I have fixed this problem.

Thanks a lot for the fix !

>> Next time, please start a new thread for a new issue, so your issue
doesn't
>> get lost.
> Gmail has gotten the idiotic idea that changing the subject makes the
> message appear as a new thread :-(

Sorry.
This is the first posting in hackers-ML for me.
I don't know why this issue occurs.
Gmail is a little funny :~(~~~

Regards,
Genie Japo


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com>, "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc>, "Jeremy Kerr" <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 16:58:39
Message-ID: 10944.1245344319@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> It would be way nicer if we could strip trailing blanks on storage,
>> and then figure a way to either reconstitute them on output

> How about pushing it even farther back -- always keep them with
> trimmed trailing spaces and add trailing spaces as required in
> operator functions?

I think that's what I said. AFAIK there isn't any place where we'd
need to add back the spaces except the output function. All the
operators would be just as happy if the spaces weren't there.

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com>, "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc>, "Jeremy Kerr" <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 17:15:52
Message-ID: 4A3A2FF80200002500027D8B@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> It would be way nicer if we could strip trailing blanks on
>>> storage, and then figure a way to either reconstitute them on
>>> output
>
>> How about pushing it even farther back -- always keep them with
>> trimmed trailing spaces and add trailing spaces as required in
>> operator functions?
>
> I think that's what I said. AFAIK there isn't any place where we'd
> need to add back the spaces except the output function. All the
> operators would be just as happy if the spaces weren't there.

Sorry, I misread it.

If this can be done, it might ease the transition from other products
for some people. In Sybase there were significant performance
benefits to using char() instead of varchar() in certain
circumstances. In our first test conversion from Sybase we mapped all
domains and columns which were char(n) in Sybase to the same in
PostgreSQL, which caused some pain. It was no big deal to remap those
to varchar() and re-run, but on the face of it, it seems like there
wouldn't be a significant performance hit for char() with this change,
so it would be one less thing for people to stub their toes on during
migration.

-Kevin


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 21:03:45
Message-ID: 1245359025.3895.307.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Thu, 2009-06-18 at 12:58 -0400, Tom Lane wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> It would be way nicer if we could strip trailing blanks on storage,
> >> and then figure a way to either reconstitute them on output
>
> > How about pushing it even farther back -- always keep them with
> > trimmed trailing spaces and add trailing spaces as required in
> > operator functions?
>
> I think that's what I said. AFAIK there isn't any place where we'd
> need to add back the spaces except the output function. All the
> operators would be just as happy if the spaces weren't there.

The overall problem is that we expect the Datum's of a datatype to know
how to display themselves without any access to metadata.

Another way of looking at this might be that we need a default FORMAT
specifier associated with a column. Teradata used the FORMAT specifier
on a column to allow you to specify a default format. That allowed you
to specify leading/trailing zeros/spaces, decimal points and other
characters.

It would be good to have a generic way to specify formatting to a data
type at the point you are manipulating it, rather than globally.

That would be helpful because we currently have some very ugly things
like the DateStyle parameter. That effects the Input and Output of data
into Time/Date columns, forcing you to switch parameter values
constantly as you load/select data. The current way makes it impossible
to load data that has different date formats in different columns, for
example.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 22:54:47
Message-ID: 407d949e0906181554o952f7bs9d84532d26d7a1d4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jun 18, 2009 at 10:03 PM, Simon Riggs<simon(at)2ndquadrant(dot)com> wrote:
>
> The overall problem is that we expect the Datum's of a datatype to know
> how to display themselves without any access to metadata.

Yes

> Another way of looking at this might be that we need a default FORMAT
> specifier associated with a column. Teradata used the FORMAT specifier
> on a column to allow you to specify a default format. That allowed you
> to specify leading/trailing zeros/spaces, decimal points and other
> characters.

I've suggested this in the past in the context of getting rid of the
"cash" data type.

However I'm not sure it solves this problem. In the case where we're
just outputing a column we could arrange to have the typmod pretty
easily. but what would you do if you had some complex expression which
happened to result in a char(). Say something like:

coalesce(nickname, name)

where both nickname and name are, say, char(50). Or for that matter
where they're two different length chars, like char(50) and char(100).

<brainstorming>We could add an integer prefix to the char() datatype
with the "total" length and then just not include the spaces. But that
would make it not binary compatible with text -- which would mean
implementing a whole bunch of casts and operators. Perhaps. I think we
already have rtrim as the cast in one direction and I think we already
have operators which is the whole trigger for the bcTrueLen thing
which started this thread. maybe it wouldn't really be that much of a
pain.</brainstorming>

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Jeremy Kerr <jk(at)ozlabs(dot)org>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-18 23:09:47
Message-ID: 19859.1245366587@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> <brainstorming>We could add an integer prefix to the char() datatype
> with the "total" length and then just not include the spaces. But that
> would make it not binary compatible with text -- which would mean
> implementing a whole bunch of casts and operators.

Um, it's already not binary compatible with text. The space hit would
be a bigger issue ... at least for cases where you don't save much of
anything by eliminating the spaces, which would be exactly the cases
where sane Postgres users use char(n) today.

regards, tom lane


From: Jeremy Kerr <jk(at)ozlabs(dot)org>
To: <pgsql-hackers(at)postgresql(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-24 01:05:03
Message-ID: 1245805503.62025.32374376046.1.gpush@pingu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The current bcTruelen function uses a simple reverse array scan to
find the legth of a space-padded string. On some workloads (it shows
in sysbench), this can result in a lot of time spent in this function.

This change introduces a word-at-a-time comparison in bcTruelen, aiming
to reduce the number of compares where possible. Word-size compares
are performed on aligned sections of the string.

Results in a small performance increase; around 1-2% on my POWER6 test
box.

Signed-off-by: Jeremy Kerr <jk(at)ozlabs(dot)org>

---
Resend: context diff this time

---
src/backend/utils/adt/varchar.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)

*** a/src/backend/utils/adt/varchar.c
--- b/src/backend/utils/adt/varchar.c
***************
*** 624,639 **** varchartypmodout(PG_FUNCTION_ARGS)
static int
bcTruelen(BpChar *arg)
{
char *s = VARDATA_ANY(arg);
int i;
- int len;

! len = VARSIZE_ANY_EXHDR(arg);
! for (i = len - 1; i >= 0; i--)
{
if (s[i] != ' ')
break;
}
return i + 1;
}

--- 624,657 ----
static int
bcTruelen(BpChar *arg)
{
+ const uint32_t spaces = 0x20202020;
+ const int wordsize = sizeof(spaces);
char *s = VARDATA_ANY(arg);
int i;

! i = VARSIZE_ANY_EXHDR(arg) - 1;
!
! /* compare down to an aligned boundary */
! for (; i > 0 && !PointerIsAligned(s + i - (wordsize - 1), uint32_t); i--)
{
if (s[i] != ' ')
+ return i + 1;
+ }
+
+ /* now that we're aligned, compare word at a time */
+ for (; i >= wordsize - 1; i -= wordsize)
+ {
+ if (*(uint32_t *)(s + i - (wordsize - 1)) != spaces)
break;
}
+
+ /* check within the last non-matching word */
+ for (; i >= 0; i--)
+ {
+ if (s[i] != ' ')
+ break;
+ }
+
return i + 1;
}


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-24 02:25:20
Message-ID: 603c8f070906231925r649d1aceq6235eb6094365afb@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 23, 2009 at 9:05 PM, Jeremy Kerr<jk(at)ozlabs(dot)org> wrote:
> Results in a small performance increase; around 1-2% on my POWER6 test
> box.

I'd still like to know the workload and exact numbers.

...Robert


From: Jeremy Kerr <jk(at)ozlabs(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-24 02:46:45
Message-ID: 200906241046.45653.jk@ozlabs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert,

> I'd still like to know the workload and exact numbers.

From up-thread:

http://ozlabs.org/~jk/projects/db/data/postgres.bcTruelen/

- or were there other details you were looking for?

Cheers,

Jeremy


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-24 03:47:48
Message-ID: 603c8f070906232047m794a6fb7h9069028e12835cda@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 23, 2009 at 10:46 PM, Jeremy Kerr<jk(at)ozlabs(dot)org> wrote:
> Robert,
>
>> I'd still like to know the workload and exact numbers.
>
> From up-thread:
>
>  http://ozlabs.org/~jk/projects/db/data/postgres.bcTruelen/
>
> - or were there other details you were looking for?

Oh!

Very nice, sorry, missed the original message.

You probably want to add your patch here:
http://wiki.postgresql.org/wiki/CommitFestOpen

Thanks,

...Robert


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeremy Kerr <jk(at)ozlabs(dot)org>, pgsql-hackers(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-24 05:29:45
Message-ID: 4A41B9C9.3040608@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> On Tue, Jun 23, 2009 at 10:46 PM, Jeremy Kerr<jk(at)ozlabs(dot)org> wrote:
>> Robert,
>>
>>> I'd still like to know the workload and exact numbers.
>> From up-thread:
>>
>> http://ozlabs.org/~jk/projects/db/data/postgres.bcTruelen/
>>
>> - or were there other details you were looking for?
>
> Oh!
>
> Very nice, sorry, missed the original message.
>
> You probably want to add your patch here:
> http://wiki.postgresql.org/wiki/CommitFestOpen

FWIW: I'm able to measure an even more significant improvement of around
10%:

http://archives.postgresql.org/pgsql-hackers/2009-06/msg01041.php on a
specific query.

Stefan


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-24 11:11:11
Message-ID: 20090624111111.GE20436@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Stefan Kaltenbrunner (stefan(at)kaltenbrunner(dot)cc) wrote:
> FWIW: I'm able to measure an even more significant improvement of around
> 10%:

What would be really useful would be "best case" and "worst case"
scenarios. Ideally, with profile information for this specific function
(in addition to full benchmark runs since those can show minimal
performance improvments due to other constraints, locking, etc).

Have you tried pulling this function out and running tests with all
alignment possibilities to see how it compares to the original? There's
only so many options and showing that this change always improves
performance, or at least doesn't degrade it in the worst case, would
certainly go a long way towards getting it accepted..

Thanks,

Stephen


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-24 11:23:47
Message-ID: 4A420CC3.3080702@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephen Frost wrote:
> * Stefan Kaltenbrunner (stefan(at)kaltenbrunner(dot)cc) wrote:
>> FWIW: I'm able to measure an even more significant improvement of around
>> 10%:
>
> What would be really useful would be "best case" and "worst case"
> scenarios. Ideally, with profile information for this specific function
> (in addition to full benchmark runs since those can show minimal
> performance improvments due to other constraints, locking, etc).

not sure what you are after here - my testcase is one specific query run
in parallel on 16 processes (running it in a single process yields
similiar improvements our scaling is pretty good here).

>
> Have you tried pulling this function out and running tests with all
> alignment possibilities to see how it compares to the original? There's
> only so many options and showing that this change always improves
> performance, or at least doesn't degrade it in the worst case, would
> certainly go a long way towards getting it accepted.

again I'm not exactly sure on what you want to get tested specifically -
the original problem report was because I noticed a significant
performance improvement from using varchar() instead of char(n) and that
showed bcTruelen() on the very top of the profile.
I was simply testing the patch Jeremy provided on that workload(the
upthread mentioned query is mostly affected by that on others there is
no measurable difference)

Stefan


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-24 11:32:59
Message-ID: 20090624113259.GF20436@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stefan,

* Stefan Kaltenbrunner (stefan(at)kaltenbrunner(dot)cc) wrote:
> Stephen Frost wrote:
>> What would be really useful would be "best case" and "worst case"
>> scenarios. Ideally, with profile information for this specific function
>> (in addition to full benchmark runs since those can show minimal
>> performance improvments due to other constraints, locking, etc).
>
> not sure what you are after here - my testcase is one specific query run
> in parallel on 16 processes (running it in a single process yields
> similiar improvements our scaling is pretty good here).

What I'm suggesting is to test what happens when strings sent to
bcTruelen are at different memory addresses (there's only 4 or 8
different possibilities here, then different lengths should be tested of
less-than-1-word, 1-word, >1-word, and then different ending points at
different memory addresses) and compare the new function to the old.

The main point here being to try and find out if there are any
pathological cases where the new function is much worse, or even
somewhat worse, so we can weigh that against the places where it
performs better.

This isn't something you can do from in PG. :)

> I was simply testing the patch Jeremy provided on that workload(the
> upthread mentioned query is mostly affected by that on others there is
> no measurable difference)

Certainly, your performance numbers are good justification if there's no
downside.

Thanks,

Stephen


From: Jeremy Kerr <jk(at)ozlabs(dot)org>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-26 02:18:58
Message-ID: 200906261018.58495.jk@ozlabs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Stephen,

> What would be really useful would be "best case" and "worst case"
> scenarios.

I've put together some data from a microbenchmark of the bcTrulen
function, patched and unpatched.

As for best-case, if you have a long string of trailing spaces, we can
go through them at theoretically one quarter of cost (a test benchmark
on x86 shows an actual reduction of 11 to 3 sec with a string of 100
spaces).

Worst-case behaviour is with smaller numbers of spaces. Here are the
transition points (ie, where doing the word-wise comparison is faster
than byte-wise) that I see from my benchmark:

align spaces
3 7
2 6
1 5
0 4

- where 'align' is the alignment of the first byte to compare (ie, at
the end of the string). This is pretty much as-expected, as these
transition points are the first opportunity that the new function has to
do a word compare.

In the worst cases, I see a 53% cost increase on x86 (with the string
'aaa ') and a 97% increase on PowerPC ('a ').

So, it all depends on the number of padding spaces we'd expect to see on
workload data. Fortunately, we see the larger reductions on the more
expensive operations (ie, longer strings).

Cheers,

Jeremy


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-26 02:48:58
Message-ID: 22169.1245984538@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeremy Kerr <jk(at)ozlabs(dot)org> writes:
> I've put together some data from a microbenchmark of the bcTrulen
> function, patched and unpatched.

Uh, where's the data?

> In the worst cases, I see a 53% cost increase on x86 (with the string
> 'aaa ') and a 97% increase on PowerPC ('a ').
> So, it all depends on the number of padding spaces we'd expect to see on
> workload data. Fortunately, we see the larger reductions on the more
> expensive operations (ie, longer strings).

Unfortunately, the cases with lots of padding spaces are probably much
less probable than the cases with fewer. It would be unpleasant for
example if this patch resulted in a severe performance degradation for
a "canonical" example of char(n) being used properly, such as char(2)
for US state abbreviations.

regards, tom lane


From: Jeremy Kerr <jk(at)ozlabs(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-26 03:20:39
Message-ID: 200906261120.39718.jk@ozlabs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom,

> > I've put together some data from a microbenchmark of the bcTrulen
> > function, patched and unpatched.
>
> Uh, where's the data?

If you're after the raw data for a run, I've put it up:

http://ozlabs.org/~jk/projects/db/data/bctruelen.csv

I've also packaged up the quick-and-dirty benchmark I've been using:

http://ozlabs.org/~jk/projects/db/bctrulen.tar.gz

to recreate as-needed. ./benchmark.sh will run tests for varying-length
strings (so changing the alignment) and varying numbers of trailing
spaces. ./benchmark-bctruelen <str> will perform an individual old/new
comparison.

> Unfortunately, the cases with lots of padding spaces are probably
> much less probable than the cases with fewer. It would be unpleasant
> for example if this patch resulted in a severe performance
> degradation for a "canonical" example of char(n) being used properly,
> such as char(2) for US state abbreviations.

Yep, makes sense. The other consideration is stock-ticker symbols, I
assume they may also be stored in CHAR(small n) columns.

Cheers,

Jeremy


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-26 07:16:29
Message-ID: 2E117E25-084A-4B8D-9C59-39F5FD4363DE@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Le 26 juin 09 à 05:20, Jeremy Kerr a écrit :
>> Unfortunately, the cases with lots of padding spaces are probably
>> much less probable than the cases with fewer. It would be unpleasant
>> for example if this patch resulted in a severe performance
>> degradation for a "canonical" example of char(n) being used properly,
>> such as char(2) for US state abbreviations.
>
> Yep, makes sense. The other consideration is stock-ticker symbols, I
> assume they may also be stored in CHAR(small n) columns.

Could this optimisation only kicks in when n is "big enough"?
I'm don't know if this part of the code knows the typmod...

Regards,
--
dim


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Jeremy Kerr <jk(at)ozlabs(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-26 12:03:04
Message-ID: 20090626120304.GH20436@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Dimitri Fontaine (dfontaine(at)hi-media(dot)com) wrote:
> Le 26 juin 09 à 05:20, Jeremy Kerr a écrit :
>>> Unfortunately, the cases with lots of padding spaces are probably
>>> much less probable than the cases with fewer. It would be unpleasant
>>> for example if this patch resulted in a severe performance
>>> degradation for a "canonical" example of char(n) being used properly,
>>> such as char(2) for US state abbreviations.
>>
>> Yep, makes sense. The other consideration is stock-ticker symbols, I
>> assume they may also be stored in CHAR(small n) columns.
>
> Could this optimisation only kicks in when n is "big enough"?
> I'm don't know if this part of the code knows the typmod...

Is it just the size that matters, or is it when there are few spaces at
the end? We do know the overall length, but I didn't see a definite
that if it's larger than X words, doing the by-word comparison is a win
regardless of how many actual spaces are at the end (apologies to Jeremy
if it's in his more detailed report, I havn't had a chance to look yet).

Thanks,

Stpehen


From: Jeremy Kerr <jk(at)ozlabs(dot)org>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-26 12:07:41
Message-ID: 200906262007.42636.jk@ozlabs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephen,

> Is it just the size that matters, or is it when there are few spaces
> at the end?

It's the number of spaces at the end. If we knew this number, then we
wouldn't have to do any comparisons at all :)

Cheers,

Jeremy


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-26 12:12:12
Message-ID: 20090626121211.GI20436@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Jeremy Kerr (jk(at)ozlabs(dot)org) wrote:
> > Is it just the size that matters, or is it when there are few spaces
> > at the end?
>
> It's the number of spaces at the end. If we knew this number, then we
> wouldn't have to do any comparisons at all :)

I meant in terms of affecting the performance of this function.. We
know the total length of the string, including spaces, coming into the
function. If the updated function is always faster when the overall
string is at least, say, 16 characters long, then Dimitri's suggestion
to just check the overall length coming in and decide which approach to
use might make sense. If the new function is always slower, regardless
of overall string length, when there's only 1 extra space at the end,
then chances are it's not worth it.

Thanks,

Stephen


From: Jeremy Kerr <jk(at)ozlabs(dot)org>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-26 12:47:00
Message-ID: 200906262047.00727.jk@ozlabs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephen,

> If the updated function is always faster when the overall string is at
> least, say, 16 characters long,

But that's not the case - the cost of the function (and the speedup from
the previous version) depends on the number of spaces that there are at
the end.

For the new function to be faster, we need to know that there are more
than 6 (on average, depending on alignment) trailing spaces. The number
of non-space characters in the string won't affect performance (other
than that there is probably a correlation between total string length
and number of spaces, but we can't be certain of that).

> If the new function is always slower, regardless of overall string
> length, when there's only 1 extra space at the end

Yes, that is correct.

Cheers,

Jeremy


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-26 13:41:07
Message-ID: 4621.1246023667@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeremy Kerr <jk(at)ozlabs(dot)org> writes:
> Stephen,
>> If the updated function is always faster when the overall string is at
>> least, say, 16 characters long,

> But that's not the case - the cost of the function (and the speedup from
> the previous version) depends on the number of spaces that there are at
> the end.

Right, but there are certainly not more spaces than there are string
characters ;-)

I think Dimitri's idea is eminently worth trying. In a string of less
than, say, 16 bytes, the prospects of being able to win anything get
much smaller compared to the prospects of wasting the extra loop
overhead. There is also a DBA psychology angle to it. If you've got
CHAR(n) for very small n, it's likely that the type is being used in the
"canonical" fashion and there won't be many trailing blanks. The case
where we can hope to win is where we have CHAR(255) or some other
plucked-from-the-air limit.

regards, tom lane


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Jeremy Kerr <jk(at)ozlabs(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-26 15:03:11
Message-ID: 80F3A08D-54AC-4E6B-AAF9-D64F14FD324C@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Le 26 juin 09 à 14:47, Jeremy Kerr a écrit :
> For the new function to be faster, we need to know that there are more
> than 6 (on average, depending on alignment) trailing spaces.

It's becoming somewhat tricky, but maybe the test to do for the
optimisation to get used is n >= threshold && str[n-6] == 0x20, àla
Boyer/Moore?

I call it tricky because you could have a space here which isn't
followed by spaces, but still, if it's not a space here, you're saying
we should not even try the optimisation.

--
dim


From: tomas(at)tuxteam(dot)de
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Jeremy Kerr <jk(at)ozlabs(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-26 15:29:06
Message-ID: 20090626152906.GA948@tomas
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, Jun 26, 2009 at 05:03:11PM +0200, Dimitri Fontaine wrote:
> Le 26 juin 09 à 14:47, Jeremy Kerr a écrit :
>> For the new function to be faster, we need to know that there are more
>> than 6 (on average, depending on alignment) trailing spaces.
>
> It's becoming somewhat tricky, but maybe the test to do for the
> optimisation to get used is n >= threshold && str[n-6] == 0x20, àla
> Boyer/Moore?

That's cute. What about comparing the last aligned word which completely
fits in the buffer? Something along the lines of (assuming four-byte
words)

* (int*) (4 * ((int) &buf[0]) / 4)

(now that's an ugly one, but you know what I mean?)

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKROlCBcgs9XrR2kYRArHPAJ9VhT+RfK5/5BxwA0nxaOmK4nfuWACdFtFL
iKtvPaZY/KhDJMOf4hyzmQI=
=yd05
-----END PGP SIGNATURE-----


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: tomas(at)tuxteam(dot)de
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-26 15:39:51
Message-ID: 7337.1246030791@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

tomas(at)tuxteam(dot)de writes:
> On Fri, Jun 26, 2009 at 05:03:11PM +0200, Dimitri Fontaine wrote:
>> It's becoming somewhat tricky, but maybe the test to do for the
>> optimisation to get used is n >= threshold && str[n-6] == 0x20, la
>> Boyer/Moore?

> That's cute. What about comparing the last aligned word which completely
> fits in the buffer? Something along the lines of (assuming four-byte
> words)
> * (int*) (4 * ((int) &buf[0]) / 4)

We're trying to avoid adding cycles to the optimization-is-useless case.
The more expensive this test gets, the slower the unoptimizable case
becomes.

regards, tom lane


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: tomas(at)tuxteam(dot)de, "Dimitri Fontaine" <dfontaine(at)hi-media(dot)com>, "Jeremy Kerr" <jk(at)ozlabs(dot)org>, "Stephen Frost" <sfrost(at)snowman(dot)net>, "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2009-06-26 16:15:44
Message-ID: 49669.71.76.58.95.1246032944.squirrel@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, June 26, 2009 11:39 am, Tom Lane wrote:
> tomas(at)tuxteam(dot)de writes:
>> On Fri, Jun 26, 2009 at 05:03:11PM +0200, Dimitri Fontaine wrote:
>>> It's becoming somewhat tricky, but maybe the test to do for the
>>> optimisation to get used is n >= threshold && str[n-6] == 0x20, àla
>>> Boyer/Moore?
>
>> That's cute. What about comparing the last aligned word which completely
>> fits in the buffer? Something along the lines of (assuming four-byte
>> words)
>> * (int*) (4 * ((int) &buf[0]) / 4)
>
> We're trying to avoid adding cycles to the optimization-is-useless case.
> The more expensive this test gets, the slower the unoptimizable case
> becomes.
>

Yeah. Like you, I like the idea of a switch based on string length. I
would suggest a cutoff of something like 36 (length of the string
representation of a UUID). But maybe that will miss lots of optimisable
cases like address fields. I guess those people should really be using
varchar(n) anyway.

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Kerr <jk(at)ozlabs(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2010-02-23 02:35:41
Message-ID: 201002230235.o1N2Zfk14002@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Jeremy Kerr <jk(at)ozlabs(dot)org> writes:
> > Stephen,
> >> If the updated function is always faster when the overall string is at
> >> least, say, 16 characters long,
>
> > But that's not the case - the cost of the function (and the speedup from
> > the previous version) depends on the number of spaces that there are at
> > the end.
>
> Right, but there are certainly not more spaces than there are string
> characters ;-)
>
> I think Dimitri's idea is eminently worth trying. In a string of less
> than, say, 16 bytes, the prospects of being able to win anything get
> much smaller compared to the prospects of wasting the extra loop
> overhead. There is also a DBA psychology angle to it. If you've got
> CHAR(n) for very small n, it's likely that the type is being used in the
> "canonical" fashion and there won't be many trailing blanks. The case
> where we can hope to win is where we have CHAR(255) or some other
> plucked-from-the-air limit.

What ever happened to this patch?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeremy Kerr <jk(at)ozlabs(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2010-02-23 18:20:20
Message-ID: 603c8f071002231020h5217a8d1xfe827fe355511523@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 22, 2010 at 9:35 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Tom Lane wrote:
>> Jeremy Kerr <jk(at)ozlabs(dot)org> writes:
>> > Stephen,
>> >> If the updated function is always faster when the overall string is at
>> >> least, say, 16 characters long,
>>
>> > But that's not the case - the cost of the function (and the speedup from
>> > the previous version) depends on the number of spaces that there are at
>> > the end.
>>
>> Right, but there are certainly not more spaces than there are string
>> characters ;-)
>>
>> I think Dimitri's idea is eminently worth trying.  In a string of less
>> than, say, 16 bytes, the prospects of being able to win anything get
>> much smaller compared to the prospects of wasting the extra loop
>> overhead.  There is also a DBA psychology angle to it.  If you've got
>> CHAR(n) for very small n, it's likely that the type is being used in the
>> "canonical" fashion and there won't be many trailing blanks.  The case
>> where we can hope to win is where we have CHAR(255) or some other
>> plucked-from-the-air limit.
>
> What ever happened to this patch?

I think it's unclear that all of the best and worst cases have been
sufficiently tested and that the results are satisfactory. We have
everything from massive performance gains to no obvious benefit at
all, and it's very unclear that anyone has made a serious effort to
find a benchmark the worst-case scenarios. I think we should drop
this for now. *If* someone wants to put some work into more thorough
analysis for 9.1, we can revisit it then.

...Robert


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeremy Kerr <jk(at)ozlabs(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, pgsql-hackers(at)postgresql(dot)org, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2010-02-23 18:23:56
Message-ID: 20100223182356.GJ3672@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas escribió:
> On Mon, Feb 22, 2010 at 9:35 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> > What ever happened to this patch?
>
> I think it's unclear that all of the best and worst cases have been
> sufficiently tested and that the results are satisfactory. We have
> everything from massive performance gains to no obvious benefit at
> all, and it's very unclear that anyone has made a serious effort to
> find a benchmark the worst-case scenarios. I think we should drop
> this for now. *If* someone wants to put some work into more thorough
> analysis for 9.1, we can revisit it then.

+1 -- it's not like it hasn't been a problem all along.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeremy Kerr <jk(at)ozlabs(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2010-02-23 18:29:27
Message-ID: 4B841E87.8090905@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Robert Haas escribió:
>> On Mon, Feb 22, 2010 at 9:35 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
>>> What ever happened to this patch?
>> I think it's unclear that all of the best and worst cases have been
>> sufficiently tested and that the results are satisfactory. We have
>> everything from massive performance gains to no obvious benefit at
>> all, and it's very unclear that anyone has made a serious effort to
>> find a benchmark the worst-case scenarios. I think we should drop
>> this for now. *If* someone wants to put some work into more thorough
>> analysis for 9.1, we can revisit it then.
>
> +1 -- it's not like it hasn't been a problem all along.

hmm I tend to disagree, this patch was specifically done to address a
hotspot I noticed under a given workload and it helped a lot for that
workload(like getting 6000qps more is pretty neat imho).
While people might not use fixed width chars that often(which especially
for migrated database is imho not true) it is an issue that can be seen
with a rather popular database related benchmarking tool so we should
not really dismiss it easily.

Stefan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Jeremy Kerr <jk(at)ozlabs(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2010-02-23 18:39:56
Message-ID: 29879.1266950396@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
> hmm I tend to disagree, this patch was specifically done to address a
> hotspot I noticed under a given workload and it helped a lot for that
> workload(like getting 6000qps more is pretty neat imho).
> While people might not use fixed width chars that often(which especially
> for migrated database is imho not true) it is an issue that can be seen
> with a rather popular database related benchmarking tool so we should
> not really dismiss it easily.

Nobody suggested dismissing it. The point was that it hasn't been
tested adequately to justify applying it now.

regards, tom lane


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Jeremy Kerr <jk(at)ozlabs(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2010-02-23 18:42:34
Message-ID: 4B84219A.9000602@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>> hmm I tend to disagree, this patch was specifically done to address a
>> hotspot I noticed under a given workload and it helped a lot for that
>> workload(like getting 6000qps more is pretty neat imho).
>> While people might not use fixed width chars that often(which especially
>> for migrated database is imho not true) it is an issue that can be seen
>> with a rather popular database related benchmarking tool so we should
>> not really dismiss it easily.
>
> Nobody suggested dismissing it. The point was that it hasn't been
> tested adequately to justify applying it now.

not sure what testing people want to get done though (there are a fair
amount of results and profiles in the thread)?

Stefan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Jeremy Kerr <jk(at)ozlabs(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2010-02-23 18:51:22
Message-ID: 205.1266951082@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
> Tom Lane wrote:
>> Nobody suggested dismissing it. The point was that it hasn't been
>> tested adequately to justify applying it now.

> not sure what testing people want to get done though (there are a fair
> amount of results and profiles in the thread)?

Robert was complaining that the worst case hadn't been characterized
adequately, which I agree with. We know it helps a lot on certain
cases, but what's the downside?

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Kerr <jk(at)ozlabs(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Subject: Re: [PATCH] backend: compare word-at-a-time in bcTruelen
Date: 2010-02-23 21:21:47
Message-ID: 201002232121.o1NLLlr03268@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
> > Tom Lane wrote:
> >> Nobody suggested dismissing it. The point was that it hasn't been
> >> tested adequately to justify applying it now.
>
> > not sure what testing people want to get done though (there are a fair
> > amount of results and profiles in the thread)?
>
> Robert was complaining that the worst case hadn't been characterized
> adequately, which I agree with. We know it helps a lot on certain
> cases, but what's the downside?

Added to TODO list so we don't forget about it:

Considering improving performance of computing CHAR() value lengths

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +