Re: sortsupport for text

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sortsupport for text
Date: 2012-06-16 21:50:27
Message-ID: CAEYLb_Wud3m49qKh_4Ki+7AAkabYqmVTM0Zw18bM9jZkT-nprg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 18 March 2012 15:08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> However, it occurred to me that we could pretty easily jury-rig
> something that would give us an idea about the actual benefit available
> here.  To wit: make a C function that wraps strxfrm, basically
> strxfrm(text) returns bytea.  Then compare the performance of
> ORDER BY text_col to ORDER BY strxfrm(text_col).
>
> (You would need to have either both or neither of text and bytea
> using the sortsupport code paths for this to be a fair comparison.)

I thought this was an interesting idea, so decided to try it out for
myself. I tried this out against master (not Robert's patch, per Tom's
direction). The results were interesting:

[peter(at)peterlaptop strxfrm_test]$ pgbench postgres -T 60 -f sort_strxfrm.sql -n
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 2795
tps = 46.563970 (including connections establishing)
tps = 46.568234 (excluding connections establishing)
[peter(at)peterlaptop strxfrm_test]$ pgbench postgres -T 60 -f sort_reg.sql -n
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 2079
tps = 34.638838 (including connections establishing)
tps = 34.640665 (excluding connections establishing)

The first test executed the following query against the dellstore database:

select * from products order by strxfrm_test(actor) offset 10001;

The second:

select * from products order by actor offset 10001;

So, this was pretty good - an improvement that is completely
independent of Robert's. Bear in mind, this simple demonstration adds
additional fmgr overhead, which we have plenty of reason to believe
could hurt things, besides which each call must allocate memory that
could perhaps be avoided. In addition, I don't know enough about
locale-aware sorting and related algorithms to have devised a test
that would stress strxfrm()/ strcoll() - these were all strings that
could be represented as ASCII.

In light of this, I think there is a pretty strong case to be made for
pre-processing text via strxfrm() as part of this patch.

Thoughts?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment Content-Type Size
strxfrm_test.tar.gz application/x-gzip 2.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2012-06-16 22:25:15 Re: measuring spinning
Previous Message Tom Lane 2012-06-16 21:38:18 s/UNSPECIFIED/SIMPLE/ in foreign key code?