From: Andy Colson <andy(at)squeakycode(dot)net>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: texteq/byteaeq: avoid detoast [REVIEW]
Date: 2011-01-16 19:05:11
Message-ID: 4D334167.7060108@squeakycode.net
Lists: pgsql-hackers

This is a review of:
https://commitfest.postgresql.org/action/patch_view?id=468

Purpose:
========
Equality and inequality _may_ be determined quickly when the two values have different lengths. This _may_ be a huge speed-up if we don't have to detoast.
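
A rough sketch of the idea (my own illustration, not taken from the patch): comparing a large value against a constant of a different length can, with the patch, answer = or <> from the stored length alone, while a same-length comparison still has to detoast and compare bytes.

-- illustrative only; repeat() output compresses well, so these values may be
-- stored compressed rather than out-of-line, but the length check applies
-- either way
create temp table t(v text);
insert into t select repeat('x', 10000) from generate_series(1, 1000);

-- lengths differ: with the patch this can return false without detoasting v
select count(*) from t where v = 'nope';

-- lengths match: a full detoast-and-compare is still required
select count(*) from t where v = repeat('x', 10000);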

The Patch:
==========
I was able to read and understand the patch; it's a simple change and looked correct to me (a non-PG-hacker).
It applies cleanly to git head, and compiles and runs fine with debug enabled.

make check passes

Usability:
==========
I used _may_ above. The benchmark included with the patch, showing huge speedups, is really contrived. It uses a where clause with a thousand-character constant: (where c = 'long...long...long...long...ConstantText...etc'). In my opinion this is very uncommon (the author does note it is a "best case"). If a field is large enough to be toasted, you are probably not going to search on it directly; you are going to have an indexed ID field instead (select c ... where id = 7).

This also only touches = and <>; the > and < operators and LIKE won't benefit, so the scope of the patch is limited.
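
For instance (my illustration; t and c are hypothetical names):

-- go through texteq/textne, so the length fast path applies
select count(*) from t where c = 'val';
select count(*) from t where c <> 'val';

-- not touched by the patch: ordering and pattern matching still detoast
select count(*) from t where c > 'val';
select count(*) from t where c like 'val%';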

That being said, the patch is simple, and if you do happen to hit the code, it will speed things up. As a user of PG I'd like to have this included. It's a corner case, but a big corner, and it's a small, simple change that won't slow anything else down.

Performance:
============
I created a more real-world test: a table with indexes, IDs, and a large toasted field.

create table junk(id serial primary key, xgroup integer, c text);
create index junk_group on junk(xgroup);

I filled it full of junk:

do $$
declare
  i integer;
  j integer;
begin
  for i in 1..100 loop
    for j in 1..500 loop
      -- one small, searchable value ('c' || i) per iteration
      insert into junk(xgroup, c) values (j, 'c' || i);
      -- five large, toastable values per iteration
      insert into junk(xgroup, c) select j, repeat('abc', 2000) || n from generate_series(1, 5) n;
    end loop;
  end loop;
end$$;

This makes about 600 records within each xgroup, as well as a simple 'c15'-style value in c that we can search for. My thinking is that you may not know the exact unique id, but you do know which group it's in, so that cuts out well over 90% of the records; then you add " and c = 'c15'" to get the exact row you want.
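
To sanity-check the layout (my own addition, not part of the original test), pg_column_size shows the stored (possibly compressed) size of each value:

-- illustrative check: small probe values vs. large toastable ones in one group
select length(c) > 100 as big,
       count(*),
       avg(pg_column_size(c))::int as avg_stored_bytes
from junk
where xgroup = 15
group by 1;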

I still saw a nice performance boost.

Old PG:
$ psql < bench3.sql
Timing is on.
DO
Time: 2010.412 ms

Patched:
$ psql < bench3.sql
Timing is on.
DO
Time: 184.602 ms

bench3.sql:
do $$
declare
  i integer;
begin
  for i in 1..400 loop
    perform count(*) from junk where xgroup = i and c like 'c' || i;
  end loop;
end$$;

Summary:
========
Performance speed-up: Oh yeah! But only if you happen to hit this code path, and if you hit it often, you might want to re-think your layout a little bit.

Do I want it? Yes please.
