Re: 1 char in the world

From: alien <alien(at)spaceship(dot)com>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: 1 char in the world
Date: 2003-02-02 18:09:29
Message-ID: 3E3D5ED9.3050608@spaceship.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote:
> I don't believe those numbers for a moment. All else being equal,
> comparing a "char" field to a literal should be exactly the same speed
> as comparing a bool field to a literal (and if you'd just said "where bool",
> the bool field would be faster). Both ought to be markedly faster than
> text.
>
> Look for errors in your test procedure. One thing I'd particularly
> wonder about is whether the query plans are the same. In the absence of
> any VACUUM ANALYZE data, I'd fully expect the planner to pick a
> different plan for a bool field than text/char --- because even without
> ANALYZE data, it knows that a bool column has only two possible values.

Well, the previous test was done on REAL data. Everything was indexed
and vacuum analyzed as it should be.

However, I generated some test data under "controlled" circumstances and
did get different results. Bear in mind, though, that the data is no
longer "real", and doesn't represent the system I am concerned about.

[Someone requested some tests with int4/int8, too, so I included them,
as well. However, I would never use 4 or 8 bytes to store one bit.
Since a byte is platform-atomic, however, I will use a whole byte for a
single bit, as bit packing is too expensive.]

create table booltest (
boo boolean,
cha "char",
txt text,
in4 int4,
in8 int8
);

Insert lots of data here, but stay consistent between fields. [If you
insert a TRUE into a boolean, put a 'Y' into a text or "char" field and
a 1 into an int type.] So, I basically had 2 different insert
statements (one for true and one for false), and I used a random number
generator to get a good distribution of them.

create index booidx on booltest(boo);
create index chaidx on booltest(cha);
create index txtidx on booltest(txt);
create index in4idx on booltest(in4);
create index in8idx on booltest(in8);

vacuum full verbose analyze booltest;
INFO: --Relation public.booltest--
INFO: Pages 6897: Changed 0, reaped 0, Empty 0, New 0; Tup 1000000: Vac
0, Keep/VTL 0/0, UnUsed 0, MinLen 52, MaxLen 52; Re-using: Free/Avail.
Space 362284/362284; EndEmpty/Avail. Pages 0/6897.
CPU 0.53s/0.41u sec elapsed 18.69 sec.
INFO: Index booidx: Pages 2193; Tuples 1000000.
CPU 0.24s/0.11u sec elapsed 3.33 sec.
INFO: Index chaidx: Pages 2193; Tuples 1000000.
CPU 0.23s/0.19u sec elapsed 4.01 sec.
INFO: Index txtidx: Pages 2745; Tuples 1000000.
CPU 0.51s/0.14u sec elapsed 4.07 sec.
INFO: Index in4idx: Pages 2193; Tuples 1000000.
CPU 0.20s/0.17u sec elapsed 3.51 sec.
INFO: Index in8idx: Pages 2745; Tuples 1000000.
CPU 0.26s/0.04u sec elapsed 1.92 sec.
INFO: Rel booltest: Pages: 6897 --> 6897; Tuple(s) moved: 0.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: --Relation pg_toast.pg_toast_4327226--
INFO: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0,
Keep/VTL 0/0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space
0/0; EndEmpty/Avail. Pages 0/0.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: Index pg_toast_4327226_index: Pages 1; Tuples 0.
CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO: Analyzing public.booltest
VACUUM

Count our test set:
select count(*) from booltest; [ALL] output:1000000
select count(*) from booltest where boo; [TRUES] output:498649

TESTS .....

1) INT8=1
explain analyze select count(*) from booltest where in8 = '1';
Aggregate (cost=1342272.26..1342272.26 rows=1 width=0) (actual
time=3434.37..3434.37 rows=1 loops=1)
-> Index Scan using in8idx on booltest (cost=0.00..1340996.42
rows=510333 width=0) (actual time=6.96..2704.45 rows=498649 loops=1)
Index Cond: (in8 = 1::bigint)
Total runtime: 3434.50 msec

2) INT4=1
explain analyze select count(*) from booltest where in4 = 1;
Aggregate (cost=1341990.26..1341990.26 rows=1 width=0) (actual
time=3219.24..3219.24 rows=1 loops=1)
-> Index Scan using in4idx on booltest (cost=0.00..1340714.42
rows=510333 width=0) (actual time=12.92..2548.20 rows=498649 loops=1)
Index Cond: (in4 = 1)
Total runtime: 3219.35 msec

3) TEXT='Y'
explain analyze select count(*) from booltest where txt = 'Y';
Aggregate (cost=1342272.26..1342272.26 rows=1 width=0) (actual
time=4820.06..4820.06 rows=1 loops=1)
-> Index Scan using txtidx on booltest (cost=0.00..1340996.42
rows=510333 width=0) (actual time=15.83..4042.07 rows=498649 loops=1)
Index Cond: (txt = 'Y'::text)
Total runtime: 4820.18 msec

4) BOOLEAN=true
explain analyze select count(*) from booltest where boo = true;
Aggregate (cost=1341990.26..1341990.26 rows=1 width=0) (actual
time=3437.30..3437.30 rows=1 loops=1)
-> Index Scan using booidx on booltest (cost=0.00..1340714.42
rows=510333 width=0) (actual time=28.16..2751.38 rows=498649 loops=1)
Index Cond: (boo = true)
Total runtime: 3437.42 msec

5) BOOLEAN [implied = true]
explain analyze select count(*) from booltest where boo;
Aggregate (cost=100018172.83..100018172.83 rows=1 width=0) (actual
time=2775.40..2775.40 rows=1 loops=1)
-> Seq Scan on booltest (cost=100000000.00..100016897.00 rows=510333
width=0) (actual time=0.10..2138.11 rows=498649 loops=1)
Filter: boo
Total runtime: 2775.50 msec

6) "char"='Y'
explain analyze select count(*) from booltest where cha = 'Y';
Aggregate (cost=1341990.26..1341990.26 rows=1 width=0) (actual
time=3379.71..3379.71 rows=1 loops=1)
-> Index Scan using chaidx on booltest (cost=0.00..1340714.42
rows=510333 width=0) (actual time=32.77..2695.77 rows=498649 loops=1)
Index Cond: (cha = 'Y'::"char")
Total runtime: 3379.82 msec

Average ms over 42 attempts per test, some one-after-the-other, others
mixed with other queries, was:

1) INT8=1 3229.76
2) INT4=1 3194.45
3) TEXT='Y' 4799.23
4) BOOLEAN=true 3283.30
5) BOOLEAN 2801.83
6) "char"='Y' 3290.15

The straight boolean test was the fastest at 2.8 secs, and the TEXT was
the slowest at 4.8 secs. Everything else settled in the same pot at
3.25 secs.

I wasn't too impressed with any of these times, actually, but I'm
bearing in mind that we are talking about an aggregate, which I have
learned much about in the last few days from the mailing list, and which
I expect to be slow in PG.

Since straight-BOOLEAN [not adding "= true" to the condition] is about
15% faster than "char", I will stick with BOOLEAN.

My immediate future plans also include recoding my system to never use
aggregates in "live" queries. I will be tracking statistics in real
time in statistics tables instead of hitting the database with an
aggregate. It is much cheaper to add a few milliseconds per insert than
to slow my whole system down for several seconds during an aggregate query.

In fact, if you have a sizable table, and especially if you are running
an OLTP server, unless you are manually investigating something in the
table, I recommend never using aggregates. Your SQL queries should be
aggregate-free for large tables, if possible.

Thanks!

--
Matt Mello

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2003-02-02 19:31:30 Re: Postgres 7.3.1 poor insert/update/search performance
Previous Message Curt Sampson 2003-02-02 17:14:51 Re: Postgres 7.3.1 poor insert/update/search performance