bytea, index and like operator

Lists: pgsql-bugs
From: Alvar Freude <alvar(at)a-blast(dot)org>
To: pgsql-bugs(at)postgresql(dot)org
Subject: bytea, index and like operator
Date: 2003-12-03 17:20:48
Message-ID: 1657180000.1070472048@gnarzelwicht.delirium-arts.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

there is a bug in handling bytea columns with index and the like-operator.
At least in 7.3.4. When the FreeBSD Port for 7.4 is ready, I'll test this
... ;-)

When an index scan is active, a query dosn't give the correct result:

select version();
version
---------------------------------------------------------------------
PostgreSQL 7.3.4 on i386-portbld-freebsd4.8, compiled by GCC 2.95.4

begin;
create table test (b bytea);
create index tst_idx on test(b);
insert into test values ('\001abc\006');
insert into test values ('\001xabc\006');
insert into test values ('\001\002abc\006');
insert into test values ('\000\001\002abc\006');
insert into test values ('\002\003abc\006');

select * from test where b like '\001%';

Result:

b
---
(0 Zeilen) [0 rows]

explain analyze select * from test where b like '\001%';
QUERY PLAN
- ---------------------------------------------------------------------------
- -------------------
Seq Scan on test (cost=0.00..22.50 rows=5 width=32) (actual
time=0.05..0.08 rows=3 loops=1)
Filter: (b ~~ '\\001%'::bytea)
Total runtime: 0.16 msec
(3 Zeilen)

explain analyze select * from test where b like '\001%';
QUERY PLAN
------------------------------------------------------------------------
Index Scan using tst_idx on test (cost=0.00..17.07 rows=5 width=32)
(actual time=0.04..0.04 rows=0 loops=1)
Index Cond: (b = '0'::bytea)
Filter: (b ~~ '\\001%'::bytea)
Total runtime: 0.14 msec

But with seq scan (after vacuuming, creating index later, ...) it works as
expected.

drop index tst_idx;
online_demo=> select * from test where b like '\001%';
b
-----------------
\001abc\006
\001xabc\006
\001\002abc\006
(3 Zeilen)

explain analyze select * from test where b like '\001%';
QUERY PLAN
------------------------------------------------------------------------
Seq Scan on test (cost=0.00..22.50 rows=5 width=32) (actual
time=0.05..0.08 rows=3 loops=1)
Filter: (b ~~ '\\001%'::bytea)
Total runtime: 0.16 msec

hmmm ...

It seems, that bytea is no good idea for production use?

Ciao
Alvar

- --
** Alvar C.H. Freude -- http://alvar.a-blast.org/
** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html
** ODEM.org-Tour: http://tour.odem.org/
** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/zhtwOndlH63J86wRAh9VAJ9tjx/MrvbMPjlqhQqvhbXLaIG5owCfRbAn
S65xELFQ6I9ObdzAXOTjIWM=
=7DuO
-----END PGP SIGNATURE-----


From: Joe Conway <mail(at)joeconway(dot)com>
To: Alvar Freude <alvar(at)a-blast(dot)org>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: bytea, index and like operator
Date: 2003-12-03 21:08:41
Message-ID: 3FCE50D9.5060707@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Alvar Freude wrote:
> PostgreSQL 7.3.4 on i386-portbld-freebsd4.8, compiled by GCC 2.95.4
>
> begin;
> create table test (b bytea);
> create index tst_idx on test(b);
> insert into test values ('\001abc\006');
> insert into test values ('\001xabc\006');
> insert into test values ('\001\002abc\006');
> insert into test values ('\000\001\002abc\006');
> insert into test values ('\002\003abc\006');

Note that bytea input strings should be escaped with doubled
backslashes, because the string literal parser consumes 1 layer, and the
byteain function consumes another. See:
http://www.postgresql.org/docs/7.3/static/datatype-binary.html

In the strings above, the string literal parser will turn, e.g., "\001"
into the single octet '\1' anyway, and byteain will accept it just fine.
However "\000" will become '\0', and since byteain requires a null byte
terminator, you are actually inserting an empty string into test.b for
that row:

regression=# select b, b = '' from test;
b | ?column?
-----------------+----------
\001abc\006 | f
\001xabc\006 | f
\001\002abc\006 | f
| t
\002\003abc\006 | f
(5 rows)

> select * from test where b like '\001%';

This is weird. I'm sure it worked at one time -- will research.

Joe


From: Alvar Freude <alvar(at)a-blast(dot)org>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: bytea, index and like operator
Date: 2003-12-03 21:26:40
Message-ID: 1865030000.1070486800@gnarzelwicht.delirium-arts.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

- -- Joe Conway <mail(at)joeconway(dot)com> wrote:

>
> Note that bytea input strings should be escaped with doubled backslashes,
> because the string literal parser consumes 1 layer, and the byteain
> function consumes another. See:
> http://www.postgresql.org/docs/7.3/static/datatype-binary.html

oh, shi.... -- Yes, I read this, but forgot it ... ;-(

So, yes, you are right.

Hmmm, there are a lot of pitfalls with bytea, *puh*!

Sorry for confusion!

Ciao
Alvar

- --
** Alvar C.H. Freude -- http://alvar.a-blast.org/
** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html
** ODEM.org-Tour: http://tour.odem.org/
** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/zlUROndlH63J86wRArzBAKDC8hXj0GLZrIFlaaIQBt8pk4yi3gCcCDxy
kVnHzc3mklt6/8IcI6ZvD38=
=yUYU
-----END PGP SIGNATURE-----


From: Alvar Freude <alvar(at)a-blast(dot)org>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: bytea, index and like operator
Date: 2003-12-03 22:59:01
Message-ID: 1925690000.1070492341@gnarzelwicht.delirium-arts.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

- -- Alvar Freude <alvar(at)a-blast(dot)org> wrote:

> there is a bug in handling bytea columns with index and the like-operator.
> At least in 7.3.4. When the FreeBSD Port for 7.4 is ready, I'll test this
> ... ;-)

sorry, shame on me: I have to put two \ in the query (and inserts) for
bytea.

But it's interesting that without and with index search the handling of
wrong data is different ... :)

Ciao
Alvar

- --
** Alvar C.H. Freude -- http://alvar.a-blast.org/
** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html
** ODEM.org-Tour: http://tour.odem.org/
** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/zmq2OndlH63J86wRAsUvAJ9pTFwNNQAsG5cNXGdUcTwmkpIjeQCeNldH
aJuVsiY2juZqrG73VeSERdc=
=NJ6+
-----END PGP SIGNATURE-----


From: Alvar Freude <alvar(at)a-blast(dot)org>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: bytea, index and like operator
Date: 2003-12-03 23:25:01
Message-ID: 1948450000.1070493901@gnarzelwicht.delirium-arts.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

- -- Joe Conway <mail(at)joeconway(dot)com> wrote:

> Note that bytea input strings should be escaped with doubled backslashes,
> because the string literal parser consumes 1 layer, and the byteain
> function consumes another. See:
> http://www.postgresql.org/docs/7.3/static/datatype-binary.html

hmmm, but there remains some confusing stuff. Perhaps I made some other
mistake, but:

I've a filled table with bytea strings (converted from former text column),
and want to make a "like" comparison on it. So, now I'll make double
backslashes for each byte: like doesn't select the rows, when a index
exists. With "=" one row is selected. When the index is deleted, the
correct three rows are returned.

See below. Whats going wrong?

The same happens in my application, when the strings are given unescaped.

db=> SELECT bytea_col FROM table WHERE bytea_col like '\\001\\012\\010%';
bytea_col
- -----
(0 rows)

db=> SELECT bytea_col FROM table WHERE bytea_col = '\\001\\012\\010';
bytea_col
- --------------
\001\012\010
(1 row)

db=> SELECT bytea_col FROM table WHERE bytea_col like '\\001\\012\\010%';
bytea_col
- -----
(0 rows)

db=> begin;
BEGIN
db=> drop index table_bytea_col_idx;
DROP INDEX
db=> SELECT bytea_col FROM table WHERE bytea_col like '\\001\\012\\010%';
bytea_col
- --------------------------
\001\012\010\001\001\001
\001\012\010
\001\012\010\001\001\002
(3 rows)

db=> rollback;
ROLLBACK
db=> SELECT bytea_col FROM table WHERE bytea_col like '\\001\\012\\010%';
bytea_col
- -----
(0 rows)

db=> SELECT bytea_col FROM table WHERE bytea_col like '\\001\\012\\010';
bytea_col
- -----
(0 rows)

Ciao
Alvar

- --
** Alvar C.H. Freude -- http://alvar.a-blast.org/
** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html
** ODEM.org-Tour: http://tour.odem.org/
** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/znDNOndlH63J86wRAtAyAKDJYq/KPSH7W4rJvO+VJQGe0OQi3wCfQOfr
HmWZiQdc4MW5JecTG0dqwSg=
=h3ag
-----END PGP SIGNATURE-----