Re: Zero-length character breaking query?

Lists: pgsql-general
From: Doug Gorley <dgorley(at)aihs(dot)ca>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Zero-length character breaking query?
Date: 2012-03-15 23:09:32
Message-ID: 571E120A87CC684288FDD20126E4B4D674DE93CA47@HEXMBVS13.hostedmsx.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

G'day,

I believe I've got some bad data in a table, but I'm not sure how it got there, or how this scenario is possible.

The table is called tdt_unsent. The field is str_name_l. For demonstration purposes, the value is "SMITH".

"select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows.
"select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows.
"select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows.
"select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5".

So, it's as if there is a zero-length character at the end of the value that is preventing a match. Is this possible? If so, how could this data have been created?

Thanks,

Doug Gorley
dgorley(at)aihs(dot)ca


From: David Johnston <polobo(at)yahoo(dot)com>
To: Doug Gorley <dgorley(at)aihs(dot)ca>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Zero-length character breaking query?
Date: 2012-03-16 13:16:22
Message-ID: D2A05F98-BC01-4F2D-B29C-A877A72C09CD@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mar 15, 2012, at 19:09, Doug Gorley <dgorley(at)aihs(dot)ca> wrote:

> G'day,
>
> I believe I've got some bad data in a table, but I'm not sure how it got there, or how this scenario is possible.
>
> The table is called tdt_unsent. The field is str_name_l. For demonstration purposes, the value is "SMITH".
>
> "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows.
> "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5".
>
> So, it's as if there is a zero-length character at the end of the value that is preventing a match. Is this possible? If so, how could this data have been created?
>
> Thanks,
>
> Doug Gorley
> dgorley(at)aihs(dot)ca
>

Try (in the regexp) adding '\r?\n' after SMITH and see what happens.

How did you enter the SMITH record into the table in the first place?

David J.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Doug Gorley <dgorley(at)aihs(dot)ca>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Zero-length character breaking query?
Date: 2012-03-16 13:32:49
Message-ID: 6983.1331904769@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Doug Gorley <dgorley(at)aihs(dot)ca> writes:
> The table is called tdt_unsent. The field is str_name_l. For demonstration purposes, the value is "SMITH".

> "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows.
> "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5".

I'd check EXPLAIN (with the actual problematic string, not SMITH).
The planner is probably trying to build an index range condition from
the regex pattern --- is it doing the right thing given your locale?

If the plan looks okay, maybe you need to reindex whatever index it's
using.

regards, tom lane


From: Peter Bex <Peter(dot)Bex(at)xs4all(dot)nl>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Zero-length character breaking query?
Date: 2012-03-16 17:04:42
Message-ID: 20120316170442.GA882@frohike.homeunix.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Mar 15, 2012 at 05:09:32PM -0600, Doug Gorley wrote:
> G'day,
>
> "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows.
> "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5".

What does octet_length return? Perhaps this is some nonprintable
control character. Unicode is full of those. If it differs from
the string length, then you can be pretty sure that's the case.

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
is especially attractive, not only because it can be economically
and scientifically rewarding, but also because it can be an aesthetic
experience much like composing poetry or music."
-- Donald Knuth


From: Bosco Rama <postgres(at)boscorama(dot)com>
To: Doug Gorley <dgorley(at)aihs(dot)ca>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Zero-length character breaking query?
Date: 2012-03-16 19:09:12
Message-ID: 4F638FD8.6070206@boscorama.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Doug Gorley wrote:
>
> "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows.
> "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5".

what does the following return?
select str_name_l::bytea from tdt_unsent where str_name_l ~ '^SMITH';

It should show you any chars that don't usually display in your locale as
hex chars (i.e. \xNN).

Bosco.


From: Doug Gorley <dgorley(at)aihs(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Zero-length character breaking query?
Date: 2012-03-19 01:22:03
Message-ID: 571E120A87CC684288FDD20126E4B4D674DE93CD43@HEXMBVS13.hostedmsx.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Thanks Tom; this is at a client site, so I have limited access, but it looks like a REINDEX resolves the issue.

Doug

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: March 16, 2012 6:33 AM
To: Doug Gorley
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] Zero-length character breaking query?

Doug Gorley <dgorley(at)aihs(dot)ca> writes:
> The table is called tdt_unsent. The field is str_name_l. For demonstration purposes, the value is "SMITH".

> "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows.
> "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5".

I'd check EXPLAIN (with the actual problematic string, not SMITH).
The planner is probably trying to build an index range condition from the regex pattern --- is it doing the right thing given your locale?

If the plan looks okay, maybe you need to reindex whatever index it's using.

regards, tom lane