Lists: | pgsql-bugs |
---|
From: | "Marek Lewczuk" <marek(at)lewczuk(dot)com> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | BUG #5075: Text Search parser does not identify xml tag when attribute name's contains underscore |
Date: | 2009-09-23 12:37:54 |
Message-ID: | 200909231237.n8NCbsi2015381@wwwmaster.postgresql.org |
The following bug has been logged online:
Bug reference: 5075
Logged by: Marek Lewczuk
Email address: marek(at)lewczuk(dot)com
PostgreSQL version: 8.4.0
Operating system: All
Description: Text Search parser does not identify XML tag when
attribute name contains underscore
Details:
Please execute the following example:
select * from ts_debug('english', '<img width="182" height="120"
align="right" style="margin: 0px 0px 5px 5px;" test_aa="26461"/>')
In the result you will see that <img/> is not identified as an XML tag, but
is instead split into words, blank spaces, etc. The reason is that the last
attribute, "test_aa", contains an underscore in its name; when the underscore
is removed, the img tag is properly identified as an XML tag.
The XML definition allows underscores in tag and attribute names.
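To make the claim about XML names concrete, here is a minimal C sketch of the ASCII subset of the Name production from the XML 1.0 spec (NameStartChar is ':', '_' or a letter; NameChar adds '-', '.' and digits). The function names are hypothetical and are not part of PostgreSQL; the non-ASCII ranges the spec also allows are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ASCII subset of XML 1.0 NameStartChar: ':', '_', or a letter. */
static bool is_name_start_ascii(char c)
{
    return c == ':' || c == '_' ||
           (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* ASCII subset of XML 1.0 NameChar: NameStartChar plus '-', '.', digits. */
static bool is_name_char_ascii(char c)
{
    return is_name_start_ascii(c) || c == '-' || c == '.' ||
           (c >= '0' && c <= '9');
}

/* Hypothetical helper: true if s is a non-empty, ASCII-only XML Name. */
bool is_xml_name_ascii(const char *s)
{
    assert(s != NULL);
    if (!is_name_start_ascii(s[0]))
        return false;           /* also rejects the empty string */
    for (size_t i = 1; s[i] != '\0'; i++)
        if (!is_name_char_ascii(s[i]))
            return false;
    return true;
}
```

Under this sketch, both "test_aa" and "img" are accepted as names, while a name starting with a digit is rejected.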
From: | Euler Taveira de Oliveira <euler(at)timbira(dot)com> |
---|---|
To: | Marek Lewczuk <marek(at)lewczuk(dot)com> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #5075: Text Search parser does not identify xml tag when attribute name's contains underscore |
Date: | 2009-09-23 23:31:20 |
Message-ID: | 4ABAAFC8.7030108@timbira.com |
Marek Lewczuk wrote:
> Please execute the following example:
> select * from ts_debug('english', '<img width="182" height="120"
> align="right" style="margin: 0px 0px 5px 5px;" test_aa="26461"/>')
>
> In the result you will see that <img/> is not identified as an XML tag, but
> is instead split into words, blank spaces, etc. The reason is that the last
> attribute, "test_aa", contains an underscore in its name; when the underscore
> is removed, the img tag is properly identified as an XML tag.
>
> The XML definition allows underscores in tag and attribute names.
>
The problem is that we already allow it in tag names but not in attribute
names. So the proper fix is to allow an underscore when the state is
TPS_InTag; according to the XML spec [1], the underscore is a valid character
in attribute names.

A possible downside is that we don't have underscores in HTML attribute names.
In this case, should it fail? I don't think so but...

The problem exists in 8.3, 8.4 and HEAD. It is a trivial fix, so I don't think
back-patching it would be a problem.

[1] http://www.w3.org/TR/REC-xml/#sec-common-syn
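
As a rough illustration of the behavior described above, here is a toy C scanner. It is a sketch only, not the PostgreSQL parser: the state names, the function toy_is_tag and the flag allow_underscore_in_attr are all hypothetical, and the real parser handles far more cases. The tag-name state already accepts '_', and the flag toggles whether the in-tag (attribute) state does too.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states for a toy scanner; not PostgreSQL's state table. */
typedef enum { ST_NAME, ST_IN_TAG, ST_IN_VALUE } ToyState;

/*
 * Returns true if s is a single tag such as <name attr="value" ... /> or
 * <name ...>. allow_underscore_in_attr controls whether '_' is accepted
 * while scanning attribute names (the in-tag state).
 */
bool toy_is_tag(const char *s, bool allow_underscore_in_attr)
{
    assert(s != NULL);
    if (s[0] != '<' || !isalpha((unsigned char) s[1]))
        return false;

    ToyState st = ST_NAME;
    for (size_t i = 2; s[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char) s[i];

        if (st == ST_IN_VALUE)
        {
            /* Anything is allowed inside a quoted value. */
            if (c == '"')
                st = ST_IN_TAG;
            continue;
        }

        /* Tag terminators are handled the same way in both other states. */
        if (c == '>')
            return s[i + 1] == '\0';
        if (c == '/')
            return s[i + 1] == '>' && s[i + 2] == '\0';

        if (st == ST_NAME)
        {
            if (isalnum(c) || c == '_')
                continue;       /* tag names already accept '_' */
            if (isspace(c))
            {
                st = ST_IN_TAG;
                continue;
            }
            return false;
        }

        /* st == ST_IN_TAG: attribute names, '=', and whitespace. */
        if (isalnum(c) || isspace(c) || c == '=' || c == '-')
            continue;
        if (c == '_' && allow_underscore_in_attr)
            continue;           /* the behavior the fix enables */
        if (c == '"')
        {
            st = ST_IN_VALUE;
            continue;
        }
        return false;
    }
    return false;               /* ran out of input before '>' */
}
```

With the flag off, the reported input is rejected at the underscore in test_aa; with the flag on, the same input is accepted as one tag, which mirrors the before and after behavior discussed in this thread.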
--
Euler Taveira de Oliveira
http://www.timbira.com/
Attachment | Content-Type | Size |
---|---|---|
ts.diff | text/plain | 687 bytes |
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Euler Taveira de Oliveira <euler(at)timbira(dot)com> |
Cc: | Marek Lewczuk <marek(at)lewczuk(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #5075: Text Search parser does not identify xml tag when attribute name's contains underscore |
Date: | 2009-09-28 02:49:16 |
Message-ID: | 603c8f070909271949r4c3874e9y85f1bad2fdd7eb20@mail.gmail.com |
On Wed, Sep 23, 2009 at 7:31 PM, Euler Taveira de Oliveira
<euler(at)timbira(dot)com> wrote:
> Marek Lewczuk wrote:
>> Please execute the following example:
>> select * from ts_debug('english', '<img width="182" height="120"
>> align="right" style="margin: 0px 0px 5px 5px;" test_aa="26461"/>')
>>
>> In the result you will see that <img/> is not identified as an XML tag, but
>> is instead split into words, blank spaces, etc. The reason is that the last
>> attribute, "test_aa", contains an underscore in its name; when the underscore
>> is removed, the img tag is properly identified as an XML tag.
>>
>> The XML definition allows underscores in tag and attribute names.
>>
> The problem is that we already allow it in tag names but not in attribute
> names. So the proper fix is to allow an underscore when the state is
> TPS_InTag; according to the XML spec [1], the underscore is a valid character
> in attribute names.
>
> A possible downside is that we don't have underscores in HTML attribute names.
> In this case, should it fail? I don't think so but...
>
> The problem exists in 8.3, 8.4 and HEAD. It is a trivial fix, so I don't think
> back-patching it would be a problem.

This patch should probably be added to
https://commitfest.postgresql.org/action/commitfest_view/open so that
we don't lose track of it.
...Robert
From: | Selena Deckelmann <selenamarie(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Marek Lewczuk <marek(at)lewczuk(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #5075: Text Search parser does not identify xml tag when attribute name's contains underscore |
Date: | 2009-09-28 03:19:53 |
Message-ID: | 2b5e566d0909272019s27df7254rb192b0b36493f508@mail.gmail.com |
On Sun, Sep 27, 2009 at 7:49 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Sep 23, 2009 at 7:31 PM, Euler Taveira de Oliveira
> <euler(at)timbira(dot)com> wrote:
>
>> The problem exists in 8.3, 8.4 and HEAD. It is a trivial fix, so I don't think
>> back-patching it would be a problem.
>
> This patch should probably be added to
> https://commitfest.postgresql.org/action/commitfest_view/open so that
> we don't lose track of it.

Done.
--
http://chesnok.com/daily - me
http://endpoint.com - work
From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | Euler Taveira de Oliveira <euler(at)timbira(dot)com> |
Cc: | Marek Lewczuk <marek(at)lewczuk(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #5075: Text Search parser does not identify xml tag when attribute name's contains underscore |
Date: | 2009-11-15 13:56:05 |
Message-ID: | 1258293365.14314.28.camel@vanquo.pezone.net |
On Wed, 2009-09-23 at 20:31 -0300, Euler Taveira de Oliveira wrote:
> Marek Lewczuk wrote:
> > Please execute the following example:
> > select * from ts_debug('english', '<img width="182" height="120"
> > align="right" style="margin: 0px 0px 5px 5px;" test_aa="26461"/>')
> >
> > In the result you will see that <img/> is not identified as an XML tag, but
> > is instead split into words, blank spaces, etc. The reason is that the last
> > attribute, "test_aa", contains an underscore in its name; when the underscore
> > is removed, the img tag is properly identified as an XML tag.
> >
> > The XML definition allows underscores in tag and attribute names.
> >
> The problem is that we already allow it in tag names but not in attribute
> names. So the proper fix is to allow an underscore when the state is
> TPS_InTag; according to the XML spec [1], the underscore is a valid character
> in attribute names.
>
> A possible downside is that we don't have underscores in HTML attribute names.
> In this case, should it fail? I don't think so but...
>
> The problem exists in 8.3, 8.4 and HEAD. It is a trivial fix, so I don't think
> back-patching it would be a problem.

Fix committed to 8.3, 8.4, 8.5.
From: | Marek Lewczuk <newsy(at)lewczuk(dot)com> |
---|---|
To: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | Euler Taveira de Oliveira <euler(at)timbira(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #5075: Text Search parser does not identify xml tag when attribute name's contains underscore |
Date: | 2009-11-17 22:50:02 |
Message-ID: | 4B03289A.4010409@lewczuk.com |
On 2009-11-15 14:56, Peter Eisentraut wrote:
> On Wed, 2009-09-23 at 20:31 -0300, Euler Taveira de Oliveira wrote:
>> Marek Lewczuk wrote:
>>> Please execute the following example:
>>> select * from ts_debug('english', '<img width="182" height="120"
>>> align="right" style="margin: 0px 0px 5px 5px;" test_aa="26461"/>')
> Fix committed to 8.3, 8.4, 8.5.
Great. Thanks.
ML