Re: [pgsql-packagers] Palle Girgensohn's ICU patch

Lists: pgsql-hackers
From: Jakob Egger <jakob(at)eggerapps(dot)at>
To: PostgreSQL Packagers <pgsql-packagers(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Cc: Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Palle Girgensohn's ICU patch
Date: 2014-11-26 07:31:10
Message-ID: 18C8A481-33A6-4483-8C24-B8CE70DB7F27@eggerapps.at

When packaging PostgreSQL for Postgres.app, I discovered a problem: strcoll doesn't work for multibyte encodings (such as UTF-8) on OS X. As a consequence, text sorting in PostgreSQL doesn't work correctly. The only workaround seemed to be to use a legacy encoding like LATIN1, which is unacceptable.

I discovered that OS X shares this limitation with FreeBSD, and there exists a patch written by Palle Girgensohn that uses the ICU library for collating strings instead of the standard C strcoll function. You can find it at http://people.freebsd.org/~girgen/postgresql-icu/README.html

I applied the patch, and according to preliminary testing with 9.4rc1 it seems to work flawlessly on OS X as well.
See https://github.com/PostgresApp/PostgresApp/releases/tag/9.4rc1

I have two questions:

1) Does anybody else have experience with this patch? Is it safe to publicly release PostgreSQL binaries with this patch applied?

2) Is there a reason why this patch hasn't been merged into core over the years? Since it is only enabled by a configure switch (--with-icu), it shouldn't break anything, should it?

Best regards,
Jakob Egger


From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Jakob Egger <jakob(at)eggerapps(dot)at>
Cc: PostgreSQL Packagers <pgsql-packagers(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 07:41:49
Message-ID: 505C2FEA-C175-430E-9B8D-EF215B9223E9@pingpong.net

Hi!

This is indeed a very well-tested patch; we've run it in production for 8+ years on 20+ systems.

It is not included upstream mainly because I never asked for it to be. I've been aiming to do it but haven't got around to it. Also, since 9.1 there has been support in PostgreSQL for setting the collation locale per column. The patch does not yet support this, so it is incomplete. You could argue that this is less important than supporting the primary locale, but that would be a hard case to make; per-column collation would have to be added before the patch could be accepted upstream.

So, I can vouch for it: it does the job just fine. Upstream support will happen eventually.

Palle

> On 26 Nov 2014, at 08:31, Jakob Egger <jakob(at)eggerapps(dot)at> wrote:
>
> When packaging PostgreSQL for Postgres.app, I discovered a problem: strcoll doesn't work for multibyte encodings (such as UTF-8) on OS X. As a consequence, text sorting in PostgreSQL doesn't work correctly. The only workaround seemed to be to use a legacy encoding like LATIN1, which is unacceptable.
>
> I discovered that OS X shares this limitation with FreeBSD, and there exists a patch written by Palle Girgensohn that uses the ICU library for collating strings instead of the standard C strcoll function. You can find it at http://people.freebsd.org/~girgen/postgresql-icu/README.html
>
> I applied the patch, and according to preliminary testing with 9.4rc1 it seems to work flawlessly on OS X as well.
> See https://github.com/PostgresApp/PostgresApp/releases/tag/9.4rc1
>
> I have two questions:
>
> 1) Does anybody else have experience with this patch? Is it safe to publicly release PostgreSQL binaries with this patch applied?
>
> 2) Is there a reason why this patch hasn't been merged into core over the years? Since it is only enabled by a configure switch (--with-icu), it shouldn't break anything, should it?
>
> Best regards,
> Jakob Egger
>
>


From: Jakob Egger <jakob(at)eggerapps(dot)at>
To: Palle Girgensohn <girgen(at)pingpong(dot)net>
Cc: PostgreSQL Packagers <pgsql-packagers(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 08:01:42
Message-ID: F1F1DCA6-CD76-4979-B0C3-1491605C165B@eggerapps.at

Hi Palle,

thanks for the extremely quick response!

In that case I will include the patch in Postgres.app. Missing support for per-column collations is preferable to missing support for the standard locale!

I'll have a look at the per-column collation support; it would be great if PostgreSQL on OS X worked out of the box at some point.

Best regards,
Jakob

> On 26.11.2014, at 08:41, Palle Girgensohn <girgen(at)pingpong(dot)net> wrote:
>
> Hi!
>
> This is indeed a very well-tested patch; we've run it in production for 8+ years on 20+ systems.
>
> It is not included upstream mainly because I never asked for it to be. I've been aiming to do it but haven't got around to it. Also, since 9.1 there has been support in PostgreSQL for setting the collation locale per column. The patch does not yet support this, so it is incomplete. You could argue that this is less important than supporting the primary locale, but that would be a hard case to make; per-column collation would have to be added before the patch could be accepted upstream.
>
> So, I can vouch for it: it does the job just fine. Upstream support will happen eventually.
>
> Palle
>
>
>
> On 26 Nov 2014, at 08:31, Jakob Egger <jakob(at)eggerapps(dot)at> wrote:
>
>> When packaging PostgreSQL for Postgres.app, I discovered a problem: strcoll doesn't work for multibyte encodings (such as UTF-8) on OS X. As a consequence, text sorting in PostgreSQL doesn't work correctly. The only workaround seemed to be to use a legacy encoding like LATIN1, which is unacceptable.
>>
>> I discovered that OS X shares this limitation with FreeBSD, and there exists a patch written by Palle Girgensohn that uses the ICU library for collating strings instead of the standard C strcoll function. You can find it at http://people.freebsd.org/~girgen/postgresql-icu/README.html
>>
>> I applied the patch, and according to preliminary testing with 9.4rc1 it seems to work flawlessly on OS X as well.
>> See https://github.com/PostgresApp/PostgresApp/releases/tag/9.4rc1
>>
>> I have two questions:
>>
>> 1) Does anybody else have experience with this patch? Is it safe to publicly release PostgreSQL binaries with this patch applied?
>>
>> 2) Is there a reason why this patch hasn't been merged into core over the years? Since it is only enabled by a configure switch (--with-icu), it shouldn't break anything, should it?
>>
>> Best regards,
>> Jakob Egger
>>
>>


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Palle Girgensohn <girgen(at)pingpong(dot)net>
Cc: Jakob Egger <jakob(at)eggerapps(dot)at>, PostgreSQL Packagers <pgsql-packagers(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 08:58:28
Message-ID: CABUevEyNnHvpX2TCXYXi9e_ZvXXrdmFhkTrfcsW7ZsTQHTCDfg@mail.gmail.com

On Wed, Nov 26, 2014 at 8:41 AM, Palle Girgensohn <girgen(at)pingpong(dot)net> wrote:
> Hi!
>
> This is indeed a very well-tested patch; we've run it in production for 8+
> years on 20+ systems.
>
> It is not included upstream mainly because I never asked for it to be. I've
> been aiming to do it but haven't got around to it. Also, since 9.1 there has
> been support in PostgreSQL for setting the collation locale per column. The
> patch does not yet support this, so it is incomplete. You could argue that
> this is less important than supporting the primary locale, but that would be
> a hard case to make; per-column collation would have to be added before the
> patch could be accepted upstream.
>
> So, I can vouch for it: it does the job just fine. Upstream support will
> happen eventually.

We also discussed this back when we did the Windows port. One of the
big arguments against bringing it in then (because it worked) was that
we'd bring in another compile time dependency that's actually larger
than PostgreSQL itself. For example, the ICU .tgz file of the latest
version is 24.3 MB, and the latest PostgreSQL .tgz is 21.8 MB. If we add
it as a requirement, we more than double the size of PostgreSQL. (Part
of that was specifically a concern on Windows, of course, since no
dependencies can be expected to exist there; ICU is a lot more likely
to already exist packaged up on Linux/BSD.)
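The "more than double" figure follows directly from the two archive sizes quoted above:

```latex
\frac{21.8\,\text{MB} + 24.3\,\text{MB}}{21.8\,\text{MB}} = \frac{46.1}{21.8} \approx 2.11
```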

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Jakob Egger <jakob(at)eggerapps(dot)at>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 09:14:27
Message-ID: 64E82EF0-0A13-4D9A-8695-1B18F2BC1D2C@pingpong.net


> On 26 Nov 2014, at 09:58, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
> On Wed, Nov 26, 2014 at 8:41 AM, Palle Girgensohn <girgen(at)pingpong(dot)net> wrote:
>> Hi!
>>
>> This is indeed a very well-tested patch; we've run it in production for 8+
>> years on 20+ systems.
>>
>> It is not included upstream mainly because I never asked for it to be. I've
>> been aiming to do it but haven't got around to it. Also, since 9.1 there has
>> been support in PostgreSQL for setting the collation locale per column. The
>> patch does not yet support this, so it is incomplete. You could argue that
>> this is less important than supporting the primary locale, but that would be
>> a hard case to make; per-column collation would have to be added before the
>> patch could be accepted upstream.
>>
>> So, I can vouch for it: it does the job just fine. Upstream support will
>> happen eventually.
>
>
> We also discussed this back when we did the Windows port. One of the
> big arguments against bringing it in then (because it worked) was that
> we'd bring in another compile time dependency that's actually larger
> than PostgreSQL itself. For example, the ICU .tgz file of the latest
> version is 24.3 MB, and the latest PostgreSQL .tgz is 21.8 MB. If we add
> it as a requirement, we more than double the size of PostgreSQL. (Part
> of that was specifically a concern on Windows, of course, since no
> dependencies can be expected to exist there; ICU is a lot more likely
> to already exist packaged up on Linux/BSD.)

For Windows, that is a very good argument. ICU is huge and takes forever to build. But as you say, on other platforms it is a lot more likely to already be installed, or at least packaged.

Also, you were, rightly, reluctant to use the ICU patch at that time because it required copying and converting every column value it was to compare (from UTF-8 to ICU's internal UTF-16). This requirement is of course long gone, as ICU soon afterwards added built-in optimizations for UTF-8, a very reasonable development step for the ICU platform... :-)
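The conversion cost described above can be sketched in a few lines of Python. This is only an illustration under assumptions: `utf8_to_utf16_units` and `compare_via_utf16` are hypothetical helpers, not code from the patch, and the final code-unit comparison merely stands in for the ICU collator call. The point is that every comparison first allocates and fills two fresh UTF-16 buffers.

```python
def utf8_to_utf16_units(data: bytes) -> list[int]:
    """Decode UTF-8 bytes and re-encode them as UTF-16 code units.

    Code points above U+FFFF become a surrogate pair (two units).
    """
    units: list[int] = []
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if cp < 0x10000:
            units.append(cp)
        else:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))    # high surrogate
            units.append(0xDC00 + (cp & 0x3FF))  # low surrogate
    return units


def compare_via_utf16(a: bytes, b: bytes) -> int:
    """Hypothetical old-style comparison: convert both operands first.

    The plain code-unit comparison is only a stand-in for a collator
    call; it is NOT a linguistic collation.
    """
    ua = utf8_to_utf16_units(a)  # fresh buffer on every call
    ub = utf8_to_utf16_units(b)  # and another one
    return (ua > ub) - (ua < ub)


print([hex(u) for u in utf8_to_utf16_units("é😀".encode("utf-8"))])
# ['0xe9', '0xd83d', '0xde00']
```

Newer ICU releases expose a UTF-8 entry point, `ucol_strcollUTF8()`, which avoids this conversion step entirely.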

Jakob, including the patch in Postgres.app seems pretty reasonable. Only a small fraction of ICU is used, a couple of libraries I believe.

As I said, the missing feature will probably be implemented some time in the future, after which I will propose the patch for inclusion. But it is not even near the top of my to-do list. :-/

Cheers,
Palle


From: Jakob Egger <jakob(at)eggerapps(dot)at>
To: Palle Girgensohn <girgen(at)pingpong(dot)net>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 09:36:17
Message-ID: 2BC2B388-6500-4CFF-94BB-3FACBBE9F6B1@eggerapps.at

> One of the
> big arguments against bringing it in then (because it worked) was that
> we'd bring in another compile time dependency that's actually larger
> than PostgreSQL itself.

Magnus: I don't see how this is a problem as long as using ICU is *optional*. On systems with a working strcoll there is no problem with using the standard C functions (except that ICU might offer more collations).

> Jakob, including the patch in Postgres.app seems pretty reasonable. Only a small fraction of ICU is used, a couple of libraries I believe.

Palle: The ICU libraries themselves aren't that big, but the required data files (also packaged as a dynamic library) are big (around 25 MB uncompressed). However, I'd rather increase the download size by 30% than ship a broken database.

Best regards,
Jakob


From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Jakob Egger <jakob(at)eggerapps(dot)at>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 09:45:53
Message-ID: 9C4F854F-8471-4718-86FC-962E8D81B71F@pingpong.net


> On 26 Nov 2014, at 10:36, Jakob Egger <jakob(at)eggerapps(dot)at> wrote:
>
>> One of the
>> big arguments against bringing it in then (because it worked) was that
>> we'd bring in another compile time dependency that's actually larger
>> than PostgreSQL itself.
>
> Magnus: I don't see how this is a problem as long as using ICU is *optional*. On systems with a working strcoll there is no problem with using the standard C functions (except that ICU might offer more collations).
>

On Windows, it was primarily about packaging, I believe. Mind you, this was many years ago... ;)

>
>> Jakob, including the patch in Postgres.app seems pretty reasonable. Only a small fraction of ICU is used, a couple of libraries I believe.
>
> Palle: The ICU libraries themselves aren't that big, but the required data files (also packaged as a dynamic library) are big (around 25 MB uncompressed). However, I'd rather increase the download size by 30% than ship a broken database.

Bear in mind that this might alter the way indexes are built. Off the top of my head, I just can't remember whether this is true or not. I'm probably wrong? Magnus? You would have to try.

It does change ORDER BY to properly handle UTF-8, *AND* ORDER BY becomes case-insensitive. I'm not sure this is correct SQL. I know that in Oracle this is optional (NLS_COMP=LINGUISTIC and/or NLS_SORT=BINARY_CI), and SQL Server has something similar.
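A minimal Python sketch of the ordering difference described above, assuming the broken strcoll behaves like a plain byte comparison. `byte_order_key` compares raw UTF-8 bytes; `approx_linguistic_key` is a hypothetical, crude stand-in for a linguistic collation (strip accents, fold case, break ties on the original string). It is not ICU's algorithm, and real ICU ordering differs in details such as how upper and lower case are tie-broken.

```python
import unicodedata


def byte_order_key(s: str) -> bytes:
    # Raw UTF-8 byte comparison: uppercase ASCII sorts before lowercase,
    # and accented letters sort after "z".
    return s.encode("utf-8")


def approx_linguistic_key(s: str) -> tuple[str, str]:
    # Rough approximation only: decompose, drop combining accents,
    # fold case, then fall back to the original string for ties.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), s)


words = ["banana", "Apple", "apple", "Zebra", "école"]
print(sorted(words, key=byte_order_key))
# ['Apple', 'Zebra', 'apple', 'banana', 'école']
print(sorted(words, key=approx_linguistic_key))
# ['Apple', 'apple', 'banana', 'école', 'Zebra']
```

The second ordering is what users of a French or German locale would expect; the first is what a byte-wise fallback produces.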


From: Jakob Egger <jakob(at)eggerapps(dot)at>
To: Palle Girgensohn <girgen(at)pingpong(dot)net>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 09:48:23
Message-ID: 5CA4E4D0-FB0A-49BD-8DF9-57E0CF878DA4@eggerapps.at


> Bear in mind that this might alter the way indexes are built. Off the top of my head, I just can't remember whether this is true or not. I'm probably wrong? Magnus? You would have to try.

That's why I want to include it in the first version of 9.4, when people need to dump & reload their database anyway (I'll make a note not to use pg_upgrade)
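Why a rebuild is needed can be shown with a toy model: treat a B-tree index as a list kept sorted by the collation in effect when it was built, then switch collations. This is a hypothetical illustration, not PostgreSQL code; `old_key` assumes the broken strcoll compares bytes, and `new_key` is a crude accent- and case-folding approximation of a linguistic collation, not ICU's rules.

```python
import unicodedata


def old_key(s: str) -> bytes:
    # Ordering in effect when the index was built (assumed: byte order).
    return s.encode("utf-8")


def new_key(s: str) -> tuple[str, str]:
    # Ordering after the switch: rough approximation, not ICU's rules.
    base = "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))
    return (base.casefold(), s)


def is_sorted(items: list[str], key) -> bool:
    keys = [key(x) for x in items]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def bsearch(items: list[str], target: str, key) -> bool:
    # Binary search, as a B-tree descent would do, trusting that
    # `items` is already ordered according to `key`.
    lo, hi = 0, len(items)
    target_key = key(target)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(items[mid]) < target_key:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(items) and items[lo] == target


# The leaf level of a B-tree is, in effect, a list kept in collation order.
index_leaves = sorted(["apple", "Zebra", "école", "Banana"], key=old_key)

print(is_sorted(index_leaves, old_key), bsearch(index_leaves, "apple", old_key))
# True True
print(is_sorted(index_leaves, new_key), bsearch(index_leaves, "apple", new_key))
# False False
```

Under the new ordering the lookup misses a row that is present. pg_upgrade carries existing index files over unchanged, so it would preserve exactly this mismatch; a dump and reload rebuilds every index under the new ordering.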


From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Jakob Egger <jakob(at)eggerapps(dot)at>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 09:48:56
Message-ID: ABA54B2D-7D45-4575-969C-6C008B723D6B@pingpong.net


> On 26 Nov 2014, at 10:48, Jakob Egger <jakob(at)eggerapps(dot)at> wrote:
>
>
>> Bear in mind that this might alter the way indexes are built. Off the top of my head, I just can't remember whether this is true or not. I'm probably wrong? Magnus? You would have to try.
>
> That's why I want to include it in the first version of 9.4, when people need to dump & reload their database anyway (I'll make a note not to use pg_upgrade)

Good point.


From: Dave Page <dpage(at)postgresql(dot)org>
To: Jakob Egger <jakob(at)eggerapps(dot)at>
Cc: Palle Girgensohn <girgen(at)pingpong(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 10:05:44
Message-ID: CA+OCxoxS3wjcAJpax-aziLoyVTUO6kDtZk3=RZLKGB5KR5O8eg@mail.gmail.com

On Wed, Nov 26, 2014 at 9:48 AM, Jakob Egger <jakob(at)eggerapps(dot)at> wrote:
>
>> Bear in mind that this might alter the way indexes are built. Off the top of my head, I just can't remember whether this is true or not. I'm probably wrong? Magnus? You would have to try.
>
> That's why I want to include it in the first version of 9.4, when people need to dump & reload their database anyway (I'll make a note not to use pg_upgrade)

You may want to bear in mind that postgres.app is on the main PG
downloads page on the website. If you're patching Postgres to add a
feature like this, it would become a fork and would have to be moved
out of the "PostgreSQL Core Distribution" section of the download area
as we only include "pure" distributions there.

--
Dave Page
PostgreSQL Core Team
http://www.postgresql.org/


From: Jakob Egger <jakob(at)eggerapps(dot)at>
To: Dave Page <dpage(at)postgresql(dot)org>
Cc: Palle Girgensohn <girgen(at)pingpong(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 10:13:43
Message-ID: 2FB2367D-02FC-4427-8413-5231A1072F6B@eggerapps.at

On 26.11.2014, at 11:05, Dave Page <dpage(at)postgresql(dot)org> wrote:
> You may want to bear in mind that postgres.app is on the main PG
> downloads page on the website. If you're patching Postgres to add a
> feature like this, it would become a fork and would have to be moved
> out of the "PostgreSQL Core Distribution" section of the download area
> as we only include "pure" distributions there.

I wasn't aware of this. I'll have to bring this up on the Postgres.app GitHub page.

Personally, I don't think that shipping a database with broken text sorting is acceptable; but I can't speak on behalf of the other contributors to Postgres.app without consulting them first.


From: Dave Page <dpage(at)postgresql(dot)org>
To: Jakob Egger <jakob(at)eggerapps(dot)at>
Cc: Palle Girgensohn <girgen(at)pingpong(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 10:20:23
Message-ID: CA+OCxowDGDxjya8LDKKO_NS4f2OpwKp5npQsAQZ0w8xx5jc0fg@mail.gmail.com

On Wed, Nov 26, 2014 at 10:13 AM, Jakob Egger <jakob(at)eggerapps(dot)at> wrote:
> On 26.11.2014, at 11:05, Dave Page <dpage(at)postgresql(dot)org> wrote:
>
> You may want to bear in mind that postgres.app is on the main PG
> downloads page on the website. If you're patching Postgres to add a
> feature like this, it would become a fork and would have to be moved
> out of the "PostgreSQL Core Distribution" section of the download area
> as we only include "pure" distributions there.
>
>
> I wasn't aware of this. I'll have to bring this up on the Postgres.app
> GitHub page.
>
> Personally, I don't think that shipping a database with broken text sorting
> is acceptable; but I can't speak on behalf of the other contributors to
> Postgres.app without consulting them first.

Right - but the correct course of action would be to get the problem
fixed in PostgreSQL itself, not to fork the code which could lead to
other problems for users.

--
Dave Page
PostgreSQL Core Team
http://www.postgresql.org/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Jakob Egger <jakob(at)eggerapps(dot)at>
Cc: Dave Page <dpage(at)postgresql(dot)org>, Palle Girgensohn <girgen(at)pingpong(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 10:23:21
Message-ID: CABUevExrpiiVM15GwYc=h7ohk=S1fQBXhmoKPsgxa5nKQ-arKQ@mail.gmail.com

On Wed, Nov 26, 2014 at 11:13 AM, Jakob Egger <jakob(at)eggerapps(dot)at> wrote:
> On 26.11.2014, at 11:05, Dave Page <dpage(at)postgresql(dot)org> wrote:
>
> You may want to bear in mind that postgres.app is on the main PG
> downloads page on the website. If you're patching Postgres to add a
> feature like this, it would become a fork and would have to be moved
> out of the "PostgreSQL Core Distribution" section of the download area
> as we only include "pure" distributions there.
>
>
> I wasn't aware of this. I'll have to bring this up on the Postgres.app
> GitHub page.
>
> Personally, I don't think that shipping a database with broken text sorting
> is acceptable; but I can't speak on behalf of the other contributors to
> Postgres.app without consulting them first.

Is it broken *worse* in 9.4 than it was in previous versions?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Jakob Egger <jakob(at)eggerapps(dot)at>
To: Dave Page <dpage(at)postgresql(dot)org>
Cc: Palle Girgensohn <girgen(at)pingpong(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 10:44:02
Message-ID: 876D6DAD-4DD9-44CF-89E2-5F1AED2F0236@eggerapps.at


> On 26.11.2014, at 11:20, Dave Page <dpage(at)postgresql(dot)org> wrote:
>
> On Wed, Nov 26, 2014 at 10:13 AM, Jakob Egger <jakob(at)eggerapps(dot)at> wrote:
>> On 26.11.2014, at 11:05, Dave Page <dpage(at)postgresql(dot)org> wrote:
>>
>> You may want to bear in mind that postgres.app is on the main PG
>> downloads page on the website. If you're patching Postgres to add a
>> feature like this, it would become a fork and would have to be moved
>> out of the "PostgreSQL Core Distribution" section of the download area
>> as we only include "pure" distributions there.
>>
>>
>> I wasn't aware of this. I'll have to bring this up on the Postgres.app
>> GitHub page.
>>
>> Personally, I don't think that shipping a database with broken text sorting
>> is acceptable; but I can't speak on behalf of the other contributors to
>> Postgres.app without consulting them first.
>
> Right - but the correct course of action would be to get the problem
> fixed in PostgreSQL itself, not to fork the code which could lead to
> other problems for users.

Agreed. Since this isn't a priority for Palle, I'll have a look at the patch to see whether I can extend it into something suitable for submission, but since I have never contributed code to PostgreSQL, I don't know yet whether I can handle it.

I've opened an issue on GitHub to discuss what to do about Postgres.app and the upcoming 9.4 release:
https://github.com/PostgresApp/PostgresApp/issues/233

Best regards,
Jakob


From: Jakob Egger <jakob(at)eggerapps(dot)at>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Dave Page <dpage(at)postgresql(dot)org>, Palle Girgensohn <girgen(at)pingpong(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 10:50:24
Message-ID: 1D4D2992-FC1A-4921-82B8-017D7190581E@eggerapps.at

> Is it broken *worse* in 9.4 than it was in previous versions?

No.

Because the indexes need to be rebuilt, the only realistic opportunity for applying this patch to Postgres.app is when releasing a major new version, since then people need to migrate their data anyway. That's why I wanted to apply the patch when 9.4 is released.

I'm starting to see that maybe not all bugs can be fixed right now; I'm now waiting for input from the other contributors on GitHub.


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Palle Girgensohn <girgen(at)pingpong(dot)net>, Jakob Egger <jakob(at)eggerapps(dot)at>, PostgreSQL Packagers <pgsql-packagers(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 11:41:41
Message-ID: CAFj8pRA5bwifP0yAbA1iYg+QYh_CCxvx8X4C_X4=y03Nw_BBbA@mail.gmail.com

2014-11-26 9:58 GMT+01:00 Magnus Hagander <magnus(at)hagander(dot)net>:

> On Wed, Nov 26, 2014 at 8:41 AM, Palle Girgensohn <girgen(at)pingpong(dot)net>
> wrote:
> > Hi!
> >
> > This is indeed a very well-tested patch; we've run it in production for 8+
> > years on 20+ systems.
> >
> > It is not included upstream mainly because I never asked for it to be. I've
> > been aiming to do it but haven't got around to it. Also, since 9.1 there has
> > been support in PostgreSQL for setting the collation locale per column. The
> > patch does not yet support this, so it is incomplete. You could argue that
> > this is less important than supporting the primary locale, but that would be
> > a hard case to make; per-column collation would have to be added before the
> > patch could be accepted upstream.
> >
> > So, I can vouch for it: it does the job just fine. Upstream support will
> > happen eventually.
>
>
> We also discussed this back when we did the Windows port. One of the
> big arguments against bringing it in then (because it worked) was that
> we'd bring in another compile time dependency that's actually larger
> than PostgreSQL itself. For example, the ICU .tgz file of the latest
> version is 24.3 MB, and the latest PostgreSQL .tgz is 21.8 MB. If we add
> it as a requirement, we more than double the size of PostgreSQL. (Part
> of that was specifically a concern on Windows, of course, since no
> dependencies can be expected to exist there; ICU is a lot more likely
> to already exist packaged up on Linux/BSD.)
>

24 MB is not a problem for most Windows users. I don't propose ICU as the main
solution for us, but it can be a good alternative for some companies that
need to fix inconsistencies in the collation implementation between Windows
and Linux. Czech collation on Windows and Linux can produce different results
in some corner cases.
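The specific corner cases are not named here, so the following is only a hypothetical illustration of why collation is language-specific at all: Czech treats the digraph "ch" as a single letter that sorts after "h". `czech_key` is an invented helper that models just that one rule and falls back to code-point order for everything else; a real Czech collation has many more rules, and implementations that encode them slightly differently can disagree.

```python
def czech_key(s: str) -> list[tuple[int, int]]:
    """Hypothetical, heavily simplified Czech sort key.

    Models one rule only: "ch" is a letter of its own, placed just
    after "h". Every other character uses its code point.
    """
    s = s.casefold()
    key: list[tuple[int, int]] = []
    i = 0
    while i < len(s):
        if s.startswith("ch", i):
            key.append((ord("h"), 1))  # sorts right after plain "h"
            i += 2
        else:
            key.append((ord(s[i]), 0))
            i += 1
    return key


words = ["ibis", "chleba", "hrad", "cesta"]
print(sorted(words))                 # ['cesta', 'chleba', 'hrad', 'ibis']
print(sorted(words, key=czech_key))  # ['cesta', 'hrad', 'chleba', 'ibis']
```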

Regards

Pavel



From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Jakob Egger <jakob(at)eggerapps(dot)at>
Cc: Dave Page <dpage(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 13:06:36
Message-ID: D9AC014E-7016-4014-9726-8BB3B726B541@pingpong.net


> On 26 Nov 2014, at 11:44, Jakob Egger <jakob(at)eggerapps(dot)at> wrote:
>
>
>> On 26.11.2014, at 11:20, Dave Page <dpage(at)postgresql(dot)org> wrote:
>>
>> On Wed, Nov 26, 2014 at 10:13 AM, Jakob Egger <jakob(at)eggerapps(dot)at> wrote:
>>> On 26.11.2014, at 11:05, Dave Page <dpage(at)postgresql(dot)org> wrote:
>>>
>>> You may want to bear in mind that postgres.app is on the main PG
>>> downloads page on the website. If you're patching Postgres to add a
>>> feature like this, it would become a fork and would have to be moved
>>> out of the "PostgreSQL Core Distribution" section of the download area
>>> as we only include "pure" distributions there.
>>>
>>>
>>> I wasn't aware of this. I'll have to bring this up on the Postgres.app
>>> GitHub page.
>>>
>>> Personally, I don't think that shipping a database with broken text sorting
>>> is acceptable; but I can't speak on behalf of the other contributors to
>>> Postgres.app without consulting them first.
>>
>> Right - but the correct course of action would be to get the problem
>> fixed in PostgreSQL itself, not to fork the code which could lead to
>> other problems for users.
>
> Agreed. Since this isn't a priority for Palle, I'll have a look at the patch to see whether I can extend it into something suitable for submission, but since I have never contributed code to PostgreSQL, I don't know yet whether I can handle it.

Well, this discussion actually raises the priority quite a bit for me: someone else is actually interested in the patch... I thought it was just me... :)

Just for reference, the Linux collation is actually also broken with respect to UTF-8. It is better than the others, but not correct. And lower()/upper() fail in many rather common cases involving "wide characters". For example, towupper() only looks at one character at a time, but proper handling needs to look at adjacent characters in some languages.
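The limitation of character-at-a-time case mapping can be demonstrated in Python, whose `str.upper()` and `str.lower()` apply Unicode's full case mappings, including the context-sensitive final-sigma rule. `per_char_upper` and `per_char_lower` are hypothetical helpers that imitate a towupper()/towlower() loop: each character is mapped in isolation to exactly one character. This is an illustration only, not what the patch or ICU does internally.

```python
def per_char_upper(s: str) -> str:
    # Imitates a towupper()-style loop: one character in, one out.
    # If the full uppercase form needs more than one character, keep the
    # original, since a single wide character cannot hold the result.
    out = []
    for c in s:
        u = c.upper()
        out.append(u if len(u) == 1 else c)
    return "".join(out)


def per_char_lower(s: str) -> str:
    # Each character is lowered on its own, with no view of its neighbours.
    return "".join(c.lower() for c in s)


print(per_char_upper("straße"), "straße".upper())  # STRAßE STRASSE
print(per_char_lower("ΣΑΣ"), "ΣΑΣ".lower())        # σασ σας
```

German ß needs two characters when uppercased, and Greek capital sigma lowercases to ς only at the end of a word, which requires looking at the surrounding characters.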

Either way, getting it into core would not happen before 9.5 anyway.

>
> I've opened an issue on Github to discuss what to do about Postgres.app and the upcoming 9.4 release:
> https://github.com/PostgresApp/PostgresApp/issues/233
>
> Best regards,
> Jakob


From: Greg Stark <stark(at)mit(dot)edu>
To: Palle Girgensohn <girgen(at)pingpong(dot)net>
Cc: Jakob Egger <jakob(at)eggerapps(dot)at>, Dave Page <dpage(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 14:21:02
Message-ID: CAM-w4HOKKN0=1n=4DgJeh+tMK18ZGaWQ=R1kk6dDiZ17f+AubA@mail.gmail.com
Lists: pgsql-hackers

I find it hard to believe the original premise of this thread. We knew
there were some problems with OSX and FreeBSD but surely they can't be
completely broken? What happens if you run "ls" with your locale set
to something like fr_FR.UTF8 ? Does Apple not sell Macs in countries
other than the US?

There were a number of problems with using ICU including the large
dependency and the limitations of the iterator model but the main
issue was that it's fundamentally a choice between being consistent
with every other application on your system and being consistent with
other Postgres databases running on other OSes. Most people run
multiple applications on one OS, not many databases on many OSes on
their own with no other applications. If Postgres used ICU then its
output would be inconsistent with things like "sort" or "ls" or your
application programming language's comparison operators.


From: Neil Tiffin <neilt(at)neiltiffin(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Palle Girgensohn <girgen(at)pingpong(dot)net>, Jakob Egger <jakob(at)eggerapps(dot)at>, Dave Page <dpage(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 14:56:08
Message-ID: CF601C63-DBFC-4FEE-B065-45A0BEBD2E3C@neiltiffin.com
Lists: pgsql-hackers


On Nov 26, 2014, at 8:21 AM, Greg Stark <stark(at)mit(dot)edu> wrote:

> I find it hard to believe the original premise of this thread. We knew
> there were some problems with OSX and FreeBSD but surely they can't be
> completely broken?

Ever tried to use Spotlight for searching (in English) on the Mac? Not completely broken, just not reliable. This does not surprise me in the least for OS X. In recent history, the Mac has become a "looks good, but the details may or may not be really correct" platform.

I thought FreeBSD was a preferred OS for PostgreSQL? This does surprise me.

> What happens if you run "ls" with your locale set
> to something like fr_FR.UTF8 ? Does Apple not sell Macs in countries
> other than the US?

Neil
Daily Mac user for a long time.


From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Jakob Egger <jakob(at)eggerapps(dot)at>, Dave Page <dpage(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 15:10:32
Message-ID: 15C9D821-9D55-4E14-8854-FA769BC7DDA6@pingpong.net
Lists: pgsql-hackers


> 26 nov 2014 kl. 15:21 skrev Greg Stark <stark(at)mit(dot)edu>:
>
> I find it hard to believe the original premise of this thread. We knew
> there were some problems with OSX and FreeBSD but surely they can't be
> completely broken? What happens if you run "ls" with your locale set
> to something like fr_FR.UTF8 ? Does Apple not sell Macs in countries
> other than the US?

Hi,

On Mac OS X, ls -l is completely broken wrt UTF-8 collation. Really. Horribly broken. For the Swedish locale it does essentially no proper sorting at all: completely unacceptable, unusable. Think of it as sorting Z right after S or something, just to get a sense of how bad it is.

Application languages like Java have their own sorting. C-based stuff like Perl has its own way of doing it. Python, well, it depends on the version; I haven't checked. C applications, well, it depends on whether they use ICU or not, I guess. :)

Apple sells computers, but does not really promote using locales in Terminal.app... :)

>
> There were a number of problems with using ICU including the large
> dependency and the limitations of the iterator model but the main
> issue was that it's fundamentally a choice between being consistent
> with every other application on your system and being consistent with
> other Postgres databases running on other OSes. Most people run
> multiple applications on one OS, not many databases on many OSes on
> their own with no other applications. If Postgres used ICU then its
> output would be inconsistent with things like "sort" or "ls" or your
> application programming language's comparison operators.

I think most people don't care about PostgreSQL collation being consistent with sort or ls; they just want it to work properly for real-life applications, so that users, who really don't care about ls or sort, get the results they expect. Otherwise they give up and sort in the application instead (=fail). But I guess that depends on which applications you use. We've used the patch for 8+ years. For us, Linux's built-in collation would not have been enough either -- if memory serves, it fails to sort 'ß' together with 'ss', and also fails to turn upper('ß') into 'SS', which is what real-world users would expect.
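The 'ß' vs 'ss' sorting point can be sketched in Python as well. This is deliberately crude: `str.casefold()` (which maps 'ß' to 'ss') serves as a hypothetical sort key, whereas a real collation such as ICU's uses multi-level weights, so treat it as an illustration only.

```python
words = ["Straße", "Strauch", "Strasse"]

# Code-point order (identical to bytewise order on UTF-8 text): 'ß' (U+00DF)
# sorts after every ASCII letter, so "Straße" lands after "Strauch".
by_code_point = sorted(words)

# Toy key: casefold() expands 'ß' to 'ss', so "Straße" sorts next to
# "Strasse". Not a real collation, just enough to show the grouping.
# sorted() is stable, so the two equal keys keep their input order.
by_toy_key = sorted(words, key=str.casefold)

if __name__ == "__main__":
    print(by_code_point)
    print(by_toy_key)
```

The first list comes out as `['Strasse', 'Strauch', 'Straße']`, the second as `['Straße', 'Strasse', 'Strauch']`.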


From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Neil Tiffin <neilt(at)neiltiffin(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Jakob Egger <jakob(at)eggerapps(dot)at>, Dave Page <dpage(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 15:16:14
Message-ID: 2923AA8F-2256-450A-BF1B-441D8CBD8077@pingpong.net
Lists: pgsql-hackers


> 26 nov 2014 kl. 15:56 skrev Neil Tiffin <neilt(at)neiltiffin(dot)com>:
>
>
> On Nov 26, 2014, at 8:21 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
>
>> I find it hard to believe the original premise of this thread. We knew
>> there were some problems with OSX and FreeBSD but surely they can't be
>> completely broken?
>
> Ever tried to use Spotlight for searching (in English) on the Mac? Not completely broken, just not reliable. This does not surprise me in the least for OS X. In recent history, the Mac has become a "looks good, but the details may or may not be really correct" platform.
>
> I thought FreeBSD was a preferred OS for PostgreSQL? This does surprise me.

It works fine if you use English, or if you don't use UTF-8. And it works fine with UTF-8 if you don't care about "real world sorting", or if you do the sorting in your application anyway (most OSes' collations are really broken for non-English locales).

So for a great number of people, it works great. For the rest of us, well, I use ICU... :)

>
>> What happens if you run "ls" with your locale set
>> to something like fr_FR.UTF8 ? Does Apple not sell Macs in countries
>> other than the US?
>
> Neil
> Daily Mac user for a long time.
>


From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Jakob Egger <jakob(at)eggerapps(dot)at>
Cc: Dave Page <dpage(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 15:17:45
Message-ID: 32347925-F97A-4A65-80BB-701BAE377DCB@pingpong.net
Lists: pgsql-hackers


> 26 nov 2014 kl. 14:06 skrev Palle Girgensohn <girgen(at)pingpong(dot)net>:
>
> Well, this discussion actually pushes the priority quite a bit for me -- someone else actually being interested in the patch... I thought it was just me... :)

By "pushes the priority", I mean it gets more prioritized, in case that was unclear. :)


From: Geoff Montee <geoff(dot)montee(at)gmail(dot)com>
To: Palle Girgensohn <girgen(at)pingpong(dot)net>
Cc: Jakob Egger <jakob(at)eggerapps(dot)at>, Dave Page <dpage(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 16:46:11
Message-ID: CAA7biFNa3LSZki8MgfJtMJty6bp_3mKFp0ow7q3aEmHSutdgJA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 26, 2014 at 10:17 AM, Palle Girgensohn <girgen(at)pingpong(dot)net> wrote:
>
>
> > 26 nov 2014 kl. 14:06 skrev Palle Girgensohn <girgen(at)pingpong(dot)net>:
> >
> > Well, this discussion actually pushes the priority quite a bit for me -- someone else actually being interested in the patch... I thought it was just me... :)
>
> By "pushes the priority", I mean it gets more prioritized, in case that was unclear. :)

This topic reminds me of a thread from a couple months ago:

http://www.postgresql.org/message-id/F8268DB6-B50F-429F-8289-DA8FFA5F22BA@tripadvisor.com

It sounds like adding ICU support to core may also allow for adding
collation versioning to indexes.

Geoff


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Palle Girgensohn <girgen(at)pingpong(dot)net>, Jakob Egger <jakob(at)eggerapps(dot)at>, Dave Page <dpage(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 17:38:18
Message-ID: CAM3SWZSuuhgg_9hgx7+NQQXFDcQ+1Mzk9aGyEpfNguCPAm8yfA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 26, 2014 at 6:21 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
> There were a number of problems with using ICU including the large
> dependency and the limitations of the iterator model but the main
> issue was that it's fundamentally a choice between being consistent
> with every other application on your system and being consistent with
> other Postgres databases running on other OSes. Most people run
> multiple applications on one OS, not many databases on many OSes on
> their own with no other applications. If Postgres used ICU then its
> output would be inconsistent with things like "sort" or "ls" or your
> application programming language's comparison operators.

Unless your application is written in Java, as many are.

--
Peter Geoghegan


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Dave Page <dpage(at)postgresql(dot)org>
Cc: Jakob Egger <jakob(at)eggerapps(dot)at>, Palle Girgensohn <girgen(at)pingpong(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 17:46:45
Message-ID: CAM3SWZSOysy6ufQzMFKJvw8m=9KPYDA3kBe=ZsWgDUHyRRAnuQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 26, 2014 at 2:05 AM, Dave Page <dpage(at)postgresql(dot)org> wrote:
> You may want to bear in mind that postgres.app is on the main PG
> downloads page on the website. If you're patching Postgres to add a
> feature like this, it would become a fork and would have to be moved
> out of the "PostgreSQL Core Distribution" section of the download area
> as we only include "pure" distributions there.

Doesn't the existing FreeBSD link go to the ports collection? And
doesn't the PostgreSQL package automatically use this very ICU patch?

It seems like the FreeBSD people were working around their poor OS
locale support here. While I think we should officially adopt ICU, it
seems a little unfair to call what they've done a fork.
--
Peter Geoghegan


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Peter Geoghegan <pg(at)heroku(dot)com>, Dave Page <dpage(at)postgresql(dot)org>
Cc: Jakob Egger <jakob(at)eggerapps(dot)at>, Palle Girgensohn <girgen(at)pingpong(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 19:42:48
Message-ID: 54762D38.1050604@gmx.net
Lists: pgsql-hackers

On 11/26/14 12:46 PM, Peter Geoghegan wrote:
> On Wed, Nov 26, 2014 at 2:05 AM, Dave Page <dpage(at)postgresql(dot)org> wrote:
>> You may want to bear in mind that postgres.app is on the main PG
>> downloads page on the website. If you're patching Postgres to add a
>> feature like this, it would become a fork and would have to be moved
>> out of the "PostgreSQL Core Distribution" section of the download area
>> as we only include "pure" distributions there.
>
> Doesn't the existing FreeBSD link go to the ports collection? And
> doesn't the PostgreSQL package automatically use this very ICU patch?
>
> It seems like the FreeBSD people were working around their poor OS
> locale support here. While I think we should officially adopt ICU, it
> seems a little unfair to call what they've done a fork.

I would welcome the addition of support for ICU and possibly other
locale libraries. The features were designed with that in mind.

But I think what is being proposed here needs to be reined in from time
to time. Search the archives at various times for "debian", "gentoo",
or even "mandrake" for examples of what can happen when this goes too far.

It's a sliding scale. FreeBSD ports are notionally a build-from-source
system targeted at experts. Someone who installs a port has a chance to
look at the port definition and learn what will be installed. (A build
option and a more explicit warning might be nice.) Postgres.app is a
binary distribution apparently targeted at inexperienced or casual users
at a much bigger scale. Users won't have an option to learn about this
unofficial feature or a chance to disable it. Also, Postgres.app is not
the only distribution for this platform, so this could create a lot of
confusion.

It's open source, and I don't want to discourage people from
experimenting and sharing. But I'm with Dave: listing a distribution
among the primary download options should imply that the software is as
pristine as possible.


From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Dave Page <dpage(at)postgresql(dot)org>, Jakob Egger <jakob(at)eggerapps(dot)at>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 20:42:08
Message-ID: 5D74D783-027A-48F1-815A-68659A97D197@pingpong.net
Lists: pgsql-hackers


> 26 nov 2014 kl. 20:42 skrev Peter Eisentraut <peter_e(at)gmx(dot)net>:
>
> (A build
> option and a more explicit warning might be nice.)

In the freebsd ports, it is an option, default is off. :-)


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Palle Girgensohn <girgen(at)pingpong(dot)net>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Dave Page <dpage(at)postgresql(dot)org>, Jakob Egger <jakob(at)eggerapps(dot)at>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-26 20:45:06
Message-ID: 54763BD2.4000908@gmx.net
Lists: pgsql-hackers

On 11/26/14 3:42 PM, Palle Girgensohn wrote:
>
>> 26 nov 2014 kl. 20:42 skrev Peter Eisentraut <peter_e(at)gmx(dot)net>:
>>
>> (A build
>> option and a more explicit warning might be nice.)
>
> In the freebsd ports, it is an option, default is off. :-)

That's even better.

Sorry, I looked at the port sources and couldn't identify that it was an
option.


From: Jakob Egger <jakob(at)eggerapps(dot)at>
To: Geoff Montee <geoff(dot)montee(at)gmail(dot)com>
Cc: Palle Girgensohn <girgen(at)pingpong(dot)net>, Dave Page <dpage(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-27 09:09:38
Message-ID: 72AE2E04-CD4E-4E7A-9303-49DE5354B4B3@eggerapps.at
Lists: pgsql-hackers

Am 26.11.2014 um 17:46 schrieb Geoff Montee <geoff(dot)montee(at)gmail(dot)com>:
> This topic reminds me of a thread from a couple months ago:
>
> http://www.postgresql.org/message-id/F8268DB6-B50F-429F-8289-DA8FFA5F22BA@tripadvisor.com
>
> It sounds like adding ICU support to core may also allow for adding
> collation versioning to indexes.

Reading through this thread it becomes clear to me that adding support for ICU is more important than I thought, and the only problem is that no one has yet volunteered for it :)

I've started looking through the PostgreSQL source and Palle's patch to estimate what needs to be done.

MINIMUM TODO
============

* Add support for per-column collations in varstr_cmp() in varlena.c. Currently the patch creates a single ICU collator for the default collation and stores it in a static variable. We would need to change this to create a collator for each collation and store them in a hash table, similar to pg_newlocale_from_collation() / lookup_collation_cache().

* There's a new feature in trunk for faster sorting using SortSupport, so we would also need to patch bttextfastcmp_locale() in varlena.c.

These two changes would allow using ICU for collation. This has two major advantages:
1) Systems with broken strcoll like OS X and FreeBSD can take advantage of ICU to offer proper text sorting
2) You can link with a specific version of ICU to avoid index corruption and duplicate keys caused by changing implementations of the glibc strcoll function
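The per-collation cache described in the first TODO item might look roughly like this, sketched in Python rather than C. Everything here is hypothetical: `CollatorCache`, the `Collator` placeholder, the OIDs and the `open_collator` callback (standing in for something like ICU's `ucol_open()`). It only mirrors the idea behind `pg_newlocale_from_collation()` / `lookup_collation_cache()`: look up by collation OID, create on first use, reuse afterwards.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Collator:
    """Hypothetical stand-in for an opened ICU collator handle."""
    locale: str


class CollatorCache:
    """One collator per collation, created on first use and then reused.

    Replaces the single static collator for the default collation. Keys are
    collation OIDs; `locale_of` maps each OID to its locale name.
    """

    def __init__(self, open_collator: Callable[[str], Collator],
                 locale_of: Dict[int, str]) -> None:
        self._open = open_collator
        self._locale_of = locale_of
        self._cache: Dict[int, Collator] = {}

    def get(self, collid: int) -> Collator:
        coll = self._cache.get(collid)
        if coll is None:
            # Cache miss: open a collator for this collation's locale once.
            coll = self._open(self._locale_of[collid])
            self._cache[collid] = coll
        return coll


if __name__ == "__main__":
    opened = []

    def fake_open(loc: str) -> Collator:
        opened.append(loc)  # record how often a collator is really opened
        return Collator(loc)

    cache = CollatorCache(fake_open, {100: "de_CH", 200: "sv_SE"})
    for collid in (100, 200, 100, 100):
        cache.get(collid)
    print(opened)  # each locale was opened exactly once
```

A dictionary keyed by OID keeps lookups cheap on the hot comparison path; a real implementation would also have to handle collators that fail to open and decide on memory lifetime, which this sketch ignores.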

NEXT STEPS: Support for more collations
=======================================

ICU offers a lot more collations than the OS. For example, besides "de_CH" it also offers "de_CH@collation=phonebook". Adding support for these is a bit more involved.

* initdb would need to be extended to also look for collations offered by ICU and add them to the pg_collation catalog.

* A special case for LC_COLLATE must be added to check_locale() in the backend, get_canonical_locale_name() in pg_upgrade, and check_locale_name() in initdb to support collations provided by ICU.

* pg_perm_setlocale() must get a special case to handle ICU collations

* The locale-handling code in plperl must be modified (when using an ICU collation as the default collation, we must decide which collation to pass to Perl).

* convert_string_datum() in selfuncs.c could be patched to use ICU instead of strxfrm. However, as far as I understand, this is not absolutely required, since it is only used by the query planner and in the worst case would prevent some optimisations in corner cases.

These changes would probably have an even bigger impact, because then people would no longer be limited to the collations supported by the locales installed on their OS.

NEXT STEPS: Collation versioning in indices
===========================================

Since ICU provides reliable versioning of collations, this would allow us to finally prevent index corruption caused by changing implementations of strcoll. I haven't looked at this in detail, but I assume that this would be a small change with potentially big impact.

Ideally, PostgreSQL would detect when the collation is a different version than the one used to create the index, and stop using the index until it is rebuilt.
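The version check described above could work along these lines. This Python sketch assumes a collation version string is recorded when an index is built (ICU can report a version for each collator, e.g. via `ucol_getVersion()`); the `IndexInfo` record, the functions and the version strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexInfo:
    """Hypothetical index metadata: which collation version built it."""
    name: str
    collation: str
    built_with_version: str
    usable: bool = True


def check_collation_version(index: IndexInfo, current_version: str) -> bool:
    """Return whether the index may be used with the current collation version.

    On a mismatch the index is marked unusable and stays that way until it
    is rebuilt, instead of silently returning wrong results or admitting
    duplicate keys.
    """
    if index.built_with_version != current_version:
        index.usable = False
    return index.usable


def rebuild(index: IndexInfo, current_version: str) -> None:
    """Stand-in for REINDEX: re-sort with the current collation, record its version."""
    index.built_with_version = current_version
    index.usable = True


if __name__ == "__main__":
    idx = IndexInfo("names_idx", "de_CH", built_with_version="v1")
    print(check_collation_version(idx, "v1"))  # same version: usable
    print(check_collation_version(idx, "v2"))  # library changed: not usable
    rebuild(idx, "v2")
    print(check_collation_version(idx, "v2"))  # rebuilt: usable again
```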

I'll take a shot at the MINIMUM TODO as outlined above.


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Jakob Egger <jakob(at)eggerapps(dot)at>
Cc: Geoff Montee <geoff(dot)montee(at)gmail(dot)com>, Palle Girgensohn <girgen(at)pingpong(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-27 09:15:43
Message-ID: CA+OCxozt5BjdggE9-QnYxCQBROATXQV6ytWES8XPdG77f=5kbw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 27, 2014 at 9:09 AM, Jakob Egger <jakob(at)eggerapps(dot)at> wrote:

> I'll take a shot at the MINIMUM TODO as outlined above.
>
>
We've already included ICU support in our Postgres Plus Advanced Server
product. Before you spend too much time on this, give me a few days to see
if we can get that change contributed back. The people I need to speak to
are OOO for Thanksgiving at the moment though, so it may be a few days.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Stark <stark(at)mit(dot)edu>
To: Jakob Egger <jakob(at)eggerapps(dot)at>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>, Palle Girgensohn <girgen(at)pingpong(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Geoff Montee <geoff(dot)montee(at)gmail(dot)com>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-27 10:03:42
Message-ID: CAM-w4HOtHev7X7qra6U0ewx8thdeeBrucct7pmjONdAgvzNyqA@mail.gmail.com
Lists: pgsql-hackers

On 27 Nov 2014 09:09, "Jakob Egger" <jakob(at)eggerapps(dot)at> wrote:
>
> ICU offers a lot more collations than the OS. For example, besides
> "de_CH" it also offers "de_CH@collation=phonebook". Adding support for
> these is a bit more involved.
>
> * initdb would need to be extended to also look for collations offered by
> ICU and add them to the pg_collation catalog.

Hm. Actually the pg_collation catalog might give a handy way out for the
issue of being inconsistent with the system collation. We could support
both sets of collations and let the user select an ICU collation or system
collation at runtime.
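Choosing between system and ICU collations at runtime amounts to picking the comparison routine per collation rather than per build. A minimal Python sketch, assuming a hypothetical `provider` attribute on each collation; `locale.strcoll` is the real libc-backed comparison, while `fake_icu_compare` is only a placeholder for an ICU-backed one.

```python
import locale
from dataclasses import dataclass
from functools import cmp_to_key
from typing import Callable, List

Compare = Callable[[str, str], int]


@dataclass(frozen=True)
class Collation:
    name: str
    provider: str  # hypothetical attribute: "libc" or "icu"


def fake_icu_compare(a: str, b: str) -> int:
    """Placeholder for an ICU-backed comparison (here: compare case-folded text)."""
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


def comparator_for(coll: Collation, icu_compare: Compare = fake_icu_compare) -> Compare:
    """Choose the comparison routine from the collation, at runtime."""
    if coll.provider == "libc":
        return locale.strcoll  # system (libc) collation, as used by sort, ls, ...
    if coll.provider == "icu":
        return icu_compare     # stand-in for an ICU-backed comparison
    raise ValueError(f"unknown collation provider: {coll.provider!r}")


def sort_with(words: List[str], coll: Collation) -> List[str]:
    return sorted(words, key=cmp_to_key(comparator_for(coll)))


if __name__ == "__main__":
    locale.setlocale(locale.LC_COLLATE, "C")
    words = ["b", "B", "a"]
    print(sort_with(words, Collation("C", "libc")))
    print(sort_with(words, Collation("example-icu", "icu")))
```

Existing databases keep their libc collations and behave as before; only columns or expressions that explicitly use an ICU collation would change behaviour.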


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Jakob Egger <jakob(at)eggerapps(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>, Palle Girgensohn <girgen(at)pingpong(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Geoff Montee <geoff(dot)montee(at)gmail(dot)com>, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-27 15:03:29
Message-ID: 3921.1417100609@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> writes:
> Hm. Actually the pg_collation catalog might give a handy way out for the
> issue of being inconsistent with the system collation. We could support
> both sets of collations and let the user select an ICU collation or system
> collation at runtime.

+1 ... this seems like a nice end-run around the backwards compatibility
problem.

Another issue is that (AFAIK) ICU doesn't support any non-Unicode
encodings, which means that a build supporting *only* ICU collations is a
nonstarter IMO. So we really need a way to deal with both system and ICU
collations, and treating the latter as a separate subset of pg_collation
seems like a decent way to do that. (ISTR some discussion about forcibly
converting strings in other encodings to Unicode to compare them, but
I sure don't want to do that. I think it'd be saner just to mark the
ICU collations as only compatible with UTF8 database encoding.)
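Marking ICU collations as UTF8-only could reuse the existing `collencoding` column of pg_collation, where -1 means the collation works with any encoding. The Python sketch below models that filter; the row type, the `provider` field and the collation names are hypothetical, and `None` plays the role of -1.

```python
from dataclasses import dataclass
from typing import List, Optional

ANY_ENCODING: Optional[str] = None  # plays the role of collencoding = -1


@dataclass(frozen=True)
class CatalogRow:
    name: str
    provider: str             # hypothetical: "libc" or "icu"
    encoding: Optional[str]   # None = usable with any database encoding


def icu_row(name: str) -> CatalogRow:
    # ICU collations are only offered to UTF8 databases.
    return CatalogRow(name, "icu", "UTF8")


def usable_in(catalog: List[CatalogRow], db_encoding: str) -> List[str]:
    """Names of the collations a database with db_encoding may use."""
    return [r.name for r in catalog
            if r.encoding is ANY_ENCODING or r.encoding == db_encoding]


catalog = [
    CatalogRow("C", "libc", ANY_ENCODING),
    CatalogRow("sv_SE.ISO8859-1", "libc", "LATIN1"),
    CatalogRow("sv_SE.UTF-8", "libc", "UTF8"),
    icu_row("sv_SE (icu)"),
]

if __name__ == "__main__":
    print(usable_in(catalog, "UTF8"))
    print(usable_in(catalog, "LATIN1"))
```

A LATIN1 database simply never sees the ICU rows, so non-Unicode encodings keep working with the system collations and no string ever has to be converted to Unicode just to be compared.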

regards, tom lane

PS: I've removed pgsql-packagers from the cc, this thread is no
longer relevant to them.


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Jakob Egger <jakob(at)eggerapps(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>, Palle Girgensohn <girgen(at)pingpong(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Geoff Montee <geoff(dot)montee(at)gmail(dot)com>, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-27 23:24:40
Message-ID: CAM3SWZSwzAPmjKncxpTnaxUcL0Q9KcEq2A7WD8oRVGLMWpjxHw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 27, 2014 at 7:03 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> +1 ... this seems like a nice end-run around the backwards compatibility
> problem.
>
> Another issue is that (AFAIK) ICU doesn't support any non-Unicode
> encodings, which means that a build supporting *only* ICU collations is a
> nonstarter IMO. So we really need a way to deal with both system and ICU
> collations, and treating the latter as a separate subset of pg_collation
> seems like a decent way to do that. (ISTR some discussion about forcibly
> converting strings in other encodings to Unicode to compare them, but
> I sure don't want to do that. I think it'd be saner just to mark the
> ICU collations as only compatible with UTF8 database encoding.)

I would like to see ICU become the de facto standard set of collations,
with support for *versioning*, in the same way that UTF-8 might be
considered the de facto standard encoding.

It seems likely that we'll want to store sort keys (strxfrm() blobs)
in indexes at some point in the future. I now believe that that's more
problematic than just using strcoll() in B-Tree support function 1.
Although that isn't the most compelling reason to pursue ICU support.
--
Peter Geoghegan
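The property that makes stored sort keys attractive, and fragile, is that comparing `strxfrm()` outputs must give the same order as `strcoll()` on the original strings. The Python sketch below checks that invariant using the standard `locale` module in the portable "C" locale. Keys produced by one version of a collation library carry no such guarantee when compared against keys or strcoll() results from another version, which is why stored keys would tie an index to one specific implementation.

```python
import locale
from functools import cmp_to_key
from typing import Iterable


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def keys_agree_with_strcoll(words: Iterable[str]) -> bool:
    """True if comparing strxfrm() keys orders every pair like strcoll() does."""
    words = list(words)
    for a in words:
        for b in words:
            ka, kb = locale.strxfrm(a), locale.strxfrm(b)
            if sign(locale.strcoll(a, b)) != sign((ka > kb) - (ka < kb)):
                return False
    return True


if __name__ == "__main__":
    locale.setlocale(locale.LC_COLLATE, "C")
    words = ["apple", "Banana", "cherry", "apple pie"]
    print(keys_agree_with_strcoll(words))
    # Sorting by precomputed keys reproduces the strcoll() order without
    # calling strcoll() per comparison; that is what storing keys would buy.
    print(sorted(words, key=locale.strxfrm) ==
          sorted(words, key=cmp_to_key(locale.strcoll)))
```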


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: stark(at)mit(dot)edu, jakob(at)eggerapps(dot)at, pgsql-hackers(at)postgresql(dot)org, tobias(dot)bussmann(at)scnat(dot)ch, girgen(at)pingpong(dot)net, magnus(at)hagander(dot)net, geoff(dot)montee(at)gmail(dot)com, dpage(at)postgresql(dot)org
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-27 23:49:14
Message-ID: 20141128.084914.360019478169807346.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

> Another issue is that (AFAIK) ICU doesn't support any non-Unicode
> encodings, which means that a build supporting *only* ICU collations is a
> nonstarter IMO. So we really need a way to deal with both system and ICU
> collations, and treating the latter as a separate subset of pg_collation
> seems like a decent way to do that. (ISTR some discussion about forcibly
> converting strings in other encodings to Unicode to compare them, but
> I sure don't want to do that. I think it'd be saner just to mark the
> ICU collations as only compatible with UTF8 database encoding.)

+1. Forcing only Unicode collation is totally unacceptable.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Jakob Egger <jakob(at)eggerapps(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>, Magnus Hagander <magnus(at)hagander(dot)net>, Geoff Montee <geoff(dot)montee(at)gmail(dot)com>, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2014-11-28 12:48:05
Message-ID: 7B0A6131-901C-402C-B3D2-D74D8B154BF2@pingpong.net
Lists: pgsql-hackers


> On 27 Nov 2014 at 16:03, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Another issue is that (AFAIK) ICU doesn't support any non-Unicode
> encodings, which means that a build supporting *only* ICU collations is a
> nonstarter IMO.

The patch I originally wrote replaces strcoll, but keeps the original behaviour for 8-bit encodings.


From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Jakob Egger <jakob(at)eggerapps(dot)at>, Geoff Montee <geoff(dot)montee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2015-04-19 11:46:28
Message-ID: 0ECFF0FA-2D9C-46D4-BEF8-34C7A5215FDC@pingpong.net
Lists: pgsql-hackers


> On 27 Nov 2014 at 10:15, Dave Page <dpage(at)pgadmin(dot)org> wrote:
>
>
>
> On Thu, Nov 27, 2014 at 9:09 AM, Jakob Egger <jakob(at)eggerapps(dot)at> wrote:
> On 26.11.2014 at 17:46, Geoff Montee <geoff(dot)montee(at)gmail(dot)com> wrote:
> > This topic reminds me of a thread from a couple months ago:
> >
> > http://www.postgresql.org/message-id/F8268DB6-B50F-429F-8289-DA8FFA5F22BA@tripadvisor.com
> >
> > It sounds like adding ICU support to core may also allow for adding
> > collation versioning to indexes.
>
> Reading through this thread it becomes clear to me that adding support for ICU is more important than I thought, and the only problem is that no one has yet volunteered for it :)
>
> I've started looking through the PostgreSQL source and Palle's patch to estimate what needs to be done.
>
> MINIMUM TODO
> ============
>
> * Add support for per-column collations in varstr_cmp() in varlena.c. Currently the patch creates a single ICU collator for the default collation and stores it in a static variable. We would need to change this to create a collator for each collation and store them in a hash table, similar to pg_newlocale_from_collation() / lookup_collation_cache().
>
> * There's a new feature in trunk for faster sorting using SortSupport, so we would also need to patch bttextfastcmp_locale() in varlena.c.
>
> These two changes would allow using ICU for collation. This has two major advantages:
> 1) Systems with broken strcoll like OS X and FreeBSD can take advantage of ICU to offer proper text sorting
> 2) You can link with a specific version of ICU to avoid index corruption and duplicate keys caused by changing implementations of the glibc strcoll function
>
>
> NEXT STEPS: Support for more collations
> =======================================
>
> ICU offers a lot more collations than the OS. For example, besides "de_CH" it also offers "de_CH@collation=phonebook". Adding support for these is a bit more involved.
>
> * initdb would need to be extended to also look for collations offered by ICU and add them to the pg_collation catalog.
>
> * A special case for LC_COLLATE must be added to check_locale() in the backend, get_canonical_locale_name() in pg_upgrade, check_locale_name() in initdb to support collations provided by ICU
>
> * pg_perm_setlocale() must get a special case to handle ICU collations
>
> * the locale handling code in PL/Perl must be modified (when using an ICU collation as the default collation, we must decide what collation to pass to Perl)
>
> * convert_string_datum() in selfuncs.c could be patched to use ICU instead of strxfrm. However, as far as I understand, this is not absolutely required as this is only used by the query planner and would in the worst case prevent some optimisation in corner cases
>
> These changes would probably have an even bigger impact, because then people would no longer be limited to the collations supported by the locales installed on their OS.
>
> NEXT STEPS: Collation versioning in indices
> ===========================================
>
> Since ICU provides reliable versioning of collations, this would allow us to finally prevent index corruption caused by changing implementations of strcoll. I haven't looked at this in detail, but I assume that this would be a small change with potentially big impact.
>
> Ideally, PostgreSQL would detect when the collation is a different version than the one used to create the index, and stop using the index until it is rebuilt.
>
>
> I'll take a shot at the MINIMUM TODO as outlined above.
>
>
> We've already included ICU support in our Postgres Plus Advanced Server product. Before you spend too much time on this, give me a few days to see if we can get that change contributed back. The people I need to speak to are OOO for Thanksgiving at the moment though, so it may be a few days.
>
> --
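The first MINIMUM TODO item (one collator per collation, cached in a hash table) and the versioning idea can be sketched together in C. This is a minimal illustration under stated assumptions, not code from Palle's patch or from PostgreSQL: `Collator`, `collator_create()`, `lookup_collator()` and `index_collation_usable()` are hypothetical names, and `collator_create()` is a stub standing in for opening an ICU collator (ICU's `ucol_open()`) and reading its version (ICU's `ucol_getVersion()`). The version string "v1" is a placeholder, not a real ICU version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t Oid;   /* as in PostgreSQL: a 32-bit object identifier */

/* Hypothetical stand-in for an opened ICU collator. */
typedef struct Collator {
    char locale[64];
    char version[16];   /* would come from ucol_getVersion() in a real build */
} Collator;

#define CACHE_SIZE 64   /* power of two, so masking gives the bucket */

typedef struct CacheEntry {
    bool used;
    Oid collid;
    Collator coll;
} CacheEntry;

static CacheEntry cache[CACHE_SIZE];   /* zero-initialized: all slots unused */
static int collators_created = 0;      /* counts the expensive "opens" */

/* Hypothetical stub: a real build would call ucol_open(locale, &status) here. */
static void collator_create(Collator *c, const char *locale)
{
    snprintf(c->locale, sizeof c->locale, "%s", locale);
    snprintf(c->version, sizeof c->version, "%s", "v1");   /* placeholder version */
    collators_created++;
}

/* Return the cached collator for collid, creating it on first use.
 * locale is only consulted on a cache miss. Returns NULL if the table is full. */
static Collator *lookup_collator(Oid collid, const char *locale)
{
    size_t i = collid & (CACHE_SIZE - 1);
    for (size_t probes = 0; probes < CACHE_SIZE; probes++) {
        CacheEntry *e = &cache[i];
        if (e->used && e->collid == collid)
            return &e->coll;                 /* hit: reuse the existing collator */
        if (!e->used) {                      /* miss: open once and remember it */
            e->used = true;
            e->collid = collid;
            collator_create(&e->coll, locale);
            return &e->coll;
        }
        i = (i + 1) & (CACHE_SIZE - 1);      /* linear probing on collision */
    }
    return NULL;
}

/* An index built under one collator version should not be trusted under another. */
static bool index_collation_usable(const char *version_at_build, const Collator *c)
{
    return strcmp(version_at_build, c->version) == 0;
}
```

The fixed-size open-addressing table only keeps the sketch self-contained. PostgreSQL's existing per-collation caches use its own hash table facility (dynahash), which a real patch would presumably reuse.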

Hi,

Just poking this old thread again. What happened here? Is anyone putting work into this area at the moment?

Palle


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Palle Girgensohn <girgen(at)pingpong(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>
Cc: Jakob Egger <jakob(at)eggerapps(dot)at>, Geoff Montee <geoff(dot)montee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-packagers(at)postgresql(dot)org" <pgsql-packagers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bussmann Tobias <tobias(dot)bussmann(at)scnat(dot)ch>
Subject: Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Date: 2015-04-20 19:43:36
Message-ID: 553556E8.6030702@gmx.net
Lists: pgsql-hackers

On 4/19/15 7:46 AM, Palle Girgensohn wrote:
> Just poking this old thread again. What happened here? Is anyone putting work into this area at the moment?

I plan to look at it for 9.6.