Re: Support UTF-8 files with BOM in COPY FROM

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support UTF-8 files with BOM in COPY FROM
Date:	2011-09-26 12:06:10
Message-ID:	4E806AB2.5090200@dunslane.net
Lists:	pgsql-hackers

On 09/26/2011 07:12 AM, Magnus Hagander wrote:
> On Mon, Sep 26, 2011 at 06:58, Itagaki Takahiro
> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>> Hi,
>>
>> I'd like to support UTF-8 text or csv files that have a BOM (byte order mark)
>> in the COPY FROM command. A BOM will be automatically detected and ignored
>> if the file encoding is UTF-8. WIP patch attached.
>>
>> I'm thinking about only COPY FROM for reads, but if someone wants to add
>> BOM in COPY TO, we might also support COPY TO WITH BOM for writes.
>>
>> Comments welcome.
> I like it in general. But if we're looking at the BOM, shouldn't we
> also look and *reject* the file if it's a BOM for a non-UTF8 file? Say
> if the BOM claims it's UTF16?
>

It should be rejected as invalidly encoded anyway, since a non-UTF-8 BOM is
not valid UTF-8. We shouldn't check in non-Unicode cases, where the same
byte sequence might be valid in those encodings (e.g. ISO-8859-1).
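
To illustrate the point, here is a minimal Python sketch (not the patch under
discussion, and not PostgreSQL code). The helper names strip_utf8_bom and
is_valid are hypothetical. It shows that UTF-16 BOM bytes already fail UTF-8
validation, that the same bytes are legal ISO-8859-1, and that a UTF-8 BOM
would only be stripped when the declared file encoding is UTF-8.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def strip_utf8_bom(data: bytes, encoding: str) -> bytes:
    """Drop a leading UTF-8 BOM, but only if the declared encoding is UTF-8.

    Hypothetical helper for illustration; other encodings are left untouched
    because the same bytes may be legitimate data there.
    """
    if codecs.lookup(encoding).name == "utf-8" and data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def is_valid(data: bytes, encoding: str) -> bool:
    """Return True if the bytes decode cleanly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# A UTF-16 BOM (FF FE or FE FF) can never appear in valid UTF-8, so ordinary
# encoding validation rejects such a file without any BOM-specific check.
utf16_input = codecs.BOM_UTF16_LE + "a".encode("utf-16-le")
print(is_valid(utf16_input, "utf-8"))       # False
# The same bytes are ordinary characters in ISO-8859-1.
print(is_valid(utf16_input, "iso-8859-1"))  # True
```

With these helpers, strip_utf8_bom(b"\xef\xbb\xbfa,b\n", "utf-8") returns
b"a,b\n", while the same input declared as ISO-8859-1 comes back unchanged.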

cheers

andrew
