Re: Problem with COPY CSV

Lists: pgsql-hackers
From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Problem with COPY CSV
Date: 2005-12-01 01:41:52
Message-ID: 438E54E0.5010601@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Attached is a small test extract from the USDA nutrient database.

The problem is that the script won't load the COPY data correctly. This
is with CVS HEAD (and 8.1).

It is the 4th column in the table that gives the problem (nutr_no
integer). Each of the 3 COPY rows has a different way of specifying a
number:

* As an unquoted empty value
* As a quoted number
* As a quoted empty value

Now, I'm specifying "NULL AS ''" in the COPY command but it refuses to
recognize the quoted empty value as NULL.

Anyone have any ideas? Is this a bug? The manual even says that using
"NULL AS ''" lets you make no distinction between empty and quoted empty
values.

Chris

Attachment Content-Type Size
test.sql.gz application/x-gzip 276 bytes

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with COPY CSV
Date: 2005-12-01 04:20:52
Message-ID: 438E7A24.9020106@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Kings-Lynne wrote:

> Attached is a small test extract from the USDA nutrient database.
>
> The problem is that the script won't load the COPY data correctly.
> This is with CVS HEAD (and 8.1).
>
> It is the 4th column in the table that gives the problem (nutr_no
> integer). Each of the 3 COPY rows has a different way of specifying a
> number:
>
> * As an unquoted empty value
> * As a quoted number
> * As a quoted empty value
>
> Now, I'm specifying "NULL AS ''" in the COPY command but it refuses to
> recognize the quoted empty value as NULL.
>
> Anyone have any ideas? Is this a bug? The manual even says that
> using "NULL AS ''" lets you make no distinction between empty and
> quoted empty values.

It is not a bug. It is working as designed and as documented. The rule
is basically that a null value is never quoted and a quoted value is
never null. The docs say:

"The CSV format has no standard way to distinguish a NULL value from an
empty string. PostgreSQL's COPY handles this by quoting. A NULL is
output as the NULL string and is not quoted, while a data value matching
the NULL string is quoted. Therefore, using the default settings, a NULL
is written as an unquoted empty string, while an empty string is written
with double quotes (""). Reading values follows similar rules. You can
use FORCE NOT NULL to prevent NULL input comparisons for specific columns."

The corollary of this is that you can't make a quoted value be accepted
as null (FORCE NOT NULL does the converse, i.e. it makes an unquoted
value be considered as not null).

The handling of nulls in CSV mode was discussed in almost nauseous
detail at the time we did CSV about 18 months ago, so there should be no
surprise here.

For data as irregular as this I suggest that you import it into a text
field and then update the value of that field to NULL where it's empty -
you could then do ALTER TYPE ... USING ... . The other possibility would
be to preprocess the data.

I at least am not inclined to tinker too much more with CSV mode - we
could end up catering for every weird output format in the world if
we're not careful.

cheers

andrew