Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Damir <dam(dot)bel07(at)gmail(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, anisimow(dot)d(at)gmail(dot)com, HukuToc(at)gmail(dot)com, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date: 2023-11-08 19:34:51
Message-ID: B082C6BE-D2F2-4DD4-8649-E81E24A6840E@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 8 Nov 2023, at 19:18, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I think an actually usable feature of this sort would involve
> copying all the failed lines to some alternate output medium,
> perhaps a second table with a TEXT column to receive the original
> data line. (Or maybe an array of text that could receive the
> broken-down field values?) Maybe we could dump the message info,
> line number, field name etc into additional columns.

I agree that the errors should be easily visible to the user in some way. The
feature is for sure interesting, especially in data warehouse type jobs where
dirty data is often ingested.

As a data point, Greenplum has this feature with additional SQL syntax to
control it:

COPY .. LOG ERRORS SEGMENT REJECT LIMIT xyz ROWS;

LOG ERRORS instructs the database to log the faulty rows and SEGMENT REJECT
LIMIT xyz ROWS sets the limit of how many rows can be faulty before the
operation errors out. I'm not at all advocating that we should mimic this,
just wanted to add a reference to postgres derivative where this has been
implemented.

--
Daniel Gustafsson

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-11-08 19:52:16 Re: XID-wraparound hazards in LISTEN/NOTIFY
Previous Message Tomas Vondra 2023-11-08 19:16:05 Re: XX000: tuple concurrently deleted during DROP STATISTICS