Re: COPY enhancements

From: Emmanuel Cecchet <manu(at)asterdata(dot)com>
To: Selena Deckelmann <selenamarie(at)gmail(dot)com>
Cc: Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY enhancements
Date: 2009-10-05 15:30:48
Message-ID: 4ACA1128.9000903@asterdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Selena,

> This is my first pass at the error logging portion of this patch. I'm
> going to take a break and try to go through the partitioning logic as
> well later this afternoon.
>
> caveat: I'm not familiar with most of the code paths that are being
> touched by this patch.
>
> Overall:
> * I noticed '\see' included in the comments in your code. These should
> be removed.
>
Should we avoid any doxygen tags in general in the Postgres code?
> * The regression is failling, as Jeff indicated, and I didn't figure
> out why yet either. Hopefully will have a look closer this afternoon.
>
Let me know if the response I sent to Jeff works for you.
> Comments:
> * copy.c: Better formatting, maybe rewording needed for comment
> starting on line 1990.
>
Some of the bad formatting has been left on purpose to minimize the size
of the patch otherwise there would have been many tab/white spaces
changes due to the indentation in the PG_TRY blocks. I suggest that
whoever is going to commit the code runs pg_indent or I can fix the
formatting once the reviewing is done.
> ** Maybe say: "Check that errorData->sqlerrcode only logged tuples
> that are malformed. This ensures that we let other errors pass
> through."
>
Ok.
> * copy.c: line: 2668 -> need to fix that comment :) (/* Regular code
> */) -- this needs to be more descriptive of what is actually
> happening. We fell through the partitioning logic, right, and back to
> the default behavior?
>
Yes.
> * copy.c: line 2881, maybe say instead: "Mark that we have not read a
> line yet. This is necessary so that we can perform error logging on
> complete lines only."
>
Ok.
> Formatting:
> * copy.c: whitespace is maybe a little off at line: 2004-2009
> * NEW FILES: src/backend/utils/misc/errorlogging.c & errorlogging.h need headers
>
I was not sure what is auto-generated by SVN commit scripts. I'd be
happy to add headers. Is there a specification somewhere or should I
just copy/paste from another file?
> Code:
> * copy.c: line 1990 -> cur_lineno_str[11] & related: why is this
> conversion necessary? (sorry if that is a dumb question)
>
This conversion is necessary if you log in the error table the index of
the row that is causing the error (this is what is logged by default in
ERROR_LOGGING_KEY).
> * copy.c: line 2660 -> what if we are error logging for copy? Does
> this get logged properly?
>
Yes, this is in a try/catch block that performs error logging.
> * errorlogging.c: Noticed the comment at 503 -> 'note that we always
> enable wal logging'.. Thinking this through - the reasoning is that
> you won't be rolling back the error logging no matter what, right?
>
I think that this was the original idea but we should probably rollback
the error logging if the command has been rolled back. It might be more
consistent to use the same hi_options as the copy command. Any idea what
would be best?
> * src/include/commands/copy.c and copy.h: struct CopyStateData was
> moved from copy.c to copy.h; this made sense to me, just noting. That
> move made it a little tricky to find the changes that were made. There
> were 10 items added.
>
Yes, patch can be a little bit lost when you move a big data structure
like this one.
> ** super nit pick: 'readlineOk' uses camel-case instead of underscores
> like the rest of the new variables
>
Yes, queryDesc also has camel-case. I will fix readlineOk.
> * errorlogging.c: could move currentCommand pg_stat call in Init
> routine into MalformedTupleIs, or maybe move the setting of that
> parameter into the Init routine for consistency's sake.
>
The advantage of calling pg_stat in InitializeErrorLogging is that it
is evaluated only once for the entire copy command (and it's not going
to change during the execution of the command). I am not sure to
understand what your second suggestion is since currentCommand is set
and initialized in Init.
> Documentation:
> * doc/src/sgml/ref/copy.sgml: line 313: 'failif' needs a space
>
ok
> ** Also: The error table log examples have relations shown in a
> different order than the actual CREATE TABLE declaration in the code.
>
The order of the columns does not really matter but for consistency sake
we can put the same order.
> * src/test/regress/sql/copyselect.sql: Format of options changed..
> added parenthesis. Is this change documented elsewhere? (sorry if I
> just missed this in the rest of the thread/patch)
>
Yes this has already been committed by Tom. The new format of options
has been introduced just before this patch.

I am attaching a revised version of the patch.
Emmanuel

--
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com

Attachment Content-Type Size
aster-copy-newsyntax-patch-8.5v6context.txt.gz application/x-gzip 35.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dan Colish 2009-10-05 15:37:15 Re: Rules: A Modest Proposal
Previous Message Alvaro Herrera 2009-10-05 15:28:13 Re: Rules: A Modest Proposal