Re: ToDo: allow to get a number of processed rows by COPY statement [Review of Patch]

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: ToDo: allow to get a number of processed rows by COPY statement [Review of Patch]
Date: 2012-09-27 20:58:23
Message-ID: CAFj8pRBD8bczZfGVam_V+36YCP_iMuX4Wa=G++gcZAdDmX4DGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello

I reduced this patch as just plpgsql (SPI) problem solution.

Correct generic solution needs a discussion about enhancing our libpq
interface - we can take a number of returned rows, but we should to
get number of processed rows too. A users should to parse this number
from completationTag, but this problem is not too hot and usually is
not blocker, because anybody can process completationTag usually.

So this issue is primary for PL/pgSQL, where this information is not
possible. There is precedent - CREATE AS SELECT (thanks for sample
:)), and COPY should to have same impact on ROW_COUNT like this
statement (last patch implements it).

Regards

Pavel

2012/9/21 Amit Kapila <amit(dot)kapila(at)huawei(dot)com>:
> On Friday, September 21, 2012 1:23 AM Pavel Stehule wrote:
>
>>> Basic stuff:
>>> ------------
>>> - Patch applies OK. but offset difference in line numbers.
>>> - Compiles with errors in contrib [pg_stat_statements, sepgsql] modules
>>> - Regression failed; one test-case in COPY due to incomplete test-case
>>> attached patch. – same as reported by Heikki
>>
>>fixed patch is in attachment
>
> After modifications:
> ---------------------------
> - Patch applies OK
> - Compiles cleanly without any errors/warnings
> - Regression tests pass.
>
>>>
>>> What it does:
>>> --------------
>>> Modification to get the number of processed rows evaluated via SPI. The
>>> changes are to add extra parameter in ProcessUtility to get the number of
>>> rows processed by COPY command.
>>>
>>> Code Review Comments:
>>> ---------------------
>>> 1. New parameter is added to ProcessUtility_hook_type function
>>> but the functions which get assigned to these functions like
>>> sepgsql_utility_command, pgss_ProcessUtility, prototype & definition is
>>> not modified.
>
> Functionality is not fixed correctly for hook functions, In function pgss_ProcessUtility
> for bellow snippet of code processed parameter is passed NULL, as well as not initialized.
> because of this when "pg_stat_statements" extention is utilized COPY command is giving garbage values.
> if (prev_ProcessUtility)
> prev_ProcessUtility(parsetree, queryString, params,
> dest, completionTag, context, NULL);
> else
> standard_ProcessUtility(parsetree, queryString, params,
> dest, completionTag, context, NULL);
>
> Testcase is attached.
> In this testcase table has only 1000 records but it show garbage value.
> postgres=# show shared_preload_libraries ;
> shared_preload_libraries
> --------------------------
> pg_stat_statements
> (1 row)
> postgres=# CREATE TABLE tbl (a int);
> CREATE TABLE
> postgres=# INSERT INTO tbl VALUES(generate_series(1,1000));
> INSERT 0 1000
> postgres=# do $$
> declare r int;
> begin
> copy tbl to '/home/kiran/copytest.csv' csv;
> get diagnostics r = row_count;
> raise notice 'exported % rows', r;
> truncate tbl;
> copy tbl from '/home/kiran/copytest.csv' csv;
> get diagnostics r = row_count;
> raise notice 'imported % rows', r;
> end;
> $$ language plpgsql;
> postgres$#
> NOTICE: exported 13281616 rows
> NOTICE: imported 13281616 rows
> DO
>
>>>
>>> 2. Why to add the new parameter if completionTag hold the number of
>>> processed tuple information; can be extracted
>>>
>>> from it as follows:
>>> _SPI_current->processed = strtoul(completionTag + 7,
>>> NULL, 10);
>>
>>this is basic question. I prefer a natural type for counter - uint64
>>instead text. And there are no simply way to get offset (7 in this
>>case)
>
> I agree with your point, but currently in few other places we are parsing the completion tag for getting number of tuples processed. So may be in future we can change those places as well. For example

yes, this step can be done in future - together with libpq enhancing.
Is not adequate change API (and break lot of extensions) for this
isolated problem. So I propose fix plpgsql issue, and add to ToDo -
"enhance libpq to return a number of processed rows as number" - but
this change breaking compatibility.

Pavel

>
> pgss_ProcessUtility
> {
> ..
>
> /* parse command tag to retrieve the number of affected rows. */
> if (completionTag &&
> sscanf(completionTag, "COPY " UINT64_FORMAT, &rows) != 1)
> rows = 0;
>
> }
>
> _SPI_execute_plan
> {
> ..
> ..
> if (IsA(stmt, CreateTableAsStmt))
> {
> Assert(strncmp(completionTag, "SELECT ", 7) == 0);
> _SPI_current->processed = strtoul(completionTag + 7,
> NULL, 10);
>
> ..
> }
>
>
> With Regards,
> Amit Kapila.
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

Attachment Content-Type Size
copy-processed-rows-simple.patch application/octet-stream 1.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2012-09-27 22:01:12 Re: Patch: incorrect array offset in backend replication tar header
Previous Message Alvaro Herrera 2012-09-27 20:38:04 Re: ALTER command reworks