Re: ECPG FETCH readahead

From: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Michael Meskes <meskes(at)postgresql(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: ECPG FETCH readahead
Date: 2012-03-06 06:07:41
Message-ID: 4F55A9AD.6050600@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2012-03-05 19:56 keltezéssel, Noah Misch írta:
>
> Having pondered the matter further, I now agree with Michael that the feature
> should stay disabled by default. See my response to him for rationale.
> Assuming that conclusion holds, we can recommended a higher value to users who
> enable the feature at all. Your former proposal of 256 seems fine.

OK.

>
>> BTW, the default disabled behaviour was to avoid "make check" breakage,
>> see below.
>>
>>> I would not offer
>>> an ecpg-time option to disable the feature per se. Instead, let the user set
>>> the default chunk size at ecpg time. A setting of 1 effectively disables the
>>> feature, though one could later re-enable it with ECPGFETCHSZ.
>> This means all code previously going through ECPGdo() would go through
>> ECPGopen()/ECPGfetch()/ECPGclose(). This is more intrusive and all
>> regression tests that were only testing certain features would also
>> test the readahead feature, too.
> It's a good sort of intrusiveness, reducing the likelihood of introducing bugs
> basically unrelated to readahead that happen to afflict only ECPGdo() or only
> the cursor.c interfaces. Let's indeed not have any preexisting test cases use
> readahead per se, but having them use the cursor.c interfaces anyway will
> build confidence in the new code. The churn in expected debug output isn't
> ideal, but I don't prefer the alternative of segmenting the implementation for
> the sake of the test cases.

I see.

>
>> Also, the test for WHERE CURRENT OF at ecpg time would have to be done
>> at runtime, possibly making previously working code fail if ECPGFETCHSZ is enabled.
> Good point.
>
>> How about still allowing "NO READAHEAD" cursors that compile into plain ECPGdo()?
>> This way, ECPGFETCHSZ don't interfere with WHERE CURRENT OF. But this would
>> mean code changes everywhere where WHERE CURRENT OF is used.
> ECPGFETCHSZ should only affect cursors that make no explicit mention of
> READAHEAD. I'm not sure whether that should mean actually routing READHEAD 1
> cursors through ECPGdo() or simply making sure that cursor.c achieves the same
> outcome; see later for a possible reason to still do the latter.
>
>> Or how about a new feature in the backend, so ECPG can do
>> UPDATE/DELETE ... WHERE OFFSET N OF cursor
>> and the offset of computed from the actual cursor position and the position known
>> by the application? This way an app can do readahead and do work on rows collected
>> by the cursor with WHERE CURRENT OF which gets converted to WHERE OFFSET OF
>> behind the scenes.
> That's a neat idea, but I would expect obstacles threatening our ability to
> use it automatically for readahead. You would have to make the cursor a
> SCROLL cursor. We'll often pass a negative offset, making the operation fail
> if the cursor query used FOR UPDATE. Volatile functions in the query will get
> more calls. That's assuming the operation will map internally to something
> like MOVE N; UPDATE ... WHERE CURRENT OF; MOVE -N. You might come up with
> innovations to mitigate those obstacles, but those innovations would probably
> also apply to MOVE/FETCH. In any event, this would constitute a substantive
> patch in its own right.

I was thinking along the lines of a Portal keeping the ItemPointerData
for each tuple in the last FETCH statement. The WHERE OFFSET N OF cursor
would treat the offset value relative to the tuple order returned by FETCH.
So, OFFSET 0 OF == CURRENT OF and other values of N are negative.
This way, it doesn't matter if the cursor is SCROLL, NO SCROLL or have
the default behaviour with "SCROLL in some cases". Then ECPGopen()
doesn't have to play games with the DECLARE statement. Only ECPGfetch()
needs to play with MOVE statements, passing different offsets to the backend,
not what the application passed.

> One way out of trouble here is to make WHERE CURRENT OF imply READHEAD
> 1/READHEAD 0 (incidentally, perhaps those two should be synonyms) on the
> affected cursor. If the cursor has some other readahead quantity declared
> explicitly, throw an error during preprocessing.

I played with this idea a while ago, from a different point of view.
If the ECPG code had the DECLARE mycur, DML ... WHERE CURRENT OF mycur
and OPEN mycur in exactly this order, i.e. WHERE CURRENT OF appears in
a standalone function between DECLARE and the first OPEN for the cursor,
then ECPG disabled readahead automatically for that cursor and for that
cursor only. But this requires effort on the user of ECPG and can be very
fragile. Code cleanup with reordering functions can break previously
working code.

> Failing a reasonable resolution, I'm prepared to withdraw my suggestion of
> making ECPGFETCHSZ always-usable. It's nice to have, not critical.
>
>>>> +bool
>>>> +ECPGopen(const int lineno, const int compat, const int force_indicator,
>>>> + const char *connection_name, const bool questionmarks,
>>>> + const char *curname, const int st, const char *query, ...)
>>>> +{
>>>> + va_list args;
>>>> + bool ret, scrollable;
>>>> + char *new_query, *ptr, *whold, *noscroll, *scroll, *dollar0;
>>>> + struct sqlca_t *sqlca = ECPGget_sqlca();
>>>> +
>>>> + if (!query)
>>>> + {
>>>> + ecpg_raise(lineno, ECPG_EMPTY, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR, NULL);
>>>> + return false;
>>>> + }
>>>> + ptr = strstr(query, "for ");
>>>> + if (!ptr)
>>>> + {
>>>> + ecpg_raise(lineno, ECPG_INVALID_STMT, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR, NULL);
>>>> + return false;
>>>> + }
>>>> + whold = strstr(query, "with hold ");
>>>> + dollar0 = strstr(query, "$0");
>>>> +
>>>> + noscroll = strstr(query, "no scroll ");
>>>> + scroll = strstr(query, "scroll ");
>>> A query like 'SELECT 1 AS "with hold "' fools these lexical tests.
>> But SELECT 1 AS "with hold" doesn't go through ECPGopen(), it's run by ECPGdo()
>> so no breakage there. ecpglib functions are not intended to be called from manually
>> constructed C code.
> I tried something like
> EXEC SQL DECLARE cur CURSOR FOR SELECT * FROM generate_series(1 , $1) AS t("with hold ");
> It wrongly generated this backend command:
> declare cur no scroll cursor with hold for select * from generate_series ( 1 , $1 ) as t ( "with hold " )

Ah, ok. The grammar test in ecpg is better.

>>> An unusable ECPGFETCHSZ should procedure an error, not
>>> silently give no effect.
>> Point taken. Which error handling do imagine? abort() or simply returning false
>> and raise and error in SQLCA?
> The latter.
>
>>>> + /*
>>>> + * If statement went OK, add the cursor and discover the
>>>> + * number of rows in the recordset. This will slow down OPEN
>>>> + * but we gain a lot with caching.
>>>> + */
>>>> + if (ret /* && sqlca->sqlerrd[2] == 0 */)
>>> Why is the commented code there?
>> Some leftover from testing, it shouldn't be there.

Actually, now I remember. It was a wishful thinking on my part
that whenever PG supports returning a number of rows at opening
the cursor, correct or estimated, this code shouldn't be executed,
just accept what the backend has given us.

>>
>>>> + {
>>>> + struct connection *con = ecpg_get_connection(connection_name);
>>>> + struct cursor_descriptor *cur;
>>>> + bool existing;
>>>> + int64 n_tuples;
>>>> +
>>>> + if (scrollable)
>>>> + {
>>>> + PGresult *res;
>>>> + char *query;
>>>> + char *endptr = NULL;
>>>> +
>>>> + query = ecpg_alloc(strlen(curname) + strlen("move all in ") + 2, lineno);
>>>> + sprintf(query, "move all in %s", curname);
>>>> + res = PQexec(con->connection, query);
>>>> + n_tuples = strtoull(PQcmdTuples(res), &endptr, 10);
>>>> + PQclear(res);
>>>> + ecpg_free(query);
>>>> +
>>>> + /* Go back to the beginning of the resultset. */
>>>> + query = ecpg_alloc(strlen(curname) + strlen("move absolute 0 in ") + 2, lineno);
>>>> + sprintf(query, "move absolute 0 in %s", curname);
>>>> + res = PQexec(con->connection, query);
>>>> + PQclear(res);
>>>> + ecpg_free(query);
>>>> + }
>>>> + else
>>>> + {
>>>> + n_tuples = 0;
>>>> + }
>>> You give this rationale for the above code:
>>>
>>> On Thu, Jun 17, 2010 at 02:09:47PM +0200, Boszormenyi Zoltan wrote:
>>>> ECPGopen() also discovers the total number of records in the recordset,
>>>> so the previous ECPG "deficiency" (backend limitation) that sqlca.sqlerrd[2]
>>>> didn't report the (possibly estimated) number of rows in the resultset
>>>> is now
>>>> overcome. This slows down OPEN for cursors serving larger datasets
>>>> but it makes possible to position the readahead window using MOVE
>>>> ABSOLUTE no matter what FORWARD/BACKWARD/ABSOLUTE/RELATIVE
>>>> variants are used by the application. And the caching is more than
>>>> overweighs
>>>> the slowdown in OPEN it seems.
>>> From the documentation for Informix and Oracle, those databases do not
>>> populate sqlerrd[2] this way:
>>> http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqlt.doc/sqltmst189.htm
>>> http://docs.oracle.com/cd/A57673_01/DOC/api/doc/PC_22/ch10.htm#toc139
>> The problem here is that Informix in the field in fact returns the number of rows
>> in the cursor and the customer we developed this readahead code for relied on this.
>> Maybe this was eliminated in newer versions of Informix to make it faster.
>>
>>> The performance impact will vary widely depending on the query cost per row
>>> and the fraction of rows the application will actually retrieve. Consider a
>>> complex aggregate returning only a handful of rows.
>> Indeed.
>>
>>> Consider SELECT * on a
>>> 1B-row table with the application ceasing reads after 1000 rows. Performance
>>> aside, this will yield double execution of any volatile functions involved.
>>> So, I think we ought to diligently avoid this step. (Failing that, the
>>> documentation must warn about the extra full cursor scan and this feature must
>>> stay disabled by default.)
>> OK, how about enabling it for Informix-compat mode only, or only via an
>> environment variable? I agree it should be documented.
> For a query where backend execution cost dominates the cost of transferring
> rows to the client, does Informix take roughly twice the normal time to
> execute the query via an ESQL/C cursor? Is that acceptable overhead for every
> "ecpg -C" user? (FWIW, I've never used Informix-compat mode.) If not, the
> feature deserves its own option.
>
> Whatever the trigger condition, shouldn't this apply independent of whether
> readahead is in use for a given cursor?

I guess so.

> (This could constitute a reason to
> use the cursor.c interfaces for every cursor.)

Indeed.

> Does some vendor-neutral standard define semantics for sqlerrd, or has it
> propagated by imitation?

No idea.

> This reminds me to mention: your documentation should note that the use of
> readahead or the option that enables sqlerrd[2] calculation may change the
> outcome of queries calling volatile functions. See how the DECLARE
> documentation page discusses this hazard for SCROLL/WITH HOLD cursors.

OK.

>
>>>> --- a/src/interfaces/ecpg/ecpglib/extern.h
>>>> +++ b/src/interfaces/ecpg/ecpglib/extern.h
>>>> @@ -60,6 +60,12 @@ struct statement
>>>> bool questionmarks;
>>>> struct variable *inlist;
>>>> struct variable *outlist;
>>>> + char *oldlocale;
>>>> + const char **dollarzero;
>>>> + int ndollarzero;
>>>> + const char **param_values;
>>>> + int nparams;
>>>> + PGresult *results;
>>>> };
>>> Please comment the members of this struct like we do in most of src/include.
>> OK.
>>
>>> dollarzero has something to do with dynamic cursor names, right? Does it have
>>> other roles?
>> Yes, it had other roles. ECPG supports user variables in cases where the
>> PostgreSQL grammar doesn't. There's this rule:
>>
>> ECPG: var_valueNumericOnly addon
>> if ($1[0] == '$')
>> {
>> free($1);
>> $1 = mm_strdup("$0");
>> }
>>
>> The "var_value: NumericOnly" case in gram.y can show up in a lot of cases.
>> This feature was there before the dynamic cursor. You can even use them together
>> which means more than one $0 placeholders in the statement. E.g.:
>> FETCH :amount FROM :curname;
>> gets translated to
>> FETCH $0 FROM $0;
>> by ecpg, and both the amount and the cursor name is passed in in user variables.
>> The value is needed by cursor.c, this is why this "dollarzero" pointer is needed.
> Thanks for that explanation; the situation is clearer to me now.
>
> nm
>

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2012-03-06 06:43:06 Re: review: CHECK FUNCTION statement
Previous Message Tom Lane 2012-03-06 04:38:33 Re: Dropping PL language retains support functions