New ECPG idea, was: Re: ECPG FETCH readahead

From: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To: Noah Misch <noah(at)leadboat(dot)com>, Michael Meskes <meskes(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Hans-Jürgen Schönig <hs(at)cybertec(dot)at>
Subject: New ECPG idea, was: Re: ECPG FETCH readahead
Date: 2013-08-17 11:02:29
Message-ID: 520F5845.7060104@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2013-08-17 12:08 keltezéssel, Boszormenyi Zoltan írta:
> Hi,
>
> I am restarting this old thread... :-)
>
> 2012-04-24 10:17 keltezéssel, Michael Meskes írta:
>>> OK, I will implement #2. Another question popped up: what to do
>>> with FETCH ALL? The current readahead window size or temporarily
>>> bumping it to say some tens of thousands can be used. We may not
>>> know how much is the "all records". This, although lowers performance,
>>> saves memory.
>> I would say doing a large fetch in two or three batches won't cost too much in
>> terms of performance.
>>
>>> Please, don't apply this patch yet. I discovered a rather big hole
>>> that can confuse the cursor position tracking if you do this:
>>> ...
>>> That will also need a new round of review. Sorry for that.
>> No problem, better to find it now instead of after release.
>>
>> Anyway, I moved the patch to 2012-next (I hope I did it correctly) so 2012-1
>> can be closed. Let's try to get this patch done in the next commit fest.
>>
>> Michael
>
> I had time to look into this patch of mine again after about 1.5 years.
> Frankly, this time was too long to remember every detail of the patch
> and looking at parts of the patch as a big entity was confusing.
>
> So I started fresh and to make review easier, I broke the patch up
> into small pieces that all build on each other. I have also fixed quite
> a few bugs, mostly in my code, but some in the ECPG parser and
> the regression tests as well.
>
> I have put the broken up patchset into a GIT tree of mine at GitHub:
> https://github.com/zboszor/ecpg-readahead/
> but the huge compressed patch is also attached for reference.
> It was generated with
>
> $ git diff
> 221e92f64c6e136e550ec2592aac3ae0d4623209..870922676e6ae0faa4ebbf94b92e0b97ec418e16
>
> ECPG regression tests are now Valgrind-clean except two of them
> but both are pre-existing bugs.
>
> 1. ecpg/test/compat_informix/rfmtlong.pgc points out a problem in
> ecpg/compatlib/informix.c
>
> ==5036== 1 errors in context 1 of 4:
> ==5036== Invalid read of size 4
> ==5036== at 0x4E3453C: rfmtlong (informix.c:941)
> ==5036== by 0x4007DA: fmtlong.constprop.0 (rfmtlong.pgc:22)
> ==5036== by 0x4006BE: main (rfmtlong.pgc:45)
> ==5036== Address 0x60677d8 is 24 bytes inside a block of size 25 alloc'd
> ==5036== at 0x4C28409: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5036== by 0x4E34268: rfmtlong (informix.c:783)
> ==5036== by 0x4007DA: fmtlong.constprop.0 (rfmtlong.pgc:22)
> ==5036== by 0x4006BE: main (rfmtlong.pgc:45)
>
> The same error is reported 4 times.
>
> 2. ecpg_add_mem() seems to leak memory:
>
> ==5463== 256 bytes in 16 blocks are definitely lost in loss record 1 of 1
> ==5463== at 0x4C2A121: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5463== by 0x4E3E153: ecpg_alloc (memory.c:21)
> ==5463== by 0x4E3E212: ecpg_add_mem (memory.c:110)
> ==5463== by 0x4E3542B: ecpg_store_result (execute.c:409)
> ==5463== by 0x4E37E5A: ecpg_process_output (execute.c:1777)
> ==5463== by 0x4E38CCA: ecpg_do (execute.c:2137)
> ==5463== by 0x4E38D8A: ECPGdo (execute.c:2159)
> ==5463== by 0x400A82: fn (alloc.pgc:51)
> ==5463== by 0x5152C52: start_thread (pthread_create.c:308)
> ==5463== by 0x545C13C: clone (clone.S:113)
>
> The last two issue we talked about in this thread are also implemented:
> - permanently raise the readahead window if the application sends a
> bigger FETCH command, and
> - temporarily raise the readahead window for FETCH ALL commands
>
> The cursor position tracking was completely rewritten, so the client side
> properly follows the cursor position known by the backend and doesn't
> skip MOVE statements where it shouldn't. The previously known
> bug is completely eliminated this way.
>
> Please, review that patch.

I have another idea to make ECPG building on this huge patch.

Currently, UPDATE/DELETE WHERE CURRENT OF has to issue a MOVE
before the command in case the cursor positions known by the application
and the backend are different.

My idea builds on the fact that UPDATE/DELETE RETURNING is present
in all supported back branches.

A mini-parser only understanding SELECT, UPDATE and DELETE should
be added to ecpglib, so

DECLARE cursor CURSOR FOR SELECT ...

and

PREPARE prepared_stmt FROM :query;
DECLARE cursor CURSOR FOR prepared_stmt;

can be analyzed and tweaked behind the application's back.

This is needed to detect whether a query is a simple updatable
scan of a table, and returning errors early to the application if it's not,
without actually sending the UPDATE/DELETE WHERE CURRENT OF
query to the backend.

For the purpose of WHERE CURRENT OF, I would add a ctid
column at the end of the targelist that is treated like "resjunk"
in the backend when returning data to the application.

So, SELECTs would return the ctid information of the tuples.
The cursor query was a FETCH N with abs(N)>1 because of
the readahead. For this reason, the cursor positions known
by the application and the backend are different.

The extra MOVE can be eliminated by replacing

UPDATE table SET ... WHERE CURRENT OF cursor;

with

UPDATE table SET ... WHERE ctid='...' RETURNING ctid;

Of course, if the original query contained RETURNING, the
ctid field would be appended in resjunk fashion.

The new ctid can be saved back into the cache using PQsetvalue()
and the (position ; new ctid) pair into a hash, both can be
looked up in case the application wants to modify the
same tuple again. Some protection is needed against growing
the hash too big. But usually[*] programs don't go back
to modify the same record twice.

[*] This is only an educated guess.

How about it?

Best regards,
Zoltán Böszörményi

>
> Thanks in advance and best regards,
> Zoltán Böszörményi
>
>
>

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Huang Bambo 2013-08-17 11:16:21 Re: [pgsql-zh-general] Chinese in Postgres
Previous Message Boszormenyi Zoltan 2013-08-17 10:08:16 Re: ECPG FETCH readahead