Re: cursor interface to libpq

Lists: pgsql-interfaces
From: "Kirby Bohling (TRSi)" <kbohling(at)oasis(dot)novia(dot)net>
To: pgsql-interfaces(at)postgresql(dot)org
Subject: cursor interface to libpq
Date: 2000-09-19 22:59:06
Message-ID: Pine.GSO.4.21.0009191748310.22170-100000@oasis.novia.net


I am trying to run a select statement, and I keep running out of
memory. I have noticed that libpq has the nice feature that I don't have
to request each row one at a time. After a little investigation, I found
that libpq appears to get the entire result set at once. Well, at least
PQgetvalue() doesn't appear to do anything besides return a pointer to a
string. There is no lazy evaluation: it doesn't fetch just the row I
need and flush old ones as memory permits.
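
For reference, what I am doing is essentially the stock libpq pattern,
roughly like this (connection string and table name are just placeholders,
and I've trimmed the error handling):

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    /* PQexec() buffers the whole result set in client memory before
     * PQgetvalue() is ever called -- which is where my memory goes. */
    PGconn   *conn = PQconnectdb("dbname=mydb");    /* placeholder conninfo */
    PGresult *res;
    int       i;

    res = PQexec(conn, "SELECT * FROM big_table");  /* placeholder table */
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        for (i = 0; i < PQntuples(res); i++)
            printf("%s\n", PQgetvalue(res, i, 0));  /* first column only */
    }
    PQclear(res);
    PQfinish(conn);
    return 0;
}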

Is there any interface that I can link to a C/C++ program that
will allow me row-at-a-time access to the data? Not being able to run
select statements is going to be a fairly sizeable problem for me.

This might also explain why my attempts to dump the database fail, since
pg_dump seems to be based on libpq. I have been shutting down the
database and just backing up the filesystem to be safe. I work in a shop
that hopes to go 24/7 sometime soon, so I need to find an answer RSN.

Thanks,
Kirby


From: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
To: "Kirby Bohling (TRSi)" <kbohling(at)oasis(dot)novia(dot)net>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: cursor interface to libpq
Date: 2000-09-20 05:45:46
Message-ID: 39C84F0A.9E6C5947@alumni.caltech.edu
Lists: pgsql-interfaces

> I am trying to run a select statement, and I keep running out of
> memory. I have noticed that libpq has the nice feature that I don't have
> to request each row one at a time. After a little investigation, I found
> that libpq appears to get the entire result set at once. Well, at least
> PQgetvalue() doesn't appear to do anything besides return a pointer to a
> string. There is no lazy evaluation: it doesn't fetch just the row I
> need and flush old ones as memory permits.

Use a cursor at the query level to keep the result set in the backend.
But you will still (possibly) run out of memory, since the *backend*
must keep the result set in memory and/or on disk. Make sure you have
enough swap space, disk space, and main memory to handle the size of the
database you have.

> Is there any interface that I can link to a C/C++ program that
> will allow me row-at-a-time access to the data? Not being able to run
> select statements is going to be a fairly sizeable problem for me.

Look at cursors, but that doesn't eliminate the need for enough memory.

Expect to buy more memory, and make sure you have enough swap space.
Send more details on what you have currently and what your database
looks like and we can give you better feedback on your resource
requirements.

- Thomas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
Cc: "Kirby Bohling (TRSi)" <kbohling(at)oasis(dot)novia(dot)net>, pgsql-interfaces(at)postgresql(dot)org
Subject: Re: cursor interface to libpq
Date: 2000-09-20 06:16:17
Message-ID: 19106.969430577@sss.pgh.pa.us
Lists: pgsql-interfaces

Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu> writes:
>> I am trying to run a select statement, and I keep running out of
>> memory. I have noticed that libpq has the nice feature that I don't have
>> to request each row one at a time. After a little investigation, I found
>> that libpq appears to get the entire result set at once. Well, at least
>> PQgetvalue() doesn't appear to do anything besides return a pointer to a
>> string. There is no lazy evaluation: it doesn't fetch just the row I
>> need and flush old ones as memory permits.

> Use a cursor at the query level to keep the result set in the backend.
> But you will still (possibly) run out of memory, since the *backend*
> must keep the result set in memory and/or on disk.

Er, no. DECLARE CURSOR does not cause the backend to buffer the whole
result set; it just stores the state of the executor's plan tree. AFAIK
you can deal with an indefinitely large result set if you use a cursor
and fetch just a limited number of rows at a time.
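
A bare-bones sketch of that approach (untested; error checking omitted,
and the cursor, table, and batch size are arbitrary):

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn   *conn = PQconnectdb("dbname=mydb");   /* placeholder conninfo */
    PGresult *res;
    int       i;

    /* Cursors only exist inside a transaction block. */
    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn, "DECLARE myportal CURSOR FOR SELECT * FROM big_table"));

    for (;;)
    {
        /* Only this batch of rows is buffered in libpq at any one time. */
        res = PQexec(conn, "FETCH 1000 FROM myportal");
        if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
        {
            PQclear(res);
            break;
        }
        for (i = 0; i < PQntuples(res); i++)
            printf("%s\n", PQgetvalue(res, i, 0)); /* first column only */
        PQclear(res);
    }

    PQclear(PQexec(conn, "CLOSE myportal"));
    PQclear(PQexec(conn, "END"));
    PQfinish(conn);
    return 0;
}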

The bottleneck here is mainly that libpq's API is defined in terms of
providing random access to a result set, no matter how large --- so
libpq has to buffer the whole result set in client memory.

Aside from random access there are also error-reporting issues.
Currently libpq guarantees to tell you about any errors encountered
during a query before you start to read result rows. That guarantee
wouldn't hold in a streaming-results scenario.

These issues have been discussed quite a few times before --- see the
pg-interfaces archives. I think everyone agrees that it'd be a good
idea to have a streamable libpq interface, but no one's stepped up to
the plate to define or implement one...

regards, tom lane


From: "Kirby Bohling (TRSi)" <kbohling(at)oasis(dot)novia(dot)net>
To: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: cursor interface to libpq
Date: 2000-09-20 16:00:47
Message-ID: Pine.GSO.4.21.0009201026100.10741-100000@oasis.novia.net
Lists: pgsql-interfaces


Thomas,

Right now I have a database that has right around 4 million
rows in the primary table. I am porting it away from MySQL basically
because I want transactions. When the table is written as insert
statements it is around 3.2 GB; gzipped, that comes out to 1.3 GB.

That table will probably continue to grow at a rate of 200,000
rows a week. I don't know the exact size of each row, but it is under the
8K limit. Right now I am working on a FreeBSD box with 1.5 GB of swap and
256 MB of RAM. I believe that I could get that upgraded to 1 GB of RAM, and
add as much swap space as I wanted. But to be honest I really don't want
to store every row of every select statement in memory. I believe that
the database is around 1.5 GB on disk, almost all of it in the one table.

The whole thought of having to keep the entire table in memory at once to
do a backup is scary. I don't expect anyone at work to appreciate
having to match the disk space of the database just to do backups. I
will either have to switch RDBMSs, unwrap pg_dump to use the
lower-level primitives parseInput and getAnotherTuple, or better yet
implement it with cursors at the query level; then I could get by
until a better solution came along. I can deal with most everything
except the backups.

Any advice or expectations about memory requirements would be
appreciated. Sorry if this is the wrong forum for this discussion, but my
original question started here.

Thanks,
Kirby

On Wed, 20 Sep 2000, Thomas Lockhart wrote:

> > I am trying to run a select statement, and I keep running out of
> > memory. I have noticed that libpq has the nice feature that I don't have
> > to request each row one at a time. After a little investigation, I found
> > that libpq appears to get the entire result set at once. Well, at least
> > PQgetvalue() doesn't appear to do anything besides return a pointer to a
> > string. There is no lazy evaluation: it doesn't fetch just the row I
> > need and flush old ones as memory permits.
>
> Use a cursor at the query level to keep the result set in the backend.
> But you will still (possibly) run out of memory, since the *backend*
> must keep the result set in memory and/or on disk. Make sure you have
> enough swap space, disk space, and main memory to handle the size of the
> database you have.
>
> > Is there any interface that I can link to a C/C++ program that
> > will allow me row-at-a-time access to the data? Not being able to run
> > select statements is going to be a fairly sizeable problem for me.
>
> Look at cursors, but that doesn't eliminate the need for enough memory.
>
> Expect to buy more memory, and make sure you have enough swap space.
> Send more details on what you have currently and what your database
> looks like and we can give you better feedback on your resource
> requirements.
>
> - Thomas
>


From: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
To: "Kirby Bohling (TRSi)" <kbohling(at)oasis(dot)novia(dot)net>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: cursor interface to libpq
Date: 2000-09-20 16:22:52
Message-ID: 39C8E45C.AE72C336@alumni.caltech.edu
Lists: pgsql-interfaces

> Right now I have a database that has right around 4 million
> rows in the primary table. I am porting it away from MySQL basically
> because I want transactions. When the table is written as insert
> statements it is around 3.2 GB; gzipped, that comes out to 1.3 GB.
> That table will probably continue to grow at a rate of 200,000
> rows a week. I don't know the exact size of each row, but it is under the
> 8K limit. Right now I am working on a FreeBSD box with 1.5 GB of swap and
> 256 MB of RAM. I believe that I could get that upgraded to 1 GB of RAM, and
> add as much swap space as I wanted. But to be honest I really don't want
> to store every row of every select statement in memory. I believe that
> the database is around 1.5 GB on disk, almost all of it in the one table.

afaik this should all work. You can run pg_dump and pipe the output to a
tape drive or to gzip. You *know* that a real backup will take something
like the size of the database (maybe a factor of two or so less) since
the data has to go somewhere.

Postgres *should* be able to store intermediate results etc on disk, so
the "out of memory" might be due to a per-process limit on your FreeBSD
machine. Others with experience on that platform might have some
specific suggestions.
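
If you want to see what limit your processes are actually running under,
something along these lines will show it (just a sketch; afaik RLIMIT_DATA
is the "datasize" limit from FreeBSD's login.conf, which is typically what
malloc() runs into):

#include <stdio.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    /* RLIMIT_DATA is the per-process data-segment (heap) limit. */
    if (getrlimit(RLIMIT_DATA, &rl) == 0)
        printf("data segment limit: soft=%ld hard=%ld\n",
               (long) rl.rlim_cur, (long) rl.rlim_max);
    return 0;
}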

Scrappy, a FreeBSD partisan, probably has tables on his systems much
bigger than the one under discussion. Perhaps he will speak up here??

Good luck!

- Thomas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
Cc: "Kirby Bohling (TRSi)" <kbohling(at)oasis(dot)novia(dot)net>, pgsql-interfaces(at)postgresql(dot)org
Subject: Re: cursor interface to libpq
Date: 2000-09-22 16:31:57
Message-ID: 29420.969640317@sss.pgh.pa.us
Lists: pgsql-interfaces

Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu> writes:
> afaik this should all work. You can run pg_dump and pipe the output to a
> tape drive or to gzip. You *know* that a real backup will take something
> like the size of the database (maybe a factor of two or so less) since
> the data has to go somewhere.

pg_dump in default mode (ie, dump data as COPY commands) doesn't have a
problem with huge tables because the COPY data is just dumped out in a
streaming fashion.
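
If you ever do roll your own dump code, the same streaming behavior is
available straight from libpq: run COPY ... TO STDOUT and read the rows a
line at a time. A rough sketch (untested; error handling omitted, table
name is a placeholder):

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn   *conn = PQconnectdb("dbname=mydb");     /* placeholder conninfo */
    PGresult *res;
    char      line[8192];                            /* must hold one data line */

    res = PQexec(conn, "COPY big_table TO STDOUT");  /* placeholder table */
    if (PQresultStatus(res) == PGRES_COPY_OUT)
    {
        /* Rows arrive one tab-delimited text line at a time; a line of
         * just "\." marks the end of the COPY data. */
        while (PQgetline(conn, line, sizeof(line)) != EOF)
        {
            if (strcmp(line, "\\.") == 0)
                break;
            printf("%s\n", line);                    /* or write to a backup file */
        }
        PQendcopy(conn);
    }
    PQclear(res);
    PQfinish(conn);
    return 0;
}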

If you insist on using the "dump data as insert commands" option then
huge tables cause a memory problem in pg_dump, but on the other hand you
are going to get pretty tired of waiting for such a script to reload,
too. I recommend just using the default behavior ...

regards, tom lane