Re: updated patch for selecting large results sets in psql using cursors

Lists: pgsql-hackerspgsql-patches
From: <chrisnospam(at)1006(dot)org>
To: <pgsql-patches(at)postgresql(dot)org>
Subject: updated patch for selecting large results sets in psql using cursors
Date: 2006-08-28 14:29:38
Message-ID: 17911.193.206.186.101.1156775378.squirrel@www.endian.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi there,

here comes the latest version (version 7) of the patch to handle large
result sets with psql. As previously discussed, a cursor is used
for SELECT queries when \set FETCH_COUNT some_value > 0
(defaults to 100 if FETCH_COUNT is set with no value).

Comparing to the previous version, the patch actually got smaller and is
less invasive, because I doesn't have to deal with a new command and
can share some more code with SendQuery() in common.c.

Bye,
Chris.

--
Chris Mair
http://www.1006.org

Attachment Content-Type Size
psql_cursor-7.patch text/x-patch 8.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, chrisnospam(at)1006(dot)org
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: updated patch for selecting large results sets in psql using cursors
Date: 2006-08-28 17:11:33
Message-ID: 9799.1156785093@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

<chrisnospam(at)1006(dot)org> writes:
> here comes the latest version (version 7) of the patch to handle large
> result sets with psql. As previously discussed, a cursor is used
> for SELECT queries when \set FETCH_COUNT some_value > 0

Wait a minute. What I thought we had agreed to was a patch to make
commands sent with \g use a cursor. This patch changes SendQuery
so that *every* command executed via psql is treated this way.
That's a whole lot bigger behavioral change than I think is warranted.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: chrisnospam(at)1006(dot)org
Subject: Re: updated patch for selecting large results sets in psql using cursors
Date: 2006-08-28 17:26:02
Message-ID: 200608281926.04419.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Wait a minute. What I thought we had agreed to was a patch to make
> commands sent with \g use a cursor. This patch changes SendQuery
> so that *every* command executed via psql is treated this way.

That's what I remembered. I don't think we want to introduce a
difference between ; and \g.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, chrisnospam(at)1006(dot)org
Subject: Re: [PATCHES] updated patch for selecting large results sets in psql using cursors
Date: 2006-08-28 17:45:13
Message-ID: 13798.1156787113@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Tom Lane wrote:
>> Wait a minute. What I thought we had agreed to was a patch to make
>> commands sent with \g use a cursor. This patch changes SendQuery
>> so that *every* command executed via psql is treated this way.

> That's what I remembered. I don't think we want to introduce a
> difference between ; and \g.

Have we measured the performance impact, then? The last time I profiled
psql, GetVariable was already a hotspot, and this introduces another
call of it into the basic query loop whether you use the feature or not.

regards, tom lane


From: Chris Mair <chrisnospam(at)1006(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] updated patch for selecting large results sets in
Date: 2006-08-28 19:17:30
Message-ID: 1156792650.4026.34.camel@dell.home.lan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, 2006-08-28 at 13:45 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > Tom Lane wrote:
> > > Wait a minute. What I thought we had agreed to was a patch to make
> > > commands sent with \g use a cursor. This patch changes SendQuery
> > > so that *every* command executed via psql is treated this way.
>
> > That's what I remembered. I don't think we want to introduce a
> > difference between ; and \g.
>
> Have we measured the performance impact, then? The last time I profiled
> psql, GetVariable was already a hotspot, and this introduces another
> call of it into the basic query loop whether you use the feature or not.
>
> regards, tom lane

Hi,

after agreeing on using a \set variable, I proposed to have it influence
"\g" as well as ";", because I thought that would be the most expected
behaviour. IMHO I'm with Peter, that introducing a difference between
"\g" and ";" would go against the principle of least surprise.

Performance-wise I took for granted without checking that GetVariable's
running time would be negligible.

[looks at the code]

I see it's it's basically two function calls with a loop over a linked
list of values (in the order of 10) doing strcmps and one strtol.
It is not quite clear to me what the impact of this is. I could
imagine it would show up only if you perform lots of trivial queries
through psql. I'm going to benchmark something now and report back.

Anyway, regardless the benchmark, I feel it's somehow not clean to have
a variable introduce a difference between "\g" and ";".

[goes benchmarking...]

Bye, Chris.

--

Chris Mair
http://www.1006.org


From: Chris Mair <chrisnospam(at)1006(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] updated patch for selecting large results
Date: 2006-08-28 21:59:34
Message-ID: 1156802374.4026.94.camel@dell.home.lan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


> Performance-wise I took for granted without checking that GetVariable's
> running time would be negligible.
>
> [looks at the code]
>
> I see it's it's basically two function calls with a loop over a linked
> list of values (in the order of 10) doing strcmps and one strtol.
> It is not quite clear to me what the impact of this is. I could
> imagine it would show up only if you perform lots of trivial queries
> through psql. I'm going to benchmark something now and report back.
>
> Anyway, regardless the benchmark, I feel it's somehow not clean to have
> a variable introduce a difference between "\g" and ";".
>
> [goes benchmarking...]

Ok,
so I ran a file containing 1 million lines of "select 1;" through
psql (discarding the output). 5 runs each with the patch and with the
patch removed (the if() in SendQuery commented).

These are the results in seconds user time of psql on a Pentium M 2.0
GHz (real time was longer, since the postmaster process was on the same
machine).

patch | count | avg | stddev
-------+-------+---------+-------------------
f | 5 | 16.6722 | 0.359759919946455
t | 5 | 17.2762 | 0.259528803796329

The conclusion is that, yes, the overhead is measurable, albeit with
a very synthetic benchmark (of the type MySQL wins ;).

Basically I'm loosing 0.6 usec on each query line (when FETCH_COUNT
is not there and therefore psql need to scan the whole variables list
in GetVariable() for nothing).

Not sure if that's acceptable (I'd say yes, but then again, I'm
not a cycle counter type of programmer *cough* Java *cough* ;)...

Opinions?

Bye, Chris.

--

Chris Mair
http://www.1006.org


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chris Mair <chrisnospam(at)1006(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] updated patch for selecting large results sets in
Date: 2006-08-28 22:09:06
Message-ID: 19044.1156802946@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Chris Mair <chrisnospam(at)1006(dot)org> writes:
> The conclusion is that, yes, the overhead is measurable, albeit with
> a very synthetic benchmark (of the type MySQL wins ;).

OK, so about 3 or 4% overhead added to extremely short queries.
That's not enough to kill this patch, but it's still annoying ...
and as I mentioned, there are some existing calls of GetVariable
that are executed often enough to be a problem too.

It strikes me that having to do GetVariable *and* strtol and so on
for these special variables is pretty silly; the work should be done
during the infrequent times they are set, rather than the frequent
times they are read. I propose that we add the equivalent of a GUC
assign_hook to psql's variable facility, attach an assign hook function
to each special variable, and have the assign hook transpose the
value into an internal variable that can be read with no overhead.
If we do that then the cost of the FETCH_COUNT patch will be
unmeasurable, and I think we'll see a few percent savings overall in
psql runtime from improving the existing hotspot uses of GetVariable.

Barring objections, I'll hack on this this evening ...

regards, tom lane


From: Chris Mair <chrisnospam(at)1006(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] updated patch for selecting large results
Date: 2006-08-28 22:38:49
Message-ID: 1156804729.4026.123.camel@dell.home.lan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


> > The conclusion is that, yes, the overhead is measurable, albeit with
> > a very synthetic benchmark (of the type MySQL wins ;).
>
> OK, so about 3 or 4% overhead added to extremely short queries.

More accurately, that 3 or 4% overhead is added to about all queries
(we're talking *psql*-only running time).

It's just that for anything but short queries, psql running time
totally dwarfs regarding to postmaster running time anyway.

> That's not enough to kill this patch, but it's still annoying ...
> and as I mentioned, there are some existing calls of GetVariable
> that are executed often enough to be a problem too.
>
> It strikes me that having to do GetVariable *and* strtol and so on
> for these special variables is pretty silly; the work should be done
> during the infrequent times they are set, rather than the frequent
> times they are read. I propose that we add the equivalent of a GUC
> assign_hook to psql's variable facility, attach an assign hook function
> to each special variable, and have the assign hook transpose the
> value into an internal variable that can be read with no overhead.
> If we do that then the cost of the FETCH_COUNT patch will be
> unmeasurable, and I think we'll see a few percent savings overall in
> psql runtime from improving the existing hotspot uses of GetVariable.
>
> Barring objections, I'll hack on this this evening ...

Might help.

Take into account the strtol is not critical at all for FETCH_COUNT,
since when it's actually set, we're supposed to retrieving big data
where a strtol doesn't matter anyway. The overhead comes from scanning
the linked list for nothing in the normal case (when it's not set).

I don't know how the other uses factor in here, but I see it's called
at least twice more on average calls to SendQuery.

Bye,
Chris.

--

Chris Mair
http://www.1006.org


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, chrisnospam(at)1006(dot)org
Subject: Re: updated patch for selecting large results sets
Date: 2006-08-28 22:54:13
Message-ID: 200608282254.k7SMsDB16553@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Peter Eisentraut wrote:
> Tom Lane wrote:
> > Wait a minute. What I thought we had agreed to was a patch to make
> > commands sent with \g use a cursor. This patch changes SendQuery
> > so that *every* command executed via psql is treated this way.
>
> That's what I remembered. I don't think we want to introduce a
> difference between ; and \g.

I am confused. I assume \g and ; should be affected, like Peter says.
Tom, what *every* command are you talking about? You mean \d?

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, chrisnospam(at)1006(dot)org
Subject: Re: updated patch for selecting large results sets in psql using cursors
Date: 2006-08-28 23:06:38
Message-ID: 19725.1156806398@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Peter Eisentraut wrote:
>> Tom Lane wrote:
>>> Wait a minute. What I thought we had agreed to was a patch to make
>>> commands sent with \g use a cursor.

> I am confused. I assume \g and ; should be affected, like Peter says.
> Tom, what *every* command are you talking about? You mean \d?

Like I said, I thought we were intending to modify \g's behavior only;
that was certainly the implication of the discussion of "\gc".

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, chrisnospam(at)1006(dot)org
Subject: Re: updated patch for selecting large results sets
Date: 2006-08-28 23:08:07
Message-ID: 200608282308.k7SN87317504@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Peter Eisentraut wrote:
> >> Tom Lane wrote:
> >>> Wait a minute. What I thought we had agreed to was a patch to make
> >>> commands sent with \g use a cursor.
>
> > I am confused. I assume \g and ; should be affected, like Peter says.
> > Tom, what *every* command are you talking about? You mean \d?
>
> Like I said, I thought we were intending to modify \g's behavior only;
> that was certainly the implication of the discussion of "\gc".

OK, got it. I just don't see the value to doing \g and not ;. I think
the \gc case was a hack when he didn't have \set. Now that we have
\set, we should be consistent.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-patches(at)postgresql(dot)org, chrisnospam(at)1006(dot)org
Subject: Re: updated patch for selecting large results sets
Date: 2006-08-28 23:20:57
Message-ID: 19854.1156807257@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> OK, got it. I just don't see the value to doing \g and not ;. I think
> the \gc case was a hack when he didn't have \set. Now that we have
> \set, we should be consistent.

I'm willing to accept this if we can make sure we aren't adding any
overhead --- see my proposal elsewhere in the thread for fixing that.

regards, tom lane


From: Chris Mair <chrisnospam(at)1006(dot)org>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: updated patch for selecting large results sets in
Date: 2006-08-28 23:31:04
Message-ID: 1156807864.4026.161.camel@dell.home.lan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


> > > I am confused. I assume \g and ; should be affected, like Peter says.
> > > Tom, what *every* command are you talking about? You mean \d?
> >
> > Like I said, I thought we were intending to modify \g's behavior only;
> > that was certainly the implication of the discussion of "\gc".

At some point you OKed the "\g and ;" proposal.
I admit I was quick adding the "and ;" part, but it seemed so natural
once we agreed on using a variable.

> OK, got it. I just don't see the value to doing \g and not ;. I think
> the \gc case was a hack when he didn't have \set. Now that we have
> \set, we should be consistent.

I agree with this statement.

If we have a variable that switches just between two versions of \g,
we could have gone with using \u (or whatever) in the first place.

In the mean time I have been converted by the variable camp, and
I think the variable should change "\g" and ";" together, consistently.

If we find we can't live with the performance overhead of that
if(FETCH_COUNT), it is still not clear why we would be better
off moving it into the \g code path only.

Is it because presumably \g is used less often in existing psql scripts?

Bye, Chris.

--

Chris Mair
http://www.1006.org


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-patches(at)postgresql(dot)org, chrisnospam(at)1006(dot)org
Subject: Re: updated patch for selecting large results sets
Date: 2006-08-28 23:51:28
Message-ID: 200608282351.k7SNpS311444@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > OK, got it. I just don't see the value to doing \g and not ;. I think
> > the \gc case was a hack when he didn't have \set. Now that we have
> > \set, we should be consistent.
>
> I'm willing to accept this if we can make sure we aren't adding any
> overhead --- see my proposal elsewhere in the thread for fixing that.

Right, if \g has overhead, I don't want people to start using ; because
it is faster. That is the kind of behavior that makes us look sloppy.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, chrisnospam(at)1006(dot)org
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: updated patch for selecting large results sets in psql using cursors
Date: 2006-08-29 22:31:24
Message-ID: 18251.1156890684@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

<chrisnospam(at)1006(dot)org> writes:
> here comes the latest version (version 7) of the patch to handle large
> result sets with psql. As previously discussed, a cursor is used
> for SELECT queries when \set FETCH_COUNT some_value > 0

Applied with revisions ... I didn't like the fact that the code was
restricted to handle only unaligned output format, so I fixed print.c
to be able to deal with emitting output in sections. This is not
ideal for aligned output mode, because we compute column widths
separately for each FETCH group, but all the other output modes work
nicely. I also did a little hacking to make \timing and pager output
work as expected.

regards, tom lane


From: Chris Mair <chrisnospam(at)1006(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] updated patch for selecting large results
Date: 2006-08-30 09:40:20
Message-ID: 1156930821.2251.3.camel@ultra.home.lan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, 2006-08-29 at 18:31 -0400, Tom Lane wrote:
> <chrisnospam(at)1006(dot)org> writes:
> > here comes the latest version (version 7) of the patch to handle large
> > result sets with psql. As previously discussed, a cursor is used
> > for SELECT queries when \set FETCH_COUNT some_value > 0
>
> Applied with revisions ... I didn't like the fact that the code was
> restricted to handle only unaligned output format, so I fixed print.c
> to be able to deal with emitting output in sections. This is not
> ideal for aligned output mode, because we compute column widths
> separately for each FETCH group, but all the other output modes work
> nicely. I also did a little hacking to make \timing and pager output
> work as expected.
>
> regards, tom lane

Cool!
I specially like that as a side effect of your work for applying this,
psql is faster now.

Thanks to all people that helped with this (lots...:)

Bye, Chris.

--

Chris Mair
http://www.1006.org