Temp table's effect on performance

Lists: pgsql-general
From: Robert James <srobertjames(at)gmail(dot)com>
To: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Temp table's effect on performance
Date: 2013-01-18 17:29:46
Message-ID: CAGYyBggTfQ2Kf1VyR5g2jmqTz5a+Zmme-pAnn5V6SWbUa4xSHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I'd like to understand better why manually using a temp table can
improve performance so much.

I had one complicated query that performed well. I replaced a table
in it with a reference to a view, which was really just the table with
an inner join, and performance worsened by 2000x. Literally.

I then modified it to first manually SELECT the view into a temp
table, and performance returned to close to the original query. The
temp table had the same indexes as the original one.

How is that? What does the temp table do that the planner can't do
itself? Don't planner uses temp structures too?

In other words: Since my query is 100% identical algebraicly to not
using a temp table, why is it so much faster? Why can't the planner
work in the exact same order?


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert James <srobertjames(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Temp table's effect on performance
Date: 2013-01-18 17:33:50
Message-ID: CAFj8pRCJqGz_kGZviG9MSwgCptzDSKP0a7hrSCKuONLxmgqcMQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hello

2013/1/18 Robert James <srobertjames(at)gmail(dot)com>:
> I'd like to understand better why manually using a temp table can
> improve performance so much.

one possible effect - there should be different statistic

did you look on EXPLAIN ANALYZE?

Regards

Pavel Stehule

>
> I had one complicated query that performed well. I replaced a table
> in it with a reference to a view, which was really just the table with
> an inner join, and performance worsened by 2000x. Literally.
>
> I then modified it to first manually SELECT the view into a temp
> table, and performance returned to close to the original query. The
> temp table had the same indexes as the original one.
>
> How is that? What does the temp table do that the planner can't do
> itself? Don't planner uses temp structures too?
>
> In other words: Since my query is 100% identical algebraicly to not
> using a temp table, why is it so much faster? Why can't the planner
> work in the exact same order?
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Robert James <srobertjames(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Temp table's effect on performance
Date: 2013-01-18 18:43:18
Message-ID: CAMkU=1yxSsWEfPwWf0-N4VOdwL0COLuk_gg3Ln9mX0mq7k5baw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, Jan 18, 2013 at 9:29 AM, Robert James <srobertjames(at)gmail(dot)com> wrote:
> I'd like to understand better why manually using a temp table can
> improve performance so much.
>
> I had one complicated query that performed well. I replaced a table
> in it with a reference to a view, which was really just the table with
> an inner join, and performance worsened by 2000x. Literally.

I don't see how this can be answered at a general level. Something
very specific happened. Without knowing what that specific thing is,
what can we say about it?

> I then modified it to first manually SELECT the view into a temp
> table, and performance returned to close to the original query. The
> temp table had the same indexes as the original one.
>
> How is that? What does the temp table do that the planner can't do
> itself? Don't planner uses temp structures too?

Most likely the temp table just got lucky. If you re-wrote hundreds
of other queries to use a temp table, would all of them systematically
get faster? If so, that would certainly be a thing to wonder about.
But a single example, with no details....well, what can be said about
that? When you choose how to make the temp table, you are probably
imposing some intuitive knowledge you have (but the planner does not)
about the nature of your data.

> In other words: Since my query is 100% identical algebraicly to not
> using a temp table, why is it so much faster? Why can't the planner
> work in the exact same order?

Unless you are doing ANALYZE on your temp table, then the planner has
to make some guesses about the size and selectivity and correlations
involved. Those guesses probably just got lucky at being better in
this particular case than the real statistics. Without having any
details, it is hard to say why. It is easy to spot the one time the
planner gets it wrong, while the 9,999 times it got it right go
unnoticed.

Cheers,

Jeff


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Robert James <srobertjames(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Temp table's effect on performance
Date: 2013-01-18 19:06:34
Message-ID: 11918.1358535994@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> On Fri, Jan 18, 2013 at 9:29 AM, Robert James <srobertjames(at)gmail(dot)com> wrote:
>> In other words: Since my query is 100% identical algebraicly to not
>> using a temp table, why is it so much faster? Why can't the planner
>> work in the exact same order?

> Unless you are doing ANALYZE on your temp table, then the planner has
> to make some guesses about the size and selectivity and correlations
> involved. Those guesses probably just got lucky at being better in
> this particular case than the real statistics.

Whether you've done ANALYZE or not, the planner can see the physical
size of the temp table, which allows it to make a rowcount estimate
based on a guess as to the average row width (which it can make, in
a pretty squishy way, given only the column datatypes). Now an
estimate gotten that way can be pretty far off, but it might still
be much better than what we can come up with for a sub-select (view).
Of course if you *have* done an ANALYZE on the temp table then the
planner is far better informed than when considering a view.

Whether that's the explanation is of course impossible to know from
the given (lack of) information.

regards, tom lane


From: Robert James <srobertjames(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Temp table's effect on performance
Date: 2013-01-18 19:51:00
Message-ID: CAGYyBggoWvPeB1NHjZHP7X-1R0u9GGg9XMRX4-BQXusXfgohqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 1/18/13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>> On Fri, Jan 18, 2013 at 9:29 AM, Robert James <srobertjames(at)gmail(dot)com>
>> wrote:
>>> In other words: Since my query is 100% identical algebraicly to not
>>> using a temp table, why is it so much faster? Why can't the planner
>>> work in the exact same order?
>
>> Unless you are doing ANALYZE on your temp table, then the planner has
>> to make some guesses about the size and selectivity and correlations
>> involved. Those guesses probably just got lucky at being better in
>> this particular case than the real statistics.
>
> Whether you've done ANALYZE or not, the planner can see the physical
> size of the temp table, which allows it to make a rowcount estimate
> based on a guess as to the average row width (which it can make, in
> a pretty squishy way, given only the column datatypes). Now an
> estimate gotten that way can be pretty far off, but it might still
> be much better than what we can come up with for a sub-select (view).
> Of course if you *have* done an ANALYZE on the temp table then the
> planner is far better informed than when considering a view.
>
> Whether that's the explanation is of course impossible to know from
> the given (lack of) information.
>
> regards, tom lane
>

What information would be helpful to post?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert James <srobertjames(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Temp table's effect on performance
Date: 2013-01-18 19:57:39
Message-ID: 24316.1358539059@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Robert James <srobertjames(at)gmail(dot)com> writes:
> On 1/18/13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Whether that's the explanation is of course impossible to know from
>> the given (lack of) information.

> What information would be helpful to post?

Both forms of the query, EXPLAIN ANALYZE output for both, along with the
underlying table definitions.

regards, tom lane