Re: 8.3RC2 vs 8.2.6 testing results

Lists: pgsql-general
From: Vlad <marchenko(at)gmail(dot)com>
To: PG-General <pgsql-general(at)postgresql(dot)org>
Subject: 8.3RC2 vs 8.2.6 testing results
Date: 2008-01-28 21:56:20
Message-ID: cd70c6810801281356g27796cb2t194fc6f48cb4e573@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hello,

I wanted to share performance-related test results for Postgresql
8.3RC2 and 8.2.6. In both cases we used a freshly imported database
followed by analyze verbose command. Same server was used for testing
(2.6.23.14-107.fc8 x86_64) for each versions; both postgreses were
compiled with "-O3 -mtune=core2 -march=core2 -m64" flags. In both
cases default configuration was used with increased shared buffers to
1Gb (total server ram is 32Gb), increased work and maintenance mem,
enabled autovacuum, increased default_statistics_target to 100,
increased effective_cache_size to 20Gb, disabled fsync and increased
checkpoint_segments. Total size (on disk) of the tables involved in
the query was around 300Mb.

1. Freshly imported DB size on disk was about 3% smaller for 8.3
2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.
We took special measures to make sure that no third factors involved
(no other apps running, all data was cached from disks, etc). Below
is one of the queries that we used for testing (I anonymized table
names) along with query plan for both 8.3 and 8.2. The query execution
plans are the same for both versions, but what we found quite
interesting is that if we add all the times from each line of 8.2's
query plan, it roughly adds-up to the total execution time. For 8.3's
plan each line shows a shorter time, yet resulting in longer total
runtime. Also, summing 8.3's plan lines doesn't come close to the
total execution time:

SELECT _."a_id", SUM(_."counter")
FROM ts.t_c AS _
LEFT OUTER JOIN ts.t_a AS __1 ON _."a_id"=__1."id"
LEFT OUTER JOIN ts.t_i AS __3 ON _."i_id"=__3."id"
LEFT OUTER JOIN ts.t_b AS __2 ON __3."b_id"=__2."id"
WHERE (_."date_day" >= '2008-01-01 00:00:00' AND _."date_day" <=
'2008-01-27 23:59:59')
AND __1."status" IS TRUE
AND __2."status" IS TRUE
GROUP BY _."a_id"

8.2.6 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=134877.04..134899.04 rows=200 width=52) (actual
time=21517.837..21517.890 rows=47 loops=1)
-> Hash Join (cost=2450.09..111489.75 rows=550289 width=52)
(actual time=76.083..7691.579 rows=2593557 loops=1)
Hash Cond: (_.i_id = __3.id)
-> Hash Join (cost=19.20..95377.74 rows=934651 width=56)
(actual time=0.119..4933.928 rows=2596942 loops=1)
Hash Cond: (_.a_id = __1.id)
-> Append (cost=0.00..76276.09 rows=2596252 width=56)
(actual time=0.014..2988.950 rows=2596942 loops=1)
-> Seq Scan on t_c _ (cost=0.00..21.10 rows=4
width=56) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Seq Scan on t_c_2008_01 _
(cost=0.00..76254.99 rows=2596248 width=56) (actual
time=0.011..1979.606 rows=2596942 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Hash (cost=18.30..18.30 rows=72 width=4) (actual
time=0.094..0.094 rows=72 loops=1)
-> Seq Scan on t_a __1 (cost=0.00..18.30
rows=72 width=4) (actual time=0.004..0.067 rows=72 loops=1)
Filter: (status IS TRUE)
-> Hash (cost=1950.44..1950.44 rows=38436 width=4) (actual
time=75.931..75.931 rows=59934 loops=1)
-> Hash Join (cost=57.45..1950.44 rows=38436 width=4)
(actual time=0.829..54.760 rows=59934 loops=1)
Hash Cond: (__3.b_id = __2.id)
-> Seq Scan on t_i __3 (cost=0.00..1263.82
rows=65282 width=8) (actual time=0.005..16.852 rows=65282 loops=1)
-> Hash (cost=50.90..50.90 rows=524 width=4)
(actual time=0.499..0.499 rows=524 loops=1)
-> Seq Scan on t_b __2 (cost=0.00..50.90
rows=524 width=4) (actual time=0.004..0.312 rows=524 loops=1)
Filter: (status IS TRUE)
Total runtime: 21518.097 ms

8.3RC2: QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=132352.98..132572.98 rows=200 width=52) (actual
time=24354.972..24355.019 rows=47 loops=1)
-> Hash Join (cost=2379.09..108954.69 rows=550548 width=52)
(actual time=76.188..8177.510 rows=2593557 loops=1)
Hash Cond: (_.i_id = __3.id)
-> Hash Join (cost=16.20..92904.25 rows=935090 width=56)
(actual time=0.140..5304.968 rows=2596942 loops=1)
Hash Cond: (_.a_id = __1.id)
-> Append (cost=0.00..73796.62 rows=2597473 width=56)
(actual time=0.043..3272.024 rows=2596942 loops=1)
-> Seq Scan on t_c _ (cost=0.00..21.55 rows=4
width=56) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Seq Scan on t_c_2008_01 _
(cost=0.00..73775.07 rows=2597469 width=56) (actual
time=0.040..2245.209 rows=2596942 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Hash (cost=15.30..15.30 rows=72 width=4) (actual
time=0.091..0.091 rows=72 loops=1)
-> Seq Scan on t_a __1 (cost=0.00..15.30
rows=72 width=4) (actual time=0.003..0.061 rows=72 loops=1)
Filter: (status IS TRUE)
-> Hash (cost=1882.44..1882.44 rows=38436 width=4) (actual
time=76.027..76.027 rows=59934 loops=1)
-> Hash Join (cost=55.45..1882.44 rows=38436 width=4)
(actual time=0.835..54.576 rows=59934 loops=1)
Hash Cond: (__3.b_id = __2.id)
-> Seq Scan on t_i __3 (cost=0.00..1197.82
rows=65282 width=8) (actual time=0.004..16.096 rows=65282 loops=1)
-> Hash (cost=48.90..48.90 rows=524 width=4)
(actual time=0.513..0.513 rows=524 loops=1)
-> Seq Scan on t_b __2 (cost=0.00..48.90
rows=524 width=4) (actual time=0.003..0.307 rows=524 loops=1)
Filter: (status IS TRUE)
Total runtime: 24355.179 ms

Any ideas on what-we-were-doing-wrong are welcomed

--
Vlad


From: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
To: Vlad <marchenko(at)gmail(dot)com>
Cc: PG-General <pgsql-general(at)postgresql(dot)org>
Subject: Re: 8.3RC2 vs 8.2.6 testing results
Date: 2008-01-28 22:26:42
Message-ID: 162867790801281426l24d59ea0ob5a9c404471d562f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hello

8.3 plan is not optimal.

> -> Hash Join (cost=2379.09..108954.69 rows=550548 width=52)
> (actual time=76.188..8177.510 rows=2593557 loops=1)

please, try to increase statistics

default_statistics_target (in postgresql.conf) to 100 and repeat
import and your test.

Regards
Pavel Stehule

On 28/01/2008, Vlad <marchenko(at)gmail(dot)com> wrote:
> Hello,
>
> I wanted to share performance-related test results for Postgresql
> 8.3RC2 and 8.2.6. In both cases we used a freshly imported database
> followed by analyze verbose command. Same server was used for testing
> (2.6.23.14-107.fc8 x86_64) for each versions; both postgreses were
> compiled with "-O3 -mtune=core2 -march=core2 -m64" flags. In both
> cases default configuration was used with increased shared buffers to
> 1Gb (total server ram is 32Gb), increased work and maintenance mem,
> enabled autovacuum, increased default_statistics_target to 100,
> increased effective_cache_size to 20Gb, disabled fsync and increased
> checkpoint_segments. Total size (on disk) of the tables involved in
> the query was around 300Mb.
>
> 1. Freshly imported DB size on disk was about 3% smaller for 8.3
> 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.
> We took special measures to make sure that no third factors involved
> (no other apps running, all data was cached from disks, etc). Below
> is one of the queries that we used for testing (I anonymized table
> names) along with query plan for both 8.3 and 8.2. The query execution
> plans are the same for both versions, but what we found quite
> interesting is that if we add all the times from each line of 8.2's
> query plan, it roughly adds-up to the total execution time. For 8.3's
> plan each line shows a shorter time, yet resulting in longer total
> runtime. Also, summing 8.3's plan lines doesn't come close to the
> total execution time:
>
> SELECT _."a_id", SUM(_."counter")
> FROM ts.t_c AS _
> LEFT OUTER JOIN ts.t_a AS __1 ON _."a_id"=__1."id"
> LEFT OUTER JOIN ts.t_i AS __3 ON _."i_id"=__3."id"
> LEFT OUTER JOIN ts.t_b AS __2 ON __3."b_id"=__2."id"
> WHERE (_."date_day" >= '2008-01-01 00:00:00' AND _."date_day" <=
> '2008-01-27 23:59:59')
> AND __1."status" IS TRUE
> AND __2."status" IS TRUE
> GROUP BY _."a_id"
>
>
> 8.2.6 QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------
> HashAggregate (cost=134877.04..134899.04 rows=200 width=52) (actual
> time=21517.837..21517.890 rows=47 loops=1)
> -> Hash Join (cost=2450.09..111489.75 rows=550289 width=52)
> (actual time=76.083..7691.579 rows=2593557 loops=1)
> Hash Cond: (_.i_id = __3.id)
> -> Hash Join (cost=19.20..95377.74 rows=934651 width=56)
> (actual time=0.119..4933.928 rows=2596942 loops=1)
> Hash Cond: (_.a_id = __1.id)
> -> Append (cost=0.00..76276.09 rows=2596252 width=56)
> (actual time=0.014..2988.950 rows=2596942 loops=1)
> -> Seq Scan on t_c _ (cost=0.00..21.10 rows=4
> width=56) (actual time=0.001..0.001 rows=0 loops=1)
> Filter: ((date_day >= '2008-01-01'::date)
> AND (date_day <= '2008-01-27'::date))
> -> Seq Scan on t_c_2008_01 _
> (cost=0.00..76254.99 rows=2596248 width=56) (actual
> time=0.011..1979.606 rows=2596942 loops=1)
> Filter: ((date_day >= '2008-01-01'::date)
> AND (date_day <= '2008-01-27'::date))
> -> Hash (cost=18.30..18.30 rows=72 width=4) (actual
> time=0.094..0.094 rows=72 loops=1)
> -> Seq Scan on t_a __1 (cost=0.00..18.30
> rows=72 width=4) (actual time=0.004..0.067 rows=72 loops=1)
> Filter: (status IS TRUE)
> -> Hash (cost=1950.44..1950.44 rows=38436 width=4) (actual
> time=75.931..75.931 rows=59934 loops=1)
> -> Hash Join (cost=57.45..1950.44 rows=38436 width=4)
> (actual time=0.829..54.760 rows=59934 loops=1)
> Hash Cond: (__3.b_id = __2.id)
> -> Seq Scan on t_i __3 (cost=0.00..1263.82
> rows=65282 width=8) (actual time=0.005..16.852 rows=65282 loops=1)
> -> Hash (cost=50.90..50.90 rows=524 width=4)
> (actual time=0.499..0.499 rows=524 loops=1)
> -> Seq Scan on t_b __2 (cost=0.00..50.90
> rows=524 width=4) (actual time=0.004..0.312 rows=524 loops=1)
> Filter: (status IS TRUE)
> Total runtime: 21518.097 ms
>
>
>
> 8.3RC2: QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------
> HashAggregate (cost=132352.98..132572.98 rows=200 width=52) (actual
> time=24354.972..24355.019 rows=47 loops=1)
> -> Hash Join (cost=2379.09..108954.69 rows=550548 width=52)
> (actual time=76.188..8177.510 rows=2593557 loops=1)
> Hash Cond: (_.i_id = __3.id)
> -> Hash Join (cost=16.20..92904.25 rows=935090 width=56)
> (actual time=0.140..5304.968 rows=2596942 loops=1)
> Hash Cond: (_.a_id = __1.id)
> -> Append (cost=0.00..73796.62 rows=2597473 width=56)
> (actual time=0.043..3272.024 rows=2596942 loops=1)
> -> Seq Scan on t_c _ (cost=0.00..21.55 rows=4
> width=56) (actual time=0.001..0.001 rows=0 loops=1)
> Filter: ((date_day >= '2008-01-01'::date)
> AND (date_day <= '2008-01-27'::date))
> -> Seq Scan on t_c_2008_01 _
> (cost=0.00..73775.07 rows=2597469 width=56) (actual
> time=0.040..2245.209 rows=2596942 loops=1)
> Filter: ((date_day >= '2008-01-01'::date)
> AND (date_day <= '2008-01-27'::date))
> -> Hash (cost=15.30..15.30 rows=72 width=4) (actual
> time=0.091..0.091 rows=72 loops=1)
> -> Seq Scan on t_a __1 (cost=0.00..15.30
> rows=72 width=4) (actual time=0.003..0.061 rows=72 loops=1)
> Filter: (status IS TRUE)
> -> Hash (cost=1882.44..1882.44 rows=38436 width=4) (actual
> time=76.027..76.027 rows=59934 loops=1)
> -> Hash Join (cost=55.45..1882.44 rows=38436 width=4)
> (actual time=0.835..54.576 rows=59934 loops=1)
> Hash Cond: (__3.b_id = __2.id)
> -> Seq Scan on t_i __3 (cost=0.00..1197.82
> rows=65282 width=8) (actual time=0.004..16.096 rows=65282 loops=1)
> -> Hash (cost=48.90..48.90 rows=524 width=4)
> (actual time=0.513..0.513 rows=524 loops=1)
> -> Seq Scan on t_b __2 (cost=0.00..48.90
> rows=524 width=4) (actual time=0.003..0.307 rows=524 loops=1)
> Filter: (status IS TRUE)
> Total runtime: 24355.179 ms
>
>
> Any ideas on what-we-were-doing-wrong are welcomed
>
>
> --
> Vlad
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>


From: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
To: Vlad <marchenko(at)gmail(dot)com>
Cc: PG-General <pgsql-general(at)postgresql(dot)org>
Subject: Re: 8.3RC2 vs 8.2.6 testing results
Date: 2008-01-28 22:27:48
Message-ID: 162867790801281427v7c2d34bbw2a1d68bb78243c11@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 28/01/2008, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> Hello
>
> 8.3 plan is not optimal.
>
> > -> Hash Join (cost=2379.09..108954.69 rows=550548 width=52)
> > (actual time=76.188..8177.510 rows=2593557 loops=1)
>
> please, try to increase statistics

I am blind, I am sorry, It's noise, you did it.

>
> default_statistics_target (in postgresql.conf) to 100 and repeat
> import and your test.
>
> Regards
> Pavel Stehule
>
> On 28/01/2008, Vlad <marchenko(at)gmail(dot)com> wrote:
> > Hello,
> >
> > I wanted to share performance-related test results for Postgresql
> > 8.3RC2 and 8.2.6. In both cases we used a freshly imported database
> > followed by analyze verbose command. Same server was used for testing
> > (2.6.23.14-107.fc8 x86_64) for each versions; both postgreses were
> > compiled with "-O3 -mtune=core2 -march=core2 -m64" flags. In both
> > cases default configuration was used with increased shared buffers to
> > 1Gb (total server ram is 32Gb), increased work and maintenance mem,
> > enabled autovacuum, increased default_statistics_target to 100,
> > increased effective_cache_size to 20Gb, disabled fsync and increased
> > checkpoint_segments. Total size (on disk) of the tables involved in
> > the query was around 300Mb.
> >
> > 1. Freshly imported DB size on disk was about 3% smaller for 8.3
> > 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.
> > We took special measures to make sure that no third factors involved
> > (no other apps running, all data was cached from disks, etc). Below
> > is one of the queries that we used for testing (I anonymized table
> > names) along with query plan for both 8.3 and 8.2. The query execution
> > plans are the same for both versions, but what we found quite
> > interesting is that if we add all the times from each line of 8.2's
> > query plan, it roughly adds-up to the total execution time. For 8.3's
> > plan each line shows a shorter time, yet resulting in longer total
> > runtime. Also, summing 8.3's plan lines doesn't come close to the
> > total execution time:
> >
> > SELECT _."a_id", SUM(_."counter")
> > FROM ts.t_c AS _
> > LEFT OUTER JOIN ts.t_a AS __1 ON _."a_id"=__1."id"
> > LEFT OUTER JOIN ts.t_i AS __3 ON _."i_id"=__3."id"
> > LEFT OUTER JOIN ts.t_b AS __2 ON __3."b_id"=__2."id"
> > WHERE (_."date_day" >= '2008-01-01 00:00:00' AND _."date_day" <=
> > '2008-01-27 23:59:59')
> > AND __1."status" IS TRUE
> > AND __2."status" IS TRUE
> > GROUP BY _."a_id"
> >
> >
> > 8.2.6 QUERY PLAN
> > --------------------------------------------------------------------------------------------------------------------------------------------------------
> > HashAggregate (cost=134877.04..134899.04 rows=200 width=52) (actual
> > time=21517.837..21517.890 rows=47 loops=1)
> > -> Hash Join (cost=2450.09..111489.75 rows=550289 width=52)
> > (actual time=76.083..7691.579 rows=2593557 loops=1)
> > Hash Cond: (_.i_id = __3.id)
> > -> Hash Join (cost=19.20..95377.74 rows=934651 width=56)
> > (actual time=0.119..4933.928 rows=2596942 loops=1)
> > Hash Cond: (_.a_id = __1.id)
> > -> Append (cost=0.00..76276.09 rows=2596252 width=56)
> > (actual time=0.014..2988.950 rows=2596942 loops=1)
> > -> Seq Scan on t_c _ (cost=0.00..21.10 rows=4
> > width=56) (actual time=0.001..0.001 rows=0 loops=1)
> > Filter: ((date_day >= '2008-01-01'::date)
> > AND (date_day <= '2008-01-27'::date))
> > -> Seq Scan on t_c_2008_01 _
> > (cost=0.00..76254.99 rows=2596248 width=56) (actual
> > time=0.011..1979.606 rows=2596942 loops=1)
> > Filter: ((date_day >= '2008-01-01'::date)
> > AND (date_day <= '2008-01-27'::date))
> > -> Hash (cost=18.30..18.30 rows=72 width=4) (actual
> > time=0.094..0.094 rows=72 loops=1)
> > -> Seq Scan on t_a __1 (cost=0.00..18.30
> > rows=72 width=4) (actual time=0.004..0.067 rows=72 loops=1)
> > Filter: (status IS TRUE)
> > -> Hash (cost=1950.44..1950.44 rows=38436 width=4) (actual
> > time=75.931..75.931 rows=59934 loops=1)
> > -> Hash Join (cost=57.45..1950.44 rows=38436 width=4)
> > (actual time=0.829..54.760 rows=59934 loops=1)
> > Hash Cond: (__3.b_id = __2.id)
> > -> Seq Scan on t_i __3 (cost=0.00..1263.82
> > rows=65282 width=8) (actual time=0.005..16.852 rows=65282 loops=1)
> > -> Hash (cost=50.90..50.90 rows=524 width=4)
> > (actual time=0.499..0.499 rows=524 loops=1)
> > -> Seq Scan on t_b __2 (cost=0.00..50.90
> > rows=524 width=4) (actual time=0.004..0.312 rows=524 loops=1)
> > Filter: (status IS TRUE)
> > Total runtime: 21518.097 ms
> >
> >
> >
> > 8.3RC2: QUERY PLAN
> > --------------------------------------------------------------------------------------------------------------------------------------------------------
> > HashAggregate (cost=132352.98..132572.98 rows=200 width=52) (actual
> > time=24354.972..24355.019 rows=47 loops=1)
> > -> Hash Join (cost=2379.09..108954.69 rows=550548 width=52)
> > (actual time=76.188..8177.510 rows=2593557 loops=1)
> > Hash Cond: (_.i_id = __3.id)
> > -> Hash Join (cost=16.20..92904.25 rows=935090 width=56)
> > (actual time=0.140..5304.968 rows=2596942 loops=1)
> > Hash Cond: (_.a_id = __1.id)
> > -> Append (cost=0.00..73796.62 rows=2597473 width=56)
> > (actual time=0.043..3272.024 rows=2596942 loops=1)
> > -> Seq Scan on t_c _ (cost=0.00..21.55 rows=4
> > width=56) (actual time=0.001..0.001 rows=0 loops=1)
> > Filter: ((date_day >= '2008-01-01'::date)
> > AND (date_day <= '2008-01-27'::date))
> > -> Seq Scan on t_c_2008_01 _
> > (cost=0.00..73775.07 rows=2597469 width=56) (actual
> > time=0.040..2245.209 rows=2596942 loops=1)
> > Filter: ((date_day >= '2008-01-01'::date)
> > AND (date_day <= '2008-01-27'::date))
> > -> Hash (cost=15.30..15.30 rows=72 width=4) (actual
> > time=0.091..0.091 rows=72 loops=1)
> > -> Seq Scan on t_a __1 (cost=0.00..15.30
> > rows=72 width=4) (actual time=0.003..0.061 rows=72 loops=1)
> > Filter: (status IS TRUE)
> > -> Hash (cost=1882.44..1882.44 rows=38436 width=4) (actual
> > time=76.027..76.027 rows=59934 loops=1)
> > -> Hash Join (cost=55.45..1882.44 rows=38436 width=4)
> > (actual time=0.835..54.576 rows=59934 loops=1)
> > Hash Cond: (__3.b_id = __2.id)
> > -> Seq Scan on t_i __3 (cost=0.00..1197.82
> > rows=65282 width=8) (actual time=0.004..16.096 rows=65282 loops=1)
> > -> Hash (cost=48.90..48.90 rows=524 width=4)
> > (actual time=0.513..0.513 rows=524 loops=1)
> > -> Seq Scan on t_b __2 (cost=0.00..48.90
> > rows=524 width=4) (actual time=0.003..0.307 rows=524 loops=1)
> > Filter: (status IS TRUE)
> > Total runtime: 24355.179 ms
> >
> >
> > Any ideas on what-we-were-doing-wrong are welcomed
> >
> >
> > --
> > Vlad
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> > message can get through to the mailing list cleanly
> >
>


From: Vlad <marchenko(at)gmail(dot)com>
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: PG-General <pgsql-general(at)postgresql(dot)org>
Subject: Re: 8.3RC2 vs 8.2.6 testing results
Date: 2008-01-28 22:32:59
Message-ID: cd70c6810801281432g2f3967a6y8cee4968fce8cff@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Pavel:

thanks for your feedback. To me plans generated by 8.2 and 8.3 are
equal and only differ by execution times. (I don't know, maybe email
wrap'ed lines, so I've attached plans to my message). Also, I confirm
that that parameter was increased (to 100) before the ran tests.

On Jan 28, 2008 4:26 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> Hello
>
> 8.3 plan is not optimal.
>
> > -> Hash Join (cost=2379.09..108954.69 rows=550548 width=52)
> > (actual time=76.188..8177.510 rows=2593557 loops=1)
>
> please, try to increase statistics
>
> default_statistics_target (in postgresql.conf) to 100 and repeat
> import and your test.
>
> Regards
> Pavel Stehule

-- Vlad

Attachment Content-Type Size
82_plan.txt text/plain 2.3 KB
83_plan.txt text/plain 2.3 KB

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: Vlad <marchenko(at)gmail(dot)com>
Cc: PG-General <pgsql-general(at)postgresql(dot)org>
Subject: Re: 8.3RC2 vs 8.2.6 testing results
Date: 2008-01-28 22:35:07
Message-ID: dcc563d10801281435i74d7a446ha90c7d7e360a5309@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 28, 2008 3:56 PM, Vlad <marchenko(at)gmail(dot)com> wrote:
> Hello,
>
> 1. Freshly imported DB size on disk was about 3% smaller for 8.3
> 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.
> We took special measures to make sure that no third factors involved
> (no other apps running, all data was cached from disks, etc). Below
> is one of the queries that we used for testing (I anonymized table
> names) along with query plan for both 8.3 and 8.2. The query execution
> plans are the same for both versions, but what we found quite
> interesting is that if we add all the times from each line of 8.2's
> query plan, it roughly adds-up to the total execution time. For 8.3's
> plan each line shows a shorter time, yet resulting in longer total
> runtime. Also, summing 8.3's plan lines doesn't come close to the
> total execution time:

This last bit often means there's some overhead in the systems
timeofday() function calls.

If you just use \timing from psql, and run the script without explain
analyze, what speeds do you get on each?


From: Vlad <marchenko(at)gmail(dot)com>
To: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
Cc: PG-General <pgsql-general(at)postgresql(dot)org>
Subject: Re: 8.3RC2 vs 8.2.6 testing results
Date: 2008-01-28 22:41:22
Message-ID: cd70c6810801281441h2a4e048eh85e1c151ed306654@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> This last bit often means there's some overhead in the systems
> timeofday() function calls.
>
> If you just use \timing from psql, and run the script without explain
> analyze, what speeds do you get on each?
>

17480ms (8.2.6)
20342ms (8.3RC2)

--
Vlad


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Vlad <marchenko(at)gmail(dot)com>
Cc: PG-General <pgsql-general(at)postgresql(dot)org>
Subject: Re: 8.3RC2 vs 8.2.6 testing results
Date: 2008-01-29 03:13:28
Message-ID: 11517.1201576408@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Vlad <marchenko(at)gmail(dot)com> writes:
> 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.

The particular case you are showing here seems to be all about the speed
of hash aggregation --- at least the time differential is mostly in the
HashAggregate step. What is the data type of a_id? I speculate that
you're noticing the slightly slower/more complicated hash function that
8.3 uses for integers. On a case where the data was well distributed
you'd not see any countervailing efficiency gain from those extra
cycles.

regards, tom lane


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: PG-General <pgsql-general(at)postgresql(dot)org>
Subject: Re: 8.3RC2 vs 8.2.6 testing results
Date: 2008-01-29 05:24:29
Message-ID: Pine.GSO.4.64.0801290020100.20361@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 28 Jan 2008, Tom Lane wrote:

> I speculate that you're noticing the slightly slower/more complicated
> hash function that 8.3 uses for integers.

There was a similar slowdown in the Clodaldo case you tracked down
recently. Is it worth considering an addition to the release notes
warning about this class of problem? If there have been two of them so
far just in the beta I wonder how many people are going to run into some
variant of this in the future.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Vlad <marchenko(at)gmail(dot)com>, PG-General <pgsql-general(at)postgresql(dot)org>
Subject: Re: 8.3RC2 vs 8.2.6 testing results
Date: 2008-01-29 07:45:04
Message-ID: e51f66da0801282345s16d9d128n227dc33c41a3c0cf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 1/29/08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Vlad <marchenko(at)gmail(dot)com> writes:
> > 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.
>
> The particular case you are showing here seems to be all about the speed
> of hash aggregation --- at least the time differential is mostly in the
> HashAggregate step. What is the data type of a_id? I speculate that
> you're noticing the slightly slower/more complicated hash function that
> 8.3 uses for integers. On a case where the data was well distributed
> you'd not see any countervailing efficiency gain from those extra
> cycles.

AFAIK we have a plan to update string hash in 8.4 to fastest
available (Jenkins lookup3). Maybe we should update integer
hash too then to the best:

http://www.cris.com/~Ttwang/tech/inthash.htm

("32 bit Mix Functions" is the one).

--
marko


From: "Vlad Marchenko" <marchenko(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: 8.3RC2 vs 8.2.6 testing results
Date: 2008-01-29 15:56:13
Message-ID: D790CC4FA7C842E6BB9BB6A5BBEBC973@vladm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom:

Yes, they are ints. To (somewhat) check your guess on the role of the hash
aggregation speed, I just ran oltp test from sysbench
(http://sysbench.sourceforge.net/docs/#database_mode) on a table with 1mln
of records. That test uses pretty simple queries (that do not use
aggregation) and 8.3 showed about the same performance as 8.2 (strictly
speaking 8.3 was about 1-2% slower, but not 10-15% like on my query).

I'm curious if that new hash aggregation algorithm was put in 8.3 with the
performance increase as a goal or it was simply a required change to support
some other new feature of 8.3? Right now the switch from 8.2 to 8.3 doesn't
seems a favorable step for the type of application that we have...

-- vlad

----- Original Message -----
From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Vlad" <marchenko(at)gmail(dot)com>
Cc: "PG-General" <pgsql-general(at)postgresql(dot)org>
Sent: Monday, January 28, 2008 9:13 PM
Subject: Re: [GENERAL] 8.3RC2 vs 8.2.6 testing results

> Vlad <marchenko(at)gmail(dot)com> writes:
>> 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.
>
> The particular case you are showing here seems to be all about the speed
> of hash aggregation --- at least the time differential is mostly in the
> HashAggregate step. What is the data type of a_id? I speculate that
> you're noticing the slightly slower/more complicated hash function that
> 8.3 uses for integers. On a case where the data was well distributed
> you'd not see any countervailing efficiency gain from those extra
> cycles.
>
> regards, tom lane
>


From: Clodoaldo <clodoaldo(dot)pinto(dot)neto(at)gmail(dot)com>
To: "Vlad Marchenko" <marchenko(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: 8.3RC2 vs 8.2.6 testing results
Date: 2008-01-29 16:34:26
Message-ID: a595de7a0801290834g15462046y438a352aa9b5a149@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

2008/1/29, Vlad Marchenko <marchenko(at)gmail(dot)com>:
> Tom:
>
> Yes, they are ints. To (somewhat) check your guess on the role of the hash
> aggregation speed, I just ran oltp test from sysbench
> (http://sysbench.sourceforge.net/docs/#database_mode) on a table with 1mln
> of records. That test uses pretty simple queries (that do not use
> aggregation) and 8.3 showed about the same performance as 8.2 (strictly
> speaking 8.3 was about 1-2% slower, but not 10-15% like on my query).
>
> I'm curious if that new hash aggregation algorithm was put in 8.3 with the
> performance increase as a goal or it was simply a required change to support
> some other new feature of 8.3? Right now the switch from 8.2 to 8.3 doesn't
> seems a favorable step for the type of application that we have...

Vlad,

What happens if you run the 8.3 test with enable_hashagg set to off?

Saudações, Clodoaldo Pinto Neto

> ----- Original Message -----
> From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> To: "Vlad" <marchenko(at)gmail(dot)com>
> Cc: "PG-General" <pgsql-general(at)postgresql(dot)org>
> Sent: Monday, January 28, 2008 9:13 PM
> Subject: Re: [GENERAL] 8.3RC2 vs 8.2.6 testing results
>
>
> > Vlad <marchenko(at)gmail(dot)com> writes:
> >> 2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.
> >
> > The particular case you are showing here seems to be all about the speed
> > of hash aggregation --- at least the time differential is mostly in the
> > HashAggregate step. What is the data type of a_id? I speculate that
> > you're noticing the slightly slower/more complicated hash function that
> > 8.3 uses for integers. On a case where the data was well distributed
> > you'd not see any countervailing efficiency gain from those extra
> > cycles.
> >
> > regards, tom lane
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


From: Vlad <marchenko(at)gmail(dot)com>
To: Clodoaldo <clodoaldo(dot)pinto(dot)neto(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: 8.3RC2 vs 8.2.6 testing results
Date: 2008-01-29 16:49:45
Message-ID: cd70c6810801290849x4c9d1733vc5d1d3b429b32603@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I just tried adjusting two parameters:

enable_hashagg = off
both versions run slower, still 8.2 quicker than 8.3 (by ~ the same %)

enable_hashjoin = off
both versions run slower, still 8.2 quicker than 8.3 (by ~ the same %)

On Jan 29, 2008 10:34 AM, Clodoaldo <clodoaldo(dot)pinto(dot)neto(at)gmail(dot)com> wrote:
> Vlad,
>
> What happens if you run the 8.3 test with enable_hashagg set to off?
>
> Saudações, Clodoaldo Pinto Neto
>

--
Vlad


From: Vlad <marchenko(at)gmail(dot)com>
To: Clodoaldo <clodoaldo(dot)pinto(dot)neto(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: 8.3RC2 vs 8.2.6 testing results
Date: 2008-01-29 18:17:47
Message-ID: cd70c6810801291017pcea4e1eoc3a02c7d1dd23bae@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

More test results for public. We ran my original query and found out
that on 4 cores CPU:

1 thread test 8.2 bits 8.3 by 10..15%
4 threads 8.2. wins 8.3 by ~2%
8 threads 8.3 finally wins 8.2 by ~2%

the same data set was affected during multi-threaded runs, but it's
100% cached from the disk. I guess we see the result of concurrent
scans optimization in 8.3...

--
Vlad


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Vlad Marchenko" <marchenko(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: 8.3RC2 vs 8.2.6 testing results
Date: 2008-01-29 18:56:26
Message-ID: 23933.1201632986@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Vlad Marchenko" <marchenko(at)gmail(dot)com> writes:
> I'm curious if that new hash aggregation algorithm was put in 8.3 with the
> performance increase as a goal

Yes.
http://archives.postgresql.org/pgsql-hackers/2007-05/msg01179.php
http://archives.postgresql.org/pgsql-committers/2007-06/msg00007.php
(and nearby messages)

> or it was simply a required change to support
> some other new feature of 8.3? Right now the switch from 8.2 to 8.3 doesn't
> seems a favorable step for the type of application that we have...

You are condemning a demonstrably better algorithm on the basis of one
test case. If you poke around a bit harder you will find other cases
where it is faster than 8.2 because of having better distribution.

regards, tom lane