Re: scale up (postgresql vs mssql)

Lists: pgsql-performance
From: Eyal Wilde <eyal(at)impactsoft(dot)co(dot)il>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: scale up (postgresql vs mssql)
Date: 2012-04-29 13:21:23
Message-ID: CAMiEbcj7LF56GiC8BTJbfku-TikaAqoAEY9kZtSfYKoj4Ntvew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

hi, all.

well, i wondered why there is high rate of bo (blocks out). the procedure
is practically read-only during the whole test. although it's not strictly
read-only, because in a certain condition, there might be writing to a
certain table. but that condition can not be met during this test.

so, i created a ramdisk:
mkfs -q /dev/ram2 100000
mkdir -p /ram4
mount /dev/ram2 /ram4
df -H | grep /ram4

and then:
CREATE TABLESPACE pgram4 OWNER postgres LOCATION '/ram4';

and in postgresql.conf i configured:
temp_tablespaces = 'pgram4'

now, i believe, all the temp-table were in RAM.
vmstat showed:
r b swpd free buff cache si so bi bo in cs us sy id wa st
6 0 0 5916720 69488 1202668 0 0 0 3386 1902 1765 25 3 72 1 0
9 1 0 5907728 69532 1204212 0 0 0 1392 5375 4510 88 8 3 1 0
7 0 0 5886584 69672 1205096 0 0 0 1472 5278 4520 88 10 2 0 0
8 0 0 5877384 69688 1206384 0 0 0 1364 5312 4522 89 8 2 1 0
8 0 0 5869332 69748 1207188 0 0 0 1296 5285 4437 88 8 3 1 0
6 1 0 5854404 69852 1208776 0 0 0 2955 5333 4518 88 9 2 0 0

10 times less bo (blocks out)
5 times less wa (percentage of time spent by cpu waiting to IO)
2 times less b (wait Queue – Process which are waiting for I/O)

the overall test result was (just?) ~15% better...

when i created the ramdisk with mkfs.ext4 (instead of the default ext2),
the performance was the same (~15% better), but vmstat output looked much
the same as before (without the ramdisk) !?? why is that?

as i mentioned, the procedure is practically read-only. shouldn't i expect
bo (blocks out) to be ~0? after forcing temp-tables to be in the RAM, what
other reasons may be the cause for bo (blocks out)?

i see no point pasting the whole procedure here, since it's very long. the
general course of the procedure is:
create temp-tables if they are not exist (practically, they do exist)
do a lot of: insert into temp-table select from table
and : insert into temp-table select from table join temp-table....
after finished insert into temp-table: analyze temp-table (this was the
only way the optimizer behaved properly)
finally, open refcursors of select from temp-tables

Thanks again.


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Eyal Wilde <eyal(at)impactsoft(dot)co(dot)il>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: scale up (postgresql vs mssql)
Date: 2012-05-03 13:40:16
Message-ID: CAHyXU0w8pqSArePOdwKxf8DfvPZ_uPayngGx1_4PC3kPJ+7CVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Sun, Apr 29, 2012 at 8:21 AM, Eyal Wilde <eyal(at)impactsoft(dot)co(dot)il> wrote:
> hi, all.
>
> well, i wondered why there is high rate of bo (blocks out). the procedure is
> practically read-only during the whole test. although it's not strictly
> read-only, because in a certain condition, there might be writing to a
> certain table. but that condition can not be met during this test.
>
> so, i created a ramdisk:
> mkfs -q /dev/ram2 100000
> mkdir -p /ram4
> mount /dev/ram2 /ram4
> df -H | grep /ram4
>
> and then:
> CREATE TABLESPACE pgram4 OWNER postgres LOCATION '/ram4';
>
> and in postgresql.conf i configured:
> temp_tablespaces = 'pgram4'
>
> now, i believe, all the temp-table were in RAM.
> vmstat showed:
> r b swpd free buff cache si so bi bo in cs us sy id wa st
> 6 0 0 5916720 69488 1202668 0 0 0 3386 1902 1765 25 3 72 1 0
> 9 1 0 5907728 69532 1204212 0 0 0 1392 5375 4510 88 8 3 1 0
> 7 0 0 5886584 69672 1205096 0 0 0 1472 5278 4520 88 10 2 0 0
> 8 0 0 5877384 69688 1206384 0 0 0 1364 5312 4522 89 8 2 1 0
> 8 0 0 5869332 69748 1207188 0 0 0 1296 5285 4437 88 8 3 1 0
> 6 1 0 5854404 69852 1208776 0 0 0 2955 5333 4518 88 9 2 0 0
>
> 10 times less bo (blocks out)
> 5 times less wa (percentage of time spent by cpu waiting to IO)
> 2 times less b (wait Queue – Process which are waiting for I/O)
>
> the overall test result was (just?) ~15% better...
>
> when i created the ramdisk with mkfs.ext4 (instead of the default ext2), the
> performance was the same (~15% better), but vmstat output looked much the
> same as before (without the ramdisk) !?? why is that?
>
> as i mentioned, the procedure is practically read-only. shouldn't i expect
> bo (blocks out) to be ~0? after forcing temp-tables to be in the RAM, what
> other reasons may be the cause for bo (blocks out)?
>
> i see no point pasting the whole procedure here, since it's very long. the
> general course of the procedure is:
> create temp-tables if they are not exist (practically, they do exist)
> do a lot of: insert into temp-table select from table
> and         : insert into temp-table select from table join temp-table....
> after finished insert into temp-table: analyze temp-table (this was the only
> way the optimizer behaved properly)
> finally, open refcursors of select from temp-tables

i/o writes from read queries can be caused by a couple of things:
*) sorts, and other 'spill to disk' features of large queries
*) hint bits (what I think is happening in your case):

the first time a tuple is touched after it's controlling transaction
is committed, the transaction's state (committed or aborted) is saved
on the tuple itself to optimize subsequent accesses. for most
workloads this is barely noticeable but it can show up if you're
moving a lot of records around per transaction.

merlin


From: Eyal Wilde <eyal(at)impactsoft(dot)co(dot)il>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: scale up (postgresql vs mssql)
Date: 2012-05-03 17:07:55
Message-ID: CAMiEbch3b1zTEASn5+geGXivb_BNPvZ1VnxQ_yLS-RParM35Hw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

guess what:

after reducing bo (blocks out) to ~10% by using a ramdisk (improving
overall performance by ~15-20%), i now managed to reduced it to ~3% by ....
removing the "analyze temp-table" statements.
it also :
reduced b (Process which are waiting for I/O) to zero
reduced wa (percentage of time spent by cpu for waiting to IO) to zero
and reduced id (cpu idle time percent) to be 4 times less.

r b swpd free buff cache si so bi bo in cs us sy id wa st
8 0 0 6144048 237472 485640 0 0 0 40 4380 3237 79 5 16 0 0
8 0 0 6138216 238032 485736 0 0 0 40 4741 3506 93 7 0 0 0
8 0 0 6125256 238276 486484 0 0 0 2709 4801 3447 92 7 1 0 0
7 0 0 6119400 238376 485792 0 0 0 32 4854 4311 93 6 1 0 0
5 0 0 6105624 238476 486172 0 0 0 364 4783 3430 92 7 1 0 0
5 0 0 6092956 238536 485384 0 0 0 416 4954 3652 91 8 2 0 0

unfortunately, this time there was no significant performance gain. ):

i afraid now there are certain statements that do not use an optimal
query-plan. these statements looks like:
insert into temp-table1 (value) select table1.f1 from table1 join
temp-table2 on table1.recid=temp-table2.recid where table1.f2 in (x,y,z);
temp-table2 never contains more then 10 records.
there is an index on table1: recid,f2
previous tests showed that the query-optimizer normally chose to do
hash-join (i.e: ignoring the index), but once we did "analyze
temp-table2;", the index was used. i read somewhere that the optimizer's
assumption is that every temp-table contains 1k of records. i believe that
is the reason for the bad plan. we tried to do "set
enable_hashjoin=false;", but it did not seem to be working inside a
function (although it did work independently). what can we do about that?

another thing i found is that a sequence on a temp-table is being stored on
the current tablespace, and not on the temp_tablespace. would you consider
this as a bug?
anyway, i found a way to not using any sequences on the temp-tables. but
this did not change the bo (blocks-out) figures.

merlin,
about the Hint Bits. i read this article:
http://wiki.postgresql.org/wiki/Hint_Bits
as far as i understand, this is not the case here, because i ran the test
many times, and there were no DML operations at all in between. so i
believe that the hint-bits are already cleared in most of the tuples.

Thanks again for any more help.


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Eyal Wilde <eyal(at)impactsoft(dot)co(dot)il>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: scale up (postgresql vs mssql)
Date: 2012-05-04 13:04:09
Message-ID: CAHyXU0xNx4+kj8KC25PhohHMAuFXr6eQ_ADkzJCpUDqsuFm8YA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, May 3, 2012 at 12:07 PM, Eyal Wilde <eyal(at)impactsoft(dot)co(dot)il> wrote:
> guess what:
>
> after reducing bo (blocks out) to ~10% by using a ramdisk (improving overall
> performance by ~15-20%), i now managed to reduced it to ~3% by .... removing
> the "analyze temp-table" statements.
> it also :
> reduced b (Process which are waiting for I/O) to zero
> reduced wa (percentage of time spent by cpu for waiting to IO) to zero
> and reduced id (cpu idle time percent) to be 4 times less.
>
> r b swpd free buff cache si so bi bo in cs us sy id wa st
> 8 0 0 6144048 237472 485640 0 0 0 40 4380 3237 79 5 16 0 0
> 8 0 0 6138216 238032 485736 0 0 0 40 4741 3506 93 7 0 0 0
> 8 0 0 6125256 238276 486484 0 0 0 2709 4801 3447 92 7 1 0 0
> 7 0 0 6119400 238376 485792 0 0 0 32 4854 4311 93 6 1 0 0
> 5 0 0 6105624 238476 486172 0 0 0 364 4783 3430 92 7 1 0 0
> 5 0 0 6092956 238536 485384 0 0 0 416 4954 3652 91 8 2 0 0
>
>
> unfortunately, this time there was no significant performance gain. ):
>
> i afraid now there are certain statements that do not use an optimal
> query-plan. these statements looks like:
> insert into temp-table1 (value) select table1.f1 from table1 join
> temp-table2 on table1.recid=temp-table2.recid where table1.f2 in (x,y,z);
> temp-table2 never contains more then 10 records.
> there is an index on table1: recid,f2
> previous tests showed that the query-optimizer normally chose to do
> hash-join (i.e: ignoring the index), but once we did "analyze temp-table2;",
> the index was used. i read somewhere that the optimizer's assumption is that
> every temp-table contains 1k of records. i believe that is the reason for
> the bad plan. we tried to do "set enable_hashjoin=false;", but it did not
> seem to be working inside a function (although it did work independently).
> what can we do about that?

let's see the query plan...when you turned it off, did it go faster?
put your suspicious plans here: http://explain.depesz.com/

> another thing i found is that a sequence on a temp-table is being stored on
> the current tablespace, and not on the temp_tablespace. would you consider
> this as a bug?
> anyway, i found a way to not using any sequences on the temp-tables. but
> this did not change the bo (blocks-out) figures.
>
> merlin,
> about the Hint Bits. i read this
> article: http://wiki.postgresql.org/wiki/Hint_Bits
> as far as i understand, this is not the case here, because i ran the test
> many times, and there were no DML operations at all in between. so i believe
> that the hint-bits are already cleared in most of the tuples.

yeah. well, your query was an insert? that would naturally result in
blocks out.

merlin


From: Robert Klemme <shortcutter(at)googlemail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Eyal Wilde <eyal(at)impactsoft(dot)co(dot)il>, pgsql-performance(at)postgresql(dot)org
Subject: Re: scale up (postgresql vs mssql)
Date: 2012-05-09 07:11:42
Message-ID: CAM9pMnNxRQC73wmBKOwSgyBH1X-p0Z1n=hAxmLmT1owN18TZOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Fri, May 4, 2012 at 3:04 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> let's see the query plan...when you turned it off, did it go faster?
> put your suspicious plans here: http://explain.depesz.com/

I suggest to post three plans:

1. insert into temp table
2. access to temp table before analyze
3. access to temp table after analyze

Maybe getting rid of the temp table (e.g. using a view or even an
inline view) is even better than optimizing temp table access.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Klemme <shortcutter(at)googlemail(dot)com>
Cc: Eyal Wilde <eyal(at)impactsoft(dot)co(dot)il>, pgsql-performance(at)postgresql(dot)org
Subject: Re: scale up (postgresql vs mssql)
Date: 2012-05-09 12:53:01
Message-ID: CAHyXU0wHndzyvo+WmLdu0V6Xhj5cCdVCumf3qZShxHYh5RFFxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, May 9, 2012 at 2:11 AM, Robert Klemme
<shortcutter(at)googlemail(dot)com> wrote:
> On Fri, May 4, 2012 at 3:04 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>
>> let's see the query plan...when you turned it off, did it go faster?
>> put your suspicious plans here: http://explain.depesz.com/
>
> I suggest to post three plans:
>
> 1. insert into temp table
> 2. access to temp table before analyze
> 3. access to temp table after analyze
>
> Maybe getting rid of the temp table (e.g. using a view or even an
> inline view) is even better than optimizing temp table access.

yeah -- perhaps a CTE might work as well.

merlin