PostgreSQL block size vs. LVM2 stripe width

Lists: pgsql-hackers
From: markw(at)osdl(dot)org
To: pgsql-hackers(at)postgresql(dot)org
Cc: linux-lvm(at)redhat(dot)com, linux-ia64(at)vger(dot)kernel(dot)org
Subject: PostgreSQL block size vs. LVM2 stripe width
Date: 2004-03-26 22:00:01
Message-ID: 200403262200.i2QM04223972@mail.osdl.org

I have some results from DBT-2 testing PostgreSQL with different block
sizes against different LVM stripe widths on Linux. I've found that
iostat appears to report more erratic numbers as the block size of the
database increases, but I can't see any reason for it.

I have pg_xlog on a separate set of drives from the rest of the database,
and was wondering whether using different block sizes for the log and the
data has been discussed.

Or does anyone have any tips for an optimal combination of settings?

Here's a summary from an Itanium2 system, where bigger is better:

            Linux-2.6.3, LVM2 Stripe Width (going across)
PostgreSQL
BLCKSZ
(going down)   16 KB   32 KB   64 KB  128 KB  256 KB  512 KB
 2 KB           2617    2656    2652    2664    2667    2642
 4 KB           4393    4486    4577    4557    4511    4448
 8 KB           4337    4423    4471    4576    4111    3642
16 KB           4412    4495    4532    4536    2985    2312
32 KB           3705    3784    3886    3925    2936    2362

Links to more data:
http://developer.osdl.org/markw/lvm2/blocks.html

Mark


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: markw(at)osdl(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org, linux-lvm(at)redhat(dot)com, linux-ia64(at)vger(dot)kernel(dot)org
Subject: Re: PostgreSQL block size vs. LVM2 stripe width
Date: 2004-03-27 22:03:09
Message-ID: 06sb60hni6h5cok941gqnuo4ld1b0v1rgi@email.aon.at

Mark,

how often did you run your tests? Are the results reproducible?

On Fri, 26 Mar 2004 14:00:01 -0800 (PST), markw(at)osdl(dot)org wrote:
>             Linux-2.6.3, LVM2 Stripe Width (going across)
> PostgreSQL
> BLCKSZ
> (going down)   16 KB   32 KB   64 KB  128 KB  256 KB  512 KB
>  2 KB           2617    2656    2652    2664    2667    2642
>  4 KB           4393    4486    4577    4557    4511    4448
>  8 KB           4337    4423    4471    4576    4111    3642
> 16 KB           4412    4495    4532    4536    2985    2312
> 32 KB           3705    3784    3886    3925    2936    2362

Unless someone can present at least a rough theory of why a BLCKSZ of
8 KB is at a local minimum (1 or 2% below the neighbouring values) for
stripe widths up to 64 KB, I'm not sure whether we can trust these
numbers.
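Manfred's local-minimum observation can be checked mechanically against Mark's table. The Python sketch below is illustrative only: the figures are copied from the table, and the helper name `local_minima` is hypothetical. For each stripe width it reports whether 8 KB BLCKSZ scores below both 4 KB and 16 KB, and by what percentage.

```python
# Throughput figures from Mark's DBT-2 table (bigger is better).
# Keys: PostgreSQL BLCKSZ in KB; list positions: LVM2 stripe width in KB.
STRIPE_KB = [16, 32, 64, 128, 256, 512]
RESULTS = {
    2:  [2617, 2656, 2652, 2664, 2667, 2642],
    4:  [4393, 4486, 4577, 4557, 4511, 4448],
    8:  [4337, 4423, 4471, 4576, 4111, 3642],
    16: [4412, 4495, 4532, 4536, 2985, 2312],
    32: [3705, 3784, 3886, 3925, 2936, 2362],
}


def local_minima(results, stripes, blcksz=8, lower=4, upper=16):
    """Return (stripe_kb, pct_below_lower, pct_below_upper) for every stripe
    width where `blcksz` scores below both neighbouring block sizes."""
    found = []
    for i, stripe in enumerate(stripes):
        mid = results[blcksz][i]
        lo_val = results[lower][i]
        hi_val = results[upper][i]
        if mid < lo_val and mid < hi_val:
            found.append((
                stripe,
                round(100.0 * (lo_val - mid) / lo_val, 2),
                round(100.0 * (hi_val - mid) / hi_val, 2),
            ))
    return found


for stripe, vs4, vs16 in local_minima(RESULTS, STRIPE_KB):
    print(f"{stripe:>3} KB stripe: 8 KB is {vs4}% below 4 KB, {vs16}% below 16 KB")
```

Run as-is, it reports the 16, 32 and 64 KB stripe widths, with 8 KB between roughly 1.3% and 2.3% below its neighbours, so the dip is present in the table but small.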

Before I hit the send button, I did a quick check of the link you
provided. The links in the table contain the following test numbers:

               16 KB   32 KB   64 KB  128 KB  256 KB  512 KB
 2 KB             72      71      70      69      66      65
 4 KB             64      63      62      61      60      58
 8 KB             54      53      52      51      50      49
16 KB             79      78      77      76      75      74
32 KB             86      85      84      83      82      80

Does this mean that you first ran all tests with 8 KB, then with 4, 2, 16
and 32 KB BLCKSZ? If so, I suspect that you are measuring the effects
of something different.
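The run order Manfred reads off the test numbers can be reproduced the same way. A minimal Python sketch, assuming that test numbers were assigned in increasing order over time (the helper names are hypothetical; the numbers are copied from the table above):

```python
# Test run numbers taken from the links in Mark's results table.
# Keys: BLCKSZ in KB; list positions: LVM2 stripe width in KB.
STRIPE_KB = [16, 32, 64, 128, 256, 512]
TEST_IDS = {
    2:  [72, 71, 70, 69, 66, 65],
    4:  [64, 63, 62, 61, 60, 58],
    8:  [54, 53, 52, 51, 50, 49],
    16: [79, 78, 77, 76, 75, 74],
    32: [86, 85, 84, 83, 82, 80],
}


def blcksz_run_order(test_ids):
    """Order the BLCKSZ values by their earliest test number, i.e. by when
    each block size was first tested (assumes numbers increase over time)."""
    return sorted(test_ids, key=lambda blcksz: min(test_ids[blcksz]))


def stripe_run_order(test_ids, stripes, blcksz):
    """Order the stripe widths by test number within a single BLCKSZ row."""
    ids = test_ids[blcksz]
    return [stripe for _, stripe in sorted(zip(ids, stripes))]


print("BLCKSZ order (KB):", blcksz_run_order(TEST_IDS))
print("Stripe order for 8 KB (KB):", stripe_run_order(TEST_IDS, STRIPE_KB, 8))
```

Under that assumption it prints the order Manfred inferred (8, 4, 2, 16, 32 KB). It also shows that within the 8 KB row the stripe widths were run from 512 KB down to 16 KB, and the same descending pattern holds in every row of the table.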

Servus
Manfred


From: markw(at)osdl(dot)org
To: mkoi-pg(at)aon(dot)at
Cc: pgsql-hackers(at)postgresql(dot)org, linux-lvm(at)redhat(dot)com, linux-ia64(at)vger(dot)kernel(dot)org
Subject: Re: PostgreSQL block size vs. LVM2 stripe width
Date: 2004-03-29 16:50:42
Message-ID: 200403291650.i2TGop212446@mail.osdl.org

Hi Manfred,

On 27 Mar, Manfred Koizar wrote:
> Mark,
>
> how often did you run your tests? Are the results reproducible?

In this case, I've only done one run per combination. I've found the
results for this test to be reproducible.

> On Fri, 26 Mar 2004 14:00:01 -0800 (PST), markw(at)osdl(dot)org wrote:
>>             Linux-2.6.3, LVM2 Stripe Width (going across)
>> PostgreSQL
>> BLCKSZ
>> (going down)   16 KB   32 KB   64 KB  128 KB  256 KB  512 KB
>>  2 KB           2617    2656    2652    2664    2667    2642
>>  4 KB           4393    4486    4577    4557    4511    4448
>>  8 KB           4337    4423    4471    4576    4111    3642
>> 16 KB           4412    4495    4532    4536    2985    2312
>> 32 KB           3705    3784    3886    3925    2936    2362
>
> Unless someone can present at least a rough theory of why a BLCKSZ of
> 8 KB is at a local minimum (1 or 2% below the neighbouring values) for
> stripe widths up to 64 KB, I'm not sure whether we can trust these
> numbers.
>
> Before I hit the send button, I did a quick check of the link you
> provided. The links in the table contain the following test numbers:
>
>                16 KB   32 KB   64 KB  128 KB  256 KB  512 KB
>  2 KB             72      71      70      69      66      65
>  4 KB             64      63      62      61      60      58
>  8 KB             54      53      52      51      50      49
> 16 KB             79      78      77      76      75      74
> 32 KB             86      85      84      83      82      80
>
> Does this mean that you first ran all tests with 8 KB, then with 4, 2, 16
> and 32 KB BLCKSZ? If so, I suspect that you are measuring the effects
> of something different.

Yes, that's correct, but why do you suspect that?

Mark


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: markw(at)osdl(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org, linux-lvm(at)redhat(dot)com, linux-ia64(at)vger(dot)kernel(dot)org
Subject: Re: PostgreSQL block size vs. LVM2 stripe width
Date: 2004-03-29 22:42:54
Message-ID: 5r5h60pus29i6hf4eftka319gmqlrd6ies@email.aon.at

On Mon, 29 Mar 2004 08:50:42 -0800 (PST), markw(at)osdl(dot)org wrote:
>In this case, I've only done one run per combination. I've found the
>results for this test to be reproducible.

Pardon?

>>>             Linux-2.6.3, LVM2 Stripe Width
>>> BLCKSZ
>>> (going down)   16 KB   32 KB   64 KB  128 KB  256 KB  512 KB
>>>  2 KB           2617    2656    2652    2664    2667    2642
>>>  4 KB           4393    4486    4577    4557    4511    4448
>>>  8 KB           4337    4423    4471    4576    4111    3642
>>> 16 KB           4412    4495    4532    4536    2985    2312
>>> 32 KB           3705    3784    3886    3925    2936    2362

>> Does this mean that you first ran all tests with 8 KB, then with 4, 2, 16
>> and 32 KB BLCKSZ? If so, I suspect that you are measuring the effects
>> of something different.
>
>Yes, that's correct, but why do you suspect that?

Gut feelings, hard to put into words. Let me try:

Nobody really knows what the "optimal" BLCKSZ is. Most probably it
depends on the application, OS, hardware, and other factors. 8 KB is
believed to be a good general-purpose BLCKSZ.

I wouldn't be surprised if 8 KB turns out to be suboptimal in some
cases (or even in most cases). But if so, I would expect it to be
either too small or too large.

In your tests, however, there are three configurations (the 16, 32 and
64 KB stripe widths) where 8 KB is slower than both 4 KB and 16 KB.
Absent any explanation for this interesting effect, it is easier to
mistrust your numbers.

If you run your tests in the opposite order, on the same hardware, in
the same freshly formatted partitions, and you get the same results,
that would be an argument in favour of their accuracy.

Maybe we'll find out that those 1.5% are just noise.
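Manfred's cross-check amounts to running the same grid backwards. A minimal Python sketch, assuming the original order inferred from the test numbers (BLCKSZ 8, 4, 2, 16, 32 KB, and within each block size the stripe widths from 512 KB down to 16 KB); the `schedule` helper and its `passes` parameter, which repeats the whole grid, are hypothetical:

```python
# Original order, as inferred from the test numbers in Mark's results.
ORIGINAL_BLCKSZ_ORDER = [8, 4, 2, 16, 32]             # KB
ORIGINAL_STRIPE_ORDER = [512, 256, 128, 64, 32, 16]   # KB


def schedule(blcksz_order, stripe_order, passes=1):
    """Return (blcksz_kb, stripe_kb, pass_no) tuples: each pass covers the
    full grid, running every stripe width for one block size before moving
    on to the next block size."""
    return [
        (blcksz, stripe, pass_no)
        for pass_no in range(1, passes + 1)
        for blcksz in blcksz_order
        for stripe in stripe_order
    ]


original = schedule(ORIGINAL_BLCKSZ_ORDER, ORIGINAL_STRIPE_ORDER)
# Manfred's suggestion: the same grid in the opposite order.
opposite = schedule(ORIGINAL_BLCKSZ_ORDER[::-1], ORIGINAL_STRIPE_ORDER[::-1])

for blcksz, stripe, pass_no in opposite[:3]:
    print(f"pass {pass_no}: BLCKSZ {blcksz} KB, stripe width {stripe} KB")
```

For a single pass, `opposite` is exactly `original` reversed. Each configuration would still need a freshly formatted partition, as Manfred notes; the sketch only produces the order.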

Servus
Manfred


From: markw(at)osdl(dot)org
To: mkoi-pg(at)aon(dot)at
Cc: pgsql-hackers(at)postgresql(dot)org, linux-lvm(at)redhat(dot)com, linux-ia64(at)vger(dot)kernel(dot)org
Subject: Re: PostgreSQL block size vs. LVM2 stripe width
Date: 2004-03-29 22:52:35
Message-ID: 200403292252.i2TMqi222698@mail.osdl.org

On 30 Mar, Manfred Koizar wrote:
> On Mon, 29 Mar 2004 08:50:42 -0800 (PST), markw(at)osdl(dot)org wrote:
>>In this case, I've only done one run per combination. I've found the
>>results for this test to be reproducible.
>
> Pardon?

I haven't repeated any runs for any combination, e.g. one test with a
16 KB LVM stripe width and 2 KB BLCKSZ, one test with a 16 KB LVM stripe
width and 4 KB BLCKSZ...

>>>>             Linux-2.6.3, LVM2 Stripe Width
>>>> BLCKSZ
>>>> (going down)   16 KB   32 KB   64 KB  128 KB  256 KB  512 KB
>>>>  2 KB           2617    2656    2652    2664    2667    2642
>>>>  4 KB           4393    4486    4577    4557    4511    4448
>>>>  8 KB           4337    4423    4471    4576    4111    3642
>>>> 16 KB           4412    4495    4532    4536    2985    2312
>>>> 32 KB           3705    3784    3886    3925    2936    2362
>
>>> Does this mean that you first ran all tests with 8 KB, then with 4, 2, 16
>>> and 32 KB BLCKSZ? If so, I suspect that you are measuring the effects
>>> of something different.
>>
>>Yes, that's correct, but why do you suspect that?
>
> Gut feelings, hard to put into words. Let me try:
>
> Nobody really knows what the "optimal" BLCKSZ is. Most probably it
> depends on the application, OS, hardware, and other factors. 8 KB is
> believed to be a good general-purpose BLCKSZ.
>
> I wouldn't be surprised if 8 KB turns out to be suboptimal in some
> cases (or even in most cases). But if so, I would expect it to be
> either too small or too large.
>
> In your tests, however, there are three configurations (the 16, 32 and
> 64 KB stripe widths) where 8 KB is slower than both 4 KB and 16 KB.
> Absent any explanation for this interesting effect, it is easier to
> mistrust your numbers.
>
> If you run your tests in the opposite order, on the same hardware, in
> the same freshly formatted partitions, and you get the same results,
> that would be an argument in favour of their accuracy.
>
> Maybe we'll find out that those 1.5% are just noise.

I did reformat each partition between tests. :) When I have tested for
repeatability in the past, I have found results to fluctuate by up to
5%, so I would consider the 1.5% to be noise.
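Mark's argument compares a difference between two single runs against the run-to-run fluctuation he has observed before. A minimal Python sketch, assuming that "fluctuate by up to 5%" means a spread of up to 5% relative to the better run (the helper names are hypothetical; the figures come from the results table):

```python
def relative_spread_pct(runs):
    """Spread of repeated runs of one configuration, as a percentage of the
    best run: 100 * (max - min) / max."""
    best, worst = max(runs), min(runs)
    return 100.0 * (best - worst) / best


def within_noise(a, b, noise_pct=5.0):
    """True if two single-run results differ by less than the run-to-run
    fluctuation `noise_pct` (defaulting to Mark's figure of 5%)."""
    return relative_spread_pct([a, b]) < noise_pct


# Largest 8 KB dip: 4471 (8 KB) vs 4577 (4 KB) at a 64 KB stripe width.
print(within_noise(4471, 4577))  # True (about 2.3% apart)
# Drop at a 512 KB stripe width: 3642 (8 KB) vs 4448 (4 KB).
print(within_noise(3642, 4448))  # False (about 18% apart)
```

By this yardstick the 8 KB dip falls inside the noise band, while the drop at the 512 KB stripe width is far outside it. With repeated runs of one configuration, `relative_spread_pct` would measure the fluctuation instead of assuming it.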

Mark