tuplestore potential performance problem

Lists: pgsql-hackers
From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: tuplestore potential performance problem
Date: 2008-12-03 14:15:50
Message-ID: e08cc0400812030615rd5c6462we46ac362628c21c6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

While attacking this issue(*1), I found that tuplestore that is on the
file status has potential performance problem.

The performance problem introduced by Heikki's new approach was caused
by BufFile's frequent flush out in such cases like you put a new row
into it and read middle row of it then put another row again, and so
on. When tuplestore switches its internal mode from TSS_WRITEFILE to
TSS_READFILE, underlying BufFile seeks to read pointer and flushes out
its dirty buffer if the reading pointer is not near the writing
pointer. Also, reading to writing switch avoids OS disk cache benefit.

This is not critical in TSS_INMEM.

So I decided to keep writing until finish if the tuplestore gets in
file mode from memory mode rather than switching reading and writing
randomly, which recovers the earlier performance almost. I am not sure
but am afraid that the nodeCtescan also uses similar logic. Doesn't
CTE have any problem for large data set?

Regards,

*1:http://archives.postgresql.org/pgsql-hackers/2008-12/msg00077.php

--
Hitoshi Harada


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tuplestore potential performance problem
Date: 2008-12-03 14:30:55
Message-ID: 7757.1228314655@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> writes:
> While attacking this issue(*1), I found that tuplestore that is on the
> file status has potential performance problem.

> The performance problem introduced by Heikki's new approach was caused
> by BufFile's frequent flush out in such cases like you put a new row
> into it and read middle row of it then put another row again, and so
> on. When tuplestore switches its internal mode from TSS_WRITEFILE to
> TSS_READFILE, underlying BufFile seeks to read pointer and flushes out
> its dirty buffer if the reading pointer is not near the writing
> pointer. Also, reading to writing switch avoids OS disk cache benefit.

> This is not critical in TSS_INMEM.

> So I decided to keep writing until finish if the tuplestore gets in
> file mode from memory mode rather than switching reading and writing
> randomly, which recovers the earlier performance almost. I am not sure
> but am afraid that the nodeCtescan also uses similar logic. Doesn't
> CTE have any problem for large data set?

If this means a lot of contortion/complication in the upper-level code,
seems like it'd be better to address the performance issue within
tuplestore/buffile. We could keep separate buffers for write and read
perhaps. But do you have real evidence of a performance problem?
I'd sort of expect the kernel's disk cache to mitigate this pretty well.

regards, tom lane


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tuplestore potential performance problem
Date: 2008-12-03 14:42:56
Message-ID: e08cc0400812030642m506eca24mb6d9f87304e2f378@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/12/3 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> If this means a lot of contortion/complication in the upper-level code,
> seems like it'd be better to address the performance issue within
> tuplestore/buffile. We could keep separate buffers for write and read
> perhaps. But do you have real evidence of a performance problem?
> I'd sort of expect the kernel's disk cache to mitigate this pretty well.
>
> regards, tom lane
>
I don't have real evidence but reasoned it. No strace was done. So it
may not be cased by flushing out but this commit gets performance
quite better, to earlier patch performance, around 44sec from around
76sec.

http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=87d9b8ac5dca9fae5f3ac4f3218d8fb4eca8b5b0;hp=f1976a9d002b20006ac31ca85db27abcf56e9ea2

where pos = -1 means spool all rows until the end.

The "earlier" approach was buffering all the table and the newer
Heikki's approach was buffer on row by row while reading. The newest
is buffering row by row while reading during in memory, and holding
all the remaining tuples before reading after out to file, something
like hybrid method.

Regards,

--
Hitoshi Harada


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tuplestore potential performance problem
Date: 2008-12-03 14:49:16
Message-ID: e08cc0400812030649q37e913b8m6148e7099c5031a8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I don't have real evidence but reasoned it. No strace was done. So it
> may not be cased by flushing out but this commit gets performance
> quite better, to earlier patch performance, around 44sec from around
> 76sec.
>

Oh, I mean, 116sec to 44sec.

--
Hitoshi Harada


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tuplestore potential performance problem
Date: 2009-01-15 01:32:16
Message-ID: 200901150132.n0F1WGe10799@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Has this been addressed?

---------------------------------------------------------------------------

Hitoshi Harada wrote:
> 2008/12/3 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> > If this means a lot of contortion/complication in the upper-level code,
> > seems like it'd be better to address the performance issue within
> > tuplestore/buffile. We could keep separate buffers for write and read
> > perhaps. But do you have real evidence of a performance problem?
> > I'd sort of expect the kernel's disk cache to mitigate this pretty well.
> >
> > regards, tom lane
> >
> I don't have real evidence but reasoned it. No strace was done. So it
> may not be cased by flushing out but this commit gets performance
> quite better, to earlier patch performance, around 44sec from around
> 76sec.
>
> http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=87d9b8ac5dca9fae5f3ac4f3218d8fb4eca8b5b0;hp=f1976a9d002b20006ac31ca85db27abcf56e9ea2
>
> where pos = -1 means spool all rows until the end.
>
> The "earlier" approach was buffering all the table and the newer
> Heikki's approach was buffer on row by row while reading. The newest
> is buffering row by row while reading during in memory, and holding
> all the remaining tuples before reading after out to file, something
> like hybrid method.
>
> Regards,
>
> --
> Hitoshi Harada
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tuplestore potential performance problem
Date: 2009-01-15 05:29:08
Message-ID: e08cc0400901142129k283f37b1q63e7d8eb4a31aeb1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/1/15 Bruce Momjian <bruce(at)momjian(dot)us>:
>
> Has this been addressed?

It is mentioned at

http://archives.postgresql.org/pgsql-hackers/2008-12/msg01849.php

* Look at tuplestore performance issues. The tuplestore_in_memory()
thing is just a band-aid, we ought to try to solve it properly.
tuplestore_advance seems like a weak spot as well.

but not solved yet. It seems to me that to solve this the tuplestore's
inside design should be changed much. In-file state doesn't use memory
any more but it should be re-used for writing buffer, whereas the
current desgin uses BufFile to do it, which causes tell/seek overhead
for repeated put/get operation. And this is not for 8.4, I guess.

Regards,

--
Hitoshi Harada


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tuplestore potential performance problem
Date: 2009-01-15 14:51:00
Message-ID: 5413.1232031060@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> writes:
> 2009/1/15 Bruce Momjian <bruce(at)momjian(dot)us>:
>> Has this been addressed?

> It is mentioned at
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg01849.php
> but not solved yet. It seems to me that to solve this the tuplestore's
> inside design should be changed much. In-file state doesn't use memory
> any more but it should be re-used for writing buffer, whereas the
> current desgin uses BufFile to do it, which causes tell/seek overhead
> for repeated put/get operation. And this is not for 8.4, I guess.

I was thinking of something a lot simpler, like just allowing BufFile
to maintain separate buffers for read and write pointers. But in any
case it's not likely to get done for 8.4.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tuplestore potential performance problem
Date: 2009-01-15 15:36:29
Message-ID: 200901151536.n0FFaTQ27055@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hitoshi Harada wrote:
> 2009/1/15 Bruce Momjian <bruce(at)momjian(dot)us>:
> >
> > Has this been addressed?
>
> It is mentioned at
>
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg01849.php
>
> * Look at tuplestore performance issues. The tuplestore_in_memory()
> thing is just a band-aid, we ought to try to solve it properly.
> tuplestore_advance seems like a weak spot as well.
>
> but not solved yet. It seems to me that to solve this the tuplestore's
> inside design should be changed much. In-file state doesn't use memory
> any more but it should be re-used for writing buffer, whereas the
> current desgin uses BufFile to do it, which causes tell/seek overhead
> for repeated put/get operation. And this is not for 8.4, I guess.

Oh, I see it is already on the TODO list; great. I have added
reference URL to the TODO.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +