RAM-only temporary tables

Lists: pgsql-hackers
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: RAM-only temporary tables
Date: 2008-11-05 23:00:56
Message-ID: 4911D148.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

An idea for a possible enhancement to PostgreSQL: allow creation of a
temporary table without generating any disk I/O. (Creating and
dropping a three-column temporary table within a database transaction
currently generates about 150 disk writes).

If some circumstances don't make it feasible to always do this as a
RAM-only operation, perhaps a clause could be added to the CREATE
TEMPORARY TABLE syntax to specify this behavior along with whatever
limitations on the temporary table are required for this to work.
(For example, maybe this is only feasible if the table will be dropped
by the end of the transaction?)

-Kevin


From: "Alex Hunsaker" <badalex(at)gmail(dot)com>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-05 23:57:07
Message-ID: 34d269d40811051557na24a796m48dc367ebcc71973@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 5, 2008 at 16:00, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> An idea for a possible enhancement to PostgreSQL: allow creation of a
> temporary table without generating any disk I/O. (Creating and
> dropping a three-column temporary table within a database transaction
> currently generates about 150 disk writes).

Well arguably you can already do this...

create tablespace ram location '/dev/shm/';
create temp table (junk int) tablespace ram;

In theory because only one backend needs the temp table we should be
able to use mmap() to fake it being a file as well without to much
trouble. But im sure I am missing something...


From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-06 00:06:35
Message-ID: 4912350B.5030906@zeut.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kevin Grittner wrote:
> An idea for a possible enhancement to PostgreSQL: allow creation of a
> temporary table without generating any disk I/O. (Creating and
> dropping a three-column temporary table within a database transaction
> currently generates about 150 disk writes).
>
> If some circumstances don't make it feasible to always do this as a
> RAM-only operation, perhaps a clause could be added to the CREATE
> TEMPORARY TABLE syntax to specify this behavior along with whatever
> limitations on the temporary table are required for this to work.
> (For example, maybe this is only feasible if the table will be dropped
> by the end of the transaction?)

As someone else already pointed out you can put temp tables on a RAM
disk, but the larger issue is that temp tables still cause system table
churn which will always need to be on stable media.


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-06 01:42:30
Message-ID: 20081106014230.GX4114@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kevin Grittner wrote:
> An idea for a possible enhancement to PostgreSQL: allow creation of a
> temporary table without generating any disk I/O. (Creating and
> dropping a three-column temporary table within a database transaction
> currently generates about 150 disk writes).

Most of these are catalog updates. A trace of WAL logs including only
heap inserts says that to create a temp table with 3 columns (2 int, 1
text) and no indexes there are this many inserts:

3 1247 (pg_type)
20 1249 (pg_attribute)
3 1259 (pg_class)
7 2608 (pg_depend)
1 2610 (pg_index)

Note the excess of pg_attribute entries! There are 3 in the table, 3 in
the toast table, and then there are 14 extra attrs which are for system
columns (7 for the main table, 7 for the toast table). Just getting rid
of pg_attribute entries for those would probably prove to be an
importante gain. (Don't forget the index updates for each of those heap
inserts; for pg_type it's 2 btree inserts for each index insert.) If
you do this, you've shaved 42 of those 150 writes.

Perhaps another gain is getting rid of array types for temp tables; a
new feature of 8.3. Who wants those anyway? That also removes one or
two of the pg_depend entries. Maybe these extra array types are the
additional 8.3 overhead.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RAM-only temporary tables
Date: 2008-11-06 15:45:55
Message-ID: 4912BCD3.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>> Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> Kevin Grittner wrote:
>> An idea for a possible enhancement to PostgreSQL: allow creation of
a
>> temporary table without generating any disk I/O. (Creating and
>> dropping a three-column temporary table within a database
transaction
>> currently generates about 150 disk writes).
>
> Most of these are catalog updates. A trace of WAL logs including
only
> heap inserts says that to create a temp table with 3 columns (2 int,
1
> text) and no indexes there are this many inserts:
>
> 3 1247 (pg_type)
> 20 1249 (pg_attribute)
> 3 1259 (pg_class)
> 7 2608 (pg_depend)
> 1 2610 (pg_index)
>
> Note the excess of pg_attribute entries! There are 3 in the table, 3
in
> the toast table, and then there are 14 extra attrs which are for
system
> columns (7 for the main table, 7 for the toast table). Just getting
rid
> of pg_attribute entries for those would probably prove to be an
> importante gain. (Don't forget the index updates for each of those
heap
> inserts; for pg_type it's 2 btree inserts for each index insert.)
If
> you do this, you've shaved 42 of those 150 writes.

Note that the 150 disk writes were for the CREATE and the DROP. Does
that mean that we'd actually shave 84 of 150 writes?

Also, if you're looking to account for all the writes, it's worth
noting that my test declared a one-column primary key (on an integer
column) in the CREATE TEMPORARY TABLE statement.

In suggesting this enhancement, my hope is that each session could
check for a referenced table as a temporary in RAM before going to the
system tables, in a manner vaguely similar to how space reserved by
the temp_buffers GUC is used for temp table data. I, of course, am
suggesting this from a position of blissful ignorance of the actual
complexity of making such a change.

-Kevin


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-06 15:53:10
Message-ID: 20081106155310.GE5520@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kevin Grittner wrote:

> Note that the 150 disk writes were for the CREATE and the DROP. Does
> that mean that we'd actually shave 84 of 150 writes?

Hmm, you'd shave more than 42 but not 84, because index entries are not
deleted until a later vacuum. (I'd say about 56 -- 42 plus the 14 heap
deletions).

> Also, if you're looking to account for all the writes, it's worth
> noting that my test declared a one-column primary key (on an integer
> column) in the CREATE TEMPORARY TABLE statement.

That probably makes up for the extra few writes that I didn't see in my
quick test.

> In suggesting this enhancement, my hope is that each session could
> check for a referenced table as a temporary in RAM before going to the
> system tables, in a manner vaguely similar to how space reserved by
> the temp_buffers GUC is used for temp table data. I, of course, am
> suggesting this from a position of blissful ignorance of the actual
> complexity of making such a change.

Right -- I don't expect we can make use of such an idea readily. Not
creating unnecessary pg_attribute entries for system columns is probably
a lot easier to do. The idea of uncatalogued temp tables has been
suggested and rejected several times in the past.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RAM-only temporary tables
Date: 2008-11-06 16:13:16
Message-ID: 4912C33C.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>> Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> Kevin Grittner wrote:
>
>> Note that the 150 disk writes were for the CREATE and the DROP.
Does
>> that mean that we'd actually shave 84 of 150 writes?
>
> Hmm, you'd shave more than 42 but not 84, because index entries are
not
> deleted until a later vacuum. (I'd say about 56 -- 42 plus the 14
heap
> deletions).
>
>> Also, if you're looking to account for all the writes, it's worth
>> noting that my test declared a one-column primary key (on an
integer
>> column) in the CREATE TEMPORARY TABLE statement.
>
> That probably makes up for the extra few writes that I didn't see in
my
> quick test.

It sounds like you were counting the 8kB pages pushed from the
PostgreSQL cache to the OS cache, and I was counting the 1kB blocks
pushed from the OS cache to the RAID controller cache. By watching
vmstat results after pushing this to a more-or-less steady state, I
was probably picking up the results of autovacuum runs, but multiple
writes to a single page were often combined by the OS. If we match,
it's really just a coincidence.

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-06 16:17:47
Message-ID: 27329.1225988267@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Note that the 150 disk writes were for the CREATE and the DROP. Does
> that mean that we'd actually shave 84 of 150 writes?

It really shouldn't be the case that each system catalog tuple insertion
generates a separate write --- especially not for multiple insertions
into the same catalog, which we could expect to go into the same page or
few pages.

I think a large fraction of the writes you're measuring are coming from
the file create/unlink operations. It would certainly be important to
identify where the bulk of the cost *really* is before we start
expending effort on a solution.

> In suggesting this enhancement, my hope is that each session could
> check for a referenced table as a temporary in RAM before going to the
> system tables, in a manner vaguely similar to how space reserved by
> the temp_buffers GUC is used for temp table data.

This isn't very workable. For one thing, client-side operations such as
psql's \dt still need to see catalog entries for temp tables.

There's been some handwaving about keeping catalog entries for temp
tables in temp children of the main system catalogs, but it hasn't got
past the handwaving stage.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-06 16:22:08
Message-ID: 20081106162208.GH5520@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kevin Grittner wrote:

> It sounds like you were counting the 8kB pages pushed from the
> PostgreSQL cache to the OS cache, and I was counting the 1kB blocks
> pushed from the OS cache to the RAID controller cache. By watching
> vmstat results after pushing this to a more-or-less steady state, I
> was probably picking up the results of autovacuum runs, but multiple
> writes to a single page were often combined by the OS. If we match,
> it's really just a coincidence.

I was counting WAL inserts actually, so "logical database operations".
I left out "storage" items (filesystem actions), so it was mainly just
catalog changes.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-06 16:23:38
Message-ID: 27417.1225988618@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Right -- I don't expect we can make use of such an idea readily. Not
> creating unnecessary pg_attribute entries for system columns is probably
> a lot easier to do.

I seem to recall having proposed that in the past, and getting shot down
on the basis that clients might be depending on those pg_attribute
entries being there. I'm not sure how big a risk there really is ---
most of the code I've seen explicitly selects attnum > 0 --- but it's a
consideration.

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RAM-only temporary tables
Date: 2008-11-06 16:38:37
Message-ID: 4912C92D.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I think a large fraction of the writes you're measuring are coming
from
> the file create/unlink operations. It would certainly be important
to
> identify where the bulk of the cost *really* is before we start
> expending effort on a solution.

Any ideas on a good way to gather that information?

Given the temp_buffers space, would it make sense to defer the
creation of the actual file until there is actually a need to spill
data to the disk?

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-06 17:14:07
Message-ID: 29286.1225991647@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I think a large fraction of the writes you're measuring are coming from
>> the file create/unlink operations. It would certainly be important to
>> identify where the bulk of the cost *really* is before we start
>> expending effort on a solution.

> Any ideas on a good way to gather that information?

I had done some preliminary trials using strace (you need to trace the
active backend, the bgwriter, and the wal writer process to be sure you
see everything going on). However it's difficult to tell how much
physical I/O results from the create or unlink syscalls. It might be
interesting to make a test program that just creates 4000 files and then
removes them again, and see what sort of load you see from that.

> Given the temp_buffers space, would it make sense to defer the
> creation of the actual file until there is actually a need to spill
> data to the disk?

No, because that opens us to problems with reuse of relfilenode numbers.

One place that I've always wanted to look at was suppressing the
creation of a btree metapage until there's some useful data in the
table. We managed to avoid creating a root page until there's data,
but at the time avoiding the metapage seemed too invasive. (Admittedly,
though, if one assumes that your real world case does involve putting
some data in the tables, this wouldn't actually save anything...)

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-06 17:16:20
Message-ID: 20081106171620.GM5520@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:

> One place that I've always wanted to look at was suppressing the
> creation of a btree metapage until there's some useful data in the
> table. We managed to avoid creating a root page until there's data,
> but at the time avoiding the metapage seemed too invasive. (Admittedly,
> though, if one assumes that your real world case does involve putting
> some data in the tables, this wouldn't actually save anything...)

Agreed on the parenthised comment -- it'd be just benchmark
optimization.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RAM-only temporary tables
Date: 2008-11-06 18:36:44
Message-ID: 4912E4DC.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> it's difficult to tell how much
> physical I/O results from the create or unlink syscalls. It might
be
> interesting to make a test program that just creates 4000 files

We use xfs with noatime for our databases.

In a fresh subdirectory of such a mountpoint:

for ((i=0 ; i < 4000 ; ++i)) ; do touch $i ; done

causes 44,969 block writes

> and then
> removes them again, and see what sort of load you see from that.

rm *

causes 26,820 block writes

That would make the file creation and unlink just under half the load.

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-06 18:47:21
Message-ID: 5922.1225997241@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> That would make the file creation and unlink just under half the load.

Worst possible case :-( ... means that we wouldn't get much improvement
without addressing both aspects.

It strikes me however that this does put some urgency into the question
of how much per-relation FSM is going to cost us. For short-lived temp
tables the FSM is never going to have any usefulness at all, but in the
current HEAD code it'll double the create/unlink load.

Heikki, would it be reasonable to fix things so that a nonexistent FSM
fork is semantically the same as an empty one, and not create FSM until
there's actually something to put in it?

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-11 16:28:35
Message-ID: 4919B2B3.9070706@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> That would make the file creation and unlink just under half the load.
>
> Worst possible case :-( ... means that we wouldn't get much improvement
> without addressing both aspects.
>
> It strikes me however that this does put some urgency into the question
> of how much per-relation FSM is going to cost us. For short-lived temp
> tables the FSM is never going to have any usefulness at all, but in the
> current HEAD code it'll double the create/unlink load.

Agreed.

> Heikki, would it be reasonable to fix things so that a nonexistent FSM
> fork is semantically the same as an empty one, and not create FSM until
> there's actually something to put in it?

Possibly, but I'd like to understand what exactly the problem is. I
tried running this:

CREATE TEMPORARY TABLE footemp (id int4);
DROP TABLE footemp;

with pgbench -f, but can't see any meaningful difference between 8.3 and
CVS HEAD. Both can do about 300 tpm, or 700-800 with fsync=off. There
probably is a measurable difference there if you run longer tests, but
doesn't seem like the extra file creation+unlink is worth worrying
about. With the caveat that this is on reiserfs, on my laptop. Does
someone see a difference on other filesystems?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-11 19:13:01
Message-ID: 10595.1226430781@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> Heikki, would it be reasonable to fix things so that a nonexistent FSM
>> fork is semantically the same as an empty one, and not create FSM until
>> there's actually something to put in it?

> Possibly, but I'd like to understand what exactly the problem is. I
> tried running this:

> CREATE TEMPORARY TABLE footemp (id int4);
> DROP TABLE footemp;

> with pgbench -f, but can't see any meaningful difference between 8.3 and
> CVS HEAD. Both can do about 300 tpm, or 700-800 with fsync=off.

Try several thousand temp tables within one transaction.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: RAM-only temporary tables
Date: 2008-11-11 19:20:54
Message-ID: 4919DB16.8090602@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> I tried running this:
>
> CREATE TEMPORARY TABLE footemp (id int4);
> DROP TABLE footemp;
>
> with pgbench -f, but can't see any meaningful difference between 8.3 and
> CVS HEAD. Both can do about 300 tpm, or 700-800 with fsync=off. There
> probably is a measurable difference there if you run longer tests, but
> doesn't seem like the extra file creation+unlink is worth worrying
> about. With the caveat that this is on reiserfs, on my laptop. Does
> someone see a difference on other filesystems?

I just tried that on a test server with data directory a small RAID
array with ext3. Results are similar, at ~120 tps with both CVS HEAD and
8.3, and with fsync=off, around ~1700 tps on both versions.

oprofile does suggest that more time is spent in the kernel in CVS HEAD.
CVS HEAD:

37387 13.9383 no-vmlinux postgres (no
symbols)
20604 7.6814 postgres postgres
CatalogCacheFlushRelation
16075 5.9929 postgres postgres
hash_search_with_hash_value
10109 3.7688 postgres postgres
LWLockAcquire
9225 3.4392 postgres postgres
_bt_compare
8973 3.3452 postgres postgres
XLogInsert
8368 3.1197 postgres postgres
LWLockRelease
8009 2.9858 postgres postgres hash_any
6944 2.5888 no-vmlinux no-vmlinux (no
symbols)
4432 1.6523 no-vmlinux pgbench (no
symbols)
4197 1.5647 postgres postgres
AllocSetAlloc
3773 1.4066 libc-2.7.so postgres memcmp
3359 1.2523 postgres postgres
_bt_checkkeys
3315 1.2359 postgres postgres
FunctionCall2
3135 1.1688 postgres postgres
PinBuffer

8.3:

27264 9.3254 no-vmlinux postgres (no
symbols)
20673 7.0710 postgres postgres
CatalogCacheFlushRelation
18576 6.3537 postgres postgres
hash_search_with_hash_value
12795 4.3764 postgres postgres
LWLockAcquire
11565 3.9557 postgres postgres
_bt_compare
11538 3.9465 postgres postgres hash_any
10414 3.5620 postgres postgres
XLogInsert
10100 3.4546 postgres postgres
LWLockRelease
6306 2.1569 postgres postgres
_bt_checkkeys
5096 1.7430 postgres postgres
AllocSetAlloc
4835 1.6538 no-vmlinux no-vmlinux (no
symbols)
4090 1.3989 postgres postgres
PrepareToInvalidateCacheTuple
4021 1.3753 postgres postgres
FunctionCall2
3965 1.3562 no-vmlinux pgbench (no
symbols)
3869 1.3234 libc-2.7.so postgres memcmp

but frankly I can't get too excited about that.

I also quickly tried a test case with ON COMMIT DELETE ROWS temp table.
I modified pgbench slightly, so that it creates a temp table with ON
COMMIT DELETE ROWS after connecting, and then run a script with a simple
one row INSERT to the temp table. The results look similar; I'm getting
~11000-12000 tps on both 8.3 and CVS HEAD. oprofile suggests that about
~50% of the time is spent in kernel, so I'm actually a bit surprised
that the new FSM file isn't hurting more there.

So, I'm not convinced we need to do anything, based on these test. Maybe
I missed the point of the OP; if so, a repeatable test case would be nice.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-11 19:50:21
Message-ID: 4919E1FD.8080309@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Tom Lane wrote:
>>> Heikki, would it be reasonable to fix things so that a nonexistent FSM
>>> fork is semantically the same as an empty one, and not create FSM until
>>> there's actually something to put in it?
>
>> Possibly, but I'd like to understand what exactly the problem is. I
>> tried running this:
>
>> CREATE TEMPORARY TABLE footemp (id int4);
>> DROP TABLE footemp;
>
>> with pgbench -f, but can't see any meaningful difference between 8.3 and
>> CVS HEAD. Both can do about 300 tpm, or 700-800 with fsync=off.
>
> Try several thousand temp tables within one transaction.

I ran into an interesting problem while doing that. I created a SQL
script with 10000 CREATE TEMPORARY TABLE statements. After testing with
that a few times, I got this:

WARNING: out of shared memory
ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.

Not that surprising, but when I then just tried to run a single CREATE
TEMPORARY TABLE in a new psql session, I got the same error. I then
stopped and started postmaster, and I still get the same error! Testing
with trace_locks=on, looks like the table creation takes a lock on all
the temp tables that are already gone, and runs out of memory doing that.

I'll hunt that down, and try benchmarking the "thousands of temp tables
in one transaction" case again after that...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-11 20:38:47
Message-ID: 4919ED57.7010409@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Tom Lane wrote:
>> Try several thousand temp tables within one transaction.
>
> I ran into an interesting problem while doing that. I created a SQL
> script with 10000 CREATE TEMPORARY TABLE statements. After testing with
> that a few times, I got this:
>
> WARNING: out of shared memory
> ERROR: out of shared memory
> HINT: You might need to increase max_locks_per_transaction.
>
> Not that surprising, but when I then just tried to run a single CREATE
> TEMPORARY TABLE in a new psql session, I got the same error. I then
> stopped and started postmaster, and I still get the same error! Testing
> with trace_locks=on, looks like the table creation takes a lock on all
> the temp tables that are already gone, and runs out of memory doing that.

What's happening is that there is simply so many temporary tables in
pg_class that when the new backend tries to clear them with
RemoveTempRelations, it runs out of lock space.

One idea would be to modify RemoveTempRelations to not acquire locks on
the temp objects, but that's pretty ugly and I'm not sure how it would
interact with concurrent DROP TYPE CASCADE or similar. Any better ideas?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-11 20:41:37
Message-ID: 12155.1226436097@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> What's happening is that there is simply so many temporary tables in
> pg_class that when the new backend tries to clear them with
> RemoveTempRelations, it runs out of lock space.

What happened to the original owner of the tables? It must have had
locks on all those tables ...

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: RAM-only temporary tables
Date: 2008-11-11 20:58:55
Message-ID: 20081111205855.GP4272@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:

> oprofile does suggest that more time is spent in the kernel in CVS HEAD.
> CVS HEAD:
>
> 37387 13.9383 no-vmlinux postgres (no
> symbols)

> 8.3:
>
> 27264 9.3254 no-vmlinux postgres (no
> symbols)

What this means is that there are a lot more syscalls in HEAD than 8.3 ...

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-11 21:14:30
Message-ID: 4919F5B6.1020509@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> What's happening is that there is simply so many temporary tables in
>> pg_class that when the new backend tries to clear them with
>> RemoveTempRelations, it runs out of lock space.
>
> What happened to the original owner of the tables? It must have had
> locks on all those tables ...

It's pretty easily reproducible with this:

postgres=# CREATE OR REPLACE FUNCTION createtemps (iters int4) RETURNS
VOID LANGUAGE plpgsql AS $$
DECLARE i int4;
BEGIN
i := 0;
WHILE i < iters LOOP
EXECUTE 'CREATE TEMPORARY TABLE "dummytemp' || i || '_' ||
random() ||'" (id int4)';
i := i + 1;
END LOOP;
END;
$$;
CREATE FUNCTION
postgres=# SELECT createtemps(7000);
createtemps
-------------

(1 row)

postgres=# SELECT createtemps(7000);
createtemps
-------------

(1 row)

postgres=# \q
$ psql postgres
psql (8.4devel)
Type "help" for help.

postgres=# CREATE TEMPORARY TABLE footemp(id int4);WARNING: out of
shared memory
ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
postgres=#

In this case, the temp tables are created in two separate transactions,
which is why the creations don't run out of shared memory. But when
they're later dropped, the new backend tries to grab a lock on all of
them at the same time.

I'm not quite sure how I got into that situation with my test case. I
might have run the test case without a BEGIN/COMMIT at first, while
developing the test script, so that all the thousands of temp tables
were created in separate transactions.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: RAM-only temporary tables
Date: 2008-11-11 21:15:47
Message-ID: 4919F603.4060403@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>
>> oprofile does suggest that more time is spent in the kernel in CVS HEAD.
>> CVS HEAD:
>>
>> 37387 13.9383 no-vmlinux postgres (no
>> symbols)
>
>> 8.3:
>>
>> 27264 9.3254 no-vmlinux postgres (no
>> symbols)
>
> What this means is that there are a lot more syscalls in HEAD than 8.3 ...

Yeah, sure. My point was that it doesn't seem to be significant in the
big scheme of things.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RAM-only temporary tables
Date: 2008-11-11 21:36:46
Message-ID: 4919FAEE.5000506@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Tom Lane wrote:
>>> Heikki, would it be reasonable to fix things so that a nonexistent FSM
>>> fork is semantically the same as an empty one, and not create FSM until
>>> there's actually something to put in it?
>
>> Possibly, but I'd like to understand what exactly the problem is. I
>> tried running this:
>
>> CREATE TEMPORARY TABLE footemp (id int4);
>> DROP TABLE footemp;
>
>> with pgbench -f, but can't see any meaningful difference between 8.3 and
>> CVS HEAD. Both can do about 300 tpm, or 700-800 with fsync=off.
>
> Try several thousand temp tables within one transaction.

After increasing max_lock_per_transaction to avoid the issue I mentioned
elsewhere in this thread, using this test script:

#/bin/sh

echo "BEGIN;"
for ((i=0;i<=10000;i+=1)); do
echo "CREATE TEMPORARY TABLE footemp$i (id int4);"
done
echo "COMMIT;"

And repeating a few times with:

$ time sh temptest.sh | ~/installations/cvshead/bin/psql postgres -q

I'm again seeing no big difference between 8.3 and CVS HEAD. I'm getting
values around both sides of 6 seconds, with 5.8s being the minimum for
CVS HEAD, and 5.6s the minimum for PG 8.3.

oprofile for 8.3:

samples % image name app name
symbol name
23844 11.1625 postgres postgres
LockReassignCurrentOwner
22299 10.4392 no-vmlinux postgres (no
symbols)
15461 7.2380 postgres postgres
hash_seq_search
12309 5.7624 postgres postgres
CatalogCacheFlushRelation
10981 5.1407 postgres postgres
hash_search_with_hash_value
7139 3.3421 postgres postgres
get_tabstat_entry
7133 3.3393 postgres postgres
XLogInsert
7112 3.3294 no-vmlinux no-vmlinux (no
symbols)
6294 2.9465 postgres postgres
_bt_compare
6027 2.8215 postgres postgres
LWLockAcquire
5189 2.4292 postgres postgres hash_any
4744 2.2209 postgres postgres
LWLockRelease
3355 1.5706 bash bash (no
symbols)

oprofile for CVS HEAD:

samples % image name app name
symbol name
30167 11.0321 postgres postgres
LockReassignCurrentOwner
27345 10.0001 no-vmlinux postgres (no
symbols)
19537 7.1447 postgres postgres
hash_seq_search
17376 6.3544 postgres postgres
CatalogCacheFlushRelation
14910 5.4526 postgres postgres
findDependentObjects
14187 5.1882 postgres postgres
hash_search_with_hash_value
9268 3.3893 postgres postgres
get_tabstat_entry
7478 2.7347 postgres postgres
_bt_compare
6861 2.5091 no-vmlinux no-vmlinux (no
symbols)
6779 2.4791 postgres postgres
LWLockAcquire
6764 2.4736 postgres postgres
XLogInsert
5122 1.8731 postgres postgres
LWLockRelease
4838 1.7693 postgres postgres hash_any
3501 1.2803 bash bash (no
symbols)

I'm quite surprised to see LockReassignCurrentOwner being called at all.
gdb shows where the call is coming from:

#0 LockReassignCurrentOwner () at lock.c:1653
#1 0x0000000000662865 in ResourceOwnerReleaseInternal (owner=0xab21f8,
phase=RESOURCE_RELEASE_LOCKS, isCommit=1 '\001', isTopLevel=0 '\0')
at resowner.c:261
#2 0x00000000006628f0 in ResourceOwnerRelease (owner=0xab21f8,
phase=RESOURCE_RELEASE_LOCKS, isCommit=1 '\001', isTopLevel=0 '\0')
at resowner.c:171
#3 0x000000000066139b in PortalDrop (portal=0xacc060,
isTopCommit=<value optimized out>) at portalmem.c:418
#4 0x00000000005aa162 in exec_simple_query (
query_string=0xa91350 "CREATE TEMPORARY TABLE footemp7013 (id int4);")
at postgres.c:972

I'm also a bit surprised that the higher number of syscalls in CVS HEAD
isn't visible in this oprofile report. The top percentages seemed to
vary by a couple percentage points from run to run, though, so perhaps
it's just drowned by noise.

Kevin, what was your original scenario like that led you to investigate
this?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RAM-only temporary tables
Date: 2008-11-21 23:18:20
Message-ID: 4926ED5C.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sorry for the late response; I was on vacation.

>>> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Kevin, what was your original scenario like that led you to
investigate
> this?

We noticed a performance degradation in application code which, within
a database transaction, looped through large numbers of iterations
where each iteration created, used, and dropped a temporary table.
The temporary table always had several columns (typically ten to
twenty, many of which were varchar) and had a primary key.

We initially thought this was because of an upgrade of PostgreSQL from
8.2.5 to 8.3.4, but subsequent testing showed that it was because of
the concurrent update of the Linux kernel from one which defaulted to
not using write barriers to one which did default to using write
barriers. The file system was XFS on five spindles in RAID 5 on a
good BBU controller. Updating the relevant /etc/fstab entry with
nobarrier brought performance back to an acceptable level.

-Kevin