Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: pgsql-performance(at)postgresql(dot)org
Subject: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 09:55:59
Message-ID: 47D505AF.4040101@postnewspapers.com.au
Lists: pgsql-patches pgsql-performance

Hi all

I'm encountering an odd issue with a bulk import query using PostgreSQL
8.3. After a 400,000 row import into a just-truncated table `booking', a
sequential scan run on the table in the same transaction is incredibly
slow, taking ~ 166738.047 ms. After a:
`COMMIT; BEGIN;'
the same query runs in 712.615 ms, with almost all the time difference
being in the sequential scan of the `booking' table [schema at end of post].

The table is populated by a complex pl/pgsql function that draws from
several other tables to convert data from another app's format. After
that function runs, here's what happens if I do a simple sequential
scan, then what happens after I commit and run it again:

craig=# explain analyze select * from booking;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on booking (cost=0.00..9871.60 rows=320160 width=139)
(actual time=0.017..166644.697 rows=341481 loops=1)
Total runtime: 166738.047 ms
(2 rows)

craig=# commit; begin;
COMMIT
BEGIN
craig=# explain analyze select * from booking;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Seq Scan on booking (cost=0.00..9871.60 rows=320160 width=139)
(actual time=0.022..624.492 rows=341481 loops=1)
Total runtime: 712.615 ms
(2 rows)

SET client_min_messages = 'debug5';

does not shed any more light; the only extra output is (eg):

craig=# select count(distinct id) from booking;
DEBUG: StartTransactionCommand
DEBUG: CommitTransactionCommand
count
--------
341481
(1 row)

... which took 164558.961 ms to run, or about 2 tuples per second.
[Table schema at end of post]. By comparison, after commit the same
query read about 500 tuples/second.

This issue appears to affect any query that results in a sequential scan
on the newly populated table, and also affects maintenance operations
like ALTER TABLE ... ADD CONSTRAINT ... FOREIGN KEY that perform
sequential scans. ANALYZE is also really slow. I'm not yet sure if index
scans are affected.

I'm not using WAL logging.

It doesn't matter whether I truncate the table before the import using a
separate transaction or the same one that does the import.

I see essentially identical results, and runtimes, with other more
complex queries, but it seems to boil down to extremely slow sequential
scans.

The Linux 2.6.22 host these queries are running on runs PostgreSQL 8.3.
It has 4GB of RAM and shmmax set to 512MB. Tweaking the postgresql
memory parameters seems to make little difference, but the ones that I
adjusted away from defaults to see if this was a resource issue are:

shared_buffers = 32MB
temp_buffers = 128MB
work_mem = 128MB
maintenance_work_mem = 1024MB # min 1MB

(There are relatively few clients to this database, but they work it hard).

Is this huge speed difference in sequential scans expected behavior? Any
idea what might be causing it?

I'm presently working around it by just committing the transaction after
the bulk import - but there's lots more work to do after that and it
leaves the database in a rather messy interim state.

Here's the table's schema, pasted as a quote to stop Thunderbird
mangling it. There are no rules on this table except those that might be
created internally by postgresql.

> craig=# \d booking
> Table "public.booking"
> Column | Type | Modifiers
> ------------------------+--------------------------+------------------------------------------------------
> id | integer | not null default nextval('booking_id_seq'::regclass)
> customer_id | integer | not null
> edition_id | integer | not null
> description | character varying(255) | not null
> position | integer | not null
> loading_applied | boolean | not null default false
> loading_ratio | numeric(16,4) | not null
> size_type | integer | not null
> size_length | numeric(16,4) |
> base_price | numeric(16,4) | not null
> gst_factor | numeric(16,8) | not null default gst_factor()
> page_number | integer |
> invoiced | timestamp with time zone |
> contract_id | integer |
> old_customer_id | integer | not null
> booked_time | timestamp with time zone | not null
> booked_by | character varying(80) | not null
> cancelled | boolean | not null default false
> art_supplied | boolean | not null default false
> repeat_group | integer |
> notes | text |
> originally_from_system | character(1) |
> pe_booking_id | integer |
> pe_customer_id | integer |
> Indexes:
> "booking_pkey" PRIMARY KEY, btree (id)
> Check constraints:
> "base_price_nonnegative" CHECK (base_price >= 0::numeric)
> "gst_factor_nonnegative_and_sane" CHECK (gst_factor >= 0::numeric AND gst_factor < 9::numeric)
> "loading_ratio_sane" CHECK (loading_ratio > 0::numeric AND loading_ratio < 9::numeric)
> "page_no_sane" CHECK (page_number IS NULL OR page_number > 0 AND page_number <= 500)
> "size_length_nonnegative" CHECK (size_length IS NULL OR size_length >= 0::numeric)
> Foreign-key constraints:
> "booking_contract_id_fkey" FOREIGN KEY (contract_id) REFERENCES contract(id)
> "booking_customer_id_fkey" FOREIGN KEY (customer_id) REFERENCES customer(id)
> "booking_edition_id_fkey" FOREIGN KEY (edition_id) REFERENCES edition(id) ON DELETE CASCADE
> "booking_old_customer_id_fkey" FOREIGN KEY (old_customer_id) REFERENCES customer(id)
> "booking_position_fkey" FOREIGN KEY ("position") REFERENCES booking_position(id)
> "booking_repeat_group_fkey" FOREIGN KEY (repeat_group) REFERENCES booking_repeat(id) ON DELETE SET NULL
> "booking_size_type_fkey" FOREIGN KEY (size_type) REFERENCES booking_size_type(id)
> Triggers:
> booking_after_insert_update AFTER INSERT ON booking FOR EACH ROW EXECUTE PROCEDURE booking_after_trigger()
> booking_audit AFTER UPDATE ON booking FOR EACH ROW EXECUTE PROCEDURE booking_audit_trigger()
> booking_before_insert BEFORE INSERT ON booking FOR EACH ROW EXECUTE PROCEDURE booking_before_trigger()

--
Craig Ringer


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 11:01:32
Message-ID: 47D5150C.8090802@enterprisedb.com
Lists: pgsql-patches pgsql-performance

Craig Ringer wrote:
> I'm encountering an odd issue with a bulk import query using PostgreSQL
> 8.3. After a 400,000 row import into a just-truncated table `booking', a
> sequential scan run on the table in the same transaction is incredibly
> slow, taking ~ 166738.047 ms. After a:
> `COMMIT; BEGIN;'
> the same query runs in 712.615 ms, with almost all the time difference
> being in the sequential scan of the `booking' table [schema at end of post].
>
> The table is populated by a complex pl/pgsql function that draws from
> several other tables to convert data from another app's format.

You must be having an exception handler block in that pl/pgsql function,
which implicitly creates a new subtransaction on each invocation of the
exception handler block, so you end up with hundreds of thousands of
committed subtransactions. For each row in the seq scan, the list of
subtransactions is scanned, to see if the transaction that inserted the
row is part of the current top-level transaction. That's fine for a
handful of subtransactions, but it gets really slow with large numbers
of them, as you've seen. It's an O(n^2) operation, where n is the number
of rows inserted, so you'll be in even more trouble if the number of
rows increases.

As a work-around, avoid using exception handlers, or process more than 1
row per function invocation. Or COMMIT the transaction, as you did.

For 8.4, it would be nice to improve that. I tested that on my laptop
with a similarly-sized table, inserting each row in a pl/pgsql function
with an exception handler, and I got very similar run times. According
to oprofile, all the time is spent in TransactionIdIsInProgress. I think
it would be pretty straightforward to store the committed subtransaction
ids in a sorted array, instead of a linked list, and binary search. Or
to use a hash table. That should eliminate this problem, though there are
still other places as well where a large number of subtransactions will
hurt performance.
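
Just to illustrate the idea in isolation - a standalone sketch with made-up
names, not the actual backend code - keeping the committed child XIDs in a
sorted array makes the membership test a binary search instead of a walk
over a 300000-element list:

/*
 * Toy sketch of "sorted array + binary search" for committed child XIDs.
 * Hypothetical names and plain libc; error handling omitted.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef uint32_t TransactionId;

static int
xid_cmp(const void *a, const void *b)
{
    TransactionId xa = *(const TransactionId *) a;
    TransactionId xb = *(const TransactionId *) b;

    return (xa > xb) - (xa < xb);
}

/* Is xid one of the nxids committed child XIDs (array kept sorted)? */
static int
is_committed_child(TransactionId xid, const TransactionId *xids, size_t nxids)
{
    return bsearch(&xid, xids, nxids, sizeof(TransactionId), xid_cmp) != NULL;
}

int
main(void)
{
    size_t          n = 300000;
    TransactionId  *children = malloc(n * sizeof(TransactionId));
    size_t          i;

    /* in this toy the XIDs are generated in increasing order, so the array is sorted */
    for (i = 0; i < n; i++)
        children[i] = 1000 + (TransactionId) i;

    /* O(log n) per lookup instead of O(n) per row scanned */
    printf("%d %d\n",
           is_committed_child(1234, children, n),
           is_committed_child(999999, children, n));

    free(children);
    return 0;
}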

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 11:29:01
Message-ID: 47D51B7D.4010800@postnewspapers.com.au
Lists: pgsql-patches pgsql-performance

Thanks for the extremely helpful response. I don't think I would've
spotted that one in a hurry.

> You must be having an exception handler block in that pl/pgsql
> function, which implicitly creates a new subtransaction on each
> invocation of the exception handler block, so you end up with hundreds
> of thousands of committed subtransactions.

Aah - yes, there is. I didn't realize it'd have such an impact. I can
work around the need for it by explicitly checking the table constraints
in the function - in which case an uncaught exception will terminate the
transaction, but should only arise when I've missed a constraint check.
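
Roughly the shape I have in mind - just a sketch against cut-down,
hypothetical stand-in tables, not the real import function:

-- Hypothetical simplified schema, standing in for the real tables.
CREATE TABLE demo_customer (id integer PRIMARY KEY);
CREATE TABLE demo_booking (
    customer_id integer NOT NULL REFERENCES demo_customer(id),
    base_price  numeric NOT NULL CHECK (base_price >= 0)
);

CREATE OR REPLACE FUNCTION demo_import_row(p_customer_id integer,
                                           p_base_price  numeric)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    -- Pre-check what the constraints would reject, instead of wrapping the
    -- INSERT in an EXCEPTION block (each entry into such a block starts a
    -- new subtransaction).
    IF p_base_price IS NULL OR p_base_price < 0 THEN
        RAISE NOTICE 'skipping row: bad base_price %', p_base_price;
        RETURN;
    END IF;
    IF NOT EXISTS (SELECT 1 FROM demo_customer WHERE id = p_customer_id) THEN
        RAISE NOTICE 'skipping row: unknown customer %', p_customer_id;
        RETURN;
    END IF;
    INSERT INTO demo_booking (customer_id, base_price)
    VALUES (p_customer_id, p_base_price);
END;
$$;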

> For 8.4, it would be nice to improve that. I tested that on my laptop
> with a similarly-sized table, inserting each row in a pl/pgsql
> function with an exception handler, and I got very similar run times.
> According to oprofile, all the time is spent in
> TransactionIdIsInProgress. I think it would be pretty straightforward
> to store the committed subtransaction ids in a sorted array, instead
> of a linked list, and binary search. Or to use a hash table. That
> should eliminate this problem, though there are still other places as
> well where a large number of subtransactions will hurt performance.

That does sound interesting - and it would be nice to be able to use
exception handlers this way without too huge a performance hit. In the
end though it's something that can be designed around once you're aware
of it - and I'm sure that other ways of storing that data have their own
different costs and downsides.

What might also be nice, and simpler, would be a `notice', `log', or
even `debug1' level warning telling the user they've reached an absurd
number of subtransactions that'll cripple PostgreSQL's performance - say
100,000. There's precedent for this in the checkpoint frequency warning
8.3 produces if checkpoints are becoming too frequent - and like that
warning it could be configurable for big sites. If you think that's sane
I might have a go at it - though I mostly work in C++ so the result
probably won't be too pretty initially.

--
Craig Ringer


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 12:16:27
Message-ID: 47D5269B.8060103@postnewspapers.com.au
Lists: pgsql-patches pgsql-performance

Heikki Linnakangas wrote:
> You must be having an exception handler block in that pl/pgsql
> function, which implicitly creates a new subtransaction on each
> invocation of the exception handler block, so you end up with hundreds
> of thousands of committed subtransactions.
I've just confirmed that that was indeed the issue, and coding around
the begin block dramatically cuts the runtimes of commands executed
after the big import function.

Thanks again!

--
Craig Ringer


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 14:33:58
Message-ID: 23588.1205159638@sss.pgh.pa.us
Lists: pgsql-patches pgsql-performance

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
> For 8.4, it would be nice to improve that. I tested that on my laptop
> with a similarly-sized table, inserting each row in a pl/pgsql function
> with an exception handler, and I got very similar run times. According
> to oprofile, all the time is spent in TransactionIdIsInProgress. I think
> it would be pretty straightforward to store the committed subtransaction
> ids in a sorted array, instead of a linked list, and binary search.

I think the OP is not complaining about the time to run the transaction
that has all the subtransactions; he's complaining about the time to
scan the table that it emitted. Presumably, each row in the table has a
different (sub)transaction ID and so we are thrashing the clog lookup
mechanism. It only happens once because after that the XMIN_COMMITTED
hint bits are set.

This probably ties into the recent discussions about eliminating the
fixed-size allocations for SLRU buffers --- I suspect it would've run
better if it could have scaled up the number of pg_clog pages held in
memory.

regards, tom lane


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 14:48:39
Message-ID: 47D54A47.3030608@postnewspapers.com.au
Lists: pgsql-patches pgsql-performance

Tom Lane wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>> For 8.4, it would be nice to improve that. I tested that on my laptop
>> with a similarly-sized table, inserting each row in a pl/pgsql function
>> with an exception handler, and I got very similar run times. According
>> to oprofile, all the time is spent in TransactionIdIsInProgress. I think
>> it would be pretty straightforward to store the committed subtransaction
>> ids in a sorted array, instead of a linked list, and binary search.
>
> I think the OP is not complaining about the time to run the transaction
> that has all the subtransactions; he's complaining about the time to
> scan the table that it emitted.

Yes, but only in succeeding statements in the same transaction as the
procedure that creates all the subtransactions. Table scan times return
to normal after that transaction commits.

> Presumably, each row in the table has a
> different (sub)transaction ID and so we are thrashing the clog lookup
> mechanism. It only happens once because after that the XMIN_COMMITTED
> hint bits are set.

It seems to happen with every statement run in the same transaction as,
and after, the procedure with all the subtransactions. As soon as a
COMMIT is executed, operations return to normal speed. There's no
significant delay on the first statement after COMMIT as compared to
subsequent statements, nor do successive statements before the COMMIT
get faster.

In other words, if I repeatedly run one of the statements I used in
testing for my initial post, like:

EXPLAIN ANALYZE SELECT * FROM booking;

... after running the problem stored procedure, it takes just as long
for the second and third and so on runs as for the first.

As soon as I commit the transaction, the exact same statement returns to
running in less than a second, and doesn't significantly change in
runtime for subsequent executions.

I'll bang out a couple of examples at work tomorrow to see what I land
up with, since this is clearly something that can benefit from a neat
test case.

In any case, avoiding the use of an exception block per generated record
worked around the performance issues, so it's clearly something to do
with the vast numbers of subtransactions - as Heikki Linnakangas
suggested and tested.

--
Craig Ringer


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 14:53:32
Message-ID: 47D54B6C.7080404@enterprisedb.com
Lists: pgsql-patches pgsql-performance

Tom Lane wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>> For 8.4, it would be nice to improve that. I tested that on my laptop
>> with a similarly-sized table, inserting each row in a pl/pgsql function
>> with an exception handler, and I got very similar run times. According
>> to oprofile, all the time is spent in TransactionIdIsInProgress. I think
>> it would be pretty straightforward to store the committed subtransaction
>> ids in a sorted array, instead of a linked list, and binary search.
>
> I think the OP is not complaining about the time to run the transaction
> that has all the subtransactions; he's complaining about the time to
> scan the table that it emitted.

If you read the original post carefully, he complained that the seq scan
was slow when executed within the same transaction as populating the
table, and fast if he committed in between.

> Presumably, each row in the table has a
> different (sub)transaction ID and so we are thrashing the clog lookup
> mechanism. It only happens once because after that the XMIN_COMMITTED
> hint bits are set.
>
> This probably ties into the recent discussions about eliminating the
> fixed-size allocations for SLRU buffers --- I suspect it would've run
> better if it could have scaled up the number of pg_clog pages held in
> memory.

I doubt that makes any noticeable difference in this case. 300000
transaction ids fit on < ~100 clog pages, and the xmins on heap pages
are nicely in order.

Getting rid of the fixed-size allocations would be nice for other
reasons, of course.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 15:02:52
Message-ID: 24005.1205161372@sss.pgh.pa.us
Lists: pgsql-patches pgsql-performance

Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> writes:
> It seems to happen with every statement run in the same transaction as,
> and after, the procedure with all the subtransactions. As soon as a
> COMMIT is executed, operations return to normal speed.

Ah. I misread your post as saying that it happened only once.

regards, tom lane


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 15:03:14
Message-ID: 47D54DB2.4020405@enterprisedb.com
Lists: pgsql-patches pgsql-performance

Craig Ringer wrote:
> I'll bang out a couple of examples at work tomorrow to see what I land
> up with, since this is clearly something that can benefit from a neat
> test case.

Here's what I used to reproduce this:

postgres=# BEGIN;
BEGIN
postgres=# CREATE TABLE foo (id int4, t text);
CREATE TABLE
postgres=# CREATE OR REPLACE FUNCTION insertfunc() RETURNS void LANGUAGE
plpgsql AS $$
begin
    INSERT INTO foo VALUES (1, repeat('a',110));
exception when unique_violation THEN
end;
$$;
CREATE FUNCTION
postgres=# SELECT COUNT(insertfunc()) FROM generate_series(1,300000);
count
--------
300000
(1 row)

postgres=# EXPLAIN ANALYZE SELECT COUNT(*) FROM foo;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=13595.93..13595.94 rows=1 width=0) (actual
time=239535.904..239535.906 rows=1 loops=1)
-> Seq Scan on foo (cost=0.00..11948.34 rows=659034 width=0)
(actual time=0.022..239133.898 rows=300000 loops=1)
Total runtime: 239535.974 ms
(3 rows)

The oprofile output is pretty damning:

samples % symbol name
42148 99.7468 TransactionIdIsCurrentTransactionId

If you put a COMMIT right before "EXPLAIN ANALYZE..." it runs in < 1s.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 15:10:00
Message-ID: 47D54F48.2090904@postnewspapers.com.au
Lists: pgsql-patches pgsql-performance

Tom Lane wrote:
> Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> writes:
>
>> It seems to happen with every statement run in the same transaction as,
>> and after, the procedure with all the subtransactions. As soon as a
>> COMMIT is executed, operations return to normal speed.
>>
>
> Ah. I misread your post as saying that it happened only once.
No worries - it's best to be sure.

Thanks for looking into it.

--
Craig Ringer


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 15:20:40
Message-ID: 24304.1205162440@sss.pgh.pa.us
Lists: pgsql-patches pgsql-performance

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
> The oprofile output is pretty damning:

> samples % symbol name
> 42148 99.7468 TransactionIdIsCurrentTransactionId

Oh, I have no doubt that that could eat a lot of cycles inside the
originating transaction ;-). I just misread Craig's complaint as
being about the cost of the first table scan *after* that transaction.

Getting rid of the linked-list representation would be a win in a couple
of ways --- we'd not need the bogus "list of XIDs" support in pg_list.h,
and xactGetCommittedChildren would go away. OTOH AtSubCommit_childXids
would get more expensive.

regards, tom lane


From: Rainer Pruy <Rainer(dot)Pruy(at)Acrys(dot)COM>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 16:55:43
Message-ID: 47D5680F.6080508@acrys.com
Lists: pgsql-patches pgsql-performance

We experienced a similar degradation when heavily using savepoints within a
single transaction. However, we have not yet had enough time to really
investigate the issue, and it was not directly reproducible using a (small)
set of statements from a script. As the overall scenario - bulk loads with
sub-transactions - is close to the one we run, it may well come down to the
same cause.

So take my vote for a solution that does not end up as "don't use
(sub-)transactions".

Regards,
Rainer

Craig Ringer schrieb:
> Thanks for the extremely helpful response. I don't think I would've
> spotted that one in a hurry.
>
>> You must be having an exception handler block in that pl/pgsql
>> function, which implicitly creates a new subtransaction on each
>> invocation of the exception handler block, so you end up with hundreds
>> of thousands of committed subtransactions.
>
> Aah - yes, there is. I didn't realize it'd have such an impact. I can
> work around the need for it by explicitly checking the table constraints
> in the function - in which case an uncaught exception will terminate the
> transaction, but should only arise when I've missed a constraint check.
>
>> For 8.4, it would be nice to improve that. I tested that on my laptop
>> with a similarly-sized table, inserting each row in a pl/pgsql
>> function with an exception handler, and I got very similar run times.
>> According to oprofile, all the time is spent in
>> TransactionIdIsInProgress. I think it would be pretty straightforward
>> to store the committed subtransaction ids in a sorted array, instead
>> of a linked list, and binary search. Or to use a hash table. That
> should eliminate this problem, though there are still other places as
>> well where a large number of subtransactions will hurt performance.
>
> That does sound interesting - and it would be nice to be able to use
> exception handlers this way without too huge a performance hit. In the
> end though it's something that can be designed around once you're aware
> of it - and I'm sure that other ways of storing that data have their own
> different costs and downsides.
>
> What might also be nice, and simpler, would be a `notice', `log', or
> even `debug1' level warning telling the user they've reached an absurd
> number of subtransactions that'll cripple PostgreSQL's performance - say
> 100,000. There's precedent for this in the checkpoint frequency warning
> 8.3 produces if checkpoints are becoming too frequent - and like that
> warning it could be configurable for big sites. If you think that's sane
> I might have a go at it - though I mostly work in C++ so the result
> probably won't be too pretty initially.
>
> --
> Craig Ringer
>

--
Rainer Pruy
Managing Director

Acrys Consult GmbH & Co. KG
Untermainkai 29-30, D-60329 Frankfurt
Tel: +49-69-244506-0 - Fax: +49-69-244506-50
Web: http://www.acrys.com - Email: office(at)acrys(dot)com
Commercial register: Frankfurt am Main, HRA 31151


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-10 22:00:27
Message-ID: 1205186427.4269.179.camel@ebony.site
Lists: pgsql-patches pgsql-performance

On Mon, 2008-03-10 at 11:01 +0000, Heikki Linnakangas wrote:
> According
> to oprofile, all the time is spent in TransactionIdIsInProgress.

I recently submitted a patch to optimise this. Your comments would be
welcome on the patch.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-11 06:24:42
Message-ID: 2e78013d0803102324i66fa5376rca1bc3d250bc8317@mail.gmail.com
Lists: pgsql-patches pgsql-performance

On Mon, Mar 10, 2008 at 4:31 PM, Heikki Linnakangas
<heikki(at)enterprisedb(dot)com> wrote:
> According
> to oprofile, all the time is spent in TransactionIdIsInProgress. I think
> it would be pretty straightforward to store the committed subtransaction
> ids in a sorted array, instead of a linked list, and binary search.

Assuming that in most cases there will be many committed and few aborted
subtransactions, how about storing the list of *aborted* subtransactions instead?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-11 09:58:20
Message-ID: 47D657BC.40409@enterprisedb.com
Lists: pgsql-patches pgsql-performance

Simon Riggs wrote:
> On Mon, 2008-03-10 at 11:01 +0000, Heikki Linnakangas wrote:
>> According
>> to oprofile, all the time is spent in TransactionIdIsInProgress.
>
> I recently submitted a patch to optimise this. Your comments would be
> welcome on the patch.

You mean this one:

http://archives.postgresql.org/pgsql-patches/2008-02/msg00008.php

? Unfortunately that patch won't help in this case.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-11 12:34:07
Message-ID: 47D67C3F.1030406@enterprisedb.com
Lists: pgsql-patches pgsql-performance

(moved to pgsql-patches, as there's a patch attached)

Tom Lane wrote:
> Getting rid of the linked-list representation would be a win in a couple
> of ways --- we'd not need the bogus "list of XIDs" support in pg_list.h,
> and xactGetCommittedChildren would go away. OTOH AtSubCommit_childXids
> would get more expensive.

I couldn't let this case go, so I wrote a patch. I replaced the linked
list with an array that's enlarged at AtSubCommit_childXids when necessary.

I couldn't measure any performance hit from the extra work of enlarging
and memcpying the array. I tried two test cases:

1. Insert one row per subtransaction, and commit the subtransaction
after each insert. This is like the OP's case.

printf("CREATE TABLE foo (id int4);\n");
printf("BEGIN;\n");
for(i=1; i <= 100000; i++)
{
printf("SAVEPOINT sp%d;\n", i);
printf("INSERT INTO foo VALUES (1);\n");
printf("RELEASE SAVEPOINT sp%d;\n", i);
}
printf("COMMIT;\n");

2. Insert one row per subtransaction, but leave the subtransaction in
not-committed state

printf("CREATE TABLE foo (id int4, t text);\n");
printf("BEGIN;\n");
for(i=1; i <= 100000; i++)
{
printf("SAVEPOINT sp%d;\n", i);
printf("INSERT INTO foo VALUES (1, 'f');\n");
}
printf("COMMIT;\n");

Test case 1 is not bad, because we just keep appending to the parent
array one at a time. Test case 2 might become slower as the number of
subtransactions increases, because at the commit of each subtransaction you
need to enlarge the parent array and copy all the already-committed
childxids to it. However, you hit the limit on the amount of shared memory
required for holding the per-xid locks before that becomes a problem,
and the performance becomes dominated by other things anyway (notably
LockReassignCurrentOwner).

I initially thought that using a single palloc'd array to hold all the
XIDs would introduce a new limit on the number committed
subtransactions, thanks to MaxAllocSize, but that's not the case.
Without patch, we actually allocate an array like that anyway in
xactGetCommittedChildren.

Elsewhere in our codebase where we use arrays that are enlarged as
needed, we keep track of the "allocated" size and the "used" size of the
array separately, and only call repalloc when the array fills up, and
repalloc a larger than necessary array when it does. I chose to just
call repalloc every time instead, as repalloc is smart enough to fall
out quickly if the chunk the allocation was made in is already larger
than the new size. There might be some gain avoiding the repeated
repalloc calls, but I doubt it's worth the code complexity, and calling
repalloc with a larger than necessary size can actually force it to
unnecessarily allocate a new, larger chunk instead of reusing the old
one. Thoughts on that?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
childXids-binsearch-4.patch text/x-diff 11.1 KB

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-11 13:06:33
Message-ID: 2e78013d0803110606hb0eaa2aq1784608cf4c7026c@mail.gmail.com
Lists: pgsql-patches pgsql-performance

On Tue, Mar 11, 2008 at 6:04 PM, Heikki Linnakangas
<heikki(at)enterprisedb(dot)com> wrote:
> (moved to pgsql-patches, as there's a patch attached)
>
>
> I couldn't let this case go, so I wrote a patch. I replaced the linked
> list with an array that's enlarged at AtSubCommit_childXids when necessary.
>

We could replace the array of xids with an array of flags, where the i'th flag
is set to true if the corresponding subtransaction is committed. This would
make the lookup an O(1) operation. Of course, the downside is when the array
is sparse. Or we could switch to a flag-based representation once the array
size grows beyond a limit.
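
A rough standalone sketch of what I mean - hypothetical names, flags
addressed by offset from the first child XID, no wraparound handling,
just to illustrate the O(1) lookup:

/*
 * Toy sketch of the flag-array idea: constant-time "is this child XID
 * committed?" checks, at the cost of space proportional to the XID range
 * covered.  Not the actual backend data structures.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef uint32_t TransactionId;

static TransactionId flag_base;         /* first XID the array can represent */
static uint32_t      flag_range;        /* number of XIDs covered */
static uint8_t      *committed_flags;

static void
flags_init(TransactionId base, uint32_t range)
{
    flag_base = base;
    flag_range = range;
    committed_flags = calloc(range, 1);
}

static void
flag_subcommit(TransactionId xid)
{
    if (xid - flag_base < flag_range)
        committed_flags[xid - flag_base] = 1;
}

static int
flag_is_committed(TransactionId xid)
{
    return xid - flag_base < flag_range && committed_flags[xid - flag_base];
}

int
main(void)
{
    flags_init(1001, 300000);           /* children of a hypothetical xid 1000 */
    flag_subcommit(1001);
    flag_subcommit(1003);

    printf("1001: %d, 1002: %d\n",
           flag_is_committed(1001), flag_is_committed(1002));

    free(committed_flags);
    return 0;
}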

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-11 13:29:04
Message-ID: 20080311132904.GA9116@alvh.no-ip.org
Lists: pgsql-patches pgsql-performance

Heikki Linnakangas wrote:

> I couldn't let this case go, so I wrote a patch. I replaced the linked
> list with an array that's enlarged at AtSubCommit_childXids when
> necessary.

Do you still need to palloc the return value from
xactGetCommittedChildren? Perhaps you can save the palloc/memcpy/pfree
and just return the pointer to the array already in memory?

Not that it'll have much of a performance impact, but just for
cleanliness :-)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-11 14:03:08
Message-ID: 47D6911C.3040605@enterprisedb.com
Lists: pgsql-patches pgsql-performance

Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>
>> I couldn't let this case go, so I wrote a patch. I replaced the linked
>> list with an array that's enlarged at AtSubCommit_childXids when
>> necessary.
>
> Do you still need to palloc the return value from
> xactGetCommittedChildren? Perhaps you can save the palloc/memcpy/pfree
> and just return the pointer to the array already in memory?

Yeah, good point. The callers just need to be modified not to pfree it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-11 21:06:37
Message-ID: 5827.1205269597@sss.pgh.pa.us
Lists: pgsql-patches pgsql-performance

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
> I initially thought that using a single palloc'd array to hold all the
> XIDs would introduce a new limit on the number committed
> subtransactions, thanks to MaxAllocSize, but that's not the case.
> Without patch, we actually allocate an array like that anyway in
> xactGetCommittedChildren.

Right.

> Elsewhere in our codebase where we use arrays that are enlarged as
> needed, we keep track of the "allocated" size and the "used" size of the
> array separately, and only call repalloc when the array fills up, and
> repalloc a larger than necessary array when it does. I chose to just
> call repalloc every time instead, as repalloc is smart enough to fall
> out quickly if the chunk the allocation was made in is already larger
> than the new size. There might be some gain avoiding the repeated
> repalloc calls, but I doubt it's worth the code complexity, and calling
> repalloc with a larger than necessary size can actually force it to
> unnecessarily allocate a new, larger chunk instead of reusing the old
> one. Thoughts on that?

Seems like a pretty bad idea to me, as the behavior you're counting on
only applies to chunks up to 8K or thereabouts. In a situation where
you are subcommitting lots of XIDs one at a time, this is likely to have
quite awful behavior (or at least, you're at the mercy of the local
malloc library as to how bad it is). I'd go with the same
double-it-each-time-needed approach we use elsewhere.
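
Roughly this sort of thing, as a standalone sketch with made-up names and
plain realloc standing in for repalloc:

/*
 * Sketch of the double-when-full growth pattern; error checking omitted,
 * not the actual xact.c code.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef uint32_t TransactionId;

static TransactionId *childXids = NULL;
static size_t nChildXids = 0;           /* slots in use */
static size_t maxChildXids = 0;         /* slots allocated */

static void
append_child_xid(TransactionId xid)
{
    if (nChildXids == maxChildXids)
    {
        /* double the allocation each time it fills up, starting small */
        maxChildXids = (maxChildXids == 0) ? 8 : maxChildXids * 2;
        childXids = realloc(childXids, maxChildXids * sizeof(TransactionId));
    }
    childXids[nChildXids++] = xid;
}

int
main(void)
{
    TransactionId xid;

    for (xid = 1000; xid < 1000 + 300000; xid++)
        append_child_xid(xid);

    /* only ~16 enlargements for 300000 appends, instead of one per append */
    printf("stored %zu xids in %zu slots\n", nChildXids, maxChildXids);

    free(childXids);
    return 0;
}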

regards, tom lane


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-12 13:43:46
Message-ID: 47D7DE12.80700@enterprisedb.com
Lists: pgsql-patches pgsql-performance

Tom Lane wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>> Elsewhere in our codebase where we use arrays that are enlarged as
>> needed, we keep track of the "allocated" size and the "used" size of the
>> array separately, and only call repalloc when the array fills up, and
>> repalloc a larger than necessary array when it does. I chose to just
>> call repalloc every time instead, as repalloc is smart enough to fall
>> out quickly if the chunk the allocation was made in is already larger
>> than the new size. There might be some gain avoiding the repeated
>> repalloc calls, but I doubt it's worth the code complexity, and calling
>> repalloc with a larger than necessary size can actually force it to
>> unnecessarily allocate a new, larger chunk instead of reusing the old
>> one. Thoughts on that?
>
> Seems like a pretty bad idea to me, as the behavior you're counting on
> only applies to chunks up to 8K or thereabouts.

Oh, you're right. Though I'm sure libc realloc has all kinds of smarts
as well, it does seem better to not rely too much on that.

> In a situation where
> you are subcommitting lots of XIDs one at a time, this is likely to have
> quite awful behavior (or at least, you're at the mercy of the local
> malloc library as to how bad it is). I'd go with the same
> double-it-each-time-needed approach we use elsewhere.

Yep, patch attached. I also changed xactGetCommittedChildren to return
the original array instead of copying it, as Alvaro suggested.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
childXids-binsearch-5.patch text/x-diff 14.1 KB

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-12 15:47:20
Message-ID: 2e78013d0803120847k2e8afbedrc39e29e5194d3c6e@mail.gmail.com
Lists: pgsql-patches pgsql-performance

On Wed, Mar 12, 2008 at 7:13 PM, Heikki Linnakangas
<heikki(at)enterprisedb(dot)com> wrote:
>
>
> Yep, patch attached. I also changed xactGetCommittedChildren to return
> the original array instead of copying it, as Alvaro suggested.
>

Any comments on the flag-based approach I suggested earlier? Am I
missing some normal scenario where it won't work well?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-12 15:57:19
Message-ID: 19902.1205337439@sss.pgh.pa.us
Lists: pgsql-patches pgsql-performance

"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:
> On Wed, Mar 12, 2008 at 7:13 PM, Heikki Linnakangas
> <heikki(at)enterprisedb(dot)com> wrote:
>> Yep, patch attached. I also changed xactGetCommittedChildren to return
>> the original array instead of copying it, as Alvaro suggested.

> Any comments on the flag based approach I suggested earlier ?

I didn't like it; it seemed overly complicated (consider dealing with
XID wraparound), and it would have problems with a slow transaction
generating a sparse set of subtransaction XIDs. I think getting rid of
the linear search will be enough to fix the performance problem.

regards, tom lane


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-12 17:02:37
Message-ID: 2e78013d0803121002o2b7a8531y1553ce66cb6d1537@mail.gmail.com
Lists: pgsql-patches pgsql-performance

On Wed, Mar 12, 2008 at 9:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>
> I didn't like it; it seemed overly complicated (consider dealing with
> XID wraparound),

We are talking about subtransactions here. I don't think we support
subtransaction wrap-around, do we ?

> and it would have problems with a slow transaction
> generating a sparse set of subtransaction XIDs.

I agree that's the worst case. But is that common? That's what I
was thinking when I proposed the alternate solution. I thought that can
happen only if most of the subtransactions abort, which again I thought
is not a normal case. But frankly I am not sure if my assumption is correct.

> I think getting rid of
> the linear search will be enough to fix the performance problem.
>

I wonder if a skewed binary search would help more ? For example,
if we know that the range of xids stored in the array is 1 to 1000 and
if we are searching for a number closer to 1000, we can break the
array into <large,small> parts instead of equal parts and then
search.

Well, maybe I am making simple things complicated ;-)

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-12 17:14:04
Message-ID: 47D80F5C.2060500@enterprisedb.com
Lists: pgsql-patches pgsql-performance

Pavan Deolasee wrote:
> On Wed, Mar 12, 2008 at 9:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> I didn't like it; it seemed overly complicated (consider dealing with
>> XID wraparound),
>
> We are talking about subtransactions here. I don't think we support
> subtransaction wrap-around, do we ?

Imagine that you start a transaction just before transaction
wrap-around, so that the top level XID is 2^31-10. Then you start 20
subtransactions. What XIDs will they get? Now how would you map those to
a bitmap?

It's certainly possible - you could index the bitmap by the offset from the
top-level transaction's XID, for example. But it does get a bit complicated.

>> and it would have problems with a slow transaction
>> generating a sparse set of subtransaction XIDs.
>
> I agree that's the worst case. But is that common? That's what I
> was thinking when I proposed the alternate solution. I thought that can
> happen only if most of the subtransactions abort, which again I thought
> is not a normal case. But frankly I am not sure if my assumption is correct.

It's not that common to have hundreds of thousands of subtransactions to
begin with..

>> I think getting rid of
>> the linear search will be enough to fix the performance problem.
>
> I wonder if a skewed binary search would help more ? For example,
> if we know that the range of xids stored in the array is 1 to 1000 and
> if we are searching for a number closer to 1000, we can break the
> array into <large,small> parts instead of equal parts and then
> search.

Possibly, but I doubt it's worth the trouble. The simple binary search
solved the performance problem well enough. In the OP's test case, with
300000 subtransactions, the patch left no measurable difference between
running the "SELECT COUNT(*)" in the same transaction as the INSERTs and
running it after a COMMIT.

> Well, maybe I am making simple things complicated ;-)

I think so :-).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-12 17:22:13
Message-ID: 23603.1205342533@sss.pgh.pa.us
Lists: pgsql-patches pgsql-performance

"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:
> On Wed, Mar 12, 2008 at 9:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> and it would have problems with a slow transaction
>> generating a sparse set of subtransaction XIDs.

> I agree thats the worst case. But is that common ? Thats what I
> was thinking when I proposed the alternate solution. I thought that can
> happen only if most of the subtransactions abort, which again I thought
> is not a normal case.

No, I was thinking of the case where other sessions are chewing up XIDs
while the lots-of-subtransactions transaction runs. If it's slow
enough, there could be very large gaps between the XIDs it acquires for
its subtransactions. So you'd have a situation where the exact same
transaction processing might or might not run out of memory depending
on what else happened meanwhile. Not a very pleasant property.

regards, tom lane


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-12 17:26:22
Message-ID: 2e78013d0803121026r227bbebdr5189be193b4a1bf5@mail.gmail.com
Lists: pgsql-patches pgsql-performance

On Wed, Mar 12, 2008 at 10:44 PM, Heikki Linnakangas
<heikki(at)enterprisedb(dot)com> wrote:

>
> Imagine that you start a transaction just before transaction
> wrap-around, so that the top level XID is 2^31-10. Then you start 20
> subtransactions. What XIDs will they get? Now how would you map those to
> a bitmap?
>

Wait. Subtransaction ids are local to a transaction and always start from 1.
See this:

/*
* reinitialize within-transaction counters
*/
s->subTransactionId = TopSubTransactionId;
currentSubTransactionId = TopSubTransactionId;

>
> It's not that common to have hundreds of thousands of subtransactions to
> begin with..

True. But that's the case we are trying to solve here :-)

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-12 17:33:47
Message-ID: 47D813FB.1000006@enterprisedb.com
Lists: pgsql-patches pgsql-performance

Pavan Deolasee wrote:
> Wait. Subtransaction ids are local to a transaction and always start from 1.
> See this:
>
> /*
> * reinitialize within-transaction counters
> */
> s->subTransactionId = TopSubTransactionId;
> currentSubTransactionId = TopSubTransactionId;

No, we're not talking about SubTransactionIds. We're talking about real
XIDs assigned to subtransactions. Subtransactions are assigned regular
XIDs as well, just like top-level transactions.

I can see where you were coming from with your suggestion now :-).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-12 17:48:50
Message-ID: 2e78013d0803121048x2bbfcee3g8f7608607b6f4dee@mail.gmail.com
Lists: pgsql-patches pgsql-performance

On Wed, Mar 12, 2008 at 11:03 PM, Heikki Linnakangas
<heikki(at)enterprisedb(dot)com> wrote:
> Subtransactions are assigned regular
> XIDs as well, just like top-level transactions.
>

Ah, got it now. I never noticed this before.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-17 02:20:39
Message-ID: 20891.1205720439@sss.pgh.pa.us
Lists: pgsql-patches pgsql-performance

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
> Yep, patch attached. I also changed xactGetCommittedChildren to return
> the original array instead of copying it, as Alvaro suggested.

Applied with minor corrections (mostly comment fixes, but there were
a couple of real mistakes).

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date: 2008-03-17 02:21:24
Message-ID: 200803170221.m2H2LOp12205@momjian.us
Lists: pgsql-patches pgsql-performance


This has been applied by Tom.

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> Tom Lane wrote:
> > "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
> >> Elsewhere in our codebase where we use arrays that are enlarged as
> >> needed, we keep track of the "allocated" size and the "used" size of the
> >> array separately, and only call repalloc when the array fills up, and
> >> repalloc a larger than necessary array when it does. I chose to just
> >> call repalloc every time instead, as repalloc is smart enough to fall
> >> out quickly if the chunk the allocation was made in is already larger
> >> than the new size. There might be some gain avoiding the repeated
> >> repalloc calls, but I doubt it's worth the code complexity, and calling
> >> repalloc with a larger than necessary size can actually force it to
> >> unnecessarily allocate a new, larger chunk instead of reusing the old
> >> one. Thoughts on that?
> >
> > Seems like a pretty bad idea to me, as the behavior you're counting on
> > only applies to chunks up to 8K or thereabouts.
>
> Oh, you're right. Though I'm sure libc realloc has all kinds of smarts
> as well, it does seem better to not rely too much on that.
>
> > In a situation where
> > you are subcommitting lots of XIDs one at a time, this is likely to have
> > quite awful behavior (or at least, you're at the mercy of the local
> > malloc library as to how bad it is). I'd go with the same
> > double-it-each-time-needed approach we use elsewhere.
>
> Yep, patch attached. I also changed xactGetCommittedChildren to return
> the original array instead of copying it, as Alvaro suggested.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com


--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +