Re: Bulk Inserts

From: Pierre Frédéric Caillaud <lists(at)peufeu(dot)com>
To: "Jeff Janes" <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bulk Inserts
Date: 2009-09-15 07:28:29
Message-ID: op.u0ae1rg8cke6l8@soyouz

> Does that heuristic change the timings much? If not, it seems like it
> would be better to keep it simple and always do the same thing, like
> log the tuples (if it is done under one WALInsertLock, which I am
> assuming it is..)

It is the logging of whole pages that makes it faster.
If you fill a page with tuples in one operation (while holding an
exclusive lock) and then insert a WAL record for each tuple, there is no
speed gain.

Inserting a full-page WAL record (since you just filled the page
completely):

- only takes WALInsertLock once instead of once per tuple
- reduces WAL traffic
- is about 2x faster in my benchmark

And inserting a "clear new page" record (if the page was previously
new/empty and the relation is fsync'd at the end):

- only takes WALInsertLock once instead of once per tuple
- reduces WAL traffic a lot
- is about 4x faster in my benchmark
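
To make the comparison concrete, here is a small standalone C model of
the three logging strategies. This is not PostgreSQL source: the lock is
a plain pthread mutex standing in for WALInsertLock, and the sizes and
per-record overhead are made-up round numbers, so only the ratios
matter.

#include <pthread.h>
#include <stdio.h>

#define PAGE_SIZE   8192
#define TUPLE_SIZE  100
#define RECORD_HDR  28          /* assumed per-record WAL overhead */

static pthread_mutex_t wal_insert_lock = PTHREAD_MUTEX_INITIALIZER;
static long wal_bytes, lock_acquisitions;

/* Stand-in for a WAL insertion: take the lock once per record. */
static void wal_insert(size_t payload_len)
{
    pthread_mutex_lock(&wal_insert_lock);
    lock_acquisitions++;
    wal_bytes += RECORD_HDR + payload_len;
    pthread_mutex_unlock(&wal_insert_lock);
}

/* Path 1: one WAL record per tuple. */
static void log_per_tuple(int ntuples)
{
    for (int i = 0; i < ntuples; i++)
        wal_insert(TUPLE_SIZE);
}

/* Path 2: fill the page first (under exclusive buffer lock in the
 * real thing), then log the whole page image as a single record. */
static void log_full_page(void)
{
    wal_insert(PAGE_SIZE);
}

/* Path 3: the page was brand new, so log only a tiny "clear new
 * page" marker and rely on fsync'ing the relation at the end. */
static void log_clear_new_page(void)
{
    wal_insert(0);
}

static void report(const char *label)
{
    printf("%-16s %ld lock acquisition(s), %ld WAL bytes\n",
           label, lock_acquisitions, wal_bytes);
    lock_acquisitions = wal_bytes = 0;
}

int main(void)
{
    int per_page = PAGE_SIZE / TUPLE_SIZE;      /* ~81 tuples */

    log_per_tuple(per_page);    report("per-tuple:");
    log_full_page();            report("full page:");
    log_clear_new_page();       report("clear new page:");
    return 0;
}

The point is just the arithmetic: the per-tuple path pays the lock and
the record header once per tuple, while the page paths pay them once per
page, and the "clear new page" record carries no payload at all.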

> Do you even need the new empty page record? I think a zero page will be
> handled correctly next time it is read into shared buffers, won't it?

I have no idea ;)

> But I
> guess it is needed to avoid problems with partial page writes that
> would leave it in a state that is neither all zeros nor consistent.

Plus, empty-page records generate very little WAL traffic, and I didn't
see any performance difference with or without them.

> If the entire page is logged, would it have to be marked as not
> removable by the log compression tool? Or can the tool recreate the
> needed delta?

No, the tool cannot recreate the data, since the idea is precisely to
replace a lot of "tuple insert" messages with one "entire page" message,
which takes both less space and less time. The warm-standby replicators
that get this WAL need to know the page contents to replicate it... (also,
it will probably be faster for them to redo a page write than redo all the
tuple inserts).

Here is what I'm thinking about now:

* have some kind of BulkInsertState which contains
- info about the relation, indexes, triggers, etc.
- a tuple queue

The tuple queue may be a tuple store, or simply tuple copies in a local
memory context.

You'd have functions to:

- Setup the BulkInsertState
- Add a tuple to the BulkInsertState
- Finish the operation and clear the BulkInsertState

When adding a tuple, it is stored in the queue.
When the queue is full, a bulk insert operation takes place (hopefully
filling an entire page) and the queue is emptied.
Post-insert triggers and index updates are also handled at this point.

When finished, the function that clears the state also inserts all
remaining tuples in the queue.
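
Here is a minimal standalone sketch of that API (all names are
hypothetical, modelled on the description above rather than on any
actual PostgreSQL code); the flush is reduced to a printf where the real
thing would do the page fill, index updates and triggers:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define QUEUE_CAPACITY 64               /* roughly one page's worth */

typedef struct
{
    char   *data;
    size_t  len;
} QueuedTuple;

typedef struct
{
    /* in the real thing: relation, index list, trigger info, ... */
    const char *relname;
    QueuedTuple queue[QUEUE_CAPACITY];
    int         nqueued;
} BulkInsertState;

/* Write out everything queued as one bulk page-fill operation;
 * post-insert triggers and index updates would also run here. */
static void bulk_flush(BulkInsertState *state)
{
    if (state->nqueued == 0)
        return;
    printf("flushing %d tuples into %s in one operation\n",
           state->nqueued, state->relname);
    for (int i = 0; i < state->nqueued; i++)
        free(state->queue[i].data);
    state->nqueued = 0;
}

static void bulk_insert_setup(BulkInsertState *state, const char *relname)
{
    state->relname = relname;
    state->nqueued = 0;
}

/* Copy the tuple into the queue; flush when a page's worth is there. */
static void bulk_insert_tuple(BulkInsertState *state,
                              const void *tuple, size_t len)
{
    QueuedTuple *qt = &state->queue[state->nqueued++];

    qt->data = malloc(len);
    memcpy(qt->data, tuple, len);
    qt->len = len;
    if (state->nqueued == QUEUE_CAPACITY)
        bulk_flush(state);
}

/* Finish: insert whatever is still queued, then clear the state. */
static void bulk_insert_finish(BulkInsertState *state)
{
    bulk_flush(state);
}

int main(void)
{
    BulkInsertState state;

    bulk_insert_setup(&state, "my_table");
    for (int i = 0; i < 200; i++)
        bulk_insert_tuple(&state, "some tuple bytes", 16);
    bulk_insert_finish(&state);
    return 0;
}

The nice property is that the caller never sees the batching: it adds
one tuple at a time, and the flush happens either when a page's worth
has accumulated or when the state is cleared at the end.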

With this you could also do something *really* interesting: bulk index
updates...

Bulk index updates are probably mutually exclusive with after-row triggers
though.

Another angle of attack would be to make WAL writing more efficient...
