Re: Bulk Inserts

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Pierre Frédéric Caillaud <lists(at)peufeu(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bulk Inserts
Date: 2009-09-15 01:55:50
Message-ID: f67928030909141855y2ff8993epe4d967a769cebb56@mail.gmail.com
Lists: pgsql-hackers

2009/9/14 Pierre Frédéric Caillaud <lists(at)peufeu(dot)com>

>
> I've done a little experiment with bulk inserts.
>
> => heap_bulk_insert()
>
> Behaves like heap_insert except it takes an array of tuples (HeapTuple
> *tups, int ntups).
>
> - Grabs a page (same as heap_insert)
>
> - While holding exclusive lock, inserts as many tuples as it can on the page.
>     - Either the page gets full
>     - Or we run out of tuples.
>
> - Generate xlog: choice between
>     - Full Xlog mode:
>         - if we inserted more than 10 tuples (totally bogus heuristic), log
>           the entire page
>         - Else, log individual tuples as heap_insert does
>

Does that heuristic change the timings much? If not, it seems like it would be
better to keep it simple and always do the same thing, like log the tuples
(if it is done under one WALInsertLock, which I am assuming it is.)

>     - Light log mode:
>         - if the page was empty, only xlog a "new empty page" record, not
>           the page contents
>         - else, log fully
>         - heap_sync() at the end
>
> - Release the page
> - If we still have tuples to insert, repeat.
>
> Am I right in assuming that:
>
> 1)
> - If the page was empty,
> - and log archiving isn't used,
> - and the table is heap_sync()'d at the end,
> => only a "new empty page" record needs to be created, then the page can be
> completely filled?
>

Do you even need the new empty page record? I think a zero page will be
handled correctly the next time it is read into shared buffers, won't it? But
I guess it is needed to avoid problems with partial page writes that would
leave the page in a state that is neither all zeros nor consistent.

> 2)
> - If the page isn't empty
> - or log archiving is used,
> => logging either the inserted tuples or the entire page is OK to guarantee
> persistence?
>

If the entire page is logged, would it have to be marked as not removable by
the log compression tool? Or can the tool recreate the needed delta?

Jeff

In response to

  • Bulk Inserts at 2009-09-14 14:26:53 from Pierre Frédéric Caillaud
