Re: backup and restore questions

From: "Sally Sally" <dedeb17(at)hotmail(dot)com>
To: scott(dot)marlowe(at)ihs(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: backup and restore questions
Date: 2004-02-20 21:46:32
Message-ID: BAY9-F21vwbogTmQm2V00062643@hotmail.com
Lists: pgsql-general

Thank you all for replying. I appreciate the tips. Apologies to those who
were offended by the html formatting.
Scott, quick question. The reason I assumed insert would be "safer" than
copy is that the docs say that in the case of copy it fails on a single
corrupted row, whereas insert won't?
Sally



From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Sally Sally <dedeb17(at)hotmail(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: backup and restore questions
Date: 2004-02-20 21:48:40
Message-ID: Pine.LNX.4.33.0402201444560.12372-100000@css120.ihs.com
Lists: pgsql-general

On Fri, 20 Feb 2004, Sally Sally wrote:

> Thank you all for replying. I appreciate the tips. Apologies to those who
> were offended by the html formatting.
> Scott, quick question. The reason I assumed insert would be "safer" than
> copy is that the docs say that in the case of copy it fails on a single
> corrupted row, whereas insert won't?

Right, but that's not exactly "safer" from a data integrity standpoint.

After all, if you insert 100M rows and one fails, you now have 99.999999%
of your data in your database. If you don't check the import for errors,
you have an incomplete load and may not know it.

Wrapping the inserts inside a begin;commit; pair is a neat trick that
both speeds things up (postgresql is MUCH faster inserting 100M rows in
one big transaction than individually) AND makes sure you don't get a
partial load.
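
Something like this, say (the table and rows here are made up, just to
show the shape of it):

  create table import_log (id integer, msg text);

  begin;
  insert into import_log (id, msg) values (1, 'first row');
  insert into import_log (id, msg) values (2, 'second row');
  -- ...however many more rows you're loading...
  insert into import_log (id, msg) values (100000000, 'last row');
  commit;

If any one of those inserts errors out, the commit never happens and
none of the rows show up, so you get the whole load or none of it.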

Partial loads are a SERIOUS issue when you're importing 10k rows a day
into a database with millions of rows. You usually want all or nothing.

And don't worry about me being offended by html formatting, my email
client, pine, automatically strips that stuff out of your emails. Of
course, others, with their primitive email clients (haha, like there's
anything more primitive than pine :-) will appreciate the change.

What's funny is that when they reply to your email, I see the html in
the stuff they quoted, but not in your original email.


From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Sally Sally <dedeb17(at)hotmail(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: backup and restore questions
Date: 2004-02-23 16:17:52
Message-ID: Pine.LNX.4.33.0402230908330.28821-100000@css120.ihs.com
Lists: pgsql-general

On Fri, 20 Feb 2004, Sally Sally wrote:

> Thank you all for replying. I appreciate the tips. Apologies to those who
> were offended by the html formatting.
> Scott, quick question. The reason I assumed insert would be "safer" than
> copy is that the docs say that in the case of copy it fails on a single
> corrupted row, whereas insert won't?

Right. What that means in plain terms, though, is that a single bad row
causes the entire import of a table to fail, while individual inserts,
which are each handled as their own transaction, can fail individually.

You can, however, edit the dump / extract a portion of it and wrap it in
a begin / commit pair. Note that postgresql will not commit any
transaction with an error, so you don't have to worry about it
accidentally committing if the data errors out.
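
For instance (again a made-up table, just to illustrate):

  create table mytable (id integer, msg text);

  begin;
  insert into mytable (id) values (1);
  insert into mytable (id) values ('oops');  -- fails: not an integer
  commit;  -- postgresql turns this into a rollback

Neither row ends up in the table, so a bad row in the middle of a
wrapped dump can't leave you with a half-finished load.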

Also, collecting as many inserts as possible into a transaction will
generally make postgresql faster, up to a point. While there's no great
gain in inserting more than a few thousand rows at a time, there's no
real harm in inserting many more (millions). Unlike Oracle, which uses
rollback segments, postgresql just adds new tuples in free disk space,
so there's no practical limit to the size of your transactions, other
than the real world issue that a transaction taking that long to insert
rows may be a problem if you need to see the data from other clients as
it comes in.

In terms of importing, it may often be that you just want the good rows,
dump the bad ones, and move on. If this is the case, individual inserts
are the best choice. This is also the slowest approach, because
postgresql must build up and tear down a transaction for each row.
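
That mode is just plain standalone inserts, e.g. (same made-up table as
above):

  insert into mytable (id) values (10);
  insert into mytable (id) values ('bad');  -- this one fails on its own
  insert into mytable (id) values (12);

Since each insert is its own transaction, rows 10 and 12 still go in
even though the middle one didn't.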

You may also have data where every row must go in, or you don't want any
of it. If this is the case, either the copy command or a series of
inserts inside the same transaction is the best choice. They are also
the fastest, with copy slightly outperforming the inserts, at least in
the past. I haven't really tested one against the other lately because
with 7.4 it's all so damned fast it only takes about 15 minutes to back
up or restore our whole database.
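
For reference, the copy flavor of a load looks roughly like this (made-up
data for the mytable above, with a '|' delimiter spelled out so it's
obvious; an actual pg_dump dump uses tabs):

  copy mytable (id, msg) from stdin with delimiter '|';
  1|first row
  2|second row
  \.

Since the whole copy is a single statement, one bad line kills the whole
thing, which is exactly the all-or-nothing behavior you'd want here.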