Backup/Restore of single table in multi TB database

From: "John Smith" <sodgodofall(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Backup/Restore of single table in multi TB database
Date: 2008-05-07 20:02:57
Message-ID: b88f0d670805071302h469141e0ke29413edddb444e0@mail.gmail.com
Lists: pgsql-general pgsql-performance

Hi,

I have a large database (multiple TBs) where I'd like to be able to do
a backup/restore of just a particular table (call it foo). Because
the database is large, the time for a full backup would be
prohibitive. Also, whatever backup mechanism we do use needs to keep
the system online (i.e., users must still be allowed to update table
foo while we're taking the backup).

After reading the documentation, it seems like the following might
work. Suppose the database has two tables foo and bar, and we're only
interested in backing up table foo:

1. Call pg_start_backup()

2. Use the pg_class table in the catalog to get the data file names
for tables foo and bar.

3. Copy the system files and the data file for foo. Skip the data file for bar.

4. Call pg_stop_backup()

5. Copy WAL files generated between 1. and 4. to another location.

Later, if we want to restore the database somewhere with just table
foo, we just use postgres's normal recovery mechanism and point it at
the files we backed up in 3. and the WAL files from 5.
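
To make this concrete, here's a rough sketch of what I'm imagining (all
paths, names, and OIDs below are invented for illustration):

    -- in psql: begin the online backup
    SELECT pg_start_backup('foo_backup');

    -- find foo's on-disk file name (its relfilenode)
    SELECT relfilenode FROM pg_class WHERE relname = 'foo';

    # from a shell: copy the system files plus foo's data file(s);
    # segments over 1GB get .1, .2, ... suffixes, hence the wildcard
    cp -r $PGDATA/global /backup/
    cp $PGDATA/base/<db_oid>/<foo_relfilenode>* /backup/base/<db_oid>/

    -- in psql: end the online backup
    SELECT pg_stop_backup();

    # finally, save the WAL segments archived between start and stop
    cp /wal_archive/* /backup/wal/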

Does anyone see a problem with this approach (e.g., correctness,
performance, etc.)? Or is there perhaps an alternative approach using
some other postgresql mechanism that I'm not aware of?

Thanks!
- John


From: "David Wilson" <david(dot)t(dot)wilson(at)gmail(dot)com>
To: "John Smith" <sodgodofall(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Backup/Restore of single table in multi TB database
Date: 2008-05-07 20:09:45
Message-ID: e7f9235d0805071309o6d6d7778r6920e5df03a55d3b@mail.gmail.com
Lists: pgsql-general pgsql-performance

On Wed, May 7, 2008 at 4:02 PM, John Smith <sodgodofall(at)gmail(dot)com> wrote:

> Does anyone see a problem with this approach (e.g., correctness,
> performance, etc.)? Or is there perhaps an alternative approach using
> some other postgresql mechanism that I'm not aware of?

Did you already look at and reject pg_dump for some reason? You can
restrict it to specific tables to dump, and it can work concurrently
with a running system. Your database is large, but how large are the
individual tables you're interested in backing up? pg_dump will be
slower than a file copy, but may be sufficient for your purpose and
will have guaranteed correctness.
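
Something along these lines, for instance (database names invented):

    # dump only table foo, in pg_dump's compressed custom format
    pg_dump -Fc -t foo mydb > foo.dump

    # later, restore just that table into another database
    pg_restore -d otherdb foo.dump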

I'm fairly certain that you have to be very careful about doing simple
file copies while the system is running, as the files may end up out
of sync based on when each individual one is copied. I haven't done it
myself, but I do know that there are a lot of caveats that someone
with more experience doing that type of backup can hopefully point you
to.

--
- David T. Wilson
david(dot)t(dot)wilson(at)gmail(dot)com


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: "John Smith" <sodgodofall(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Backup/Restore of single table in multi TB database
Date: 2008-05-07 20:11:25
Message-ID: 20080507131125.1dadb799@jd-laptop
Lists: pgsql-general pgsql-performance

On Wed, 7 May 2008 13:02:57 -0700
"John Smith" <sodgodofall(at)gmail(dot)com> wrote:

> Hi,
>
> I have a large database (multiple TBs) where I'd like to be able to do
> a backup/restore of just a particular table (call it foo). Because
> the database is large, the time for a full backup would be
> prohibitive. Also, whatever backup mechanism we do use needs to keep
> the system online (i.e., users must still be allowed to update table
> foo while we're taking the backup).

> Does anyone see a problem with this approach (e.g., correctness,
> performance, etc.)? Or is there perhaps an alternative approach using
> some other postgresql mechanism that I'm not aware of?

Why are you not just using pg_dump -t? Are you saying the pg_dump
backup of the single table takes too long? Perhaps you could use Slony
with table sets?
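
With Slony you would put just foo into its own replication set -
roughly like this (a slonik fragment from memory; cluster and node
configuration omitted):

    set add table (set id = 1, origin = 1, id = 1,
        fully qualified name = 'public.foo', comment = 'table foo');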

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: "David Wilson" <david(dot)t(dot)wilson(at)gmail(dot)com>
Cc: "John Smith" <sodgodofall(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Backup/Restore of single table in multi TB database
Date: 2008-05-07 20:16:35
Message-ID: 20080507131635.436b6085@jd-laptop
Lists: pgsql-general pgsql-performance

On Wed, 7 May 2008 16:09:45 -0400
"David Wilson" <david(dot)t(dot)wilson(at)gmail(dot)com> wrote:

> I'm fairly certain that you have to be very careful about doing simple
> file copies while the system is running, as the files may end up out
> of sync based on when each individual one is copied. I haven't done it
> myself, but I do know that there are a lot of caveats that someone
> with more experience doing that type of backup can hopefully point you
> to.

Besides the fact that it seems to be a fairly hacky thing to do... it
is going to be fragile. Consider (the column below is invented for the
example):

(serverA) CREATE TABLE foo ();
(serverB) CREATE TABLE foo ();

(serverA) INSERT INTO foo ... ;
(serverA) ALTER TABLE foo ADD COLUMN c1 integer;

Oops...

(serverA) ALTER TABLE foo DROP COLUMN c1;

You now have a different version of foo's data files on serverA than on
serverB, regardless of the table name - and, because of how dropped
columns work, even though the two tables now look identical at the SQL
level.

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: John Smith <sodgodofall(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Backup/Restore of single table in multi TB database
Date: 2008-05-07 21:28:44
Message-ID: 1210195724.4268.231.camel@ebony.site
Lists: pgsql-general pgsql-performance

On Wed, 2008-05-07 at 13:02 -0700, John Smith wrote:

> I have a large database (multiple TBs) where I'd like to be able to do
> a backup/restore of just a particular table (call it foo). Because
> the database is large, the time for a full backup would be
> prohibitive. Also, whatever backup mechanism we do use needs to keep
> the system online (i.e., users must still be allowed to update table
> foo while we're taking the backup).

Have a look at pg_snapclone. It's specifically designed to significantly
improve dump times for very large objects.

http://pgfoundry.org/projects/snapclone/

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "John Smith" <sodgodofall(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Backup/Restore of single table in multi TB database
Date: 2008-05-07 21:41:01
Message-ID: 19037.1210196461@sss.pgh.pa.us
Lists: pgsql-general pgsql-performance

"John Smith" <sodgodofall(at)gmail(dot)com> writes:
> After reading the documentation, it seems like the following might
> work. Suppose the database has two tables foo and bar, and we're only
> interested in backing up table foo:

> 1. Call pg_start_backup()

> 2. Use the pg_class table in the catalog to get the data file names
> for tables foo and bar.

> 3. Copy the system files and the data file for foo. Skip the data file for bar.

> 4. Call pg_stop_backup()

> 5. Copy WAL files generated between 1. and 4. to another location.

> Later, if we want to restore the database somewhere with just table
> foo, we just use postgres's normal recovery mechanism and point it at
> the files we backed up in 3. and the WAL files from 5.

> Does anyone see a problem with this approach

Yes: it will not work, not even a little bit, because the WAL files will
contain updates for all the tables. You can't just not have the tables
there during restore.

Why are you not using pg_dump?

regards, tom lane


From: "John Smith" <sodgodofall(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Backup/Restore of single table in multi TB database
Date: 2008-05-07 22:24:22
Message-ID: b88f0d670805071524g1e4965cbif9b48a822dba961f@mail.gmail.com
Lists: pgsql-general pgsql-performance

Hi Tom,

Actually, I forgot to mention one more detail in my original post.
For the table that we're looking to back up, we also want to be able to
do incremental backups. pg_dump will cause the entire table to be
dumped out each time it is invoked.

With the pg_{start,stop}_backup approach, incremental backups could be
implemented by just rsync'ing the data files for example and applying
the incremental WALs. So if table foo didn't change very much since
the first backup, we would only need to rsync a small amount of data
plus the WALs to get an incremental backup for table foo.
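
E.g., something along these lines (paths and host name are made up):

    # only the changed parts of foo's data file(s) need to be re-sent
    rsync -a $PGDATA/base/<db_oid>/<foo_relfilenode>* backuphost:/backup/foo/

    # plus whatever WAL has been archived since the last run
    rsync -a /wal_archive/ backuphost:/backup/wal/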

Besides picking up data on unwanted tables from the WAL (e.g., bar
would appear in our recovered database even though we only wanted
foo), do you see any other problems with this pg_{start,stop}_backup
approach? Admittedly, it does seem a bit hacky.

Thanks,
- John

On Wed, May 7, 2008 at 2:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "John Smith" <sodgodofall(at)gmail(dot)com> writes:
> > After reading the documentation, it seems like the following might
> > work. Suppose the database has two tables foo and bar, and we're only
> > interested in backing up table foo:
>
> > 1. Call pg_start_backup()
>
> > 2. Use the pg_class table in the catalog to get the data file names
> > for tables foo and bar.
>
> > 3. Copy the system files and the data file for foo. Skip the data file for bar.
>
> > 4. Call pg_stop_backup()
>
> > 5. Copy WAL files generated between 1. and 4. to another location.
>
> > Later, if we want to restore the database somewhere with just table
> > foo, we just use postgres's normal recovery mechanism and point it at
> > the files we backed up in 3. and the WAL files from 5.
>
> > Does anyone see a problem with this approach
>
> Yes: it will not work, not even a little bit, because the WAL files will
> contain updates for all the tables. You can't just not have the tables
> there during restore.
>
> Why are you not using pg_dump?
>
> regards, tom lane
>


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: John Smith <sodgodofall(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Backup/Restore of single table in multi TB database
Date: 2008-05-08 06:25:16
Message-ID: 1210227916.4268.254.camel@ebony.site
Lists: pgsql-general pgsql-performance

On Wed, 2008-05-07 at 15:24 -0700, John Smith wrote:

> Actually, I forgot to mention one more detail in my original post.
> For the table that we're looking to back up, we also want to be able to
> do incremental backups. pg_dump will cause the entire table to be
> dumped out each time it is invoked.
>
> With the pg_{start,stop}_backup approach, incremental backups could be
> implemented by just rsync'ing the data files for example and applying
> the incremental WALs. So if table foo didn't change very much since
> the first backup, we would only need to rsync a small amount of data
> plus the WALs to get an incremental backup for table foo.
>
> Besides picking up data on unwanted tables from the WAL (e.g., bar
> would appear in our recovered database even though we only wanted
> foo), do you see any other problems with this pg_{start,stop}_backup
> approach? Admittedly, it does seem a bit hacky.

You wouldn't be the first to ask to restore only a single table.

I can produce a custom version that does that if you like, though I'm
not sure that feature would be accepted into the main code.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Q Master <theqmaster(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Ubuntu question
Date: 2008-05-08 06:52:17
Message-ID: 4822A321.8030004@gmail.com
Lists: pgsql-general pgsql-performance

Hello,

I had PostgreSQL 7.4 on Ubuntu, and over one year ago I moved to 8.2.
Till now I was backing up my db via pgAdmin remotely from Windows, but
now I want to do it from the Ubuntu server.

When I run the command pg_dump it says that the database is 8.2 but the
tool is 7.4 - my question is: where in the world is the pg_dump for 8.2?
I can't find it.

pg_dump and pg_dumpall are in /usr/bin, but where are the 8.2 ones?

TIA,
Q


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Q Master <theqmaster(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Ubuntu question
Date: 2008-05-08 07:01:02
Message-ID: 20080508070102.GB16326@svana.org
Lists: pgsql-general pgsql-performance

On Thu, May 08, 2008 at 01:52:17AM -0500, Q Master wrote:
> I had PostgreSQL 7.4 on Ubuntu, and over one year ago I moved to 8.2.
> Till now I was backing up my db via pgAdmin remotely from Windows, but
> now I want to do it from the Ubuntu server.

I suggest looking at the README.Debian for postgres, it contains much
important information you need to understand how multiple concurrently
installed versions work.

> When I run the command pg_dump it says that the database is 8.2 but the
> tool is 7.4 - my question is: where in the world is the pg_dump for 8.2?
> I can't find it.
>
> pg_dump and pg_dumpall are in /usr/bin, but where are the 8.2 ones?

First, check what you have installed with pg_lsclusters (this will give
you the port number). Normally you can specify the cluster directly to
pg_dump, but if you want the actual binary, go to:

/usr/lib/postgresql/<version>/bin/pg_dump.
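
For example (version and port assumed - check the pg_lsclusters output):

    $ pg_lsclusters    # lists version, cluster name, port and status
    $ /usr/lib/postgresql/8.2/bin/pg_dump -p 5433 mydb > mydb.sql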

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.


From: Justin <justin(at)emproshunts(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [GENERAL] Ubuntu question
Date: 2008-05-08 07:47:59
Message-ID: 4822B02F.20505@emproshunts.com
Lists: pgsql-general pgsql-performance

Q Master wrote:
> Hello,
>
> I had PostgreSQL 7.4 on Ubuntu, and over one year ago I moved to 8.2.
> Till now I was backing up my db via pgAdmin remotely from Windows, but
> now I want to do it from the Ubuntu server.
>
> When I run the command pg_dump it says that the database is 8.2 but the
> tool is 7.4 - my question is: where in the world is the pg_dump for 8.2?
> I can't find it.
>
> pg_dump and pg_dumpall are in /usr/bin, but where are the 8.2 ones?
You need to download the pgcontrib package from the Ubuntu package
site. I use the GNOME package manager on Ubuntu to handle this, plus it
automatically handles the updates if any apply.

>
> TIA,
> Q
>
>
>


From: Justin <justin(at)emproshunts(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [GENERAL] Ubuntu question
Date: 2008-05-08 07:49:25
Message-ID: 4822B085.6030600@emproshunts.com
Lists: pgsql-general pgsql-performance

Sorry all - I accidentally cross-posted. Fat-fingered it.

Justin wrote:
>
>
> Q Master wrote:
>> Hello,
>>
>> I had PostgreSQL 7.4 on Ubuntu, and over one year ago I moved to 8.2.
>> Till now I was backing up my db via pgAdmin remotely from Windows, but
>> now I want to do it from the Ubuntu server.
>>
>> When I run the command pg_dump it says that the database is 8.2 but
>> the tool is 7.4 - my question is: where in the world is the pg_dump
>> for 8.2? I can't find it.
>>
>> pg_dump and pg_dumpall are in /usr/bin, but where are the 8.2 ones?
> You need to download the pgcontrib package from the Ubuntu package
> site. I use the GNOME package manager on Ubuntu to handle this, plus it
> automatically handles the updates if any apply.
>
>>
>> TIA,
>> Q
>>
>>
>>
>
>
>


From: Francisco Reyes <lists(at)stringsutils(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: John Smith <sodgodofall(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Backup/Restore of single table in multi TB database
Date: 2008-07-19 00:25:50
Message-ID: 4881348E.3050002@stringsutils.com
Lists: pgsql-general pgsql-performance

Simon Riggs wrote:
> Have a look at pg_snapclone. It's specifically designed to significantly
> improve dump times for very large objects.
>
> http://pgfoundry.org/projects/snapclone/
>
Also, in case the original poster is not aware, by default pg_dump
allows you to back up a single table. Just add -t <table name>.

Does pg_snapclone work mostly on large rows, or will it also be faster
than pg_dump for narrow tables?


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Francisco Reyes <lists(at)stringsutils(dot)com>
Cc: John Smith <sodgodofall(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Backup/Restore of single table in multi TB database
Date: 2008-07-19 09:02:32
Message-ID: 1216458152.19656.799.camel@ebony.2ndQuadrant
Lists: pgsql-general pgsql-performance


On Fri, 2008-07-18 at 20:25 -0400, Francisco Reyes wrote:

> Does pg_snapclone works mostly on large rows or will it also be faster
> than pg_dump for narrow tables?

It allows you to run your dump in multiple pieces. That's got nothing
to do with narrow or wide.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support