Skip site navigation (1) Skip section navigation (2)

Peripheral Links

Header And Logo

PostgreSQL
| The world's most advanced open source database.

Site Navigation

Search archives
  Advanced Search

Re: Moving data from one set of tables to another?


  • From: Howard Eglowstein <howard(at)yankeescientific(dot)com>
  • To: Sean Davis <sdavis2(at)mail(dot)nih(dot)gov>
  • Cc: pgsql-novice(at)postgresql(dot)org
  • Subject: Re: Moving data from one set of tables to another?
  • Date: Fri, 19 Sep 2008 10:48:08 -0400
  • Message-id: <48D3BBA8.9020008@yankeescientific.com> <text/plain>

Absolutely true, and if the data weren't stored on the same machine which is running the client, I would have worked harder to combine statements. In this case though, the server and the data are on the same machine and the client application doing the SELECT, INSERT and DELETEs is also on the same machine.

I'd like to see how to have done this with combined statements if I ever have to do it again in a different setup, but it is working well now. It's moved about 1/2 million records so far since last night.

Thanks again!

Howard

Sean Davis wrote:
On Fri, Sep 19, 2008 at 7:42 AM, Howard Eglowstein
<howard(at)yankeescientific(dot)com> wrote:
So you'd agree then that I'll need 7 SQL statements but that I could stack
the INSERT and the first SELECT if I wanted to? Cool. That's what I ended up
with in C code and it's working pretty well. I did some indexing on the
database and got the whole transaction down to about 150ms for the sequence.
I guess that's as good as it's going to get.

Keep in mind that the INSERT [...] SELECT [...] is done server-side,
so the data never goes over the wire to the client.  This is very
different than doing the select, accumulating the data, and then doing
the insert and is likely to be much faster, relatively.  150ms is
already pretty fast, but the principle of doing as much on the server
as possible is an important one when looking for efficiency,
especially when data sizes are large.

Glad to hear that it is working.

Sean

Sean Davis wrote:
On Thu, Sep 18, 2008 at 7:28 PM, Howard Eglowstein
<howard(at)yankeescientific(dot)com> wrote:

What confuses me is that I need to do the one select with all three
tables
and then do three inserts, no? The results is that the 150 fields I get
back
from the select have to be split into 3 groups of 50 fields each and then
written into three tables.

You do the insert part of the command three times, once for each new
table, so three separate SQL statements.  The select remains basically
the same for all three, with only the column selection changing
(data_a.* when inserting into new_a, data_b.* when inserting into
new_b, etc.).  Just leave the ids the same as in the first set of
tables.  There isn't a need to change them in nearly every case.  If
you need to add a new ID column, you can do that as a serial column in
the new tables, but I would stick to the original IDs, if possible.

Sean


What you're suggesting is that there is some statement which could do the
select and the three inserts at once?

Howard

Sean Davis wrote:

You might want to look at insert into ... select ...

You should be able to do this with 1 query per new table (+ the
deletes, obviously).  For a few thousand records, I would expect that
the entire process might take a few seconds.

Sean

     On Thu, Sep 18, 2008 at 6:39 PM, Howard Eglowstein
<howard(at)yankeescientific(dot)com> wrote:


Somewhat empty, yes. The single set of 'data_' tables contains 3 years
worth
of data. I want to move 2 years worth out into the 'new_' tables. When
I'm
done, there will still be 1 year's worth of data left in the original
table.

Howard

Carol Walter wrote:


What do you want for your end product?  Are the old tables empty after
you
put the data into the new tables?

Carol

On Sep 18, 2008, at 3:02 PM, Howard Eglowstein wrote:



I have three tables called 'data_a', 'data_b' and 'data_c' which each
have 50 columns. One of the columns in each is 'id' and is used to
keep
track of which data in data_b and data_c corresponds to a row in
data_a.  If
I want to get all of the data in all 150 fields for this month (for
example), I can get it with:

select * from (data_a, data_b, data_c) where data_a.id=data_b.id AND
data_a.id = data_c.id AND timestamp >= '2008-09-01 00:00:00' and
timestamp
<= '2008-09-30 23:59:59'


What I need to do is execute this search which might return several
thousand rows and write the same structure into 'new_a', 'new_b' and
'new_c'. What i'm doing now in a C program is executing the search
above.
Then I execute:

INSERT INTO data_a (timestamp, field1, field2 ...[imagine 50 of
them])
VALUES ('2008-09-01 00:00:00', 'ABC', 'DEF', ...);
Get the ID that was assigned to this row since 'id' is a serial field
and
the number is assigned sequentially. Say it comes back as '1'.
INSERT INTO data_b (id, field1, field2 ...[imagine 50 of them])
VALUES
('1', 'ABC', 'DEF', ...);
INSERT INTO data_c (id, field1, field2 ...[imagine 50 of them])
VALUES
('1', 'ABC', 'DEF', ...);

That moves a copy of the three rows of data form the three tables
into
the three separate new tables.
From the original group of tables, the id for these rows was, let's
say,
'1234'. Then I execute:

DELETE FROM data_a where id='1234';
DELETE FROM data_b where id='1234';
DELETE FROM data_c where id='1234';

That deletes the old data.

This works fine and gives me exactly what I wanted, but is there a
better
way? This is 7 SQL calls and it takes about 3 seconds per moved
record
on
our Linux box.

Any thoughts or suggestions would be appreciated.

Thanks,

Howard

--
Sent via pgsql-novice mailing list (pgsql-novice(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice


------------------------------------------------------------------------


No virus found in this incoming message.
Checked by AVG - http://www.avg.com Version: 8.0.169 / Virus Database:
270.6.21/1678 - Release Date: 9/18/2008 9:01 AM




--
Sent via pgsql-novice mailing list (pgsql-novice(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice



 ------------------------------------------------------------------------


No virus found in this incoming message.
Checked by AVG - http://www.avg.com Version: 8.0.169 / Virus Database:
270.6.21/1678 - Release Date: 9/18/2008 9:01 AM



------------------------------------------------------------------------


No virus found in this incoming message.
Checked by AVG - http://www.avg.com Version: 8.0.169 / Virus Database:
270.7.0/1679 - Release Date: 9/18/2008 5:03 PM


>
------------------------------------------------------------------------


No virus found in this incoming message.
Checked by AVG - http://www.avg.com Version: 8.0.169 / Virus Database: 270.7.0/1679 - Release Date: 9/18/2008 5:03 PM





Home | Main Index | Thread Index

Privacy Policy | About PostgreSQL
Copyright © 1996 – 2012 PostgreSQL Global Development Group