Re: Deleting vs foreign keys

Lists: pgsql-general
From: WireSpot <wirespot(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Deleting vs foreign keys
Date: 2005-10-24 21:59:27
Message-ID: b2d4b0380510241459u49c1d4d4o78b98df4e39d2484@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I have an application that makes heavy use of foreign keys all over
the tables. This is very nice since the data is very consistent. There
also this "central" table which holds "sites" in it. A site pretty
much is the crux of it all. Deleting a site will very precisely
eliminate all data regarding it, since there's CASCADE on delete's
everywhere.

The only trouble I'm having is that the original developers apparently
didn't account for large amounts of data. I'm starting to get a LOT of
data in some tables, and nowadays deleting a site will take a
disgusting amount of time (in the range of tens of minutes). It's
impossible to do it via Web, so I have to issue the central delete
from the shell and leave it running until it's done.

Is there any way I can make things better? I could queue site drops
and have a cronjob pick them up instead of deleting "live" via Web,
but that's just silly patchwork IMHO.


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: WireSpot <wirespot(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Deleting vs foreign keys
Date: 2005-10-24 23:39:18
Message-ID: 20051024233918.GA26633@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Oct 25, 2005 at 12:59:27AM +0300, WireSpot wrote:
> I have an application that makes heavy use of foreign keys all over
> the tables. This is very nice since the data is very consistent. There
> also this "central" table which holds "sites" in it. A site pretty
> much is the crux of it all. Deleting a site will very precisely
> eliminate all data regarding it, since there's CASCADE on delete's
> everywhere.
>
> The only trouble I'm having is that the original developers apparently
> didn't account for large amounts of data. I'm starting to get a LOT of
> data in some tables, and nowadays deleting a site will take a
> disgusting amount of time (in the range of tens of minutes).

Are there indexes on the foreign key columns? That is, given
the following example,

CREATE TABLE foo (id integer PRIMARY KEY);
CREATE TABLE bar (fooid integer NOT NULL REFERENCES foo ON DELETE CASCADE);

do you have an index on bar.fooid? Also, do you regularly vacuum
and analyze the database?

--
Michael Fuhr


From: WireSpot <wirespot(at)gmail(dot)com>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Deleting vs foreign keys
Date: 2005-10-25 09:18:34
Message-ID: b2d4b0380510250218w758e7d21rcca840b8fa348ce0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 10/25/05, Michael Fuhr <mike(at)fuhr(dot)org> wrote:
> Are there indexes on the foreign key columns? That is, given
> the following example,
>
> CREATE TABLE foo (id integer PRIMARY KEY);
> CREATE TABLE bar (fooid integer NOT NULL REFERENCES foo ON DELETE CASCADE);
>
> do you have an index on bar.fooid?

There are no indexes on the referring fields in any tables. Would this
make a huge difference?

> Also, do you regularly vacuum and analyze the database?

Yes.


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: WireSpot <wirespot(at)gmail(dot)com>
Cc: Michael Fuhr <mike(at)fuhr(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Deleting vs foreign keys
Date: 2005-10-25 09:25:38
Message-ID: 20051025092537.GF22318@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Oct 25, 2005 at 12:18:34PM +0300, WireSpot wrote:
> On 10/25/05, Michael Fuhr <mike(at)fuhr(dot)org> wrote:
> > Are there indexes on the foreign key columns? That is, given
> > the following example,
> >
> > CREATE TABLE foo (id integer PRIMARY KEY);
> > CREATE TABLE bar (fooid integer NOT NULL REFERENCES foo ON DELETE CASCADE);
> >
> > do you have an index on bar.fooid?
>
> There are no indexes on the referring fields in any tables. Would this
> make a huge difference?

If bar is at all large, yes. Consider if you delete something from foo.
The system has to then scan bar to find all matching fooids. If there
is no index on that column it will do a seq scan. If you delete
everything from foo, it will (sequentially) scan bar once for every row
in foo...

BTW, have you considered TRUNCATE? (although it may not work with
foreign keys).

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: WireSpot <wirespot(at)gmail(dot)com>
Cc: Postgres general mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Deleting vs foreign keys
Date: 2005-10-25 09:33:15
Message-ID: 1130232795.27587.277.camel@coppola.muc.ecircle.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 2005-10-25 at 11:18, WireSpot wrote:
> On 10/25/05, Michael Fuhr <mike(at)fuhr(dot)org> wrote:
> > Are there indexes on the foreign key columns? That is, given
> > the following example,
> >
> > CREATE TABLE foo (id integer PRIMARY KEY);
> > CREATE TABLE bar (fooid integer NOT NULL REFERENCES foo ON DELETE CASCADE);
> >
> > do you have an index on bar.fooid?
>
> There are no indexes on the referring fields in any tables. Would this
> make a huge difference?

Yes it will. Think about it: when the delete cascades, the rows which
have to be deleted in the referring table have to be found first... if
there's no index, a full table scan will be done for that. An index will
allow the delete process to quickly find the referring rows which have
to be deleted too. So it's a good practice to create indexes on fields
referring to parent tables.

Cheers,
Csaba.


From: WireSpot <wirespot(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Deleting vs foreign keys
Date: 2005-10-25 09:45:18
Message-ID: b2d4b0380510250245u491f71d7m925eeb00cf7c3e29@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 10/25/05, Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
> BTW, have you considered TRUNCATE? (although it may not work with
> foreign keys).

It doesn't :)

"TRUNCATE cannot be used if there are foreign-key references to the
table from other tables. Checking validity in such cases would require
table scans, and the whole point is not to do one."

Besides, emptying entire tables never happens, there's just selective
deletes from the "master" table.

Thanks for the tip, guys. I'll go punch in some indexes and I'll be
back to report how much of a difference it made.


From: WireSpot <wirespot(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Deleting vs foreign keys
Date: 2005-10-25 15:01:37
Message-ID: b2d4b0380510250801l7875a30dv62c0564d436f1c33@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 10/25/05, WireSpot <wirespot(at)gmail(dot)com> wrote:
> Thanks for the tip, guys. I'll go punch in some indexes and I'll be
> back to report how much of a difference it made.

Adding indexes made the dropping of entries for the "master" table
roughly 6 (six) times faster. It's definitely an improvement and I
suspect it will benefit the application in many other ways.

It's still not perfect though. Based on the figures I've got, some of
the biggest entries in the "master" table would still take about 45
minutes to delete along with all their foreign key referrals. So as
far as practical purposes are concerned, I'm back to square one.

Fortunately, in this case, all the tables hold a redundant site ID.
The only practical alternative I see is to drop all the foreign keys,
delete from all the tables based on that redundant ID, then add the
keys back and hope I don't hit any conflicts.

This is still not perfect, because while I do this there cannot be any
regular access to the database (can't afford to while foreign keys are
down). But if the drop becomes blazing fast in this manner, it would
be worth it... I guess.

Any advice about what went wrong when this database was designed? Was
there any way that the designers could've kept the consistency offered
by foreign keys and at the same time allow for fast deletes?


From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: WireSpot <wirespot(at)gmail(dot)com>
Cc: Postgres general mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Deleting vs foreign keys
Date: 2005-10-25 15:24:58
Message-ID: 1130253897.27587.286.camel@coppola.muc.ecircle.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I guess your master table has a considerable tree of child records for
each deleted record, that's why the deletion takes so long. We have this
situation too on top level object deletion. Our solution is to make it
asynchronous, i.e. the user requests it and then a background job does
it. There's no way to make such deletes blazing fast.
Other solutions would be to not delete at all, but mark the top level
record as deleted (using a deleted flag on it). That has the
disadvantage that it's children can still be accessed, unless all access
code checks the master active flag, even for child access. Other
disadvantage is that all the data still is in the data base, taking up
space, possibly slowing down operation. Now that could also be an
advantage if you discover you shouldn't have been deleted that record
just after pushing the "delete" button and receiving the confirmation
that everything was deleted...

Cheers,
Csaba.

On Tue, 2005-10-25 at 17:01, WireSpot wrote:
> On 10/25/05, WireSpot <wirespot(at)gmail(dot)com> wrote:
> > Thanks for the tip, guys. I'll go punch in some indexes and I'll be
> > back to report how much of a difference it made.
>
> Adding indexes made the dropping of entries for the "master" table
> roughly 6 (six) times faster. It's definitely an improvement and I
> suspect it will benefit the application in many other ways.
>
> It's still not perfect though. Based on the figures I've got, some of
> the biggest entries in the "master" table would still take about 45
> minutes to delete along with all their foreign key referrals. So as
> far as practical purposes are concerned, I'm back to square one.
>
> Fortunately, in this case, all the tables hold a redundant site ID.
> The only practical alternative I see is to drop all the foreign keys,
> delete from all the tables based on that redundant ID, then add the
> keys back and hope I don't hit any conflicts.
>
> This is still not perfect, because while I do this there cannot be any
> regular access to the database (can't afford to while foreign keys are
> down). But if the drop becomes blazing fast in this manner, it would
> be worth it... I guess.
>
> Any advice about what went wrong when this database was designed? Was
> there any way that the designers could've kept the consistency offered
> by foreign keys and at the same time allow for fast deletes?
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings


From: WireSpot <wirespot(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Deleting vs foreign keys
Date: 2005-10-25 16:44:16
Message-ID: b2d4b0380510250944m5d369a8bx7c460f2032e3a842@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 10/25/05, Csaba Nagy <nagy(at)ecircle-ag(dot)com> wrote:
> I guess your master table has a considerable tree of child records for
> each deleted record, that's why the deletion takes so long. We have this
> situation too on top level object deletion. Our solution is to make it
> asynchronous, i.e. the user requests it and then a background job does
> it. There's no way to make such deletes blazing fast.

It's a bit puzzling, frankly, to see a nice example of engineering
(which is what I consider a consistent DB design) foiled by a
performance and practical issue. :(

That's it then. I'll ask the designers of the application to consider
implementing the background delete, since it's obvious I can't solve
it satisfactorily solely on the DB side.


From: Richard Huxton <dev(at)archonet(dot)com>
To: WireSpot <wirespot(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Deleting vs foreign keys
Date: 2005-10-25 17:11:49
Message-ID: 435E6755.9030008@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

WireSpot wrote:
> On 10/25/05, Csaba Nagy <nagy(at)ecircle-ag(dot)com> wrote:
>
>>I guess your master table has a considerable tree of child records for
>>each deleted record, that's why the deletion takes so long. We have this
>>situation too on top level object deletion. Our solution is to make it
>>asynchronous, i.e. the user requests it and then a background job does
>>it. There's no way to make such deletes blazing fast.
>
>
> It's a bit puzzling, frankly, to see a nice example of engineering
> (which is what I consider a consistent DB design) foiled by a
> performance and practical issue. :(

Don't kid yourself that there's a solution to this. If you've got a site
with a lot of data in a lot of dependent tables, then have to expend a
certain amount of effort to delete them.

Anything PG did to make your delete faster would almost certainly slow
down updates/inserts/selects. Managing a lot of data takes a lot of effort.

--
Richard Huxton
Archonet Ltd