Delete performance

Lists: pgsql-admin
From: adey <adey11(at)gmail(dot)com>
To: Pgsql-Admin <pgsql-admin(at)postgresql(dot)org>
Subject: Delete performance
Date: 2006-02-21 07:02:53
Message-ID: 1c66bda80602202302u2f6f6bc0jcfc3c728940ad2d0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Please give me some guidance?
We are attempting many deletes in our production database for the first
time, and we're getting nowhere fast.
The SQL runs for more than 12 hours to delete 2 million rows, and hasn't
finished each time we've tried it as we've had to cancel it.
I have tried running queries for locks, current activity, and buffer hits. I
can see row locks on the affected tables for the delete PID, but no
significant buffer hits or changes in row numbers while it is running. We
have fsync set to default (true) with default 8 buffers. Postgres 7.4.2 is
running on Debian on a 4 processor server with 4gb RAM. TOP shows cache
increasing slowly, and postmaster using at least 1 CPU 100%. pg_clog files
swap about every 4 hours. We Vacuum (no parms) and ANALYZE daily, but no
VACUUM FULL for months. Delete is being performed on a parent table of 11
million rows, related to 5 child tables by foreign keys with ON DELETE
CASCADE. We have followed previous advice in this forum and tweaked /
increased the "famous" performance parameters in v7 such as
effective_cache_size, vacuum_mem and buffer size with associated SHMMAX
increase.

Where to next please?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: adey <adey11(at)gmail(dot)com>
Cc: Pgsql-Admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Delete performance
Date: 2006-02-21 15:01:27
Message-ID: 15417.1140534087@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

adey <adey11(at)gmail(dot)com> writes:
> We are attempting many deletes in our production database for the first
> time, and we're getting nowhere fast.
> The SQL runs for more than 12 hours to delete 2 million rows, and hasn't
> finished each time we've tried it as we've had to cancel it.

The usual cause of slow deletes is that (a) the table is the target of
some foreign key references from other large tables, and (b) the
referencing columns in those tables aren't indexed, or (in older PG
versions such as 7.4) aren't exactly the same datatype as the master
column. This forces the FK actions to use inefficient sequential-scan
plans. Fix the index situation and then start a fresh session to ensure
you have fresh FK-action plans.

Please also think *hard* about running something more modern than 7.4.2.
That release series is at 7.4.12 --- you are missing nearly two years'
worth of critical bug fixes.

regards, tom lane


From: Arnau <arnaulist(at)andromeiberica(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: Delete performance
Date: 2006-02-22 09:09:50
Message-ID: 43FC2A5E.4080601@andromeiberica.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Hi all,

> The usual cause of slow deletes is that (a) the table is the target of
> some foreign key references from other large tables, and (b) the
> referencing columns in those tables aren't indexed.

This is a thing I don't understand, as far as I know the foreign keys
references to primary keys and postgresql creates itself and index over
the primary key, so those columns always should be indexed. Taking into
account Tom's observation I'm missing something, could you explain it to
all of us :)

Thanks
--
Arnau


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: arnaulist(at)andromeiberica(dot)com
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Delete performance
Date: 2006-02-22 14:31:41
Message-ID: 11337.1140618701@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Arnau <arnaulist(at)andromeiberica(dot)com> writes:
>> The usual cause of slow deletes is that (a) the table is the target of
>> some foreign key references from other large tables, and (b) the
>> referencing columns in those tables aren't indexed.

> This is a thing I don't understand, as far as I know the foreign keys
> references to primary keys and postgresql creates itself and index over
> the primary key, so those columns always should be indexed. Taking into
> account Tom's observation I'm missing something, could you explain it to
> all of us :)

The referencED column is forced to have an index. The referencING
column is not. The cases where you need an index on the latter are
precisely updates/deletes of the referencED column.

In the old version you are using you can also get burnt by datatype
mismatches --- the foreign key mechanism will allow that as long as
it can find an equality operator for the two types, but that equality
operator might not be indexable.

regards, tom lane


From: Arnau <arnaulist(at)andromeiberica(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Delete performance
Date: 2006-02-23 10:27:43
Message-ID: 43FD8E1F.8020704@andromeiberica.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Hi all,

Maybe the direction this thread has taken is a bit out of the scope
of this mailing list, but I think it's very interesting and can be
useful for newbie users.

>
>>>The usual cause of slow deletes is that (a) the table is the target of
>>>some foreign key references from other large tables, and (b) the
>>>referencing columns in those tables aren't indexed.
>
>
>> This is a thing I don't understand, as far as I know the foreign keys
>>references to primary keys and postgresql creates itself and index over
>>the primary key, so those columns always should be indexed. Taking into
>>account Tom's observation I'm missing something, could you explain it to
>>all of us :)
>
>
> The referencED column is forced to have an index. The referencING
> column is not. The cases where you need an index on the latter are
> precisely updates/deletes of the referencED column.
>
> In the old version you are using you can also get burnt by datatype
> mismatches --- the foreign key mechanism will allow that as long as
> it can find an equality operator for the two types, but that equality
> operator might not be indexable.

Lets put an example

CREATE TABLE departments
(
id INT2
CONSTRAINT pk_dept_id PRIMARY KEY,
name VARCHAR(50)
CONSTRAINT nn_dept_name NOT NULL
);

CREATE TABLE users
(
id INT8
CONSTRAINT pk_users_id PRIMARY KEY,
name VARCHAR(50)
CONSTRAINT nn_users_name NOT NULL,
department_id INT2
CONSTRAINT fk_users_deptid REFERENCES departments(id)
CONSTRAINT nn_users_deptid NOT NULL
)

Do we should create the following index?

CREATE INDEX idx_users_deptid ON users(department_id)

Could we say as rule of thumb the following: "Create an index for
each table's foreign key"?

Regards
--
Arnau


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: arnaulist(at)andromeiberica(dot)com
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Delete performance
Date: 2006-02-23 16:35:00
Message-ID: 22825.1140712500@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Arnau <arnaulist(at)andromeiberica(dot)com> writes:
>> The referencED column is forced to have an index. The referencING
>> column is not. The cases where you need an index on the latter are
>> precisely updates/deletes of the referencED column.

> Lets put an example

> CREATE TABLE departments
> (
> id INT2
> CONSTRAINT pk_dept_id PRIMARY KEY,
> name VARCHAR(50)
> CONSTRAINT nn_dept_name NOT NULL
> );

> CREATE TABLE users
> (
> id INT8
> CONSTRAINT pk_users_id PRIMARY KEY,
> name VARCHAR(50)
> CONSTRAINT nn_users_name NOT NULL,
> department_id INT2
> CONSTRAINT fk_users_deptid REFERENCES departments(id)
> CONSTRAINT nn_users_deptid NOT NULL
> )

> Do we should create the following index?

> CREATE INDEX idx_users_deptid ON users(department_id)

Yes, if you are concerned about the performance of updates/deletes on
the departments table. The reason the system doesn't make such an index
automatically is that there are common scenarios where you seldom or
never update the master table, and so the index wouldn't repay the cost
it creates for updates of the slave table.

regards, tom lane


From: Arnau <arnaulist(at)andromeiberica(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Delete performance
Date: 2006-02-23 16:45:12
Message-ID: 43FDE698.4060707@andromeiberica.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

> Yes, if you are concerned about the performance of updates/deletes on
> the departments table. The reason the system doesn't make such an index
> automatically is that there are common scenarios where you seldom or
> never update the master table, and so the index wouldn't repay the cost
> it creates for updates of the slave table.

Thanks for the explanations :-)

regards
--
Arnau