Re: Running update in chunks?

Lists: pgsql-general
From: "Kevin Grittner" <kgrittn(at)mail(dot)com>
To: "Richard Huxton" <dev(at)archonet(dot)com>,pgsql-general(at)postgresql(dot)org
Subject: Re: Running update in chunks?
Date: 2013-01-21 19:06:03
Message-ID: 20130121190604.120600@gmx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Richard Huxton wrote:

> The only differences I can think of are WAL logging (transaction
> log) and index updates (the temp table has no indexes).

What about foreign keys? Are there any tables which reference the
updated column in a foreign key declaration? Do they have indexes
on that column?

-Kevin


From: Tim Uckun <timuckun(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Running update in chunks?
Date: 2013-01-21 20:09:31
Message-ID: CAGuHJrMPKf-E6wb40F7UCWFxFA9=6mi4e=N8OHqBnZGpqS50qQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Just to close this up and give some guidance to future googlers...

There are no foreign keys. The table is empty when I start. I copy the
data into it from a text file.

Removing indexes didn't help much (made a very slight difference).

running a query CREATE TEMP TABLE tt AS SELECT .... using a massive
join takes about 8 seconds. I presume that's the baseline for the disk
and RAM given my current postgres configuration. Note that this is
not a satisfactory option for me because I can't do what I want in one
step (the update I specified is one of many).

running a very simple update "UPDATE imports set make_id = null"
takes over 50 seconds so that's the minimum amount of time any update
is going to take.

Running a complex update where I join all the tables together and
update all the fields takes about 106 seconds.

Just running a complex select with the joins takes no time at all.

I tried chunking the updates using chunks of 100 records and 1000
records (where ID between X and Y repeatedly) and it was even slower.

Conclusion. Updates on postgres are slow (given the default
postgresql.conf). I presume this is due to MVCC or the WAL or
something and there are probably some things I can do to tweak the
conf file to make them go faster but out of the box running an update
on a table with lots of rows is going to cost you a lot. Removing the
indexes doesn't help that much.

Suggestion for the PG team. Deliver a more realistic postgres.conf by
default. The default one seems to be aimed at ten year old PCs with
very little RAM and disk space. At least deliver additional conf files
for small, medium, large, huge setups.


From: Richard Huxton <dev(at)archonet(dot)com>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Running update in chunks?
Date: 2013-01-21 21:23:50
Message-ID: 50FDB1E6.7040400@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 21/01/13 20:09, Tim Uckun wrote:
> Just to close this up and give some guidance to future googlers...
Careful, future googlers.
> Conclusion. Updates on postgres are slow
Nope.

> (given the default
> postgresql.conf). I presume this is due to MVCC or the WAL or
> something and there are probably some things I can do to tweak the
> conf file to make them go faster but out of the box running an update
> on a table with lots of rows is going to cost you a lot.
Unlikely. Do you really think that a PostgreSQL installation typically
runs 100 times slower on updates than inserts and every other user has
just said "oh, that's ok then"? Or is it more likely that something
peculiar is broken on your setup.

> Removing the indexes doesn't help that much.
>
> Suggestion for the PG team. Deliver a more realistic postgres.conf by
> default. The default one seems to be aimed at ten year old PCs with
> very little RAM and disk space. At least deliver additional conf files
> for small, medium, large, huge setups.
--
Richard Huxton


From: patrick keshishian <pkeshish(at)gmail(dot)com>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Running update in chunks?
Date: 2013-01-21 21:31:55
Message-ID: CAN0yQBogobRJPHsuPwTh3v5qZtphhw3YUc5iNp0gQrtQCeEtbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Jan 21, 2013 at 1:23 PM, Richard Huxton <dev(at)archonet(dot)com> wrote:
> On 21/01/13 20:09, Tim Uckun wrote:
>>
>> Just to close this up and give some guidance to future googlers...
>
> Careful, future googlers.
>
>> Conclusion. Updates on postgres are slow
>
> Nope.
>
>
>> (given the default
>> postgresql.conf). I presume this is due to MVCC or the WAL or
>> something and there are probably some things I can do to tweak the
>> conf file to make them go faster but out of the box running an update
>> on a table with lots of rows is going to cost you a lot.
>
> Unlikely. Do you really think that a PostgreSQL installation typically runs
> 100 times slower on updates than inserts and every other user has just said
> "oh, that's ok then"? Or is it more likely that something peculiar is broken
> on your setup.
>
>
>> Removing the indexes doesn't help that much.
>>
>> Suggestion for the PG team. Deliver a more realistic postgres.conf by
>> default. The default one seems to be aimed at ten year old PCs with
>> very little RAM and disk space. At least deliver additional conf files
>> for small, medium, large, huge setups.

I'd be curious to see results of the same "update" on a standard HDD
vs the SSD, and maybe on a more typical database deployment hardware
vs a macbook air.

--patrick


From: Tim Uckun <timuckun(at)gmail(dot)com>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Running update in chunks?
Date: 2013-01-21 21:45:24
Message-ID: CAGuHJrMSXQePXC0Lg6ztLO3ZADSBsdyr8gPHKZP4BrLPxK98hQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

>
> Nope.

If you have any suggestions I am all ears. For the purposes of this
discussion we can narrow down the problem this update statement.

Update imports set make_id = null.

There are 98K records in the table. There is no index on the make_id
field. Standard untouched postgresql.conf from the brew install of
postgres.

> Unlikely. Do you really think that a PostgreSQL installation typically runs
> 100 times slower on updates than inserts and every other user has just said
> "oh, that's ok then"? Or is it more likely that something peculiar is broken
> on your setup.

I really don't know. That's why I am here asking. I don't think
anything particular is broken with my system. As mentioned above the
setup is really simple. Standard postgres install, the default conf
file, update one field on one table. It takes fifty plus seconds.

I concede that if I was to go into the postgres.conf and make some
changes it will probably run faster (maybe much faster) but I wanted
to exhaust other factors before I went messing with the default
install.


From: Tim Uckun <timuckun(at)gmail(dot)com>
To: patrick keshishian <pkeshish(at)gmail(dot)com>
Cc: Richard Huxton <dev(at)archonet(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Running update in chunks?
Date: 2013-01-21 21:48:36
Message-ID: CAGuHJrNk7gMQMTJ-PQiXAwb6mXM+x7rRLkhnf7ONs2J7fQM=2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> I'd be curious to see results of the same "update" on a standard HDD
> vs the SSD, and maybe on a more typical database deployment hardware
> vs a macbook air.
>

I haven't tried it on any other machine yet. CREATE TEMP TABLE tt as
SELECT ... takes eight seconds so presumably the disk is not the
choke point.


From: patrick keshishian <pkeshish(at)gmail(dot)com>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Running update in chunks?
Date: 2013-01-21 22:07:53
Message-ID: CAN0yQBruwRuETXecK3JZRewXjyWNWr-r+KDD3F0aBK4xw40eVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Jan 21, 2013 at 1:48 PM, Tim Uckun <timuckun(at)gmail(dot)com> wrote:
>> I'd be curious to see results of the same "update" on a standard HDD
>> vs the SSD, and maybe on a more typical database deployment hardware
>> vs a macbook air.
>>
>
>
> I haven't tried it on any other machine yet. CREATE TEMP TABLE tt as
> SELECT ... takes eight seconds so presumably the disk is not the
> choke point.

you are making an assumption that a fresh write is the same as a
re-write. try the test.

--patrick