Re: Number of Columns and Update

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Robert DiFalco <robert(dot)difalco(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Number of Columns and Update
Date: 2014-12-22 21:40:13
Message-ID: 54988FBD.1050506@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 12/22/2014 10:53 PM, Robert DiFalco wrote:
> This may fall into the category of over-optimization but I've become
> curious.
>
> I have a user table with about 14 columns that are all 1:1 data - so they
> can't be normalized.
>
> When I insert a row all columns need to be set. But when I update, I
> sometimes only update 1-2 columns at a time. Does the number of columns
> impact update speed?
>
> For example:
> UPDATE users SET email = ? WHERE id = ?;

Yes, the number of columns in the table matters. The update is just as
expensive regardless of how many of the columns you update.

When a row is updated, PostgreSQL creates a new version of the whole
row. The new row version takes more space when the table has more
columns, leading to more bloating of the table, which generally slows
things down. In most applications the difference isn't big enough to
matter, but it can be significant if you have very wide rows, and you
update a lot.

PostgreSQL 9.4 made an improvement on this. In earlier versions, the new
row version was also included completely in the WAL record, which added
overhead. In 9.4, any columns at the beginning or end of the row that
are not modified are left out of the WAL record, as long as the new row
version is stored on the same page as the old one (which is common). For
updating a single column, or a few columns that are next to each other,
that's the same as saying that only the modified part of the row is
WAL-logged.

> I can easily break this up into logical tables like user_profile,
> user_credential, user_contact_info, user_summary, etc with each table only
> having 1-4 columns. But with the multiple tables I would often be joining
> them to bring back a collection of columns.

That would help with the above-mentioned issues, but dealing with
multiple tables certainly adds a lot of overhead too. Most likely you're
better off just having the single table, after all.

- Heikki

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Andrew Dunstan 2014-12-22 21:41:54 Re: Number of Columns and Update
Previous Message Robert DiFalco 2014-12-22 20:53:03 Number of Columns and Update