pg_dump insert with column names speedup

Lists: pgsql-hackers
From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_dump insert with column names speedup
Date: 2013-10-04 14:23:52
Message-ID: CAApHDvrORPUhLipG8-cbThnKPSq3QKCH7Osq9yJZsN00+kBRmA@mail.gmail.com

Here's a small patch which greatly increases the speed of
pg_dump --column-inserts.

Before this patch, a string was being built for the names of the
columns for each row in the table. I've now changed this so that it is
built only once per table.
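
As a rough illustration of the idea above (not the patch itself), the sketch below builds the column list once and reuses it for every row. The names build_column_list and dump_rows are hypothetical, the fixed-size buffer is a simplification, and identifier quoting, which the real pg_dump has to handle, is left out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "(col1, col2, ...)" into buf.
 * Returns the string length, or -1 if it does not fit.
 */
static int
build_column_list(char *buf, size_t bufsz, const char *const *cols, int ncols)
{
    size_t used = 0;
    int    i;

    if (bufsz < 3)              /* need room for "()" plus the terminator */
        return -1;
    buf[used++] = '(';
    for (i = 0; i < ncols; i++)
    {
        int n = snprintf(buf + used, bufsz - used, "%s%s",
                         i > 0 ? ", " : "", cols[i]);

        if (n < 0 || (size_t) n >= bufsz - used)
            return -1;          /* output would have been truncated */
        used += (size_t) n;
    }
    if (used + 2 > bufsz)       /* room for ")" plus the terminator */
        return -1;
    buf[used++] = ')';
    buf[used] = '\0';
    return (int) used;
}

/*
 * Emit one INSERT per row. The column list is invariant for the table,
 * so it is built before the loop instead of inside it.
 * Returns the number of rows written, or -1 on error.
 */
static int
dump_rows(FILE *out, const char *table, const char *const *cols, int ncols,
          const char *const *row_values, int nrows)
{
    char collist[256];
    int  r;

    if (build_column_list(collist, sizeof collist, cols, ncols) < 0)
        return -1;
    for (r = 0; r < nrows; r++)
    {
        if (fprintf(out, "INSERT INTO %s %s VALUES (%s);\n",
                    table, collist, row_values[r]) < 0)
            return -1;
    }
    return nrows;
}
```

For a million-row table this replaces a million identical string constructions with one.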

I also did some cleanup to avoid calling the va_args function to write
to the backup file when it is not needed; it now uses the puts-type
function instead, which should be faster.
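
The same distinction exists in plain stdio, which is used here as a stand-in for pg_dump's own archive-writing functions (an assumption made for illustration only). A fixed string needs no format parsing, so the puts-style call is enough; the variadic call is kept only where a value actually has to be formatted. The helper names are hypothetical.

```c
#include <stdio.h>

/* Write a fixed string: no format string is parsed, so fputs suffices. */
static int
write_fixed(FILE *out, const char *s)
{
    return fputs(s, out) == EOF ? -1 : 0;
}

/* Write a value that really needs formatting: the variadic call is justified. */
static int
write_long(FILE *out, long v)
{
    return fprintf(out, "%ld", v) < 0 ? -1 : 0;
}

/* Emit "<prefix><id>);" using the cheaper call wherever possible. */
static int
write_row(FILE *out, const char *prefix, long id)
{
    if (write_fixed(out, prefix) < 0)
        return -1;
    if (write_long(out, id) < 0)
        return -1;
    return write_fixed(out, ");\n");
}
```

A side benefit of the puts-style call is that a '%' character in the string is written as-is rather than being interpreted as a conversion.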

Here are some benchmark results on how it performs:

*** patched ***
D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5850 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5790 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5700 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5760 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 5730 ms

*** unpatched ***

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9580 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9330 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9250 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9230 ms

D:\9.4\bin>timer pg_dump --column-inserts -f d:\postgres_backup.sql postgres
Ran for 9240 ms

The database contained a single 1 million record table with 6 columns.
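
Averaging the five runs of each set above gives 5766 ms patched against 9326 ms unpatched, which is about 38% less wall-clock time, or a speedup of roughly 1.6x. A small sketch of that arithmetic follows; mean_ms is a hypothetical helper and the arrays simply repeat the timings listed above.

```c
#include <stddef.h>

/* Timings copied from the benchmark runs above, in milliseconds. */
static const double patched_ms[]   = {5850, 5790, 5700, 5760, 5730};
static const double unpatched_ms[] = {9580, 9330, 9250, 9230, 9240};

/* Arithmetic mean of n timings; returns 0 for an empty set. */
static double
mean_ms(const double *runs, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += runs[i];
    return n > 0 ? sum / (double) n : 0.0;
}

/* Ratio of unpatched to patched mean time (about 1.6 for these runs). */
static double
speedup(void)
{
    return mean_ms(unpatched_ms, 5) / mean_ms(patched_ms, 5);
}
```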

Regards

David Rowley

Attachment: pg_dump_colinsert_v0.1.patch (application/octet-stream, 2.9 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump insert with column names speedup
Date: 2013-10-04 15:28:57
Message-ID: 45954.1380900537@sss.pgh.pa.us

David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> Here's a small patch which greatly increases the speed of
> pg_dump --column-inserts.

The reason why no one's paid any attention to the speed of that code path
is that if you care about dump/restore speed, you should be using the COPY
code paths instead. Is it really worth adding code and complexity to
pg_dump for this?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump insert with column names speedup
Date: 2013-10-04 17:21:16
Message-ID: CA+TgmoZUtmcjsnDuEMJSi3_WSW1T242turZixXmdtUp28H0xSg@mail.gmail.com

On Fri, Oct 4, 2013 at 11:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> David Rowley <dgrowleyml(at)gmail(dot)com> writes:
>> Here's a small patch which greatly increases the speed of
>> pg_dump --column-inserts.
>
> The reason why no one's paid any attention to the speed of that code path
> is that if you care about dump/restore speed, you should be using the COPY
> code paths instead. Is it really worth adding code and complexity to
> pg_dump for this?

One possible reason to care about this is if you're trying to move
data to another database. The INSERT format is more portable.

Also, this isn't really adding any net code or complexity AFAICS.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump insert with column names speedup
Date: 2013-10-04 17:39:25
Message-ID: 20131004173925.GW2706@tamriel.snowman.net

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Fri, Oct 4, 2013 at 11:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> >> Here's a small patch which greatly increases the speed of
> >> pg_dump --column-inserts.
> >
> > The reason why no one's paid any attention to the speed of that code path
> > is that if you care about dump/restore speed, you should be using the COPY
> > code paths instead. Is it really worth adding code and complexity to
> > pg_dump for this?
>
> One possible reason to care about this is if you're trying to move
> data to another database. The INSERT format is more portable.
>
> Also, this isn't really adding any net code or complexity AFAICS.

Agreed- this looks more like a "gee, that makes a lot of sense" than a
"wow, that's way more complicated". Not a whole lot of point in
building up a known-to-be-constant string on every iteration of the
loop.

Thanks,

Stephen


From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump insert with column names speedup
Date: 2013-10-04 23:52:31
Message-ID: CAApHDvq7qxY3FwHHuFiHw0q_Qdqou71cX6eh8Njun5ym0NqwHA@mail.gmail.com

On Sat, Oct 5, 2013 at 6:39 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> * Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> > On Fri, Oct 4, 2013 at 11:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> > >> Here's a small patch which greatly increases the speed of
> > >> pg_dump --column-inserts.
> > >
> > > The reason why no one's paid any attention to the speed of that code path
> > > is that if you care about dump/restore speed, you should be using the COPY
> > > code paths instead. Is it really worth adding code and complexity to
> > > pg_dump for this?
> >
> > One possible reason to care about this is if you're trying to move
> > data to another database. The INSERT format is more portable.
> >
> > Also, this isn't really adding any net code or complexity AFAICS.
>
> Agreed- this looks more like a "gee, that makes a lot of sense" than a
> "wow, that's way more complicated". Not a whole lot of point in
> building up a known-to-be-constant string on every iteration of the
> loop.
>
>
These words made me think that the changes I made did not go quite far
enough to satisfy what you said.
I understand that most people won't use the --column-inserts feature, but
that's not really a great reason to leave wasteful, not very clever code
in there. This fruit was so low hanging it was pretty much touching the
ground, so I couldn't resist fixing it when I saw it.

The attached revised patch goes a little further and prepares everything
that is constant when processing the first row; this now includes the "INSERT
INTO tablename " part. I don't think the changes make the code any harder
to read; the code which builds the staticStmt fits on my small laptop
screen.
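
A minimal sketch of that "build it on the first row" approach is shown below. It is not the patch: the real code uses a PQExpBuffer named staticStmt, whereas the TableDump struct, get_static_stmt and emit_row here are hypothetical, plain malloc'd strings are used instead, and identifier quoting is omitted.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-table state; static_stmt stays NULL until the first row. */
typedef struct
{
    const char        *table;
    const char *const *cols;
    int                ncols;
    char              *static_stmt; /* "INSERT INTO t (a, b) VALUES (" */
} TableDump;

/* Build the constant statement prefix on first use, then reuse it. */
static const char *
get_static_stmt(TableDump *td)
{
    if (td->static_stmt == NULL)
    {
        size_t len = strlen("INSERT INTO ") + strlen(td->table) +
                     strlen(" (") + strlen(") VALUES (") + 1;
        char  *p;
        int    i;

        for (i = 0; i < td->ncols; i++)
            len += strlen(td->cols[i]) + 2; /* name plus ", " (upper bound) */

        p = malloc(len);
        if (p == NULL)
            return NULL;

        strcpy(p, "INSERT INTO ");
        strcat(p, td->table);
        strcat(p, " (");
        for (i = 0; i < td->ncols; i++)
        {
            if (i > 0)
                strcat(p, ", ");
            strcat(p, td->cols[i]);
        }
        strcat(p, ") VALUES (");
        td->static_stmt = p;
    }
    return td->static_stmt;
}

/* Per row, only the values differ; everything else is the cached prefix. */
static int
emit_row(FILE *out, TableDump *td, const char *values)
{
    const char *prefix = get_static_stmt(td);

    if (prefix == NULL)
        return -1;
    if (fputs(prefix, out) == EOF || fputs(values, out) == EOF ||
        fputs(");\n", out) == EOF)
        return -1;
    return 0;
}
```

The caller is responsible for freeing static_stmt once the table has been dumped.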

The timings with my benchmark look something like:

Unpatched: 9200 ms
Version 0.1: 5700 ms
Version 0.2: 5250 ms

So it does shave off a bit more, for what it's worth.

David

> Thanks,
>
> Stephen
>

Attachment: pg_dump_colinsert_v0.2.patch (application/octet-stream, 3.6 KB)

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump insert with column names speedup
Date: 2013-11-15 20:04:37
Message-ID: 52867E55.8050606@gmx.net

src/bin/pg_dump/pg_dump.c:1604: trailing whitespace.
+ staticStmt = createPQExpBuffer();
src/bin/pg_dump/pg_dump.c:1612: trailing whitespace.
+ else
src/bin/pg_dump/pg_dump.c:1614: trailing whitespace.
+ if (column_inserts)


From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump insert with column names speedup
Date: 2013-11-15 21:05:10
Message-ID: CAApHDvoZKYKNDFRbNcmzrTaQe7Gjx2Nbu=rjCE1HQvZaGMaztg@mail.gmail.com

On Sat, Nov 16, 2013 at 9:04 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:

> src/bin/pg_dump/pg_dump.c:1604: trailing whitespace.
> + staticStmt = createPQExpBuffer();
> src/bin/pg_dump/pg_dump.c:1612: trailing whitespace.
> + else
> src/bin/pg_dump/pg_dump.c:1614: trailing whitespace.
> + if (column_inserts)
>

The trailing whitespace is fixed in the attached patch.

Regards

David Rowley

Attachment: pg_dump_colinsert_v0.3.patch (application/octet-stream, 3.6 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump insert with column names speedup
Date: 2013-11-15 23:03:27
Message-ID: 27744.1384556607@sss.pgh.pa.us

David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> The trailing whitespace is fixed in the attached patch.

Applied with minor cosmetic adjustments.

regards, tom lane