Re: Producer/Consumer Issues in the COPY across network

Lists: pgsql-hackers
From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Producer/Consumer Issues in the COPY across network
Date: 2008-02-26 11:00:33
Message-ID: 1204023633.4252.225.camel@ebony.site

I'm looking at ways to reduce the number of network calls and/or the
waiting time while we perform network COPY.

The COPY calls in libpq allow asynchronous actions, yet are coded in a
synchronous manner in pg_dump, Slony and psql \copy.

Does anybody have any experience with running COPY in asynchronous
mode?

When we're running a COPY over a high latency link then network time is
going to become dominant, so potentially, running COPY asynchronously
might help performance for loads or initial Slony configuration. This is
potentially more important on Slony where we do both a PQgetCopyData()
and PQputCopyData() in a tight loop.

I also note that PQgetCopyData always returns just one row. Is there an
underlying buffering between the protocol (which always sends one
message per row) and libpq (which is one call per row)? It seems
possible for us to request a number of rows from the server up to a
preferred total transfer size.

PQputCopyData seems to be more efficient with smaller rows.

Ideas? Experience?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Producer/Consumer Issues in the COPY across network
Date: 2008-02-26 11:29:53
Message-ID: 20080226112953.GC14945@svana.org

On Tue, Feb 26, 2008 at 11:00:33AM +0000, Simon Riggs wrote:
> I'm looking at ways to reduce the number of network calls and/or the
> waiting time while we perform network COPY.
>
> The COPY calls in libpq allow asynchronous actions, yet are coded in a
> synchronous manner in pg_dump, Slony and psql \copy.

I don't think it's the synchronous/asynchronous mode that's making the
difference. Rather, usually the network stack will coalesce packets
into larger chunks to improve performance. I wonder whether it's COPY
interacting badly with the TCP_NODELAY option (which disables the
coalescing).
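
[Editor's sketch] For readers who have not met the option: TCP_NODELAY
is a per-socket flag that turns off Nagle's algorithm, so small writes
go out immediately instead of being held back and merged. The C below
is a minimal sketch of setting and reading the flag on a plain POSIX
socket. The helper names (set_nodelay, get_nodelay,
open_nodelay_socket) are invented for illustration and are not taken
from PostgreSQL or libpq; nothing here shows where PostgreSQL itself
sets the flag.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: set (on=1) or clear (on=0) TCP_NODELAY.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int set_nodelay(int fd, int on)
{
    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Hypothetical helper: returns 1 if TCP_NODELAY is set (coalescing
 * disabled), 0 if it is clear, -1 on error. */
int get_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    assert(fd >= 0);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) != 0)
        return -1;
    return val != 0;
}

/* Hypothetical helper: create an unconnected IPv4 TCP socket with
 * TCP_NODELAY already set. Returns the descriptor, or -1 on error. */
int open_nodelay_socket(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (set_nodelay(fd, 1) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With the flag set, whether several rows share a packet depends on how
much data the sender hands to the kernel per write, which is why the
packet trace asked about below would be informative.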

> When we're running a COPY over a high latency link then network time is
> going to become dominant, so potentially, running COPY asynchronously
> might help performance for loads or initial Slony configuration. This is
> potentially more important on Slony where we do both a PQgetCopyData()
> and PQputCopyData() in a tight loop.

When you check the packets being sent, are you showing only one record
being sent per packet? If so, there's your problem.

> I also note that PQgetCopyData always returns just one row. Is there an
> underlying buffering between the protocol (which always sends one
> message per row) and libpq (which is one call per row)? It seems
> possible for us to request a number of rows from the server up to a
> preferred total transfer size.

AIUI the server merely streams the rows to you, the client doesn't get
to say how many :)

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
> -- John F Kennedy


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Producer/Consumer Issues in the COPY across network
Date: 2008-02-28 01:57:49
Message-ID: 1204163869.4252.758.camel@ebony.site

On Tue, 2008-02-26 at 12:29 +0100, Martijn van Oosterhout wrote:

> > When we're running a COPY over a high latency link then network time is
> > going to become dominant, so potentially, running COPY asynchronously
> > might help performance for loads or initial Slony configuration. This is
> > potentially more important on Slony where we do both a PQgetCopyData()
> > and PQputCopyData() in a tight loop.
>
> When you check the packets being sent, are you showing only one record
> being sent per packet? If so, there's your problem.

I've not inspected the packet flow. It seemed easier to ask.

> > I also note that PQgetCopyData always returns just one row. Is there an
> > underlying buffering between the protocol (which always sends one
> > message per row) and libpq (which is one call per row)? It seems
> > possible for us to request a number of rows from the server up to a
> > preferred total transfer size.
>
> AIUI the server merely streams the rows to you, the client doesn't get
> to say how many :)

Right, but presumably we generate a new message per PQgetCopyData()
request? So my presumption is we need to wait for that to be generated
each time?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Producer/Consumer Issues in the COPY across network
Date: 2008-02-28 14:39:53
Message-ID: 20080228143953.GA27658@svana.org

On Thu, Feb 28, 2008 at 01:57:49AM +0000, Simon Riggs wrote:
> >
> > AIUI the server merely streams the rows to you, the client doesn't get
> > to say how many :)
>
> Right, but presumably we generate a new message per PQgetCopyData()
> request? So my presumption is we need to wait for that to be generated
> each time?

No, PQgetCopyData() doesn't send anything. It merely reads what's in
the kernel socket buffer to a local buffer and when it has a complete
line it mallocs a string and returns it to you.
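
[Editor's sketch] To make the "local buffer, one malloc'd row per
call" behaviour concrete, here is a rough C sketch of the framing step.
It is not libpq's code. It assumes only the documented wire format of a
CopyData message: a 'd' byte, then a 4-byte big-endian length that
counts itself plus the payload. The function name next_copy_row and its
return convention are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not libpq code: try to take one complete
 * CopyData message from the front of buf[0..len).
 *
 * Returns  1 and sets *row (malloc'd, NUL-terminated copy of the
 *            payload), *rowlen (payload bytes) and *consumed (bytes
 *            to drop from the front of the buffer);
 *          0 if the buffer does not yet hold a complete message;
 *         -1 if the data is not a CopyData message or malloc fails.
 * No network traffic happens here: only already-received bytes are
 * examined.
 */
int next_copy_row(const unsigned char *buf, size_t len,
                  char **row, size_t *rowlen, size_t *consumed)
{
    uint32_t msglen;
    size_t payload;

    assert(buf != NULL && row != NULL && rowlen != NULL && consumed != NULL);
    *row = NULL;
    *rowlen = 0;
    *consumed = 0;

    if (len < 5)                /* type byte + length word not here yet */
        return 0;
    if (buf[0] != 'd')          /* only CopyData is handled in this sketch */
        return -1;

    msglen = ((uint32_t) buf[1] << 24) | ((uint32_t) buf[2] << 16) |
             ((uint32_t) buf[3] << 8) | (uint32_t) buf[4];
    if (msglen < 4)             /* length must at least cover itself */
        return -1;
    if (len - 1 < msglen)       /* rest of the row has not arrived */
        return 0;

    payload = (size_t) msglen - 4;
    *row = malloc(payload + 1);
    if (*row == NULL)
        return -1;
    memcpy(*row, buf + 5, payload);
    (*row)[payload] = '\0';

    *rowlen = payload;
    *consumed = 1 + (size_t) msglen;
    return 1;
}
```

If a single read from the socket delivered three rows, three calls in
a row would each return 1 without touching the network, and a fourth
would return 0 until more bytes arrive. The caller frees each row.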

Similarly, PQputCopyData() doesn't expect anything from the server
during transmission.

That's why I was wondering about the rows per packet. Sending bigger
packets reduces overall overhead.
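
[Editor's sketch] A rough back-of-the-envelope version of that claim,
in C. The figures are assumptions chosen for illustration, not
measurements from this thread: 40 header bytes per packet (20-byte
IPv4 header plus 20-byte TCP header, no options), and a stream packed
contiguously into segments of at most mss payload bytes. Link-layer
framing and ACK traffic are ignored, and the function names are
invented.

```c
#include <assert.h>

/* Assumed per-packet cost: 20-byte IPv4 header + 20-byte TCP header. */
#define HDR_BYTES 40UL

/* Header bytes spent when every message travels in its own packet
 * (assumes each message fits in one segment). */
unsigned long overhead_one_per_packet(unsigned long nmsgs)
{
    return nmsgs * HDR_BYTES;
}

/* Header bytes spent when the same messages are sent as one contiguous
 * stream cut into segments of at most mss payload bytes. */
unsigned long overhead_coalesced(unsigned long nmsgs,
                                 unsigned long msg_bytes,
                                 unsigned long mss)
{
    unsigned long total;
    unsigned long packets;

    assert(mss > 0);
    total = nmsgs * msg_bytes;
    packets = (total + mss - 1) / mss;   /* round up to whole segments */
    return packets * HDR_BYTES;
}
```

Under these assumptions, 1000 messages of 55 bytes each (a 50-byte row
plus the 5-byte CopyData header) cost 40000 header bytes at one message
per packet, but only 38 packets and 1520 header bytes when packed into
1460-byte segments.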

(The malloc/free per row doesn't seem too efficient.)

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
> -- John F Kennedy


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Producer/Consumer Issues in the COPY across network
Date: 2008-02-28 21:37:58
Message-ID: 1204234678.4223.26.camel@ebony.site

On Thu, 2008-02-28 at 15:39 +0100, Martijn van Oosterhout wrote:

> That's why I was wondering about the rows per packet. Sending bigger
> packets reduces overall overhead.
>
> (The malloc/free per row doesn't seem too efficient.)

I guess neither of us knows then. Oh well. That's good 'cos it sounds
like something worth looking into if anybody has a protocol sniffer and
some time. I'll skip that test 'cos it's not really my area.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk