Client/Server compression?

Lists: pgsql-hackers
From: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
To: PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Client/Server compression?
Date: 2002-03-14 14:43:58
Message-ID: 1016117038.27780.68.camel@mouse.copelandconsulting.net

Just curious, and honestly I haven't looked, but is there any form of
compression between clients and servers? Has this been looked at?

Greg


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
Cc: PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-14 18:20:01
Message-ID: 200203141820.g2EIK1401227@candle.pha.pa.us

Greg Copeland wrote:
> Just curious, and honestly I haven't looked, but is there any form of
> compression between clients and servers? Has this been looked at?

This issue has never come up before. It is sort of like compressing an
FTP session. No one really does that. Is there value in trying it with
PostgreSQL?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-14 18:32:04
Message-ID: 1016130725.27761.75.camel@mouse.copelandconsulting.net

Well, it occurred to me that if a large result set were to be identified
before transport between a client and server, a significant amount of
bandwidth may be saved by using a moderate level of compression.
Especially with something like result sets, which I tend to believe may
lend themselves well to compression.

Unlike FTP, which may be (and often is) transferring previously
compressed data, raw result sets being transferred between the server and
a remote client, IMHO, would tend to compress rather well, as I doubt
much of it would be true random data.

This may be of value for users with low bandwidth connectivity to their
servers or where bandwidth may already be at a premium.

The zlib exploit posting got me thinking about this.

Greg

On Thu, 2002-03-14 at 12:20, Bruce Momjian wrote:
> Greg Copeland wrote:
> > Just curious, and honestly I haven't looked, but is there any form of
> > compression between clients and servers? Has this been looked at?
>
> This issue has never come up before. It is sort of like compressing an
> FTP session. No one really does that. Is there value in trying it with
> PostgreSQL?
>
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
> + If your life is a hard drive, | 830 Blythe Avenue
> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
Cc: PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-14 19:35:38
Message-ID: 200203141935.g2EJZcj06341@candle.pha.pa.us

Greg Copeland wrote:

Checking application/pgp-signature: FAILURE
-- Start of PGP signed section.
> Well, it occurred to me that if a large result set were to be identified
> before transport between a client and server, a significant amount of
> bandwidth may be saved by using a moderate level of compression.
> Especially with something like result sets, which I tend to believe may
> lend themselves well to compression.
>
> Unlike FTP, which may be (and often is) transferring previously
> compressed data, raw result sets being transferred between the server and
> a remote client, IMHO, would tend to compress rather well, as I doubt
> much of it would be true random data.
>

I should have said compressing the HTTP protocol, not FTP.

> This may be of value for users with low bandwidth connectivity to their
> servers or where bandwidth may already be at a premium.

But don't slow links do the compression themselves, like PPP over a
modem?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: "Arguile" <arguile(at)lucentstudios(dot)com>
To: "PostgresSQL Hackers Mailing List" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-14 20:03:39
Message-ID: LLENKEMIODLDJNHBEFBOKEFFEGAA.arguile@lucentstudios.com

Bruce Momjian wrote:
>
> Greg Copeland wrote:
> > Well, it occurred to me that if a large result set were to be identified
> > before transport between a client and server, a significant amount of
> > bandwidth may be saved by using a moderate level of compression.
> > Especially with something like result sets, which I tend to believe may
> > lend it self well toward compression.
>
> I should have said compressing the HTTP protocol, not FTP.
>
> > This may be of value for users with low bandwidth connectivity to their
> > servers or where bandwidth may already be at a premium.
>
> But don't slow links do the compression themselves, like PPP over a
> modem?

Yes, but that's packet level compression. You'll never get even close to the
result you can achieve compressing the set as a whole.

Speaking of HTTP, it's fairly common for web servers (Apache has mod_gzip)
to gzip content before sending it to the client (which unzips it silently);
especially when dealing with somewhat static content (so it can be cached
zipped). This can provide great bandwidth savings.

I'm sceptical of the benefit such compression would provide in this setting
though. We're dealing with sets that would have to be compressed every time
(no caching), which might be a bit expensive on a database server. Having it
as a default-off option for psql might be nice, but I wonder if it's worth
the time, effort, and CPU cycles.


From: Paul Ramsey <pramsey(at)refractions(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Greg Copeland <greg(at)CopelandConsulting(dot)Net>, PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-14 20:08:11
Message-ID: 3C91032B.21478BF8@refractions.net

Bruce Momjian wrote:
>
> Greg Copeland wrote:
>
> Checking application/pgp-signature: FAILURE
> -- Start of PGP signed section.
> > Well, it occurred to me that if a large result set were to be identified
> > before transport between a client and server, a significant amount of
> > bandwidth may be saved by using a moderate level of compression.
> > Especially with something like result sets, which I tend to believe may
> > lend themselves well to compression.
> >
> > Unlike FTP, which may be (and often is) transferring previously
> > compressed data, raw result sets being transferred between the server and
> > a remote client, IMHO, would tend to compress rather well, as I doubt
> > much of it would be true random data.
> >
>
> I should have said compressing the HTTP protocol, not FTP.
>
> > This may be of value for users with low bandwidth connectivity to their
> > servers or where bandwidth may already be at a premium.
>
> But don't slow links do the compression themselves, like PPP over a
> modem?

Yes, and not really. Modems have very very very small buffers, so the
compression is extremely ineffectual. Link-level compression can be
*highly* effective in making client/server communication snappy, since
faster processors are tending to push the speed bottleneck onto the
wire. We use HTTP Content-Encoding of gzip for our company and the
postgis.refractions.net site, and save about 60% on all the text content
on the wire. For highly redundant data (like result sets) the savings
would be even greater. I have nothing but good things to say about
client/server compression.
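
(In case the mechanism is unfamiliar: the whole negotiation is one
request header and one response header. The URL and length below are
made up.)

    GET /index.html HTTP/1.1
    Host: postgis.refractions.net
    Accept-Encoding: gzip

    HTTP/1.1 200 OK
    Content-Type: text/html
    Content-Encoding: gzip
    Content-Length: 4321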

--
__
/
| Paul Ramsey
| Refractions Research
| Email: pramsey(at)refractions(dot)net
| Phone: (250) 885-0632
\_


From: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Greg Copeland <greg(at)CopelandConsulting(dot)Net>, PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-14 20:14:31
Message-ID: 1016136871.3406.20.camel@jiro

On Thu, 2002-03-14 at 14:35, Bruce Momjian wrote:
> Greg Copeland wrote:
>
> Checking application/pgp-signature: FAILURE
> -- Start of PGP signed section.
> > Well, it occurred to me that if a large result set were to be identified
> > before transport between a client and server, a significant amount of
> > bandwidth may be saved by using a moderate level of compression.
> > Especially with something like result sets, which I tend to believe may
> > lend themselves well to compression.
> >
> > Unlike FTP, which may be (and often is) transferring previously
> > compressed data, raw result sets being transferred between the server and
> > a remote client, IMHO, would tend to compress rather well, as I doubt
> > much of it would be true random data.
>
> I should have said compressing the HTTP protocol, not FTP.

Except that lots of people compress HTTP traffic (or rather should, if
they were smart). Bandwidth is much more expensive than CPU time, and
most browsers have built-in support for gzip-encoded data. Take a look
at mod_gzip or mod_deflate (2 Apache modules) for more info on this.

IMHO, compressing data would be valuable iff there are lots of people
with a low-bandwidth link between Postgres and their database clients.
In my experience, that is rarely the case. For example, people using
Postgres as a backend for a dynamically generated website usually have
their database on the same server (for a low-end site), or on a separate
server connected via 100mbit ethernet to a bunch of webservers. In this
situation, compressing the data between the database and the webservers
will just add more latency and increase the load on the database.

Perhaps I'm incorrect though -- are there lots of people using Postgres
with a slow link between the database server and the clients?

Cheers,

Neil

--
Neil Conway <neilconway(at)rogers(dot)com>
PGP Key ID: DB3C29FC


From: "Mark Pritchard" <mark(at)tangent(dot)net(dot)au>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Greg Copeland" <greg(at)CopelandConsulting(dot)Net>
Cc: "PostgresSQL Hackers Mailing List" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-14 20:26:44
Message-ID: EGECIAPHKLJFDEJBGGOBMELLHPAA.mark@tangent.net.au

You can get some tremendous gains by compressing HTTP sessions - mod_gzip
for Apache does this very well.

I believe Slashdot saves on the order of 30% of their bandwidth by using
compression, as do sites like http://www.whitepages.com.au/ and
http://www.yellowpages.com.au/

The mod_gzip trick is effectively very similar to what Greg is proposing. Of
course, how often would you connect to your database over anything less than
a fast (100mbit+) LAN connection?

In any case, the conversation regarding FE/BE protocol changes occurs
frequently, and this thread would certainly impact that protocol. Has any
thought ever been put into using an existing standard such as HTTP instead
of the current Postgres proprietary protocol? There are a lot of advantages:

* You could leverage the existing client libraries (java.net.URL etc) to
make writing PG clients (JDBC/ODBC/custom) an absolute breeze.

* Result sets / server responses could be returned in XML.

* The protocol handles extensions well (X-* headers)

* Load balancing across a postgres cluster would be trivial with any number
of software/hardware http load balancers.

* The prepared statement work needs to hit the FE/BE protocol anyway...

If the project gurus thought this was worthwhile, I would certainly like to
have a crack at it.

Regards,

Mark

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org]On Behalf Of Bruce Momjian
> Sent: Friday, 15 March 2002 6:36 AM
> To: Greg Copeland
> Cc: PostgresSQL Hackers Mailing List
> Subject: Re: [HACKERS] Client/Server compression?
>
>
> Greg Copeland wrote:
>
> Checking application/pgp-signature: FAILURE
> -- Start of PGP signed section.
> > Well, it occurred to me that if a large result set were to be identified
> > before transport between a client and server, a significant amount of
> > bandwidth may be saved by using a moderate level of compression.
> > Especially with something like result sets, which I tend to believe may
> > lend themselves well to compression.
> >
> > Unlike FTP, which may be (and often is) transferring previously
> > compressed data, raw result sets being transferred between the server and
> > a remote client, IMHO, would tend to compress rather well, as I doubt
> > much of it would be true random data.
> >
>
> I should have said compressing the HTTP protocol, not FTP.
>
> > This may be of value for users with low bandwidth connectivity to their
> > servers or where bandwidth may already be at a premium.
>
> But don't slow links do the compression themselves, like PPP over a
> modem?
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
> + If your life is a hard drive, | 830 Blythe Avenue
> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Greg Copeland <greg(at)CopelandConsulting(dot)Net>, PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-14 20:29:19
Message-ID: 17329.1016137759@sss.pgh.pa.us

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>> This may be of value for users with low bandwidth connectivity to their
>> servers or where bandwidth may already be at a premium.

> But don't slow links do the compression themselves, like PPP over a
> modem?

Even if the link doesn't compress, shoving the feature into PG itself
isn't necessarily the answer. I'd suggest running such a connection
through an ssh tunnel, which would give you encryption as well as
compression.
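
For instance, something along these lines should do it (the port numbers
are illustrative, and it assumes you have an ssh login on the database
host):

    ssh -C -N -L 6543:localhost:5432 you@db.example.com
    psql -h localhost -p 6543 yourdb

The -C option turns on ssh's own zlib compression of everything going
through the tunnel, and you get the encryption for free.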

regards, tom lane


From: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-14 20:39:43
Message-ID: 1016138383.27761.90.camel@mouse.copelandconsulting.net

On Thu, 2002-03-14 at 13:35, Bruce Momjian wrote:
> Greg Copeland wrote:
>
> Checking application/pgp-signature: FAILURE
> -- Start of PGP signed section.
> > Well, it occurred to me that if a large result set were to be identified
> > before transport between a client and server, a significant amount of
> > bandwidth may be saved by using a moderate level of compression.
> > Especially with something like result sets, which I tend to believe may
> > lend themselves well to compression.
> >
> > Unlike FTP, which may be (and often is) transferring previously
> > compressed data, raw result sets being transferred between the server and
> > a remote client, IMHO, would tend to compress rather well, as I doubt
> > much of it would be true random data.
> >
>
> I should have said compressing the HTTP protocol, not FTP.
>
> > This may be of value for users with low bandwidth connectivity to their
> > servers or where bandwidth may already be at a premium.
>
> But don't slow links do the compression themselves, like PPP over a
> modem?

Yes and no. Modem compression doesn't understand the nature of the data
that is actually flowing through it. As a result, a modem is going to
spend an equal amount of time trying to compress the PPP/IP/NETBEUI
protocols as it does trying to compress the data contained within those
protocol envelopes. Furthermore, modems tend to have a very limited
amount of time in which to attempt compression, combined with very
limited buffer space, which usually limits their ability to provide
effective compression. Because of these issues, it is not uncommon for a
modem to actually yield a larger compressed block than the input.

I'd also like to point out that there are other low-speed connections in
use which do not involve modems at all, as well as modems which do not
support compression (long-haul modems, for example).

As for your specific example of HTTP versus FTP, I would also like to
point out that it is becoming more and more common for gzip'd data to be
transported within the HTTP protocol, whereby each end is explicitly
aware of the compression taking place on the link and knows what to do
with it.

Also, believe it or not, one of the common uses of SSH is to provide
session compression. It is not unheard of for people to disable the
encryption and simply use it as a compression tunnel, which also provides
for modest session obfuscation.

Greg


From: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
To: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-14 20:43:50
Message-ID: 1016138631.27761.95.camel@mouse.copelandconsulting.net

On Thu, 2002-03-14 at 14:14, Neil Conway wrote:
> On Thu, 2002-03-14 at 14:35, Bruce Momjian wrote:
> > Greg Copeland wrote:
> >
> > Checking application/pgp-signature: FAILURE
> > -- Start of PGP signed section.
> > > Well, it occurred to me that if a large result set were to be identified
> > > before transport between a client and server, a significant amount of
> > > bandwidth may be saved by using a moderate level of compression.
> > > Especially with something like result sets, which I tend to believe may
> > > lend themselves well to compression.
> > >
> > > Unlike FTP, which may be (and often is) transferring previously
> > > compressed data, raw result sets being transferred between the server and
> > > a remote client, IMHO, would tend to compress rather well, as I doubt
> > > much of it would be true random data.
> >
> > I should have said compressing the HTTP protocol, not FTP.
>
> Except that lots of people compress HTTP traffic (or rather should, if
> they were smart). Bandwidth is much more expensive than CPU time, and
> most browsers have built-in support for gzip-encoded data. Take a look
> at mod_gzip or mod_deflate (2 Apache modules) for more info on this.
>
> IMHO, compressing data would be valuable iff there are lots of people
> with a low-bandwidth link between Postgres and their database clients.
> In my experience, that is rarely the case. For example, people using
> Postgres as a backend for a dynamically generated website usually have
> their database on the same server (for a low-end site), or on a separate
> server connected via 100mbit ethernet to a bunch of webservers. In this
> situation, compressing the data between the database and the webservers
> will just add more latency and increase the load on the database.
>
> Perhaps I'm incorrect though -- are there lots of people using Postgres
> with a slow link between the database server and the clients?
>

What about remote support of these databases where a VPN may not be
available? In my past experience, this was very common, as many
companies do not want to expose their database to the outside world,
even via a VPN, allowing only modem access. Not to mention, road
warriors who need to remotely support their databases may find value
here too. Would they not?

...I think I'm pretty well coming to the conclusion that it may be of
some value...even if only for a limited number of users.

Greg


From: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-14 20:52:31
Message-ID: 1016139152.31943.102.camel@mouse.copelandconsulting.net

On Thu, 2002-03-14 at 14:29, Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> >> This may be of value for users with low bandwidth connectivity to their
> >> servers or where bandwidth may already be at a premium.
>
> > But don't slow links do the compression themselves, like PPP over a
> > modem?
>
> Even if the link doesn't compress, shoving the feature into PG itself
> isn't necessarily the answer. I'd suggest running such a connection
> through an ssh tunnel, which would give you encryption as well as
> compression.
>
> regards, tom lane

Couldn't the same be said for SSL support?

I'd also like to point out that it's *possible* that this could also be
a speed boost under certain workloads where extra CPU is available, as
less data would have to be transferred through the OS, networking layers,
and device drivers. Until zero-copy transfers become common on all
platforms for all devices, I would think that it's certainly *possible*
that this *could* offer an improvement...well, perhaps a break-even at
any rate...

Similar claims, again, are made for compressed file systems under
specific workloads, as less device I/O has to take place.

Greg


From: Kyle <kaf(at)nwlink(dot)com>
To: PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-15 00:52:36
Message-ID: 15505.17876.355421.333413@doppelbock.patentinvestor.com

On the subject of client/server compression, does the server
decompress toast data before sending it to the client? If so, why
(other than requiring modifications to the protocol)?

On the flip side, does/could the client toast insert/update data
before sending it to the server?

-Kyle


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Kyle <kaf(at)nwlink(dot)com>
Cc: PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-15 01:43:55
Message-ID: 200203150143.g2F1htm27170@candle.pha.pa.us

Kyle wrote:
> On the subject of client/server compression, does the server
> decompress toast data before sending it to the client? If so, why
> (other than requiring modifications to the protocol)?
>
> On the flip side, does/could the client toast insert/update data
> before sending it to the server?

It has to decompress it so the server functions can process it too. Hard
to avoid that. Of course, in some cases, it doesn't need to be
processed on the server, just passed, so it would have to be done
conditionally.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
To: Arguile <arguile(at)lucentstudios(dot)com>
Cc: PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-15 18:47:20
Message-ID: 1016218040.24597.15.camel@mouse.copelandconsulting.net

On Thu, 2002-03-14 at 14:03, Arguile wrote:

[snip]

> I'm sceptical of the benefit such compression would provide in this setting
> though. We're dealing with sets that would have to be compressed every time
> (no caching), which might be a bit expensive on a database server. Having it
> as a default-off option for psql might be nice, but I wonder if it's worth
> the time, effort, and CPU cycles.
>

I dunno. That's a good question. For now, I'm making what tends to be
a safe assumption (oops...that word), that most database servers will be
I/O bound rather than CPU bound. *IF* that assumption holds true, it
sounds like it may make even more sense to implement this. I do know
that in the past, I've seen 90+% compression ratios on many databases
and 50%-90% compression ratios on result sets using tunneled
compression schemes (which were compressing things other than datasets,
which probably hurt overall compression ratios). Depending on the
workload and the available resources on a database system, it's possible
that latency could actually be reduced, depending on where you measure
it. That is, do you measure latency as first packet back to the remote,
or last packet back to the remote? If you use last packet, compression
may actually win.

My current thoughts are to allow for enabled/disabled compression and
variable compression settings (1-9) within a database configuration.
Worst case, it may be fun to implement, and I'm thinking there may
actually be some surprises as an end result if it's done properly.

In looking at the communication code, it looks like only an 8k buffer is
used. I'm currently looking to bump this up to 32k, as most OSes tend to
have a sweet throughput spot with buffer sizes between 32k and 64k.
Others, depending on the devices in use, like even bigger buffers.
Because this may be only a minor optimization, especially on a heavily
loaded server, we may want to consider making this a configurable
parameter.
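
As a rough sketch of what I mean by the level setting (purely
illustrative -- it calls zlib directly, with compress_level standing in
for whatever configuration knob would eventually control this):

#include <zlib.h>

/* Hypothetical setting: 0 disables the feature, 1..9 trades CPU for size. */
static int compress_level = 6;

/*
 * Compress one outgoing message body.  Returns the compressed length, or
 * -1 if the feature is off or compression didn't actually shrink the
 * data, in which case the caller just sends the original bytes.
 */
static int
compress_message(const char *src, uLong srclen, char *dst, uLongf dstlen)
{
    uLongf outlen = dstlen;

    if (compress_level == 0)
        return -1;

    if (compress2((Bytef *) dst, &outlen,
                  (const Bytef *) src, srclen, compress_level) != Z_OK)
        return -1;

    if (outlen >= srclen)
        return -1;

    return (int) outlen;
}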

Greg


From: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Kyle <kaf(at)nwlink(dot)com>, PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-15 19:04:38
Message-ID: 1016219080.24599.23.camel@mouse.copelandconsulting.net

On Thu, 2002-03-14 at 19:43, Bruce Momjian wrote:
> Kyle wrote:
> > On the subject of client/server compression, does the server
> > decompress toast data before sending it to the client? If so, why
> > (other than requiring modifications to the protocol)?
> >
> > On the flip side, does/could the client toast insert/update data
> > before sending it to the server?
>
> It has to decompress it so the server functions can process it too. Hard
> to avoid that. Of course, in some cases, it doesn't need to be
> processed on the server, just passed, so it would have to be done
> conditionally.
>

Along those lines, it occurred to me that if the compressor somehow knew
the cardinality of the data rows involved with the result set being
returned, a compressor data dictionary (...think of it as a heads-up on
patterns to look for) could be created from the unique values, which,
I'm thinking, could dramatically improve the level of compression for
the data being transmitted.

Just some food for thought. After all, these two seem to be somewhat
related as you wouldn't want the communication layer attempting to
recompress data which was natively compressed and needed to be
transparently transmitted.
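
To illustrate, a minimal sketch against zlib's preset-dictionary hook
(the function name is invented, and how the dictionary actually gets
built from the result set is hand-waved):

#include <string.h>
#include <zlib.h>

/*
 * Prime the compressor with values already known to repeat in the result
 * set (say, the distinct values of a low-cardinality column), so even the
 * first rows can be coded as back-references.  The receiving side must
 * load the same dictionary with inflateSetDictionary() once inflate()
 * reports Z_NEED_DICT.
 */
static int
init_compressor_with_dictionary(z_stream *zs,
                                const char *dict, unsigned dictlen,
                                int level)
{
    memset(zs, 0, sizeof(*zs));     /* use zlib's default allocators */

    if (deflateInit(zs, level) != Z_OK)
        return -1;

    /* Must be done before the first call to deflate(). */
    if (deflateSetDictionary(zs, (const Bytef *) dict, dictlen) != Z_OK)
    {
        deflateEnd(zs);
        return -1;
    }
    return 0;
}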

Greg


From: Jan Wieck <janwieck(at)yahoo(dot)com>
To: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
Cc: Arguile <arguile(at)lucentstudios(dot)com>, PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-15 19:18:34
Message-ID: 200203151918.g2FJIYC06067@saturn.janwieck.net

Greg Copeland wrote:
> On Thu, 2002-03-14 at 14:03, Arguile wrote:
>
> [snip]
>
> > I'm sceptical of the benefit such compression would provide in this setting
> > though. We're dealing with sets that would have to be compressed every time
> > (no caching), which might be a bit expensive on a database server. Having it
> > as a default-off option for psql might be nice, but I wonder if it's worth
> > the time, effort, and CPU cycles.
> >
>
> I dunno. That's a good question. For now, I'm making what tends to be
> a safe assumption (oops...that word), that most database servers will be
> I/O bound rather than CPU bound. *IF* that assumption holds true, it

If you have too much CPU idle time, you wasted money by
oversizing the machine. And as soon as you add ORDER BY to
your queries, you'll see some CPU used.

I only make the assumption that whenever there is a database
server, there is an application server as well (or multiple
of them). Scenarios that require direct end-user connectivity
to the database server (a la Access->MSSQL) should NOT be
encouraged.

The db and app should be very close together, coupled with a
dedicated backbone net. No need for encryption, and if volume
is a problem, gigabit is the answer.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #



From: Kyle <kaf(at)nwlink(dot)com>
To: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
Cc: PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-16 01:44:09
Message-ID: 15506.41833.4054.850914@doppelbock.patentinvestor.com

Greg Copeland wrote:
> [cut]
> My current thoughts are to allow for enabled/disabled compression and
> variable compression settings (1-9) within a database configuration.
> Worst case, it may be fun to implement, and I'm thinking there may
> actually be some surprises as an end result if it's done properly.
>
> [cut]
>
> Greg

Wouldn't Tom's suggestion of riding on top of ssh give similar
results? Anyway, it'd probably be a good proof of concept of whether
or not it's worth the effort. And that brings up the question: how
would you measure the benefit? I'd assume you'd get a good cut in
network traffic, but you'll take a hit in CPU time. What's an
acceptable tradeoff?

That's one reason I was thinking about the toast stuff. If the
backend could serve toast, you'd get an improvement in server-to-client
network traffic without the server spending CPU time on compression,
since the data has already been compressed.

Let me know if this is feasible (or slap me if this is how things
already are): when the backend detoasts data, keep both copies in
memory. When it comes time to put data on the wire, instead of
putting the whole enchilada down give the client the compressed toast
instead. And yeah, I guess this would require a protocol change to
flag the compressed data. But it seems like a way to leverage work
already done.

-kf


From: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
To: Kyle <kaf(at)nwlink(dot)com>
Cc: PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-16 04:09:20
Message-ID: 1016251761.24597.66.camel@mouse.copelandconsulting.net

On Fri, 2002-03-15 at 19:44, Kyle wrote:
[snip]

> Wouldn't Tom's suggestion of riding on top of ssh give similar
> results? Anyway, it'd probably be a good proof of concept of whether
> or not it's worth the effort. And that brings up the question: how
> would you measure the benefit? I'd assume you'd get a good cut in
> network traffic, but you'll take a hit in CPU time. What's an
> acceptable tradeoff?

Good question. I've been trying to think of meaningful testing methods;
however, I can still think of reasons all day long where it's not an
issue of a "tradeoff". Simply put, if you have a low-bandwidth
connection, as long as there are extra cycles available on the server,
who really cares...except for the guy at the end of the slow connection.

As for SSH, well, that should be rather obvious. It often is simply not
available. While SSH is nice, I can think of many situations where this
is a win/win. At least in business settings...which is where I'm
assuming we want to get Postgres. Also, along those lines, if SSH is the
answer, then surely the SSL support should be removed too...as SSH
provides for encryption too. Simply put, removing SSL support makes
about as much sense as asserting that SSH is the final compression
solution.

Also, it keeps being stated that a tangible tradeoff between CPU and
bandwidth must be realized. This is, of course, a false assumption.
Simply put, if you need bandwidth, you need bandwidth. The need is not
a function of CPU; rather, it's a lack of bandwidth. Having said that,
I of course would still like to have something meaningful which reveals
the impact on CPU and bandwidth.

I'm talking about something that would be optional. So, what's the cost
of having a little extra optional code in place? The only issue, best I
can tell, is whether it can be implemented in a backward-compatible manner.

>
> That's one reason I was thinking about the toast stuff. If the
> backend could serve toast, you'd get an improvement in server-to-client
> network traffic without the server spending CPU time on compression,
> since the data has already been compressed.
>
> Let me know if this is feasible (or slap me if this is how things
> already are): when the backend detoasts data, keep both copies in
> memory. When it comes time to put data on the wire, instead of
> putting the whole enchilada down give the client the compressed toast
> instead. And yeah, I guess this would require a protocol change to
> flag the compressed data. But it seems like a way to leverage work
> already done.
>

I agree with that; however, I'm guessing that implementation would
require a significantly larger effort than what I'm suggesting...then
again, probably because I'm not aware of all the code yet. Pretty much,
the basic implementation could be in place by the end of this weekend
with only a couple hours' worth of work...and that mostly because I
still don't know lots of the code. The changes you are talking about are
going to require not only protocol changes but changes at several layers
within the engine.

Of course, something else to keep in mind is that using the TOAST
solution requires that TOAST already be in use. What I'm suggesting
benefits (size-wise) all types of data being sent back to a client.

Greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
Cc: Kyle <kaf(at)nwlink(dot)com>, PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-16 20:38:01
Message-ID: 15280.1016311081@sss.pgh.pa.us

Greg Copeland <greg(at)CopelandConsulting(dot)Net> writes:
> I'm talking about something that would be optional. So, what's the cost
> of having a little extra optional code in place?

It costs just as much in maintenance effort even if hardly anyone uses
it. Actually, probably it costs *more*, since seldom-used features
tend to break without being noticed until late beta or post-release,
when it's a lot more painful to fix 'em.

FWIW, I was not in favor of the SSL addition either, since (just as you
say) it does nothing that couldn't be done with an SSH tunnel. If I had
sole control of this project I would rip out the SSL code, in preference
to fixing its many problems. For your entertainment I will attach the
section of my private TODO list that deals with SSL problems, and you
may ask yourself whether you'd rather see that development time expended
on fixing a feature that really adds zero functionality, or on fixing
things that are part of Postgres' core functionality. (Also note that
this list covers *only* problems in libpq's SSL support. Multiply this
by jdbc, odbc, etc to get an idea of what we'd be buying into to support
our own encryption handling across-the-board.)

The short answer: we should be standing on the shoulders of the SSH
people, not reimplementing (probably badly) what they do well.

regards, tom lane

SSL support problems
--------------------

Fix USE_SSL code in fe-connect: move to CONNECTION_MADE case, always
do initial connect() in nonblock mode. Per my msg 10/26/01 21:43

Even better would be to be able to do the SSL negotiation in nonblock mode.
Seems like it should be possible from looking at openssl man pages:
SSL_connect is documented to work on a nonblock socket. Need to pay attention
to SSL_WANT_READ vs WANT_WRITE return codes, however, to determine how to set
polling flag.

Error handling for SSL connections is a joke in general, not just lack
of attention to WANT READ/WRITE.

Nonblock socket operations are somewhat broken by SSL because of assumption
that library will only block waiting for read-ready. Under SSL it could
theoretically block waiting for write-ready, though that should be a
relatively small problem normally. Possibly add some API to distinguish which
case applies? Not clear that it's needed, since worst possible penalty is a
busy-wait loop, and it doesn't seem probable that we could ever so block.
(Sure? COPY IN could well block that way ... of course COPY IN hardly works
in nonblock mode anyway ...)

Fix docs that probably say SSL-enabled lib doesn't support nonblock.
Note extreme sloppiness of SSL docs in general, eg the PQREQUIRESSL env var
is not docd...

Ought to add API to set allow_ssl_try = FALSE to suppress initial SSL try in
an SSL-enabled lib. (Perhaps requiressl = -1? Probably a separate var is
better.)

Also fix connectDB so that params are accepted but ignored if no SSL support
--- or perhaps better, should requiressl=1 fail in that case?

Connection restart after protocol error is a tad ugly: closing/reopening sock
is bad for callers, cf note at end of PQconnectPoll, if the sock # should
happen to have changed. Fortunately that's just a legacy-server case
(pre-7.0)


From: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyle <kaf(at)nwlink(dot)com>, PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-16 21:17:41
Message-ID: 1016313462.24599.149.camel@mouse.copelandconsulting.net

Some questions for you at the end of this Tom...which I'd been thinking
about...and you touched on...hey, you did tell me to ask! :)

On Sat, 2002-03-16 at 14:38, Tom Lane wrote:
> Greg Copeland <greg(at)CopelandConsulting(dot)Net> writes:
> > I'm talking about something that would be optional. So, what's the cost
> > of having a little extra optional code in place?
>
> It costs just as much in maintenance effort even if hardly anyone uses
> it. Actually, probably it costs *more*, since seldom-used features
> tend to break without being noticed until late beta or post-release,
> when it's a lot more painful to fix 'em.

That wasn't really what I was asking...

>
> FWIW, I was not in favor of the SSL addition either, since (just as you
> say) it does nothing that couldn't be done with an SSH tunnel. If I had
> sole control of this project I would rip out the SSL code, in preference

Except we seemingly don't see eye to eye on it. SSH just is not very
useful in many situations simply because it may not always be
available. Now, bring Win32 platforms into the mix and SSH really isn't
an option at all...not without bringing extra boxes to the mix. Ack!

I guess I don't really understand why you seem to feel that items such
as compression and encryption don't belong...compression I can sorta
see; however, without supporting evidence one way or another, I guess I
don't understand the resistance without knowing the whole picture. I
would certainly hope the jury would be out on this until some facts to
paint a picture are at least available. Encryption, on the other hand,
clearly DOES belong in the database (and not just I think so) and should
not be thrust onto other applications, such as SSH, when it may not be
available or may be politically risky to use. That, of course, doesn't
even address the issue that it may be impractical for some users, types
of applications, or platforms. SSH is a fine application which addresses
many issues; however, it certainly is not an end-all, do-all
encryption/compression solution. Does that mean SSL should be the
native encryption solution? I'm not sure I have an answer to that;
however, encryption should be natively available IMHO.

As for the laundry list of items...those are simply issues that should
have been worked out prior to being merged into the code...it migrated
to being a maintenance issue. That's not really applicable to most
situations if an implementation is well coded and complete prior to it
being merged into the code base. Lastly, stating that a maintenance
cost of one implementation is a shared cost for all unrelated sections
of code is naive at best. Generally speaking, the level of maintenance
is inversely proportional to the quality of a specific design and
implementation.

At this point in time, I'm fairly sure I'm going to code up a
compression layer to play with. If it never gets accepted, I'm pretty
sure I'm okay with that. I guess if it's truly worthy, it can always
reside in the contributed section. On the other hand, if value can be
found in such an implementation and all things being equal, I guess I
wouldn't understand why it wouldn't be accepted.
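
The rough shape I have in mind is just a zlib stream wrapped around the
existing send path, flushed at message boundaries so the frontend is
never left waiting on buffered bytes. Something like this sketch, where
raw_socket_write() merely stands in for whatever actually puts bytes on
the client socket:

#include <zlib.h>

/* Assumed to exist: writes len bytes to the client, returns -1 on error. */
extern int raw_socket_write(const char *buf, unsigned len);

static int
send_compressed(z_stream *zs, const char *data, unsigned len, int end_of_msg)
{
    char out[8192];

    zs->next_in = (Bytef *) data;
    zs->avail_in = len;

    do
    {
        unsigned produced;

        zs->next_out = (Bytef *) out;
        zs->avail_out = sizeof(out);

        /* Z_SYNC_FLUSH pushes out everything pending at a message boundary. */
        if (deflate(zs, end_of_msg ? Z_SYNC_FLUSH : Z_NO_FLUSH) == Z_STREAM_ERROR)
            return -1;

        produced = sizeof(out) - zs->avail_out;
        if (produced > 0 && raw_socket_write(out, produced) < 0)
            return -1;
    } while (zs->avail_out == 0 || zs->avail_in > 0);

    return 0;
}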

================================
questions
================================

If I implement compression between the BE and the FE libpq, does that
mean that it needs to be added to the other interfaces as well? Do all
interfaces (JDBC, ODBC, etc) receive the same BE messages?

Is there any documentation which covers the current protocol
implementation? Specifically, I'm interested in the negotiation
section...I have been reading the code already.

Have you never had to support a database via modem? I have, and I can
tell you, compression was a godsend. You do realize that this situation
is more common than you seem to think it is? Maybe not for Postgres
databases now...but for databases in general.

Greg


From: Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Copeland <greg(at)CopelandConsulting(dot)Net>
Cc: Kyle <kaf(at)nwlink(dot)com>, PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-17 11:47:33
Message-ID: 5.1.0.14.1.20020317194454.02d3ed90@192.228.128.13

You can also use stunnel for SSL. Preferable to having SSL in PostgreSQL,
I'd think.

Cheerio,
Link.

At 03:38 PM 3/16/02 -0500, Tom Lane wrote:

>FWIW, I was not in favor of the SSL addition either, since (just as you
>say) it does nothing that couldn't be done with an SSH tunnel. If I had


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Copeland <greg(at)copelandconsulting(dot)net>
Cc: Kyle <kaf(at)nwlink(dot)com>, PostgresSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Client/Server compression?
Date: 2002-03-17 17:47:39
Message-ID: 20102.1016387259@sss.pgh.pa.us

Greg Copeland <greg(at)copelandconsulting(dot)net> writes:
> Except we seemingly don't see eye to eye on it. SSH just is not very
> useful in many situations simply because it may not always be
> available. Now, bring Win32 platforms into the mix and SSH really isn't
> an option at all...not without bringing extra boxes to the mix. Ack!

Not so. See http://www.openssh.org/windows.html.

> If I implement compression between the BE and the FE libpq, does that
> mean that it needs to be added to the other interfaces as well?

Yes.

> Is there any documentation which covers the current protocol
> implementation?

Yes. See the protocol chapter in the developer's guide.

> Have you never had to support a database via modem?

Yes. ssh has always worked fine for me ;-)

> You do realize that this situation
> if more common that you seem to think it is?

I was not the person claiming that low-bandwidth situations are of no
interest. I was the person claiming that the Postgres project should
not expend effort on coding and maintaining our own solutions, when
there are perfectly good solutions available that we can sit on top of.

Yes, a solution integrated into Postgres would be easier to use and
perhaps a bit more efficient --- but do the incremental advantages of
an integrated solution justify the incremental cost? I don't think so.
The advantages seem small to me, and the long-term costs not so small.

regards, tom lane