From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Andy Colson <andy(at)squeakycode(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: texteq/byteaeq: avoid detoast [REVIEW] |
Date: | 2011-01-17 15:28:27 |
Message-ID: | 20110117152827.GB19587@tornado.leadboat.com |
Lists: | pgsql-hackers |
On Mon, Jan 17, 2011 at 11:05:09AM +0100, Magnus Hagander wrote:
> On Mon, Jan 17, 2011 at 09:13, Itagaki Takahiro
> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> > 2011/1/17 KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>:
> >> Are you talking about an idea to apply toast id as an alternative key?
> >
> > No, probably. I'm just talking about whether "diff -q A.txt B.txt" and
> > "diff -q A.gz B.gz" always return the same result or not.
Interesting.
> > ... I found it depends on the version of gzip. So, if we use such logic,
> > we cannot improve the toast compression logic, because the data is migrated
> > by pg_upgrade.
>
> Yeah, that might be a bad tradeoff.
>
> I wonder if we can trust the *equality* test, but not the inequality?
> E.g. if compressed(A) == compressed(B) we know they're the same, but
> if compressed(A) != compressed(B), we don't know they're not; they still
> might be.
Exactly.
> I guess with two different versions or even completely different
> algorithms we could end up with exactly the same compressed value for
> different plaintexts (it's not a cryptographic hash after all), so
> that's probably not an acceptable comparison either.
It's safe to assume that will never happen. If compressed(A) == compressed(B)
when A != B, we would have a lossy compression algorithm.
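A minimal sketch of the comparison under discussion, assuming zlib as a stand-in for the TOAST compressor (PostgreSQL actually uses its own pglz code). The helper name eq_with_compressed_fast_path is hypothetical. Identical compressed bytes are treated as proof of identical values, since decompression is deterministic; differing bytes prove nothing, so the sketch falls back to decompressing both sides.

```python
import zlib

def eq_with_compressed_fast_path(ca: bytes, cb: bytes) -> bool:
    """Hypothetical equality test on two zlib-compressed values."""
    if ca == cb:
        # Decompression is deterministic, so identical compressed bytes
        # must decompress to identical data: skip decompression.
        return True
    # Differing compressed bytes may still encode the same data, so
    # fall back to comparing the decompressed values.
    return zlib.decompress(ca) == zlib.decompress(cb)

a = zlib.compress(b"hello world" * 100, 1)
b = zlib.compress(b"hello world" * 100, 9)
print(eq_with_compressed_fast_path(a, a))  # True, no decompression needed
print(eq_with_compressed_fast_path(a, b))  # True, found via decompression
```

Only the equality branch is cheap; any speedup depends on how often equal values happen to share identical compressed bytes.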
As you say, though, _inequality_ implies nothing for an arbitrary decompressor.
One can trivially construct many inputs to the zlib decompressor that yield the
same output. "gzip -1" ... "gzip -9" do this, for example. So the main win
here would come if we tightly controlled the compressor, such that we could
infer something from compressed(A) != compressed(B). That would be an
intriguing path to explore.
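The "gzip -1" vs. "gzip -9" point can be reproduced with Python's zlib module, again as a stand-in for the TOAST compressor: the two levels produce different compressed bytes (at minimum, the zlib header records the compression level), yet both decompress to the same data. This is why compressed(A) != compressed(B) says nothing unless the compressor itself is tightly controlled.

```python
import zlib

data = b"texteq/byteaeq: avoid detoast " * 50

fast = zlib.compress(data, 1)  # analogous to "gzip -1"
best = zlib.compress(data, 9)  # analogous to "gzip -9"

# The compressed forms differ...
print(fast != best)  # True
# ...yet both decompress to the same plaintext.
print(zlib.decompress(fast) == data and zlib.decompress(best) == data)  # True
```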
nm