Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-01-10 17:14:02
Message-ID: 1389374042.1310.YahooMailNeo@web122305.mail.ne1.yahoo.com
Lists: pgsql-hackers

Florian Pflug <fgp(at)phlo(dot)org> wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Florian Pflug <fgp(at)phlo(dot)org> writes:
>>> For float4 and float8, wasn't the consensus that the potential
>>> lossiness of addition makes this impossible anyway, even
>>> without the NaN issue? But...
>>
>> Well, that was my opinion, I'm not sure if it was consensus ;-).
>
> I'd say your example showing how it could produce completely
> bogus results was pretty convincing...

Aggregates on approximate (floating-point) numbers are not nearly
as consistent as many people probably assume.  Picture for a minute
a table where a column contains positive floating point numbers
that happen to be located in the heap in increasing order, perhaps
as the result of a CLUSTER on an index on the column.  SELECT
sum(colname) FROM tablename; would tend to give the most accurate
answer possible when a sequential scan was run -- unless another
seqscan happened to be half-way through the heap already.  In that
case synchronized scanning would start the new scan at that point
and wrap around, adding the values in a different order and
producing a different result.  So the exact same query against the
exact same data, with no intervening modifications or maintenance
activity, could give one answer most of the time, and various
other answers depending on concurrent SELECT queries.
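
To make that concrete, here is a contrived pair of sums showing
how evaluation order alone changes a float8 result:

    -- The same three float8 values summed in two orders give two
    -- answers, because floating-point addition is not associative.
    SELECT sum(x ORDER BY x)           AS ascending, -- (-1e20 + 1) + 1e20 = 0
           sum(x ORDER BY abs(x) DESC) AS big_first  -- (1e20 - 1e20) + 1 = 1
    FROM (VALUES (1e20::float8), (1::float8), (-1e20::float8)) AS t(x);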

Given that this is already the case for aggregates on approximate
floating-point numbers, why should we rule out an optimization
that merely makes the rounding errors more likely to be visible?
The real issue here is that if you are using an approximate data
type and expecting exact answers, you will have problems.
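
For instance, a naive add-then-subtract state machine (which is
what an inverse transition function for float8 sum() amounts to)
loses information that simply rescanning the rows would recover:

    -- Add 1e20 and then 1.0 to the running state, then remove
    -- 1e20 again via the inverse transition:
    SELECT ((0::float8 + 1e20) + 1.0) - 1e20 AS after_inverse, -- 0: the 1.0 was lost
           1.0::float8                       AS recomputed;    -- sum of the remaining row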

That's not to say that approximations are useless.  If you
represent the circumference of the earth in meters as a
single-precision float4, you're dealing with a rounding error of
a meter or two; as a double-precision float8, the representation
error shrinks to nanometers.  Either is close enough for many
purposes.  The mistake is assuming that the value is exact or
that rounding errors cannot accumulate.  Where SQL does not
promise a particular ordering of operations, none should be
assumed; so any expectation of a specific, repeatable result from
a sum or average of approximate numbers is misplaced.
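
A quick sanity check of that representation error, using the
equatorial circumference in meters:

    -- Stored as float4 the value lands on the nearest
    -- representable number, about a meter off:
    SELECT 40075017::float4 - 40075017::float8 AS error_in_meters;
    -- => -1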

>> But NaN is an orthogonal problem I think.

Agreed.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
