Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-01-10 17:14:02
Message-ID: 1389374042.1310.YahooMailNeo@web122305.mail.ne1.yahoo.com
Lists: pgsql-hackers

Florian Pflug <fgp(at)phlo(dot)org> wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Florian Pflug <fgp(at)phlo(dot)org> writes:
>>> For float4 and float8, wasn't the consensus that the potential
>>> lossiness of addition makes this impossible anyway, even
>>> without the NaN issue? But...
>>
>> Well, that was my opinion, I'm not sure if it was consensus ;-).
>
> I'd say your example showing how it could produce completely
> bogus results was pretty convincing...

Aggregates on approximate (floating-point) numbers are not nearly
as consistent as many people probably assume.  Picture for a minute
a table where a column contains positive floating point numbers
that happen to be located in the heap in increasing order, perhaps
as the result of a CLUSTER on an index on the column.  SELECT
sum(colname) FROM tablename; would tend to give the most accurate
answer possible when a sequential scan was run -- unless another
seqscan happened to be half-way through the heap already.  In that
case synchronized scanning would start the new scan at that point
and wrap around, adding the values in a different order and
producing a different result.  So the exact same query against the
exact same data, with no intervening modifications or maintenance
activity, could give one answer most of the time, and various
other answers depending on concurrent SELECT queries.
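
To make that concrete, here is a contrived pair of sums showing
how evaluation order alone changes a float8 result:

    -- The same three float8 values summed in two orders give two
    -- answers, because floating-point addition is not associative.
    SELECT sum(x ORDER BY x)           AS ascending, -- (-1e20 + 1) + 1e20 = 0
           sum(x ORDER BY abs(x) DESC) AS big_first  -- (1e20 - 1e20) + 1 = 1
    FROM (VALUES (1e20::float8), (1::float8), (-1e20::float8)) AS t(x);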

Given that this is already the case for aggregates on approximate
floating-point numbers, why should we rule out an optimization
that merely makes the rounding errors more likely to be visible?
The real issue here is that if you are using an approximate data
type and expecting exact answers, you will have problems.
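
For instance, a naive add-then-subtract state machine (which is
what an inverse transition function for float8 sum() amounts to)
loses information that simply rescanning the rows would recover:

    -- Add 1e20 and then 1.0 to the running state, then remove
    -- 1e20 again via the inverse transition:
    SELECT ((0::float8 + 1e20) + 1.0) - 1e20 AS after_inverse, -- 0: the 1.0 was lost
           1.0::float8                       AS recomputed;    -- sum of the remaining row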

That's not to say that approximations are useless.  If you
represent the circumference of the earth in meters as a
single-precision float4, you're dealing with a rounding error of
a meter or two; as a double-precision float8, the representation
error shrinks to nanometers.  Either is close enough for many
purposes.  The mistake is assuming that the value is exact or
that rounding errors cannot accumulate.  Where SQL does not
promise a particular ordering of operations, none should be
assumed; so any expectation of a specific, repeatable result from
a sum or average of approximate numbers is misplaced.
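
A quick sanity check of that representation error, using the
equatorial circumference in meters:

    -- Stored as float4 the value lands on the nearest
    -- representable number, about a meter off:
    SELECT 40075017::float4 - 40075017::float8 AS error_in_meters;
    -- => -1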

>> But NaN is an orthogonal problem I think.

Agreed.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
