Re: Feedback about Drupal SQL debugging

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Jean-Michel Pouré <jm(at)poure(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Feedback about Drupal SQL debugging
Date: 2009-08-21 23:03:40
Message-ID: 407d949e0908211603i53553592m5923c7fa18ade891@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2009/8/21 Jean-Michel Pouré <jm(at)poure(dot)com>:
> PostgreSQL requires all non-aggregated fields to be present in the GROUP
> BY clause (I fixed 10 such issues in Drupal code).
> http://drupal.org/node/555530
>
> Why can't PostgreSQL add the required field automatically? Could this be
> added to PostgreSQL to-do-list?

This is a more complex (and more interesting) topic than what your
blog discusses.

Firstly understand what MySQL is *actually* doing:

select a,b,c from tab group by a

Only sorts and groups by "a" as instructed. The b columns and c
columns are not included in the grouping. So if you have data like:

a,b,c
1,1,1
1,2,2
2,1,1
2,2,2

You'll get two groups because there are only two values of "a". One
group will have a=1 and one group will have a=2. Which value you get
for b and c will be completely arbitrary and unpredictable.

If Postgres added b,c to the GROUP BY it would produce four groups,
because there four different values of <a,b,c>. You *can* get
something similar to MySQL's behaviour using DISTINCT ON:

select distinct on (a) a,b,c from a ORDER BY a,b,c

But Postgres insists you have an ORDER BY which has to agree with the
DISTINCT ON columns and provide some extra column(s) to determine
which values of b,c are chosen.

If Postgres changed on this front it would be to support the SQL
Standard concept of "functional dependency". In cases where some
columns are guaranteed to be unique you can leave them out of the
GROUP BY but still use them in the select list. This isn't MySQL's
behaviour of just allowing you to leave them out and hope that it
doesn't matter which row's values are used. The database has to
actually determine that it really doesn't matter. Typically that would
be because you've grouped by a set of columns which form the key of a
unique constraint, in which case every other column from that table
would also necessarily be the same since they would all come from the
same row of that table.

--
greg
http://mit.edu/~gsstark/resume.pdf

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua D. Drake 2009-08-21 23:21:47 Re: Feedback about Drupal SQL debugging
Previous Message Andrew Dunstan 2009-08-21 23:01:36 Re: Feedback about Drupal SQL debugging