Re: Join issue on a maximum value

Lists: pgsql-sql
From: Heflin <hhogan(at)tampabay(dot)rr(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Join issue on a maximum value
Date: 2004-04-21 18:29:34
Message-ID: 4086BD8E.90900@tampabay.rr.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

OK, it's been a while since I've had to do anything remotely complex in
SQL, so this may just be a pure brain block on my part.

I have 2 tables, auction and image, defined like this:

Table "public.auction"
Column | Type |
Modifiers
-----------------+---------+-----------------------------------------------------------------
auction_id | integer | not null default
nextval('public.auction_auction_id_seq'::text)
auction_descrip | text |
auction_owner | text |
Indexes:
"auction_pkey" primary key, btree (auction_id)

Table "public.image"
Column | Type |
Modifiers
-------------+---------+-------------------------------------------------------------
image_id | integer | not null default
nextval('public.image_image_id_seq'::text)
auction_id | integer | not null
image_descr | text |
Indexes:
"image_pkey" primary key, btree (image_id)
Foreign-key constraints:
"$1" FOREIGN KEY (auction_id) REFERENCES auction(auction_id) ON
UPDATE RESTRICT ON DELETE RESTRICT

Current data in the tables:

play=# select * from auction
play-# ;
auction_id | auction_descrip | auction_owner
------------+-----------------+---------------
1 | Mabel Auction 1 | Mabel
2 | Mabel Auction 2 | Mabel
3 | Mabel Auction 3 | Mabel
4 | Fred Auction 1 | Fred
5 | Fred Auction 2 | Fred

play=# select * from image;
image_id | auction_id | image_descr
----------+------------+-------------
1 | 1 | image 1
2 | 1 | image 2
3 | 2 | image 3
4 | 3 | image 4
5 | 3 | image 5
6 | 4 | image 7
7 | 3 | image 8

So a basic JOIN gets this:

SELECT auction.auction_id, image.image_id, image.image_descr
FROM auction JOIN image ON auction.auction_id = image.auction_id
WHERE auction.auction_owner = 'Mabel';

auction_id | image_id | image_descr
------------+----------+-------------
1 | 1 | image 1
1 | 2 | image 2
2 | 3 | image 3
3 | 4 | image 4
3 | 5 | image 5
3 | 7 | image 8
(6 rows)

Now the problem: I can't seem to remember how to get only the max value
for the image_id for each auction_id so that the result set would be:

auction_id | image_id | image_descr
------------+----------+-------------
1 | 2 | image 2
2 | 3 | image 3
3 | 7 | image 8

Playing with the max() aggregate seems to be the correct path, but for
the life of me I can't seem to get the syntax to the point that it
produces what I need. Any help would be greatly appreciated!

Thanks,

-Heflin


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Heflin <hhogan(at)tampabay(dot)rr(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Join issue on a maximum value
Date: 2004-04-21 18:51:22
Message-ID: 20040421185122.GA31195@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Wed, Apr 21, 2004 at 14:29:34 -0400,
Heflin <hhogan(at)tampabay(dot)rr(dot)com> wrote:
>
> So a basic JOIN gets this:
>
> SELECT auction.auction_id, image.image_id, image.image_descr
> FROM auction JOIN image ON auction.auction_id = image.auction_id
> WHERE auction.auction_owner = 'Mabel';
>
> Now the problem: I can't seem to remember how to get only the max value
> for the image_id for each auction_id so that the result set would be:

The postgres specific way of doing this is:
SELECT DISTINCT ON (auction.auction_id)
auction.auction_id, image.image_id, image.image_descr
FROM auction JOIN image ON auction.auction_id = image.auction_id
WHERE auction.auction_owner = 'Mabel'
ORDER BY auction.auction_id, image.image_id DESC
;

The more standard way to do it would be joining auction and image
with a group by and max to get the highest image_id and then joining
that result to image again to get the corresponding description.


From: Jeremy Semeiks <jrs(at)denny(dot)farviolet(dot)com>
To: Heflin <hhogan(at)tampabay(dot)rr(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: Join issue on a maximum value
Date: 2004-04-21 19:12:48
Message-ID: 20040421191248.GA18969@64.81.242.180
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Wed, Apr 21, 2004 at 02:29:34PM -0400, Heflin wrote:
> OK, it's been a while since I've had to do anything remotely complex in
> SQL, so this may just be a pure brain block on my part.
>
...
>
> So a basic JOIN gets this:
>
> SELECT auction.auction_id, image.image_id, image.image_descr
> FROM auction JOIN image ON auction.auction_id = image.auction_id
> WHERE auction.auction_owner = 'Mabel';
>
> auction_id | image_id | image_descr
> ------------+----------+-------------
> 1 | 1 | image 1
> 1 | 2 | image 2
> 2 | 3 | image 3
> 3 | 4 | image 4
> 3 | 5 | image 5
> 3 | 7 | image 8
> (6 rows)
>
> Now the problem: I can't seem to remember how to get only the max value
> for the image_id for each auction_id so that the result set would be:
>
> auction_id | image_id | image_descr
> ------------+----------+-------------
> 1 | 2 | image 2
> 2 | 3 | image 3
> 3 | 7 | image 8
>
> Playing with the max() aggregate seems to be the correct path, but for
> the life of me I can't seem to get the syntax to the point that it
> produces what I need. Any help would be greatly appreciated!

A simple way to write this is to use DISTINCT ON combined with ORDER
BY. In this case, these clauses can substitute for the MAX aggregate:

select distinct on (auction.auction_id) auction.auction_id, image.image_id,
image.image_descr
from auction join image using (auction_id)
where auction.auction_owner = 'Mabel'
order by auction.auction_id, image.image_id desc;

(I haven't tested this.)

I think you need a subselect in there if you want to use the MAX
aggregate.

- Jeremy


From: Edmund Bacon <ebacon(at)onesystem(dot)com>
To: Heflin <hhogan(at)tampabay(dot)rr(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Join issue on a maximum value
Date: 2004-04-21 19:21:54
Message-ID: 4086C9D2.9030707@onesystem.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Two choices that work:

Either add another JOIN in which retrieves the MAX(image_id) for each
auction:

SELECT auction.auction_id, image.image_id, image.image_descr
FROM auction
JOIN image USING(auction_id)
JOIN ( SELECT auction_id, MAX(image_id) AS image_id
FROM image
GROUP BY auction_id) max_aid USING (image_id)
WHERE owner = 'Mabel'
ORDER by auction.auction_id;

OR use a sub-select:

SELECT auction.auction_id, image_id, image.image_descr
FROM auction
JOIN image USING (auction_id)
WHERE image_id = ( SELECT max(image_id)
FROM image
WHERE auction_id = auction.auction_id)
AND image_owner = 'Mabel';

Test both with your data - My experience is that the sub-select runs
slower than throwing in the extra join.

Heflin wrote:

> OK, it's been a while since I've had to do anything remotely complex
> in SQL, so this may just be a pure brain block on my part.
>
> I have 2 tables, auction and image, defined like this:
>
> Table "public.auction"
> Column | Type |
> Modifiers
> -----------------+---------+-----------------------------------------------------------------
>
> auction_id | integer | not null default
> nextval('public.auction_auction_id_seq'::text)
> auction_descrip | text |
> auction_owner | text |
> Indexes:
> "auction_pkey" primary key, btree (auction_id)
>
>
> Table "public.image"
> Column | Type |
> Modifiers
> -------------+---------+-------------------------------------------------------------
>
> image_id | integer | not null default
> nextval('public.image_image_id_seq'::text)
> auction_id | integer | not null
> image_descr | text |
> Indexes:
> "image_pkey" primary key, btree (image_id)
> Foreign-key constraints:
> "$1" FOREIGN KEY (auction_id) REFERENCES auction(auction_id) ON
> UPDATE RESTRICT ON DELETE RESTRICT
>
>
> Current data in the tables:
>
> play=# select * from auction
> play-# ;
> auction_id | auction_descrip | auction_owner
> ------------+-----------------+---------------
> 1 | Mabel Auction 1 | Mabel
> 2 | Mabel Auction 2 | Mabel
> 3 | Mabel Auction 3 | Mabel
> 4 | Fred Auction 1 | Fred
> 5 | Fred Auction 2 | Fred
>
>
> play=# select * from image;
> image_id | auction_id | image_descr
> ----------+------------+-------------
> 1 | 1 | image 1
> 2 | 1 | image 2
> 3 | 2 | image 3
> 4 | 3 | image 4
> 5 | 3 | image 5
> 6 | 4 | image 7
> 7 | 3 | image 8
>
> So a basic JOIN gets this:
>
> SELECT auction.auction_id, image.image_id, image.image_descr
> FROM auction JOIN image ON auction.auction_id = image.auction_id
> WHERE auction.auction_owner = 'Mabel';
>
> auction_id | image_id | image_descr
> ------------+----------+-------------
> 1 | 1 | image 1
> 1 | 2 | image 2
> 2 | 3 | image 3
> 3 | 4 | image 4
> 3 | 5 | image 5
> 3 | 7 | image 8
> (6 rows)
>
> Now the problem: I can't seem to remember how to get only the max
> value for the image_id for each auction_id so that the result set
> would be:
>
> auction_id | image_id | image_descr
> ------------+----------+-------------
> 1 | 2 | image 2
> 2 | 3 | image 3
> 3 | 7 | image 8
>
> Playing with the max() aggregate seems to be the correct path, but for
> the life of me I can't seem to get the syntax to the point that it
> produces what I need. Any help would be greatly appreciated!
>
> Thanks,
>
> -Heflin
>
>
>

--
Edmund Bacon <ebacon(at)onesystem(dot)com>


From: Heflin <hhogan(at)tampabay(dot)rr(dot)com>
To: Bruno Wolff III <bruno(at)wolff(dot)to>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Join issue on a maximum value
Date: 2004-04-21 20:28:10
Message-ID: 4086D95A.5000609@tampabay.rr.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Bruno Wolff III wrote:

>On Wed, Apr 21, 2004 at 14:29:34 -0400,
> Heflin <hhogan(at)tampabay(dot)rr(dot)com> wrote:
>
>
>>So a basic JOIN gets this:
>>
>>SELECT auction.auction_id, image.image_id, image.image_descr
>>FROM auction JOIN image ON auction.auction_id = image.auction_id
>>WHERE auction.auction_owner = 'Mabel';
>>
>>Now the problem: I can't seem to remember how to get only the max value
>>for the image_id for each auction_id so that the result set would be:
>>
>>
>
>The postgres specific way of doing this is:
>SELECT DISTINCT ON (auction.auction_id)
> auction.auction_id, image.image_id, image.image_descr
> FROM auction JOIN image ON auction.auction_id = image.auction_id
> WHERE auction.auction_owner = 'Mabel'
> ORDER BY auction.auction_id, image.image_id DESC
>;
>
>The more standard way to do it would be joining auction and image
>with a group by and max to get the highest image_id and then joining
>that result to image again to get the corresponding description.
>
>
>
Thanks!

I was actually trying to do it the more standard way, but I've been
bungling up the syntax. I'm going to play with that some more, since it
might be useful elsewhere.

The thing that disturbs me about your syntax is that I don't really see
an assurance that I'll get the correct image_id. Any chance you can
tell me why this works?

Thanks again,

-Heflin


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Heflin <hhogan(at)tampabay(dot)rr(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Join issue on a maximum value
Date: 2004-04-22 01:51:45
Message-ID: 20040422015145.GB2160@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Wed, Apr 21, 2004 at 16:28:10 -0400,
Heflin <hhogan(at)tampabay(dot)rr(dot)com> wrote:
> >
> >The postgres specific way of doing this is:
> >SELECT DISTINCT ON (auction.auction_id)
> > auction.auction_id, image.image_id, image.image_descr
> > FROM auction JOIN image ON auction.auction_id = image.auction_id
> > WHERE auction.auction_owner = 'Mabel'
> > ORDER BY auction.auction_id, image.image_id DESC
> >;
>
> The thing that disturbs me about your syntax is that I don't really see
> an assurance that I'll get the correct image_id. Any chance you can
> tell me why this works?

The postgres semantic is that when dinstinct on is combined with order by
the first distinct row in the order defined by the order by is the one
returned. This is described in the documentation and there is an example
of a query taking advantage of this.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heflin <hhogan(at)tampabay(dot)rr(dot)com>
Cc: Bruno Wolff III <bruno(at)wolff(dot)to>, pgsql-sql(at)postgresql(dot)org
Subject: Re: Join issue on a maximum value
Date: 2004-04-22 03:30:58
Message-ID: 18805.1082604658@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Heflin <hhogan(at)tampabay(dot)rr(dot)com> writes:
> Bruno Wolff III wrote:
>> The postgres specific way of doing this is:
>> SELECT DISTINCT ON (auction.auction_id)
>> auction.auction_id, image.image_id, image.image_descr
>> FROM auction JOIN image ON auction.auction_id = image.auction_id
>> WHERE auction.auction_owner = 'Mabel'
>> ORDER BY auction.auction_id, image.image_id DESC

> The thing that disturbs me about your syntax is that I don't really see
> an assurance that I'll get the correct image_id. Any chance you can
> tell me why this works?

The ORDER BY DESC is what forces the max image_id to be selected. Read
the discussion of SELECT DISTINCT ON in the SELECT reference page; the
"weather report" example may be illuminating.

regards, tom lane