Re: difference in query performance due to the inclusion of a polygon geometry field

Lists: pgsql-general
From: ivan marchesini <ivan(dot)marchesini(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: Mauro Rossi <mauro(dot)rossi(at)irpi(dot)cnr(dot)it>
Subject: difference in query performance due to the inclusion of a polygon geometry field
Date: 2012-11-30 11:28:04
Message-ID: 1354274884.5599.26.camel@beedo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Dear users,
we are facing with a strange efficiency problem.

We have two tables:
1) the first largest (30000 rows) contains a timestamp field (named
"tempo_calcolo") and other fields.
2) the second contains only few rows (150), a PostGIS geometric POLYGON
field (named "the_geom") and other fields.

The query we want to execute is a join between the two table based on a
common field, filtered by the timestamp field and ordered by the same
field.

Including and excluding the PostGIS geometric POLYGON field (that is not
used neither in the joining or in the filtering) the execution times are
really different.

Without the geometric field it takes around 86 ms !!
With the geometric field it takes around 14000 ms !!

These are the EXPLAIN ANALYSE results obtained when we perform the query
without the geometric field:
____________________________
"Sort (cost=1217.29..1240.40 rows=9246 width=133) (actual
time=2.964..3.095 rows=1474 loops=1)"
" Sort Key: storico_zone_allertamento.tempo_calcolo"
" Sort Method: quicksort Memory: 440kB"
" InitPlan 1 (returns $0)"
" -> Result (cost=0.00..0.02 rows=1 width=0) (actual
time=0.022..0.022 rows=1 loops=1)"
" -> Hash Join (cost=6.01..608.21 rows=9246 width=133) (actual
time=0.143..1.916 rows=1474 loops=1)"
" Hash Cond: (storico_zone_allertamento.gid_fk =
zone_allertamento.gid)"
" -> Index Scan using
index_storico_zone_allertamento_tempo_calcolo on
storico_zone_allertamento (cost=0.00..475.06 rows=9246 width=112)
(actual time=0.042..0.440 rows=1474 loops=1)"
" Index Cond: (tempo_calcolo > $0)"
" -> Hash (cost=4.34..4.34 rows=134 width=25) (actual
time=0.090..0.090 rows=134 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 8kB"
" -> Seq Scan on zone_allertamento (cost=0.00..4.34
rows=134 width=25) (actual time=0.004..0.035 rows=134 loops=1)"
"Total runtime: 3.285 ms"
__________________________

These are the EXPLAIN ANALYSE results obtained when we perform the query
with the geometric field:
___________________________
"Sort (cost=776019.29..776042.40 rows=9246 width=196111) (actual
time=3.039..3.163 rows=1474 loops=1)"
" Sort Key: storico_zone_allertamento.tempo_calcolo"
" Sort Method: quicksort Memory: 567kB"
" InitPlan 1 (returns $0)"
" -> Result (cost=0.00..0.02 rows=1 width=0) (actual
time=0.009..0.009 rows=1 loops=1)"
" -> Hash Join (cost=6.01..608.21 rows=9246 width=196111) (actual
time=0.139..1.957 rows=1474 loops=1)"
" Hash Cond: (storico_zone_allertamento.gid_fk =
zone_allertamento.gid)"
" -> Index Scan using
index_storico_zone_allertamento_tempo_calcolo on
storico_zone_allertamento (cost=0.00..475.06 rows=9246 width=112)
(actual time=0.029..0.431 rows=1474 loops=1)"
" Index Cond: (tempo_calcolo > $0)"
" -> Hash (cost=4.34..4.34 rows=134 width=196003) (actual
time=0.091..0.091 rows=134 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 19kB"
" -> Seq Scan on zone_allertamento (cost=0.00..4.34
rows=134 width=196003) (actual time=0.006..0.038 rows=134 loops=1)"
"Total runtime: 3.355 ms"
__________________________

It seems that the difference is primary in the "width" and then in the
"cost" of the Sort.

A similar task performed on two similar tables (both larger than the
previous ones!!) but including and excluding a PostGIS geometric POINT
(!!) field instead of POLYGON one, doesn't result in significant query
execution time difference.

Do you have please any suggestion to improve our query efficiency??

Thanks in advance

Ivan and Mauro

--
Ti prego di cercare di non inviarmi files .dwg, .doc, .xls, .ppt.
Preferisco formati liberi.
Please try to avoid to send me .dwg, .doc, .xls, .ppt files.
I prefer free formats.
http://it.wikipedia.org/wiki/Formato_aperto
http://en.wikipedia.org/wiki/Open_format

Ivan Marchesini
Perugia (Italy)
Socio fondatore GFOSS "Geospatial Free and Open Source Software" http://www.gfoss.it
e-mail: ivan(dot)marchesini(at)irpi(dot)cnr(dot)it
ivan(dot)marchesini(at)gmail(dot)com
fax (mailfax): +39 1782092534
jabber: geoivan73(at)jabber(dot)org
skype: geoivan73


From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: difference in query performance due to the inclusion of a polygon geometry field
Date: 2012-11-30 11:41:24
Message-ID: 50B89B64.8050600@hogranch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 11/30/12 3:28 AM, ivan marchesini wrote:
> Without the geometric field it takes around 86 ms !!
> With the geometric field it takes around 14000 ms !!
>
> These are the EXPLAIN ANALYSE results obtained when we perform the query
> without the geometric field:
> ____________________________
> ...
> "Total runtime: 3.285 ms"
> __________________________
>
>
> These are the EXPLAIN ANALYSE results obtained when we perform the query
> with the geometric field:
> ___________________________
> ....
> "Total runtime: 3.355 ms"
> __________________________

your examples don't seem to match your description.

both are within 100 microseconds.

I will say, sorting, formatting, and sending 200K * 1400 rows is going
to take a lot longer than 133 bytes * 1400 rows, if I can believe the
output of the explain analyzes.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: <ivan(dot)marchesini(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org>
Cc: "Mauro Rossi" <mauro(dot)rossi(at)irpi(dot)cnr(dot)it>
Subject: Re: difference in query performance due to the inclusion of a polygon geometry field
Date: 2012-11-30 11:56:12
Message-ID: D960CB61B694CF459DCFB4B0128514C208C0CAA9@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

ivan marchesini wrote:
> we are facing with a strange efficiency problem.
>
> We have two tables:
> 1) the first largest (30000 rows) contains a timestamp field (named
> "tempo_calcolo") and other fields.
> 2) the second contains only few rows (150), a PostGIS geometric POLYGON
> field (named "the_geom") and other fields.
>
> The query we want to execute is a join between the two table based on a
> common field, filtered by the timestamp field and ordered by the same
> field.
>
> Including and excluding the PostGIS geometric POLYGON field (that is not
> used neither in the joining or in the filtering) the execution times are
> really different.
>
> Without the geometric field it takes around 86 ms !!
> With the geometric field it takes around 14000 ms !!
>
> These are the EXPLAIN ANALYSE results obtained when we perform the query
> without the geometric field:
> ____________________________
> "Sort (cost=1217.29..1240.40 rows=9246 width=133) (actual
> time=2.964..3.095 rows=1474 loops=1)"
> " Sort Key: storico_zone_allertamento.tempo_calcolo"
> " Sort Method: quicksort Memory: 440kB"
> " InitPlan 1 (returns $0)"
> " -> Result (cost=0.00..0.02 rows=1 width=0) (actual
> time=0.022..0.022 rows=1 loops=1)"
> " -> Hash Join (cost=6.01..608.21 rows=9246 width=133) (actual
> time=0.143..1.916 rows=1474 loops=1)"
> " Hash Cond: (storico_zone_allertamento.gid_fk =
> zone_allertamento.gid)"
> " -> Index Scan using
> index_storico_zone_allertamento_tempo_calcolo on
> storico_zone_allertamento (cost=0.00..475.06 rows=9246 width=112)
> (actual time=0.042..0.440 rows=1474 loops=1)"
> " Index Cond: (tempo_calcolo > $0)"
> " -> Hash (cost=4.34..4.34 rows=134 width=25) (actual
> time=0.090..0.090 rows=134 loops=1)"
> " Buckets: 1024 Batches: 1 Memory Usage: 8kB"
> " -> Seq Scan on zone_allertamento (cost=0.00..4.34
> rows=134 width=25) (actual time=0.004..0.035 rows=134 loops=1)"
> "Total runtime: 3.285 ms"
> __________________________
>
>
> These are the EXPLAIN ANALYSE results obtained when we perform the query
> with the geometric field:
> ___________________________
> "Sort (cost=776019.29..776042.40 rows=9246 width=196111) (actual
> time=3.039..3.163 rows=1474 loops=1)"
> " Sort Key: storico_zone_allertamento.tempo_calcolo"
> " Sort Method: quicksort Memory: 567kB"
> " InitPlan 1 (returns $0)"
> " -> Result (cost=0.00..0.02 rows=1 width=0) (actual
> time=0.009..0.009 rows=1 loops=1)"
> " -> Hash Join (cost=6.01..608.21 rows=9246 width=196111) (actual
> time=0.139..1.957 rows=1474 loops=1)"
> " Hash Cond: (storico_zone_allertamento.gid_fk =
> zone_allertamento.gid)"
> " -> Index Scan using
> index_storico_zone_allertamento_tempo_calcolo on
> storico_zone_allertamento (cost=0.00..475.06 rows=9246 width=112)
> (actual time=0.029..0.431 rows=1474 loops=1)"
> " Index Cond: (tempo_calcolo > $0)"
> " -> Hash (cost=4.34..4.34 rows=134 width=196003) (actual
> time=0.091..0.091 rows=134 loops=1)"
> " Buckets: 1024 Batches: 1 Memory Usage: 19kB"
> " -> Seq Scan on zone_allertamento (cost=0.00..4.34
> rows=134 width=196003) (actual time=0.006..0.038 rows=134 loops=1)"
> "Total runtime: 3.355 ms"
> __________________________
>
>
> It seems that the difference is primary in the "width" and then in the
> "cost" of the Sort.
>
> A similar task performed on two similar tables (both larger than the
> previous ones!!) but including and excluding a PostGIS geometric POINT
> (!!) field instead of POLYGON one, doesn't result in significant query
> execution time difference.
>
> Do you have please any suggestion to improve our query efficiency??

That's strange.

The plans that you posted suggest that query execution time is
almost identical in both cases (3.285 vs. 3.355 milliseconds).

How do you measure the execution time?

If it is a measurement on the client side, I'd suppose that
the time is consumed by the transfer of the data to the client
or the processing on the client.

Yours,
Laurenz Albe


From: ivan marchesini <ivan(dot)marchesini(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: pgsql-general(at)postgresql(dot)org, Mauro Rossi <mauro(dot)rossi(at)irpi(dot)cnr(dot)it>
Subject: Re: difference in query performance due to the inclusion of a polygon geometry field
Date: 2012-11-30 13:25:24
Message-ID: 1354281924.6704.86.camel@beedo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi

thanks for all your answers.

Concerning the problem of the same "running time" I'm really surprised
but I can confirm that the EXPLAIN ANALIZE time is different from the
query execution time.
probably it happen something like this:
http://bytes.com/topic/postgresql/answers/422841-explain-analyze-total-runtime-walltime
But we have tried to remove the fk constraints without any improvement

Concerning the client problem, e have tried to execute the query
directly on the server using psql.
The difference is still in the order of seconds: around 10 ms VS 5000 ms

many thanks...

other suggestions?

ivan and mauro

--
Ti prego di cercare di non inviarmi files .dwg, .doc, .xls, .ppt.
Preferisco formati liberi.
Please try to avoid to send me .dwg, .doc, .xls, .ppt files.
I prefer free formats.
http://it.wikipedia.org/wiki/Formato_aperto
http://en.wikipedia.org/wiki/Open_format

Ivan Marchesini
Perugia (Italy)
Socio fondatore GFOSS "Geospatial Free and Open Source Software" http://www.gfoss.it
e-mail: ivan(dot)marchesini(at)irpi(dot)cnr(dot)it
ivan(dot)marchesini(at)gmail(dot)com
fax (mailfax): +39 1782092534
jabber: geoivan73(at)jabber(dot)org
skype: geoivan73


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: <ivan(dot)marchesini(at)gmail(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>, "Mauro Rossi" <mauro(dot)rossi(at)irpi(dot)cnr(dot)it>
Subject: Re: difference in query performance due to the inclusion of a polygon geometry field
Date: 2012-11-30 14:16:15
Message-ID: D960CB61B694CF459DCFB4B0128514C208C99D90@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

ivan marchesini wrote:
> Concerning the problem of the same "running time" I'm really surprised
> but I can confirm that the EXPLAIN ANALIZE time is different from the
> query execution time.
> probably it happen something like this:
> http://bytes.com/topic/postgresql/answers/422841-explain-analyze-total-runtime-walltime
> But we have tried to remove the fk constraints without any improvement
>
> Concerning the client problem, e have tried to execute the query
> directly on the server using psql.
> The difference is still in the order of seconds: around 10 ms VS 5000 ms
>
> many thanks...
>
> other suggestions?

As I said -- could the time be spent on the client?

Yours,
Laurenz Albe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: ivan(dot)marchesini(at)gmail(dot)com, pgsql-general(at)postgresql(dot)org, "Mauro Rossi" <mauro(dot)rossi(at)irpi(dot)cnr(dot)it>
Subject: Re: difference in query performance due to the inclusion of a polygon geometry field
Date: 2012-11-30 15:51:50
Message-ID: 13311.1354290710@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at> writes:
> As I said -- could the time be spent on the client?

It is probably some combination of

(1) time to fetch the wide geometry values from the table's TOAST table
(2) time to convert the geometry values to text form
(3) time to transmit the larger volume of data to the client
(4) client-side processing time

None of these costs are expended in an EXPLAIN ANALYZE, which is
why the time reported for that doesn't change materially.

regards, tom lane


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, ivan(dot)marchesini(at)gmail(dot)com, pgsql-general(at)postgresql(dot)org, Mauro Rossi <mauro(dot)rossi(at)irpi(dot)cnr(dot)it>
Subject: Re: difference in query performance due to the inclusion of a polygon geometry field
Date: 2012-11-30 16:12:36
Message-ID: CAHyXU0xUxwM6UVqP0vx3wMjT0rk4BKis9QYTBKviJweJg8s5og@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, Nov 30, 2012 at 9:51 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at> writes:
>> As I said -- could the time be spent on the client?
>
> It is probably some combination of
>
> (1) time to fetch the wide geometry values from the table's TOAST table
> (2) time to convert the geometry values to text form
> (3) time to transmit the larger volume of data to the client
> (4) client-side processing time
>
> None of these costs are expended in an EXPLAIN ANALYZE, which is
> why the time reported for that doesn't change materially.

Also possibly decompression time too.

merlin


From: Ivan Marchesini <ivan(dot)marchesini(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: Mauro Rossi <mauro(dot)rossi(at)irpi(dot)cnr(dot)it>, pgsql-general(at)postgresql(dot)org
Subject: Re: difference in query performance due to the inclusion of a polygon geometry field
Date: 2012-11-30 16:30:44
Message-ID: CADxXfpCaFgLuqWGUa8+4RAZmEG61TKcxCTWwu75rxCp1q+2D_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi Laurenz
Thanks but as I said I have tried using psql directly on the server. And
still there is a strong difference between the queries execution time with
and without the geometric field.
May be it is a problem of clients but I obtain the same problem with
pgadmin, psql and other servers. So I suppose it is not. :-)

Thanks
Ivan

inviato da smartphone
Il giorno 30/nov/2012 15:16, "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at> ha
scritto:

> ivan marchesini wrote:
> > Concerning the problem of the same "running time" I'm really surprised
> > but I can confirm that the EXPLAIN ANALIZE time is different from the
> > query execution time.
> > probably it happen something like this:
> >
> http://bytes.com/topic/postgresql/answers/422841-explain-analyze-total-runtime-walltime
> > But we have tried to remove the fk constraints without any improvement
> >
> > Concerning the client problem, e have tried to execute the query
> > directly on the server using psql.
> > The difference is still in the order of seconds: around 10 ms VS 5000 ms
> >
> > many thanks...
> >
> > other suggestions?
>
> As I said -- could the time be spent on the client?
>
> Yours,
> Laurenz Albe
>


From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: difference in query performance due to the inclusion of a polygon geometry field
Date: 2012-11-30 21:46:50
Message-ID: 50B9294A.3070807@hogranch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 11/30/12 8:30 AM, Ivan Marchesini wrote:
> Thanks but as I said I have tried using psql directly on the server.
> And still there is a strong difference between the queries execution
> time with and without the geometric field.
> May be it is a problem of clients but I obtain the same problem with
> pgadmin, psql and other servers. So I suppose it is not. :-)

query without Geometry is returning 1474 tuples of 133 bytes each.
thats about 200 kbytes worth of tuples which have to be sorted, then
converted to client format.

query WITH geometry is returning 1474 tuples of 196111 bytes each. thats
about 300MB which have to be sorted, then converted to client format.

you're surprised handling 300MB takes longer than 200KB?

--
john r pierce N 37, W 122
santa cruz ca mid-left coast


From: ivan marchesini <ivan(dot)marchesini(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org, Mauro Rossi <mauro(dot)rossi(at)irpi(dot)cnr(dot)it>
Subject: Re: difference in query performance due to the inclusion of a polygon geometry field
Date: 2012-12-02 10:52:42
Message-ID: 1354445562.22442.79.camel@beedo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi Tom,
many thanks for your answer,

Il giorno ven, 30/11/2012 alle 10.51 -0500, Tom Lane ha scritto:
> (1) time to fetch the wide geometry values from the table's TOAST table

Yes.. this could be the point!!!!!

In fact I do not face problems when a execute a select using similar
tables (also larger than those I described in the previous e-mail) but
containing POINTS instead of POLYGONS

I suppose that for the points (that are short) postgresql do not use
TOAST tables. On the contrary postgres use TOAST table with the polygons
(that are long).

Can you suggest a workaround to solve this problem?
Should I ask to the PostGIS user mailing list?

Best regards,
Ivan

--
Ti prego di cercare di non inviarmi files .dwg, .doc, .xls, .ppt.
Preferisco formati liberi.
Please try to avoid to send me .dwg, .doc, .xls, .ppt files.
I prefer free formats.
http://it.wikipedia.org/wiki/Formato_aperto
http://en.wikipedia.org/wiki/Open_format

Ivan Marchesini
Perugia (Italy)
Socio fondatore GFOSS "Geospatial Free and Open Source Software" http://www.gfoss.it
e-mail: ivan(dot)marchesini(at)irpi(dot)cnr(dot)it
ivan(dot)marchesini(at)gmail(dot)com
fax (mailfax): +39 1782092534
jabber: geoivan73(at)jabber(dot)org
skype: geoivan73