Re: Query hangs when getting too complex...

Lists: pgsql-general
From: Paulo Jan <admin(at)digital(dot)ddnet(dot)es>
To: pgsql-general(at)postgresql(dot)org
Subject: Query hangs when getting too complex...
Date: 2001-12-19 13:20:26
Message-ID: 3C20941A.44021ECF@digital.ddnet.es
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi all:

I have here a query that hangs everytime I try to execute it, even
though IMO it isn't *that* complicated (nor is the database *that* big).
The relevant tables are:

Table "archivos"
Attribute | Type | Modifier
-------------------+--------------------------+----------
idarchivo | integer | not null
codigofoto | numeric(18,0) |
idsexo | integer |
idprovincia | integer |
idccaa | integer |
idpais | integer |
place | character varying(255) |
fechafoto | timestamp with time zone |
nino | numeric(1,0) | not null
joven | numeric(1,0) | not null
adulto | numeric(1,0) | not null
anciano | numeric(1,0) | not null
posado | numeric(1,0) | not null
title | character varying(255) |
description | character varying(1000) |
limitations | character varying(500) |

Table "archivos_fulltext_en"
Attribute | Type | Modifier
-------------+--------------------------+----------
idarchivo | integer |
title | character varying(255) |
description | character varying(1000) |
place | character varying(255) |
country | character varying(255) |
state | character varying(255) |
province | character varying(255) |
sex | character varying(50) |
codigofoto | numeric(18,0) |
fechafoto | timestamp with time zone |
revision | integer |
race | character varying(4096) |
tesauro_en | text |
class | character varying(4096) |
personaje | character varying(4096) |

Where "archivos" is, let's say, the "real" table, and it contains
information about photographs. "archivos_fulltext_en" is a denormalized
version that I created to do full text searches, with all the text
fields in "archivos", plus some other fields ("personaje", "race", etc.)
that were originally stored in other tables (because they required
many-to-one relationships).
And the query that is giving me problems is:

SELECT count(idarchivo) FROM archivos WHERE
revision <= 3 AND (EXISTS
(SELECT idarchivo FROM archivos_fulltext_en WHERE revision <= 3
AND archivos_fulltext_en.idarchivo=archivos.idarchivo
AND LOWER(TRANSLATE(archivos_fulltext_en.title ||
archivos_fulltext_en.description || archivos_fulltext_en.place ||
archivos_fulltext_en.province || archivos_fulltext_en.state ||
archivos_fulltext_en.country || archivos_fulltext_en.race ||
archivos_fulltext_en.sex || archivos_fulltext_en.class ||
archivos_fulltext_en.tesauro_en, '[áéíóúÁÉÍÓÚ]', '[aeiouAEIOU]')) LIKE
'%actress%'))
AND idsexo=2 AND archivos.joven = 1 AND posado=1
AND fechafoto BETWEEN '01/1/1976' AND '19/12/2001'

When performing this query, Postgres just hangs there, and leaves the
following in the log:

Dec 19 13:57:18 master postgres[29995]: [381] DEBUG:
StartTransactionCommand
Dec 19 13:57:18 master postgres[29995]: [382-1] DEBUG: query: SELECT
count(idarchivo) FROM archivos WHERE
Dec 19 13:57:18 master postgres[29995]: [382-2] revision <= 3 AND
Dec 19 13:57:18 master postgres[29995]: [382-3] (EXISTS (SELECT
idarchivo FROM archivos_fulltext_en WHERE revision
Dec 19 13:57:18 master postgres[29995]: [382-4] <= 3 AND
archivos_fulltext_en.idarchivo=archivos.idarchivo AND
LOWER(TRANSLATE(archivos_fulltext_en.title ||
Dec 19 13:57:18 master postgres[29995]: [382-5]
archivos_fulltext_en.description || archivos_fulltext_en.place ||
archivos_fulltext_en.province || archivos_fulltext_en.state
Dec 19 13:57:18 master postgres[29995]: [382-6] ||
archivos_fulltext_en.country || archivos_fulltext_en.race ||
archivos_fulltext_en.sex || archivos_fulltext_en.class ||
Dec 19 13:57:18 master postgres[29995]: [382-7]
archivos_fulltext_en.tesauro_en, '[áéíóúÁÉÍÓÚ]', '[aeiouAEIOU]')) LIKE
'%actress%'))
Dec 19 13:57:18 master postgres[29995]: [382-8] AND idsexo=2 AND
archivos.joven = 1 AND
Dec 19 13:57:18 master postgres[29995]: [382-9] posado=1 AND fechafoto
BETWEEN '01/1/1976' AND '19/12/2001';
Dec 19 13:57:18 master postgres[29995]: [383] DEBUG: parse tree:

And it just stays there forever. "ps -auxwww" shows the backend as
"idle":

postgres 29995 6.0 5.8 57108 53808 pts/1 R 13:31 1:37
postgres: postgres covermaster [local] idle

And eventually I have to abort it and even kill the process by hand
(kill -9)

Now: the interesting thing is that if I remove just one of the
conditions in the above query, it works. No matter which one:
"idsexo=2", "posado=1", "fechafoto BETWEEN '01/1/1976' AND
'19/12/2001'"... it just works.
As for the number of records in each table:

covermaster=# select count(idarchivo) from archivos;
count
-------
27340
(1 row)

covermaster=# select count(idarchivo) from archivos_fulltext_en;
count
-------
4249

The difference is due to the fact that, so far, the only photos that
are online are the ones where revision <= 3, so I just denormalized the
records that fulfilled that condition:

covermaster=# select count(idarchivo) from archivos where revision <= 3;
count
-------
4249
(1 row)

All this is taking place in a Dual Athlon server with 1Gb. RAM, running
Red Hat 6.2 and Postgres 7.1.2 (installed from RPMS). I have set the
postgres.conf file to:

sort_mem = 8192
shared_buffers = 6144

(And have increased the kernel shared memory to:

kernel.shmall=67108864
kernel.shmmax=67108864).

All the relevant fields in the above tables (revision, idsexo, joven,
etc.) are indexed, and I ran VACUUM ANALYZE right before the query
without it giving any errors.
Any ideas? The above query isn't even the most complex one that I'll be
performing; eventually there might be 5 or 6 more conditions to be
added... but of course, that can't happen if I don't solve this first.

Paulo Jan.
DDnet.


From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Paulo Jan <admin(at)mail(dot)ddnet(dot)es>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Query hangs when getting too complex...
Date: 2001-12-19 14:25:17
Message-ID: 20011219061828.A70705-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


> SELECT count(idarchivo) FROM archivos WHERE
> revision <= 3 AND (EXISTS
> (SELECT idarchivo FROM archivos_fulltext_en WHERE revision <= 3
> AND archivos_fulltext_en.idarchivo=archivos.idarchivo
> AND LOWER(TRANSLATE(archivos_fulltext_en.title ||
> archivos_fulltext_en.description || archivos_fulltext_en.place ||
> archivos_fulltext_en.province || archivos_fulltext_en.state ||
> archivos_fulltext_en.country || archivos_fulltext_en.race ||
> archivos_fulltext_en.sex || archivos_fulltext_en.class ||
> archivos_fulltext_en.tesauro_en, '[]', '[aeiouAEIOU]')) LIKE
> '%actress%'))
> AND idsexo=2 AND archivos.joven = 1 AND posado=1
> AND fechafoto BETWEEN '01/1/1976' AND '19/12/2001'

That's wierd. What does explain show for the query?

BTW: If you're always using the archivos_fulltext_en values like the above
and inserts/updates are less frequent than these selects, you might want
to do a trigger that does the lower(translate()) for you when the rows are
added.
As an additional note, I don't think the query above does what you want in
any case unless there are delimiters at the beginings or end of the
strings you're concatenating, and if any of the strings is null the result
of the concatenation is null. (Imagine that description ends in act and
place starts with ress for example)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Paulo Jan <admin(at)mail(dot)ddnet(dot)es>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Query hangs when getting too complex...
Date: 2001-12-19 15:14:12
Message-ID: 29545.1008774852@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Paulo Jan <admin(at)mail(dot)ddnet(dot)es> writes:
> I have here a query that hangs everytime I try to execute it, even
> though IMO it isn't *that* complicated (nor is the database *that* big).

Very bizarre. Can you attach to the backend process with gdb and get a
backtrace to see where it's hung up?

regards, tom lane


From: Paulo Jan <admin(at)digital(dot)ddnet(dot)es>
To: pgsql-general(at)postgresql(dot)org
Cc: fiol(at)w3ping(dot)com
Subject: Re: Query hangs when getting too complex...
Date: 2001-12-19 15:16:53
Message-ID: 3C20AF65.AD2C0996@digital.ddnet.es
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Stephan Szabo wrote:
>
> > SELECT count(idarchivo) FROM archivos WHERE
> > revision <= 3 AND (EXISTS
> > (SELECT idarchivo FROM archivos_fulltext_en WHERE revision <= 3
> > AND archivos_fulltext_en.idarchivo=archivos.idarchivo
> > AND LOWER(TRANSLATE(archivos_fulltext_en.title ||
> > archivos_fulltext_en.description || archivos_fulltext_en.place ||
> > archivos_fulltext_en.province || archivos_fulltext_en.state ||
> > archivos_fulltext_en.country || archivos_fulltext_en.race ||
> > archivos_fulltext_en.sex || archivos_fulltext_en.class ||
> > archivos_fulltext_en.tesauro_en, '[áéíóúÁÉÍÓÚ]', '[aeiouAEIOU]')) LIKE
> > '%actress%'))
> > AND idsexo=2 AND archivos.joven = 1 AND posado=1
> > AND fechafoto BETWEEN '01/1/1976' AND '19/12/2001'
>
> That's wierd. What does explain show for the query?
>

Oh, yes, I forgot *that* one:

covermaster=# explain
SELECT count(idarchivo) FROM archivos WHERE
revision <= 3 AND
(EXISTS (SELECT idarchivo FROM archivos_fulltext_en WHERE revision
<= 3 AND archivos_fulltext_en.idarchivo=archivos.idarchivo AND
LOWER(TRANSLATE(archivos_fulltext_en.title ||
archivos_fulltext_en.description || archivos_fulltext_en.place ||
archivos_fulltext_en.province || archivos_fulltext_en.state ||
archivos_fulltext_en.country || archivos_fulltext_en.race ||
archivos_fulltext_en.sex || archivos_fulltext_en.class ||
archivos_fulltext_en.tesauro_en, '[áéíóúÁÉÍÓÚ]', '[aeiouAEIOU]')) LIKE
'%politic%'))
AND idsexo=2 AND archivos.joven = 1 AND
posado=1 AND fechafoto BETWEEN '01/1/1976' AND '19/12/2001';
NOTICE: QUERY PLAN:

Aggregate (cost=15.59..15.59 rows=1 width=4)
-> Index Scan using archivos_idx_1 on archivos (cost=0.00..15.59
rows=1 width=4)
SubPlan
-> Index Scan using archivos_fulltext_en_idx_1 on
archivos_fulltext_en (cost=0.00..2.06 rows=1 width=4)

"archivos_idx_1" is an index on the "revision" column of "archivos",
while "archivos_fulltext_en_idx_1" is an index on
archivos_fulltext_en.idarchivo.

If I take out one condition of the above query, say "idsexo=2", EXPLAIN
shows:

covermaster=# explain
SELECT count(idarchivo) FROM archivos WHERE
revision <= 3 AND
(EXISTS (SELECT idarchivo FROM archivos_fulltext_en WHERE revision
<= 3 AND archivos_fulltext_en.idarchivo=archivos.idarchivo AND
LOWER(TRANSLATE(archivos_fulltext_en.title ||
archivos_fulltext_en.description || archivos_fulltext_en.place ||
archivos_fulltext_en.province || archivos_fulltext_en.state ||
archivos_fulltext_en.country || archivos_fulltext_en.race ||
archivos_fulltext_en.sex || archivos_fulltext_en.class ||
archivos_fulltext_en.tesauro_en, '[áéíóúÁÉÍÓÚ]', '[aeiouAEIOU]')) LIKE
'%politic%'))
AND archivos.joven = 1 AND
posado=1 AND fechafoto BETWEEN '01/1/1976' AND '19/12/2001';
NOTICE: QUERY PLAN:

Aggregate (cost=15.58..15.58 rows=1 width=4)
-> Index Scan using archivos_idx_1 on archivos (cost=0.00..15.58
rows=1 width=4)
SubPlan
-> Index Scan using archivos_fulltext_en_idx_1 on
archivos_fulltext_en (cost=0.00..2.06 rows=1 width=4)

Any ideas? Personally, I don't see anything odd here (other than the
fact that there's nothing odd <g>).

> As an additional note, I don't think the query above does what you want in
> any case unless there are delimiters at the beginings or end of the
> strings you're concatenating, and if any of the strings is null the result
> of the concatenation is null. (Imagine that description ends in act and
> place starts with ress for example)

I'm aware of these... erm, "additional factors", and they are accounted
for.

Paulo Jan.
DDnet.


From: Paulo Jan <admin(at)digital(dot)ddnet(dot)es>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Query hangs when getting too complex...
Date: 2001-12-19 18:09:05
Message-ID: 3C20D7C1.FFFE9D98@digital.ddnet.es
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom Lane wrote:
>
> Paulo Jan <admin(at)mail(dot)ddnet(dot)es> writes:
> > I have here a query that hangs everytime I try to execute it, even
> > though IMO it isn't *that* complicated (nor is the database *that* big).
>
> Very bizarre. Can you attach to the backend process with gdb and get a
> backtrace to see where it's hung up?
>

Uh... I don't have much experience using gdb (actually, I just started
learning <g>), but anyway:

1) I started psql and then looked at the PID of the "postgres" backend
process.
2) I typed "gdb postgres (PID)":

Attaching to program: /usr/bin/postgres, Pid 30492
Reading symbols from /usr/lib/libssl.so.0...(no debugging symbols
found)...
done.
Reading symbols from /usr/lib/libcrypto.so.0...(no debugging symbols
found)...
done.
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols
found)...done.
Reading symbols from /lib/libcrypt.so.1...done.
Reading symbols from /lib/libresolv.so.2...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /usr/lib/libreadline.so.3...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libtermcap.so.2...done.
Reading symbols from /usr/lib/gconv/ISO8859-1.so...done.
---Type <return> to continue, or q <return> to quit---
Reading symbols from /lib/libnss_files.so.2...done.
0x40276ba2 in __libc_recv () from /lib/libc.so.6

3) I typed the query in psql and started it. psql hangs...
4) ...and in the meantime, I type "bt" in the gdb window:

(gdb) bt
#0 0x40276ba2 in __libc_recv () from /lib/libc.so.6
#1 0x80c7c5a in StreamClose ()
#2 0x80c7d57 in pq_getbytes ()
#3 0x80fc54c in HandleFunctionRequest ()
#4 0x80fc5d5 in HandleFunctionRequest ()
#5 0x80fd9de in PostgresMain ()
#6 0x80e8035 in PostmasterMain ()
#7 0x80e7c1c in PostmasterMain ()
#8 0x80e6e77 in PostmasterMain ()
#9 0x80e6826 in PostmasterMain ()
#10 0x80c89fe in main ()
#11 0x401dc9cb in __libc_start_main (main=0x80c8890 <main>, argc=3,
argv=0xbffffbc4, init=0x8065a78 <_init>, fini=0x8145f2c <_fini>,
rtld_fini=0x4000aea0 <_dl_fini>, stack_end=0xbffffbbc)
at ../sysdeps/generic/libc-start.c:92

Then, just in case, I decided to do a backtrace of "psql", so I exited
gdb and did a "gdb psql (PID of psql)", and then a "bt":

Attaching to program: /usr/bin/psql, Pid 30491
Reading symbols from /usr/lib/libpq.so.2...(no debugging symbols
found)...done.
Reading symbols from /usr/lib/libssl.so.0...(no debugging symbols
found)...
done.
Reading symbols from /usr/lib/libcrypto.so.0...(no debugging symbols
found)...
done.
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols
found)...done.
Reading symbols from /lib/libcrypt.so.1...done.
Reading symbols from /lib/libresolv.so.2...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /usr/lib/libreadline.so.3...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libtermcap.so.2...done.
---Type <return> to continue, or q <return> to quit---
Reading symbols from /lib/libnss_files.so.2...done.
0x4028215e in __select () from /lib/libc.so.6
(gdb) bt
#0 0x4028215e in __select () from /lib/libc.so.6
#1 0x4002487c in ?? () from /usr/lib/libpq.so.2
#2 0x4001c2da in PQgetResult () from /usr/lib/libpq.so.2
#3 0x4001c47f in PQexec () from /usr/lib/libpq.so.2
#4 0x804d79d in strcpy () at ../sysdeps/generic/strcpy.c:30
#5 0x804f041 in strcpy () at ../sysdeps/generic/strcpy.c:30
#6 0x80502b6 in strcpy () at ../sysdeps/generic/strcpy.c:30
#7 0x401ed9cb in __libc_start_main (main=0x804fcc0 <strcpy+22720>,
argc=2,
argv=0xbffffb74, init=0x8049be0 <_init>, fini=0x805797c <_fini>,
rtld_fini=0x4000aea0 <_dl_fini>, stack_end=0xbffffb6c)
at ../sysdeps/generic/libc-start.c:92

Did I do it right? Is there anything informative here?

Paulo Jan.
DDnet.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Paulo Jan <admin(at)mail(dot)ddnet(dot)es>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Query hangs when getting too complex...
Date: 2001-12-19 18:44:37
Message-ID: 1862.1008787477@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Paulo Jan <admin(at)mail(dot)ddnet(dot)es> writes:
> 1) I started psql and then looked at the PID of the "postgres" backend
> process.
> 2) I typed "gdb postgres (PID)":
> 3) I typed the query in psql and started it. psql hangs...
> 4) ...and in the meantime, I type "bt" in the gdb window:

Not quite the right sequence: I think what you've captured is just the
backend waiting to receive the query. Start the failing query, and
*after* it hangs attach to the backend process with gdb.

regards, tom lane


From: Paulo Jan <admin(at)digital(dot)ddnet(dot)es>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Query hangs when getting too complex...
Date: 2001-12-19 18:56:22
Message-ID: 3C20E2D6.1E1E191C@digital.ddnet.es
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

>
> Not quite the right sequence: I think what you've captured is just the
> backend waiting to receive the query. Start the failing query, and
> *after* it hangs attach to the backend process with gdb.
>
> regards, tom lane

Here it is, but it looks the same to me:

[root(at)master /root]# gdb postgres 30592
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for
details.
This GDB was configured as "i386-redhat-linux"...
(no debugging symbols found)...

/root/30592: No such file or directory.
Attaching to program: /usr/bin/postgres, Pid 30592
Reading symbols from /usr/lib/libssl.so.0...(no debugging symbols
found)...
done.
Reading symbols from /usr/lib/libcrypto.so.0...(no debugging symbols
found)...
done.
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols
found)...done.
Reading symbols from /lib/libcrypt.so.1...done.
Reading symbols from /lib/libresolv.so.2...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /usr/lib/libreadline.so.3...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libtermcap.so.2...done.
Reading symbols from /usr/lib/gconv/ISO8859-1.so...done.
---Type <return> to continue, or q <return> to quit---
Reading symbols from /lib/libnss_files.so.2...done.
0x80d58d7 in pprint ()
(gdb) bt
#0 0x80d58d7 in pprint ()
#1 0x80fc729 in pg_parse_and_rewrite ()
#2 0x80fc9a1 in pg_exec_query_string ()
#3 0x80fdab6 in PostgresMain ()
#4 0x80e8035 in PostmasterMain ()
#5 0x80e7c1c in PostmasterMain ()
#6 0x80e6e77 in PostmasterMain ()
#7 0x80e6826 in PostmasterMain ()
#8 0x80c89fe in main ()
#9 0x401dc9cb in __libc_start_main (main=0x80c8890 <main>, argc=3,
argv=0xbffffbc4, init=0x8065a78 <_init>, fini=0x8145f2c <_fini>,
rtld_fini=0x4000aea0 <_dl_fini>, stack_end=0xbffffbbc)
at ../sysdeps/generic/libc-start.c:92

Paulo Jan.
DDnet.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Paulo Jan <admin(at)mail(dot)ddnet(dot)es>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Query hangs when getting too complex...
Date: 2001-12-19 18:56:41
Message-ID: 1996.1008788201@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Paulo Jan <admin(at)mail(dot)ddnet(dot)es> writes:
> Here it is, but it looks the same to me:

No, it's not the same at all...

> #0 0x80d58d7 in pprint ()
> #1 0x80fc729 in pg_parse_and_rewrite ()
> #2 0x80fc9a1 in pg_exec_query_string ()
> #3 0x80fdab6 in PostgresMain ()

[ eyeballs pprint ] Oh my. Nothing like a fixed-size buffer to blow up
on you :-(. pprint will crash and burn if indentLev exceeds 80/3, and
I'll bet that's exactly what's happening on your complex query. Wonder
why no one noticed this before?

Try turning down your debug level so that the detailed query plan
doesn't get dumped. In the meantime, I'll do something to fix this
for 7.2.

regards, tom lane