Re: planner fails on HEAD

Lists: pgsql-hackers
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: planner fails on HEAD
Date: 2011-12-04 07:40:31
Message-ID: CAFj8pRAm9bhxzi4Hnc9V=5kUsyJBwacYiX+8cP94AhwK3yTTtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

I have a relative simple query

SELECT q.object_id
FROM queue q
JOIN
outgoing.cps_forms f
ON f.id = q.object_id AND q.object_type = 'cp'
JOIN
flat_file_ex fe
ON fe.id = q.rejected_flat_file_id
WHERE q.rejected_result = 'ACTV';

The planner fails on this query

#0 0x00cf7424 in __kernel_vsyscall ()
#1 0x004752f1 in raise () from /lib/libc.so.6
#2 0x00476d5e in abort () from /lib/libc.so.6
#3 0x083a1dfe in ExceptionalCondition (conditionName=0x8505474
"!(innerstartsel <= innerendsel)", errorType=0x83db178
"FailedAssertion", fileName=0x8505140 "costsize.c", lineNumber=1937)
at assert.c:57
#4 0x08244cea in cost_mergejoin (path=0x93acdd4, root=0x93935d4,
sjinfo=0xbfbc9504) at costsize.c:1937
#5 0x0826f859 in create_mergejoin_path (root=0x93935d4,
joinrel=0x93aad80, jointype=JOIN_INNER, sjinfo=0xbfbc9504,
outer_path=0x93ac0f8, inner_path=0x93ac080,
restrict_clauses=0x93acce0, pathkeys=0x0,
mergeclauses=0x93adcb4, outersortkeys=0x93adc98,
innersortkeys=0x93adcd0) at pathnode.c:1576
#6 0x0824cee4 in sort_inner_and_outer (root=0x93935d4,
joinrel=0x93aad80, outerrel=0x93a9a20, innerrel=0x9393e04,
jointype=JOIN_INNER, sjinfo=0xbfbc9504, restrictlist=0x93acce0) at
joinpath.c:306
#7 add_paths_to_joinrel (root=0x93935d4, joinrel=0x93aad80,
outerrel=0x93a9a20, innerrel=0x9393e04, jointype=JOIN_INNER,
sjinfo=0xbfbc9504, restrictlist=0x93acce0) at joinpath.c:103
#8 0x0824ea12 in make_join_rel (root=0x93935d4, rel1=0x9393e04,
rel2=0x93a9a20) at joinrels.c:733
#9 0x0824ee48 in make_rels_by_clause_joins (root=0x93935d4, level=2)
at joinrels.c:268
#10 join_search_one_level (root=0x93935d4, level=2) at joinrels.c:99
#11 0x082410bf in standard_join_search (root=0x93935d4,
levels_needed=3, initial_rels=0x93ac998) at allpaths.c:1127
#12 0x082412cf in make_rel_from_joinlist (root=0x93935d4,
joinlist=<value optimized out>) at allpaths.c:1058
#13 0x08241390 in make_one_rel (root=0x93935d4, joinlist=0x93aad64) at
allpaths.c:103
#14 0x082593d0 in query_planner (root=0x93935d4, tlist=0x939f740,
tuple_fraction=0, limit_tuples=-1, cheapest_path=0xbfbc98dc,
sorted_path=0xbfbc98d8, num_groups=0xbfbc98d0) at planmain.c:259
#15 0x0825b24d in grouping_planner (root=0x93935d4, tuple_fraction=0)
at planner.c:1240
#16 0x0825cfbd in subquery_planner (glob=0x939f37c, parse=0x9370b08,
parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0,
subroot=0xbfbc9a7c) at planner.c:524
#17 0x0825d8dd in standard_planner (parse=0x9370b08, cursorOptions=0,
boundParams=0x0) at planner.c:196
#18 0x082d1b2e in pg_plan_query (querytree=0x9370b08, cursorOptions=0,
boundParams=0x0) at postgres.c:720
#19 0x082d1c33 in pg_plan_queries (querytrees=0x939f360,
cursorOptions=0, boundParams=0x0) at postgres.c:779
#20 0x082d26fc in exec_simple_query (argc=2, argv=0x92f95a4,
username=0x92f94a0 "pavel") at postgres.c:944
#21 PostgresMain (argc=2, argv=0x92f95a4, username=0x92f94a0 "pavel")
at postgres.c:3859
#22 0x082844ae in BackendRun (port=0x9316600) at postmaster.c:3587
#23 BackendStartup (port=0x9316600) at postmaster.c:3272
#24 0x08284b58 in ServerLoop () at postmaster.c:1350
#25 0x082856f3 in PostmasterMain (argc=3, argv=0x92f8308) at postmaster.c:1110
#26 0x0821cd00 in main (argc=3, argv=0x92f8308) at main.c:199

with little bit modified query planner does

ohs=# explain SELECT q.object_id
FROM queue q
JOIN
outgoing.cps_forms f
ON f.id = q.object_id AND q.object_type = 'cp'
JOIN
flat_file_ex fe
ON fe.id = q.rejected_flat_file_id
WHERE q.rejected_result = 'ACTVa';
QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────
Nested Loop (cost=0.00..154.05 rows=1 width=4)
-> Nested Loop (cost=0.00..145.77 rows=1 width=8)
-> Seq Scan on queue q (cost=0.00..137.49 rows=1 width=8)
Filter: ((object_type = 'cp'::bpchar) AND
(rejected_result = 'ACTVa'::bpchar))
-> Index Only Scan using cps_forms_pkey on cps_forms f
(cost=0.00..8.27 rows=1 width=4)
Index Cond: (id = q.object_id)
-> Index Only Scan using flat_file_ex_pkey on flat_file_ex fe
(cost=0.00..8.27 rows=1 width=4)
Index Cond: (id = q.rejected_flat_file_id)
(8 rows)

Data and necessary indexes should be correct

ohs=# \dt+
List of relations
Schema │ Name │ Type │ Owner │ Size │ Description
────────┼───────────────────────┼───────┼──────────┼─────────┼─────────────
public │ cps_form │ table │ postgres │ 48 kB │
public │ flat_file_ex │ table │ pavel │ 2632 kB │
public │ np_form │ table │ pavel │ 432 kB │
public │ np_return_number_form │ table │ pavel │ 48 kB │
public │ queue │ table │ pavel │ 568 kB │
(5 rows)

ohs=# \dt+ outgoing.*
List of relations
Schema │ Name │ Type │ Owner │ Size │
Description
──────────┼───────────────────────┼───────┼─────────────┼────────────┼─────────────
outgoing │ comlog │ table │ dialtelecom │ 8192 bytes │
outgoing │ cps_forms │ table │ dialtelecom │ 40 kB │
outgoing │ cps_forms_hist │ table │ dialtelecom │ 72 kB │
outgoing │ flatfiles │ table │ dialtelecom │ 1328 kB │
outgoing │ flatfiles_hist │ table │ dialtelecom │ 8192 bytes │
outgoing │ log │ table │ dialtelecom │ 8192 bytes │
outgoing │ np_forms │ table │ dialtelecom │ 400 kB │
outgoing │ np_forms_hist │ table │ dialtelecom │ 560 kB │
outgoing │ opids_soap_properties │ table │ dialtelecom │ 16 kB │
outgoing │ order_numbers │ table │ dialtelecom │ 0 bytes │
(10 rows)

ohs=# select version();
version
──────────────────────────────────────────────────────────────────────────────────────────────────────────
PostgreSQL 9.2devel on i686-pc-linux-gnu, compiled by gcc (GCC) 4.5.1
20100924 (Red Hat 4.5.1-4), 32-bit
(1 row)

[pavel(at)nemesis ohs]$ uname -a
Linux nemesis 2.6.35.14-106.fc14.i686.PAE #1 SMP Wed Nov 23 13:39:51
UTC 2011 i686 i686 i386 GNU/Linux

I am able to send data by some private stream

Regards

Pavel Stehule


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: planner fails on HEAD
Date: 2011-12-04 07:49:24
Message-ID: CAFj8pRCRQOoi3jfBFh03mxdRXvVyaKU0ETbyZW6wR5GyU_WWQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

a plan for modified query is

ohs=# explain analyze SELECT object_id,
inserted,
'ASSIGN_RSLT',
order_id,
2,
seqnum,
rejected_flat_file_id,
true
FROM (
SELECT q.object_id,
fe.inserted,
q.order_id,
q.seqnum,
q.rejected_flat_file_id,
q.rejected_result
FROM queue q
JOIN
outgoing.cps_forms f
ON f.id = q.object_id AND q.object_type = 'cp'
JOIN
flat_file_ex fe
ON fe.id = q.rejected_flat_file_id
offset 0) x
WHERE rejected_result = 'ACTV';

QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Subquery Scan on x (cost=11.68..192.72 rows=1 width=24) (actual
time=1.748..12.398 rows=139 loops=1)
Filter: (x.rejected_result = 'ACTV'::bpchar)
Rows Removed by Filter: 17
-> Limit (cost=11.68..192.65 rows=6 width=29) (actual
time=1.739..11.655 rows=156 loops=1)
-> Nested Loop (cost=11.68..192.65 rows=6 width=29) (actual
time=1.732..11.036 rows=156 loops=1)
-> Hash Join (cost=11.68..138.77 rows=15 width=21)
(actual time=1.459..6.987 rows=186 loops=1)
Hash Cond: (q.object_id = f.id)
-> Seq Scan on queue q (cost=0.00..126.24
rows=186 width=21) (actual time=0.032..4.658 rows=186 loops=1)
Filter: (object_type = 'cp'::bpchar)
Rows Removed by Filter: 4313
-> Hash (cost=9.08..9.08 rows=208 width=4)
(actual time=1.402..1.402 rows=208 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 5kB
-> Seq Scan on cps_forms f
(cost=0.00..9.08 rows=208 width=4) (actual time=0.008..0.576 rows=208
loops=1)
-> Index Scan using flat_file_ex_pkey on flat_file_ex
fe (cost=0.00..3.58 rows=1 width=12) (actual time=0.008..0.010 rows=1
loops=186)
Index Cond: (id = q.rejected_flat_file_id)
Total runtime: 12.846 ms
(16 rows)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: planner fails on HEAD
Date: 2011-12-04 20:29:06
Message-ID: 14624.1323030546@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> #3 0x083a1dfe in ExceptionalCondition (conditionName=0x8505474
> "!(innerstartsel <= innerendsel)", errorType=0x83db178
> "FailedAssertion", fileName=0x8505140 "costsize.c", lineNumber=1937)
> at assert.c:57

[ scratches head ... ] Given that it got past the previous assertions,
surely that ought to be impossible. Could we see the values of
cost_mergejoin's local variables, please?

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: planner fails on HEAD
Date: 2011-12-04 21:51:29
Message-ID: CAFj8pRBstZu74BDT4g-NwQtbWqnWtn2xOfsUZKc5gBcp1fWvyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/12/4 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> #3  0x083a1dfe in ExceptionalCondition (conditionName=0x8505474
>> "!(innerstartsel <= innerendsel)", errorType=0x83db178
>> "FailedAssertion", fileName=0x8505140 "costsize.c", lineNumber=1937)
>> at assert.c:57
>
> [ scratches head ... ]  Given that it got past the previous assertions,
> surely that ought to be impossible.  Could we see the values of
> cost_mergejoin's local variables, please?

It is strange

when I put a fprintf(stderr, "const literal") to exactly before or
somewhere after assertion, then assertion is ok. Without fprintf
assertion fails again

it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was
configured just with --enable-debug and --enable-cassert

when I put elog before calculation outerstartsel,
innerstartsel,outerendsel and innerendsel then it fails

the output is (last elog result)

outer_skip_rows: 0.0
inner_skip_rows: 1.0
outer_rows: 208.000
inner_rows: 1.00000

when I append elog to show selectivity, then it work again - related
selectivity is

outerstartsel: 0.000
outerendsel: 1.000
innerstartsel: 0.166667
innerendsel: 0.166667

Regards

Pavel Stehule

>
>                        regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: planner fails on HEAD
Date: 2011-12-04 22:11:07
Message-ID: 18665.1323036667@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> 2011/12/4 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>> [ scratches head ... ] Given that it got past the previous assertions,
>> surely that ought to be impossible. Could we see the values of
>> cost_mergejoin's local variables, please?

> It is strange

> when I put a fprintf(stderr, "const literal") to exactly before or
> somewhere after assertion, then assertion is ok. Without fprintf
> assertion fails again

> it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was
> configured just with --enable-debug and --enable-cassert

Hmm. I'm betting that gcc has flushed one value to memory but the other
one is still in a register that's wider than memory, creating a roundoff
hazard. Can you look at the generated assembly code?

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: planner fails on HEAD
Date: 2011-12-04 22:52:41
Message-ID: CAFj8pRAW+ZOH7i=oysnCXtmmhhi8SabvJ+VhX+xSC+pHmthJHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/12/4 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> 2011/12/4 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>>> [ scratches head ... ]  Given that it got past the previous assertions,
>>> surely that ought to be impossible.  Could we see the values of
>>> cost_mergejoin's local variables, please?
>
>> It is strange
>
>> when I put a fprintf(stderr, "const literal") to exactly before or
>> somewhere after assertion, then assertion is ok. Without fprintf
>> assertion fails again
>
>> it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was
>> configured just with --enable-debug and --enable-cassert
>
> Hmm.  I'm betting that gcc has flushed one value to memory but the other
> one is still in a register that's wider than memory, creating a roundoff
> hazard.  Can you look at the generated assembly code?

I can, but tomorrow evening,

I'll send a code

Regards

Pavel
>
>                        regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: planner fails on HEAD
Date: 2011-12-04 22:55:41
Message-ID: 19320.1323039341@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was
> configured just with --enable-debug and --enable-cassert

Is this x86? I can't reproduce it on x86_64.

It's fairly easy to get a set of values such that innerstartsel *should*
equal innerendsel; but if one value has been rounded to memory precision
and the other hasn't, the assert could certainly fail.

Some digging around yields the information that the gcc hackers do not
consider this a bug, or at least adamantly refuse to do anything about it:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
Comment 47 is particularly relevant to our situation:

To summarize, this defect effectively states that:
assert( (x/y) == (x/y) )
may cause an assertion if compiled with optimization.

Also, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45691#c4
indicates that an explicit cast to double should help. Would
you check if the problem goes away if the Asserts are changed to

Assert((double) outerstartsel <= (double) outerendsel);
Assert((double) innerstartsel <= (double) innerendsel);

regards, tom lane


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: planner fails on HEAD
Date: 2011-12-05 14:01:52
Message-ID: CAHyXU0yXnbGZCUAtRi4eiMUVabHJ5UjH9p9k=Z8jgA2Co4Zy5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 4, 2011 at 4:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was
>> configured just with --enable-debug and --enable-cassert
>
> Is this x86?  I can't reproduce it on x86_64.

reading all the comments in the gcc bug report, this is because x86
targets the x87 fpu by default which is where the bug is -- it's a
hardware problem. x86_64 targets sse which has stricter standards for
rounding. most x86 processors support sse -- is there a reason why we
don't target sse?

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: planner fails on HEAD
Date: 2011-12-05 15:30:54
Message-ID: 27668.1323099054@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> On Sun, Dec 4, 2011 at 4:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was
> configured just with --enable-debug and --enable-cassert
>>
>> Is this x86? I can't reproduce it on x86_64.

> reading all the comments in the gcc bug report, this is because x86
> targets the x87 fpu by default which is where the bug is -- it's a
> hardware problem. x86_64 targets sse which has stricter standards for
> rounding. most x86 processors support sse -- is there a reason why we
> don't target sse?

Well, older machines won't have sse, and in any case I think x86 is not
the only architecture with the issue, just the most popular one.
Floating-point registers that are wider than standard double are hardly
an unusual idea.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: planner fails on HEAD
Date: 2011-12-05 18:17:54
Message-ID: CAFj8pRCLihXFXcK=TWV7-ftHYmG4VDdAG-LUcQhmd=TZOfkZUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2011/12/4 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was
>> configured just with --enable-debug and --enable-cassert
>
> Is this x86?  I can't reproduce it on x86_64.
>

yes, this is x86 platform

uname -a
Linux nemesis 2.6.35.14-106.fc14.i686.PAE #1 SMP Wed Nov 23 13:39:51
UTC 2011 i686 i686 i386 GNU/Linux

[pavel(at)nemesis ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz
stepping : 11
cpu MHz : 800.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi
flexpriority
bogomips : 4785.76
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz
stepping : 11
cpu MHz : 800.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi
flexpriority
bogomips : 4786.60
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

it is Dell latitude D830

> It's fairly easy to get a set of values such that innerstartsel *should*
> equal innerendsel; but if one value has been rounded to memory precision
> and the other hasn't, the assert could certainly fail.
>
> Some digging around yields the information that the gcc hackers do not
> consider this a bug, or at least adamantly refuse to do anything about it:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
> Comment 47 is particularly relevant to our situation:
>
>        To summarize, this defect effectively states that:
>        assert( (x/y) == (x/y) )
>        may cause an assertion if compiled with optimization.
>
> Also, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45691#c4
> indicates that an explicit cast to double should help.  Would
> you check if the problem goes away if the Asserts are changed to
>
>        Assert((double) outerstartsel <= (double) outerendsel);
>        Assert((double) innerstartsel <= (double) innerendsel);
>

it doesn't help

>                        regards, tom lane

assambler list is attached

Regards

Pavel Stehule

Attachment Content-Type Size
costsize.lst.gz application/x-gzip 171.4 KB

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: planner fails on HEAD
Date: 2011-12-05 18:23:20
Message-ID: CAHyXU0yrSbZRaadoRYbvxE8tMcN9WXikzJ34qdSBfdf=Ym0fhg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 5, 2011 at 12:17 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> Hello
>
> 2011/12/4 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>>> it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was
>>> configured just with --enable-debug and --enable-cassert
>>
>> Is this x86?  I can't reproduce it on x86_64.
>>
>
> yes, this is x86 platform
>
> uname -a
> Linux nemesis 2.6.35.14-106.fc14.i686.PAE #1 SMP Wed Nov 23 13:39:51
> UTC 2011 i686 i686 i386 GNU/Linux
>
> [pavel(at)nemesis ~]$ cat /proc/cpuinfo
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 15
> model name      : Intel(R) Core(TM)2 Duo CPU     T7700  @ 2.40GHz
> stepping        : 11
> cpu MHz         : 800.000
> cache size      : 4096 KB
> physical id     : 0
> siblings        : 2
> core id         : 0
> cpu cores       : 2
> apicid          : 0
> initial apicid  : 0
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 10
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
> constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor
> ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi
> flexpriority
> bogomips        : 4785.76
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
>
> processor       : 1
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 15
> model name      : Intel(R) Core(TM)2 Duo CPU     T7700  @ 2.40GHz
> stepping        : 11
> cpu MHz         : 800.000
> cache size      : 4096 KB
> physical id     : 0
> siblings        : 2
> core id         : 1
> cpu cores       : 2
> apicid          : 1
> initial apicid  : 1
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 10
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
> constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor
> ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi
> flexpriority
> bogomips        : 4786.60
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
>
> it is Dell latitude D830
>
>> It's fairly easy to get a set of values such that innerstartsel *should*
>> equal innerendsel; but if one value has been rounded to memory precision
>> and the other hasn't, the assert could certainly fail.
>>
>> Some digging around yields the information that the gcc hackers do not
>> consider this a bug, or at least adamantly refuse to do anything about it:
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
>> Comment 47 is particularly relevant to our situation:
>>
>>        To summarize, this defect effectively states that:
>>        assert( (x/y) == (x/y) )
>>        may cause an assertion if compiled with optimization.
>>
>> Also, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45691#c4
>> indicates that an explicit cast to double should help.  Would
>> you check if the problem goes away if the Asserts are changed to
>>
>>        Assert((double) outerstartsel <= (double) outerendsel);
>>        Assert((double) innerstartsel <= (double) innerendsel);
>>
>
> it doesn't help
>
>>                        regards, tom lane
>
> assambler list is attached

how about:
 Assert((volatile double) outerstartsel <= (volatile double) outerendsel);
etc

merlin


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: planner fails on HEAD
Date: 2011-12-05 18:30:26
Message-ID: CAFj8pRB99s1XLeK4hP2jk4u4FpMFv5y_DTtjezT80eU9GndWaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/12/5 Merlin Moncure <mmoncure(at)gmail(dot)com>:
> On Mon, Dec 5, 2011 at 12:17 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> Hello
>>
>> 2011/12/4 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>>> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>>>> it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was
>>>> configured just with --enable-debug and --enable-cassert
>>>
>>> Is this x86?  I can't reproduce it on x86_64.
>>>
>>
>> yes, this is x86 platform
>>
>> uname -a
>> Linux nemesis 2.6.35.14-106.fc14.i686.PAE #1 SMP Wed Nov 23 13:39:51
>> UTC 2011 i686 i686 i386 GNU/Linux
>>
>> [pavel(at)nemesis ~]$ cat /proc/cpuinfo
>> processor       : 0
>> vendor_id       : GenuineIntel
>> cpu family      : 6
>> model           : 15
>> model name      : Intel(R) Core(TM)2 Duo CPU     T7700  @ 2.40GHz
>> stepping        : 11
>> cpu MHz         : 800.000
>> cache size      : 4096 KB
>> physical id     : 0
>> siblings        : 2
>> core id         : 0
>> cpu cores       : 2
>> apicid          : 0
>> initial apicid  : 0
>> fdiv_bug        : no
>> hlt_bug         : no
>> f00f_bug        : no
>> coma_bug        : no
>> fpu             : yes
>> fpu_exception   : yes
>> cpuid level     : 10
>> wp              : yes
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
>> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
>> constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor
>> ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi
>> flexpriority
>> bogomips        : 4785.76
>> clflush size    : 64
>> cache_alignment : 64
>> address sizes   : 36 bits physical, 48 bits virtual
>> power management:
>>
>> processor       : 1
>> vendor_id       : GenuineIntel
>> cpu family      : 6
>> model           : 15
>> model name      : Intel(R) Core(TM)2 Duo CPU     T7700  @ 2.40GHz
>> stepping        : 11
>> cpu MHz         : 800.000
>> cache size      : 4096 KB
>> physical id     : 0
>> siblings        : 2
>> core id         : 1
>> cpu cores       : 2
>> apicid          : 1
>> initial apicid  : 1
>> fdiv_bug        : no
>> hlt_bug         : no
>> f00f_bug        : no
>> coma_bug        : no
>> fpu             : yes
>> fpu_exception   : yes
>> cpuid level     : 10
>> wp              : yes
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
>> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
>> constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor
>> ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi
>> flexpriority
>> bogomips        : 4786.60
>> clflush size    : 64
>> cache_alignment : 64
>> address sizes   : 36 bits physical, 48 bits virtual
>> power management:
>>
>> it is Dell latitude D830
>>
>>> It's fairly easy to get a set of values such that innerstartsel *should*
>>> equal innerendsel; but if one value has been rounded to memory precision
>>> and the other hasn't, the assert could certainly fail.
>>>
>>> Some digging around yields the information that the gcc hackers do not
>>> consider this a bug, or at least adamantly refuse to do anything about it:
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
>>> Comment 47 is particularly relevant to our situation:
>>>
>>>        To summarize, this defect effectively states that:
>>>        assert( (x/y) == (x/y) )
>>>        may cause an assertion if compiled with optimization.
>>>
>>> Also, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45691#c4
>>> indicates that an explicit cast to double should help.  Would
>>> you check if the problem goes away if the Asserts are changed to
>>>
>>>        Assert((double) outerstartsel <= (double) outerendsel);
>>>        Assert((double) innerstartsel <= (double) innerendsel);
>>>
>>
>> it doesn't help
>>
>>>                        regards, tom lane
>>
>> assambler list is attached
>
> how about:
>  Assert((volatile double) outerstartsel <= (volatile double) outerendsel);

doesn't help too

Regards

Pavel

> etc
>
> merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: planner fails on HEAD
Date: 2011-12-05 19:30:40
Message-ID: 2327.1323113440@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> 2011/12/4 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>> Is this x86? I can't reproduce it on x86_64.

> yes, this is x86 platform
> uname -a
> Linux nemesis 2.6.35.14-106.fc14.i686.PAE #1 SMP Wed Nov 23 13:39:51
> UTC 2011 i686 i686 i386 GNU/Linux

I reproduced this with gcc 4.6.0 on Fedora 15 x86, too.

>> Also, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45691#c4
>> indicates that an explicit cast to double should help. Would
>> you check if the problem goes away if the Asserts are changed to
>>
>> Assert((double) outerstartsel <= (double) outerendsel);
>> Assert((double) innerstartsel <= (double) innerendsel);

> it doesn't help

Hmm ... I'm inclined to think this actually *is* a bug, since Jakub is
on record as saying it should work. Nonetheless, we need a workaround,
since gcc versions behaving this way are going to be widespread for a
long time even if we convince them to do something about it (which I
suspect they wouldn't given their imperviousness to complaints about the
main issue).

I'm now thinking the best solution is just to drop these two Asserts.
They're not adding anything very useful given the previous ones (which
should be safe since those involve quantities rounded to integers).

regards, tom lane