memory-related bugs

Lists: pgsql-hackers
From: Noah Misch <noah(at)leadboat(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: memory-related bugs
Date: 2011-03-12 13:32:24
Message-ID: 20110312133224.GA7833@tornado.gateway.2wire.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

A suitably-instrumented run of "make installcheck-world" under valgrind turned
up a handful of memory-related bugs:

* memcpy()/strncpy() between overlapping regions
uniqueentry() and dispell_lexize() both deduplicate an array by iteratively
copying elements downward to occlude the duplicates. Before finding a first
duplicate, these functions call memcpy() with identical arguments. Similarly,
resolve_polymorphic_tupdesc() calls TupleDescInitEntry() with an attributeName
pointing directly into the TupleDesc's name bytes, causing the latter to call
strncpy() with identical arguments. The attached mem1v1-memcpy-overlap.patch
fixes these sites by checking for equal pointers before the affected call. For
TupleDescInitEntry(), I considered instead having resolve_polymorphic_tupdesc()
pstrdup() the value.

* read past the end of a Form_pg_type in examine_attribute()
examine_attribute() copies a Form_pg_type naively. Since the nullable columns
at the end of the structure are not present in memory, the memcpy() reads eight
bytes past the end of the source allocation. mem2v1-analyze-overread.patch
updates this code to match how we address the same issue for Form_pg_attribute.

* off-by-one error in shift_jis_20042euc_jis_2004()
This function grabs two bytes at a time, even when only one byte remains; this
makes it read past the end of the input. mem3v1-sjis-offbyone.patch changes it
to not do this and to report an error when the input ends in a byte that would
start a two-byte sequence.

Thanks,
nm

Attachment Content-Type Size
mem1v1-memcpy-overlap.patch text/plain 1.9 KB
mem2v1-analyze-overread.patch text/plain 1.7 KB
mem3v1-sjis-offbyone.patch text/plain 1.7 KB

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: memory-related bugs
Date: 2011-03-12 16:08:23
Message-ID: AANLkTimY3uU5dytT89UMw-mqwB-xXz2Dr9rMGiwk0B-=@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Mar 12, 2011 at 1:32 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> A suitably-instrumented run of "make installcheck-world" under valgrind turned
> up a handful of memory-related bugs:

Nice work. How did you instrument things so valgrind knew about palloc
et al? I remember trying this in the past and running into problems. I
think the biggest one was that we write out structs to disk including
padding so trigger lots of reads of uninitialized data warnings.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: memory-related bugs
Date: 2011-03-12 17:44:29
Message-ID: 26254.1299951869@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> A suitably-instrumented run of "make installcheck-world" under valgrind turned
> up a handful of memory-related bugs:

Hmm, interesting work, but I don't think I believe in the necessity for
this kluge:

> + else if (attributeName != &(att->attname))
> + namestrcpy(&(att->attname), attributeName);

The rules against overlapping memcpy/strcpy's source and destination are
meant to cover the case of partial overlap; I find it hard to imagine an
implementation that will mess up when the source and destination are
identical. If we did think it was important to avoid this situation I
would rather find another way, like modifying the caller. Likewise
the other changes to avoid no-op memcpy's do not appear to me to be
bugs, though possibly they might save enough cycles to be worth doing
anyway.

> ! stats->attrtype = (Form_pg_type) palloc(sizeof(FormData_pg_type));
> ! memcpy(stats->attrtype, GETSTRUCT(typtuple), sizeof(FormData_pg_type));
> ...
> ! stats->attrtype = (Form_pg_type) palloc(TYPE_FIXED_PART_SIZE);
> ! memcpy(stats->attrtype, GETSTRUCT(typtuple), TYPE_FIXED_PART_SIZE);

I wonder whether we should instead fix this by copying the correct tuple
length.

regards, tom lane


From: Noah Misch <noah(at)leadboat(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: memory-related bugs
Date: 2011-03-13 09:09:49
Message-ID: 20110313090949.GA29508@tornado.gateway.2wire.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Mar 12, 2011 at 04:08:23PM +0000, Greg Stark wrote:
> On Sat, Mar 12, 2011 at 1:32 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > A suitably-instrumented run of "make installcheck-world" under valgrind turned
> > up a handful of memory-related bugs:
>
>
> Nice work. How did you instrument things so valgrind knew about palloc
> et al? I remember trying this in the past and running into problems.

I peppered aset.c and mcxt.c with various calls to the valgrind hook macros. I
believe the set of hooks has grown recently (I use valgrind 3.6.0), so it may be
that the right facilities didn't exist at that time.

> I
> think the biggest one was that we write out structs to disk including
> padding so trigger lots of reads of uninitialized data warnings.

I used suppressions for call sites that write WAL or pgstat structures. Tuples
are, with limited exceptions, fully-initialized, so I did validate those.


From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: memory-related bugs
Date: 2011-03-13 09:10:21
Message-ID: 20110313091021.GB29508@tornado.gateway.2wire.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Mar 12, 2011 at 12:44:29PM -0500, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > A suitably-instrumented run of "make installcheck-world" under valgrind turned
> > up a handful of memory-related bugs:
>
> Hmm, interesting work, but I don't think I believe in the necessity for
> this kluge:
>
> > + else if (attributeName != &(att->attname))
> > + namestrcpy(&(att->attname), attributeName);
>
> The rules against overlapping memcpy/strcpy's source and destination are
> meant to cover the case of partial overlap; I find it hard to imagine an
> implementation that will mess up when the source and destination are
> identical. If we did think it was important to avoid this situation I
> would rather find another way, like modifying the caller. Likewise
> the other changes to avoid no-op memcpy's do not appear to me to be
> bugs, though possibly they might save enough cycles to be worth doing
> anyway.

I also find it hard to imagine an implementation that needs these changes to
produce correct behavior. Avoiding undefined behavior has intrinsic value, but
perhaps we do get greater value from the existing code.

> > ! stats->attrtype = (Form_pg_type) palloc(sizeof(FormData_pg_type));
> > ! memcpy(stats->attrtype, GETSTRUCT(typtuple), sizeof(FormData_pg_type));
> > ...
> > ! stats->attrtype = (Form_pg_type) palloc(TYPE_FIXED_PART_SIZE);
> > ! memcpy(stats->attrtype, GETSTRUCT(typtuple), TYPE_FIXED_PART_SIZE);
>
> I wonder whether we should instead fix this by copying the correct tuple
> length.

Seems like a step in the wrong direction. We only use typlen and typbyval
beyond the immediate context.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: memory-related bugs
Date: 2011-09-05 19:00:49
Message-ID: 201109051900.p85J0nJ06919@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Noah Misch wrote:
> A suitably-instrumented run of "make installcheck-world" under valgrind turned
> up a handful of memory-related bugs:
>
> * memcpy()/strncpy() between overlapping regions
> uniqueentry() and dispell_lexize() both deduplicate an array by iteratively
> copying elements downward to occlude the duplicates. Before finding a first
> duplicate, these functions call memcpy() with identical arguments. Similarly,
> resolve_polymorphic_tupdesc() calls TupleDescInitEntry() with an attributeName
> pointing directly into the TupleDesc's name bytes, causing the latter to call
> strncpy() with identical arguments. The attached mem1v1-memcpy-overlap.patch
> fixes these sites by checking for equal pointers before the affected call. For
> TupleDescInitEntry(), I considered instead having resolve_polymorphic_tupdesc()
> pstrdup() the value.
>
> * read past the end of a Form_pg_type in examine_attribute()
> examine_attribute() copies a Form_pg_type naively. Since the nullable columns
> at the end of the structure are not present in memory, the memcpy() reads eight
> bytes past the end of the source allocation. mem2v1-analyze-overread.patch
> updates this code to match how we address the same issue for Form_pg_attribute.
>
> * off-by-one error in shift_jis_20042euc_jis_2004()
> This function grabs two bytes at a time, even when only one byte remains; this
> makes it read past the end of the input. mem3v1-sjis-offbyone.patch changes it
> to not do this and to report an error when the input ends in a byte that would
> start a two-byte sequence.

Did we conclude any of these were useful?

http://archives.postgresql.org/pgsql-hackers/2011-03/msg00856.php

I know there were concerns about some of them in the thread.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: memory-related bugs
Date: 2011-09-05 19:32:01
Message-ID: 13128.1315251121@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Did we conclude any of these were useful?
> http://archives.postgresql.org/pgsql-hackers/2011-03/msg00856.php
> I know there were concerns about some of them in the thread.

Hmm, I guess this slipped through the cracks. I thought that avoiding
memcpy(x, x, n) was unnecessary, and I had doubts about the style of
some of the other changes, but I think we do need to avoid accessing
past the defined end of a data structure. We've seen cases in the past
where one day that structure is right up against the end of memory and
you get a SIGSEGV; there's no good reason to believe it cannot happen
in these places.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: memory-related bugs
Date: 2011-09-06 19:00:42
Message-ID: 18540.1315335642@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[ Sorry for letting this slip through the cracks ... I think I got
distracted by collation bugs :-( ]

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Sat, Mar 12, 2011 at 12:44:29PM -0500, Tom Lane wrote:
>> Noah Misch <noah(at)leadboat(dot)com> writes:
>>> A suitably-instrumented run of "make installcheck-world" under valgrind turned
>>> up a handful of memory-related bugs:

>> Hmm, interesting work, but I don't think I believe in the necessity for
>> this kluge:
>>
> + else if (attributeName != &(att->attname))
> + namestrcpy(&(att->attname), attributeName);

I'm still of the opinion that there's no real need to avoid memcpy with
identical source and destination, so I didn't apply this first patch.

>> I wonder whether we should instead fix this by copying the correct tuple
>> length.

> Seems like a step in the wrong direction. We only use typlen and typbyval
> beyond the immediate context.

Well, the whole point of exposing the pg_type tuple is to avoid making
assumptions about what parts of it the type-specific analyze routine
will wish to look at. If anything we ought to move in the direction of
allowing the non-fixed fields to be accessible too. I didn't do that,
since it would imply an ABI change (to add a HeapTuple field to
VacAttrStats) and would therefore not be back-patchable. But I did
change the code to use SearchSysCacheCopy, which fixes this bug and
is readily extensible if we do decide to add such a field later.

I applied your third patch (for SJIS2004 conversion) as-is.

Thanks for the report and testing!

regards, tom lane


From: Noah Misch <noah(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: memory-related bugs
Date: 2011-09-07 03:03:59
Message-ID: 20110907030358.GA16899@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 06, 2011 at 03:00:42PM -0400, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > On Sat, Mar 12, 2011 at 12:44:29PM -0500, Tom Lane wrote:
> >> I wonder whether we should instead fix this by copying the correct tuple
> >> length.
>
> > Seems like a step in the wrong direction. We only use typlen and typbyval
> > beyond the immediate context.
>
> Well, the whole point of exposing the pg_type tuple is to avoid making
> assumptions about what parts of it the type-specific analyze routine
> will wish to look at. If anything we ought to move in the direction of
> allowing the non-fixed fields to be accessible too. I didn't do that,
> since it would imply an ABI change (to add a HeapTuple field to
> VacAttrStats) and would therefore not be back-patchable. But I did
> change the code to use SearchSysCacheCopy, which fixes this bug and
> is readily extensible if we do decide to add such a field later.

That is a cleaner approach. Thanks.

--
Noah Misch http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: memory-related bugs
Date: 2011-09-08 20:12:25
Message-ID: CAAZKuFZmTJXBMh36cP47Xp7+L6x2EYbuwGqqE8h4CL2Go5agvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 6, 2011 at 12:00 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> [ Sorry for letting this slip through the cracks ... I think I got
>  distracted by collation bugs :-( ]
>
> Noah Misch <noah(at)leadboat(dot)com> writes:
>> On Sat, Mar 12, 2011 at 12:44:29PM -0500, Tom Lane wrote:
>>> Noah Misch <noah(at)leadboat(dot)com> writes:
>>>> A suitably-instrumented run of "make installcheck-world" under valgrind turned
>>>> up a handful of memory-related bugs:
>
>>> Hmm, interesting work, but I don't think I believe in the necessity for
>>> this kluge:
>>>
>> +     else if (attributeName != &(att->attname))
>> +             namestrcpy(&(att->attname), attributeName);
>
> I'm still of the opinion that there's no real need to avoid memcpy with
> identical source and destination, so I didn't apply this first patch.

I am leery of memcpy with overlapping regions. I know that it can
cause an infinite loop on ssse3 architectures, having to do with some
backwards-iteration it does:

https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/609290

I have spotted this in the wild in PostgreSQL (which is how I happened
to produce this bug report link so readily), yet very rarely; I mailed
a more detailed report to the security list.

--
fdr


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: memory-related bugs
Date: 2011-09-08 20:56:09
Message-ID: 12299.1315515369@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Daniel Farina <daniel(at)heroku(dot)com> writes:
> On Tue, Sep 6, 2011 at 12:00 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm still of the opinion that there's no real need to avoid memcpy with
>> identical source and destination, so I didn't apply this first patch.

> I am leery of memcpy with overlapping regions. I know that it can
> cause an infinite loop on ssse3 architectures, having to do with some
> backwards-iteration it does:
> https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/609290

The linked page offers no support for your claim of an infinite loop,
and in any case it's discussing a case involving *overlapping* regions,
not *identical* regions.

The reason why this concern is irrelevant for identical regions is that
no matter what order the memcpy routine copies the bytes in, it's
necessarily storing the exact same data into each byte that was there
before. The only way that strange results could be produced is if the
routine transiently set some byte to a value other than its final value,
which would mean that it must be intending to store to that location
more than once, which would be silly and inefficient.

So I'm not going to believe that there's a problem here without a lot
more rigorous evidence than anyone has offered.

regards, tom lane


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: memory-related bugs
Date: 2011-09-09 20:38:47
Message-ID: CAAZKuFa5d97n7XdfVRWDLhB41AYQ+32cYSOov6HRFfG9jw4mXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 8, 2011 at 1:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Daniel Farina <daniel(at)heroku(dot)com> writes:
>> On Tue, Sep 6, 2011 at 12:00 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I'm still of the opinion that there's no real need to avoid memcpy with
>>> identical source and destination, so I didn't apply this first patch.
>
>> I am leery of memcpy with overlapping regions.  I know that it can
>> cause an infinite loop on ssse3 architectures, having to do with some
>> backwards-iteration it does:
>> https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/609290
>
> The linked page offers no support for your claim of an infinite loop,
> and in any case it's discussing a case involving *overlapping* regions,
> not *identical* regions.

What do you mean? As per my original bug report, or in this case
(possibly both; I should have dumped the registers, which I'll do if I
see it again...)? I'm able to believe that things are fine with all
known memcpys today in this case.

> The reason why this concern is irrelevant for identical regions is that
> no matter what order the memcpy routine copies the bytes in, it's
> necessarily storing the exact same data into each byte that was there
> before.  The only way that strange results could be produced is if the
> routine transiently set some byte to a value other than its final value,
> which would mean that it must be intending to store to that location
> more than once, which would be silly and inefficient.

This is a good point, I do understand there is a distinction between
the degenerate-case memcpy-to-identical region and the
overlapping-case.

In my naivety, I'd ask what the costs are of pedantic adherence to
this common guideline and, in event someone somewhere does something
that is not expected (or, has a slow-but-not-technically-buggy memcpy)
are broken, how likely the failure will be easy to pick out. But I
don't seriously expect an answer, because I don't think this code path
has been a problem for so many years and measuring those things are
pretty hard.

--
fdr