Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Richard Poole <richard(at)2ndQuadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)
Date: 2013-09-16 13:13:57
Message-ID: 52370415.6060108@vmware.com
Lists: pgsql-hackers

On 16.09.2013 13:15, Andres Freund wrote:
> On 2013-09-16 11:15:28 +0300, Heikki Linnakangas wrote:
>> On 14.09.2013 02:41, Richard Poole wrote:
>>> The attached patch adds the MAP_HUGETLB flag to mmap() for shared memory
>>> on systems that support it. It's based on Christian Kruse's patch from
>>> last year, incorporating suggestions from Andres Freund.
>>
>> I don't understand the logic in figuring out the pagesize, and the smallest
>> supported hugepage size. First of all, even without the patch, why do we
>> round up the size passed to mmap() to the _SC_PAGE_SIZE? Surely the kernel
>> will round up the request all by itself. The mmap() man page doesn't say
>> anything about length having to be a multiple of the page size.
>
> I think it does:
> EINVAL We don't like addr, length, or offset (e.g., they are too
> large, or not aligned on a page boundary).

That doesn't mean that they *all* have to be aligned on a page boundary.
It's understandable that 'addr' and 'offset' have to be, but it doesn't
make much sense for 'length'.

> and
> A file is mapped in multiples of the page size. For a file that is not a multiple
> of the page size, the remaining memory is zeroed when mapped, and writes to that
> region are not written out to the file. The effect of changing the size of the
> underlying file of a mapping on the pages that correspond to added or removed
> regions of the file is unspecified.
>
> And no, according to my past experience, the kernel does *not* do any
> such rounding up. It will just fail.

I wrote a little test program to play with different values (attached).
I tried this on my laptop with a 3.2 kernel (uname -r: 3.10-2-amd64), and
on a VM with a fresh CentOS 6.4 install with a 2.6.32 kernel
(2.6.32-358.18.1.el6.x86_64), and they both behave the same:

$ ./mmaptest 100 # mmap 100 bytes

in a different terminal:
$ cat /proc/meminfo | grep HugePages_Rsvd
HugePages_Rsvd: 1

So even a tiny allocation, much smaller than any page size, succeeds,
and it reserves a huge page. I tried the same with larger values; the
kernel always uses huge pages, and rounds up the allocation to a
multiple of the huge page size.
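
For readers without the attachment, the core of such a test could look
roughly like the sketch below. This is not the attached mmaptest.c; the
function name map_hugetlb and the MAP_HUGETLB fallback define are
assumptions made for illustration. The length is passed to mmap()
unrounded, which is the point of the experiment.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes MAP_ANONYMOUS / MAP_HUGETLB on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0           /* assumption: fall back to normal pages where the flag is missing */
#endif

/*
 * Hypothetical helper: request `len` bytes of anonymous shared memory
 * backed by huge pages. `len` is deliberately NOT rounded up to any page
 * size. Returns MAP_FAILED (with errno set) if the kernel refuses, for
 * example when no huge pages are configured.
 */
static void *
map_hugetlb(size_t len)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
}
```

A small driver that calls map_hugetlb(100), reports any failure with
perror(), and then sleeps before exiting leaves time to inspect
HugePages_Rsvd from another terminal. The sketch does not call munmap(),
since its length requirements for huge page mappings are stricter; the
mapping is simply released when the process exits.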

So, let's just get rid of the /sys scanning code.

Robert, do you remember why you put the "pagesize =
sysconf(_SC_PAGE_SIZE);" call in the new mmap() shared memory allocator?
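
For context, the kind of rounding that call feeds might be sketched as
follows. This is an illustration only, not the code in the PostgreSQL
tree; the names round_up_to_page and system_page_size and the 4096
fallback are assumptions.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: round `size` up to the next multiple of `pagesize`. */
static size_t
round_up_to_page(size_t size, size_t pagesize)
{
    size_t rem = size % pagesize;

    return rem == 0 ? size : size + (pagesize - rem);
}

/* Hypothetical helper: ask the OS for its normal (non-huge) page size. */
static size_t
system_page_size(void)
{
    long ps = sysconf(_SC_PAGE_SIZE);

    return ps > 0 ? (size_t) ps : 4096;   /* assumed fallback if sysconf fails */
}
```

If the kernel rounds the length itself, as the test above suggests, a
helper like this would only matter for bookkeeping on our side, not for
the mmap() call succeeding.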

- Heikki

Attachment: mmaptest.c (text/x-csrc, 430 bytes)
