Re: making use of large TLB pages

Lists: pgsql-hackers
From: Neil Conway <neilc(at)samurai(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: making use of large TLB pages
Date: 2002-09-24 13:50:02
Message-ID: 87lm5r1oqd.fsf@mailbox.samurai.com

Rohit Seth recently added support for the use of large TLB pages on
Linux if the processor architecture supports them (I believe SPARC,
IA32, and IA64 have hugetlb support; more archs will probably be
added). The patch was merged into Linux 2.5.36, so it will more than
likely be in Linux 2.6. For more information on large TLB pages and
why they are generally thought to improve database performance, see
here:

http://lwn.net/Articles/6535/ (the patch this refers to is an
earlier implementation, I believe, but the idea is the same)
http://lwn.net/Articles/10293/ (item #4)

I'd like to enable PostgreSQL to use large TLB pages, if the OS and
processor support them. In talking to the author of the TLB patches
for Linux (Rohit Seth), he described the current API:

======
1) Only two system calls. These are:

sys_alloc_hugepages(int key, unsigned long addr, unsigned long len,
int prot, int flag)

sys_free_hugepages(unsigned long addr)

Key will be equal to zero if user wants these huge pages as private.
A positive int value will be used for unrelated apps to share the same
physical huge pages.

addr is the user preferred address. The kernel may decide to allocate
a different virtual address (depending on availability and alignment
factors).

len is the requested size of memory wanted by user app.

prot could get the value of PROT_READ, PROT_WRITE, PROT_EXEC

flag: The only allowed value right now is IPC_CREAT, which in case of
shared hugepages (across processes) tells the kernel to create a new
segment if none is already created. If this flag is not provided and
there is no hugepage segment corresponding to the "key" then ENOENT is
returned. More like on the lines of IPC_CREAT flag for shmget
routine.

On success sys_alloc_hugepages returns the virtual address allocated
by kernel.
=====

So as I understand it, we would basically replace the calls to
shmget(), shmdt(), etc. with these system calls. The behavior will be
slightly different, however -- I'm not sure if this API supports
everything we expect the SysV IPC API to support (e.g. telling the #
of clients attached to a given segment). Can anyone comment on
exactly what functionality we expect when dealing with the storage
mechanism of the shared buffer?

Any comments would be appreciated.

Cheers,

Neil

--
Neil Conway <neilc(at)samurai(dot)com> || PGP Key ID: DB3C29FC


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making use of large TLB pages
Date: 2002-09-25 04:49:34
Message-ID: 27631.1032929374@sss.pgh.pa.us

Neil Conway <neilc(at)samurai(dot)com> writes:
> I'd like to enable PostgreSQL to use large TLB pages, if the OS and
> processor support them.

Hmm ... it seems interesting, but I'm hesitant to do a lot of work
to support something that's only available on one hardware-and-OS
combination. (If we were talking about a Windows-specific hack,
you'd already have lost the audience, no? But I digress.)

> So as I understand it, we would basically replace the calls to
> shmget(), shmdt(), etc. with these system calls. The behavior will be
> slightly different, however -- I'm not sure if this API supports
> everything we expect the SysV IPC API to support (e.g. telling the #
> of clients attached to a given segment).

I trust it at least supports inheriting the page mapping over a fork()?

> Can anyone comment on
> exactly what functionality we expect when dealing with the storage
> mechanism of the shared buffer?

The only thing we use beyond the obvious "here's some memory accessible
by both parent and child processes" is the #-of-clients functionality
you mentioned. The reason that that is interesting is it provides a
safety interlock against the case where a postmaster has crashed but
left child backends running. If a new postmaster is started and starts
its own collection of children then we are in very bad hot water,
because the old and new backend sets will be modifying the same database
files without any mutual awareness or interlocks. This *will* lead to
serious, possibly unrecoverable database corruption.

The SysV API provides a reliable interlock to prevent this scenario:
we read the old shared memory block ID from the old postmaster's
postmaster.pid file, and look to see if that block (a) still exists
and (b) still has attached processes (presumably backends). If it's
gone or has no attached processes, it's safe for the new postmaster
to continue startup.
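
The check described above can be sketched with shmctl(IPC_STAT), which fills in the attach count (shm_nattch) for a segment ID. This is only an illustration: the function name is hypothetical, and treating unexpected errors as "in use" is a conservative assumption made here, not something stated in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper: returns true if the SysV segment `shmid` still
 * exists and has at least one attached process, false if it is gone or
 * nobody is attached to it.
 */
static bool
segment_in_use(int shmid)
{
    struct shmid_ds buf;

    assert(shmid >= 0);

    if (shmctl(shmid, IPC_STAT, &buf) < 0)
    {
        /* EINVAL / EIDRM: no such segment, or it has been removed. */
        if (errno == EINVAL || errno == EIDRM)
            return false;
        /* Assumption: anything else is treated as "unsafe to start". */
        return true;
    }

    /* (b) from the text: are any processes still attached? */
    return buf.shm_nattch > 0;
}
```

A starting postmaster would read the ID from the old postmaster.pid and refuse to continue while this returns true.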

I have little love for the SysV shmem API, but I haven't thought of
an equivalently reliable interlock for this scenario without it.
(For example, something along the lines of requiring each backend
to write its PID into a file isn't very reliable at all: it leaves
a window at each backend start where the backend hasn't yet written
its PID, and it increases by a large factor the risk we've already
seen wherein stale PID entries in lockfiles might by chance match the
PIDs of other, unrelated processes.)
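
To make the weakness of a PID-based check concrete, here is a minimal sketch of the usual liveness probe, kill(pid, 0). The helper name is hypothetical. Note what it cannot answer: whether the live process is actually a backend, or an unrelated process that happened to receive a recycled PID.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: true if *some* process with this PID exists.
 * Signal 0 performs the existence and permission checks without
 * delivering a signal.
 */
static bool
pid_exists(pid_t pid)
{
    assert(pid > 0);

    if (kill(pid, 0) == 0)
        return true;

    /* EPERM: the process exists but belongs to another user. */
    return errno == EPERM;
}
```

For example, pid_exists(getpid()) is always true, yet a true result for a PID read from a stale lockfile says nothing about which program owns that PID now.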

Any ideas for better answers?

regards, tom lane


From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making use of large TLB pages
Date: 2002-09-25 17:07:19
Message-ID: 87vg4uxak8.fsf@mailbox.samurai.com

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Neil Conway <neilc(at)samurai(dot)com> writes:
> > I'd like to enable PostgreSQL to use large TLB pages, if the OS
> > and processor support them.
>
> Hmm ... it seems interesting, but I'm hesitant to do a lot of work
> to support something that's only available on one hardware-and-OS
> combination.

True; further, I personally find the current API a little
cumbersome. For example, we get 4MB pages on Solaris with a few lines
of code:

#if defined(solaris) && defined(__sparc__)
    /* use intimate shared memory on SPARC Solaris */
    memAddress = shmat(shmid, 0, SHM_SHARE_MMU);

But given that

(a) Linux on x86 is probably our most popular platform

(b) Every x86 since the Pentium has supported large pages

(c) Other archs, like IA64 and SPARC, also support large pages

I think it's worthwhile implementing this, if possible.

> I trust it at least supports inheriting the page mapping over a
> fork()?

I'll check on this, but I'm pretty sure that it does.

> The SysV API provides a reliable interlock to prevent this scenario:
> we read the old shared memory block ID from the old postmaster's
> postmaster.pid file, and look to see if that block (a) still exists
> and (b) still has attached processes (presumably backends).

If the postmaster is starting up and the segment still exists, could
we assume that's an error condition, and force the admin to manually
fix it? It does make the system less robust, but I'm suspicious of any
attempts to automagically fix a situation in which we *know* something
has gone seriously wrong...

Another possibility might be to still allocate a small SysV shmem
area, and use that to provide the interlock, while we allocate the
buffer area using sys_alloc_hugepages. That's somewhat of a hack, but
I think it would resolve the interlock problem, at least.

> Any ideas for better answers?

Still scratching my head on this one, and I'll let you know if I think
of anything better.

Cheers,

Neil

--
Neil Conway <neilc(at)samurai(dot)com> || PGP Key ID: DB3C29FC


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making use of large TLB pages
Date: 2002-09-25 17:30:19
Message-ID: 11674.1032975019@sss.pgh.pa.us

Neil Conway <neilc(at)samurai(dot)com> writes:
> I think it's worthwhile implementing this, if possible.

I wasn't objecting (I work for Red Hat, remember ;-)). I was just
saying there's a limit to the messiness I think we should accept.

>> The SysV API provides a reliable interlock to prevent this scenario:
>> we read the old shared memory block ID from the old postmaster's
>> postmaster.pid file, and look to see if that block (a) still exists
>> and (b) still has attached processes (presumably backends).

> If the postmaster is starting up and the segment still exists, could
> we assume that's an error condition, and force the admin to manually
> fix it?

It wasn't clear from your description whether large-TLB shmem segments
even have IDs that one could use to determine whether "the segment still
exists". If the segments are anonymous then how do you do that?

> It does make the system less robust, but I'm suspicious of any
> attempts to automagically fix a situation in which we *know* something
> has gone seriously wrong...

We've spent a lot of effort on trying to ensure that we (a) start up
when it's safe and (b) refuse to start up when it's not safe. While (b)
is clearly the more critical point, backsliding on (a) isn't real nice
either. People don't like postmasters that randomly fail to start.

> Another possibility might be to still allocate a small SysV shmem
> area, and use that to provide the interlock, while we allocate the
> buffer area using sys_alloc_hugepages. That's somewhat of a hack, but
> I think it would resolve the interlock problem, at least.

Not a bad idea ... I have not got a better one offhand ... but watch
out for SHMMIN settings.

regards, tom lane


From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making use of large TLB pages
Date: 2002-09-28 05:30:45
Message-ID: 87ptuywuii.fsf@mailbox.samurai.com

Okay, I did some more research into this area. It looks like it will
be feasible to use large TLB pages for PostgreSQL.

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> It wasn't clear from your description whether large-TLB shmem segments
> even have IDs that one could use to determine whether "the segment still
> exists".

There are two types of hugepages:

(a) private: Not shared on fork(), not accessible to processes
other than the one that allocates the pages.

(b) shared: Shared across a fork(), accessible to other
processes: different processes can access the same segment
if they call sys_alloc_hugepages() with the same key.

So for a standalone backend, we can just use private pages (probably
worth using private hugepages rather than malloc, although I doubt it
matters much either way).

> > Another possibility might be to still allocate a small SysV shmem
> > area, and use that to provide the interlock, while we allocate the
> > buffer area using sys_alloc_hugepages. That's somewhat of a hack, but
> > I think it would resolve the interlock problem, at least.
>
> Not a bad idea ... I have not got a better one offhand ... but watch
> out for SHMMIN settings.

As it turns out, this will be completely unnecessary. Since hugepages
are an in-kernel data structure, the kernel takes care of ensuring
that dying processes don't orphan any unused hugepage segments. The
logic works like this: (for shared hugepages)

(a) sys_alloc_hugepages() without IPC_EXCL will return a
pointer to an existing segment, if there is one that
matches the key. If an existing segment is found, the
usage counter for that segment is incremented. If no
matching segment exists, an error is returned. (I'm pretty
sure the usage counter is also incremented after a fork(),
but I'll double-check that.)

(b) sys_free_hugepages() decrements the usage counter

(c) when a process that has allocated a shared hugepage dies
for *any reason* (even kill -9), the usage counter is
decremented

(d) if the usage counter for a given segment ever reaches
zero, the segment is deleted and the memory is freed.

If we used a key that would remain the same between runs of the
postmaster, this should ensure that there isn't a possibility of two
independent sets of backends operating on the same data dir. The most
logical way to do this IMHO would be to just hash the data dir, but I
suppose the current method of using the port number should work as
well.

To elaborate on (a) a bit, we'd want to use this logic when allocating
a new set of hugepages on postmaster startup:

(1) call sys_alloc_hugepages() without IPC_EXCL. If it returns
an error, we're in the clear: there's no page matching
that key. If it returns a pointer to a previously existing
segment, panic: it is very likely that there are some
orphaned backends still active.

(2) If the previous call didn't find anything, call
sys_alloc_hugepages() again, specifying IPC_EXCL to create
a new segment.
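
The two steps above can be sketched as follows. Everything here is illustrative: the allocator is passed in as a function pointer with a hypothetical signature modelled on sys_alloc_hugepages(), and fake_alloc is an in-memory stand-in for the kernel so the logic can be exercised. The API summary earlier in the thread names IPC_CREAT as the only supported flag, so the sketch uses that for the creating call; treat the flag choice as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>

/* Hypothetical allocator signature, modelled on sys_alloc_hugepages(). */
typedef void *(*hugepage_alloc_fn) (int key, unsigned long len, int flag);

typedef enum
{
    STARTUP_OK,              /* new segment created */
    STARTUP_SEGMENT_EXISTS,  /* old segment found: orphaned backends? */
    STARTUP_CREATE_FAILED
} startup_result;

static startup_result
create_fresh_segment(hugepage_alloc_fn alloc, int key,
                     unsigned long len, void **seg)
{
    assert(alloc != NULL && seg != NULL);

    /* Step 1: probe without the create flag; success means it exists. */
    if (alloc(key, len, 0) != NULL)
        return STARTUP_SEGMENT_EXISTS;

    /* Step 2: nothing found, so ask for a new segment. */
    *seg = alloc(key, len, IPC_CREAT);
    return (*seg != NULL) ? STARTUP_OK : STARTUP_CREATE_FAILED;
}

/* In-memory stand-in for the kernel, used only to exercise the logic. */
static char fake_segment[64];
static int  fake_key = -1;      /* -1: no segment exists yet */

static void *
fake_alloc(int key, unsigned long len, int flag)
{
    (void) len;
    if (fake_key == key)
        return fake_segment;    /* existing segment for this key */
    if (flag & IPC_CREAT)
    {
        fake_key = key;         /* create it */
        return fake_segment;
    }
    return NULL;                /* real call would set ENOENT */
}
```

Two caveats the sketch does not handle: a successful probe also bumps the segment's usage counter, so a real caller would have to release it before exiting, and there is a window between the probe and the create in which another process could create the segment.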

Now, the question is: how should this be implemented? You recently
did some of the legwork toward supporting different APIs for shared
memory / semaphores, which makes this work easier -- unfortunately,
some additional stuff is still needed. Specifically, support for
hugepages is a configuration option, that may or may not be enabled
(if it's disabled, the syscall returns a specific error). So I believe
the logic is something like:

- if compiling on a Linux system, enable support for hugepages
(the regular SysV stuff is still needed as a backup)

- if we're compiling on a Linux system but the kernel headers
don't define the syscalls we need, use some reasonable
defaults (e.g. the syscall numbers for the current hugepage
syscalls in Linux 2.5)

- at runtime, try to make one of these syscalls. If it fails,
fall back to the SysV stuff.
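
A minimal sketch of the runtime fallback, under stated assumptions: the hugepage attempt is compiled in only when the kernel headers define __NR_alloc_hugepages (assumed to be the symbol a 2.5-series kernel provides), no syscall numbers are hard-coded, and the struct and function names are hypothetical. On a system without the syscall, the attempt fails with ENOSYS and the SysV path is used.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>
#include <sys/syscall.h>
#include <unistd.h>

typedef enum { SHMEM_SYSV, SHMEM_HUGEPAGE } shmem_kind;

typedef struct
{
    shmem_kind  kind;
    void       *addr;
    int         shmid;      /* meaningful only for SHMEM_SYSV */
} shmem_segment;

/* Try the hugepage syscall; a stub when the headers lack the symbol. */
static void *
try_hugepages(int key, size_t len)
{
#ifdef __NR_alloc_hugepages
    long ret = syscall(__NR_alloc_hugepages, key, 0UL, (unsigned long) len,
                       PROT_READ | PROT_WRITE, IPC_CREAT);

    return (ret == -1) ? NULL : (void *) ret;
#else
    (void) key;
    (void) len;
    errno = ENOSYS;         /* same outcome as a kernel without support */
    return NULL;
#endif
}

/* Returns 0 on success and fills *seg, -1 on failure. */
static int
create_shared_memory(key_t key, size_t len, shmem_segment *seg)
{
    assert(seg != NULL);

    seg->addr = try_hugepages((int) key, len);
    if (seg->addr != NULL)
    {
        seg->kind = SHMEM_HUGEPAGE;
        seg->shmid = -1;
        return 0;
    }

    /* Runtime fallback: ordinary SysV shared memory. */
    seg->kind = SHMEM_SYSV;
    seg->shmid = shmget(key, len, IPC_CREAT | IPC_EXCL | 0600);
    if (seg->shmid < 0)
        return -1;

    seg->addr = shmat(seg->shmid, NULL, 0);
    if (seg->addr == (void *) -1)
    {
        shmctl(seg->shmid, IPC_RMID, NULL);
        seg->addr = NULL;
        return -1;
    }
    return 0;
}
```

Recording which path succeeded (the kind field) matters later, because the cleanup and the "is it still in use" check differ between the two mechanisms.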

Does that sound reasonable?

Any other comments would be appreciated.

Cheers,

Neil

--
Neil Conway <neilc(at)samurai(dot)com> || PGP Key ID: DB3C29FC


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making use of large TLB pages
Date: 2002-09-28 15:05:45
Message-ID: 12491.1033225545@sss.pgh.pa.us

Neil Conway <neilc(at)samurai(dot)com> writes:
> If we used a key that would remain the same between runs of the
> postmaster, this should ensure that there isn't a possibility of two
> independent sets of backends operating on the same data dir. The most
> logical way to do this IMHO would be to just hash the data dir, but I
> suppose the current method of using the port number should work as
> well.

You should stick as closely as possible to the key logic currently used
for SysV shmem keys. That logic is intended to cope with the case where
someone else is already using the key# that we initially generate, as
well as the case where we discover a collision with a pre-existing
backend set. (We tell the difference by looking for a magic number at
the start of the shmem segment.)

Note that we do not assume the key is the same on each run; that's why
we store it in postmaster.pid.
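
The magic-number test can be sketched like this. The header layout, the magic value, and the names are all hypothetical, chosen only to show how "someone else's segment" is told apart from "a segment left by an earlier backend set".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic value; not the one PostgreSQL actually uses. */
#define SKETCH_MAGIC 0x50475348u

/* Hypothetical header stored at the start of every segment we create. */
typedef struct
{
    uint32_t magic;
    int32_t  creator_pid;
} segment_header;

typedef enum
{
    KEY_FREE,       /* no segment for this key: use it */
    KEY_FOREIGN,    /* someone else's segment: try the next key */
    KEY_OURS        /* has our magic: a pre-existing backend set */
} key_status;

/* `seg` is the mapped segment for a key, or NULL if none exists. */
static key_status
classify_segment(const void *seg, size_t len)
{
    const segment_header *hdr = seg;

    if (seg == NULL)
        return KEY_FREE;
    if (len < sizeof(segment_header) || hdr->magic != SKETCH_MAGIC)
        return KEY_FOREIGN;
    return KEY_OURS;
}
```

A key-selection loop would move on to another key on KEY_FOREIGN, and on KEY_OURS would go on to decide whether the old segment is still in use before reusing or refusing it.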

> (1) call sys_alloc_hugepages() without IPC_EXCL. If it returns
> an error, we're in the clear: there's no page matching
> that key. If it returns a pointer to a previously existing
> segment, panic: it is very likely that there are some
> orphaned backends still active.

s/panic/and the PG magic number appears in the segment header, panic/

> - if we're compiling on a Linux system but the kernel headers
> don't define the syscalls we need, use some reasonable
> defaults (e.g. the syscall numbers for the current hugepage
> syscalls in Linux 2.5)

I think this is overkill, and quite possibly dangerous. If we don't see
the symbols then don't try to compile the code.

On the whole it seems that this allows a very nearly one-to-one mapping
to the existing SysV functionality. We don't have the "number of
connected processes" syscall, perhaps, but we don't need it: if a
hugepages segment exists we can assume the number of connected processes
is greater than 0, and that's all we really need to know.

I think it's okay to stuff this support into the existing
port/sysv_shmem.c file, rather than make a separate file (particularly
given your point that we have to be able to fall back to SysV calls at
runtime). I'd suggest reorganizing the code in that file slightly to
separate the actual syscalls from the controlling logic in
PGSharedMemoryCreate(). Probably also will have to extend the API for
PGSharedMemoryIsInUse() and RecordSharedMemoryInLockFile() to allow
three fields to be recorded in postmaster.pid, not two --- you'll want
a boolean indicating whether the stored key is for a SysV or hugepage
segment.
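
As a rough illustration of the three-field idea, the sketch below writes and parses one lock-file line holding the key, the ID, and a flag for the segment type. The line format and all names are assumptions made for this example; the real postmaster.pid layout is not shown in this thread. A line with only two fields is read as a SysV segment, so files written by the two-field format would still parse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record: what the lock file would need to remember. */
typedef struct
{
    unsigned long key;
    unsigned long id;
    bool          is_hugepage;  /* false: SysV segment */
} shmem_record;

/* Writes "key id kind\n" into buf; returns 0 on success, -1 if too long. */
static int
format_shmem_record(char *buf, size_t buflen, const shmem_record *rec)
{
    int n;

    assert(buf != NULL && rec != NULL);
    n = snprintf(buf, buflen, "%9lu %9lu %d\n",
                 rec->key, rec->id, rec->is_hugepage ? 1 : 0);
    return (n > 0 && (size_t) n < buflen) ? 0 : -1;
}

/* Parses a line written above; a missing third field means SysV. */
static int
parse_shmem_record(const char *line, shmem_record *rec)
{
    unsigned long key, id;
    int           kind = 0;
    int           n;

    assert(line != NULL && rec != NULL);
    n = sscanf(line, "%lu %lu %d", &key, &id, &kind);
    if (n < 2)
        return -1;

    rec->key = key;
    rec->id = id;
    rec->is_hugepage = (n == 3 && kind != 0);
    return 0;
}
```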

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making use of large TLB pages
Date: 2002-09-29 05:39:35
Message-ID: 200209290539.g8T5dZC04612@candle.pha.pa.us


I haven't been following this thread. Can someone answer:

Is TLB Linux-only?
Why use it and non SysV memory?
Is it a lot of code?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Neil Conway <neilc(at)samurai(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making use of large TLB pages
Date: 2002-09-29 06:04:40
Message-ID: 87bs6hwcuf.fsf@mailbox.samurai.com

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Is TLB Linux-only?

Well, the "TLB" is a feature of the CPU, so no. Many modern processors
support large TLB pages in some fashion.

However, the specific API for using large TLB pages differs between
operating systems. The API I'm planning to implement is the one
provided by recent versions of Linux (2.5.38+).

I've only looked briefly at enabling the usage of large pages on other
operating systems. On Solaris, we already use large pages (due to
using Intimate Shared Memory). On HPUX, you apparently need to call
chatr on the executable for it to use large pages. AFAIK the BSDs
don't support large pages for user-land apps -- if I'm incorrect, let
me know.

> Why use it and non SysV memory?

It's faster, at least in theory. I posted these links at the start of
the thread:

http://lwn.net/Articles/6535/
http://lwn.net/Articles/10293/

> Is it a lot of code?

I haven't implemented it yet, so I'm not sure. However, I don't think
it will be a lot of code.

Cheers,

Neil

--
Neil Conway <neilc(at)samurai(dot)com> || PGP Key ID: DB3C29FC


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making use of large TLB pages
Date: 2002-09-29 13:38:27
Message-ID: 200209291338.g8TDcRE06330@candle.pha.pa.us

Neil Conway wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Is TLB Linux-only?
>
> Well, the "TLB" is a feature of the CPU, so no. Many modern processors
> support large TLB pages in some fashion.
>
> However, the specific API for using large TLB pages differs between
> operating systems. The API I'm planning to implement is the one
> provided by recent versions of Linux (2.5.38+).
>
> I've only looked briefly at enabling the usage of large pages on other
> operating systems. On Solaris, we already use large pages (due to
> using Intimate Shared Memory). On HPUX, you apparently need to call
> chatr on the executable for it to use large pages. AFAIK the BSDs
> don't support large pages for user-land apps -- if I'm incorrect, let
> me know.
>
> > Why use it and non SysV memory?
>
> It's faster, at least in theory. I posted these links at the start of
> the thread:
>
> http://lwn.net/Articles/6535/
> http://lwn.net/Articles/10293/
>
> > Is it a lot of code?
>
> I haven't implemented it yet, so I'm not sure. However, I don't think
> it will be a lot of code.

OK, personally, I would like to see an actual speedup of PostgreSQL
queries before I would apply such an OS-specific, version-specific patch.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Neil Conway <neilc(at)samurai(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making use of large TLB pages
Date: 2002-09-29 15:21:22
Message-ID: 87znu0vn2l.fsf@mailbox.samurai.com

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> OK, personally, I would like to see an actual speedup of PostgreSQL
> queries before I would apply such an OS-specific, version-specific
> patch.

Don't be silly. A performance improvement is a performance
improvement. According to your logic, using assembly-optimized locking
primitives shouldn't be done unless we've exhausted every possible
optimization in every other part of the system (a process which will
likely never be finished).

If the optimization was for some obscure UNIX variant and/or an
obscure processor, I would agree that it wouldn't be worth the
bother. But given that

(a) Linux on IA32 is likely our most popular platform [1]

(b) In theory, this will help performance where we need it
most, IMHO (high-end systems using large shared buffers)

I think it's at least worth implementing -- if it doesn't provide a
noticeable performance improvement, then we don't need to merge it.

Cheers,

Neil

[1] It's worth noting that the huge tlb patch also currently works on
IA64 and SPARC, and may well be ported to additional architectures in
the future.

--
Neil Conway <neilc(at)samurai(dot)com> || PGP Key ID: DB3C29FC


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making use of large TLB pages
Date: 2002-09-29 15:36:23
Message-ID: 11074.1033313783@sss.pgh.pa.us

Neil Conway <neilc(at)samurai(dot)com> writes:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>> OK, personally, I would like to see an actual speedup of PostgreSQL
>> queries before I would apply such an OS-specific, version-specific
>> patch.

> Don't be silly. A performance improvement is a performance
> improvement.

No, Bruce was saying that he wanted to see demonstrable improvement
*due to this specific change* before committing to support a
platform-specific API. I agree with him, actually. If you do the
TLB code and can't measure any meaningful performance improvement
when using it vs. when not, I'd not be excited about cluttering the
distribution with it.

> I think it's at least worth implementing -- if it doesn't provide a
> noticeable performance improvement, then we don't need to merge it.

You're on the same page, you just don't realize it...

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making use of large TLB pages
Date: 2002-09-29 21:30:12
Message-ID: 200209292130.g8TLUCH19997@candle.pha.pa.us

Tom Lane wrote:
> Neil Conway <neilc(at)samurai(dot)com> writes:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> >> OK, personally, I would like to see an actual speedup of PostgreSQL
> >> queries before I would apply such an OS-specific, version-specific
> >> patch.
>
> > Don't be silly. A performance improvement is a performance
> > improvement.
>
> No, Bruce was saying that he wanted to see demonstrable improvement
> *due to this specific change* before committing to support a
> platform-specific API. I agree with him, actually. If you do the
> TLB code and can't measure any meaningful performance improvement
> when using it vs. when not, I'd not be excited about cluttering the
> distribution with it.
>
> > I think it's at least worth implementing -- if it doesn't provide a
> > noticeable performance improvement, then we don't need to merge it.
>
> You're on the same page, you just don't realize it...

I see what he thought I said, I just can't figure out how he read it
that way.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Jonah H(dot) Harris" <jharris(at)nightstarcorporation(dot)com>
To: "'Neil Conway'" <neilc(at)samurai(dot)com>
Cc: "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'PostgreSQL Hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making use of large TLB pages
Date: 2002-09-30 02:49:12
Message-ID: 001901c2682b$f5b7efb0$b77b2344@gemini

Neil,

I agree with Bruce and Tom. AFAIK, and in my experience, I don't think it
will be a significantly measurable increase. Not only that, but the
portability issue itself tends to make it less desirable. I recently
ported SAP DB and the coinciding DevTools over to OpenBSD and learned again
first-hand what a pain in the ass having platform-specific code is. I guess
it's up to you, Neil. If you want to spend the time trying to implement it,
and it does prove to have a significant performance increase, I'd say maybe.
IMHO, I just think that time could be better spent improving the current
system rather than trying to add to it in a singular way. Sorry if my
comments are out-of-line on this one, but it has been a thread for some time
and I'm just kinda tired of reading theory vs proof.

Since you are so set on trying to implement this, I'm just wondering what
documentation has tested evidence of measurable increases in similar
situations? I just like arguments to be backed by proof... and I'm sure
there is documentation on this somewhere.

-Jonah



From: Neil Conway <neilc(at)samurai(dot)com>
To: "Jonah H(dot) Harris" <jharris(at)nightstarcorporation(dot)com>
Cc: "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'PostgreSQL Hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making use of large TLB pages
Date: 2002-09-30 03:03:52
Message-ID: 87lm5kgovb.fsf@mailbox.samurai.com

"Jonah H. Harris" <jharris(at)nightstarcorporation(dot)com> writes:
> I agree with Bruce and Tom.

AFAIK Bruce and Tom (and myself) agree that this is a good idea,
provided it makes a noticeable performance difference (and if it
doesn't, it's not worth applying).

> AFAIK and in my experience I don't think it will be a significantly
> measurable increase.

Can you elaborate on this experience?

> Not only that, but the portability issue itself tends to make it
> less desirable.

Well, that's obvious: code that improves PostgreSQL on *all* platforms
is clearly superior to code that only improves it on a couple. That's
not to say that the latter code is absolutely without merit, however.

> Sorry if my comments are out-of-line on this one, but it has been a
> thread for some time and I'm just kinda tired of reading theory vs
> proof.

Well, ISTM the easiest way to get some "proof" is to implement it and
benchmark the results. IMHO any claims about performance prior to that
are mostly hand waving.

> Since you are so set on trying to implement this, I'm just wondering
> what documentation has tested evidence of measurable increases in
> similar situations?

(/me wonders if people bother reading the threads they reply to)

http://lwn.net/Articles/10293/

According to the HP guys, Oracle saw an 8% performance improvement in
TPC-C when they started using large pages.

To be perfectly honest, I really have no idea if that will translate
into an 8% performance gain for PostgreSQL, or whether the performance
gain only applies if you're using a machine with 16GB of RAM, or
whether the speedup from large pages is really just a correction of
some Oracle deficiency that we don't suffer from, etc. However, I do
think it's worth finding out.

Cheers,

Neil

--
Neil Conway <neilc(at)samurai(dot)com> || PGP Key ID: DB3C29FC