Quick Links

Re: update i386 spinlock for hyperthreading

Lists:	pgsql-generalpgsql-hackerspgsql-patches

From:	Manfred Spraul <manfred(at)colorfullife(dot)com>
To:	PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject:	update i386 spinlock for hyperthreading
Date:	2003-12-26 22:26:48
Message-ID:	3FECB5A8.50708@colorfullife.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Hi,

Intel recommends to add a special pause instruction into spinlock busy
loops. It's necessary for hyperthreading - without it, the cpu can't
figure out that a logical thread does no useful work and incorrectly
awards lots of execution resources to that thread. Additionally, it's
supposed to reduce the time the cpu needs to recover from the
(mispredicted) branch after the spinlock was obtained.
The attached patch adds a new platform hook and implements it for i386.
The new instruction is backward compatible, thus no cpu detection is
necessary.
Additionally I've increased the number of loops from 100 to 1000 - a 3
GHz Pentium 4 might execute 100 loops faster than a single bus
transaction. I don't know if this change is appropriate for all
platforms, or if SPINS_PER_DELAY should be made platform specific.

Mark did a test run with his dbt-2 benchmark on a 4-way Xeon with HT
enabled, and the patch resulted in a 10% performance increase:
Before:
http://developer.osdl.org/markw/dbt2-pgsql/284/
After:
http://developer.osdl.org/markw/dbt2-pgsql/300/

--
Manfred

Attachment	Content-Type	Size
patch-spinlock-i386	text/plain	2.7 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Manfred Spraul <manfred(at)colorfullife(dot)com>
Cc:	PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject:	Re: update i386 spinlock for hyperthreading
Date:	2003-12-26 23:22:11
Message-ID:	8093.1072480931@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Manfred Spraul <manfred(at)colorfullife(dot)com> writes:
> Intel recommends to add a special pause instruction into spinlock busy
> loops. It's necessary for hyperthreading - without it, the cpu can't
> figure out that a logical thread does no useful work and incorrectly
> awards lots of execution resources to that thread. Additionally, it's
> supposed to reduce the time the cpu needs to recover from the
> (mispredicted) branch after the spinlock was obtained.

Don't you have to put it in a specific place in the loop to make that
work? If not, why not? I doubt that rep;nop is magic enough to
recognize the loop that will be generated from s_lock()'s code.

My guess is that it'd be more useful to insert the rep;nop into the
failure branch of the TAS macro and forget about the separate CPU_DELAY
construct. This would allow you to control where exactly rep;nop
appears relative to the xchgb.

> Additionally I've increased the number of loops from 100 to 1000

I think this change is almost certainly counterproductive; for any
platform other than the Xeon, remove "almost".

> + #ifndef HAS_CPU_DELAY
> + #define CPU_DELAY() cpu_delay()
> +
> + static __inline__ void
> + cpu_delay(void)
> + {
> + }
> + #endif

This breaks every non-gcc compiler in the world (or at least all those
that don't recognize __inline__). If you really want to keep CPU_DELAY,
consider

#ifndef CPU_DELAY
#define CPU_DELAY()
#endif

but as stated above, I'm dubious that the bottom of the s_lock loop
is the place to be adding anything anyway.

regards, tom lane

From:	Manfred Spraul <manfred(at)colorfullife(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject:	Re: update i386 spinlock for hyperthreading
Date:	2003-12-27 00:23:31
Message-ID:	3FECD103.5040105@colorfullife.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Tom Lane wrote:

>Manfred Spraul <manfred(at)colorfullife(dot)com> writes:
>
>
>>Intel recommends to add a special pause instruction into spinlock busy
>>loops. It's necessary for hyperthreading - without it, the cpu can't
>>figure out that a logical thread does no useful work and incorrectly
>>awards lots of execution resources to that thread. Additionally, it's
>>supposed to reduce the time the cpu needs to recover from the
>>(mispredicted) branch after the spinlock was obtained.
>>
>>
>
>Don't you have to put it in a specific place in the loop to make that
>work? If not, why not? I doubt that rep;nop is magic enough to
>recognize the loop that will be generated from s_lock()'s code.
>
>
Rep;nop is just a short delay - that's all. It means that the cpu
pipelines have a chance to drain, and that the other thread gets enough
cpu resources. Below is the full instruction documentation, from the
latest ia32 doc set from Intel:
<<<
Improves the performance of spin-wait loops. When executing a spin-wait
loop, a Pentium 4 or Intel Xeon processor suffers a severe performance
penalty when exiting the loop because it detects a possible memory order
violation. The PAUSE instruction provides a hint to the processor that
the code sequence is a spin-wait loop. The processor uses this hint to
avoid the memory order violation in most situations, which greatly
improves processor performance. For this reason, it is recommended that
a PAUSE instruction be placed in all spin-wait loops. An additional
function of the PAUSE instruction is to reduce the power consumed by a
Pentium 4 processor while executing a spin loop. The Pentium 4 processor
can execute a spin-wait loop extremely quickly, causing the processor to
consume a lot of power while it waits for the resource it is spinning on
to become available. Inserting a pause instruction in a spin-wait loop
greatly reduces the processor s power consumption. This instruction was
introduced in the Pentium 4 processors, but is backward compatible with
all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction
operates like a NOP instruction. The Pentium 4 and Intel Xeon processors
implement the PAUSE instruction as a pre-defined delay. The delay is
finite and can be zero for some processors. This instruction does not
change the architectural state of the processor (that is, it performs
essentially a delaying noop operation).
<<<

I think a separate function is better than adding it into TAS: if it's
part of tas, then it would automatically be included by every
SpinLockAcquire call - unnecessary .text bloat. Additionally, there
might be other busy loops, in addition to TAS, that could use a delay
function.

I'll post a new patch that doesn't rely on __inline__ in the i386
independant part.

--
Manfred

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Manfred Spraul <manfred(at)colorfullife(dot)com>
Cc:	PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject:	Re: update i386 spinlock for hyperthreading
Date:	2003-12-27 00:46:58
Message-ID:	8576.1072486018@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Manfred Spraul <manfred(at)colorfullife(dot)com> writes:
> Tom Lane wrote:
>> Don't you have to put it in a specific place in the loop to make that
>> work? If not, why not?
>>
> Rep;nop is just a short delay - that's all.

That view seems to me to be directly contradicted by this statement:

> The PAUSE instruction provides a hint to the processor that
> the code sequence is a spin-wait loop. The processor uses this hint to
> avoid the memory order violation in most situations, which greatly
> improves processor performance.

It's not apparent to me how a short delay translates into avoiding a
memory order violation (possibly some docs on what that means exactly
might help...). I suspect strongly that there needs to be some near
proximity between the PAUSE instruction and the lock-test instruction
for this to work as advertised. It would help if Intel were less coy
about what the instruction really does.

> This instruction does not change the architectural state of the
> processor (that is, it performs essentially a delaying noop
> operation).

This can be rephrased as "we're not telling you what this instruction
really does, because its interesting effects are below the level of the
instruction set architecture". Great. How are we supposed to know
how to use it?

> I think a separate function is better than adding it into TAS: if it's
> part of tas, then it would automatically be included by every
> SpinLockAcquire call - unnecessary .text bloat.

Why do you think it's unnecessary? One thing that I find particularly
vague in the quoted documentation is the statement that the PAUSE
instruction is needed to avoid a delay when *exiting* the spin-wait
loop. Doesn't this mean that a PAUSE is needed in the success path
when the first TAS succeeds (i.e, the normal no-contention path)?
If not, why not? If so, does it go before or after the lock
instruction?

Also, if the principal effect is a "short delay", do we really need it
at all considering that our inner loop in s_lock is rather more than
an "xchgb" followed by a conditional branch? There will be time for
the write queue to drain while we're incrementing and testing our
spin counter (which I trust is in a register...).

The reason I'm so full of questions is that I spent some time several
days ago looking at exactly this issue, and came away with only the
conclusion that I had to find some more-detailed documentation before
I could figure out what we should do about the spinlocks for Xeons.
You have not convinced me that you know more about the issue than I do.
A "10% speedup" is nice, but how do we know that that's what we should
expect to get? Maybe there's a lot more to be won by doing it correctly
(for some value of "correctly").

regards, tom lane

From:	Manfred Spraul <manfred(at)colorfullife(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject:	Re: update i386 spinlock for hyperthreading
Date:	2003-12-27 10:34:16
Message-ID:	3FED6028.7020300@colorfullife.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Tom Lane wrote:

>Manfred Spraul <manfred(at)colorfullife(dot)com> writes:
>
>
>>Tom Lane wrote:
>>
>>
>>>Don't you have to put it in a specific place in the loop to make that
>>>work? If not, why not?
>>>
>>>
>>>
>>Rep;nop is just a short delay - that's all.
>>
>>
>
>That view seems to me to be directly contradicted by this statement:
>
>
>
>>The PAUSE instruction provides a hint to the processor that
>>the code sequence is a spin-wait loop. The processor uses this hint to
>>avoid the memory order violation in most situations, which greatly
>>improves processor performance.
>>
>>
>
>It's not apparent to me how a short delay translates into avoiding a
>memory order violation (possibly some docs on what that means exactly
>might help...). I suspect strongly that there needs to be some near
>proximity between the PAUSE instruction and the lock-test instruction
>for this to work as advertised. It would help if Intel were less coy
>about what the instruction really does.
>
>
My guess: Pentium 4 cpu support something like 250 uops in flight - it
will have a dozend of the spinlock loops in it's pipeline. When the
spinlock is released, it must figure out which of the loops should get
it, and gets lost. My guess is that rep;nop delays the cpu buy at least
100 cpu ticks, and thus the pipeline will be empty before it proceeds. I
don't have a Pentium 4, and the HP testdrive is down. Someone around who
could run my test app?

>
>
>>This instruction does not change the architectural state of the
>>processor (that is, it performs essentially a delaying noop
>>operation).
>>
>>
>
>This can be rephrased as "we're not telling you what this instruction
>really does, because its interesting effects are below the level of the
>instruction set architecture". Great. How are we supposed to know
>how to use it?
>
>
There was a w_spinlock.pdf document with reference code. google still
finds it, but the links are dead :-(

>>I think a separate function is better than adding it into TAS: if it's
>>part of tas, then it would automatically be included by every
>>SpinLockAcquire call - unnecessary .text bloat.
>>
>>
>
>Why do you think it's unnecessary? One thing that I find particularly
>vague in the quoted documentation is the statement that the PAUSE
>instruction is needed to avoid a delay when *exiting* the spin-wait
>loop. Doesn't this mean that a PAUSE is needed in the success path
>when the first TAS succeeds (i.e, the normal no-contention path)?
>
IIRC: No.

>If not, why not? If so, does it go before or after the lock
>instruction?
>
>
Neither: somewhere in the failure path.

>Also, if the principal effect is a "short delay", do we really need it
>at all considering that our inner loop in s_lock is rather more than
>an "xchgb" followed by a conditional branch? There will be time for
>the write queue to drain while we're incrementing and testing our
>spin counter (which I trust is in a register...).
>
>The reason I'm so full of questions is that I spent some time several
>days ago looking at exactly this issue, and came away with only the
>conclusion that I had to find some more-detailed documentation before
>I could figure out what we should do about the spinlocks for Xeons.
>
I'll try to find some more docs and post links.

The 2nd thing I would change is to add a nonatomic test in the slow
path: locked instructions generate lots of bus traffic, and that's a
waste of resources.

Another question: regardless of the placement of rep;nop - 10% speedup
means that the postgres spends far too much time in the spinlock code.
I've looked at the oprofile dumps, and something like 1.2% of the total
cpu time is spent it the TAS macro in LWLockAcquire. That's the hottest
instruction in the whole profile, it eats more cpu cycles than the
memcpy() calls that transfer data to/from kernel.
Is there an easy way find out which LWLock is contended?
--
Manfred

Attachment	Content-Type	Size
rep_nop.cpp	text/x-c++src	3.6 KB
patch-spinlock-i386	text/plain	2.4 KB

From:	ohp(at)pyrenet(dot)fr
To:	Manfred Spraul <manfred(at)colorfullife(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject:	Re: update i386 spinlock for hyperthreading
Date:	2003-12-27 10:46:01
Message-ID:	Pine.UW2.4.53.0312271145050.13478@server.pyrenet.fr
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

I have a bi-Xeon 2.6G hyperthreaded if it helps... feel free

Regards
On Sat, 27 Dec 2003, Manfred Spraul wrote:

> Date: Sat, 27 Dec 2003 11:34:16 +0100
> From: Manfred Spraul <manfred(at)colorfullife(dot)com>
> To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
> Subject: Re: [PATCHES] update i386 spinlock for hyperthreading
>
> Tom Lane wrote:
>
> >Manfred Spraul <manfred(at)colorfullife(dot)com> writes:
> >
> >
> >>Tom Lane wrote:
> >>
> >>
> >>>Don't you have to put it in a specific place in the loop to make that
> >>>work? If not, why not?
> >>>
> >>>
> >>>
> >>Rep;nop is just a short delay - that's all.
> >>
> >>
> >
> >That view seems to me to be directly contradicted by this statement:
> >
> >
> >
> >>The PAUSE instruction provides a hint to the processor that
> >>the code sequence is a spin-wait loop. The processor uses this hint to
> >>avoid the memory order violation in most situations, which greatly
> >>improves processor performance.
> >>
> >>
> >
> >It's not apparent to me how a short delay translates into avoiding a
> >memory order violation (possibly some docs on what that means exactly
> >might help...). I suspect strongly that there needs to be some near
> >proximity between the PAUSE instruction and the lock-test instruction
> >for this to work as advertised. It would help if Intel were less coy
> >about what the instruction really does.
> >
> >
> My guess: Pentium 4 cpu support something like 250 uops in flight - it
> will have a dozend of the spinlock loops in it's pipeline. When the
> spinlock is released, it must figure out which of the loops should get
> it, and gets lost. My guess is that rep;nop delays the cpu buy at least
> 100 cpu ticks, and thus the pipeline will be empty before it proceeds. I
> don't have a Pentium 4, and the HP testdrive is down. Someone around who
> could run my test app?
>
> >
> >
> >>This instruction does not change the architectural state of the
> >>processor (that is, it performs essentially a delaying noop
> >>operation).
> >>
> >>
> >
> >This can be rephrased as "we're not telling you what this instruction
> >really does, because its interesting effects are below the level of the
> >instruction set architecture". Great. How are we supposed to know
> >how to use it?
> >
> >
> There was a w_spinlock.pdf document with reference code. google still
> finds it, but the links are dead :-(
>
> >>I think a separate function is better than adding it into TAS: if it's
> >>part of tas, then it would automatically be included by every
> >>SpinLockAcquire call - unnecessary .text bloat.
> >>
> >>
> >
> >Why do you think it's unnecessary? One thing that I find particularly
> >vague in the quoted documentation is the statement that the PAUSE
> >instruction is needed to avoid a delay when *exiting* the spin-wait
> >loop. Doesn't this mean that a PAUSE is needed in the success path
> >when the first TAS succeeds (i.e, the normal no-contention path)?
> >
> IIRC: No.
>
> >If not, why not? If so, does it go before or after the lock
> >instruction?
> >
> >
> Neither: somewhere in the failure path.
>
> >Also, if the principal effect is a "short delay", do we really need it
> >at all considering that our inner loop in s_lock is rather more than
> >an "xchgb" followed by a conditional branch? There will be time for
> >the write queue to drain while we're incrementing and testing our
> >spin counter (which I trust is in a register...).
> >
> >The reason I'm so full of questions is that I spent some time several
> >days ago looking at exactly this issue, and came away with only the
> >conclusion that I had to find some more-detailed documentation before
> >I could figure out what we should do about the spinlocks for Xeons.
> >
> I'll try to find some more docs and post links.
>
> The 2nd thing I would change is to add a nonatomic test in the slow
> path: locked instructions generate lots of bus traffic, and that's a
> waste of resources.
>
> Another question: regardless of the placement of rep;nop - 10% speedup
> means that the postgres spends far too much time in the spinlock code.
> I've looked at the oprofile dumps, and something like 1.2% of the total
> cpu time is spent it the TAS macro in LWLockAcquire. That's the hottest
> instruction in the whole profile, it eats more cpu cycles than the
> memcpy() calls that transfer data to/from kernel.
> Is there an easy way find out which LWLock is contended?
> --
> Manfred
>

--
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou +33-5-61-50-97-01 (Fax)
31190 AUTERIVE +33-6-07-63-80-64 (GSM)
FRANCE Email: ohp(at)pyrenet(dot)fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)

From:	ohp(at)pyrenet(dot)fr
To:	Manfred Spraul <manfred(at)colorfullife(dot)com>
Cc:	pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [PATCHES] update i386 spinlock for hyperthreading
Date:	2003-12-27 16:56:03
Message-ID:	Pine.UW2.4.53.0312271753550.19309@server.pyrenet.fr
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Hi Manfred,

I'm using unixware 7 but couldn't compile your source with native cc, I
had to compile it with gcc.

here are the results:

Script started on Sat Dec 27 17:50:49 2003
/tmp 17:50:50: ./a.out
MOVETEST called by non-superuser, running with normal priority.
zerotest: -1073736628 ticks;
zerotest: -4372 ticks;
zerotest: -52 ticks;
rep nop: 508 ticks;
rep nop: 88 ticks;
rep nop: 60 ticks;
nop: 256 ticks;
nop: 224 ticks;
nop: 92 ticks;
nop: 64 ticks;
nop: 12 ticks;
nop: 8 ticks;
nop: 0 ticks;
nop: -4 ticks;
rep nop: 56 ticks;
rep nop: 44 ticks;
rep nop: 40 ticks;
zerotest: -4 ticks;
rep nop: 24 ticks;
zerotest: -4 ticks;
nop: 0 ticks;
zerotest: -4 ticks;
nop: -8 ticks;
zerotest: -8 ticks;
rep nop: 32 ticks;
nop: -16 ticks;
zerotest: -4 ticks;
zerotest: -4 ticks;
rep nop: 24 ticks;
zerotest: -4 ticks;
nop: -12 ticks;
^?
/tmp 17:53:34: exit

script done on Sat Dec 27 17:53:38 2003

Regards
On Sat, 27 Dec 2003, Manfred Spraul wrote:

> Date: Sat, 27 Dec 2003 12:22:45 +0100
> From: Manfred Spraul <manfred(at)colorfullife(dot)com>
> To: ohp(at)pyrenet(dot)fr
> Subject: Re: [PATCHES] update i386 spinlock for hyperthreading
>
> Hi Olivier,
>
> I have attached rep_nop.cpp. It's a simple C program (unfortunately
> filled with linux specific gccisms) that benchmarks the time for a
> single rep;nop instruction, compared to a normal nop and nothing at all.
>
> Which OS do you use? I can try to build an app for FreeBSD, and it
> should be possible to compile it under Windows with cygwin as well.
>
> Under Linux:
> $ gcc -O2 -o rep_nop rep_nop.cpp
> $ ./rep_nop
>
> Just run in for a few seconds and send back the output. Please add the
> mailing list back into the cc list.
> --
> Manfred
>

From:	Manfred Spraul <manfred(at)colorfullife(dot)com>
To:	ohp(at)pyrenet(dot)fr
Cc:	pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [PATCHES] update i386 spinlock for hyperthreading
Date:	2003-12-27 19:03:10
Message-ID:	3FEDD76E.6080304@colorfullife.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

ohp(at)pyrenet(dot)fr wrote:

>Hi Manfred,
>
>I'm using unixware 7 but couldn't compile your source with native cc, I
>had to compile it with gcc.
>
>here are the results:
>
>
Thanks. The test app compares the time needed for three different short
loops: a loop with six empty function calls, a loop with six function
calls and one nop in the middle, and a loop with a "rep;nop;" in the middle.

Result:
- nop needs 0 cycles - executed in parallel.
- rep;nop between 24 and 60 cycles - long enough that the pipeline is
emptied.

I've searched around for further info regarding the recommended spinlock
algorithm:
- The optimization manual (google for "Intel 248966") contains a section
about pause instructions: The memory ordering violation is from the
multiple simultaneous reads that are executed due to pipelining the busy
loop.
- It references the Application Note AP-949 "Using Spin-Loops on Intel
Pentium 4 Processor and Intel Xeon Processor" for further details.
Unfortunately the app notes are stored on cedar.intel.com, and that
server appears to be down :-(

--
Manfred

From:	Nailah Ogeer <ogeer(at)cs(dot)queensu(dot)ca>
To:	<pgsql-general(at)postgresql(dot)org>
Cc:	pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject:	Connecting to Postgres
Date:	2003-12-27 19:08:16
Message-ID:	Pine.SOL.4.33.0312271407030.3848-100000@innovate
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

How do you connect to postgres if it is running on a different machine. I
am looking for a telnet command in place of postmaster -D data

From:	Martin Marques <martin(at)bugs(dot)unl(dot)edu(dot)ar>
To:	Nailah Ogeer <ogeer(at)cs(dot)queensu(dot)ca>, <pgsql-general(at)postgresql(dot)org>
Cc:	pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Connecting to Postgres
Date:	2003-12-27 19:36:21
Message-ID:	200312271636.21660.martin@bugs.unl.edu.ar
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

El Sáb 27 Dic 2003 16:08, Nailah Ogeer escribió:
> How do you connect to postgres if it is running on a different machine. I
> am looking for a telnet command in place of postmaster -D data

Depends on what you want to do.
One way arounf would be:

1) Edit the pg_hba.conf where the PG server is running so that you can
actually connect to the DB.
2) Run this:

$ psql -d database -h host -U user

where database, host and user are self explained. :-)

If what you want is something different please explain yourself.

--
select 'mmarques' || '@' || 'unl.edu.ar' AS email;
-----------------------------------------------------------------
Martín Marqués | mmarques(at)unl(dot)edu(dot)ar
Programador, Administrador, DBA | Centro de Telemática
Universidad Nacional
del Litoral
-----------------------------------------------------------------

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Manfred Spraul <manfred(at)colorfullife(dot)com>
Cc:	PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject:	Re: update i386 spinlock for hyperthreading
Date:	2003-12-27 21:11:55
Message-ID:	13783.1072559515@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Manfred Spraul <manfred(at)colorfullife(dot)com> writes:
> My guess: Pentium 4 cpu support something like 250 uops in flight - it
> will have a dozend of the spinlock loops in it's pipeline. When the
> spinlock is released, it must figure out which of the loops should get
> it, and gets lost. My guess is that rep;nop delays the cpu buy at least
> 100 cpu ticks, and thus the pipeline will be empty before it proceeds.

After digging some more in Intel's documentation, it seems that indeed
PAUSE is defined to delay just long enough to empty the pipeline. So it
doesn't really matter where you put it in the wait loop, and there is no
point in inserting it in the success path; that answers my concerns from
before.

> There was a w_spinlock.pdf document with reference code. google still
> finds it, but the links are dead :-(

I was able to find it as a link from another application note at Intel's
documentation site. Try going to
http://appzone.intel.com/literature/index.asp and searching for AP-949.

Anyway, I've committed your patch with some changes.

> The 2nd thing I would change is to add a nonatomic test in the slow
> path: locked instructions generate lots of bus traffic, and that's a
> waste of resources.

Agreed, but I did not like the way you did it; this concern does not
necessarily apply to all processors, and since we are not using
S_LOCK_FREE at all, it's dubious that it's correctly implemented
everywhere. I modified the IA32 TAS() macro instead.

BTW, I noticed a lot of concern in the Intel app notes about reserving
64 or even 128 bytes for each spinlock to avoid cache line conflicts.
That seems excessive to me (we use a lot of spinlocks for buffers), but
perhaps it is worth looking into.

> Is there an easy way find out which LWLock is contended?

Not from oprofile output, as far as I can think. I've suspected for
some time that the BufMgrLock is a major bottleneck, but have no proof.

regards, tom lane

From:	Manfred Spraul <manfred(at)colorfullife(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject:	Re: update i386 spinlock for hyperthreading
Date:	2003-12-27 21:36:36
Message-ID:	3FEDFB64.8090104@colorfullife.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Tom Lane wrote:

>Anyway, I've committed your patch with some changes.
>
>
Thanks.

>BTW, I noticed a lot of concern in the Intel app notes about reserving
>64 or even 128 bytes for each spinlock to avoid cache line conflicts.
>That seems excessive to me (we use a lot of spinlocks for buffers), but
>perhaps it is worth looking into.
>
This recommendation usually ignored in the Linux kernel. A few very hot
spinlocks have an exclusive cacheline, but most don't.

>>Is there an easy way find out which LWLock is contended?
>>
>>
>
>Not from oprofile output, as far as I can think. I've suspected for
>some time that the BufMgrLock is a major bottleneck, but have no proof.
>
>
I'll try to write a patch that dumps the LWLock usage and ask mark to
run it.

--
Manfred

From:	reina_ga(at)hotmail(dot)com (Tony Reina)
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Connecting to Postgres
Date:	2003-12-28 13:54:30
Message-ID:	272e4be7.0312280554.3a8ffb0@posting.google.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

In postgresql.conf the first entry should be something like 'tcpip'.
Set it to 'true'. That's the equivalent of the postmaster -D.

HTH,
-Tony

ogeer(at)cs(dot)queensu(dot)ca (Nailah Ogeer) wrote in message news:<Pine(dot)SOL(dot)4(dot)33(dot)0312271407030(dot)3848-100000(at)innovate>...
> How do you connect to postgres if it is running on a different machine. I
> am looking for a telnet command in place of postmaster -D data
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)

From:	Manfred Spraul <manfred(at)colorfullife(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>, markw(at)osdl(dot)org
Subject:	Re: update i386 spinlock for hyperthreading
Date:	2003-12-29 22:28:08
Message-ID:	3FF0AA78.9050803@colorfullife.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Tom Lane wrote:

>>Is there an easy way find out which LWLock is contended?
>>
>>
>
>Not from oprofile output, as far as I can think. I've suspected for
>some time that the BufMgrLock is a major bottleneck, but have no proof.
>
>
Mark ran a DBT-2 testrun with the attached statistics patch applied: It
collects stats about all lightweight locks and dumps them during
shutdown. The hottest locks are

Lock Acquire %contention sleep calls
8(WALInsertLock) 8679205 0.030410 263934
1(LockMgrLock) 64089418 0.079783 5113215
5(SInvalLock) 68396470 0.001298 88812
0(BufMgrLock) 246307425 0.120293 29629089

The lock numbers are from 7.4, i.e. without the patch that removes
ShmemIndexLock. I've check that 8 is really WALInsertLock in the
assembly output.

The scary part from the system perspective are the 35 million context
switches that were generated by the BufMgrLock and the LockMgrLock. I
remember there were patches that tried other algorithms instead of the
simple LRU for the buffer manager. Has anyone tried to change the
locking of the buffer manager?

The effect of padding the lightweight locks to a full cacheline appears
to be negligable: With the padding, there were around 4 million
performance monitor hits on the 'lock xchg' instructions. Without it
(test run 300), there were 4.2 million hits.

The complete data is at

http://developer.osdl.org/markw/dbt2-pgsql/303/

The db log with the lock stats is at
http://developer.osdl.org/markw/dbt2-pgsql/303/db/log

(Warning: 6.9 MB)

--
Manfred

Attachment	Content-Type	Size
patch-lwl-stat	text/plain	3.2 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Manfred Spraul <manfred(at)colorfullife(dot)com>
Cc:	pgsql-hackers(at)postgreSQL(dot)org, markw(at)osdl(dot)org
Subject:	Lock contention (was Re: [PATCHES] update i386 spinlock for hyperthreading)
Date:	2003-12-29 22:50:52
Message-ID:	17465.1072738252@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Manfred Spraul <manfred(at)colorfullife(dot)com> writes:
> Mark ran a DBT-2 testrun with the attached statistics patch applied: It
> collects stats about all lightweight locks and dumps them during
> shutdown. The hottest locks are

> Lock Acquire %contention sleep calls
> 8(WALInsertLock) 8679205 0.030410 263934
> 5(SInvalLock) 68396470 0.001298 88812
> 1(LockMgrLock) 64089418 0.079783 5113215
> 0(BufMgrLock) 246307425 0.120293 29629089

Okay, that more or less squares with my gut feelings about the system.
Good to have some proof.

> I remember there were patches that tried other algorithms instead of the
> simple LRU for the buffer manager. Has anyone tried to change the
> locking of the buffer manager?

LRU etc have nothing to do with this. The majority of trips into the
buffer manager are for ReadBuffer/ReleaseBuffer. What we need is to
figure a way for those operations to use finer-grain locks so they don't
contend so much. However, given that they need to access and possibly
modify global structures (the buffer lookup hashtable and free list),
it's not obvious how to do it. If we could think of a way to divide
those structures into separately lockable portions, we might get somewhere.

> The effect of padding the lightweight locks to a full cacheline appears
> to be negligable:

Good, that squares with what the Linux kernel people seem to think.

regards, tom lane

From:	Christopher Browne <cbbrowne(at)libertyrms(dot)info>
To:	pgsql-patches(at)postgresql(dot)org
Subject:	Some Documentation Changes
Date:	2003-12-30 02:01:45
Message-ID:	60ad5bxdva.fsf_-_@dev6.int.libertyrms.info
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

1. In keeping with the recent discussion that there should be more
said about views, stored procedures, and triggers, in the tutorial, I
have added a bit of verbiage to that end.

2. Some formatting changes to the datetime discussion, as well as
addition of a citation of a relevant book on calendars.

Index: advanced.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/advanced.sgml,v
retrieving revision 1.38
diff -c -u -r1.38 advanced.sgml
--- advanced.sgml 29 Nov 2003 19:51:36 -0000 1.38
+++ advanced.sgml 30 Dec 2003 01:58:24 -0000
@@ -65,10 +65,24 @@

<para>
Views can be used in almost any place a real table can be used.
- Building views upon other views is not uncommon.
+ Building views upon other views is not uncommon. You may cut down
+ on the difficulty of building complex queries by constructing them
+ in smaller, easier-to-verify pieces, using views. Views may be
+ used to reveal specific table columns to users that legitimately
+ need access to some of the data, but who shouldn't be able to look
+ at the whole table.
</para>
- </sect1>

+ <para>
+ Views differ from <quote> real tables </quote> in that they are
+ not, by default, updatable. If they join together several tables,
+ it may be troublesome to update certain columns since the
+ <emphasis>real</emphasis> update that must take place requires
+ identifying the relevant rows in the source tables. This is
+ discussed further in <xref linkend="rules-views-update">.
+ </para>
+
+ </sect1>

<sect1 id="tutorial-fk">
<title>Foreign Keys</title>
@@ -387,6 +401,169 @@
</para>
</sect1>

+ <sect1 id="tutorial-storedprocs">
+ <title> Stored Procedures </title>
+
+ <indexterm zone="tutorial-storedprocs">
+ <primary>stored procedures</primary>
+ </indexterm>
+
+ <para> Stored procedures are code that runs inside the database
+ system. Numerous languages may be used to implement functions and
+ procedures; most built-in code is implemented in C. The
+ <quote>basic</quote> loadable procedural language for
+ <productname>PostgreSQL</productname> is <xref linkid="plpgsql">.
+ Numerous other languages may also be used, including <xref
+ linkid="plperl">, <xref linkid="pltcl">, and <xref
+ linkid="plpython">.
+ </para>
+
+ <para> There are several ways that stored procedures are really
+ helpful:
+
+ <itemizedlist>
+
+ <listitem><para> To centralize data validation code into the
+ database </para>
+
+ <para> Your system may use client software written in several
+ languages, perhaps with a <quote>web application</quote>
+ implemented in PHP, a <quote>server application</quote> implemented
+ in Java, and a <quote> report writer</quote> implemented in Perl.
+ In the absence of stored procedures, you will likely find that data
+ validation code must be implemented multiple times, in multiple
+ languages, once for each application.</para>
+
+ <para> By implementing data validation in stored procedures,
+ running in the database, it can behave uniformly for all these
+ systems, and you do not need to worry about synchronizing
+ validation procedures across the languages.</para>
+
+ </listitem>
+
+ <listitem><para> Reducing round trips between client and server
+ </para>
+
+ <para>A stored procedure may submit multiple queries, looking up
+ information and adding in links to additional tables. This takes
+ place without requiring that the client submit multiple queries,
+ and without requiring any added network traffic.
+ </para>
+
+ <para> As a matter of course, the queries share a single
+ transaction context, and there may also be savings in the
+ evaluation of query plans, that will be similar between invocations
+ of a given stored procedure. </para></listitem>
+
+ <listitem><para> To simplify queries. </para>
+
+ <para> For instance, if you are commonly checking the TLD on domain
+ names, you might create a stored procedure for this purpose, and so
+ be able to use queries such as <command> select domain, tld(domain)
+ from domains; </command> instead of having to put verbose code
+ using <function>substr()</function> into each query.
+ </para>
+
+ <para> It is particularly convenient to use scripting languages
+ like Perl, Tcl, and Python to <quote>grovel through strings</quote>
+ since they are designed for <quote>text processing.</quote></para>
+
+ <para> The binding to the R statistical language allows
+ implementing complex statistical queries inside the database,
+ instead of having to draw the data out.
+ </listitem>
+
+ <listitem><para> Increasing the level of abstraction</para>
+
+ <para> If data is accessed exclusively through stored procedures,
+ then the structures of tables may be changed without there needing
+ to be any visible change in the API used by programmers. In some
+ systems, users are <emphasis>only</emphasis> allowed access to
+ stored procedures to update data, and cannot do direct updates to
+ tables.
+ </para>
+
+ </listitem>
+
+ </itemizedlist>
+ </para>
+
+ <para> These benefits build on one another: careful use of stored
+ procedures can simultaneously improve reliability and performance,
+ whilst simplifying database access code and improving portability
+ across client platforms and languages. For instance, consider that
+ a stored procedure can cheaply query tables in the database to
+ validate the correctness of data provided as input. </para>
+
+ <para> Instead of requiring a whole series of queries to create an
+ object, and to look up parent/subsidiary objects to link it to, a
+ stored procedure can do all of this efficiently in the database
+ server, improving performance, and eliminating whole classes of
+ errors. </para>
+
+ </sect1>
+
+ <sect1 id="tutorial-triggers">
+ <title> Triggers </title>
+
+ <indexterm zone="tutorial-triggers">
+ <primary>triggers</primary>
+ </indexterm>
+
+ <para> Triggers allow running a function either before or after
+ update (<command>INSERT</command>, <command>DELETE</command>,
+ <command>UPDATE</command>) operations, which can allow you to do
+ some very clever things. </para>
+
+ <itemizedlist>
+
+ <listitem><para> Data Validation </para>
+
+ <para> Instead of explicitly coding validation checks as part of a
+ stored procedure, they may be introduced as <command>BEFORE</command>
+ triggers. The trigger function checks the input values, raising an
+ exception if it finds invalid input.</para>
+
+ <para> Note that this is how foreign key checks are implemented in
+ <productname>PostgreSQL</productname>; when you define a foreign
+ key, you will see a message similar to the following:
+<screen>
+NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)
+</screen></para>
+
+ <para> In some cases, it may be appropriate for a trigger function
+ to insert data in order to <emphasis>make</emphasis> the input valid. For
+ instance, if a newly created object needs a status code in a status
+ table, the trigger might automatically do that.</para>
+ </listitem>
+
+ <listitem><para> Audit logs </para>
+
+ <para> One may use <command>AFTER</command> triggers to monitor updates to
+ vital tables, and <command>INSERT</command> entries into log tables to
+ provide a more permanent record of those updates. </para>
+ </listitem>
+
+ <listitem><para> Replication </para>
+
+ <para> The <application>RServ</application> replication system uses
+ <command>AFTER</command> triggers to track which rows have changed on the
+ <quote>master</quote> system and therefore need to be copied over to
+ <quote>slave</quote> systems.</para>
+
+ <para> <command>
+ CREATE TRIGGER "_rserv_trigger_t_" AFTER INSERT OR DELETE OR UPDATE ON "my_table"
+ FOR EACH ROW EXECUTE PROCEDURE "_rserv_log_" ('10');
+ </command></para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para> Notice that there are strong parallels between what can be
+ accomplished using triggers and stored procedures, particularly in
+ regards to data validation. </para>
+
+ </sect1>

<sect1 id="tutorial-conclusion">
<title>Conclusion</title>
Index: datetime.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/datetime.sgml,v
retrieving revision 2.39
diff -c -u -r2.39 datetime.sgml
--- datetime.sgml 1 Dec 2003 20:34:53 -0000 2.39
+++ datetime.sgml 30 Dec 2003 01:58:25 -0000
@@ -11,8 +11,8 @@
strings, and are broken up into distinct fields with a preliminary
determination of what kind of information may be in the
field. Each field is interpreted and either assigned a numeric
- value, ignored, or rejected.
- The parser contains internal lookup tables for all textual fields,
+ value, ignored, or rejected.</para>
+ <para> The parser contains internal lookup tables for all textual fields,
including months, days of the week, and time
zones.
</para>
@@ -1056,21 +1056,21 @@
years.
</para>

- <para>
- The papal bull of February 1582 decreed that 10 days should be dropped
- from October 1582 so that 15 October should follow immediately after
- 4 October.
- This was observed in Italy, Poland, Portugal, and Spain. Other Catholic
- countries followed shortly after, but Protestant countries were
- reluctant to change, and the Greek orthodox countries didn't change
- until the start of the 20th century.
-
- The reform was observed by Great Britain and Dominions (including what is
- now the USA) in 1752.
- Thus 2 September 1752 was followed by 14 September 1752.
+ <para> The papal bull of February 1582 decreed that 10 days should
+ be dropped from October 1582 so that 15 October should follow
+ immediately after 4 October.</para>
+
+ <para> This was observed in Italy, Poland, Portugal, and Spain.
+ Other Catholic countries followed shortly after, but Protestant
+ countries were reluctant to change, and the Greek orthodox countries
+ didn't change until the start of the 20th century.</para>
+
+ <para> The reform was observed by Great Britain and Dominions
+ (including what is now the USA) in 1752. Thus 2 September 1752 was
+ followed by 14 September 1752.</para>

- This is why Unix systems have the <command>cal</command> program
- produce the following:
+ <para> This is why Unix systems have the <command>cal</command>
+ program produce the following:

<screen>
$ <userinput>cal 9 1752</userinput>
@@ -1094,19 +1094,24 @@
</para>
</note>

- <para>
- Different calendars have been developed in various parts of the
- world, many predating the Gregorian system.
+ <para> Different calendars have been developed in various parts of
+ the world, many predating the Gregorian system.</para>

- For example,
- the beginnings of the Chinese calendar can be traced back to the 14th
- century BC. Legend has it that the Emperor Huangdi invented the
- calendar in 2637 BC.
+ <para> For example, the beginnings of the Chinese calendar can be
+ traced back to the 14th century BC. Legend has it that the Emperor
+ Huangdi invented the calendar in 2637 BC.</para>

- The People's Republic of China uses the Gregorian calendar
- for civil purposes. The Chinese calendar is used for determining
- festivals.
+ <para> The People's Republic of China uses the Gregorian calendar
+ for civil purposes. The Chinese calendar is used for determining
+ festivals.
</para>
+
+ <para> If you are interested in this sort of thing, <citation>
+ Calendrical Calculations: The Millennium Edition </citation> by by
+ Edward M. Reingold and Nachum Dershowitz is an excellent reference,
+ describing some 25 calendars, and providing software for displaying
+ them and converting between them.</para>
+
</sect1>
</appendix>
--
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Manfred Spraul <manfred(at)colorfullife(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>, markw(at)osdl(dot)org
Subject:	Re: update i386 spinlock for hyperthreading
Date:	2003-12-30 03:37:45
Message-ID:	200312300337.hBU3bjP01804@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Manfred Spraul wrote:
> The scary part from the system perspective are the 35 million context
> switches that were generated by the BufMgrLock and the LockMgrLock. I
> remember there were patches that tried other algorithms instead of the
> simple LRU for the buffer manager. Has anyone tried to change the
> locking of the buffer manager?

CVS HEAD already has an Adaptive Replacement Cache (ARC) for buffer
replacement.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	Manfred Spraul <manfred(at)colorfullife(dot)com>, Jan Wieck <JanWieck(at)Yahoo(dot)com>, pgsql-hackers(at)postgreSQL(dot)org, markw(at)osdl(dot)org
Subject:	Re: [PATCHES] update i386 spinlock for hyperthreading
Date:	2003-12-30 04:14:48
Message-ID:	20304.1072757688@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Manfred Spraul wrote:
>> I remember there were patches that tried other algorithms instead of the
>> simple LRU for the buffer manager. Has anyone tried to change the
>> locking of the buffer manager?

> CVS HEAD already has an Adaptive Replacement Cache (ARC) for buffer
> replacement.

That's irrelevant to the problem, though. Unless the ARC code uses data
structures that are more amenable to localized locking than the old
global buffer freelist. (Jan?)

regards, tom lane

From:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Manfred Spraul <manfred(at)colorfullife(dot)com>, pgsql-hackers(at)postgreSQL(dot)org, markw(at)osdl(dot)org
Subject:	Re: [PATCHES] update i386 spinlock for hyperthreading
Date:	2003-12-30 05:33:16
Message-ID:	3FF10E1C.7050304@Yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Tom Lane wrote:

>Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>
>
>>Manfred Spraul wrote:
>>
>>
>>>I remember there were patches that tried other algorithms instead of the
>>>simple LRU for the buffer manager. Has anyone tried to change the
>>>locking of the buffer manager?
>>>
>>>
>
>
>
>>CVS HEAD already has an Adaptive Replacement Cache (ARC) for buffer
>>replacement.
>>
>>
>
>That's irrelevant to the problem, though. Unless the ARC code uses data
>structures that are more amenable to localized locking than the old
>global buffer freelist. (Jan?)
>
> regards, tom lane
>
>

Not that I know of. The new strategy uses one shared hash table like the
old, and one buffer pool as well. It grabs the same old Bufmgr lock
during the lookup+replacement decision process, gives it up during
eventual IO, grabs it again when done with the IO. As a matter of fact,
the strategy itself does no locking at all. Like the old LRU code it
simply assumes that the buffer manager holds the lock during calls.

Jan

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Manfred Spraul <manfred(at)colorfullife(dot)com>, pgsql-hackers(at)postgreSQL(dot)org, markw(at)osdl(dot)org
Subject:	Re: [PATCHES] update i386 spinlock for hyperthreading
Date:	2003-12-30 05:41:30
Message-ID:	20789.1072762890@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
>> That's irrelevant to the problem, though. Unless the ARC code uses data
>> structures that are more amenable to localized locking than the old
>> global buffer freelist. (Jan?)

> the strategy itself does no locking at all. Like the old LRU code it
> simply assumes that the buffer manager holds the lock during calls.

Okay, I suspected as much but wasn't sure.

Manfred's numbers definitely say that we need to find a way to break
down the BufMgrLock into multiple finer-grain locks. We already have
all those per-buffer LWLocks, but I don't see how to apply those to
the problem of managing the global lookup and replacement datastructures.

Anyone see an attack path here?

regards, tom lane

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jan Wieck <JanWieck(at)Yahoo(dot)com>, Manfred Spraul <manfred(at)colorfullife(dot)com>, pgsql-hackers(at)postgresql(dot)org, markw(at)osdl(dot)org
Subject:	Re: [PATCHES] update i386 spinlock for hyperthreading
Date:	2003-12-30 14:56:04
Message-ID:	200312301456.hBUEu4n13503@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Tom Lane wrote:
> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> >> That's irrelevant to the problem, though. Unless the ARC code uses data
> >> structures that are more amenable to localized locking than the old
> >> global buffer freelist. (Jan?)
>
> > the strategy itself does no locking at all. Like the old LRU code it
> > simply assumes that the buffer manager holds the lock during calls.
>
> Okay, I suspected as much but wasn't sure.
>
> Manfred's numbers definitely say that we need to find a way to break
> down the BufMgrLock into multiple finer-grain locks. We already have
> all those per-buffer LWLocks, but I don't see how to apply those to
> the problem of managing the global lookup and replacement datastructures.
>
> Anyone see an attack path here?

Should we have one lock per hash bucket rather than one for the entire
hash?

From:	Manfred Spraul <manfred(at)colorfullife(dot)com>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jan Wieck <JanWieck(at)Yahoo(dot)com>, pgsql-hackers(at)postgresql(dot)org, markw(at)osdl(dot)org
Subject:	Re: [PATCHES] update i386 spinlock for hyperthreading
Date:	2003-12-30 15:17:10
Message-ID:	3FF196F6.90500@colorfullife.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Bruce Momjian wrote:

>>Anyone see an attack path here?
>>
>>
>
>Should we have one lock per hash bucket rather than one for the entire
>hash?
>
>
That's the simple part. The problem is the aging strategy: we need a
strategy that doesn't rely on a global list that's updated after every
lookup. If I understand the ARC code correctly, there is a
STRAT_MRU_INSERT(cdb, STRAT_LIST_T2) that happen in every lookup.

--
Manfred

From:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
To:	Manfred Spraul <manfred(at)colorfullife(dot)com>
Cc:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, markw(at)osdl(dot)org
Subject:	Re: [PATCHES] update i386 spinlock for hyperthreading
Date:	2003-12-30 15:28:37
Message-ID:	3FF199A5.1080709@Yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Manfred Spraul wrote:
> Bruce Momjian wrote:
>
>>>Anyone see an attack path here?
>>>
>>>
>>
>>Should we have one lock per hash bucket rather than one for the entire
>>hash?
>>
>>
> That's the simple part. The problem is the aging strategy: we need a
> strategy that doesn't rely on a global list that's updated after every
> lookup. If I understand the ARC code correctly, there is a
> STRAT_MRU_INSERT(cdb, STRAT_LIST_T2) that happen in every lookup.

Moving the Cache Directory Block (cdb) on a hit to the MRU position of
the appropriate queue "is the bookkeeping" of this strategy. The whole
algorithm is based on it, and I don't see yet how to avoid that without
opening a huge can of worms that look like deadlocks. But I'll think
about it for a while.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc:	Manfred Spraul <manfred(at)colorfullife(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, markw(at)osdl(dot)org
Subject:	Re: [PATCHES] update i386 spinlock for hyperthreading
Date:	2003-12-30 16:48:10
Message-ID:	200312301648.hBUGmAT01243@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Jan Wieck wrote:
> Manfred Spraul wrote:
> > Bruce Momjian wrote:
> >
> >>>Anyone see an attack path here?
> >>>
> >>>
> >>
> >>Should we have one lock per hash bucket rather than one for the entire
> >>hash?
> >>
> >>
> > That's the simple part. The problem is the aging strategy: we need a
> > strategy that doesn't rely on a global list that's updated after every
> > lookup. If I understand the ARC code correctly, there is a
> > STRAT_MRU_INSERT(cdb, STRAT_LIST_T2) that happen in every lookup.
>
> Moving the Cache Directory Block (cdb) on a hit to the MRU position of
> the appropriate queue "is the bookkeeping" of this strategy. The whole
> algorithm is based on it, and I don't see yet how to avoid that without
> opening a huge can of worms that look like deadlocks. But I'll think
> about it for a while.

If we can't eliminate the global lock, and we reduce its duration?

From:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Manfred Spraul <manfred(at)colorfullife(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, markw(at)osdl(dot)org
Subject:	Re: [PATCHES] update i386 spinlock for hyperthreading
Date:	2003-12-30 16:55:43
Message-ID:	3FF1AE0F.1040003@Yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Tom Lane wrote:

> Manfred Spraul <manfred(at)colorfullife(dot)com> writes:
>> Are there strategies that do not rely on a global lock? The Linux kernel
>> uses a lazy LRU with referenced bits: on access, the referenced bit is
>> set. The freespace logic takes pages from the end of a linked list, and
>> checks that bit: if it's set, then the page is moved back to the top of
>> the list. Otherwise it's a candidate for replacement.
>
> I think this is the same idea as what I was just suggesting: add an
> extra check when looking for a free page, and thereby avoid having to
> lock/update the global datastructure during ReadBuffer.

Hmmmm ... speaking without having done in depth thought on this one:

If there would be an easy way to determine the approximate position of a
CDB inside the queue, one could just forget about moving it to the MRU
position if it's in the upper 3rd or so anyway.

The theory of the whole ARC strategy (which is nothing more than four
LRU's with a twist) is that the access frequency of blocks is reverse
proportional to their numbers (a few are high frequency used, some are
medium often requested, the vast majority rather seldom). That means,
that the buffers in the upper third or quarter of the T1 queue (and
that's the critical one) are basically circling around each other with
frequent visitors from the lower regions or other queues.

So if there would be a way to make a good guess on that relative queue
position, that expensive CDB shuffeling can be avoided often.

Jan

From:	Manfred Spraul <manfred(at)colorfullife(dot)com>
To:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, markw(at)osdl(dot)org
Subject:	Re: [PATCHES] update i386 spinlock for hyperthreading
Date:	2003-12-30 17:10:39
Message-ID:	3FF1B18F.5090408@colorfullife.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Jan Wieck wrote:

>Moving the Cache Directory Block (cdb) on a hit to the MRU position of
>the appropriate queue "is the bookkeeping" of this strategy. The whole
>algorithm is based on it, and I don't see yet how to avoid that without
>opening a huge can of worms that look like deadlocks. But I'll think
>about it for a while.
>
I feared that.
Are there strategies that do not rely on a global lock? The Linux kernel
uses a lazy LRU with referenced bits: on access, the referenced bit is
set. The freespace logic takes pages from the end of a linked list, and
checks that bit: if it's set, then the page is moved back to the top of
the list. Otherwise it's a candidate for replacement. Pages start at the
head of that pseudo-lru list, with the reference bit clear: that way a
page that is accessed only once has a lower priority than a frequently
accessed page. At least that's how I understand the algorithm.

--
Manfred

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	Jan Wieck <JanWieck(at)Yahoo(dot)com>, Manfred Spraul <manfred(at)colorfullife(dot)com>, pgsql-hackers(at)postgresql(dot)org, markw(at)osdl(dot)org
Subject:	Re: [PATCHES] update i386 spinlock for hyperthreading
Date:	2003-12-30 17:12:10
Message-ID:	23768.1072804330@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> If we can't eliminate the global lock, and we reduce its duration?

It'd be a big win even if we could just arrange that ReadBuffer and
ReleaseBuffer don't *both* grab the same global lock.

Can we logically separate the buffer lookup hashtable from the freelist
maintenance code? I am wondering about having two locks, one for each
of those areas. In principle I think a fast-path ReadBuffer (one that
doesn't need to do I/O because the desired page is in a buffer already)
would need to get only a share lock on the lookup table, and never lock
the freelist structure at all. ReleaseBuffer would need a write lock
on the freelist, but need not touch the lookup table. Only in the case
where ReadBuffer has to do I/O, and therefore needs to acquire a free
buffer to read the page into, do you need locks on both structures ---
and with sufficient care, you might not need to lock them both at the
same time, even in that path.

To make this work, we would have to recognize that the "freelist" might
be out of date --- that is, a page that had been in the freelist because
it has zero pins would be left in the freelist by ReadBuffer, and anyone
trying to remove it from the freelist for reuse would have to notice
that it has positive pin count and realize that it's not really free.
But that should be pretty simple.

Beyond that we could think about locking just parts of each of these
structures (for instance, doesn't ARC really use multiple freelists?)
but I think we ought to focus first on that basic division.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Manfred Spraul <manfred(at)colorfullife(dot)com>
Cc:	Jan Wieck <JanWieck(at)Yahoo(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, markw(at)osdl(dot)org
Subject:	Re: [PATCHES] update i386 spinlock for hyperthreading
Date:	2003-12-30 17:16:15
Message-ID:	23806.1072804575@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Manfred Spraul <manfred(at)colorfullife(dot)com> writes:
> Are there strategies that do not rely on a global lock? The Linux kernel
> uses a lazy LRU with referenced bits: on access, the referenced bit is
> set. The freespace logic takes pages from the end of a linked list, and
> checks that bit: if it's set, then the page is moved back to the top of
> the list. Otherwise it's a candidate for replacement.

I think this is the same idea as what I was just suggesting: add an
extra check when looking for a free page, and thereby avoid having to
lock/update the global datastructure during ReadBuffer.

regards, tom lane

From:	Neil Conway <neilc(at)samurai(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Manfred Spraul <manfred(at)colorfullife(dot)com>, pgsql-hackers(at)postgreSQL(dot)org, markw(at)osdl(dot)org
Subject:	Re: Lock contention (was Re: [PATCHES] update i386
Date:	2004-01-04 06:56:27
Message-ID:	m2y8so6w2s.fsf@mailbox.samurai.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> LRU etc have nothing to do with this. The majority of trips into the
> buffer manager are for ReadBuffer/ReleaseBuffer. What we need is to
> figure a way for those operations to use finer-grain locks so they don't
> contend so much.

I reimplemented much of bufmgr.c over the holidays, with the goal of
reducing the contention for the BufMgrLock as well as fixing some
other less important problems. Where possible, operations that only
effect the state of a single buffer now only need to acquire a
per-buffer "meta data lock". I haven't done any benchmarking yet, but
this should hopefully improve concurrent performance. The patch isn't
quite finished yet, but I'll post it to the list in a few days once
I've had a chance to wrap up a few loose ends.

-Neil

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Jan Wieck'" <JanWieck(at)Yahoo(dot)com>
Cc:	"'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "'Manfred Spraul'" <manfred(at)colorfullife(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>, <markw(at)osdl(dot)org>
Subject:	Re: [PATCHES] update i386 spinlock for hyperthreading
Date:	2004-02-08 23:29:47
Message-ID:	000c01c3ee9b$7178ce70$8ac887d9@LaptopDellXP
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

>Tom Lane
> Manfred's numbers definitely say that we need to find a way to break
> down the BufMgrLock into multiple finer-grain locks. We already have
> all those per-buffer LWLocks, but I don't see how to apply those to
> the problem of managing the global lookup and replacement
datastructures.
>
> Anyone see an attack path here?

Not specifically, but some ideas that might lead to something useful...

Taking a leaf from somebody else's book is always the best way - much as
has been done with the ARC algorithm itself.

1. Scaling Oracle8iT: Building Highly Scalable OLTP System Architectures

James Morle
ISBN: 0-201-32574-8
...seems like an interesting purchase in this regard (I have no
connection with the author) - but I can't vouch for usefulness of it.
http://www.aw-bc.com/catalog/academic/product/0,4096,0201325748-TOC,00.h
tml
This is nothing to do with OPS or RAC, which I definitely do not
advocate as an approach to the current challenges.

2. DB2 has long offered multiple BUFFERPOOLS. Why not explicitly define
multiple pools and then assign specific database objects to them - each
would have its own MRU list.

...that spurs some further off-the-wall ideas that follow the more usual
PostgreSQL line "design for automated operation" - how to make use of
multiple buffer pools without the need to manually assign database
objects to them???

First a question: am I right in thinking that the problem is
proportional to the number of independent actors (i.e. CPUs), but not
dependant at all on the actual size of the cache?

3. Split the T1 list into 2 (maybe more?) pieces, completely independent
of one another - T1a and T1b. When a T2, T1a, or T1b hit occurs, we
*somehow* pick one of the T1 lists, in a pseudo random fashion and move
it to the top of that list. This would clearly not be the best that ARC
can achieve, but if the buffer (and therefore the list) is big enough,
then the skew between the two lists might well be less than the loss of
performance from having a hot single MRU list. This might be regarded as
vertical partitioning.

>Jan Wieck writes:
>The theory of the whole ARC strategy (which is nothing more than four
>LRU's with a twist) is that the access frequency of blocks is reverse
>proportional to their numbers (a few are high frequency used, some are
>medium often requested, the vast majority rather seldom). That means,
>that the buffers in the upper third or quarter of the T1 queue (and
>that's the critical one) are basically circling around each other with
>frequent visitors from the lower regions or other queues.

4. Taking the analogy from (3) differently: partition T1 horizontally,
like the floors of a building, or the league divisions in Football
(Major/Minor? in the US, or 1st, 2nd, 3rd etc in the UK). The ARC
algorithm is identical, but T1 effectively has multiple MRU points, or
put another way, multiple lists all chained together. Say we give it 3
levels - T1.1 thru T1.3. When a block is first promoted from T2, it goes
to the MRU of T1.3. When a hit occurs when it is in T1.3 it gets punted
to the MRU of T1.2, and when a hit occurs when in T1.2 it would get
promoted to T1.1 MRU. On the way down, blocks falling off of T1.1 would
go to T1.2 then T1.3.

The T1.1 MRU would still be hotter than the rest. The way I am thinking
about this now is that the MRU points are evenly spaced, so the ARC
algorithm doesn't dynamically resize them as it does with T1/T2, but
that could change also. Doing things this way might mean that ARC
doesn't have to remember which transaction id added it - a specific ARC
tailoring feature added for PostgreSQL, since even if an UPDATE/DELETE
touches it twice, it won't move very far up the list. Would that save
time on the list reordering code?

4a) Perhaps MRU points should be spaced as 1/6, 1/3, 1/2 of the list, or
someother fairly simple but non-linear way in an attempt to balance the
hit frequency of the MRU points. Perhaps use four levels, then split
1/16, 2/16, 4/16, 8/16 (so last level if 9/16ths of list).

4b) Combine this with idea (3) above to split that list into two or more
equal sized lists, T1.1a and T1.1b.

4c) Promote blocks differently, according to their content type? Put
index non-leaf blocks (from fastroot down) & the rightmost leaf block
that are in T1 straight into T2.1, put permanent table heap blocks &
index leaf blocks into T2.2 and put temporary objects into T2.3. Would
require tasting the block or enquiring details about its parent object
rather than handling it blind - without regard to its contents, as ARC
does now - that sounds expensive.

Hope some of that helps..

Best Regards, Simon Riggs

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Christopher Browne <cbbrowne(at)libertyrms(dot)info>
Cc:	pgsql-patches(at)postgresql(dot)org
Subject:	Re: Some Documentation Changes
Date:	2004-03-30 21:58:26
Message-ID:	200403302158.i2ULwQT21333@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Patch applied. Thanks.

---------------------------------------------------------------------------

Christopher Browne wrote:
> 1. In keeping with the recent discussion that there should be more
> said about views, stored procedures, and triggers, in the tutorial, I
> have added a bit of verbiage to that end.
>
> 2. Some formatting changes to the datetime discussion, as well as
> addition of a citation of a relevant book on calendars.
>
> Index: advanced.sgml
> ===================================================================
> RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/advanced.sgml,v
> retrieving revision 1.38
> diff -c -u -r1.38 advanced.sgml
> --- advanced.sgml 29 Nov 2003 19:51:36 -0000 1.38
> +++ advanced.sgml 30 Dec 2003 01:58:24 -0000
> @@ -65,10 +65,24 @@
>
> <para>
> Views can be used in almost any place a real table can be used.
> - Building views upon other views is not uncommon.
> + Building views upon other views is not uncommon. You may cut down
> + on the difficulty of building complex queries by constructing them
> + in smaller, easier-to-verify pieces, using views. Views may be
> + used to reveal specific table columns to users that legitimately
> + need access to some of the data, but who shouldn't be able to look
> + at the whole table.
> </para>
> - </sect1>
>
> + <para>
> + Views differ from <quote> real tables </quote> in that they are
> + not, by default, updatable. If they join together several tables,
> + it may be troublesome to update certain columns since the
> + <emphasis>real</emphasis> update that must take place requires
> + identifying the relevant rows in the source tables. This is
> + discussed further in <xref linkend="rules-views-update">.
> + </para>
> +
> + </sect1>
>
> <sect1 id="tutorial-fk">
> <title>Foreign Keys</title>
> @@ -387,6 +401,169 @@
> </para>
> </sect1>
>
> + <sect1 id="tutorial-storedprocs">
> + <title> Stored Procedures </title>
> +
> + <indexterm zone="tutorial-storedprocs">
> + <primary>stored procedures</primary>
> + </indexterm>
> +
> + <para> Stored procedures are code that runs inside the database
> + system. Numerous languages may be used to implement functions and
> + procedures; most built-in code is implemented in C. The
> + <quote>basic</quote> loadable procedural language for
> + <productname>PostgreSQL</productname> is <xref linkid="plpgsql">.
> + Numerous other languages may also be used, including <xref
> + linkid="plperl">, <xref linkid="pltcl">, and <xref
> + linkid="plpython">.
> + </para>
> +
> + <para> There are several ways that stored procedures are really
> + helpful:
> +
> + <itemizedlist>
> +
> + <listitem><para> To centralize data validation code into the
> + database </para>
> +
> + <para> Your system may use client software written in several
> + languages, perhaps with a <quote>web application</quote>
> + implemented in PHP, a <quote>server application</quote> implemented
> + in Java, and a <quote> report writer</quote> implemented in Perl.
> + In the absence of stored procedures, you will likely find that data
> + validation code must be implemented multiple times, in multiple
> + languages, once for each application.</para>
> +
> + <para> By implementing data validation in stored procedures,
> + running in the database, it can behave uniformly for all these
> + systems, and you do not need to worry about synchronizing
> + validation procedures across the languages.</para>
> +
> + </listitem>
> +
> + <listitem><para> Reducing round trips between client and server
> + </para>
> +
> + <para>A stored procedure may submit multiple queries, looking up
> + information and adding in links to additional tables. This takes
> + place without requiring that the client submit multiple queries,
> + and without requiring any added network traffic.
> + </para>
> +
> + <para> As a matter of course, the queries share a single
> + transaction context, and there may also be savings in the
> + evaluation of query plans, that will be similar between invocations
> + of a given stored procedure. </para></listitem>
> +
> + <listitem><para> To simplify queries. </para>
> +
> + <para> For instance, if you are commonly checking the TLD on domain
> + names, you might create a stored procedure for this purpose, and so
> + be able to use queries such as <command> select domain, tld(domain)
> + from domains; </command> instead of having to put verbose code
> + using <function>substr()</function> into each query.
> + </para>
> +
> + <para> It is particularly convenient to use scripting languages
> + like Perl, Tcl, and Python to <quote>grovel through strings</quote>
> + since they are designed for <quote>text processing.</quote></para>
> +
> + <para> The binding to the R statistical language allows
> + implementing complex statistical queries inside the database,
> + instead of having to draw the data out.
> + </listitem>
> +
> + <listitem><para> Increasing the level of abstraction</para>
> +
> + <para> If data is accessed exclusively through stored procedures,
> + then the structures of tables may be changed without there needing
> + to be any visible change in the API used by programmers. In some
> + systems, users are <emphasis>only</emphasis> allowed access to
> + stored procedures to update data, and cannot do direct updates to
> + tables.
> + </para>
> +
> + </listitem>
> +
> + </itemizedlist>
> + </para>
> +
> + <para> These benefits build on one another: careful use of stored
> + procedures can simultaneously improve reliability and performance,
> + whilst simplifying database access code and improving portability
> + across client platforms and languages. For instance, consider that
> + a stored procedure can cheaply query tables in the database to
> + validate the correctness of data provided as input. </para>
> +
> + <para> Instead of requiring a whole series of queries to create an
> + object, and to look up parent/subsidiary objects to link it to, a
> + stored procedure can do all of this efficiently in the database
> + server, improving performance, and eliminating whole classes of
> + errors. </para>
> +
> + </sect1>
> +
> + <sect1 id="tutorial-triggers">
> + <title> Triggers </title>
> +
> + <indexterm zone="tutorial-triggers">
> + <primary>triggers</primary>
> + </indexterm>
> +
> + <para> Triggers allow running a function either before or after
> + update (<command>INSERT</command>, <command>DELETE</command>,
> + <command>UPDATE</command>) operations, which can allow you to do
> + some very clever things. </para>
> +
> + <itemizedlist>
> +
> + <listitem><para> Data Validation </para>
> +
> + <para> Instead of explicitly coding validation checks as part of a
> + stored procedure, they may be introduced as <command>BEFORE</command>
> + triggers. The trigger function checks the input values, raising an
> + exception if it finds invalid input.</para>
> +
> + <para> Note that this is how foreign key checks are implemented in
> + <productname>PostgreSQL</productname>; when you define a foreign
> + key, you will see a message similar to the following:
> +<screen>
> +NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)
> +</screen></para>
> +
> + <para> In some cases, it may be appropriate for a trigger function
> + to insert data in order to <emphasis>make</emphasis> the input valid. For
> + instance, if a newly created object needs a status code in a status
> + table, the trigger might automatically do that.</para>
> + </listitem>
> +
> + <listitem><para> Audit logs </para>
> +
> + <para> One may use <command>AFTER</command> triggers to monitor updates to
> + vital tables, and <command>INSERT</command> entries into log tables to
> + provide a more permanent record of those updates. </para>
> + </listitem>
> +
> + <listitem><para> Replication </para>
> +
> + <para> The <application>RServ</application> replication system uses
> + <command>AFTER</command> triggers to track which rows have changed on the
> + <quote>master</quote> system and therefore need to be copied over to
> + <quote>slave</quote> systems.</para>
> +
> + <para> <command>
> + CREATE TRIGGER "_rserv_trigger_t_" AFTER INSERT OR DELETE OR UPDATE ON "my_table"
> + FOR EACH ROW EXECUTE PROCEDURE "_rserv_log_" ('10');
> + </command></para>
> + </listitem>
> +
> + </itemizedlist>
> +
> + <para> Notice that there are strong parallels between what can be
> + accomplished using triggers and stored procedures, particularly in
> + regards to data validation. </para>
> +
> + </sect1>
>
> <sect1 id="tutorial-conclusion">
> <title>Conclusion</title>
> Index: datetime.sgml
> ===================================================================
> RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/datetime.sgml,v
> retrieving revision 2.39
> diff -c -u -r2.39 datetime.sgml
> --- datetime.sgml 1 Dec 2003 20:34:53 -0000 2.39
> +++ datetime.sgml 30 Dec 2003 01:58:25 -0000
> @@ -11,8 +11,8 @@
> strings, and are broken up into distinct fields with a preliminary
> determination of what kind of information may be in the
> field. Each field is interpreted and either assigned a numeric
> - value, ignored, or rejected.
> - The parser contains internal lookup tables for all textual fields,
> + value, ignored, or rejected.</para>
> + <para> The parser contains internal lookup tables for all textual fields,
> including months, days of the week, and time
> zones.
> </para>
> @@ -1056,21 +1056,21 @@
> years.
> </para>
>
> - <para>
> - The papal bull of February 1582 decreed that 10 days should be dropped
> - from October 1582 so that 15 October should follow immediately after
> - 4 October.
> - This was observed in Italy, Poland, Portugal, and Spain. Other Catholic
> - countries followed shortly after, but Protestant countries were
> - reluctant to change, and the Greek orthodox countries didn't change
> - until the start of the 20th century.
> -
> - The reform was observed by Great Britain and Dominions (including what is
> - now the USA) in 1752.
> - Thus 2 September 1752 was followed by 14 September 1752.
> + <para> The papal bull of February 1582 decreed that 10 days should
> + be dropped from October 1582 so that 15 October should follow
> + immediately after 4 October.</para>
> +
> + <para> This was observed in Italy, Poland, Portugal, and Spain.
> + Other Catholic countries followed shortly after, but Protestant
> + countries were reluctant to change, and the Greek orthodox countries
> + didn't change until the start of the 20th century.</para>
> +
> + <para> The reform was observed by Great Britain and Dominions
> + (including what is now the USA) in 1752. Thus 2 September 1752 was
> + followed by 14 September 1752.</para>
>
> - This is why Unix systems have the <command>cal</command> program
> - produce the following:
> + <para> This is why Unix systems have the <command>cal</command>
> + program produce the following:
>
> <screen>
> $ <userinput>cal 9 1752</userinput>
> @@ -1094,19 +1094,24 @@
> </para>
> </note>
>
> - <para>
> - Different calendars have been developed in various parts of the
> - world, many predating the Gregorian system.
> + <para> Different calendars have been developed in various parts of
> + the world, many predating the Gregorian system.</para>
>
> - For example,
> - the beginnings of the Chinese calendar can be traced back to the 14th
> - century BC. Legend has it that the Emperor Huangdi invented the
> - calendar in 2637 BC.
> + <para> For example, the beginnings of the Chinese calendar can be
> + traced back to the 14th century BC. Legend has it that the Emperor
> + Huangdi invented the calendar in 2637 BC.</para>
>
> - The People's Republic of China uses the Gregorian calendar
> - for civil purposes. The Chinese calendar is used for determining
> - festivals.
> + <para> The People's Republic of China uses the Gregorian calendar
> + for civil purposes. The Chinese calendar is used for determining
> + festivals.
> </para>
> +
> + <para> If you are interested in this sort of thing, <citation>
> + Calendrical Calculations: The Millennium Edition </citation> by by
> + Edward M. Reingold and Nachum Dershowitz is an excellent reference,
> + describing some 25 calendars, and providing software for displaying
> + them and converting between them.</para>
> +
> </sect1>
> </appendix>
> --
> let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
> <http://dev6.int.libertyrms.com/>
> Christopher Browne
> (416) 646 3304 x124 (land)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Christopher Browne <cbbrowne(at)libertyrms(dot)info>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: Some Documentation Changes
Date:	2004-03-31 12:32:52
Message-ID:	200403311432.52201.peter_e@gmx.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Am Dienstag, 30. Dezember 2003 03:01 schrieb Christopher Browne:
> 1. In keeping with the recent discussion that there should be more
> said about views, stored procedures, and triggers, in the tutorial, I
> have added a bit of verbiage to that end.

The idea that seems to get lost here is that the Tutorial is supposed to be
something for people to try out, not just a list of interesting ideas to keep
in mind for later on.

> 2. Some formatting changes to the datetime discussion,

Please revert them.

> as well as
> addition of a citation of a relevant book on calendars.

Citations go into the bibliography.

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Christopher Browne <cbbrowne(at)libertyrms(dot)info>
Cc:	pgsql-patches(at)postgresql(dot)org
Subject:	Re: Some Documentation Changes
Date:	2004-03-31 12:33:50
Message-ID:	200403311433.50091.peter_e@gmx.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Am Dienstag, 30. März 2004 23:58 schrieb Bruce Momjian:
> Patch applied. Thanks.

Please revert this patch. The material is not "Tutorial" material.

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	Christopher Browne <cbbrowne(at)libertyrms(dot)info>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: Some Documentation Changes
Date:	2004-03-31 16:19:07
Message-ID:	200403311619.i2VGJ7P04778@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers pgsql-patches

Peter Eisentraut wrote:
> Am Dienstag, 30. Dezember 2003 03:01 schrieb Christopher Browne:
> > 1. In keeping with the recent discussion that there should be more
> > said about views, stored procedures, and triggers, in the tutorial, I
> > have added a bit of verbiage to that end.
>
> The idea that seems to get lost here is that the Tutorial is supposed to be
> something for people to try out, not just a list of interesting ideas to keep
> in mind for later on.
>
> > 2. Some formatting changes to the datetime discussion,
>
> Please revert them.
>
> > as well as
> > addition of a citation of a relevant book on calendars.
>
> Citations go into the bibliography.

OK, entire patch reverted. Does someone want to rework this information
to fit into our docs more cleanly?

Attachment	Content-Type	Size
unknown_filename	text/plain	13.3 KB