PL/TCL Patch to prevent postgres from becoming multithreaded

From: "Marshall, Steve" <smarshall(at)wsi(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-11 19:43:10
Message-ID: 46E6EFCE.1030809@wsi.com
Lists: pgsql-hackers pgsql-patches

There is a problem in PL/TCL that can cause the postgres backend to
become multithreaded. Postgres is not designed to be multithreaded, so
this causes downstream errors in signal handling. We have seen this
cause a number of "unexpected state" errors associated with notification
handling; however, unpredictable signal handling would be likely to
cause other errors as well.

Some sample scripts are attached which will reproduce this problem when
running against a multithreaded version of TCL, but will work without
error with a single-threaded TCL library. The scripts are a combination
of Unix shell, perl DBI, and SQL commands. The postgres process can be
seen to have multiple threads using the Linux command "ps -Lwfu
postgres". In its output, the NLWP column will be 2 for multithreaded
backend processes. The threaded/non-threaded state of the TCL library
can be ascertained on Linux using ldd to determine if libpthread.so is
linked to the TCL library (e.g. "ldd /usr/lib/libtcl8.4.so").
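
The NLWP check can also be done programmatically. The following is a
minimal sketch, assuming a Linux /proc filesystem, where
/proc/<pid>/status carries a "Threads:" field holding the same number
that ps reports as NLWP. The function names are hypothetical and are
not part of the attached scripts.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the value of the "Threads:" field from the text of a Linux
 * /proc/<pid>/status file.  Returns -1 if the field is not present.
 */
int
parse_thread_count(const char *status_text)
{
    const char *line = status_text;

    assert(status_text != NULL);

    while (line != NULL && *line != '\0')
    {
        int         count;

        /* The field name must appear at the start of a line. */
        if (sscanf(line, "Threads: %d", &count) == 1)
            return count;

        line = strchr(line, '\n');
        if (line != NULL)
            line++;             /* step past the newline */
    }
    return -1;
}

/*
 * Read /proc/<pid>/status and return the thread count for that process,
 * or -1 if the file cannot be read or the field is missing.
 */
int
thread_count_for_pid(long pid)
{
    char        path[64];
    char        buf[4096];
    size_t      len;
    FILE       *fp;

    snprintf(path, sizeof(path), "/proc/%ld/status", pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    len = fread(buf, 1, sizeof(buf) - 1, fp);
    fclose(fp);
    buf[len] = '\0';
    return parse_thread_count(buf);
}
```

Under these assumptions, thread_count_for_pid() called with the PID of
an affected backend would be expected to return 2 once PL/TCL has been
used, and 1 for an unaffected backend.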

The multithreaded behavior occurs the first time PL/TCL is used in a
postgres backend, but only when postgres is linked against a
multithread-enabled version of libtcl. Thus, this problem can be
side-stepped by linking against the proper TCL library. However,
multithreaded TCL libraries are becoming the norm in Linux distributions
and seem ubiquitous in the Windows world. Therefore a fix to the
PL/TCL code is warranted.

We determined that postgres became multithreaded during the creation of
the TCL interpreter in a function called Tcl_InitNotifier. This
function is part of TCL's Notifier subsystem, which is used to monitor
for events asynchronously from the TCL event loop. Although initialized
when an interpreter is created, the Notifier subsystem is not used until
a process enters the TCL event loop. This never happens within a
postgres process, because postgres implements its own event loop.
Therefore the initialization of the Notifier subsystem is not necessary
within the context of PL/TCL.

Our solution was to disable the Notifier subsystem by overriding the
functions associated with it using the Tcl_SetNotifier function. This
allows 8 functions related to the Notifier to be overridden. Even though we
found only two of the functions were ever called within postgres, we
overrode 8 functions with no-op versions, just for completeness. A
patch file containing the changes to pltcl.c from its 8.2.4 version is
also attached.
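
The override pattern itself can be illustrated without linking against
TCL. The following is a simplified, self-contained model, not the
patch: a library keeps a table of function pointers whose default
initializer has a side effect (a counter standing in for "a helper
thread was started"), and the caller swaps in no-ops before creating an
interpreter. Every name here is hypothetical; only the idea of
replacing the hooks wholesale corresponds to Tcl_SetNotifier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a library's replaceable hook table. */
typedef struct NotifierHooks
{
    void       *(*init) (void);
    void        (*delete_file_handler) (int fd);
} NotifierHooks;

static int  helper_threads_started = 0; /* simulates a spawned thread */
static int  dummy_key;          /* gives init a valid non-NULL address */

/* Default hooks: the initializer "starts" a helper thread. */
static void *
default_init(void)
{
    helper_threads_started++;
    return &dummy_key;
}

static void
default_delete_file_handler(int fd)
{
    (void) fd;
}

/* No-op replacements: same signatures, no side effects. */
static void *
noop_init(void)
{
    return &dummy_key;
}

static void
noop_delete_file_handler(int fd)
{
    (void) fd;
}

static NotifierHooks current_hooks = {
    default_init,
    default_delete_file_handler
};

/* Replace the whole hook table, as a caller would do once at startup. */
void
set_hooks(const NotifierHooks *hooks)
{
    assert(hooks != NULL);
    current_hooks = *hooks;
}

/* Install the no-op versions of every hook. */
void
install_noop_hooks(void)
{
    NotifierHooks hooks;

    hooks.init = noop_init;
    hooks.delete_file_handler = noop_delete_file_handler;
    set_hooks(&hooks);
}

/*
 * Simulate interpreter creation, which calls the init hook.  Returns the
 * total number of helper threads "started" so far.
 */
int
create_interpreter(void)
{
    (void) current_hooks.init();
    return helper_threads_started;
}
```

In this model the counter only advances while the default hooks are in
place; after install_noop_hooks() further interpreter creation leaves
it unchanged, which mirrors the intent of overriding the hooks before
the first interpreter is created.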

We tested this patch with PostgreSQL 8.2.4 on both RedHat Enterprise 4.0
using TCL 8.4 (single threaded) and RHE 5.0 using TCL 8.4.13
(multithreaded). We expect this solution to work with Windows as well,
although we have not tested it. There may be some problems using this
solution with old versions of TCL that pre-date the Tcl_SetNotifier
function. However, this function has been around for quite a while; it
was added in the TCL 8.2 release, circa 2000.

We hope this patch will be considered for a future PostgreSQL release.

Steve Marshall
Paul Bayer
Doug Knight
WSI Corporation

Attachment Content-Type Size
pltcl_multithread_bug_test.tar.gz application/x-gzip 936 bytes
pltcl.c.8.2.4.patch text/plain 4.2 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "Marshall, Steve" <smarshall(at)wsi(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-14 17:20:58
Message-ID: 200709141720.l8EHKwX13792@momjian.us
Lists: pgsql-hackers pgsql-patches


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Marshall, Steve wrote:

[ application/x-gzip is not supported, skipping... ]

> *** pltcl.c.orig 2007-09-10 12:58:34.000000000 -0400
> --- pltcl.c 2007-09-11 11:37:33.363222114 -0400
> ***************
> *** 163,168 ****
> --- 163,258 ----
> static void pltcl_build_tuple_argument(HeapTuple tuple, TupleDesc tupdesc,
> Tcl_DString *retval);
>
> + /**********************************************************************
> + * Declarations for functions overriden using Tcl_SetNotifier.
> + **********************************************************************/
> + static int fakeThreadKey; /* To give valid address for ClientData */
> +
> + static ClientData
> + pltcl_InitNotifier _ANSI_ARGS_((void));
> +
> + static void
> + pltcl_FinalizeNotifier _ANSI_ARGS_((ClientData clientData));
> +
> + static void
> + pltcl_SetTimer _ANSI_ARGS_((Tcl_Time *timePtr));
> +
> + static void
> + pltcl_AlertNotifier _ANSI_ARGS_((ClientData clientData));
> +
> + static void
> + pltcl_CreateFileHandler _ANSI_ARGS_((int fd, int mask, Tcl_FileProc *proc, ClientData clientData));
> +
> + static void
> + pltcl_DeleteFileHandler _ANSI_ARGS_((int fd));
> +
> + static void
> + pltcl_ServiceModeHook _ANSI_ARGS_((int mode));
> +
> + static int
> + pltcl_WaitForEvent _ANSI_ARGS_((Tcl_Time *timePtr));
> +
> + /**********************************************************************
> + * Definitions for functions overriden using Tcl_SetNotifier.
> + * These implementations effectively disable the TCL Notifier subsystem.
> + * This is okay because we never enter the TCL event loop from postgres,
> + * so the notifier capabilities are initialized, but never used.
> + *
> + * NOTE: Only InitNotifier and DeleteFileHandler ever seem to get called
> + * by postgres, but we implement all the functions for completeness.
> + **********************************************************************/
> +
> + ClientData
> + pltcl_InitNotifier()
> + {
> + return (ClientData) &(fakeThreadKey);
> + }
> +
> + void
> + pltcl_FinalizeNotifier(clientData)
> + ClientData clientData; /* Not used. */
> + {
> + }
> +
> + void
> + pltcl_SetTimer(timePtr)
> + Tcl_Time *timePtr;
> + {
> + }
> +
> + void
> + pltcl_AlertNotifier(clientData)
> + ClientData clientData;
> + {
> + }
> +
> + void
> + pltcl_CreateFileHandler(fd, mask, proc, clientData)
> + int fd;
> + int mask;
> + Tcl_FileProc *proc;
> + ClientData clientData;
> + {
> + }
> +
> + void
> + pltcl_DeleteFileHandler(fd)
> + int fd;
> + {
> + }
> +
> + void
> + pltcl_ServiceModeHook(mode)
> + int mode;
> + {
> + }
> +
> + int
> + pltcl_WaitForEvent(timePtr)
> + Tcl_Time *timePtr; /* Maximum block time, or NULL. */
> + {
> + return 0;
> + }
>
> /*
> * This routine is a crock, and so is everyplace that calls it. The problem
> ***************
> *** 189,194 ****
> --- 279,287 ----
> void
> _PG_init(void)
> {
> + /* Notifier structure used to override functions in Notifier subsystem*/
> + Tcl_NotifierProcs notifier;
> +
> /* Be sure we do initialization only once (should be redundant now) */
> if (pltcl_pm_init_done)
> return;
> ***************
> *** 199,204 ****
> --- 292,316 ----
> #endif
>
> /************************************************************
> + * Override the functions in the Notifier subsystem.
> + *
> + * We do this to prevent the postgres backend from becoming
> + * multithreaded, which happens in the default version of
> + * Tcl_InitNotifier if the TCL library has been compiled with
> + * multithreading support (i.e. when TCL_THREADS is defined
> + * under Unix, and in all cases under Windows).
> + ************************************************************/
> + notifier.setTimerProc = pltcl_SetTimer;
> + notifier.waitForEventProc = pltcl_WaitForEvent;
> + notifier.createFileHandlerProc = pltcl_CreateFileHandler;
> + notifier.deleteFileHandlerProc = pltcl_DeleteFileHandler;
> + notifier.initNotifierProc = pltcl_InitNotifier;
> + notifier.finalizeNotifierProc = pltcl_FinalizeNotifier;
> + notifier.alertNotifierProc = pltcl_AlertNotifier;
> + notifier.serviceModeHookProc = pltcl_ServiceModeHook;
> + Tcl_SetNotifier(&notifier);
> +
> + /************************************************************
> * Create the dummy hold interpreter to prevent close of
> * stdout and stderr on DeleteInterp
> ************************************************************/


--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: "Marshall, Steve" <smarshall(at)wsi(dot)com>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-14 17:50:16
Message-ID: 87bqc5m90n.fsf@oxford.xeocode.com
Lists: pgsql-hackers pgsql-patches


"Bruce Momjian" <bruce(at)momjian(dot)us> writes:

> This has been saved for the 8.4 release:
>
> http://momjian.postgresql.org/cgi-bin/pgpatches_hold
>
> ---------------------------------------------------------------------------
>
> Marshall, Steve wrote:
>> There is a problem in PL/TCL that can cause the postgres backend to
>> become multithreaded. Postgres is not designed to be multithreaded, so
>> this causes downstream errors in signal handling.

Um, this is a bug fix. Unless you had some problem with it?

Do we have anyone actively maintaining pltcl these days? I'm intentionally
quite unfamiliar with Tcl or I would be happy to verify it's reasonable. But
the explanation seems pretty convincing. If we don't have anyone maintaining
it then we're pretty much at the mercy of applying whatever patches come in
from people who are more familiar with it than us. In my experience that's how
new maintainers for modules of free software are often found anyways.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Marshall, Steve" <smarshall(at)wsi(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-14 17:57:49
Message-ID: 12919.1189792669@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Bruce Momjian" <bruce(at)momjian(dot)us> writes:
>>> There is a problem in PL/TCL that can cause the postgres backend to
>>> become multithreaded. Postgres is not designed to be multithreaded, so
>>> this causes downstream errors in signal handling.

> Um, this is a bug fix. Unless you had some problem with it?

I haven't reviewed that patch yet, but I concur we should consider it
for 8.3.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, "Marshall, Steve" <smarshall(at)wsi(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-14 18:00:41
Message-ID: 46EACC49.3060404@dunslane.net
Lists: pgsql-hackers pgsql-patches

Gregory Stark wrote:
> Do we have anyone actively maintaining pltcl these days? I'm intentionally
> quite unfamiliar with Tcl or I would be happy to verify it's reasonable. But
> the explanation seems pretty convincing. If we don't have anyone maintaining
> it then we're pretty much at the mercy of applying whatever patches come in
> from people who are more familiar with it than us. In my experience that's how
> new maintainers for modules of free software are often found anyways.
>
>

I was hoping that Jan or somebody else with some Tcl-fu would comment. I
agree it probably shouldn't be deferred.

It does bother me a bit that the Tcl engine startup just starts firing
up threads without advertisement - I wondered if we shouldn't kick back
and say it's their problem to fix.

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, "Marshall, Steve" <smarshall(at)wsi(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-14 18:07:30
Message-ID: 200709141807.l8EI7UZ04572@momjian.us
Lists: pgsql-hackers pgsql-patches


OK, moved to patches list.


--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "Marshall, Steve" <smarshall(at)wsi(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-14 18:59:22
Message-ID: 46EADA0A.3040504@kaltenbrunner.cc
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> "Bruce Momjian" <bruce(at)momjian(dot)us> writes:
>>>> There is a problem in PL/TCL that can cause the postgres backend to
>>>> become multithreaded. Postgres is not designed to be multithreaded, so
>>>> this causes downstream errors in signal handling.
>
>> Um, this is a bug fix. Unless you had some problem with it?
>
> I haven't reviewed that patch yet, but I concur we should consider it
> for 8.3.

hmm i wonder if that could be related to:

http://archives.postgresql.org/pgsql-hackers/2007-01/msg00377.php

Stefan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "Marshall, Steve" <smarshall(at)wsi(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-15 01:53:28
Message-ID: 12435.1189821208@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
> hmm i wonder if that could be related to:
> http://archives.postgresql.org/pgsql-hackers/2007-01/msg00377.php

I had forgotten that thread, but it sure does look related doesn't it?
Do you want to try Steve's proposed patch and see if it fixes it?

regards, tom lane


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "Marshall, Steve" <smarshall(at)wsi(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-15 16:27:19
Message-ID: 46EC07E7.9000100@kaltenbrunner.cc
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>> hmm i wonder if that could be related to:
>> http://archives.postgresql.org/pgsql-hackers/2007-01/msg00377.php
>
> I had forgotten that thread, but it sure does look related doesn't it?
> Do you want to try Steve's proposed patch and see if it fixes it?

yeah testing that patch now (seems to apply just fine on -HEAD) but it
seems that there is something strange going on because I just got:

http://www.kaltenbrunner.cc/files/regression.diffs

on the first manual buildfarm run on quagga...

Stefan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "Marshall, Steve" <smarshall(at)wsi(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-15 17:03:05
Message-ID: 28630.1189875785@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
> yeah testing that patch now (seems to apply just fine on -HEAD) but it
> seems that there is something strange going on because I just got:

! ERROR: could not read block 2 of relation 1663/16384/2606: read only 0 of 8192 bytes

Is that repeatable? What sort of filesystem are you testing on?
(soft-mounted NFS by any chance?)

regards, tom lane


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "Marshall, Steve" <smarshall(at)wsi(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-15 17:24:27
Message-ID: 46EC154B.2050401@kaltenbrunner.cc
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>> yeah testing that patch now (seems to apply just fine on -HEAD) but it
>> seems that there is something strange going on because I just got:
>
> ! ERROR: could not read block 2 of relation 1663/16384/2606: read only 0 of 8192 bytes
>
> Is that repeatable? What sort of filesystem are you testing on?

that box is slow - still testing ...

> (soft-mounted NFS by any chance?)

no - local SATA disk with ext3 (no OS related errors and SMART is clean too)

Stefan


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "Marshall, Steve" <smarshall(at)wsi(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-16 15:57:46
Message-ID: 46ED527A.4090108@kaltenbrunner.cc
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>> yeah testing that patch now (seems to apply just fine on -HEAD) but it
>> seems that there is something strange going on because I just got:
>
> ! ERROR: could not read block 2 of relation 1663/16384/2606: read only 0 of 8192 bytes
>
> Is that repeatable? What sort of filesystem are you testing on?
> (soft-mounted NFS by any chance?)

doesn't seem to be repeatable :-(

on the positive side - the pltcl patch does seem to fix the mentioned
regression failures quagga triggered earlier this year. I did about a
dozen full buildfarm runs with that patch and about ten manual
executions of the pltcl regression tests without any sign of
misbehaviour, so this seems like a clear candidate for at least -HEAD.

Stefan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: curious regression failures (was Re: [PATCHES] PL/TCL Patch to prevent postgres from becoming multithreaded)
Date: 2007-09-19 15:14:20
Message-ID: 20616.1190214860@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
> Tom Lane wrote:
>> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>> ! ERROR: could not read block 2 of relation 1663/16384/2606: read only 0 of 8192 bytes
>>
>> Is that repeatable? What sort of filesystem are you testing on?
>> (soft-mounted NFS by any chance?)

> doesn't seem to be repeatable :-(

Hmm ...
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=luna_moth&dt=2007-09-19%2013:10:01

Exact same error --- is it at the same place in the tests where you saw it?

Now that I think about it, there have been similar transient failures
("read only 0 of 8192 bytes") in the buildfarm before. It would be
helpful to collect a list of exactly which build reports contain
that string, but AFAIK there's no very easy way to do that; Andrew,
any suggestions?

regards, tom lane


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: curious regression failures (was Re: [PATCHES] PL/TCL Patch to prevent postgres from becoming multithreaded)
Date: 2007-09-19 15:22:34
Message-ID: 46F13EBA.8000306@kaltenbrunner.cc
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>> Tom Lane wrote:
>>> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>>> ! ERROR: could not read block 2 of relation 1663/16384/2606: read only 0 of 8192 bytes
>>> Is that repeatable? What sort of filesystem are you testing on?
>>> (soft-mounted NFS by any chance?)
>
>> doesn't seem to be repeatable :-(
>
> Hmm ...
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=luna_moth&dt=2007-09-19%2013:10:01
>
> Exact same error --- is it at the same place in the tests where you saw it?

looks like it is in a similar place:

http://www.kaltenbrunner.cc/files/regression.diffs (I don't have more
than this on that failure any more)

Stefan


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: curious regression failures (was Re: [PATCHES] PL/TCL Patch to prevent postgres from becoming multithreaded)
Date: 2007-09-19 15:55:59
Message-ID: 46F1468F.1080903@dunslane.net
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>
>> Tom Lane wrote:
>>
>>> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>>
>>>> ! ERROR: could not read block 2 of relation 1663/16384/2606: read only 0 of 8192 bytes
>>>>
>>> Is that repeatable? What sort of filesystem are you testing on?
>>> (soft-mounted NFS by any chance?)
>>>
>
>
>> doesn't seem to be repeatable :-(
>>
>
> Hmm ...
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=luna_moth&dt=2007-09-19%2013:10:01
>
> Exact same error --- is it at the same place in the tests where you saw it?
>
> Now that I think about it, there have been similar transient failures
> ("read only 0 of 8192 bytes") in the buildfarm before. It would be
> helpful to collect a list of exactly which build reports contain
> that string, but AFAIK there's no very easy way to do that; Andrew,
> any suggestions?
>
>
>

pgbfprod=# select sysname, stage, snapshot from build_status where log ~
$$read only \d+ of \d+ bytes$$;
  sysname   |    stage     |      snapshot
------------+--------------+---------------------
 zebra      | InstallCheck | 2007-09-11 10:25:03
 wildebeest | InstallCheck | 2007-09-11 22:00:11
 baiji      | InstallCheck | 2007-09-12 22:39:24
 luna_moth  | InstallCheck | 2007-09-19 13:10:01
(4 rows)
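
The same pattern match can be applied to a log file outside the
database. The following is a minimal sketch using the POSIX regex API;
since POSIX extended regular expressions have no \d shorthand, the
pattern uses [0-9]+ instead. The function name is hypothetical and is
not part of the buildfarm code.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if log_text contains a short-read error of the form
 * "read only N of M bytes", 0 if it does not, and -1 if the pattern
 * fails to compile.
 */
int
log_has_short_read(const char *log_text)
{
    regex_t     re;
    int         rc;

    assert(log_text != NULL);

    /* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
    if (regcomp(&re, "read only [0-9]+ of [0-9]+ bytes",
                REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, log_text, 0, NULL, 0);
    regfree(&re);
    return (rc == 0) ? 1 : 0;
}
```

Matching on the digits rather than the literal "0 of 8192" keeps the
check useful if a partial read or a different block size ever shows up.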

cheers

andrew


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: curious regression failures (was Re: [PATCHES] PL/TCL Patch to prevent postgres from becoming multithreaded)
Date: 2007-09-19 16:21:39
Message-ID: 46F14C93.4040908@kaltenbrunner.cc
Lists: pgsql-hackers pgsql-patches

Andrew Dunstan wrote:
>
>
> Tom Lane wrote:
>> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>
>>> Tom Lane wrote:
>>>
>>>> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>>>
>>>>> ! ERROR: could not read block 2 of relation 1663/16384/2606: read
>>>>> only 0 of 8192 bytes
>>>>>
>>>> Is that repeatable? What sort of filesystem are you testing on?
>>>> (soft-mounted NFS by any chance?)
>>>>
>>
>>
>>> doesn't seem to be repeatable :-(
>>>
>>
>> Hmm ...
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=luna_moth&dt=2007-09-19%2013:10:01
>>
>>
>> Exact same error --- is it at the same place in the tests where you
>> saw it?
>>
>> Now that I think about it, there have been similar transient failures
>> ("read only 0 of 8192 bytes") in the buildfarm before. It would be
>> helpful to collect a list of exactly which build reports contain
>> that string, but AFAIK there's no very easy way to do that; Andrew,
>> any suggestions?
>>
>>
>>
>
> pgbfprod=# select sysname, stage, snapshot from build_status where log ~
> $$read only \d+ of \d+ bytes$$;
>   sysname   |    stage     |      snapshot
> ------------+--------------+---------------------
>  zebra      | InstallCheck | 2007-09-11 10:25:03
>  wildebeest | InstallCheck | 2007-09-11 22:00:11
>  baiji      | InstallCheck | 2007-09-12 22:39:24
>  luna_moth  | InstallCheck | 2007-09-19 13:10:01

hmm all of those seem to fail the foreign key checks in a very similar
way, and those are vastly different platforms (windows, solaris, openbsd
and linux).

Stefan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: curious regression failures (was Re: [PATCHES] PL/TCL Patch to prevent postgres from becoming multithreaded)
Date: 2007-09-19 16:34:00
Message-ID: 22220.1190219640@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> pgbfprod=# select sysname, stage, snapshot from build_status where log ~
> $$read only \d+ of \d+ bytes$$;
>   sysname   |    stage     |      snapshot
> ------------+--------------+---------------------
>  zebra      | InstallCheck | 2007-09-11 10:25:03
>  wildebeest | InstallCheck | 2007-09-11 22:00:11
>  baiji      | InstallCheck | 2007-09-12 22:39:24
>  luna_moth  | InstallCheck | 2007-09-19 13:10:01
> (4 rows)

Fascinating. So I would venture that (1) it's definitely our bug,
not something we could blame on NFS or whatever, and (2) we introduced
it fairly recently. That specific error message wording exists only
in HEAD, but it's been there since 2007-01-03, so if there were a
pre-existing problem you'd think there would be some more matches.

The patterns I notice here are (1) they're all InstallCheck not Check
failures; (2) though not all at the same place in the tests, it's
a fairly short range; (3) it's all references to system catalogs,
though not all the same one.

My gut feeling is that we're seeing autovacuum truncate off an empty end
block and then a backend tries to reference that block again. But there
should be enough interlocks in place to prevent such references. Any
ideas out there?
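
The symptom itself is easy to reproduce at the file level. The
following is a minimal sketch, assuming a POSIX system, that shows only
why a read of a block lying at or beyond a truncated file's end returns
0 bytes; it does not model autovacuum, the buffer manager, or any
locking. The function name and the use of a temporary file are
illustrative assumptions.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700       /* for pread(), ftruncate(), fileno() */
#endif

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192

/*
 * Write nblocks blocks to a temporary file, truncate it to keep_blocks
 * blocks, then try to read block number blkno.  Returns the number of
 * bytes read (0 when the block starts at or past the new end of file),
 * or -1 on any I/O error.
 */
long
read_block_after_truncate(int nblocks, int keep_blocks, int blkno)
{
    char        block[BLOCK_SIZE];
    FILE       *fp;
    ssize_t     nread;
    int         fd;
    int         i;

    assert(nblocks >= 0 && keep_blocks >= 0 && blkno >= 0);

    fp = tmpfile();
    if (fp == NULL)
        return -1;
    fd = fileno(fp);

    memset(block, 'x', sizeof(block));
    for (i = 0; i < nblocks; i++)
    {
        if (write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
        {
            fclose(fp);
            return -1;
        }
    }

    /* Drop the tail of the file, as a truncation of empty end blocks would. */
    if (ftruncate(fd, (off_t) keep_blocks * BLOCK_SIZE) != 0)
    {
        fclose(fp);
        return -1;
    }

    /* A reader that still believes the old length asks for a gone block. */
    nread = pread(fd, block, sizeof(block), (off_t) blkno * BLOCK_SIZE);
    fclose(fp);
    return (long) nread;
}
```

With three blocks written and the file truncated to two, reading block
1 still returns a full block, while reading block 2 returns 0 bytes,
the same shape as "read only 0 of 8192 bytes".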

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: curious regression failures
Date: 2007-09-19 18:09:20
Message-ID: 877immil2n.fsf@oxford.xeocode.com
Lists: pgsql-hackers pgsql-patches

"Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc> writes:

> Andrew Dunstan wrote:
>
>> pgbfprod=# select sysname, stage, snapshot from build_status where log ~
>> $$read only \d+ of \d+ bytes$$;
>> sysname | stage | snapshot
>> -----------+--------------+---------------------
>> zebra | InstallCheck | 2007-09-11 10:25:03
>> wildebeest | InstallCheck | 2007-09-11 22:00:11
>> baiji | InstallCheck | 2007-09-12 22:39:24
>> luna_moth | InstallCheck | 2007-09-19 13:10:01
>
> hmm all of those seem to fail the foreign key checks in a very similar
> way and those are vastly different platforms (Windows, Solaris, OpenBSD and
> Linux).

Is this exhaustive? That is, are we sure this never happened before Sept 11th?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: curious regression failures
Date: 2007-09-19 18:14:40
Message-ID: 46F16710.2030202@dunslane.net
Lists: pgsql-hackers pgsql-patches

Gregory Stark wrote:
> "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc> writes:
>
>
>> Andrew Dunstan wrote:
>>
>>
>>> pgbfprod=# select sysname, stage, snapshot from build_status where log ~
>>> $$read only \d+ of \d+ bytes$$;
>>> sysname | stage | snapshot
>>> -----------+--------------+---------------------
>>> zebra | InstallCheck | 2007-09-11 10:25:03
>>> wildebeest | InstallCheck | 2007-09-11 22:00:11
>>> baiji | InstallCheck | 2007-09-12 22:39:24
>>> luna_moth | InstallCheck | 2007-09-19 13:10:01
>>>
>> hmm all of those seem to fail the foreign key checks in a very similar
>> way and those are vastly different platforms (Windows, Solaris, OpenBSD and
>> Linux).
>>
>
> Is this exhaustive? That is, are we sure this never happened before Sept 11th?
>
>

Yes, we have never thrown away any buildfarm history, and we have build
logs going back several years now. Being able to run queries like this
makes it all worthwhile :-) (Thanks Joshua for the disk space - I know
it annoys you.)

cheers

andrew


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: curious regression failures
Date: 2007-09-19 19:02:09
Message-ID: 87r6kuh426.fsf@oxford.xeocode.com
Lists: pgsql-hackers pgsql-patches

Looking back, by far the largest change in the period Sep 1 - Sep 11 was the
lazy xid calculation and read-only transactions. That seems like the most
likely culprit.

But given Tom's comments this commit stands out too:


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: curious regression failures
Date: 2007-09-19 21:16:04
Message-ID: 26362.1190236564@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> But given Tom's comments this commit stands out too:

> From: "Alvaro Herrera" <alvherre(at)postgresql(dot)org>
> Log Message:
> -----------
> Release the exclusive lock on the table early after truncating it in lazy
> vacuum, instead of waiting till commit.

I had thought about that one and not seen a problem with it --- but
sometimes when the light goes on, it's just blinding :-(. This change
is undoubtedly what's breaking it. The failures in question are coming
from commands that try to insert new entries into various system tables.
Now normally, the first place a backend will try to insert a brand-new
tuple in a table is the rd_targblock block that is remembered in
relcache as being where we last successfully inserted. The failures
must be happening because autovacuum has just truncated away where
rd_targblock points. There is a mechanism to reset everyone's
rd_targblock after a truncation: it's done by broadcasting a
shared-invalidation relcache inval message for that relation. Which
happens at commit, before releasing locks, which is the correct time for
the typical application of this mechanism, namely to make sure people
see system-catalog updates on time. Releasing the exclusive lock early
allows backends to try to access the relation again before they've heard
about the truncation.
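
The ordering problem described above can be sketched as a toy model. This is only an illustration under stated assumptions: the classes, method names, and block counts are hypothetical stand-ins, not PostgreSQL code; `targblock` plays the role of rd_targblock and `process_inval` the role of receiving the relcache inval message.

```python
class StaleTargetBlock(Exception):
    """Raised when a cached target block lies past the end of the relation."""


class Relation:
    """Shared state: just the current number of blocks."""

    def __init__(self, nblocks):
        self.nblocks = nblocks


class Backend:
    """A backend that remembers where it last inserted."""

    def __init__(self, rel):
        self.rel = rel
        self.targblock = None  # cached insertion target

    def insert(self):
        if self.targblock is None:
            # No cached target: fall back to the current last block.
            self.targblock = self.rel.nblocks - 1
        if self.targblock >= self.rel.nblocks:
            raise StaleTargetBlock(
                "block %d is past end of relation" % self.targblock)
        return self.targblock

    def process_inval(self):
        # The invalidation message resets the cached target.
        self.targblock = None


def vacuum_truncate(rel, new_nblocks, backends, inval_before_unlock):
    """Truncate the relation; optionally deliver the inval before unlocking."""
    rel.nblocks = new_nblocks
    if inval_before_unlock:
        for backend in backends:
            backend.process_inval()
    # The exclusive lock is released here; other backends may insert again.


def run(inval_before_unlock):
    rel = Relation(nblocks=10)
    backend = Backend(rel)
    backend.insert()  # caches block 9 as the target
    vacuum_truncate(rel, 5, [backend], inval_before_unlock)
    try:
        return backend.insert()
    except StaleTargetBlock as exc:
        return "error: %s" % exc
```

In this model, `run(True)` corresponds to the behaviour before the patch (inval delivered before the lock goes away) and `run(False)` to releasing the lock early, where the backend still trusts its stale target.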

There might be another way to manage this, but we're not inventing
a new invalidation mechanism for 8.3. This patch will have to be
reverted for the time being :-(

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Marshall, Steve" <smarshall(at)wsi(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-21 00:32:57
Message-ID: 17334.1190334777@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

"Marshall, Steve" <smarshall(at)wsi(dot)com> writes:
> There is a problem in PL/TCL that can cause the postgres backend to
> become multithreaded. Postgres is not designed to be multithreaded, so
> this causes downstream errors in signal handling. We have seen this
> cause a number of "unexpected state" errors associated with notification
> handling; however, unpredictable signal handling would be likely to
> cause other errors as well.

I've applied this patch to CVS HEAD (8.3-to-be). I'm a bit hesitant
to back-patch it however, at least not till it gets through some beta
testing.

Thanks for the detailed explanation, test case, and patch!

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: curious regression failures
Date: 2007-09-21 01:13:22
Message-ID: 20070921011321.GO30013@alvh.no-ip.org
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:

> There might be another way to manage this, but we're not inventing
> a new invalidation mechanism for 8.3. This patch will have to be
> reverted for the time being :-(

Thanks. Seems it was a good judgement call to apply it only to HEAD,
after all.

In any case, at that point we are mostly done with the expensive steps
of vacuuming, so the transaction finishes not long after this. I don't
think this issue is worth inventing a new invalidation mechanism.

--
Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"La victoria es para quien se atreve a estar solo"


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: curious regression failures
Date: 2007-09-21 01:26:50
Message-ID: 18147.1190338010@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> In any case, at that point we are mostly done with the expensive steps
> of vacuuming, so the transaction finishes not long after this. I don't
> think this issue is worth inventing a new invalidation mechanism.

Yeah, I agree --- there are only a few catalog updates left to do after
we truncate. If we held the main-table exclusive lock while vacuuming
the TOAST table, we'd have a problem, but it looks to me like we don't.

Idle thought here: did anything get done with the idea of decoupling
main-table vacuum decisions from toast-table vacuum decisions? vacuum.c
comments

* Get a session-level lock too. This will protect our access to the
* relation across multiple transactions, so that we can vacuum the
* relation's TOAST table (if any) secure in the knowledge that no one is
* deleting the parent relation.

and it suddenly occurs to me that we'd need some other way to deal with
that scenario if autovac tries to vacuum toast tables independently.

Also, did you see the thread complaining that autovacuums block CREATE
INDEX? This seems true given the current locking definitions, and it's
a bit annoying. Is it worth inventing a new table lock type just for
vacuum?
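
To make the conflict concrete, here is a small sketch of the table-level lock conflict rules, following the conflict table in the PostgreSQL documentation on explicit locking. The command-to-mode mapping covers only plain (non-FULL) VACUUM, ANALYZE, plain CREATE INDEX, SELECT, and INSERT; the helper name `blocks` is a hypothetical choice for this example.

```python
# Table-level lock modes and the modes each one conflicts with.
CONFLICTS = {
    "AccessShare": {"AccessExclusive"},
    "RowShare": {"Exclusive", "AccessExclusive"},
    "RowExclusive": {"Share", "ShareRowExclusive", "Exclusive",
                     "AccessExclusive"},
    "ShareUpdateExclusive": {"ShareUpdateExclusive", "Share",
                             "ShareRowExclusive", "Exclusive",
                             "AccessExclusive"},
    "Share": {"RowExclusive", "ShareUpdateExclusive", "ShareRowExclusive",
              "Exclusive", "AccessExclusive"},
    "ShareRowExclusive": {"RowExclusive", "ShareUpdateExclusive", "Share",
                          "ShareRowExclusive", "Exclusive",
                          "AccessExclusive"},
    "Exclusive": {"RowShare", "RowExclusive", "ShareUpdateExclusive",
                  "Share", "ShareRowExclusive", "Exclusive",
                  "AccessExclusive"},
    "AccessExclusive": {"AccessShare", "RowShare", "RowExclusive",
                        "ShareUpdateExclusive", "Share",
                        "ShareRowExclusive", "Exclusive",
                        "AccessExclusive"},
}

# Lock mode taken on the table by a few common commands.
COMMAND_LOCK = {
    "SELECT": "AccessShare",
    "INSERT": "RowExclusive",
    "VACUUM": "ShareUpdateExclusive",   # plain, non-FULL vacuum
    "ANALYZE": "ShareUpdateExclusive",
    "CREATE INDEX": "Share",            # not the CONCURRENTLY variant
}


def blocks(holder, waiter):
    """True if `waiter` must wait while `holder` holds its table lock."""
    return COMMAND_LOCK[waiter] in CONFLICTS[COMMAND_LOCK[holder]]
```

With this table, `blocks("VACUUM", "CREATE INDEX")` is true while `blocks("VACUUM", "INSERT")` is false, which is the asymmetry the complaint is about.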

regards, tom lane


From: "Marshall, Steve" <smarshall(at)wsi(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-21 11:55:24
Message-ID: 8536F69C1FCC294B859D07B179F069440A789512@EXCHANGE.ad.wsicorp.com
Lists: pgsql-hackers pgsql-patches

I'm glad to see the patch making its way through the process. I'm also
glad you guys do comprehensive testing before accepting it, since we are
only able to test in a more limited range of environments.

We have applied the patch to our 8.2.4 installations and are running it
in a high transaction rate system (processing lots and lots of
continually changing weather data). Let me know if there is any
information we could provide that would be of help in making the
back-patching decision.

Yours,
Steve Marshall

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Thursday, September 20, 2007 8:33 PM
To: Marshall, Steve
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] PL/TCL Patch to prevent postgres from becoming
multithreaded

"Marshall, Steve" <smarshall(at)wsi(dot)com> writes:
> There is a problem in PL/TCL that can cause the postgres backend to
> become multithreaded. Postgres is not designed to be multithreaded, so
> this causes downstream errors in signal handling. We have seen this
> cause a number of "unexpected state" errors associated with notification
> handling; however, unpredictable signal handling would be likely to
> cause other errors as well.

I've applied this patch to CVS HEAD (8.3-to-be). I'm a bit hesitant to
back-patch it however, at least not till it gets through some beta
testing.

Thanks for the detailed explanation, test case, and patch!

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: curious regression failures
Date: 2007-09-24 05:28:21
Message-ID: 20070924052821.GF5661@alvh.no-ip.org
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:

> Idle thought here: did anything get done with the idea of decoupling
> main-table vacuum decisions from toast-table vacuum decisions? vacuum.c
> comments
>
> * Get a session-level lock too. This will protect our access to the
> * relation across multiple transactions, so that we can vacuum the
> * relation's TOAST table (if any) secure in the knowledge that no one is
> * deleting the parent relation.
>
> and it suddenly occurs to me that we'd need some other way to deal with
> that scenario if autovac tries to vacuum toast tables independently.

Hmm, right. We didn't change this in 8.3 but it looks like somebody
will need to have a great idea before long.

Of course, the easy answer is to grab a session-level lock for the main
table while vacuuming the toast table, but it doesn't seem very
friendly.

> Also, did you see the thread complaining that autovacuums block CREATE
> INDEX? This seems true given the current locking definitions, and it's
> a bit annoying. Is it worth inventing a new table lock type just for
> vacuum?

Hmm. I think Jim is right in that what we need is to make some forms of
ALTER TABLE take a lighter lock, one that doesn't conflict with analyze.
Guillaume's complaints are about restore times, which can only be
affected by analyze, not vacuum.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: curious regression failures
Date: 2007-09-24 09:07:20
Message-ID: 87zlzctos7.fsf@oxford.xeocode.com
Lists: pgsql-hackers pgsql-patches

"Alvaro Herrera" <alvherre(at)commandprompt(dot)com> writes:

> Tom Lane wrote:
>
>> Idle thought here: did anything get done with the idea of decoupling
>> main-table vacuum decisions from toast-table vacuum decisions? vacuum.c
>> comments
>>
>> * Get a session-level lock too. This will protect our access to the
>> * relation across multiple transactions, so that we can vacuum the
>> * relation's TOAST table (if any) secure in the knowledge that no one is
>> * deleting the parent relation.
>>
>> and it suddenly occurs to me that we'd need some other way to deal with
>> that scenario if autovac tries to vacuum toast tables independently.
>
> Hmm, right. We didn't change this in 8.3 but it looks like somebody
> will need to have a great idea before long.
>
> Of course, the easy answer is to grab a session-level lock for the main
> table while vacuuming the toast table, but it doesn't seem very
> friendly.

Just a normal lock would do, no? At least for normal (non-full) vacuum.

I'm not clear why this has to be dealt with at all though. What happens if we
don't do anything? Doesn't it just mean a user trying to drop the table will
block until the vacuum is done? Or does dropping not take a lock on the toast
table?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: "Marshall, Steve" <smarshall(at)wsi(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: PL/TCL Patch to prevent postgres from becoming multithreaded
Date: 2007-09-25 18:22:13
Message-ID: 46F951D5.2080603@kaltenbrunner.cc
Lists: pgsql-hackers pgsql-patches

Marshall, Steve wrote:
> I'm glad to see the patch making its way through the process. I'm also
> glad you guys do comprehensive testing before accepting it, since we are
> only able to test in a more limited range of environments.
>
> We have applied the patch to our 8.2.4 installations and are running it
> in a high transaction rate system (processing lots and lots of
> continually changing weather data). Let me know if there is any
> information we could provide that would be of help in making the
> back-patching decision.

I re-enabled tcl builds (for -HEAD) on lionfish (mipsel) and quagga
(arm) a few days ago, so we should get a bit of additional coverage from
boxes that definitely had problems with the tcl-threading behaviour.

Stefan