Re: sequential scans that pick up only deleted records do not honor query cancel or timeout

Lists: pgsql-bugs
From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: sequential scans that pick up only deleted records do not honor query cancel or timeout
Date: 2012-05-22 17:14:04
Message-ID: CAHyXU0xPW=JBur1FvC-ZbKXiLNbVzX6-pHGesfJiTqsMLXpYyQ@mail.gmail.com

Basically, $subject says it all. It's pretty easy to reproduce:
delete all the records from a large table and execute any sequentially
scanning query before autovacuum comes around and cleans the table
up; the query will be uncancellable. This can result in fairly
pathological behavior in i/o constrained systems because the query
will bog itself down writing out hint bits for minutes or hours
without any way to cancel or effective i/o throttling (unlike vacuum).

IMO, this should be backpatched, and is likely fixed by injecting an
interrupts check at a strategic location. But where? I was thinking
in heapgetpage() but there are no checks elsewhere in heapam.c, which is
a red flag.

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: sequential scans that pick up only deleted records do not honor query cancel or timeout
Date: 2012-05-22 21:08:32
Message-ID: 8558.1337720912@sss.pgh.pa.us

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> Basically, $subject says it all. It's pretty easy to reproduce:
> delete all the records from a large table and execute any sequentially
> scanning query before autovacuum comes around and cleans the table
> up; the query will be uncancellable. This can result in fairly
> pathological behavior in i/o constrained systems because the query
> will bog itself down writing out hint bits for minutes or hours
> without any way to cancel or effective i/o throttling (unlike vacuum).

> IMO, this should be backpatched, and is likely fixed by injecting an
> interrupts check at a strategic location. But where? I was thinking
> in heapgetpage() but there are no checks elsewhere in heapam.c, which is
> a red flag.

heapgetpage() seems like the most reasonable place to me, as there we'll
only be making the check once per page not once per tuple.

regards, tom lane
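
[Editorial illustration, not part of the original message.] The reasoning above can be sketched with a toy model. This is not PostgreSQL code: ToyPage, toy_seqscan, cancel_requested and TUPLES_PER_PAGE are hypothetical names made up for this sketch, and a plain flag stands in for the backend's real interrupt machinery. It assumes only what the thread states: the scan keeps looping while it sees nothing but deleted tuples, so a check made once per page is what gives a pending cancel a chance to take effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TUPLES_PER_PAGE 4   /* hypothetical; chosen small for the sketch */

/* Hypothetical stand-in for a pending query-cancel request. */
static volatile bool cancel_requested = false;

typedef struct {
    bool live[TUPLES_PER_PAGE];   /* false = deleted tuple, skipped by the scan */
} ToyPage;

typedef struct {
    long pages_read;        /* pages visited before the scan stopped */
    long interrupt_checks;  /* how many times the cancel flag was polled */
    bool cancelled;         /* scan stopped because of the cancel flag */
    bool found;             /* scan stopped because it found a live tuple */
} ScanResult;

/*
 * Scan pages until a live tuple is found.  When check_per_page is true,
 * the cancel flag is polled once at the top of every page (not once per
 * tuple), which bounds the work done after a cancel to a single page.
 */
static ScanResult
toy_seqscan(const ToyPage *pages, size_t npages, bool check_per_page)
{
    ScanResult r = {0, 0, false, false};

    assert(pages != NULL || npages == 0);

    for (size_t p = 0; p < npages; p++) {
        if (check_per_page) {
            r.interrupt_checks++;
            if (cancel_requested) {
                r.cancelled = true;
                return r;
            }
        }

        r.pages_read++;
        for (int t = 0; t < TUPLES_PER_PAGE; t++) {
            if (pages[p].live[t]) {
                r.found = true;   /* control returns to the caller here */
                return r;
            }
        }
        /* Every tuple on this page was deleted: keep looping. */
    }
    return r;
}
```

With cancel_requested set and an array of pages whose tuples are all deleted, toy_seqscan(pages, n, false) walks all n pages without ever noticing the request, while toy_seqscan(pages, n, true) stops before reading the first page after a single check.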


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: sequential scans that pick up only deleted records do not honor query cancel or timeout
Date: 2012-05-22 22:39:02
Message-ID: CAHyXU0xS_0xx92bxsMHfRRgo0e3rS1bg0ngbz9qN=Buee7P32A@mail.gmail.com

On Tue, May 22, 2012 at 4:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
>> Basically, $subject says it all.  It's pretty easy to reproduce:
>> delete all the records from a large table and execute any sequentially
>> scanning query before autovacuum comes around and cleans the table
>> up; the query will be uncancellable.  This can result in fairly
>> pathological behavior in i/o constrained systems because the query
>> will bog itself down writing out hint bits for minutes or hours
>> without any way to cancel or effective i/o throttling (unlike vacuum).
>
>> IMO, this should be backpatched, and is likely fixed by injecting an
>> interrupts check at a strategic location.  But where? I was thinking
>> in heapgetpage() but there are no checks elsewhere in heapam.c, which is
>> a red flag.
>
> heapgetpage() seems like the most reasonable place to me, as there we'll
> only be making the check once per page not once per tuple.

ok. this fixes the issue:

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
new file mode 100644
index 0d6fe3f..acef385
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*************** heapgetpage(HeapScanDesc scan, BlockNumb
*** 287,292 ****
--- 287,299 ----

      LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

+     /*
+      * We have to check for signals here because a long series of
+      * pages containing nothing but deleted tuples can cause control
+      * to remain in the scan loop for an unbounded amount of time.
+      */
+     CHECK_FOR_INTERRUPTS();
+
      Assert(ntup <= MaxHeapTuplesPerPage);
      scan->rs_ntuples = ntup;
  }

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: sequential scans that pick up only deleted records do not honor query cancel or timeout
Date: 2012-05-22 23:13:12
Message-ID: 12013.1337728392@sss.pgh.pa.us

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> On Tue, May 22, 2012 at 4:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> heapgetpage() seems like the most reasonable place to me, as there we'll
>> only be making the check once per page not once per tuple.

> ok. this fixes the issue:

Well, actually it needs to be a bit earlier than that or it won't stop
non-pageatatime scans (cf the return at line 230 in HEAD). I was
thinking to place it at the point where we hold no buffer pin, just to
save a couple cycles in error cleanup:

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0d6fe3f0acd38a87af759bd3d5196da504202486..0c67156390a6a1bf1fee0bb39b96e0da8cd55dbb 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*************** heapgetpage(HeapScanDesc scan, BlockNumb
*** 222,227 ****
--- 222,234 ----
          scan->rs_cbuf = InvalidBuffer;
      }

+     /*
+      * Be sure to check for interrupts at least once per page. Checks at
+      * higher code levels won't be able to stop a seqscan that encounters
+      * many pages' worth of consecutive dead tuples.
+      */
+     CHECK_FOR_INTERRUPTS();
+
      /* read page using selected strategy */
      scan->rs_cbuf = ReadBufferExtended(scan->rs_rd, MAIN_FORKNUM, page,
                                         RBM_NORMAL, scan->rs_strategy);

But thanks for verifying that a check in this function does fix the
issue for you.

regards, tom lane
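
[Editorial illustration, not part of the original message.] The two placement arguments above can be sketched in a toy model. Everything here is hypothetical (ToyScan, toy_getpage, CheckPlacement, the pin counter) and only mimics the control flow described in the thread: the function drops the old page, reads and pins the new one, returns early when the scan is not in page-at-a-time mode, and otherwise continues with per-page work. Real CHECK_FOR_INTERRUPTS() raises an error rather than returning a value; this sketch just records what would have been held at that moment.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the interrupt check sits relative to the page read (hypothetical). */
typedef enum { CHECK_EARLY, CHECK_LATE } CheckPlacement;

typedef struct {
    bool pageatatime;     /* models the scan mode mentioned in the thread */
    int  pins_held;       /* toy counter standing in for buffer pins */
    bool cancel_pending;  /* stand-in for a pending query cancel */
} ToyScan;

typedef struct {
    bool interrupted;        /* did the check fire? */
    int  pins_at_interrupt;  /* pins held when it fired, -1 if it never did */
} GetPageOutcome;

static GetPageOutcome
toy_getpage(ToyScan *scan, CheckPlacement where)
{
    GetPageOutcome out = { false, -1 };

    /* Release the previous page, if any. */
    if (scan->pins_held > 0)
        scan->pins_held--;
    assert(scan->pins_held >= 0);

    /* Early placement: nothing is pinned, so cleanup has nothing to undo. */
    if (where == CHECK_EARLY && scan->cancel_pending) {
        out.interrupted = true;
        out.pins_at_interrupt = scan->pins_held;
        return out;
    }

    /* Read and pin the new page. */
    scan->pins_held++;

    /* Scans that are not page-at-a-time return here. */
    if (!scan->pageatatime)
        return out;

    /* ... per-page visibility work would happen here ... */

    /* Late placement: only reached in page-at-a-time mode, with a pin held. */
    if (where == CHECK_LATE && scan->cancel_pending) {
        out.interrupted = true;
        out.pins_at_interrupt = scan->pins_held;
        return out;
    }
    return out;
}
```

In this model a late check never fires for a scan with pageatatime set to false, because the early return skips it, and when it does fire in page-at-a-time mode one pin is still held. The early check fires in both modes with zero pins held, which matches the "save a couple cycles in error cleanup" remark.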