Re: Inaccuracy in VACUUM's tuple count estimates

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Inaccuracy in VACUUM's tuple count estimates
Date: 2014-06-12 11:40:59
Message-ID: 20140612114059.GA24710@alap3.anarazel.de
Lists: pgsql-hackers

Hi Tom,

On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
> I figured it'd be easy enough to get a better estimate by adding another
> counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
> assuming that in-progress inserts and deletes will both commit).

Did you plan to backpatch that? My inclination would be no...

> I did
> that, and found that it helped Tim's test case not at all :-(. A bit of
> sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
> INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of
> whether the transaction has since marked it for deletion:
>
> 	/*
> 	 * It'd be possible to discern between INSERT/DELETE in progress
> 	 * here by looking at xmax - but that doesn't seem beneficial for
> 	 * the majority of callers and even detrimental for some. We'd
> 	 * rather have callers look at/wait for xmin than xmax. It's
> 	 * always correct to return INSERT_IN_PROGRESS because that's
> 	 * what's happening from the view of other backends.
> 	 */
> 	return HEAPTUPLE_INSERT_IN_PROGRESS;
>
> It did not use to blow this question off: back around 8.3 you got
> DELETE_IN_PROGRESS if the tuple had a delete pending. I think we need
> less laziness + fuzzy thinking here. Maybe we should have a separate
> HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code? Is it *really*
> the case that callers other than VACUUM itself are okay with failing
> to make this distinction? I'm dubious: there are very few if any
> callers that treat the INSERT and DELETE cases exactly alike.

My current position on this is that we should leave the code as is in
releases before 9.4 and return HEAPTUPLE_INSERT_IN_PROGRESS in
9.4/master. Would you be ok
with that? The second best thing imo would be to discern and return
HEAPTUPLE_INSERT_IN_PROGRESS/HEAPTUPLE_DELETE_IN_PROGRESS for the
respective cases.
Which way would you like to go?
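
To make the second option a bit more concrete, here is a rough standalone
sketch. It is not PostgreSQL code: SketchTuple, SketchResult and both
functions are made-up names, and the tuple state is reduced to two flags
(aborted inserters, committed deleters and lock-only xmax values are all
ignored). It only illustrates how discerning the two cases by looking at
xmax would change a count of LIVE plus INSERT_IN_PROGRESS tuples.

```c
#include <stdbool.h>

/* Hypothetical, simplified result codes; not the real HTSV_Result enum. */
typedef enum
{
	SKETCH_LIVE,
	SKETCH_INSERT_IN_PROGRESS,
	SKETCH_DELETE_IN_PROGRESS
} SketchResult;

/* Hypothetical tuple state, reduced to what the sketch needs. */
typedef struct
{
	bool		xmin_committed;	/* inserting transaction has committed */
	bool		delete_pending;	/* an in-progress transaction set xmax */
} SketchTuple;

/*
 * Discern insert-in-progress from delete-in-progress by looking at the
 * (simplified) xmax state, instead of reporting every tuple with an
 * uncommitted xmin as INSERT_IN_PROGRESS.
 */
static SketchResult
sketch_classify(const SketchTuple *tup)
{
	if (tup->delete_pending)
		return SKETCH_DELETE_IN_PROGRESS;
	if (!tup->xmin_committed)
		return SKETCH_INSERT_IN_PROGRESS;
	return SKETCH_LIVE;
}

/*
 * Count tuples expected to be live if all in-progress inserts and
 * deletes commit: LIVE and INSERT_IN_PROGRESS only.
 */
static int
sketch_count_expected_live(const SketchTuple *tups, int ntups)
{
	int			count = 0;

	for (int i = 0; i < ntups; i++)
	{
		SketchResult res = sketch_classify(&tups[i]);

		if (res == SKETCH_LIVE || res == SKETCH_INSERT_IN_PROGRESS)
			count++;
	}
	return count;
}
```

With that distinction, a tuple inserted and then deleted by the same
still-running transaction no longer counts towards the estimate, which is
the case the counter alone could not catch.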

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services