Re: autovacuum_work_mem

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: autovacuum_work_mem
Date: 2013-12-16 10:12:57
Message-ID: 52AED229.10807@vmware.com
Lists: pgsql-hackers

On 12/13/2013 08:40 PM, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>
>> I haven't been following this thread in detail, but would it help if
>> we implemented a scheme to reduce (auto)vacuum's memory usage? Such
>> schemes have been discussed in the past, packing the list of dead
>> items more tightly.
>
> Well, it would help some, but it wouldn't eliminate the problem
> completely. Autovacuum scales its memory usage based on the size of the
> table. There will always be a table so gigantic that a maximum
> allocated memory is to be expected; and DBAs will need a way to limit
> the memory consumption even if we pack dead items more densely.
>
> I was playing with keeping item pointers for each page in a bitmapset.
> This was pretty neat and used a lot less memory than currently, except
> that I needed to allocate a large chunk of memory and then have
> bitmapsets use words within that large allocation space. It turned out
> to be too ugly so I abandoned it. With the "varbit encoding" thingy in
> the recent GIN patchset, maybe it would be workable.

The varbyte encoding is actually a very poor fit for vacuum. Vacuum
needs fast random access into the array when scanning indexes, and the
varbyte-encoded item pointer lists used in GIN don't allow that.
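
To illustrate why random access matters: while scanning an index, vacuum
asks "is this TID dead?" once per index tuple, so the dead-TID store has
to support cheap point lookups. Just a sketch of the principle, with
made-up names rather than the actual vacuum code (which keeps a sorted
array of item pointers and binary-searches it):

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

typedef struct
{
    uint32_t block;     /* heap block number */
    uint16_t offset;    /* line pointer offset within the block */
} DeadTid;

/* Return true if (block, offset) is in the sorted dead-TID array. */
static bool
dead_tid_exists(const DeadTid *tids, size_t ntids,
                uint32_t block, uint16_t offset)
{
    size_t lo = 0;
    size_t hi = ntids;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (tids[mid].block < block ||
            (tids[mid].block == block && tids[mid].offset < offset))
            lo = mid + 1;
        else if (tids[mid].block == block && tids[mid].offset == offset)
            return true;
        else
            hi = mid;
    }
    return false;
}

A sequentially-decoded varbyte stream can't be probed like that without
decoding from the start (or keeping extra skip structures).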

I couldn't find it in the archives now, but when we last discussed this,
Tom suggested that we divide the large chunk of memory that vacuum
allocates into two parts. The first part grows from the bottom up, and
the second part from the top down, until there is no free space left in
the middle. For each heap page, there is one entry in the first part,
containing the block number and a pointer to an entry in the second
part. The entry in the second part holds the list of dead offset numbers
on that page (or a bitmap of them).
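
Something along these lines; again only a sketch, all names made up for
illustration:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <stddef.h>

typedef struct
{
    uint32_t block;        /* heap block number */
    uint32_t list_off;     /* byte offset of this page's offset list,
                            * counted back from the end of the arena */
    uint16_t num_offsets;  /* number of dead line pointers on the page */
} PageEntry;

typedef struct
{
    char   *arena;         /* one big maintenance_work_mem allocation */
    size_t  arena_size;
    size_t  npages;        /* PageEntry slots used at the front */
    size_t  tail_used;     /* bytes used at the back */
} DeadTidStore;

/* Record the dead line pointer offsets for one heap page.  Returns false
 * if the two regions would collide, i.e. the store is full and the caller
 * has to do an index-cleanup pass before continuing. */
static bool
store_page(DeadTidStore *store, uint32_t block,
           const uint16_t *offsets, uint16_t num_offsets)
{
    size_t front_bytes = (store->npages + 1) * sizeof(PageEntry);
    size_t list_bytes  = (size_t) num_offsets * sizeof(uint16_t);
    size_t back_bytes  = store->tail_used + list_bytes;
    PageEntry *entry;

    if (front_bytes + back_bytes > store->arena_size)
        return false;

    /* Copy the offset list into the region growing down from the end. */
    store->tail_used = back_bytes;
    memcpy(store->arena + store->arena_size - store->tail_used,
           offsets, list_bytes);

    /* Append the per-page entry to the region growing up from the start. */
    entry = (PageEntry *) store->arena + store->npages;
    entry->block = block;
    entry->list_off = store->tail_used;
    entry->num_offsets = num_offsets;
    store->npages++;

    return true;
}

Since the heap is scanned in block order, the front array stays sorted by
block number, so the index scan can binary-search it by block and then
check the (much shorter) per-page offset list.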

Another idea: store only the least significant 20 bits of the block
number of each item pointer, and use the remaining 12 bits for the
offset number, so that each item pointer is stored as a single 32-bit
integer. For the top 12 bits of the block number, build a separate
lookup table of 4096 entries, indexed by those top bits. Each entry in
the lookup table points to the beginning and end index in the main array
where the entries for that block range are stored. That would reduce the
memory usage by about 1/3, which isn't as good as the bitmap method when
there are a lot of dead tuples on the same pages, but it would probably
be a smaller patch.
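
In code, the packing and lookup could look roughly like this (again just
a sketch with made-up names):

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Low 20 bits of the block number plus the 12-bit offset number. */
#define PACK_TID(block, offset) \
    ((uint32_t) ((((block) & 0xFFFFF) << 12) | ((offset) & 0xFFF)))

typedef struct
{
    size_t start;          /* first index in the packed array */
    size_t end;            /* one past the last index; start == end
                            * for block ranges with no dead tuples */
} RangeEntry;

typedef struct
{
    uint32_t   *packed;        /* sorted packed TIDs */
    RangeEntry  ranges[4096];  /* indexed by block >> 20 */
} PackedTidStore;

/* Binary search within the range belonging to this block's group. */
static bool
packed_tid_exists(const PackedTidStore *store,
                  uint32_t block, uint16_t offset)
{
    const RangeEntry *range = &store->ranges[block >> 20];
    uint32_t key = PACK_TID(block, offset);
    size_t lo = range->start;
    size_t hi = range->end;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (store->packed[mid] < key)
            lo = mid + 1;
        else if (store->packed[mid] > key)
            hi = mid;
        else
            return true;
    }
    return false;
}

Four bytes per dead tuple instead of the six bytes of an item pointer is
where the roughly 1/3 saving comes from, and the 4096-entry table also
narrows each lookup to one block range.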

- Heikki
