Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jan Lentfer <Jan(dot)Lentfer(at)web(dot)de>, Euler Taveira <euler(at)timbira(dot)com(dot)br>
Subject: Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date: 2014-07-02 17:52:53
Message-ID: CAMkU=1zf8s7HJd+tzp_BPD4X8UYLitd+E3Q6mDDOe7jhRkv6GQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 30, 2014 at 3:17 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Jeff Janes wrote:
>
>> In particular, pgpipe is almost an exact duplicate between them,
>> except the copy in vac_parallel.c has fallen behind changes made to
>> parallel.c. (Those changes would have fixed the Windows warnings). I
>> think that this function (and perhaps other parts as
>> well--"exit_horribly" for example) need to refactored into a common
>> file that both files can include. I don't know where the best place
>> for that would be, though. (I haven't done this type of refactoring
>> myself.)
>
> I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos.
> Maybe we should move pgpipe back to src/port and have pg_dump and this
> new thing use that. I'm not sure about the rest of duplication in
> vac_parallel.c; there might be a lot in common with what
> pg_dump/parallel.c does too. Having two copies of code is frowned upon
> for good reasons. This patch introduces 1200 lines of new code in
> vac_parallel.c, ugh.
>
> If we really require 1200 lines to get parallel vacuum working for
> vacuumdb, I would question the wisdom of this effort. To me, it seems
> better spent improving autovacuum to cover whatever it is that this
> patch is supposed to be good for --- or maybe just enable having a shell
> script that launches multiple vacuumdb instances in parallel ...
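For what it's worth, the one-vacuumdb-per-database version of that shell script is short. A rough sketch only, not anything from the patch: xargs -P is a GNU/BSD extension rather than POSIX, and JOBS and VACUUMDB are stand-in variables I'm making up here so the command can be overridden.

```shell
#!/bin/sh
# Sketch only: vacuum several databases in parallel by running one
# vacuumdb per database name, at most $JOBS at a time via xargs -P
# (a GNU/BSD xargs extension, not POSIX).
# JOBS and VACUUMDB are made-up stand-ins so the command can be
# overridden; normally VACUUMDB is just "vacuumdb".
JOBS=${JOBS:-4}
VACUUMDB=${VACUUMDB:-vacuumdb}

# Read database names, one per line, on stdin and fan out.
parallel_vacuum() {
    xargs -n 1 -P "$JOBS" "$VACUUMDB" --analyze
}

# Typical use: vacuum every connectable database.
# psql -At -c "SELECT datname FROM pg_database WHERE datallowconn" |
#     parallel_vacuum
```

That gets you one process per database, which is exactly the case where it doesn't help much, as I note below.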

I would only envision using the parallel feature for vacuumdb after a
pg_upgrade or some other major maintenance window (the only situation
in which I use vacuumdb at all). I don't think autovacuum can be
expected to handle such situations well, as it is designed to be a
smooth background process.

I guess the ideal solution would be for manual VACUUM to have a
PARALLEL option, then vacuumdb could just invoke that one table at a
time. That way you would get within-table parallelism which would be
important if one table dominates the entire database cluster. But I
don't foresee that happening any time soon.

I don't know how to calibrate the number of lines that is worthwhile.
If you write in C and need to have cross-platform compatibility and
robust error handling, it seems to take hundreds of lines to do much
of anything. The code duplication is a problem, but I don't think
just raw line count is, especially since it has already been written.

The trend in this project seems to be for shell scripts to eventually
get converted into C programs. In fact, src/bin/scripts now has no
scripts at all. Also it is important to vacuum/analyze tables in the
same database at the same time, otherwise you will not get much
speed-up in the ordinary case where there is only one meaningful
database. Doing that in a shell script would be fairly hard. It
should be pretty easy in Perl (at least for me--I'm sure others
disagree), but that also doesn't seem to be the way we do things for
programs intended for end users.
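To make that concrete, here is roughly what the within-one-database case looks like as shell. A sketch only: xargs -P is a GNU/BSD extension, DB/JOBS/VACUUMDB are stand-ins I'm making up, and the quoting of unusual table names is glossed over entirely, which is part of why a robust version is harder than it looks.

```shell
#!/bin/sh
# Sketch only: per-table parallelism within a single database, the
# case the patch is really after.  One "vacuumdb --table" run per
# table name read on stdin, $JOBS at a time.
# DB, JOBS, and VACUUMDB are made-up stand-ins; table names with
# characters needing quoting are not handled here.
JOBS=${JOBS:-4}
DB=${DB:-postgres}
VACUUMDB=${VACUUMDB:-vacuumdb}

vacuum_tables() {
    xargs -P "$JOBS" -I '{}' "$VACUUMDB" --analyze --table '{}' "$DB"
}

# Typical use:
# psql -At -d "$DB" -c "SELECT schemaname || '.' || tablename
#                       FROM pg_tables
#                       WHERE schemaname NOT IN
#                             ('pg_catalog','information_schema')" |
#     vacuum_tables
```

Handling quoting, errors from individual vacuumdb runs, and ordering (largest tables first) is where the shell version starts to sprawl.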

Cheers,

Jeff
