Re: vacuumlo patch

From: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To: Tim <elatllat(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>, Aron Wieck <me(at)eunice(dot)de>
Subject: Re: vacuumlo patch
Date: 2011-08-07 06:36:41
Message-ID: CAK3UJRG3p0mhmkm_StqjSQ3H6M214OXOXdVz78WDoaXm1cGAwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Aug 7, 2011 at 12:41 AM, Tim <elatllat(at)gmail(dot)com> wrote:
> Hi Josh,
>
> Thanks for help. Attached is a patch including changes suggested in your
> comments.
>
> Excerpts from Josh's message On Sat, Aug 6, 2011 at 9:57 PM:
>>
>>  1. It wasn't clear to me whether you're OK with Aron's suggested
>> tweak, please include it in your patch if so.
>
>
> I decided to and included Aron's tweak.
> I'm not sure if the PostgreSQL project prefers simplified code over faster
> run time (if only under rare conditions).
> Who knows maybe the compiler will re-optimize it anyway.

Thanks for the quick update.

It was pretty hard for me to compare your initial versions with
Aron's, since I had trouble with those patches. But if it's just a
minor change to how transaction_limit is handled, I wouldn't worry
about it; the vast majority of vacuumlo's time is going to be spent in
the database AFAICT.

Incidentally, if someone wants to optimize vacuumlo, I see some
low-hanging fruit there, such as maybe avoiding that expensive loop of
DELETEs out of vacuum_l entirely with a bit smarter queries. But
that's a problem for another day.

>>   2. I think it might be better to use INT_MAX instead of hardcoding
>> 2147483647.
>
> Fixed, though I'm not sure if I put the include in the preferred order.

Hrm yeah.. maybe keep it so that all the system headers are together
(i.e. put <limits.h> after <fcntl.h>). At least, that's how similar
header files like pg_upgrade.h seem to be structured.

>>   5. transaction_limit is an int, yet you're using strtol() which
>> returns long. Maybe you want to use atoi() or make transaction_limit a
>> long?
>
> The other INT in the code is using strtol() so I also used strtol instead of
> changing more code.
> I'm not sure if there are any advantages or disadvantages.
> maybe it's easier to prevent/detect/report overflow wraparound.

Ugh, yeah you're right. I think this vacuumlo.c code is not such a
great role model for clean code :-) Probably not a big deal, since you
are checking for param.transaction_limit < 0 which would detect
overflow.

I'm not sure if the other half of that check, (param.transaction_limit
> INT_MAX) has any meaning, i.e. how can an int ever be > INT_MAX?

> I can't think of a good reason anyone would want to limit transaction to
> something more than INT_MAX.
> Implementing that would best be done in separate and large patch as
> PQntuples returns an int and is used on 271 lines across PostgreSQL.

Right, I wasn't suggesting that would actually be useful - I just
thought the return type of strtol() or atoi() should agree with its
lvalue.

I've only spent a little bit of time with this patch and LOs so far,
but one general question/idea I have is whether the "-l LIMIT" option
could be made smarter in some way. Say, instead of making the user
figure out how many LOs he can feasibly delete in a single transaction
(how is he supposed to know anyway, trial and error?) could we figure
out what that limit should be based on max_locks_per_transaction? and
handle deleting all the ophan LOs in several transactions for the user
automatically?

Josh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2011-08-07 07:49:21 Re: vacuumlo patch
Previous Message Tim 2011-08-07 04:41:25 Re: vacuumlo patch