Re: autovacuum and orphaned large objects

Lists: pgsql-hackers
From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: autovacuum and orphaned large objects
Date: 2011-10-24 04:56:47
Message-ID: 4EA4F00F.3040208@timbira.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

The main point of autovacuum is maintenance tasks. Currently, it executes
VACUUM and ANALYZE commands. I want to propose that we incorporate vacuumlo
functionality into it. While dealing with large objects (LO), we have lo
contrib module that helps with LO maintenance but has some limitations (does
not handle DROP TABLE and TRUNCATE cases) and vacuumlo that does an excellent
job but have to be executed outside DBMS. The proposal is to clean up LO when
autovacuum triggers VACUUM; cleanup LO routine will starts after(?) VACUUM
command.

In a near future I want to propose that orphaned LO be cleaned up by VACUUM
but that a history for another thread...

Comments?

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum and orphaned large objects
Date: 2011-10-24 13:57:06
Message-ID: CA+TgmobAmtD51E=V5nOZSkWSPA1+Y_XaBsXcynFakd-Y2BqkDw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 24, 2011 at 12:56 AM, Euler Taveira de Oliveira
<euler(at)timbira(dot)com> wrote:
> The main point of autovacuum is maintenance tasks. Currently, it executes
> VACUUM and ANALYZE commands. I want to propose that we incorporate vacuumlo
> functionality into it. While dealing with large objects (LO), we have lo
> contrib module that helps with LO maintenance but has some limitations (does
> not handle DROP TABLE and TRUNCATE cases) and vacuumlo that does an
> excellent job but have to be executed outside DBMS. The proposal is to clean
> up LO when autovacuum triggers VACUUM; cleanup LO routine will starts
> after(?) VACUUM command.
>
> In a near future I want to propose that orphaned LO be cleaned up by VACUUM
> but that a history for another thread...
>
> Comments?

I think the main reason why vacuumlo is a contrib module rather than
in core is that it is just a heuristic, and it might not be what
everyone wants to do. You could store a bunch of large objects in the
database and use the returned OIDs to generate links that you email to
users, and then when the user clicks on the link we retrieve the
corresponding LO and send it to the user over HTTP. In that design,
there are no tables in the database at all, yet the large objects
aren't orphaned.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum and orphaned large objects
Date: 2011-10-24 14:25:16
Message-ID: 4EA5754C.9060901@timbira.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 24-10-2011 10:57, Robert Haas wrote:
> I think the main reason why vacuumlo is a contrib module rather than
> in core is that it is just a heuristic, and it might not be what
> everyone wants to do. You could store a bunch of large objects in the
> database and use the returned OIDs to generate links that you email to
> users, and then when the user clicks on the link we retrieve the
> corresponding LO and send it to the user over HTTP. In that design,
> there are no tables in the database at all, yet the large objects
> aren't orphaned.
>
Uau, what a strange method to solve a problem and possibly bloat your
database. No, I'm not suggesting that we forbid it. The proposed method could
cleanup orphaned LO in 95% (if not 99%) of the use cases.

I've never heard someone using LO like you describe it. It seems strange that
someone distributes an OID number but (s)he does not store its reference at
the same database. Yes, it is a possibility but ...

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum and orphaned large objects
Date: 2011-10-24 14:36:55
Message-ID: 1141.1319467015@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Euler Taveira de Oliveira <euler(at)timbira(dot)com> writes:
> The main point of autovacuum is maintenance tasks. Currently, it executes
> VACUUM and ANALYZE commands. I want to propose that we incorporate vacuumlo
> functionality into it.

I'm not terribly thrilled with that because (a) large objects seem like
mostly a legacy feature from here, and (b) it's hard to see how to
implement it without imposing overhead on everybody, whether they use
LOs or not. This is especially problematic if you're proposing that
cleanup triggers not be required.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum and orphaned large objects
Date: 2011-10-24 15:28:35
Message-ID: CA+TgmoYZioo0pg7USBrOVgYx5uDcE0d_NcxsLqNyZQ-kX7FGYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 24, 2011 at 10:25 AM, Euler Taveira de Oliveira
<euler(at)timbira(dot)com> wrote:
> On 24-10-2011 10:57, Robert Haas wrote:
>>
>> I think the main reason why vacuumlo is a contrib module rather than
>> in core is that it is just a heuristic, and it might not be what
>> everyone wants to do.  You could store a bunch of large objects in the
>> database and use the returned OIDs to generate links that you email to
>> users, and then when the user clicks on the link we retrieve the
>> corresponding LO and send it to the user over HTTP.  In that design,
>> there are no tables in the database at all, yet the large objects
>> aren't orphaned.
>>
> Uau, what a strange method to solve a problem and possibly bloat your
> database. No, I'm not suggesting that we forbid it. The proposed method
> could cleanup orphaned LO in 95% (if not 99%) of the use cases.
>
> I've never heard someone using LO like you describe it. It seems strange
> that someone distributes an OID number but (s)he does not store its
> reference at the same database. Yes, it is a possibility but ...

I guess we could make it an optional behavior, but once you go that
far then you have to wonder whether what's really needed here is a
general-purpose task scheduler. I mean, the autovacuum launcher's
idea about how often to vacuum the database won't necessarily match
the user's idea of how often they want to vacuum away large objects -
and if the user is doing something funky (like storing arrays of large
object OIDs, or inexplicably storing them using numeric or int8) then
putting it in the backend removes a considerable amount of
flexibility. Another case where vacuumlo will fall over is if you
have a very, very large table with an OID column, but with lots of
duplicate values so that the number of OIDs actually referenced is
much smaller. You might end up doing a table scan on the large table
every time this logic kicks in, and that might suck.

I'm sort of unexcited about the idea of doing a lot of engineering
around this; it seems to me that the only reasons we still have a
separate large object facility rather than just letting everyone go
through regular tables with toastable columns are (1) the size limit
is 2GB rather than 1GB and (2) you can read and write parts of objects
rather than the whole thing. If we're going to do some more
engineering here, I'd rather set our sights a little higher.
Complaints I often hear about the large object machinery include (1)
2GB is still not enough, (2) 4 billion large objects is not enough,
(3) the performance is inadequate, particularly with large numbers of
large objects from possibly-unrelated subsystems slammed into a single
table, and (4) it would be nice to be able to partial reads and writes
on any toastable field, not just large objects. I'm not saying that
the problem you're complaining about isn't worth fixing in the
abstract, and if it seemed like a nice, clean fix I'd be all in favor,
but I just don't think it's going to be very simple, and for the
amount of work involved I'd rather get a little bit more bang for the
buck.

Of course, you don't have to agree with me on any of this; I'm just
giving you my take on it. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum and orphaned large objects
Date: 2011-10-24 15:29:52
Message-ID: 4EA58470.8020204@timbira.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 24-10-2011 11:36, Tom Lane wrote:
> Euler Taveira de Oliveira<euler(at)timbira(dot)com> writes:
>> The main point of autovacuum is maintenance tasks. Currently, it executes
>> VACUUM and ANALYZE commands. I want to propose that we incorporate vacuumlo
>> functionality into it.
>
> I'm not terribly thrilled with that because (a) large objects seem like
> mostly a legacy feature from here, and
>
Right, but there isn't a solution for > 1 GB column data besides LO.

> (b) it's hard to see how to
> implement it without imposing overhead on everybody, whether they use
> LOs or not. This is especially problematic if you're proposing that
> cleanup triggers not be required.
>
I was thinking about starting the LO cleanup after autovacuum finishes the
VACUUM command (so no trigger, no new mechanism). And about the overhead
imposed, it will only execute the cleanup code in the tables that have oid/lo
columns (this information will be collected when the autovacuum collects table
information).

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento