Re: surprised to find bloat in insert-only table

Lists: pgsql-admin
From: Mark Stosberg <mark(at)summersault(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: surprised to find bloat in insert-only table
Date: 2011-06-03 15:10:51
Message-ID: isathr$kig$1@dough.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin


I recently set up partitioning on a table that sees heavy insert
traffic. There are never updates or deletes, we just drop the partitions
later.

It's my understanding that bloat can only appear through updates or
deletes, but these partitions are reported to have significant bloat in
them. Where else can this come from and how I can I reduce it?

I'm using a "bloat" view with 9.0.3 which might the same as this one:

https://wiki.postgresql.org/wiki/Show_database_bloat

This shows that each index on the table has over a gig of bloat:

select distinct(iname), wastedbytes, wastedsize from bloat where
wastedbytes > 0 order by wastedbytes DESC limit 20;

And this query also shows the same amount of bloat in the table itself:

select distinct(tablename), wastedbytes, wastedsize from bloat where
wastedbytes > 0 order by wastedbytes DESC limit 10;

Mark


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Stosberg <mark(at)summersault(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: surprised to find bloat in insert-only table
Date: 2011-06-03 16:02:02
Message-ID: 28332.1307116922@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Mark Stosberg <mark(at)summersault(dot)com> writes:
> I recently set up partitioning on a table that sees heavy insert
> traffic. There are never updates or deletes, we just drop the partitions
> later.

> It's my understanding that bloat can only appear through updates or
> deletes, but these partitions are reported to have significant bloat in
> them. Where else can this come from and how I can I reduce it?

> I'm using a "bloat" view with 9.0.3 which might the same as this one:

> https://wiki.postgresql.org/wiki/Show_database_bloat

> This shows that each index on the table has over a gig of bloat:

> select distinct(iname), wastedbytes, wastedsize from bloat where
> wastedbytes > 0 order by wastedbytes DESC limit 20;

> And this query also shows the same amount of bloat in the table itself:

> select distinct(tablename), wastedbytes, wastedsize from bloat where
> wastedbytes > 0 order by wastedbytes DESC limit 10;

It's hard to evaluate that without knowing what the actual table/index
sizes are, or IOW what is the reported bloat on a percentage basis?

The view you mention isn't tremendously accurate --- AFAICS it isn't
accounting for alignment padding between fields, page headers, and some
other things. And it will consider the unused space on a page to be
"bloat" even if it's too small to fit another tuple. So expecting the
number to be zero is hopelessly optimistic. Also, indexes generally
don't even try to pack pages completely full, so a larger percentage of
unused space is to be expected for them.

regards, tom lane


From: Mark Stosberg <mark(at)summersault(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: surprised to find bloat in insert-only table
Date: 2011-06-03 17:03:31
Message-ID: isb453$uhp$1@dough.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin


Thanks for reply, Tom.

> It's hard to evaluate that without knowing what the actual table/index
> sizes are, or IOW what is the reported bloat on a percentage basis?

The table size is reported as: 4036 MB according to:

select pg_size_pretty(pg_table_size('table_name'));

Compared to 1669 MB reported as table bloat in the 'bloat' view. So,
the bloat is about 40% of the total size.

For an index, it's 410 MB of bloat, vs 1669 MB for an index size.

One thing that looks suspicious is that the exact same number of bytes
is being report for the table as well as each index.

> The view you mention isn't tremendously accurate --- AFAICS it isn't
> accounting for alignment padding between fields, page headers, and some
> other things. And it will consider the unused space on a page to be
> "bloat" even if it's too small to fit another tuple. So expecting the
> number to be zero is hopelessly optimistic. Also, indexes generally
> don't even try to pack pages completely full, so a larger percentage of
> unused space is to be expected for them.

Thanks for the clarifications! In summary, it sounds like there is some
bloat that I need to live with, but at least with partitioning the
amount of bloat will stay relatively constant.

Mark


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Stosberg <mark(at)summersault(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: surprised to find bloat in insert-only table
Date: 2011-06-03 17:39:12
Message-ID: 1739.1307122752@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Mark Stosberg <mark(at)summersault(dot)com> writes:
> Compared to 1669 MB reported as table bloat in the 'bloat' view. So,
> the bloat is about 40% of the total size.
> For an index, it's 410 MB of bloat, vs 1669 MB for an index size.

Hm ... 40% unused space wouldn't be surprising at all for an index.
The traditional rule of thumb for a b-tree index is that the steady
state fill factor is about 2/3rds. You can do better for an index that
is loaded in increasing order (eg, an index on a serial or timestamp
column typically has a higher fill factor) but indexes on columns that
are more random are not going to see that.

> One thing that looks suspicious is that the exact same number of bytes
> is being report for the table as well as each index.

Um. In that case I'd say your view is flat out wrong...

regards, tom lane