Re: Unlogged tables, persistent kind

From: Leonardo Francalanci <m_lists(at)yahoo(dot)it>
To: simon(at)2ndquadrant(dot)com, robertmhaas(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unlogged tables, persistent kind
Date: 2011-04-25 18:00:28
Message-ID: 30910.44583.qm@web29008.mail.ird.yahoo.com
Lists: pgsql-hackers

> The amount of data loss on a big
> table will be <1% of the data loss
> caused by truncating the whole table.

If that 1% is random (not time/transaction related), usually you'd rather have an empty table. In other words: is a table that is not consistent with anything else in the db useful?


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unlogged tables, persistent kind
Date: 2011-04-25 18:03:29
Message-ID: 4DB5B771.2050908@krogh.cc
Lists: pgsql-hackers

On 2011-04-25 20:00, Leonardo Francalanci wrote:
> > The amount of data loss on a big table will be <1% of the data
> > loss caused by truncating the whole table.
>
> If that 1% is random (not time/transaction related), usually you'd
> rather have an empty table. In other words: is a table that is not
> consistent with anything else in the db useful?
>
Depends on the application; if it serves for pure caching then it is
fully acceptable and way better than dropping everything.

--
Jesper


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Jesper Krogh" <jesper(at)krogh(dot)cc>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unlogged tables, persistent kind
Date: 2011-04-25 18:12:27
Message-ID: 4DB5733B020000250003CDBD@gw.wicourts.gov
Lists: pgsql-hackers

Jesper Krogh <jesper(at)krogh(dot)cc> wrote:
> On 2011-04-25 20:00, Leonardo Francalanci wrote:
>>> The amount of data loss on a big table will be <1% of the data
>>> loss caused by truncating the whole table.
>>
>> If that 1% is random (not time/transaction related), usually
>> you'd rather have an empty table. In other words: is a table
>> that is not consistent with anything else in the db useful?
>>
> Depends on the application; if it serves for pure caching then it
> is fully acceptable and way better than dropping everything.

I buy this *if* we can be sure we're not keeping information which
is duplicated or mangled, and if we can avoid crashing the server
into a panic because of broken pointers or other infelicities. I'm not
sure that can't be done, but I don't think I've heard an explanation
of how that could be accomplished, particularly without overhead
which would wipe out the performance benefit of unlogged tables.
(And without a performance benefit, what's the point?)

-Kevin


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unlogged tables, persistent kind
Date: 2011-04-25 18:13:35
Message-ID: BANLkTi=1zENA7XSvNkwWDzf93NnKp3NXCg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 25, 2011 at 2:03 PM, Jesper Krogh <jesper(at)krogh(dot)cc> wrote:
> On 2011-04-25 20:00, Leonardo Francalanci wrote:
>>> The amount of data loss on a big table will be <1% of the data
>>> loss caused by truncating the whole table.
>>
>> If that 1% is random (not time/transaction related), usually you'd
>> rather have an empty table. In other words: is a table that is not
>> consistent with anything else in the db useful?
>>
> Depends on the application; if it serves for pure caching then it is
> fully acceptable and way better than dropping everything.

Whoah...

When caching, the application already needs to be able to cope with
the case where there's nothing in the cache.

This means that if the cache gets truncated, it's reasonable to expect
that the application won't get deranged - it already needs to cope
with the case where data's not there and needs to get constructed.
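A minimal sketch of that pattern (names are made up; "kv_cache" and
"rebuild_value()" are hypothetical):

  CREATE UNLOGGED TABLE kv_cache (key text PRIMARY KEY, value text);

  -- on lookup, try the cache first...
  SELECT value FROM kv_cache WHERE key = 'foo';
  -- ...and on a miss, recompute and repopulate:
  INSERT INTO kv_cache VALUES ('foo', rebuild_value('foo'));

The application has to handle the empty SELECT anyway, so a truncated
cache just means a burst of misses after a crash.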

In contrast, if *wrong* data is in the cache, that could very well
lead to wrong behavior on the part of the application.

And there may not be any mechanism aside from cache truncation that
will rectify that.

It seems to me that it's a lot riskier to try to preserve contents of
such tables than it is to truncate them.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Leonardo Francalanci <m_lists(at)yahoo(dot)it>
Cc: robertmhaas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unlogged tables, persistent kind
Date: 2011-04-25 18:17:05
Message-ID: BANLkTik_QFsbncWLpP+R-K-i2oCJbFL45w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 25, 2011 at 7:00 PM, Leonardo Francalanci <m_lists(at)yahoo(dot)it> wrote:
>> The amount of data loss on a big
>> table will be <1% of the data loss
>> caused by truncating the whole table.
>
> If that 1% is random (not time/transaction related), usually you'd rather have an empty table.

Why do you think it would be random?

> In other words: is a table that is not consistent with anything else in the db useful?

That's too big a leap. Why would it suddenly be inconsistent with the
rest of the database?

Not good arguments.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Leonardo Francalanci <m_lists(at)yahoo(dot)it>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: robertmhaas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unlogged tables, persistent kind
Date: 2011-04-26 07:49:57
Message-ID: 987768.25492.qm@web29015.mail.ird.yahoo.com
Lists: pgsql-hackers

> > If that 1% is random (not time/transaction related), usually you'd
> > rather have an empty table.
>
> Why do you think it would be random?

"Heap blocks would be zeroed if they were found to be damaged, following a
crash."

If you erase full blocks, you have no idea what data you erased; it could be
something changed 1 hour ago, 1 month ago, 1 year ago. This is very different
from,
say, synchronous_commit=off: in that case, "the most recent transactions may be
lost if the database should crash". In your case, "some (who knows which???)
data
is lost". So, to me that sounds like random loss. I don't think that that is
different
from a corrupted table. You're not deleting rows recently changed; you're
deleting
everything that is "physically close" to it.
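To make the contrast concrete (just a sketch; "events" is a made-up
table):

  -- synchronous_commit = off risks only the newest commits:
  SET synchronous_commit = off;
  INSERT INTO events VALUES (1, 'written just before the crash');

  -- zeroing damaged heap blocks can instead take out rows of any age
  -- that happened to share a block with recent writes.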

> > In other words: is a table that is not consistent with anything else in
> > the db useful?
>
> That's too big a leap. Why would it suddenly be inconsistent with the
> rest of the database?

If you delete some data, and you have no idea what data you lost, I don't
think you have a consistent db. Unless, of course, your table has no
relation with any other table in the db.

Of course, all these thoughts are based on the assumption that I know what
happens when a block is erased; but my knowledge of postgresql internals is
not so good, so I might be *very* wrong.


From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unlogged tables, persistent kind
Date: 2011-04-26 14:45:46
Message-ID: BANLkTindTtcqbOnU_qHN2DfdcQwaVUvWdQ@mail.gmail.com
Lists: pgsql-hackers

2011/4/25 Christopher Browne <cbbrowne(at)gmail(dot)com>:
> On Mon, Apr 25, 2011 at 2:03 PM, Jesper Krogh <jesper(at)krogh(dot)cc> wrote:
>> On 2011-04-25 20:00, Leonardo Francalanci wrote:
>>>> The amount of data loss on a big table will be <1% of the data
>>>> loss caused by truncating the whole table.
>>>
>>> If that 1% is random (not time/transaction related), usually you'd
>>> rather have an empty table. In other words: is a table that is not
>>> consistent with anything else in the db useful?
>>>
>> Depends on the application; if it serves for pure caching then it is
>> fully acceptable and way better than dropping everything.
>
> Whoah...
>
> When caching, the application already needs to be able to cope with
> the case where there's nothing in the cache.
>
> This means that if the cache gets truncated, it's reasonable to expect
> that the application won't get deranged - it already needs to cope
> with the case where data's not there and needs to get constructed.

That is true, but the application's performance already has to cope
with a server crash/restart. Anything you can add to make the restart
smoother (for the application) is good.

>
> In contrast, if *wrong* data is in the cache, that could very well
> lead to wrong behavior on the part of the application.
>
> And there may not be any mechanism aside from cache truncation that
> will rectify that.
>
> It seems to me that it's a lot riskier to try to preserve contents of
> such tables than it is to truncate them.
> --
> When confronted by a difficult problem, solve it by reducing it to the
> question, "How would the Lone Ranger handle this?"

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Leonardo Francalanci <m_lists(at)yahoo(dot)it>
Cc: robertmhaas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unlogged tables, persistent kind
Date: 2011-05-03 19:21:30
Message-ID: BANLkTi=eLiF9fiq7LqoOiyJM+Ls5nOb7BQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 26, 2011 at 8:49 AM, Leonardo Francalanci <m_lists(at)yahoo(dot)it> wrote:
>> > If that 1% is random (not time/transaction related), usually you'd
>> > rather have an empty table.
>>
>> Why do you think it would be random?
>
> "Heap blocks would be zeroed if they were found to be damaged, following a
> crash."
>
> If you erase full blocks, you have no idea what data you erased; it could be
> something changed 1 hour ago, 1 month ago, 1 year ago. This is very different
> from,
> say, synchronous_commit=off: in that case, "the most recent transactions may be
> lost if the database should crash". In your case, "some (who knows which???)
> data
> is lost". So, to me that sounds like random loss. I don't think that that is
> different
> from a corrupted table. You're not deleting rows recently changed; you're
> deleting
> everything that is "physically close" to it.
>
>> > In other words: is a table that is not consistent with anything else
>> > in the db useful?
>>
>> That's too big a leap. Why would it suddenly be inconsistent with the
>> rest of the database?
>
> If you delete some data, and you have no idea what data you lost, I don't
> think you have a consistent db. Unless, of course, your table has no
> relation with any other table in the db.
>
> Of course, all these thoughts are based on the assumption that I know what
> happens when a block is erased; but my knowledge of postgresql internals
> is not so good, so I might be *very* wrong.

You're assuming that there are referential links *from* other tables
to the table with damage. In which case you would be correct. But of
course, if you needed that data for integrity you would never do that,
so the problem is a nonexistent use case. The suggested mode is for
Fact data, not reference tables.

The current assessment is that UNLOGGED tables are useful only for
running a data cache. If the database crashes, then the table is
truncated and you must refill the cache. If that is the case, then it
must surely be better to have a cache that is already 99% full than
one which starts at empty. There is no damage or loss just because
parts of the cache are missing.
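A sketch of the 9.1 situation (made-up names):

  CREATE UNLOGGED TABLE session_cache (
      session_id bigint PRIMARY KEY,
      payload    text
  );

Today a crash truncates session_cache entirely and the application must
refill it from scratch; the suggested mode would instead keep everything
except the damaged blocks, so the cache restarts nearly full.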

Unlogged Tables are currently so volatile they are unusable for any
other purpose. I want to see a table that is useful for low value
data, such as sensor data.
If you have 10 TB of sensor data and the database crashes, then you
want to lose a few blocks, not the whole lot. Low value => rare, minor
loss is acceptable, but that doesn't mean total data loss is acceptable.
For that use case, total loss is catastrophic, not just mildly
irritating. If you are a telco, losing a few minutes of billing data
costs much less than equipping every server with better hardware so it
can cope with high WAL traffic as well. They don't want to lose the
data, but it's a cost-based trade-off. Consistency is not an issue; you
are just missing some data. That is normal anyway, since sensor data
generators (mobile devices etc.) frequently fail, are offline, are
turned off, etc., so there isn't even a definition of what complete
data is supposed to look like. The missing data looks exactly like
lots of people turned their phones off for a few minutes.

So my suggestion makes UNLOGGED tables more useful for the use case
they were designed to address - cached data (AIUI) - plus it allows
another use case that doesn't seem to be well understood: low value
data in massive volumes.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Leonardo Francalanci <m_lists(at)yahoo(dot)it>, robertmhaas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unlogged tables, persistent kind
Date: 2011-05-03 20:29:49
Message-ID: BANLkTi=58zymki97XJvNXFG9=UgDkDNo1g@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 3, 2011 at 8:21 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> The current assessment is that UNLOGGED tables are useful only for
> running a data cache. If the database crashes, then the table is
> truncated and you must refill the cache. If that is the case, then it
> must surely be better to have a cache that is already 99% full than
> one which starts at empty. There is no damage or loss just because
> parts of the cache are missing.

That's not necessarily the case, of course. I've written caches before
where a set of records would be present for each object being cached.
I could deal with a whole set not being present, but if just some of
the records were missing then I would serve incorrect results. Since
they were created in a single transaction I should be able to rely on
the whole set being present or missing consistently.
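A sketch of what I mean (invented names; "compute_part()" is
hypothetical) - each cached object is a set of rows written in one
transaction:

  BEGIN;
  DELETE FROM obj_cache WHERE obj_id = 42;
  INSERT INTO obj_cache (obj_id, part_no, data)
      SELECT 42, n, compute_part(42, n)
      FROM generate_series(1, 10) AS n;
  COMMIT;

If a crash silently drops three of those ten rows, a reader that assumes
the set is complete serves wrong answers, even though every surviving
row is individually valid.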

That doesn't mean there aren't cases where it's fine, of course. In
an ideal world the database would guarantee that you couldn't use the
table in this way and fail to get the consistency you expect -
something like getting an error if you try to modify two rows in an
unlogged table during a single transaction. I'm not sure how to do
that reasonably though.

> For that use case, total loss is catastrophic, not just mildly
> irritating. If you are a telco, losing a few minutes of billing data
> costs much less than equipping every server with better hardware so it
> can cope with high WAL traffic as well. They don't want to lose the
> data, but it's a cost-based trade-off.

This analysis is dead on. That's precisely how businesses evaluate
this question.

> Consistency is not an issue; you are just missing some data. That is
> normal anyway, since sensor data generators (mobile devices etc.)
> frequently fail, are offline, are turned off, etc., so there isn't even
> a definition of what complete data is supposed to look like. The missing
> data looks exactly like lots of people turned their phones off for a few
> minutes.

I don't think that's true however. Consider if I have a rollup table
that contains aggregated sums of that data. If you lose some of the
records then my aggregates don't add up to the "correct" values any
more.

Or consider if you are counting sensor data like total data
transferred and session count -- and then reporting the average data
transferred per session. And you accidentally lose a bunch of
sessions. Now your data/session report will be reporting false
information.

This is true even if you only lose recently committed rows. But losing
whole blocks means you risk losing random old data, which makes it hard
to work around by, say, purging recently aggregated data.
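Concretely (invented names; assume "sessions" is the unlogged table and
"daily_rollup" is an ordinary one):

  INSERT INTO daily_rollup (day, total_bytes, session_count)
  SELECT day, sum(bytes), count(*)
  FROM sessions
  GROUP BY day;

Zero a few blocks of "sessions" and daily_rollup no longer matches any
re-aggregation of the base data - and you can't tell which days are off.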

> So my suggestion makes UNLOGGED tables more useful for the use case
> they were designed to address - cached data (AIUI) - plus it allows
> another use case that doesn't seem to be well understood: low value
> data in massive volumes.

There are other approaches as well. Foreign data wrappers mean you
could do things like store the low value data in raw text files, or in
other systems like memcached or Hadoop, or whatever. I'm not saying
there's no reason to do something in Postgres, but if you're being
bitten by Postgres's block-oriented random-access storage, the problem
may be too fundamental to solve without addressing the underlying
storage strategy.
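To illustrate the raw-text-file option, something like file_fdw (a
sketch, assuming the contrib module is installed and the file exists):

  CREATE EXTENSION file_fdw;
  CREATE SERVER rawfiles FOREIGN DATA WRAPPER file_fdw;
  CREATE FOREIGN TABLE sensor_raw (ts timestamptz, reading numeric)
      SERVER rawfiles
      OPTIONS (filename '/data/sensor.csv', format 'csv');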

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Leonardo Francalanci <m_lists(at)yahoo(dot)it>, robertmhaas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unlogged tables, persistent kind
Date: 2011-05-03 20:54:55
Message-ID: 25638.1304456095@sss.pgh.pa.us
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> You're assuming that there are referential links *from* other tables
> to the table with damage. In which case you would be correct. But of
> course, if you needed that data for integrity you would never do that,
> so the problem is a nonexistent use case. The suggested mode is for
> Fact data, not reference tables.

So I suppose your notion of "fact data" includes no fields that are wide
enough to need toasting? Because as soon as you have any out-of-line
values, there's an equivalent of foreign keys behind the scenes, where
the user can't see it (until he gets "missing chunk number" or some such
error).
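It's easy to demonstrate; any sufficiently wide value will do ("facts"
is a made-up table):

  CREATE TABLE facts (id int, payload text);
  INSERT INTO facts VALUES (1, repeat('x', 100000));
  SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 'facts';

The out-of-line chunks live in that separate toast relation, and a heap
row whose chunks have been zeroed away fails in exactly that fashion.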

> The current assessment is that UNLOGGED tables are useful only for
> running a data cache. If the database crashes, then the table is
> truncated and you must refill the cache. If that is the case, then it
> must surely be better to have a cache that is already 99% full than
> one which starts at empty. There is no damage or loss just because
> parts of the cache are missing.

A cache that starts at 99% full of untrustworthy data is NOT better.

> Unlogged Tables are currently so volatile they are unusable for any
> other purpose. I want to see a table that is useful for low value
> data, such as sensor data.

Basically, you're being hopelessly optimistic both about the extent to
which a crash is likely to render data inconsistent, and our ability to
detect that inconsistency. It doesn't matter whether the data is "low
value", the difficulty of cleaning up remains the same. I don't want to
deal with trying to detect that, and I definitely don't want to dump the
problems onto users.

regards, tom lane


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unlogged tables, persistent kind
Date: 2011-05-03 21:16:55
Message-ID: BANLkTindyNZ79h7UTXJiwUJ+dDy-QD0AMg@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 3, 2011 at 4:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> You're assuming that there are referential links *from* other tables
>> to the table with damage. In which case you would be correct. But of
>> course, if you needed that data for integrity you would never do that,
>> so the problem is a nonexistent use case. The suggested mode is for
>> Fact data, not reference tables.
>
> So I suppose your notion of "fact data" includes no fields that are wide
> enough to need toasting?  Because as soon as you have any out-of-line
> values, there's an equivalent of foreign keys behind the scenes, where
> the user can't see it (until he gets "missing chunk number" or some such
> error).
>
>> The current assessment is that UNLOGGED tables are useful only for
>> running a data cache. If the database crashes, then the table is
>> truncated and you must refill the cache. If that is the case, then it
>> must surely be better to have a cache that is already 99% full than
>> one which starts at empty. There is no damage or loss just because
>> parts of the cache are missing.
>
> A cache that starts at 99% full of untrustworthy data is NOT better.

That's a bit pessimistic.

The case that bugs me is a cache that's 99% trustworthy, where I have
no idea:
a) that 1% of it is crud, and
b) which 1% of it is crud.
That is still a pretty unacceptable scenario.

I head back to our policy for handling caches:
If in doubt, TRUNCATE.

That policy would be nicely consistent with the way 9.1 deals with
unlogged tables.

>> Unlogged Tables are currently so volatile they are unusable for any
>> other purpose. I want to see a table that is useful for low value
>> data, such as sensor data.
>
> Basically, you're being hopelessly optimistic both about the extent to
> which a crash is likely to render data inconsistent, and our ability to
> detect that inconsistency.  It doesn't matter whether the data is "low
> value", the difficulty of cleaning up remains the same.  I don't want to
> deal with trying to detect that, and I definitely don't want to dump the
> problems onto users.

+1, on both grounds.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"