Re: Fwd: [GENERAL] 4B row limit for CLOB tables

Lists: pgsql-general pgsql-hackers
From: Roger Pack <rogerdpack2(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: 4B row limit for CLOB tables
Date: 2015-01-29 17:51:00
Message-ID: CAL1QdWeLLVN6bm_Se7Ms+G7jjh8YXnKqA0Pd+G4kqS8uHKWdew@mail.gmail.com
Lists: pgsql-general pgsql-hackers

Hello. I see on this page a mention of basically a 4B row limit for
tables that have BLOB's
https://wiki.postgresql.org/wiki/BinaryFilesInDB

Is this fact mentioned in the documentation anywhere? Is there an
official source for this? (If not, maybe consider this a feature
request to mention it in the documentation on BLOB).
Cheers and thanks.
-roger-


From: Roger Pack <rogerdpack2(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: 4B row limit for CLOB tables
Date: 2015-01-29 17:53:48
Message-ID: CAL1QdWfoG5sLKeaLLNpXDDmeE-ZwcWMSxXa_DBpd4_oyQRN1ZQ@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On 1/29/15, Roger Pack <rogerdpack2(at)gmail(dot)com> wrote:
> Hello. I see on this page a mention of basically a 4B row limit for
> tables that have BLOB's

Oops I meant for BYTEA or TEXT columns, but it's possible the
reasoning is the same...

> https://wiki.postgresql.org/wiki/BinaryFilesInDB
>
> Is this fact mentioned in the documentation anywhere? Is there an
> official source for this? (If not, maybe consider this a feature
> request to mention it in the documentation on BLOB).
> Cheers and thanks.
> -roger-
>


From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Roger Pack <rogerdpack2(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: 4B row limit for CLOB tables
Date: 2015-01-29 18:10:12
Message-ID: 54CA7784.8000808@aklaver.com
Lists: pgsql-general pgsql-hackers

On 01/29/2015 09:51 AM, Roger Pack wrote:
> Hello. I see on this page a mention of basically a 4B row limit for
> tables that have BLOB's
> https://wiki.postgresql.org/wiki/BinaryFilesInDB
>
> Is this fact mentioned in the documentation anywhere? Is there an
> official source for this? (If not, maybe consider this a feature
> request to mention it in the documentation on BLOB).

Take a look at:

http://www.postgresql.org/about/

Pretty sure it has to do with this:

Maximum Table Size 32 TB

> Cheers and thanks.
> -roger-
>
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com


From: Steve Atkins <steve(at)blighty(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: 4B row limit for CLOB tables
Date: 2015-01-29 18:41:58
Message-ID: C996CBCD-E825-4586-8A02-B6F21E486FBB@blighty.com
Lists: pgsql-general pgsql-hackers


On Jan 29, 2015, at 9:53 AM, Roger Pack <rogerdpack2(at)gmail(dot)com> wrote:

> On 1/29/15, Roger Pack <rogerdpack2(at)gmail(dot)com> wrote:
>> Hello. I see on this page a mention of basically a 4B row limit for
>> tables that have BLOB's
>
> Oops I meant for BYTEA or TEXT columns, but it's possible the
> reasoning is the same...

It only applies to large objects, not bytea or text.

>> https://wiki.postgresql.org/wiki/BinaryFilesInDB

Some of that looks incorrect or out of date. (e.g. large objects can be a lot
bigger than 2GB in 9.3+).

>>
>> Is this fact mentioned in the documentation anywhere? Is there an
>> official source for this? (If not, maybe consider this a feature
>> request to mention it in the documentation on BLOB).
>> Cheers and thanks.
>> -roger

I'm not sure whether it's mentioned explicitly, but large objects are
referenced by an OID, which is a 32 bit value (and a global resource).

If you had 4B BLOBs, though, running out of OIDs would probably be
the least of your worries.
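
For illustration, a minimal sketch (the OID value shown is only an
example of what the counter might hand back):

    -- every large object is keyed by a 32-bit OID from the shared counter
    SELECT lo_create(0);      -- 0 = let the server pick a free OID, e.g. 16405
    SELECT lo_create(16405);  -- asking for an OID already in pg_largeobject fails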

Cheers,
Steve


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: 4B row limit for CLOB tables
Date: 2015-01-29 19:24:50
Message-ID: 20150129142450.b83ce5d0652c39791292d0d5@potentialtech.com
Lists: pgsql-general pgsql-hackers

On Thu, 29 Jan 2015 10:41:58 -0800
Steve Atkins <steve(at)blighty(dot)com> wrote:
>
> >> Is this fact mentioned in the documentation anywhere? Is there an
> >> official source for this? (If not, maybe consider this a feature
> >> request to mention it in the documentation on BLOB).
> >> Cheers and thanks.
> >> -roger
>
> I'm not sure whether it's mentioned explicitly, but large objects are
> referenced by an OID, which is a 32 bit value (and a global resource).
>
> If you had 4B BLOBs, though, running out of OIDs would probably be
> the least of your worries.

Because of how other RDBMS systems use BLOB-ish types, I think a lot of
people get confused about when to use bytea vs. a large object in
PostgreSQL ... and as a result, end up using large objects more often
than is really necessary.

Large objects are for LARGE data ... keep in mind that a bytea column
can store up to 1GB of data. While that seems excessive, it's perfectly
reasonable to use it to store images and other data that's frequently
in the "several megabytes" range. In general, if you can transfer the
entirety of the data in a single shot, then bytea will work fine for
you.

Large objects are for something more like streaming media, where it's
impractical to store the entire file in memory, even for a short time.
I.e. you'd read ~100k from the DB into application memory, do
processing on that data, then discard it and read another 100k. While
large objects certainly fill a niche and for some uses are the only
way to make things work, I have never actually seen an implementation
where large objects were the right solution to the problem (of course,
I've never worked in the movie or music industry), but I've found
that bytea is usually the correct storage method for things like PDF
files, wordprocessor files, images, etc ... anything where the entire
file needs to be delivered before it can be used.
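
For instance, something along these lines (purely a sketch; the table
and column names are made up) covers that whole class of use:

    CREATE TABLE documents (
        id        bigserial PRIMARY KEY,
        filename  text NOT NULL,
        mime_type text NOT NULL,
        body      bytea NOT NULL   -- whole file, written and read in one shot
    );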

--
Bill Moran


From: Roger Pack <rogerdpack2(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: 4B row limit for CLOB tables
Date: 2015-01-29 21:06:13
Message-ID: CAL1QdWfb-p5kE9DT2pMqBxohaKG=vxmDREmSBjc+7TkbOeKBBw@mail.gmail.com
Lists: pgsql-general pgsql-hackers

Forgot to reply-all on this one; many thanks to Steve, Adrian, and Bill
for their answers.

On Jan 29, 2015, at 12:32 PM, Roger Pack <rogerdpack2(at)gmail(dot)com> wrote:

> On 1/29/15, Steve Atkins <steve(at)blighty(dot)com> wrote:
>>
>> On Jan 29, 2015, at 9:53 AM, Roger Pack <rogerdpack2(at)gmail(dot)com> wrote:
>>
>>> On 1/29/15, Roger Pack <rogerdpack2(at)gmail(dot)com> wrote:
>>>> Hello. I see on this page a mention of basically a 4B row limit for
>>>> tables that have BLOB's
>>>
>>> Oops I meant for BYTEA or TEXT columns, but it's possible the
>>> reasoning is the same...
>>
>> It only applies to large objects, not bytea or text.
>>
>>>> https://wiki.postgresql.org/wiki/BinaryFilesInDB
>>
>> Some of that looks incorrect or out of date. (e.g. large objects can be a
>> lot
>> bigger than 2GB in 9.3+).
>>
>>
>>>>
>>>> Is this fact mentioned in the documentation anywhere? Is there an
>>>> official source for this? (If not, maybe consider this a feature
>>>> request to mention it in the documentation on BLOB).
>>>> Cheers and thanks.
>>>> -roger
>>
>> I'm not sure whether it's mentioned explicitly, but large objects are
>> referenced by an OID, which is a 32 bit value (and a global resource).
>
> Thanks for the info, precisely what I was looking for.
> As a follow-up, could you elaborate on what you mean by "global
> resource"? I believe OIDs are generated from a global counter. Does
> this mean the maximum number of large objects in the database is 4B?

Well, OIDs are generated from a global counter, but there's nothing to
stop that wrapping around.

OIDs are used in a lot of places in the system - to identify tables,
and functions
and loaded modules and suchlike, and duplicates are prevented
by unique indexes or similar. But that means that if the OID counter were
to wrap around and return an OID that was already in use in the same
context then the attempt to use it would fail. For instance, you might
try to create a table, and it would fail because the "next" OID was already
used to specify another table.

Wrapping the OID counter around will cause all sorts of things to break.

Use of OIDs by the user (as opposed to by postgresql itself for internal
bookkeeping) has been deprecated for years.

That's one reason, but not the only reason, that I don't believe anyone
should ever use the postgresql large object infrastructure. For small
(megabytes rather than gigabytes) chunks of data that might be processed
in the database or might not, bytea or text types are the right thing. For
anything larger, or anything that's not actually processed within the database
(e.g. images or PDFs handled by a webapp) then leaving the file on the
filesystem and just storing metadata in the database is usually the right
thing.
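
i.e. something like this sketch (names made up), with the bytes
themselves living on the filesystem:

    CREATE TABLE document_files (
        id         bigserial PRIMARY KEY,
        rel_path   text NOT NULL,    -- where the file lives on disk
        sha256     text NOT NULL,    -- lets you notice a missing/altered file
        size_bytes bigint NOT NULL
    );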

> If you actually had that many BLOB's (and the counter wrapped) I
> assume that lo_create would start failing [i.e. it has some kind of
> uniqueness constraint on the oid]? Or something like that?

lo_create() would fail, but so would the rest of the database. Nothing
would work.

A billion large objects is too many. (Exactly a billion too many, in fact).

Cheers,
Steve


From: Roger Pack <rogerdpack2(at)gmail(dot)com>
To: Steve Atkins <steve(at)blighty(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: 4B row limit for CLOB tables
Date: 2015-01-29 21:58:14
Message-ID: CAL1QdWe-JnsD9CBVUffUoaQrUDT8a3MHLJn+La5um=-mW==ULw@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On 1/29/15, Steve Atkins <steve(at)blighty(dot)com> wrote:
>
> On Jan 29, 2015, at 9:53 AM, Roger Pack <rogerdpack2(at)gmail(dot)com> wrote:
>
>> On 1/29/15, Roger Pack <rogerdpack2(at)gmail(dot)com> wrote:
>>> Hello. I see on this page a mention of basically a 4B row limit for
>>> tables that have BLOB's
>>
>> Oops I meant for BYTEA or TEXT columns, but it's possible the
>> reasoning is the same...
>
> It only applies to large objects, not bytea or text.

OK, I think I figured out possibly why the wiki says this. I guess
BYTEA entries > 8KB will be autostored via TOAST, which uses an OID in
its backend. So BYTEA has the same limitation. It appears that
disabling TOAST is not an option [1]?
So I guess if the number of BYTEA entries (in the sum of all tables?
partitioning doesn't help?) with size > 8KB is > 4 billion then there
is actually no option there? (I'm not running into that case, just
conjecturing.)
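
One way to see this from SQL (just a sketch; the table name and the
pg_toast_* name are placeholders you'd substitute):

    SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 'my_table';
    -- then count how many out-of-line values that TOAST table already holds:
    SELECT count(DISTINCT chunk_id) FROM pg_toast.pg_toast_16384;
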
Thanks!
-roger-

[1] http://www.postgresql.org/message-id/20130405140348.GC4326@awork2.anarazel.de


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: steve(at)blighty(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: 4B row limit for CLOB tables
Date: 2015-01-29 23:17:03
Message-ID: 20150130.081703.86526503323626229.t-ishii@sraoss.co.jp
Lists: pgsql-general pgsql-hackers

> I'm not sure whether it's mentioned explicitly, but large objects are
> referenced by an OID, which is a 32 bit value (and a global resource).

Since 8.1, large objects are not necessarily referenced by OIDs from
the global counter. You can assign arbitrary 32-bit integers as long as
they are unique in the pg_largeobject table.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


From: Roger Pack <rogerdpack2(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-01-30 17:54:42
Message-ID: CAL1QdWc0Qo_J_zf5ZzBnLPhwZ7C_YwgxGZ0fKhN_TEc+z=fmqQ@mail.gmail.com
Lists: pgsql-general pgsql-hackers

>> On 1/29/15, Roger Pack <rogerdpack2(at)gmail(dot)com> wrote:
>>> Hello. I see on this page a mention of basically a 4B row limit for
>>> tables that have BLOB's
>>
>> Oops I meant for BYTEA or TEXT columns, but it's possible the
>> reasoning is the same...
>
> It only applies to large objects, not bytea or text.

OK, I think I figured out possibly why the wiki says this. I guess
BYTEA entries > 2KB will be autostored via TOAST, which uses an OID in
its backend. So BYTEA has the same limitation. It appears that
disabling TOAST is not an option [1].
So I guess if the number of BYTEA entries (in the sum of all tables?
partitioning doesn't help?) with size > 2KB is > 4 billion then there
is actually no option there? If this occurred it might cause "all
sorts of things to break"? [2]
Thanks!
-roger-

[1] http://www.postgresql.org/message-id/20130405140348.GC4326@awork2.anarazel.de
[2] http://www.postgresql.org/message-id/CAL1QdWfb-p5kE9DT2pMqBxohaKG=vxmDREmSBjc+7TkbOeKBBw@mail.gmail.com


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Roger Pack <rogerdpack2(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-01-30 23:25:59
Message-ID: 54CC1307.8010000@BlueTreble.com
Lists: pgsql-general pgsql-hackers

On 1/30/15 11:54 AM, Roger Pack wrote:
>>> On 1/29/15, Roger Pack <rogerdpack2(at)gmail(dot)com> wrote:
>>>> Hello. I see on this page a mention of basically a 4B row limit for
>>>> tables that have BLOB's
>>>
>>> Oops I meant for BYTEA or TEXT columns, but it's possible the
>>> reasoning is the same...
>>
>> It only applies to large objects, not bytea or text.
>
> OK I think I figured out possibly why the wiki says this. I guess
> BYTEA entries > 2KB will be autostored via TOAST, which uses an OID in
> its backend. So BYTEA has a same limitation. It appears that
> disabling TOAST is not an option [1].
> So I guess if the number of BYTEA entries (in the sum all tables?
> partitioning doesn't help?) with size > 2KB is > 4 billion then there
> is actually no option there? If this occurred it might cause "all
> sorts of things to break"? [2]

It's a bit more complex than that. First, toast isn't limited to bytea;
it holds for ALL varlena fields in a table that are allowed to store
externally. Second, the limit is actually per-table: every table gets
its own toast table, and each toast table is limited to 4B unique OIDs.
Third, the OID counter is actually global, but the code should handle
conflicts by trying to get another OID. See toast_save_datum(), which
calls GetNewOidWithIndex().

Now, the reality is that GetNewOidWithIndex() is going to keep
incrementing the global OID counter until it finds an OID that isn't in
the toast table. That means that if you actually get anywhere close to
using 4B OIDs you're going to become extremely unhappy with the
performance of toasting new data.

I don't think it would be horrifically hard to change the way toast OIDs
are assigned (I'm thinking we'd basically switch to creating a sequence
for every toast table), but I don't think anyone's ever tried to push
toast hard enough to hit this kind of limit.
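
As a sketch of that idea only (this is not how chunk IDs are assigned
today, and the names are made up): each toast table would draw from its
own counter, so churn elsewhere in the cluster never advances it.

    CREATE SEQUENCE pg_toast_16384_chunk_seq;
    SELECT nextval('pg_toast_16384_chunk_seq');   -- per-table chunk id source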
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Roger Pack <rogerdpack2(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-01-31 13:55:31
Message-ID: CAL1QdWdd8WC=i8TniGddFqGKZ3gY0Ngn-Cira4A6O7BQNcOD=g@mail.gmail.com
Lists: pgsql-general pgsql-hackers

Oops forgot to forward to the list (suggestion/feature request to the
list admin for the various pg lists: make the default "reply to" go to
the list, not the sender, if at all possible).

Response below:

On 1/30/15, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> On 1/30/15 11:54 AM, Roger Pack wrote:
>>>> On 1/29/15, Roger Pack <rogerdpack2(at)gmail(dot)com> wrote:
>>>>> Hello. I see on this page a mention of basically a 4B row limit for
>>>>> tables that have BLOB's
>>>>
>>>> Oops I meant for BYTEA or TEXT columns, but it's possible the
>>>> reasoning is the same...
>>>
>>> It only applies to large objects, not bytea or text.
>>
>> OK I think I figured out possibly why the wiki says this. I guess
>> BYTEA entries > 2KB will be autostored via TOAST, which uses an OID in
>> its backend. So BYTEA has a same limitation. It appears that
>> disabling TOAST is not an option [1].
>> So I guess if the number of BYTEA entries (in the sum all tables?
>> partitioning doesn't help?) with size > 2KB is > 4 billion then there
>> is actually no option there? If this occurred it might cause "all
>> sorts of things to break"? [2]
>
> It's a bit more complex than that. First, toast isn't limited to bytea;
> it holds for ALL varlena fields in a table that are allowed to store
> externally. Second, the limit is actually per-table: every table gets
> it's own toast table, and each toast table is limited to 4B unique OIDs.
> Third, the OID counter is actually global, but the code should handle
> conflicts by trying to get another OID. See toast_save_datum(), which
> calls GetNewOidWithIndex().
>
> Now, the reality is that GetNewOidWithIndex() is going to keep
> incrementing the global OID counter until it finds an OID that isn't in
> the toast table. That means that if you actually get anywhere close to
> using 4B OIDs you're going to become extremely unhappy with the
> performance of toasting new data.

OK so "system stability" doesn't degrade per se when it wraps [since
they all use that GetNewOid method or similar [?] good to know.

So basically when it gets near 4B TOAST'ed rows it may have to wrap that
counter and search for "unused" number, and for each number it's
querying the TOAST table to see if it's already used, degrading
performance.

So I guess partitioning tables for now is an acceptable work around,
good to know.
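
For us that would presumably be the usual 9.4-style inheritance setup,
roughly like this sketch (names made up), with each child table getting
its own TOAST table and therefore its own 4B ceiling:

    CREATE TABLE blobs (id bigint, created date, payload bytea);
    CREATE TABLE blobs_2015_02
        (CHECK (created >= DATE '2015-02-01' AND created < DATE '2015-03-01'))
        INHERITS (blobs);
    CREATE TABLE blobs_2015_03
        (CHECK (created >= DATE '2015-03-01' AND created < DATE '2015-04-01'))
        INHERITS (blobs);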

Thanks much for your response, good to know the details before we dive
into postgres with our 8B row table with BYTEA's in it :)


From: Álvaro Hernández Tortosa <aht(at)8Kdata(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-01 14:54:03
Message-ID: 54CE3E0B.9030606@8Kdata.com
Lists: pgsql-general pgsql-hackers


On 31/01/15 14:55, Roger Pack wrote:
> [...]
>
> Now, the reality is that GetNewOidWithIndex() is going to keep
> incrementing the global OID counter until it finds an OID that isn't in
> the toast table. That means that if you actually get anywhere close to
> using 4B OIDs you're going to become extremely unhappy with the
> performance of toasting new data.
> OK so "system stability" doesn't degrade per se when it wraps [since
> they all use that GetNewOid method or similar [?] good to know.
>
> So basically when it gets near 4B TOAST'ed rows it may have to wrap that
> counter and search for "unused" number, and for each number it's
> querying the TOAST table to see if it's already used, degrading
> performance.

The problem here is that performance degrades exponentially, or
worse. Speaking here from experience: we already tested this for a very
similar case (table creation, where two oids are consumed from a global
sequence when inserting into pg_class). Have a look at
http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013,
slides 43-45. We tested that scenario there and showed that table
creations per second dropped from 10K to a few per second and then to a
few per day. In the graphs you can't even tell that more tables were
being created. At around 8K tables short of the theoretical limit of 4B
oids consumed, the process basically stopped (doing more insertions).

Hope that this information helps.

Best regards,

Álvaro

--
Álvaro Hernández Tortosa

-----------
8Kdata


From: José Luis Tallón <jltallon(at)adv-solutions(dot)net>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: 'Pg Hackers' <pgsql-hackers(at)postgresql(dot)org>, 'Bruce Momjian' <bruce(at)momjian(dot)us>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-02 19:09:59
Message-ID: 54CFCB87.6000608@adv-solutions.net
Lists: pgsql-general pgsql-hackers

On 01/31/2015 12:25 AM, Jim Nasby wrote:
> [snip]
> It's a bit more complex than that. First, toast isn't limited to
> bytea; it holds for ALL varlena fields in a table that are allowed to
> store externally. Second, the limit is actually per-table: every table
> gets it's own toast table, and each toast table is limited to 4B
> unique OIDs. Third, the OID counter is actually global, but the code
> should handle conflicts by trying to get another OID. See
> toast_save_datum(), which calls GetNewOidWithIndex().
>
> Now, the reality is that GetNewOidWithIndex() is going to keep
> incrementing the global OID counter until it finds an OID that isn't
> in the toast table. That means that if you actually get anywhere close
> to using 4B OIDs you're going to become extremely unhappy with the
> performance of toasting new data.

Indeed ......

> I don't think it would be horrifically hard to change the way toast
> OIDs are assigned (I'm thinking we'd basically switch to creating a
> sequence for every toast table), but I don't think anyone's ever tried
> to push toast hard enough to hit this kind of limit.

We did. The Billion Table Project, part 2 (a.k.a. "when does Postgres'
OID allocator become a bottleneck").... The allocator becomes
essentially unusable at about 2.1B OIDs, whereas it performed very well at
"quite empty" (< 100M objects) levels.

So yes, using one sequence per TOAST table should help.
Combined with the new SequenceAMs / sequence implementation being
proposed (specifically: one file for all sequences in a certain
tablespace) this should scale much better.

My 2c.

Regards,

/ J.L.


From: Roger Pack <rogerdpack2(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-02 20:32:51
Message-ID: CAL1QdWdJMz-KBgx=zKgsxKxOh8zQ3qjj5Wd_EzLzyaTvbCEu5A@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On 1/30/15, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> On 1/30/15 11:54 AM, Roger Pack wrote:
>>>> On 1/29/15, Roger Pack <rogerdpack2(at)gmail(dot)com> wrote:
>>>>> Hello. I see on this page a mention of basically a 4B row limit for
>>>>> tables that have BLOB's
>>>>
>>>> Oops I meant for BYTEA or TEXT columns, but it's possible the
>>>> reasoning is the same...
>>>
>>> It only applies to large objects, not bytea or text.
>>
>> OK I think I figured out possibly why the wiki says this. I guess
>> BYTEA entries > 2KB will be autostored via TOAST, which uses an OID in
>> its backend. So BYTEA has a same limitation. It appears that
>> disabling TOAST is not an option [1].
>> So I guess if the number of BYTEA entries (in the sum all tables?
>> partitioning doesn't help?) with size > 2KB is > 4 billion then there
>> is actually no option there? If this occurred it might cause "all
>> sorts of things to break"? [2]
>
> It's a bit more complex than that. First, toast isn't limited to bytea;
> it holds for ALL varlena fields in a table that are allowed to store
> externally. Second, the limit is actually per-table: every table gets
> it's own toast table, and each toast table is limited to 4B unique OIDs.
> Third, the OID counter is actually global, but the code should handle
> conflicts by trying to get another OID. See toast_save_datum(), which
> calls GetNewOidWithIndex().
>
> Now, the reality is that GetNewOidWithIndex() is going to keep
> incrementing the global OID counter until it finds an OID that isn't in
> the toast table. That means that if you actually get anywhere close to
> using 4B OIDs you're going to become extremely unhappy with the
> performance of toasting new data.

OK so "system stability" doesn't degrade per se when it wraps, good to know.

So basically when it gets near 4B rows it may have to wrap that
counter multiple times, and for each "entry" it's searching if it's
already used, etc.

So I guess partitioning tables for now is an acceptable work around,
good to know.

Thanks much for your response, good to know the details before we dive
into postgres with our 8B row table with BYTEA's in it :)


From: Roger Pack <rogerdpack2(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-02 20:36:41
Message-ID: CAL1QdWd22s6B2xrzFPNJfjs1F0NTtsgObnawm3nwgqWBGqMc5g@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On 2/2/15, José Luis Tallón <jltallon(at)adv-solutions(dot)net> wrote:
> On 01/31/2015 12:25 AM, Jim Nasby wrote:
>> [snip]
>> It's a bit more complex than that. First, toast isn't limited to
>> bytea; it holds for ALL varlena fields in a table that are allowed to
>> store externally. Second, the limit is actually per-table: every table
>> gets it's own toast table, and each toast table is limited to 4B
>> unique OIDs. Third, the OID counter is actually global, but the code
>> should handle conflicts by trying to get another OID. See
>> toast_save_datum(), which calls GetNewOidWithIndex().
>>
>> Now, the reality is that GetNewOidWithIndex() is going to keep
>> incrementing the global OID counter until it finds an OID that isn't
>> in the toast table. That means that if you actually get anywhere close
>> to using 4B OIDs you're going to become extremely unhappy with the
>> performance of toasting new data.
>
> Indeed ......
>
>> I don't think it would be horrifically hard to change the way toast
>> OIDs are assigned (I'm thinking we'd basically switch to creating a
>> sequence for every toast table), but I don't think anyone's ever tried
>> to push toast hard enough to hit this kind of limit.
>
> We did. The Billion Table Project, part2 (a.k.a. "when does Postgres'
> OID allocator become a bottleneck").... The allocator becomes
> essentially unusable at about 2.1B OIDs, where it performed very well at
> "quite empty"(< 100M objects) levels.
>
> So yes, using one sequence per TOAST table should help.
> Combined with the new SequenceAMs / sequence implementation being
> proposed (specifically: one file for all sequences in a certain
> tablespace) this should scale much better.

But it wouldn't be perfect, right? I mean if you had multiple
deletions/insertions and passed 4B, then the "one sequence per TOAST
table" would still wrap [albeit more slowly], and performance would start
degrading the same way. And there would still be the hard 4B limit.
Perhaps the foreign key to the TOAST table could be changed from oid
(32 bits) to something else (64 bits) [as well as the sequence] so that
it never wraps? What do you think? And would a more aggressive change
like this have a chance of being accepted into the code base?
Thanks.
-roger-


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Roger Pack <rogerdpack2(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-02 21:03:55
Message-ID: 8314.1422911035@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

Roger Pack <rogerdpack2(at)gmail(dot)com> writes:
> On 2/2/15, José Luis Tallón <jltallon(at)adv-solutions(dot)net> wrote:
>> So yes, using one sequence per TOAST table should help.
>> Combined with the new SequenceAMs / sequence implementation being
>> proposed (specifically: one file for all sequences in a certain
>> tablespace) this should scale much better.

> But it wouldn't be perfect, right? I mean if you had multiple
> deletion/insertions and pass 4B then the "one sequence per TOAST
> table" would still wrap [albeit more slowly], and performance start
> degrading the same way. And there would still be the hard 4B limit.
> Perhaps the foreign key to the TOAST table could be changed from oid
> (32 bits) to something else (64 bits) [as well the sequence] so that
> it never wraps? What do you think? And would a more aggressive change
> like this have a chance of being accepted into the code base?

There has been some thought about this, but I have seen no, zero, reports
of anyone actually running into problems *in practice* (as opposed to
contrived cases like "can we create a billion tables"). So we probably
aren't going to want to address it until it starts being a real problem.

The reason it's not as significant as you might think is that small field
values (less than a couple KB *after compression*) don't get pushed out
to the TOAST table, so they don't consume OIDs. And large field values,
like megabytes worth, aren't a problem either because you just aren't
gonna have that many of them. (Simple arithmetic.) You could potentially
get into trouble if you had a whole lot of entries that were just a little
over the toasting threshold, because then you'd have a lot of OIDs
consumed but still a manageable total amount of disk space. But that
doesn't seem to be a very common usage pattern.
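
E.g., if the values averaged ~1 MB apiece, 4B of them would already be
about 4 PB of TOAST data:

    SELECT pg_size_pretty(4294967296::bigint * 1048576);   -- 2^32 values * 1 MB each = "4096 TB"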

Also, partitioning the table largely eliminates the problem because each
partition will have its own TOAST table. I'm on record as saying that
many people are far too quick to decide that they need partitioning; but
once you get into the volume of data where 4B toast entries starts to
look like a limitation, you will probably have other reasons to think
that you need to partition.

In short, this is something that's theoretically interesting but doesn't
seem worth doing in practice --- yet anyway.

regards, tom lane


From: José Luis Tallón <jltallon(at)adv-solutions(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-02 21:50:15
Message-ID: 54CFF117.9020206@adv-solutions.net
Lists: pgsql-general pgsql-hackers

On 02/02/2015 09:36 PM, Roger Pack wrote:
> On 2/2/15, José Luis Tallón <jltallon(at)adv-solutions(dot)net> wrote:
>> On 01/31/2015 12:25 AM, Jim Nasby wrote:
>>> [snip]
>>> It's a bit more complex than that. First, toast isn't limited to
>>> bytea; it holds for ALL varlena fields in a table that are allowed to
>>> store externally. Second, the limit is actually per-table: every table
>>> gets it's own toast table, and each toast table is limited to 4B
>>> unique OIDs. Third, the OID counter is actually global, but the code
>>> should handle conflicts by trying to get another OID. See
>>> toast_save_datum(), which calls GetNewOidWithIndex().
>>>
>>> Now, the reality is that GetNewOidWithIndex() is going to keep
>>> incrementing the global OID counter until it finds an OID that isn't
>>> in the toast table. That means that if you actually get anywhere close
>>> to using 4B OIDs you're going to become extremely unhappy with the
>>> performance of toasting new data.
>> Indeed ......
>>
>>> I don't think it would be horrifically hard to change the way toast
>>> OIDs are assigned (I'm thinking we'd basically switch to creating a
>>> sequence for every toast table), but I don't think anyone's ever tried
>>> to push toast hard enough to hit this kind of limit.
>> We did. The Billion Table Project, part2 (a.k.a. "when does Postgres'
>> OID allocator become a bottleneck").... The allocator becomes
>> essentially unusable at about 2.1B OIDs, where it performed very well at
>> "quite empty"(< 100M objects) levels.
>>
>> So yes, using one sequence per TOAST table should help.
>> Combined with the new SequenceAMs / sequence implementation being
>> proposed (specifically: one file for all sequences in a certain
>> tablespace) this should scale much better.
> But it wouldn't be perfect, right? I mean if you had multiple
> deletion/insertions and pass 4B then the "one sequence per TOAST
> table" would still wrap [albeit more slowly], and performance start
> degrading the same way. And there would still be the hard 4B limit.
> Perhaps the foreign key to the TOAST table could be changed from oid
> (32 bits) to something else (64 bits) [as well the sequence] so that
> it never wraps?

Hmm.... 2^32 times approx. 2kB (as per usual heuristics, ~4 rows per heap
page) is 8796093022208 (~9e13) bytes
... which results in 8192 1GB segments :O
Looks like partitioning might be needed much sooner than that (if only
for index efficiency reasons)... unless access is purely sequential.

The problem with changing the id from 32 to 64 bits is that the storage
*for everybody else* doubles, making the implementation slower for
most.... though this might actually not be that important.
The alternative could be some "long LOB" ("HugeOBject"?) using the
equivalent of "serial8" whereas regular LOBs would use "serial4".

Anybody actually reaching this limit out there?

Regards,

/ J .L.


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: José Luis Tallón <jltallon(at)adv-solutions(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-03 02:44:46
Message-ID: 54D0361E.7050609@BlueTreble.com
Lists: pgsql-general pgsql-hackers

On 2/2/15 3:50 PM, José Luis Tallón wrote:
> Hmm.... 2^32 times aprox. 2kB (as per usual heuristics, ~4 rows per heap
> page) is 8796093022208 (~9e13) bytes
> ... which results in 8192 1GB segments :O
> Looks like partitioning might be needed much sooner than that (if only
> for index efficiency reasons)... unless access is purely sequential.
>
> The problem with changing the id from 32 to 64 bits is that the storage
> *for everybody else* doubles, making the implementation slower for
> most.... though this might be actually not that important.
> The alternative could be some "long LOB" ("HugeOBject"?) using the
> equivalent to "serial8" whereas regular LOBs would use "serial4".

Well, it depends on how we did this. We could (for example) add a field
to pg_class that determines what type to use for toast pointers; OID,
int, or bigint. That could then be taken into account in the *toast*
functions.

But as others have pointed out, we haven't even had any real complaints
about toast using OIDs as being an issue until now, so I think it's
premature to start messing with this. At most it's just something to
keep in mind so we don't preclude doing this in the future.

BTW, regarding the size of what gets toasted; I've often thought it
would be useful to allow a custom size limit on columns so that you
could easily force data to be toasted if you knew you were very unlikely
to access it. Basically, a cheap form of vertical partitioning.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Matthew Kelly <mkelly(at)tripadvisor(dot)com>
To: José Luis Tallón <jltallon(at)adv-solutions(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-03 03:12:11
Message-ID: 5696A620-74A7-4C4D-937B-A4D5695CC9BE@tripadvisor.com
Lists: pgsql-general pgsql-hackers

> Hmm.... 2^32 times aprox. 2kB (as per usual heuristics, ~4 rows per heap page) is 8796093022208 (~9e13) bytes
> ... which results in 8192 1GB segments :O

8192 1GB segments is just 8TB; it's not _that_ large. At TripAdvisor we’ve been using a NoSQL solution to do session storage, and we are looking to probably swap that out for Postgres (every other db backing the site is Postgres). Essentially, what I’m building is a system with 1 logical table that maps a session id to a 2KB+ grab bag of ever-changing session attributes which is partially normalized, partially json.

315 million uniques a month multiplied by the retention policy means I need to hold 2-4 billion session objects (and somehow expire old ones). Additionally, most http calls can update the session, so between maintenance windows I expect to take around 20 billion 'upserts'. Obviously, I will have to shard and partition the table in practice, but this weekend I ran a test that demonstrated that a single table on a 9.4 server + logical replication + Dell 730xd can handle 4x that workload. Well, it can for 38 hours… until you wrap xids on the toast table. :P

I’ll be the first to admit that isn’t the normal use case though. I’m happy to have found this thread, however, because I’m going to have to build around the global oid counter, explicitly to prevent the problem I explain below regarding clustering.

> Anybody actually reaching this limit out there?

Well, it's not the 4 billion row limit that concerns me; it's the global shared counter in conjunction with pg_restore/clustering that is actually pretty concerning.

Just checked through all of TripAdvisor’s normal databases and the max tuples I see in a single toast table is 17,000,000, so that is still a couple of orders of magnitude too small (however, close enough that it’ll be a concern in a few years).
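
That check is nothing fancy; roughly, as a sketch (reltuples is only an
estimate and counts chunks rather than distinct values):

    SELECT relname, reltuples::bigint AS approx_chunks
    FROM pg_class
    WHERE relkind = 't'          -- TOAST tables only
    ORDER BY reltuples DESC
    LIMIT 10;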

However, I do have active databases where the current oid is between 1 billion and 2 billion. They were last dump-restored for a hardware upgrade a couple years ago and were a bit more than half the size. I therefore can imagine that I have tables which are keyed by ~8,000,000 consecutive oids.

I would argue that when it wraps there will be a single insert that will probably block for 2-5 minutes while it tries to accomplish ~8,000,000 index scans inside of GetNewOidWithIndex. Even partitioning doesn’t protect you from this potential problem.

What's even more weird is that this issue can be triggered by consuming too many oids in a different database in the same cluster (i.e. creating large amounts of temp tables).

> The problem with changing the id from 32 to 64 bits is that the storage *for everybody else* doubles, making the implementation slower for most.... though this might be actually not that important.

Well, you aren’t doubling the storage. Even if you have to store the key in 4 places, you are adding 16 bytes per TOAST tuple. If we work off the 2KB estimate for each TOAST tuple, then you are only increasing the storage by 0.7%. I’m sure there are more hidden costs but we are really only talking about a low single digit percent increase. In exchange, you get to drop one index scan per toast insert; an index scan looking in the only hot part of the index.

That being said I’d be perfectly happy merely giving each TOAST table its own sequence as that almost entirely mitigates the risk of an unexpected lock up on reasonably sized tables/partitions, and provides a functional workaround for those of us with larger-than-average installs.

- Matt K


From: José Luis Tallón <jltallon(at)adv-solutions(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-03 15:01:05
Message-ID: 54D0E2B1.3030900@adv-solutions.net
Lists: pgsql-general pgsql-hackers

On 02/03/2015 03:44 AM, Jim Nasby wrote:
> [snip]
>> The alternative could be some "long LOB" ("HugeOBject"?) using the
>> equivalent to "serial8" whereas regular LOBs would use "serial4".
>
> Well, it depends on how we did this. We could (for example) add a
> field to pg_class that determines what type to use for toast pointers;
> OID, int, or bigint. That could then be taken into account in the
> *toast* functions.
>
> But as others have pointed out, we haven't even had any real
> complaints about toast using OIDs as being an issue until now, so I
> think it's premature to start messing with this. At most it's just
> something to keep in mind so we don't preclude doing this in the future.

A patch creating those HOBs (Huge Objects) might well make sense *after*
the sequence refactoring gets merged.
Removing the bottleneck due to the OID allocator for this use case would
be definitely welcome
(I don't dare to code that just yet, but here's hoping someone will
step in O:-)

> BTW, regarding the size of what gets toasted; I've often thought it
> would be useful to allow a custom size limit on columns so that you
> could easily force data to be toasted if you knew you were very
> unlikely to access it. Basically, a cheap form of vertical partitioning.

Hmmm.... alter column set storage external / set storage extended ?

From http://www.postgresql.org/docs/9.4/static/sql-altertable.html :
ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL |
EXTENDED | MAIN }

This would do what you described, right?

HTH,

/ J.L.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthew Kelly <mkelly(at)tripadvisor(dot)com>
Cc: José Luis Tallón <jltallon(at)adv-solutions(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-03 15:01:07
Message-ID: 2275.1422975667@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

Matthew Kelly <mkelly(at)tripadvisor(dot)com> writes:
> However, I do have active databases where the current oid is between 1 billion and 2 billion. They were last dump-restored for a hardware upgrade a couple years ago and were a bit more than half the size. I therefore can imagine that I have tables which are keyed by ~8,000,000 consecutive oids.

> I would argue that when it wraps there will be a single insert that will probably block for 2-5 minutes while it tries to accomplish ~8,000,000 index scans inside of GetNewOidWithIndex. Even partitioning doesn't protect you from this potential problem.

That may be a hazard, but ...

> That being said I'd be perfectly happy merely giving each TOAST table its own sequence as that almost entire mitigates the risk of an unexpected lock up on reasonably sized tables/partitions, and provides a functional work around for those of us with larger than average installs.

... this "fix" would actually make things enormously worse. With the
single counter feeding all tables, you at least have a reasonable
probability that there are not enormously long runs of consecutive OIDs in
any one toast table. With a sequence per table, you are nearly guaranteed
that there are such runs, because inserts into other tables don't create a
break.

(This effect is also why you're wrong to claim that partitioning can't fix
it.)

regards, tom lane


From: David Steele <david(at)pgmasters(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-03 15:50:28
Message-ID: 54D0EE44.2030100@pgmasters.net
Lists: pgsql-general pgsql-hackers

On 2/3/15 10:01 AM, José Luis Tallón wrote:

> Hmmm.... alter column set storage external / set storage extended ?
>
> From http://www.postgresql.org/docs/9.4/static/sql-altertable.html :
> ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL |
> EXTENDED | MAIN }
>
> This would do what you described, right?

EXTENDED is the default for most TOAST-able types and is still subject
to TOAST_TUPLE_THRESHOLD which is normally 2K. EXTERNAL is the same but
with no compression.

See: http://www.postgresql.org/docs/9.4/static/storage-toast.html
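
So for the case above it is just something like (table/column names are
placeholders):

    ALTER TABLE documents ALTER COLUMN body SET STORAGE EXTERNAL;  -- allow out-of-line, no compression
    ALTER TABLE documents ALTER COLUMN body SET STORAGE EXTENDED;  -- allow compression + out-of-line (default)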

--
- David Steele
david(at)pgmasters(dot)net


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: David Steele <david(at)pgmasters(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-03 22:27:42
Message-ID: 54D14B5E.9000109@BlueTreble.com
Lists: pgsql-general pgsql-hackers

On 2/3/15 9:50 AM, David Steele wrote:
> On 2/3/15 10:01 AM, José Luis Tallón wrote:
>
>> Hmmm.... alter column set storage external / set storage extended ?
>>
>> From http://www.postgresql.org/docs/9.4/static/sql-altertable.html :
>> ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL |
>> EXTENDED | MAIN }
>>
>> This would do what you described, right?
>
> EXTENDED is the default for most TOAST-able types and is still subject
> to TOAST_TUPLE_THRESHOLD which is normally 2K. EXTERNAL is the same but
> with no compression.
>
> See: http://www.postgresql.org/docs/9.4/static/storage-toast.html

Right. I'd like to be able to set per-column TOAST_TUPLE_THRESHOLD.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Matthew Kelly <mkelly(at)tripadvisor(dot)com>
Cc: José Luis Tallón <jltallon(at)adv-solutions(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-03 22:31:29
Message-ID: 54D14C41.4050003@BlueTreble.com
Lists: pgsql-general pgsql-hackers

On 2/3/15 9:01 AM, Tom Lane wrote:
> Matthew Kelly <mkelly(at)tripadvisor(dot)com> writes:
>> However, I do have active databases where the current oid is between 1 billion and 2 billion. They were last dump-restored for a hardware upgrade a couple years ago and were a bit more than half the size. I therefore can imagine that I have tables which are keyed by ~8,000,000 consecutive oids.
>
>> I would argue that when it wraps there will be a single insert that will probably block for 2-5 minutes while it tries to accomplish ~8,000,000 index scans inside of GetNewOidWithIndex. Even partitioning doesn’t protect you from this potential problem.
>
> That may be a hazard, but ...
>
>> That being said I’d be perfectly happy merely giving each TOAST table its own sequence as that almost entire mitigates the risk of an unexpected lock up on reasonably sized tables/partitions, and provides a functional work around for those of us with larger than average installs.
>
> ... this "fix" would actually make things enormously worse. With the
> single counter feeding all tables, you at least have a reasonable
> probability that there are not enormously long runs of consecutive OIDs in
> any one toast table. With a sequence per table, you are nearly guaranteed
> that there are such runs, because inserts into other tables don't create a
> break.
>
> (This effect is also why you're wrong to claim that partitioning can't fix
> it.)

That's assuming that toasting is evenly spread between tables. In my
experience, that's not a great bet...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: David Steele <david(at)pgmasters(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-04 00:40:03
Message-ID: 54D16A63.7000902@pgmasters.net
Lists: pgsql-general pgsql-hackers

On 2/3/15 5:27 PM, Jim Nasby wrote:
> On 2/3/15 9:50 AM, David Steele wrote:
>> EXTENDED is the default for most TOAST-able types and is still subject
>> to TOAST_TUPLE_THRESHOLD which is normally 2K. EXTERNAL is the same but
>> with no compression.
>>
>> See: http://www.postgresql.org/docs/9.4/static/storage-toast.html
>
> Right. I'd like to be able to set per-column TOAST_TUPLE_THRESHOLD.

No argument there. There are some columns that I would prefer to always
TOAST because even 2K can be very big for some use cases.

--
- David Steele
david(at)pgmasters(dot)net


From: Matthew Kelly <mkelly(at)tripadvisor(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Luis Tallón <jltallon(at)adv-solutions(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] 4B row limit for CLOB tables
Date: 2015-02-05 06:53:11
Message-ID: A4F793D3-76D5-49EB-9121-BDB867955A11@tripadvisor.com
Lists: pgsql-general pgsql-hackers

> That's assuming that toasting is evenly spread between tables. In my experience, that's not a great bet...

Time to create a test:
SELECT chunk_id::bigint/100000 AS id_range,
       count(*),
       count(*)/(100000::float) AS density
FROM (SELECT chunk_id
      FROM pg_toast.pg_toast_39000165
      WHERE chunk_id < 100000000 AND chunk_seq = 0) f
GROUP BY id_range
ORDER BY id_range;

The machine in question was restored in parallel in Sept 2013 as part of an upgrade from 8.4. It has about 2000 tables, so it's definitely not dominated by a couple of tables. Progress towards oid wraparound is about 25.6%.

With minimal effort, I found 2 bad examples, and I’m sure I can easily find more. I attached the results for those two.

There were runs of 1,100,000+ and 600,000+ chunk_ids where more than 99% of the chunk_ids are taken. After restore completion, oid densities averaged less than 20 per 100,000 and 400 per 100,000 respectively. The only reason those runs seem to be so short is that the tables were much smaller back then. I expect that next time I dump-restore (necessary for upgrading OS versions due to the collation issue), I’m going to have runs closer to 20,000,000.

> ... this "fix" would actually make things enormously worse. With the
> single counter feeding all tables, you at least have a reasonable
> probability that there are not enormously long runs of consecutive OIDs in
> any one toast table. With a sequence per table, you are nearly guaranteed
> that there are such runs, because inserts into other tables don't create a
> break.

It makes each toast table independent (and far less likely to wrap). It would wrap when sum(mods on THIS toast table) > 2^32. Right now the condition looks like:

sum(mods on ALL toast tables in cluster) + sum(created normal tables in cluster * k) + sum(created temp tables in cluster * k) + [...] > 2^32,
where k is the average number of oids consumed for pg_class, pg_type, etc...

In the case of an insert-only table (which is a common use case for partitions), the id would only wrap when the TOAST table was “full”. On the other hand, currently it would wrap into its pg_restored section when the combined oid-consuming operations on the cluster surpassed 4 billion.

That being said, I’m certainly not attached to that solution. My real argument is that although it's not a problem today, we are only about 5 years from it being a problem for large installs, and the first time you’ll hear about it is after someone has a 5-minute production outage on a database that's been taking traffic for 2 years.

- Matt K.

Attachment Content-Type Size
oid_densities_1.txt text/plain 2.3 KB
oid_densities_2.txt text/plain 18.8 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Álvaro Hernández Tortosa <aht(at)8Kdata(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-04-23 16:11:57
Message-ID: 20150423161157.GF31856@momjian.us
Lists: pgsql-general pgsql-hackers

On Sun, Feb 1, 2015 at 03:54:03PM +0100, Álvaro Hernández Tortosa wrote:
>
> On 31/01/15 14:55, Roger Pack wrote:
> >[...]
> >
> >Now, the reality is that GetNewOidWithIndex() is going to keep
> >incrementing the global OID counter until it finds an OID that isn't in
> >the toast table. That means that if you actually get anywhere close to
> >using 4B OIDs you're going to become extremely unhappy with the
> >performance of toasting new data.
> >OK so "system stability" doesn't degrade per se when it wraps [since
> >they all use that GetNewOid method or similar [?] good to know.
> >
> >So basically when it gets near 4B TOAST'ed rows it may have to wrap that
> >counter and search for "unused" number, and for each number it's
> >querying the TOAST table to see if it's already used, degrading
> >performance.
>
>
> The problem here is that performance degrades exponentially, or
> worse. Speaking here from experience, we already tested this for a
> very similar case (table creation, where two oids are consumed from
> a global sequence when inserting to pg_class). Have a look at
> http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013,
> slides 43-45. We tested there this scenario and shown that table
> creations per second dropped from 10K to a few per second and then
> to a few per day. In the graphs you can't even realize there were
> more tables been created. At around 8K tables from the theoretical
> limit of 4B oids consumed, the process basically stopped (doing more
> insertions).

I had a look at our FAQ about Postgres limitations and I don't see
anything that needs changing:

https://wiki.postgresql.org/wiki/FAQ#What_is_the_maximum_size_for_a_row.2C_a_table.2C_and_a_database.3F

Maximum size for a database? unlimited (32 TB databases exist)
Maximum size for a table? 32 TB
Maximum size for a row? 400 GB
Maximum size for a field? 1 GB
Maximum number of rows in a table? unlimited
Maximum number of columns in a table? 250-1600 depending on column types
Maximum number of indexes on a table? unlimited

We don't report the maximum number of tables per database, or the
maximum number of TOAST values. Agreed?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Álvaro Hernández Tortosa <aht(at)8Kdata(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-04-24 03:04:59
Message-ID: 20150424030459.GS30322@tamriel.snowman.net
Lists: pgsql-general pgsql-hackers

* Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> On Sun, Feb 1, 2015 at 03:54:03PM +0100, Álvaro Hernández Tortosa wrote:
> > The problem here is that performance degrades exponentially, or
> > worse. Speaking here from experience, we already tested this for a
> > very similar case (table creation, where two oids are consumed from
> > a global sequence when inserting to pg_class). Have a look at
> > http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013,
> > slides 43-45. We tested there this scenario and shown that table
> > creations per second dropped from 10K to a few per second and then
> > to a few per day. In the graphs you can't even realize there were
> > more tables been created. At around 8K tables from the theoretical
> > limit of 4B oids consumed, the process basically stopped (doing more
> > insertions).
>
> I had a look at our FAQ about Postgres limitations and I don't see
> anything that needs changing:
>
> https://wiki.postgresql.org/wiki/FAQ#What_is_the_maximum_size_for_a_row.2C_a_table.2C_and_a_database.3F
>
> Maximum size for a database? unlimited (32 TB databases exist)
> Maximum size for a table? 32 TB
> Maximum size for a row? 400 GB
> Maximum size for a field? 1 GB
> Maximum number of rows in a table? unlimited
> Maximum number of columns in a table? 250-1600 depending on column types
> Maximum number of indexes on a table? unlimited
>
> We don't report the maximum number of tables per database, or the
> maximum number of TOAST values. Agreed?

For my 2c, this limitation is a surprise to users and therefore we
should add documentation to point out that it exists, unless we're going
to actually fix it (which is certainly what I'd prefer to see...).

As for the other discussion on the thread, having a per-table sequence
would be far better as it'd reduce the wrap-around risk quite a bit and,
further, we should be able to provide that without breaking the on-disk
format. What would be really nice is a way to expand the size of the
key when needed: in other words, instead of wrapping around, if we
actually hit 4B toasted values in a table then stick a flag somewhere
for the next toasted value that says "this value is in the second toast
table/fork" and then go up to 4B on that one, etc. That allows us to
support more than 4B toasted values and doesn't require searching for
holes in the values assigned.

If we end up with empty toast tables eventually, then allow reusing
them. Perhaps vacuum can even be used to make a note somewhere saying
"this toast table is now empty and can be reused".

In the end, I'd like to think we can do better here than having a hard
limit at 4B when it comes to how many values over a few KB we can store.
As mentioned, that isn't all that much these days. I'm not saying that
my proposal or what's been proposed upthread is an answer, but I've
certainly build PG systems which store over 4B rows and it's not hard to
imagine cases where I might have wanted a toasted value for each of
those rows.

Thanks,

Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Álvaro Hernández Tortosa <aht(at)8Kdata(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-04-24 03:24:46
Message-ID: 1362.1429845886@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
>> On Sun, Feb 1, 2015 at 03:54:03PM +0100, Álvaro Hernández Tortosa wrote:
>>> The problem here is that performance degrades exponentially, or
>>> worse. Speaking here from experience, we already tested this for a
>>> very similar case (table creation, where two oids are consumed from
>>> a global sequence when inserting to pg_class). Have a look at
>>> http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013,
>>> slides 43-45. We tested there this scenario and shown that table
>>> creations per second dropped from 10K to a few per second and then
>>> to a few per day. In the graphs you can't even realize there were
>>> more tables been created. At around 8K tables from the theoretical
>>> limit of 4B oids consumed, the process basically stopped (doing more
>>> insertions).

>> We don't report the maximum number of tables per database, or the
>> maximum number of TOAST values. Agreed?

> For my 2c, this limitation is a surprise to users and therefore we
> should add documentation to point out that it exists, unless we're going
> to actually fix it (which is certainly what I'd prefer to see...).

TBH, I've got very little enthusiasm for fixing this given the number
of reports of trouble from the field, which so far as I recall is zero.
Álvaro's case came up through intentionally trying to create an
unreasonable number of tables, not from real usage. This thread likewise
appears to contain lots of speculation and no reports of anyone hitting
a problem in practice.

Certainly this is likely to become an issue at some point in the future,
but I'm not finding it very compelling to worry about now. By the time
it does become an issue, we may have additional considerations or use
cases that should inform a solution; which seems to me to be a good
argument not to try to fix it in advance of real problems. Perhaps,
for example, we'd find that at the same time we ought to relax the 1GB
limit on individual-value size; or perhaps not.

Having said all that, if we did try to fix it today, I'd imagine changing
TOAST value identifiers to int64 and inventing a new TOAST pointer format
for use when 32 bits isn't wide enough for the ID. But I think we're best
advised to hold off doing that until the need becomes pressing.
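
As a rough illustration only (this struct does not exist in the tree; it
just sketches the idea under the assumptions above), such a pointer
might look something like:

    /* hypothetical wider TOAST pointer, selected by a new va_tag value */
    typedef struct varatt_external_wide
    {
        int32    va_rawsize;    /* original data size, as today */
        int32    va_extsize;    /* external saved size, as today */
        uint64   va_valueid;    /* 64-bit value ID instead of a 32-bit Oid */
        Oid      va_toastrelid; /* TOAST table containing the value */
    } varatt_external_wide;

Existing 18-byte pointers would keep their current tag, so previously
written data would remain readable.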

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Álvaro Hernández <aht(at)8kdata(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-04-24 12:11:41
Message-ID: CA+TgmobOxuFjkD7_g3R3PGDvmLw6Cmwm921M=5jLPALMa124ew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Thu, Apr 23, 2015 at 11:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Having said all that, if we did try to fix it today, I'd imagine changing
> TOAST value identifiers to int64 and inventing a new TOAST pointer format
> for use when 32 bits isn't wide enough for the ID. But I think we're best
> advised to hold off doing that until the need becomes pressing.

Just out of curiosity, has anyone thought about inventing a new TOAST
pointer format on the grounds that our TOAST pointers are unreasonably
large? IIUC, a TOAST pointer right now is 18 bytes: 16 for a
varatt_external, and then that gets embedded in a varattrib_1b_e with
a va_header byte and a va_tag byte. Eliminating one or both of
va_rawsize and va_extsize from the TOAST pointer itself seems like it
could save quite a bit of space on disk. Maybe you could even find a
way to get rid of va_toastrelid; after all, at the point when you
first acquire a pointer to the tuple, you surely know what relation
it's a part of. You'd probably want to force de-TOASTing (or
converting to a more expressive form of TOAST pointer, anyway) when
you extracted the column from the tuple, which might be hard to
arrange.
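
For reference, this is roughly what postgres.h declares today
(paraphrased; the 16-byte varatt_external payload wrapped in the 2-byte
varattrib_1b_e header is where the 18 bytes come from):

    typedef struct varatt_external
    {
        int32   va_rawsize;     /* original data size (includes header) */
        int32   va_extsize;     /* external saved size (doesn't) */
        Oid     va_valueid;     /* unique ID of value within TOAST table */
        Oid     va_toastrelid;  /* RelFileNode of TOAST table containing it */
    } varatt_external;

    typedef struct
    {
        uint8   va_header;      /* always 0x80 or 0x01 */
        uint8   va_tag;         /* type of external datum, e.g. VARTAG_ONDISK */
        char    va_data[FLEXIBLE_ARRAY_MEMBER];  /* here: a varatt_external */
    } varattrib_1b_e;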

But the benefits could be pretty significant. Suppose you have a
table where each tuple is 4K untoasted, with all but 100 bytes of that
in a single column. So, as stored, you've got 100 bytes of regular
stuff plus an 18-byte TOAST header. If you could trim 2 of the
above-mentioned 4-byte fields out of the TOAST header, that would
reduce the size of the main relation fork by almost 7%. If you could
trim all 3 of them out, you'd save more than 10%. That's not nothing,
and the benefits could be even larger for rows that contain multiple
TOAST pointers.
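
(Spelling out that arithmetic, ignoring the per-tuple header and
alignment padding: 100 + 18 = 118 bytes per row as stored; dropping two
of the 4-byte fields gives 110 bytes, about 6.8% smaller, and dropping
all three gives 106 bytes, about 10.2% smaller.)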

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Álvaro Hernández <aht(at)8kdata(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-04-24 13:23:17
Message-ID: 20150424132317.GW30322@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Thu, Apr 23, 2015 at 11:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Having said all that, if we did try to fix it today, I'd imagine changing
> > TOAST value identifiers to int64 and inventing a new TOAST pointer format
> > for use when 32 bits isn't wide enough for the ID. But I think we're best
> > advised to hold off doing that until the need becomes pressing.
>
> Just out of curiosity, has anyone thought about inventing a new TOAST
> pointer format on the grounds that our TOAST pointers are unreasonably
> large?

I'd not thought about it, but sure sounds like a good idea from here.

It would be particularly great if we could do this while also
increasing the number of supported TOAST value identifiers, avoiding
the need to go hunting for unused identifiers after the counter wraps.

Thanks!

Stephen


From: Álvaro Hernández Tortosa <aht(at)8Kdata(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-04-25 00:11:43
Message-ID: 553ADBBF.10100@8Kdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


On 24/04/15 05:24, Tom Lane wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
>> * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
>>> On Sun, Feb 1, 2015 at 03:54:03PM +0100, Álvaro Hernández Tortosa wrote:
>>>> The problem here is that performance degrades exponentially, or
>>>> worse. Speaking here from experience, we already tested this for a
>>>> very similar case (table creation, where two oids are consumed from
>>>> a global sequence when inserting to pg_class). Have a look at
>>>> http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013,
>>>> slides 43-45. We tested there this scenario and shown that table
>>>> creations per second dropped from 10K to a few per second and then
>>>> to a few per day. In the graphs you can't even realize there were
>>>> more tables been created. At around 8K tables from the theoretical
>>>> limit of 4B oids consumed, the process basically stopped (doing more
>>>> insertions).
>>> We don't report the maximum number of tables per database, or the
>>> maximum number of TOAST values. Agreed?
>> For my 2c, this limitation is a surprise to users and therefore we
>> should add documentation to point out that it exists, unless we're going
>> to actually fix it (which is certainly what I'd prefer to see...).
> TBH, I've got very little enthusiasm for fixing this given the number
> of reports of trouble from the field, which so far as I recall is zero.
> Álvaro's case came up through intentionally trying to create an
> unreasonable number of tables, not from real usage. This thread likewise
> appears to contain lots of speculation and no reports of anyone hitting
> a problem in practice.

It is certainly true that this was a very synthetic case. I
envision, however, certain use cases where we may hit a very large
number of tables:

- Massive multitenancy
- Aggressive partitioning
- Massive multitenancy with aggressive partitioning
- Tables generated dynamically by software, like those created by ToroDB
(https://github.com/torodb/torodb). In ToroDB we generate tables
based only on the input data, so we may end up with as many tables as
the data source requires. For example, a "general purpose" json
datastore may generate several tables per inserted document.

>
> Certainly this is likely to become an issue at some point in the future,
> but I'm not finding it very compelling to worry about now. By the time
> it does become an issue, we may have additional considerations or use
> cases that should inform a solution; which seems to me to be a good
> argument not to try to fix it in advance of real problems. Perhaps,

I understand this argument, and it makes sense. On the other hand,
given the long time it may take to go from patch to commit, and then
from a released version to companies actually running it in production,
I'd rather try to fix it soon, as there are already reports and use
cases that may hit it, rather than wait three years until it explodes
in our faces. After all, 640KB of RAM is enough for anyone, right? So
maybe 2B tables is not that far over the horizon. Who knows.

Regards,

Álvaro

--
Álvaro Hernández Tortosa

-----------
8Kdata


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Álvaro Hernández Tortosa <aht(at)8Kdata(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-04-25 04:39:04
Message-ID: 553B1A68.1000900@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 4/24/15 7:11 PM, Álvaro Hernández Tortosa wrote:
> On 24/04/15 05:24, Tom Lane wrote:
...
>> TBH, I've got very little enthusiasm for fixing this given the number
>> of reports of trouble from the field, which so far as I recall is zero.
>> Álvaro's case came up through intentionally trying to create an
>> unreasonable number of tables, not from real usage. This thread likewise
>> appears to contain lots of speculation and no reports of anyone hitting
>> a problem in practice.
>
> It is certainly true that this was a very synthetic case. I
> envision, however, certain use cases where we may hit a very large
> number of tables:

The original case has NOTHING to do with the number of tables and
everything to do with the number of toasted values a table can have. If
you have to toast 4B attributes in a single relation it will fail. In
reality, if you get anywhere close to that things will fall apart due to
OID conflicts.

This case isn't nearly as insane as 4B tables. A table storing 10 text
fields each of which is 2K would hit this limit with only 400M rows. If
my math is right that's only 8TB; certainly not anything insane
space-wise or rowcount-wise.

Perhaps it's still not worth fixing, but I think it's definitely worth
documenting.
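
If you want to see how close an existing table already is, a rough check
is below (the table and toast-table names are just placeholders, and the
count(DISTINCT ...) scan can be slow on a large toast table):

    -- find the TOAST table backing a given table
    SELECT reltoastrelid::regclass
    FROM pg_class
    WHERE oid = 'public.mytable'::regclass;

    -- count how many distinct TOAST value IDs are currently in use;
    -- each out-of-line value has one chunk_id, split across chunk_seq rows
    SELECT count(DISTINCT chunk_id) FROM pg_toast.pg_toast_16385;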
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Álvaro Hernández Tortosa <aht(at)8Kdata(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-04-25 18:19:10
Message-ID: 20150425181910.GB17791@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Fri, Apr 24, 2015 at 11:39:04PM -0500, Jim Nasby wrote:
> On 4/24/15 7:11 PM, Álvaro Hernández Tortosa wrote:
> >On 24/04/15 05:24, Tom Lane wrote:
> ...
> >>TBH, I've got very little enthusiasm for fixing this given the number
> >>of reports of trouble from the field, which so far as I recall is zero.
> >>Álvaro's case came up through intentionally trying to create an
> >>unreasonable number of tables, not from real usage. This thread likewise
> >>appears to contain lots of speculation and no reports of anyone hitting
> >>a problem in practice.
> >
> > It is certainly true that this was a very synthetic case. I
> >envision, however, certain use cases where we may hit a very large
> >number of tables:
>
> The original case has NOTHING to do with the number of tables and
> everything to do with the number of toasted values a table can have.
> If you have to toast 4B attributes in a single relation it will
> fail. In reality, if you get anywhere close to that things will fall
> apart due to OID conflicts.
>
> This case isn't nearly as insane as 4B tables. A table storing 10
> text fields each of which is 2K would hit this limit with only 400M
> rows. If my math is right that's only 8TB; certainly not anything
> insane space-wise or rowcount-wise.
>
> Perhaps it's still not worth fixing, but I think it's definitely worth
> documenting.

And it is now documented in the Postgres FAQ thanks to 'Rogerdpack',
which is where that "maximum" table came from:

https://wiki.postgresql.org/wiki/FAQ#What_is_the_maximum_size_for_a_row.2C_a_table.2C_and_a_database.3F

Note if you are storing a table with rows that exceed 2KB in size
(aggregate size of each row) then the "Maximum number of rows in a
table" may be limited to 4 Billion, see TOAST.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Álvaro Hernández Tortosa <aht(at)8Kdata(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-04-26 14:44:26
Message-ID: 553CF9CA.7070004@8Kdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


On 25/04/15 06:39, Jim Nasby wrote:
> On 4/24/15 7:11 PM, Álvaro Hernández Tortosa wrote:
>> On 24/04/15 05:24, Tom Lane wrote:
> ...
>>> TBH, I've got very little enthusiasm for fixing this given the number
>>> of reports of trouble from the field, which so far as I recall is zero.
>>> Álvaro's case came up through intentionally trying to create an
>>> unreasonable number of tables, not from real usage. This thread
>>> likewise
>>> appears to contain lots of speculation and no reports of anyone hitting
>>> a problem in practice.
>>
>> It is certainly true that this was a very synthetic case. I
>> envision, however, certain use cases where we may hit a very large
>> number of tables:
>
> The original case has NOTHING to do with the number of tables and
> everything to do with the number of toasted values a table can have.
> If you have to toast 4B attributes in a single relation it will fail.
> In reality, if you get anywhere close to that things will fall apart
> due to OID conflicts.
>
> This case isn't nearly as insane as 4B tables. A table storing 10 text
> fields each of which is 2K would hit this limit with only 400M rows.
> If my math is right that's only 8TB; certainly not anything insane
> space-wise or rowcount-wise.
>
> Perhaps it's still not worth fixing, but I think it's definitely worth
> documenting.

They are definitely different problems, but with similar symptoms: an
oid counter wrapping around, or even before that, the cost of hunting
for an unused oid. If fixed, we should probably look at both at the
same time.

It's worth documenting, but also, as I said, maybe worth fixing, so
that if three years from now these problems really show up, the
solution is already in production (rather than still being at the
patch stage).

Regards,

Álvaro

--
Álvaro Hernández Tortosa

-----------
8Kdata


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Álvaro Hernández Tortosa <aht(at)8Kdata(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-04-26 18:16:31
Message-ID: 20150426181630.GD30322@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

* Álvaro Hernández Tortosa (aht(at)8Kdata(dot)com) wrote:
> It's worth documenting, but also, as I said, maybe worth fixing, so
> that if three years from now these problems really show up, the
> solution is already in production (rather than still being at the
> patch stage).

With the proliferation of JSON usage in PG thanks to jsonb, I'd count us
lucky if we don't get complaints about this in the next three years.

I don't expect to have time to work on it in the near future,
unfortunately, but Robert's thoughts on supporting a new TOAST pointer
structure (with a way to support what's currently there, to avoid an
on-disk break) seem like a good starting point to me.

Thanks!

Stephen


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Álvaro Hernández Tortosa <aht(at)8Kdata(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-04-27 06:49:44
Message-ID: 553DDC08.1020309@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 4/25/15 1:19 PM, Bruce Momjian wrote:
> Note if you are storing a table with rows that exceed 2KB in size
> (aggregate size of each row) then the "Maximum number of rows in a
> table" may be limited to 4 Billion, see TOAST.

That's not accurate though; you could be limited to far less than 4B
rows. If each row has 10 fields that toast, you'd be limited to just
400M rows.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Roger Pack <rogerdpack2(at)gmail(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Álvaro Hernández Tortosa <aht(at)8kdata(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-04-27 13:07:06
Message-ID: CAL1QdWfwDGPScY7uBnfVpjzbgb1xzvAHEGbVXRFhj-hZgi0ugQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 4/27/15, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> On 4/25/15 1:19 PM, Bruce Momjian wrote:
>> Note if you are storing a table with rows that exceed 2KB in size
>> (aggregate size of each row) then the "Maximum number of rows in a
>> table" may be limited to 4 Billion, see TOAST.
>
> That's not accurate though; you could be limited to far less than 4B
> rows. If each row has 10 fields that toast, you'd be limited to just
> 400M rows.

Good point. I noted that on the TOAST wiki page now, at least (and
also mentioned that using partitioning is a "workaround" for now).
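
The partitioning workaround helps because TOAST value IDs only need to
be unique within a single TOAST table, and each child table gets its own
TOAST table. A minimal sketch with today's inheritance-based
partitioning (table names and constraints made up for illustration):

    CREATE TABLE docs (id bigint, body text);

    -- each child has its own pg_toast table, so each child gets its own
    -- 4B value-ID space (the IDs still come from the shared cluster-wide
    -- OID counter, but uniqueness is only checked per TOAST table)
    CREATE TABLE docs_even (CHECK (id % 2 = 0)) INHERITS (docs);
    CREATE TABLE docs_odd  (CHECK (id % 2 = 1)) INHERITS (docs);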


From: José Luis Tallón <jltallon(at)adv-solutions(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, 'Bruce Momjian' <bruce(at)momjian(dot)us>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-04-28 10:41:26
Message-ID: 553F63D6.3060507@adv-solutions.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 04/27/2015 08:49 AM, Jim Nasby wrote:
> On 4/25/15 1:19 PM, Bruce Momjian wrote:
>> Note if you are storing a table with rows that exceed 2KB in size
>> (aggregate size of each row) then the "Maximum number of rows in a
>> table" may be limited to 4 Billion, see TOAST.
>
> That's not accurate though; you could be limited to far less than 4B
> rows. If each row has 10 fields that toast, you'd be limited to just
> 400M rows.

ISTM like the solution is almost here, and could be done without too
much (additional) work:
* We have already discussed having a page-per-sequence with the new
SeqAMs being introduced and how that would improve scalability.
* We have commented on having a sequence per TOAST table
(hence, 4B toasted values per table each up to 4B chunks in size...
vs just 4B toasted values per cluster)

I'm not sure that I can do it all by myself just yet, but I sure
can try if there is interest.
(just after I'm done with another patch that is independent from
this, though)

This would be material for 9.6, of course :)

Thanks,

J.L.


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: José Luis Tallón <jltallon(at)adv-solutions(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, 'Bruce Momjian' <bruce(at)momjian(dot)us>
Subject: Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Date: 2015-04-28 17:48:33
Message-ID: 553FC7F1.8070401@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 4/28/15 5:41 AM, José Luis Tallón wrote:
> On 04/27/2015 08:49 AM, Jim Nasby wrote:
>> On 4/25/15 1:19 PM, Bruce Momjian wrote:
>>> Note if you are storing a table with rows that exceed 2KB in size
>>> (aggregate size of each row) then the "Maximum number of rows in a
>>> table" may be limited to 4 Billion, see TOAST.
>>
>> That's not accurate though; you could be limited to far less than 4B
>> rows. If each row has 10 fields that toast, you'd be limited to just
>> 400M rows.
>
> ISTM like the solution is almost here, and could be done without too
> much (additional) work:
> * We have already discussed having a page-per-sequence with the new
> SeqAMs being introduced and how that would improve scalability.
> * We have commented on having a sequence per TOAST table
> (hence, 4B toasted values per table each up to 4B chunks in size...
> vs just 4B toasted values per cluster)
>
> I'm not sure that I can do it all by myself just yet, but I sure
> can try if there is interest.

I don't think it would be hard at all to switch toast pointers to being
sequence generated instead of OIDs. The only potential downside I see is
the extra space required for all the sequences... but that would only
matter on the tiniest of clusters (think embedded), which probably don't
have that many tables to begin with.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com