Re: unlogged tables vs. GIST

Lists: pgsql-hackers
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: unlogged tables vs. GIST
Date: 2010-12-14 21:06:40
Message-ID: AANLkTikdks1RfnjaX__H9X=EzjAgKR_v-OJH0N8L78DT@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Nov 13, 2010 at 9:09 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> The fact that it's easy doesn't make it workable.  I would point out for
>> starters that AMs might (do) put WAL locations and/or XIDs into indexes.
>> Occasionally copying very old LSNs or XIDs back into active files seems
>> pretty dangerous.
>
> I haven't examined the GIST, GIN, or hash index code in detail so I am
> not sure whether there are any hazards there; the btree case does not
> seem to have any issues of this type.  Certainly, if an index AM puts
> an XID into an empty index, that's gonna break.  I would consider that
> a pretty odd thing to do, though.  An LSN seems less problematic since
> the LSN space does not wrap; it should just look like an index that
> was created a long time ago and never updated (which, in effect, it
> is).

I'm still not convinced there's any hazard of this type, but there is,
apparently, a problem with failing to emit XLOG records for GIST
indexes, because they apparently use LSNs to detect concurrent page
splits (see Heikki's commit on November 16th, aka
2edc5cd493ce3d7834026970e9d3cd00e203f51a) and the hack he inserted to
work around that problem for temporary tables isn't going to work for
unlogged tables. I suppose we could disallow unlogged GIST indexes.
Or we could allow them but still XLOG some operations anyway to make
sure that the LSN advances at the appropriate time. That seems pretty
ugly, though. Any other ideas?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-14 21:24:25
Message-ID: 4D07E089.5040903@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 14.12.2010 23:06, Robert Haas wrote:
> On Sat, Nov 13, 2010 at 9:09 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>>> The fact that it's easy doesn't make it workable. I would point out for
>>> starters that AMs might (do) put WAL locations and/or XIDs into indexes.
>>> Occasionally copying very old LSNs or XIDs back into active files seems
>>> pretty dangerous.
>>
>> I haven't examined the GIST, GIN, or hash index code in detail so I am
>> not sure whether there are any hazards there; the btree case does not
>> seem to have any issues of this type. Certainly, if an index AM puts
>> an XID into an empty index, that's gonna break. I would consider that
>> a pretty odd thing to do, though. An LSN seems less problematic since
>> the LSN space does not wrap; it should just look like an index that
>> was created a long time ago and never updated (which, in effect, it
>> is).
>
> I'm still not convinced there's any hazard of this type, but there is,
> apparently, a problem with failing to emit XLOG records for GIST
> indexes, because they apparently use LSNs to detect concurrent page
> splits (see Heikki's commit on November 16th, aka
> 2edc5cd493ce3d7834026970e9d3cd00e203f51a) and the hack he inserted to
> work around that problem for temporary tables isn't going to work for
> unlogged tables. I suppose we could disallow unlogged GIST indexes.
> Or we could allow them but still XLOG some operations anyway to make
> sure that the LSN advances at the appropriate time. That seems pretty
> ugly, though. Any other ideas?

Hmm, the first idea that comes to mind is to use a counter like the
GetXLogRecPtrForTemp() counter I used for temp tables, but global, in
shared memory. However, that's a bit problematic because if we store a
value from that counter to LSN, it's possible that the counter overtakes
the XLOG insert location, and you start to get xlog flush errors. We
could avoid that if we added a new field to the GiST page header, and
used that to store the value in the parent page instead of the LSN.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-14 21:29:12
Message-ID: AANLkTin36PBKpTU2s+A-O0whdSgOdsA7ius3YPuXoG63@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 14, 2010 at 4:24 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Hmm, the first idea that comes to mind is to use a counter like the
> GetXLogRecPtrForTemp() counter I used for temp tables, but global, in shared
> memory. However, that's a bit problematic because if we store a value from
> that counter to LSN, it's possible that the counter overtakes the XLOG
> insert location, and you start to get xlog flush errors. We could avoid that
> if we added a new field to the GiST page header, and used that to store the
> value in the parent page instead of the LSN.

That doesn't seem ideal, either, because now you're eating up some
number of bytes per page in every GIST index just on the off chance
that one of them is unlogged. Unless there's a way to do it only for
unlogged GIST indexes, but it seems like that could be messy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-14 21:55:05
Message-ID: 19949.1292363705@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Dec 14, 2010 at 4:24 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Hmm, the first idea that comes to mind is to use a counter like the
>> GetXLogRecPtrForTemp() counter I used for temp tables, but global, in shared
>> memory. However, that's a bit problematic because if we store a value from
>> that counter to LSN, it's possible that the counter overtakes the XLOG
>> insert location, and you start to get xlog flush errors. We could avoid that
>> if we added a new field to the GiST page header, and used that to store the
>> value in the parent page instead of the LSN.

> That doesn't seem ideal, either, because now you're eating up some
> number of bytes per page in every GIST index just on the off chance
> that one of them is unlogged.

On-disk compatibility seems problematic here as well.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-14 22:14:28
Message-ID: AANLkTing6fW66mdmwu3CKLEDOKZ+WQ-wNsiPFoXNwV==@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 14, 2010 at 4:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Dec 14, 2010 at 4:24 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> Hmm, the first idea that comes to mind is to use a counter like the
>>> GetXLogRecPtrForTemp() counter I used for temp tables, but global, in shared
>>> memory. However, that's a bit problematic because if we store a value from
>>> that counter to LSN, it's possible that the counter overtakes the XLOG
>>> insert location, and you start to get xlog flush errors. We could avoid that
>>> if we added a new field to the GiST page header, and used that to store the
>>> value in the parent page instead of the LSN.
>
>> That doesn't seem ideal, either, because now you're eating up some
>> number of bytes per page in every GIST index just on the off chance
>> that one of them is unlogged.
>
> On-disk compatibility seems problematic here as well.

Good point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 18:53:03
Message-ID: AANLkTikbBRtPB8eO8NDbn5H+cRdqr5N6Zj-K=yukSSXd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 14, 2010 at 5:14 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Dec 14, 2010 at 4:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> On Tue, Dec 14, 2010 at 4:24 PM, Heikki Linnakangas
>>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>> Hmm, the first idea that comes to mind is to use a counter like the
>>>> GetXLogRecPtrForTemp() counter I used for temp tables, but global, in shared
>>>> memory. However, that's a bit problematic because if we store a value from
>>>> that counter to LSN, it's possible that the counter overtakes the XLOG
>>>> insert location, and you start to get xlog flush errors. We could avoid that
>>>> if we added a new field to the GiST page header, and used that to store the
>>>> value in the parent page instead of the LSN.
>>
>>> That doesn't seem ideal, either, because now you're eating up some
>>> number of bytes per page in every GIST index just on the off chance
>>> that one of them is unlogged.
>>
>> On-disk compatibility seems problematic here as well.
>
> Good point.

Given the foregoing discussion, I see only two possible paths forward here.

1. Just decide that that unlogged tables can't have GIST indexes, at
least until someone figures out a way to make it work. That's sort of
an annoying limitation, but I think we could live with it.

2. Write WAL records even though the GIST index is supposedly
unlogged. We could either (1) write the usual XLOG_GIST_* records,
and arrange to ignore them on replay when the relation in question is
unlogged, or (2) write an XLOG_NOOP record to advance the current LSN.
The latter sounds safer to me, but it will mean that the XLOG code
for GIST needs three separate cases (temp, perm, unlogged). Either
way we give up a significant chunk of the benefit of making the
relation unlogged in the first place.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andy Colson <andy(at)squeakycode(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 18:59:32
Message-ID: 4D0BB314.9050807@squeakycode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> Given the foregoing discussion, I see only two possible paths forward here.
>
> 1. Just decide that that unlogged tables can't have GIST indexes, at
> least until someone figures out a way to make it work. That's sort of
> an annoying limitation, but I think we could live with it.
>

+1

In the small subset of situations that need unlogged tables, I would
think the subset of those that need gist is exceedingly small.

Unless someone can come up with a use case that needs both unlogged and
gist, I'd vote not to spend time on it. (And, if ever someone does come
along with a really good use, then time can be put toward it).

-Andy


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 19:07:22
Message-ID: 3753.1292612842@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Given the foregoing discussion, I see only two possible paths forward here.

> 1. Just decide that that unlogged tables can't have GIST indexes, at
> least until someone figures out a way to make it work. That's sort of
> an annoying limitation, but I think we could live with it.

> 2. Write WAL records even though the GIST index is supposedly
> unlogged. We could either (1) write the usual XLOG_GIST_* records,
> and arrange to ignore them on replay when the relation in question is
> unlogged, or (2) write an XLOG_NOOP record to advance the current LSN.
> The latter sounds safer to me, but it will mean that the XLOG code
> for GIST needs three separate cases (temp, perm, unlogged). Either
> way we give up a significant chunk of the benefit of making the
> relation unlogged in the first place.

> Thoughts?

IIUC, the problem is that the bufmgr might think that a GIST NSN is an
LSN that should affect when to force out a dirty buffer? What if we
taught it the difference? We could for example dedicate a pd_flags
bit to marking pages whose pd_lsn isn't actually an LSN.

This solution would probably imply that all pages in the shared buffer
pool have to have a standard PageHeaderData header, not just an LSN at
the front as is assumed now. But that doesn't seem like a bad thing to
me, unless maybe we were dumb enough to not use a standard page header
in some of the secondary forks.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 19:19:11
Message-ID: 4D0BB7AF.4030503@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17.12.2010 21:07, Tom Lane wrote:
> IIUC, the problem is that the bufmgr might think that a GIST NSN is an
> LSN that should affect when to force out a dirty buffer? What if we
> taught it the difference? We could for example dedicate a pd_flags
> bit to marking pages whose pd_lsn isn't actually an LSN.
>
> This solution would probably imply that all pages in the shared buffer
> pool have to have a standard PageHeaderData header, not just an LSN at
> the front as is assumed now. But that doesn't seem like a bad thing to
> me, unless maybe we were dumb enough to not use a standard page header
> in some of the secondary forks.

I'm not very fond of expanding buffer manager's knowledge of the page
layout. How about a new flag in the buffer desc, BM_UNLOGGED? There was
some talk about skipping flushing of unlogged tables at checkpoints, I
think we'd need BM_UNLOGGED for that anyway. Or I guess we could hang
that behavior on the pd_flags bit too, but it doesn't seem like the
right place for that information.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 19:22:35
Message-ID: 4034.1292613755@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 17.12.2010 21:07, Tom Lane wrote:
>> IIUC, the problem is that the bufmgr might think that a GIST NSN is an
>> LSN that should affect when to force out a dirty buffer? What if we
>> taught it the difference? We could for example dedicate a pd_flags
>> bit to marking pages whose pd_lsn isn't actually an LSN.

> I'm not very fond of expanding buffer manager's knowledge of the page
> layout. How about a new flag in the buffer desc, BM_UNLOGGED?

That could work too, if you can explain how the flag comes to be set
without a bunch of ugliness all over the system. I don't want callers
of ReadBuffer to have to supply the bit for example.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 19:25:24
Message-ID: AANLkTi=w+Qpw40-8eFewtthPsGmt5gCNXOYePpnd-Sse@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 17, 2010 at 2:07 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> IIUC, the problem is that the bufmgr might think that a GIST NSN is an
> LSN that should affect when to force out a dirty buffer?  What if we
> taught it the difference?  We could for example dedicate a pd_flags
> bit to marking pages whose pd_lsn isn't actually an LSN.
>
> This solution would probably imply that all pages in the shared buffer
> pool have to have a standard PageHeaderData header, not just an LSN at
> the front as is assumed now.  But that doesn't seem like a bad thing to
> me, unless maybe we were dumb enough to not use a standard page header
> in some of the secondary forks.

Ah, interesting idea. visibilitymap.c has this:

#define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData))

and fsm_internals.h has this:

#define NodesPerPage (BLCKSZ - MAXALIGN(SizeOfPageHeaderData) - \
offsetof(FSMPageData, fp_nodes))

...which seems to imply, at least on a quick look, that we might be
OK, as far as the extra forks go.

I have a vague recollection that the documentation says somewhere that
a full page header on every page isn't required. And I'm a little
worried about my ability to track down every place in the system that
might get broken by a change in this area. What do you think about
committing the unlogged tables patch initially without support for
GIST indexes, and working on fixing this as a separate commit?

Another possibly-useful thing about mandating a full page header for
every page is that it might give us a way of avoiding unnecessary full
page writes. As I wrote previously:

> However, I think we can avoid this too, by
> allocating an additional bit in pd_flags, PD_FPI. Instead of emitting
> an FPI when the old LSN precedes the redo pointer, we'll emit an FPI
> when the FPI bit is set (in which case we'll also clear the bit) OR
> when the old LSN precedes the redo pointer. Upon emitting a WAL
> record that is torn-page safe (such as a freeze or all-visible
> record), we'll pass a flag to XLogInsert that arranges to suppress
> FPIs, bump the LSN, and set PD_FPI. That way, if the page is touched
> again before the next checkpoint by an operation that does NOT
> suppress FPI, one will be emitted then.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 19:31:41
Message-ID: 4267.1292614301@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Another possibly-useful thing about mandating a full page header for
> every page is that it might give us a way of avoiding unnecessary full
> page writes. As I wrote previously:

Could we do that via a bufmgr status bit, instead? Heikki's idea has
the merit that it actually reduces bufmgr's knowledge of page headers,
rather than increasing it (since a buffer marked UNLOGGED would need
no assumptions at all about its content).

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 19:32:20
Message-ID: AANLkTinTHZU9nx0xvezEYMTg+Wb34if2DqODuR1-8qg+@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 17, 2010 at 2:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> On 17.12.2010 21:07, Tom Lane wrote:
>>> IIUC, the problem is that the bufmgr might think that a GIST NSN is an
>>> LSN that should affect when to force out a dirty buffer?  What if we
>>> taught it the difference?  We could for example dedicate a pd_flags
>>> bit to marking pages whose pd_lsn isn't actually an LSN.
>
>> I'm not very fond of expanding buffer manager's knowledge of the page
>> layout. How about a new flag in the buffer desc, BM_UNLOGGED?
>
> That could work too, if you can explain how the flag comes to be set
> without a bunch of ugliness all over the system.  I don't want callers
> of ReadBuffer to have to supply the bit for example.

That's easy enough to solve - ReadBuffer() takes a Relation as an
argument, so you can easily check RelationNeedsWAL(reln).

I guess the question is whether it's right to conflate "table is
unlogged" with "LSN is fake". It's not immediately obvious to me that
those concepts are isomorphic, although though the reverse isn't
obvious to me either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 19:33:28
Message-ID: AANLkTikN5D=3RGzS_YSm3k2_5VchRzR2vCquuTE4zbUS@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 17, 2010 at 2:31 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Another possibly-useful thing about mandating a full page header for
>> every page is that it might give us a way of avoiding unnecessary full
>> page writes.  As I wrote previously:
>
> Could we do that via a bufmgr status bit, instead?  Heikki's idea has
> the merit that it actually reduces bufmgr's knowledge of page headers,
> rather than increasing it (since a buffer marked UNLOGGED would need
> no assumptions at all about its content).

That was my first thought, but it doesn't work. The buffer could be
evicted from shared_buffers and read back in. If a checkpoint
intervenes meanwhile, we're OK, but otherwise you fail to emit an
otherwise-needed FPI.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 19:48:58
Message-ID: 4D0BBEAA.7010008@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17.12.2010 21:32, Robert Haas wrote:
> I guess the question is whether it's right to conflate "table is
> unlogged" with "LSN is fake". It's not immediately obvious to me that
> those concepts are isomorphic, although though the reverse isn't
> obvious to me either.

The buffer manager only needs to know if it has to flush the WAL before
writing the page to disk. The flag just means that the buffer manager
never needs to do that for this buffer. You're still free to store a
real LSN there if you want to, it just won't cause any WAL flushes.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 20:03:34
Message-ID: 4788.1292616214@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 17.12.2010 21:32, Robert Haas wrote:
>> I guess the question is whether it's right to conflate "table is
>> unlogged" with "LSN is fake". It's not immediately obvious to me that
>> those concepts are isomorphic, although though the reverse isn't
>> obvious to me either.

> The buffer manager only needs to know if it has to flush the WAL before
> writing the page to disk. The flag just means that the buffer manager
> never needs to do that for this buffer. You're still free to store a
> real LSN there if you want to, it just won't cause any WAL flushes.

Yeah. I think that BM_UNLOGGED might be a poor choice for the flag name,
just because it overstates what the bufmgr needs to assume. It might be
better to reverse the flag sense, and have a new flag that *is* set if
the page contains an LSN that we have to check against WAL.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 20:08:07
Message-ID: AANLkTinVy3RL7zChfMX=LNf7-EjOLNhCSWoM5qWRtEor@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 17, 2010 at 3:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> On 17.12.2010 21:32, Robert Haas wrote:
>>> I guess the question is whether it's right to conflate "table is
>>> unlogged" with "LSN is fake".  It's not immediately obvious to me that
>>> those concepts are isomorphic, although though the reverse isn't
>>> obvious to me either.
>
>> The buffer manager only needs to know if it has to flush the WAL before
>> writing the page to disk. The flag just means that the buffer manager
>> never needs to do that for this buffer. You're still free to store a
>> real LSN there if you want to, it just won't cause any WAL flushes.
>
> Yeah.  I think that BM_UNLOGGED might be a poor choice for the flag name,
> just because it overstates what the bufmgr needs to assume.  It might be
> better to reverse the flag sense, and have a new flag that *is* set if
> the page contains an LSN that we have to check against WAL.

I was actually thinking of adding BM_UNLOGGED even before this
discussion, because that would allow unlogged buffers to be excluded
from non-shutdown checkpoints. We could add two flags with different
semantics that take on, under present rules, the same value, but I'd
be disinclined to burn the extra bit without a concrete need.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 20:15:16
Message-ID: 4982.1292616916@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Dec 17, 2010 at 3:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah. I think that BM_UNLOGGED might be a poor choice for the flag name,
>> just because it overstates what the bufmgr needs to assume.

> I was actually thinking of adding BM_UNLOGGED even before this
> discussion, because that would allow unlogged buffers to be excluded
> from non-shutdown checkpoints. We could add two flags with different
> semantics that take on, under present rules, the same value, but I'd
> be disinclined to burn the extra bit without a concrete need.

bufmgr is currently using eight bits out of a 16-bit flag field, and
IIRC at least five of those have been there since the beginning. So our
accretion rate is something like one bit every four years. I think not
being willing to use two bits to describe two unrelated behaviors is
penny-wise and pound-foolish --- bufmgr is already complicated enough,
let's not add useless barriers to readability.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 20:50:52
Message-ID: AANLkTinsRUmYEih5eHJZiTD2_-aV7M_0QOWWwxo8YRAE@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 17, 2010 at 3:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Dec 17, 2010 at 3:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Yeah.  I think that BM_UNLOGGED might be a poor choice for the flag name,
>>> just because it overstates what the bufmgr needs to assume.
>
>> I was actually thinking of adding BM_UNLOGGED even before this
>> discussion, because that would allow unlogged buffers to be excluded
>> from non-shutdown checkpoints.  We could add two flags with different
>> semantics that take on, under present rules, the same value, but I'd
>> be disinclined to burn the extra bit without a concrete need.
>
> bufmgr is currently using eight bits out of a 16-bit flag field, and
> IIRC at least five of those have been there since the beginning.  So our
> accretion rate is something like one bit every four years.  I think not
> being willing to use two bits to describe two unrelated behaviors is
> penny-wise and pound-foolish --- bufmgr is already complicated enough,
> let's not add useless barriers to readability.

Allright, what do you want to call the other bit, then? BM_SKIP_XLOG_FLUSH?

I have a feeling we may also be creating BM_UNTIDY rather soon, per
previous discussion of hint bit I/O.

Since these bits will only be set/cleared when the buffer mapping is
changed, can we examine this bit without taking the spinlock? If not,
we're going to have to stick an extra spinlock acquire/release into
FlushBuffer(), which sounds rather unappealing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 21:15:20
Message-ID: 6074.1292620520@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Allright, what do you want to call the other bit, then? BM_SKIP_XLOG_FLUSH?

Like I said, I'd be tempted to invert the sense, so that the flag is set
for normal relations. Then it becomes something like BM_FLUSH_XLOG.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-17 21:17:56
Message-ID: 6126.1292620676@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[ hit send too soon ... ]

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Since these bits will only be set/cleared when the buffer mapping is
> changed, can we examine this bit without taking the spinlock?

Only if you're willing for the result to be unreliable. In the
case of the xlog flush bit, I don't believe an extra locking cycle
should be necessary anyway, as you surely had the lock when you
found the page to be dirty in the first place. You could grab the
bit then. I'm not sure where you envision checking the other bit,
but I doubt it can be all that far removed from a lock acquisition.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2010-12-18 01:50:54
Message-ID: AANLkTikuCukaVdQK2xLH0ViuFF3qzCOb4=V3JjQAZDdQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 17, 2010 at 4:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Since these bits will only be set/cleared when the buffer mapping is
>> changed, can we examine this bit without taking the spinlock?
>
> Only if you're willing for the result to be unreliable.

If we read the bits while someone else is writing them, we'll get
either the old or the new value, but the BM_FLUSH_XLOG bit will be the
same either way. When FlushBuffer() is called, a shared content log
and a pin are held, so the buffer can't be renamed under us. If
that's really unacceptable, we can do something like the attached, but
I think this is unnecessarily gross. We already assume in other
places that the read or write of an integer is atomic. In this case
we don't even need it to be fully atomic - we just need the particular
byte that contains this bit not to go through some intermediate state
where the bit is unset. Is there really a memory architecture out
there that's crazy enough to let such a thing happen? If so, I'd
expect things to be breaking right and left on that machine anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
bm-flush-xlog-paranoid.patch application/octet-stream 7.7 KB

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-15 06:54:28
Message-ID: CAM2+6=UTts3rAM4PoH1F475M9nsuquasnN+Z_rDzX6fmfYXx=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Robert / Tom

I think to support GiST with unlogged table we need to do three things:

1. Teach the buffer manager that the LSN of a page not marked
BM_PERMANENT can be ignored

2. Teach GetXLogRecPtrForTemp() to allocate fake LSNs for GiST buffers
using a 64-bit, counter that is persisted across clean shutdowns

3. Remove the error checks that prevent creating an unlogged GiST
index and update the documentation accordingly

PFA, patch which try to do above things.

For (2), I have added a new function called, GetXLogRecPtrForUnloogedRel()
which returns a fake LSN for GiST indexes. However, I have removed
GetXLogRecPtrForTemp() function and added its functionality inside this new
function itself to avoid complexity.

With this patch I am able to create GiST for unlogged tables.
It works well with my examples.
Also make check has no issues.

Please have a look and let me know your views.

Thanks

On Sat, Dec 18, 2010 at 7:20 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Dec 17, 2010 at 4:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Since these bits will only be set/cleared when the buffer mapping is
> >> changed, can we examine this bit without taking the spinlock?
> >
> > Only if you're willing for the result to be unreliable.
>
> If we read the bits while someone else is writing them, we'll get
> either the old or the new value, but the BM_FLUSH_XLOG bit will be the
> same either way. When FlushBuffer() is called, a shared content log
> and a pin are held, so the buffer can't be renamed under us. If
> that's really unacceptable, we can do something like the attached, but
> I think this is unnecessarily gross. We already assume in other
> places that the read or write of an integer is atomic. In this case
> we don't even need it to be fully atomic - we just need the particular
> byte that contains this bit not to go through some intermediate state
> where the bit is unset. Is there really a memory architecture out
> there that's crazy enough to let such a thing happen? If so, I'd
> expect things to be breaking right and left on that machine anyway.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are
not the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.

Attachment Content-Type Size
support_unlogged_gist_indexes.patch application/octet-stream 8.9 KB

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-15 18:10:05
Message-ID: 50F59B7D.1010109@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.01.2013 08:54, Jeevan Chalke wrote:
> For (2), I have added a new function called, GetXLogRecPtrForUnloogedRel()
> which returns a fake LSN for GiST indexes. However, I have removed
> GetXLogRecPtrForTemp() function and added its functionality inside this new
> function itself to avoid complexity.

I don't much care for using a new field in the control file for this.
First, it seems like a big modularity violation to store a gist-specific
counter in the control file. Second, you'd be generating a lot of
traffic on the ControlFileLock. It's not heavily contended at the
moment, but when the control file is updated, it's held over an fsync,
which could cause unnecessary stalls to insertions to unlogged gist
tables. And it's just a bad idea to share a lock for two things with
completely different characteristics in general.

Could we stash the counter e.g. in the root page of the index?

- Heikki


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-15 18:33:17
Message-ID: CA+TgmoaKxYxLQaj21F3MVzAMfdm8G+JVabvJta6OVt2WnDyVRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 15, 2013 at 1:10 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 15.01.2013 08:54, Jeevan Chalke wrote:
>>
>> For (2), I have added a new function called, GetXLogRecPtrForUnloogedRel()
>> which returns a fake LSN for GiST indexes. However, I have removed
>> GetXLogRecPtrForTemp() function and added its functionality inside this
>> new
>> function itself to avoid complexity.
>
>
> I don't much care for using a new field in the control file for this. First,
> it seems like a big modularity violation to store a gist-specific counter in
> the control file. Second, you'd be generating a lot of traffic on the
> ControlFileLock. It's not heavily contended at the moment, but when the
> control file is updated, it's held over an fsync, which could cause
> unnecessary stalls to insertions to unlogged gist tables. And it's just a
> bad idea to share a lock for two things with completely different
> characteristics in general.
>
> Could we stash the counter e.g. in the root page of the index?

That would require maintaining a counter per table rather than a
single global counter, which would be bad because then we'd need to
store one counter in shared memory for every table, rather than just
one, period, which runs up against the fixed sizing of shared memory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-15 18:44:37
Message-ID: 50F5A395.2070205@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.01.2013 20:33, Robert Haas wrote:
> On Tue, Jan 15, 2013 at 1:10 PM, Heikki Linnakangas
>> Could we stash the counter e.g. in the root page of the index?
>
> That would require maintaining a counter per table rather than a
> single global counter, which would be bad because then we'd need to
> store one counter in shared memory for every table, rather than just
> one, period, which runs up against the fixed sizing of shared memory.

I was thinking of just adding a new field to the root page header, and
use that field as the counter. Something like:

XLogRecPtr
GetXLogRecPtrForTemp(void)
{
rootbuf = ReadBuffer(rel, GIST_ROOT_BLKNO);
opaq = GistPageGetOpaque(BufferGetPage(rootbuf));

LockBuffer(rootbuf, GIST_EXCLUSIVE);
nsn = opaq->counter++
UnlockReleaseBuffer(rootbuf)
return nsn;
}

or perhaps we need to use locking mechanism for that, like just a new
global lwlock or spinlock, to avoid deadlocks if someone is just
splitting the root page. In any case, the fixed-sizedness of shared
memory isn't an issue here.

- Heikki


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-15 18:48:09
Message-ID: 29865.1358275689@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Jan 15, 2013 at 1:10 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> Could we stash the counter e.g. in the root page of the index?

> That would require maintaining a counter per table rather than a
> single global counter, which would be bad because then we'd need to
> store one counter in shared memory for every table, rather than just
> one, period, which runs up against the fixed sizing of shared memory.

I think what Heikki had in mind was that the copy in the index would be
the authoritative one, not some image in shared memory. This'd imply
dirtying the root page on every insert, as well as increased contention
for the root page, so it might have performance problems.

I think a bigger issue is where we'd find any space for it. There's no
easily-spare space in a GIST page. This reminds me again that the lack
of a metapage in GIST was a serious design error, which we should
correct if we ever break on-disk compatibility again.

I concur that adding such a counter to pg_control is a nonstarter,
though.

Given that we don't need crash recovery for an unlogged table, could
we get away with some variant of NSN that has weaker semantics than
XLOG LSNs?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-15 18:53:15
Message-ID: CA+TgmoYPiSeY5Vgq0C6X0ctBXK1SX3yUb-nbFtqOVQxcBaHoOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 15, 2013 at 1:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Jan 15, 2013 at 1:10 PM, Heikki Linnakangas
>> <hlinnakangas(at)vmware(dot)com> wrote:
>>> Could we stash the counter e.g. in the root page of the index?
>
>> That would require maintaining a counter per table rather than a
>> single global counter, which would be bad because then we'd need to
>> store one counter in shared memory for every table, rather than just
>> one, period, which runs up against the fixed sizing of shared memory.
>
> I think what Heikki had in mind was that the copy in the index would be
> the authoritative one, not some image in shared memory. This'd imply
> dirtying the root page on every insert, as well as increased contention
> for the root page, so it might have performance problems.
>
> I think a bigger issue is where we'd find any space for it. There's no
> easily-spare space in a GIST page. This reminds me again that the lack
> of a metapage in GIST was a serious design error, which we should
> correct if we ever break on-disk compatibility again.
>
> I concur that adding such a counter to pg_control is a nonstarter,
> though.
>
> Given that we don't need crash recovery for an unlogged table, could
> we get away with some variant of NSN that has weaker semantics than
> XLOG LSNs?

It needs to be strictly ascending and survive clean shutdowns. Is
there some place we could preserve it other than the control file?

I was assuming we wanted a single sequence shared across all relations
rather than a sequence per relation, but I don't know of any reason
why that's actually required.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-15 18:58:00
Message-ID: 50F5A6B8.8080105@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.01.2013 20:48, Tom Lane wrote:
> Robert Haas<robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Jan 15, 2013 at 1:10 PM, Heikki Linnakangas
>> <hlinnakangas(at)vmware(dot)com> wrote:
>>> Could we stash the counter e.g. in the root page of the index?
>
>> That would require maintaining a counter per table rather than a
>> single global counter, which would be bad because then we'd need to
>> store one counter in shared memory for every table, rather than just
>> one, period, which runs up against the fixed sizing of shared memory.
>
> I think what Heikki had in mind was that the copy in the index would be
> the authoritative one, not some image in shared memory. This'd imply
> dirtying the root page on every insert, as well as increased contention
> for the root page, so it might have performance problems.

Not every insert, just every split. Which might still be a performance
problem, but an order of magnitude smaller.

> I think a bigger issue is where we'd find any space for it. There's no
> easily-spare space in a GIST page.

We could use a larger opaque struct, with the additional field, for the
root page, for new indexes. As long as we continue to support the
current layout too, that won't break on-disk compatibility. We didn't
support unlogged gist indexes before, so we won't have to worry about
upgrading unlogged indexes that miss the field.

Or if 32 bits is enough for this, we could reuse the right-link. The
root page has no right link, so it can be repurposed.

> This reminds me again that the lack
> of a metapage in GIST was a serious design error, which we should
> correct if we ever break on-disk compatibility again.

Yeah.

> I concur that adding such a counter to pg_control is a nonstarter,
> though.
>
> Given that we don't need crash recovery for an unlogged table, could
> we get away with some variant of NSN that has weaker semantics than
> XLOG LSNs?

One thought I had is that we only generate an NSN when a page is split,
and gist never deletes pages, so how about simply using the block number
of the newly split page as the NSN? That closes the chance of
reinventing page recycling in the future, though.

- Heikki


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-15 19:07:37
Message-ID: 20130115190737.GP5115@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-01-15 20:58:00 +0200, Heikki Linnakangas wrote:
> On 15.01.2013 20:48, Tom Lane wrote:
> >Robert Haas<robertmhaas(at)gmail(dot)com> writes:
> >>On Tue, Jan 15, 2013 at 1:10 PM, Heikki Linnakangas
> >><hlinnakangas(at)vmware(dot)com> wrote:
> >>>Could we stash the counter e.g. in the root page of the index?
> >
> >>That would require maintaining a counter per table rather than a
> >>single global counter, which would be bad because then we'd need to
> >>store one counter in shared memory for every table, rather than just
> >>one, period, which runs up against the fixed sizing of shared memory.
> >
> >I think what Heikki had in mind was that the copy in the index would be
> >the authoritative one, not some image in shared memory. This'd imply
> >dirtying the root page on every insert, as well as increased contention
> >for the root page, so it might have performance problems.
>
> Not every insert, just every split. Which might still be a performance
> problem, but an order of magnitude smaller.

I might be dense here and I don't really know that code, but if its only
splits why not do an XLogInsert(XLOG_GIST_NSN) or something there?
Inventing some other form of logging just because its an unlogged table
seems like reinventing the wheel.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-15 20:56:04
Message-ID: CA+TgmoZFiLnAVoZ7N7FeZNrUXk2n7hKiPZydXeZaQrqPHAUxcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 15, 2013 at 1:58 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
>> I think what Heikki had in mind was that the copy in the index would be
>> the authoritative one, not some image in shared memory. This'd imply
>> dirtying the root page on every insert, as well as increased contention
>> for the root page, so it might have performance problems.
>
> Not every insert, just every split. Which might still be a performance
> problem, but an order of magnitude smaller.

I think that might be acceptable from a performance point of view -
after all, if the index is unlogged, you're saving the cost of WAL -
but I guess I still prefer a generic solution to this problem (a
generalization of GetXLogRecPtrForTemp) rather than a special-purpose
solution based on the nitty-gritty of how GiST uses these values.
What's the difference between storing this value in pg_control and,
say, the OID counter?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-15 21:26:18
Message-ID: 3281.1358285178@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I think that might be acceptable from a performance point of view -
> after all, if the index is unlogged, you're saving the cost of WAL -
> but I guess I still prefer a generic solution to this problem (a
> generalization of GetXLogRecPtrForTemp) rather than a special-purpose
> solution based on the nitty-gritty of how GiST uses these values.
> What's the difference between storing this value in pg_control and,
> say, the OID counter?

Well, the modularity argument is that GiST shouldn't have any special
privileges compared to a third-party index AM. (I realize that
third-party AMs already have problems plugging into WAL replay, but
that doesn't mean we should create more problems.)

We could possibly dodge that objection by regarding the global counter
as some sort of generic "unlogged operation counter", available to
anybody who needs it. It would be good to have a plausible example of
something else needing it, but assume somebody can think of one.

The bigger issue is that the reason we don't have to update pg_control
every other millisecond is that the OID counter is capable of tracking
its state between checkpoints without touching pg_control, that is it
can emit WAL records to track its increments. I think that we should
insist that GiST do likewise, even if we give it some space in
pg_control. Remember that pg_control is a single point of failure for
the database, and the more often it's written to, the more likely it is
that something will go wrong there.

So I guess what would make sense to me is that we invent an "unlogged
ops counter" that is managed exactly like the OID counter, including
having WAL records that are treated as consuming some number of values
in advance. If it's 64 bits wide then the WAL records could safely be
made to consume quite a lot of values, like a thousand or so, thus
reducing the actual WAL I/O burden to about nothing.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-15 21:54:51
Message-ID: CA+Tgmoa+=+Lv0ppkuB2YSAh-eECJJAapbqaqCQiB5xX091g1uA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 15, 2013 at 4:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I think that might be acceptable from a performance point of view -
>> after all, if the index is unlogged, you're saving the cost of WAL -
>> but I guess I still prefer a generic solution to this problem (a
>> generalization of GetXLogRecPtrForTemp) rather than a special-purpose
>> solution based on the nitty-gritty of how GiST uses these values.
>> What's the difference between storing this value in pg_control and,
>> say, the OID counter?
>
> Well, the modularity argument is that GiST shouldn't have any special
> privileges compared to a third-party index AM. (I realize that
> third-party AMs already have problems plugging into WAL replay, but
> that doesn't mean we should create more problems.)
>
> We could possibly dodge that objection by regarding the global counter
> as some sort of generic "unlogged operation counter", available to
> anybody who needs it. It would be good to have a plausible example of
> something else needing it, but assume somebody can think of one.
>
> The bigger issue is that the reason we don't have to update pg_control
> every other millisecond is that the OID counter is capable of tracking
> its state between checkpoints without touching pg_control, that is it
> can emit WAL records to track its increments. I think that we should
> insist that GiST do likewise, even if we give it some space in
> pg_control. Remember that pg_control is a single point of failure for
> the database, and the more often it's written to, the more likely it is
> that something will go wrong there.
>
> So I guess what would make sense to me is that we invent an "unlogged
> ops counter" that is managed exactly like the OID counter, including
> having WAL records that are treated as consuming some number of values
> in advance. If it's 64 bits wide then the WAL records could safely be
> made to consume quite a lot of values, like a thousand or so, thus
> reducing the actual WAL I/O burden to about nothing.

I didn't look at the actual patch (silly me?) but the only time you
need to update the control file is when writing the shutdown
checkpoint just before stopping the database server. If the server
crashes, it's OK to roll the value back to some smaller value, because
unlogged relations will be reset anyway. And while the server is
running the information can live in a shared memory copy protected by
a spinlock. So the control file traffic should be limited to once per
server lifetime, AFAICS.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-23 09:04:11
Message-ID: CAM2+6=UnCJNXMVBaiO_=2XprjRCfv4RAbaKsnUU1j-NGe2jgYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 16, 2013 at 3:24 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Jan 15, 2013 at 4:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> I think that might be acceptable from a performance point of view -
> >> after all, if the index is unlogged, you're saving the cost of WAL -
> >> but I guess I still prefer a generic solution to this problem (a
> >> generalization of GetXLogRecPtrForTemp) rather than a special-purpose
> >> solution based on the nitty-gritty of how GiST uses these values.
> >> What's the difference between storing this value in pg_control and,
> >> say, the OID counter?
> >
> > Well, the modularity argument is that GiST shouldn't have any special
> > privileges compared to a third-party index AM. (I realize that
> > third-party AMs already have problems plugging into WAL replay, but
> > that doesn't mean we should create more problems.)
> >
> > We could possibly dodge that objection by regarding the global counter
> > as some sort of generic "unlogged operation counter", available to
> > anybody who needs it. It would be good to have a plausible example of
> > something else needing it, but assume somebody can think of one.
> >
> > The bigger issue is that the reason we don't have to update pg_control
> > every other millisecond is that the OID counter is capable of tracking
> > its state between checkpoints without touching pg_control, that is it
> > can emit WAL records to track its increments. I think that we should
> > insist that GiST do likewise, even if we give it some space in
> > pg_control. Remember that pg_control is a single point of failure for
> > the database, and the more often it's written to, the more likely it is
> > that something will go wrong there.
> >
> > So I guess what would make sense to me is that we invent an "unlogged
> > ops counter" that is managed exactly like the OID counter, including
> > having WAL records that are treated as consuming some number of values
> > in advance. If it's 64 bits wide then the WAL records could safely be
> > made to consume quite a lot of values, like a thousand or so, thus
> > reducing the actual WAL I/O burden to about nothing.
>
> I didn't look at the actual patch (silly me?) but the only time you
> need to update the control file is when writing the shutdown
> checkpoint just before stopping the database server. If the server
> crashes, it's OK to roll the value back to some smaller value, because
> unlogged relations will be reset anyway. And while the server is
> running the information can live in a shared memory copy protected by
> a spinlock. So the control file traffic should be limited to once per
> server lifetime, AFAICS.
>
>
Yes.

I guess my earlier patch, which was directly incrementing
ControlFile->unloggedLSN counter was the concern as it will take
ControlFileLock several times.

In this version of patch I did what Robert has suggested. At start of the
postmaster, copying unloggedLSn value to XLogCtl, a shared memory struct.
And
in all access to unloggedLSN, using this shared variable using a SpinLock.
And since we want to keep this counter persistent across clean shutdown,
storing it in ControlFile before updating it.

With this approach, we are updating ControlFile only when we shutdown the
server, rest of the time we are having a shared memory counter. That means
we
are not touching pg_control every other millisecond or so. Also since we are
not caring about crashes, XLogging this counter like OID counter is not
required as such.

Thanks

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are
not the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.

Attachment Content-Type Size
support_unlogged_gist_indexes_v2.patch application/octet-stream 10.1 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-23 15:30:42
Message-ID: CA+Tgmoa4gd7CoREPFoVKrBU=RKbWQJ3ZebmngJtTfYMGOhkSiQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 23, 2013 at 4:04 AM, Jeevan Chalke
<jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
> Yes.
>
> I guess my earlier patch, which was directly incrementing
> ControlFile->unloggedLSN counter was the concern as it will take
> ControlFileLock several times.
>
> In this version of patch I did what Robert has suggested. At start of the
> postmaster, copying unloggedLSn value to XLogCtl, a shared memory struct.
> And
> in all access to unloggedLSN, using this shared variable using a SpinLock.
> And since we want to keep this counter persistent across clean shutdown,
> storing it in ControlFile before updating it.
>
> With this approach, we are updating ControlFile only when we shutdown the
> server, rest of the time we are having a shared memory counter. That means
> we
> are not touching pg_control every other millisecond or so. Also since we are
> not caring about crashes, XLogging this counter like OID counter is not
> required as such.

On a quick read-through this looks reasonable to me, but others may
have different opinions, and I haven't reviewed in detail.

One obvious oversight is that bufmgr.c needs a detailed comment
explaining the reason behind the change you are making there.
Otherwise, someone might think to undo it in the future, and that
would be bad. Perhaps add something like:

However, this rule does not apply for unlogged relations, which will
be lost after a crash anyway. Most unlogged relation pages do not
bear LSNs since we never emit WAL records for them, and therefore
flushing up through the buffer LSN would be useless, but harmless.
However, GiST indexes use LSNs internally to track page-splits, and
therefore temporary and unlogged GiST pages bear "fake" LSNs generated
by GetXLogRecPtrForUnloggedRel. It is unlikely but possible that the
fake-LSN counter used for unlogged relations could advance past the
WAL insertion point; and if it did happen, attempting to flush WAL
through that location would fail, with disastrous system-wide
consequences. To make sure that can't happen, skip the flush if the
buffer isn't permanent.

A related problem is that you're examining the buffer header flags
without taking the buffer-header spinlock. I'm not sure this can
actually matter, because we'll always hold the content lock on the
buffer at this point, which presumably precludes any operation that
might flip that bit, but it looks like the callers all have the buffer
header flags conveniently available, so maybe they should pass that
information down to FlushBuffer. That would also avoid accessing that
word in memory both before and after releasing the spinlock (all
callers have just released it) which can lead to memory-access stalls.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-28 09:04:22
Message-ID: 51063F16.1010609@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 23.01.2013 17:30, Robert Haas wrote:
> On Wed, Jan 23, 2013 at 4:04 AM, Jeevan Chalke
> <jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
>> I guess my earlier patch, which was directly incrementing
>> ControlFile->unloggedLSN counter was the concern as it will take
>> ControlFileLock several times.
>>
>> In this version of patch I did what Robert has suggested. At start of the
>> postmaster, copying unloggedLSn value to XLogCtl, a shared memory struct.
>> And
>> in all access to unloggedLSN, using this shared variable using a SpinLock.
>> And since we want to keep this counter persistent across clean shutdown,
>> storing it in ControlFile before updating it.
>>
>> With this approach, we are updating ControlFile only when we shutdown the
>> server, rest of the time we are having a shared memory counter. That means
>> we
>> are not touching pg_control every other millisecond or so. Also since we are
>> not caring about crashes, XLogging this counter like OID counter is not
>> required as such.
>
> On a quick read-through this looks reasonable to me, but others may
> have different opinions, and I haven't reviewed in detail.
> ...
> [a couple of good points]

In addition to those things Robert pointed out:

> /*
> + * Temporary GiST indexes are not WAL-logged, but we need LSNs to detect
> + * concurrent page splits anyway. GetXLogRecPtrForUnloggedRel() provides a fake
> + * sequence of LSNs for that purpose. Each call generates an LSN that is
> + * greater than any previous value returned by this function in the same
> + * session using static counter
> + * Similarily unlogged GiST indexes are also not WAL-logged. But we need a
> + * persistent counter across clean shutdown. Use counter from ControlFile which
> + * is copied in XLogCtl.unloggedLSN to accomplish that
> + * If relation is UNLOGGED, return persistent counter from XLogCtl else return
> + * session wide temporary counter
> + */
> +XLogRecPtr
> +GetXLogRecPtrForUnloggedRel(Relation rel)

From a modularity point of view, it's not good that you xlog.c needs to
know about Relation struct. Perhaps move the logic to check the relation
is unlogged or not to a function in gistutil.c, and only have a small
GetUnloggedLSN() function in xlog.c

I'd suggest adding a separate spinlock to protect unloggedLSN. I'm not
sure if there's much contention on XLogCtl->info_lck today, but
nevertheless it'd be better to keep this separate, also from a
modularity point of view.

> @@ -7034,6 +7078,8 @@ CreateCheckPoint(int flags)
> LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> ControlFile->state = DB_SHUTDOWNING;
> ControlFile->time = (pg_time_t) time(NULL);
> + /* Store unloggedLSN value as we want it persistent across shutdown */
> + ControlFile->unloggedLSN = XLogCtl->unloggedLSN;
> UpdateControlFile();
> LWLockRelease(ControlFileLock);
> }

This needs to acquire the spinlock to read unloggedLSN.

Do we need to do anything to unloggedLSN in pg_resetxlog?

- Heikki


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-29 02:39:27
Message-ID: CA+Tgmobya4Lvja+UyFo7-wxW2tyy6N0_=9_TSxaS0iyjp78Scw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 28, 2013 at 4:04 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> Do we need to do anything to unloggedLSN in pg_resetxlog?

Does the server go into recovery after pg_resetxlog? If so, no. If
not, probably, but I have no idea what. There's no "safe" value in
that case; what we ought to do is force a reset of all unlogged
relations, or just punt and hope nothing bad happens (which after all
is what pg_resetxlog is mostly about anyway).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: unlogged tables vs. GIST
Date: 2013-01-29 12:33:58
Message-ID: CAM2+6=UTy=Ly6VfhP_ni8BJtuyeFEtM_H6Ms=-OUCLVcBmA+sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Heikki,

On Mon, Jan 28, 2013 at 2:34 PM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> On 23.01.2013 17:30, Robert Haas wrote:
>
>> On Wed, Jan 23, 2013 at 4:04 AM, Jeevan Chalke
>> <jeevan(dot)chalke(at)enterprisedb(dot)**com <jeevan(dot)chalke(at)enterprisedb(dot)com>>
>> wrote:
>>
>>> I guess my earlier patch, which was directly incrementing
>>>
>>> ControlFile->unloggedLSN counter was the concern as it will take
>>> ControlFileLock several times.
>>>
>>> In this version of patch I did what Robert has suggested. At start of the
>>> postmaster, copying unloggedLSn value to XLogCtl, a shared memory struct.
>>> And
>>> in all access to unloggedLSN, using this shared variable using a
>>> SpinLock.
>>> And since we want to keep this counter persistent across clean shutdown,
>>> storing it in ControlFile before updating it.
>>>
>>> With this approach, we are updating ControlFile only when we shutdown the
>>> server, rest of the time we are having a shared memory counter. That
>>> means
>>> we
>>> are not touching pg_control every other millisecond or so. Also since we
>>> are
>>> not caring about crashes, XLogging this counter like OID counter is not
>>> required as such.
>>>
>>
>> On a quick read-through this looks reasonable to me, but others may
>> have different opinions, and I haven't reviewed in detail.
>>
> > ...
>
>> [a couple of good points]
>>
>
> In addition to those things Robert pointed out:
>
> /*
>> + * Temporary GiST indexes are not WAL-logged, but we need LSNs to detect
>> + * concurrent page splits anyway. GetXLogRecPtrForUnloggedRel() provides
>> a fake
>> + * sequence of LSNs for that purpose. Each call generates an LSN that is
>> + * greater than any previous value returned by this function in the same
>> + * session using static counter
>> + * Similarily unlogged GiST indexes are also not WAL-logged. But we need
>> a
>> + * persistent counter across clean shutdown. Use counter from
>> ControlFile which
>> + * is copied in XLogCtl.unloggedLSN to accomplish that
>> + * If relation is UNLOGGED, return persistent counter from XLogCtl else
>> return
>> + * session wide temporary counter
>> + */
>> +XLogRecPtr
>> +GetXLogRecPtrForUnloggedRel(**Relation rel)
>>
>
> From a modularity point of view, it's not good that you xlog.c needs to
> know about Relation struct. Perhaps move the logic to check the relation is
> unlogged or not to a function in gistutil.c, and only have a small
> GetUnloggedLSN() function in xlog.c
>

I kept a function as is, but instead sending Relation as a parameter, it
now takes boolean, isUnlogged. Added a MACRO for that.

>
> I'd suggest adding a separate spinlock to protect unloggedLSN. I'm not
> sure if there's much contention on XLogCtl->info_lck today, but
> nevertheless it'd be better to keep this separate, also from a modularity
> point of view.
>

Hmm. OK.
Added ulsn_lck for this.

>
> @@ -7034,6 +7078,8 @@ CreateCheckPoint(int flags)
>> LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
>> ControlFile->state = DB_SHUTDOWNING;
>> ControlFile->time = (pg_time_t) time(NULL);
>> + /* Store unloggedLSN value as we want it persistent
>> across shutdown */
>> + ControlFile->unloggedLSN = XLogCtl->unloggedLSN;
>> UpdateControlFile();
>> LWLockRelease(ControlFileLock)**;
>> }
>>
>
> This needs to acquire the spinlock to read unloggedLSN.
>

OK. Done.

>
> Do we need to do anything to unloggedLSN in pg_resetxlog?
>

I guess NO.

Also, added Robert's comment in bufmgr.c
I am still unsure about the spinlock around buf flags as we are just
examining it.

Please let me know if I missed any.

Thanks

>
> - Heikki
>

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are
not the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.

Attachment Content-Type Size
support_unlogged_gist_indexes_v3.patch application/octet-stream 11.6 KB

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: unlogged tables vs. GIST
Date: 2013-02-11 06:44:26
Message-ID: CAM2+6=WcCELXe_0D4W2_2nkZAg6Hn1eLkg-JwhZ9zGt8yqj4fg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Any review comments on this ?

On Tue, Jan 29, 2013 at 6:03 PM, Jeevan Chalke <
jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:

> Hi Heikki,
>
>
> On Mon, Jan 28, 2013 at 2:34 PM, Heikki Linnakangas <
> hlinnakangas(at)vmware(dot)com> wrote:
>
>> On 23.01.2013 17:30, Robert Haas wrote:
>>
>>> On Wed, Jan 23, 2013 at 4:04 AM, Jeevan Chalke
>>> <jeevan(dot)chalke(at)enterprisedb(dot)**com <jeevan(dot)chalke(at)enterprisedb(dot)com>>
>>> wrote:
>>>
>>>> I guess my earlier patch, which was directly incrementing
>>>>
>>>> ControlFile->unloggedLSN counter was the concern as it will take
>>>> ControlFileLock several times.
>>>>
>>>> In this version of patch I did what Robert has suggested. At start of
>>>> the
>>>> postmaster, copying unloggedLSn value to XLogCtl, a shared memory
>>>> struct.
>>>> And
>>>> in all access to unloggedLSN, using this shared variable using a
>>>> SpinLock.
>>>> And since we want to keep this counter persistent across clean shutdown,
>>>> storing it in ControlFile before updating it.
>>>>
>>>> With this approach, we are updating ControlFile only when we shutdown
>>>> the
>>>> server, rest of the time we are having a shared memory counter. That
>>>> means
>>>> we
>>>> are not touching pg_control every other millisecond or so. Also since
>>>> we are
>>>> not caring about crashes, XLogging this counter like OID counter is not
>>>> required as such.
>>>>
>>>
>>> On a quick read-through this looks reasonable to me, but others may
>>> have different opinions, and I haven't reviewed in detail.
>>>
>> > ...
>>
>>> [a couple of good points]
>>>
>>
>> In addition to those things Robert pointed out:
>>
>> /*
>>> + * Temporary GiST indexes are not WAL-logged, but we need LSNs to detect
>>> + * concurrent page splits anyway. GetXLogRecPtrForUnloggedRel()
>>> provides a fake
>>> + * sequence of LSNs for that purpose. Each call generates an LSN that is
>>> + * greater than any previous value returned by this function in the same
>>> + * session using static counter
>>> + * Similarily unlogged GiST indexes are also not WAL-logged. But we
>>> need a
>>> + * persistent counter across clean shutdown. Use counter from
>>> ControlFile which
>>> + * is copied in XLogCtl.unloggedLSN to accomplish that
>>> + * If relation is UNLOGGED, return persistent counter from XLogCtl else
>>> return
>>> + * session wide temporary counter
>>> + */
>>> +XLogRecPtr
>>> +GetXLogRecPtrForUnloggedRel(**Relation rel)
>>>
>>
>> From a modularity point of view, it's not good that you xlog.c needs to
>> know about Relation struct. Perhaps move the logic to check the relation is
>> unlogged or not to a function in gistutil.c, and only have a small
>> GetUnloggedLSN() function in xlog.c
>>
>
> I kept a function as is, but instead sending Relation as a parameter, it
> now takes boolean, isUnlogged. Added a MACRO for that.
>
>
>>
>> I'd suggest adding a separate spinlock to protect unloggedLSN. I'm not
>> sure if there's much contention on XLogCtl->info_lck today, but
>> nevertheless it'd be better to keep this separate, also from a modularity
>> point of view.
>>
>
> Hmm. OK.
> Added ulsn_lck for this.
>
>
>>
>> @@ -7034,6 +7078,8 @@ CreateCheckPoint(int flags)
>>> LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
>>> ControlFile->state = DB_SHUTDOWNING;
>>> ControlFile->time = (pg_time_t) time(NULL);
>>> + /* Store unloggedLSN value as we want it persistent
>>> across shutdown */
>>> + ControlFile->unloggedLSN = XLogCtl->unloggedLSN;
>>> UpdateControlFile();
>>> LWLockRelease(ControlFileLock)**;
>>> }
>>>
>>
>> This needs to acquire the spinlock to read unloggedLSN.
>>
>
> OK. Done.
>
>
>>
>> Do we need to do anything to unloggedLSN in pg_resetxlog?
>>
>
> I guess NO.
>
> Also, added Robert's comment in bufmgr.c
> I am still unsure about the spinlock around buf flags as we are just
> examining it.
>
> Please let me know if I missed any.
>
> Thanks
>
>
>>
>> - Heikki
>>
>
>
>
> --
> Jeevan B Chalke
> Senior Software Engineer, R&D
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>
> Phone: +91 20 30589500
>
> Website: www.enterprisedb.com
> EnterpriseDB Blog: http://blogs.enterprisedb.com/
> Follow us on Twitter: http://www.twitter.com/enterprisedb
>
> This e-mail message (and any attachment) is intended for the use of the
> individual or entity to whom it is addressed. This message contains
> information from EnterpriseDB Corporation that may be privileged,
> confidential, or exempt from disclosure under applicable law. If you are
> not the intended recipient or authorized to receive this for the intended
> recipient, any use, dissemination, distribution, retention, archiving, or
> copying of this communication is strictly prohibited. If you have received
> this e-mail in error, please notify the sender immediately by reply e-mail
> and delete this message.
>

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are
not the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: unlogged tables vs. GIST
Date: 2013-02-11 13:58:06
Message-ID: 5118F8EE.2050805@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11.02.2013 08:44, Jeevan Chalke wrote:
> Hi,
>
> Any review comments on this ?

Sorry for the delay.

I did some minor cleanup on this. I added code to pg_resetxlog and
pg_controldata to reset / display the current unlogged LSN value. I
moved the static counter, for temporary relations, back to gistutil.c,
so that the function in xlog.c only deals with unlogged relations. It's
debatable if that's better, but IMHO it is. Also, the unloggedLSN
counter is now reset to 1 at crash recovery. There's no fundamental
reason it needs to be reset, rather than just continue from the last
shutdowned value like nothing happened, but it seems cleaner that way.

I'm happy with this now, but please take one more look before I commit this.

- Heikki

Attachment Content-Type Size
unlogged-gist-indexes-v4.patch text/x-diff 12.5 KB

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: unlogged tables vs. GIST
Date: 2013-02-12 09:20:21
Message-ID: CAM2+6=XO5cEM4_5e79zRVCvpz03gnGJdRm14SboWAeF95jUb1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Heikki,

On Mon, Feb 11, 2013 at 7:28 PM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> On 11.02.2013 08:44, Jeevan Chalke wrote:
>
>> Hi,
>>
>> Any review comments on this ?
>>
>
> Sorry for the delay.
>
> I did some minor cleanup on this. I added code to pg_resetxlog and
> pg_controldata to reset / display the current unlogged LSN value. I moved
> the static counter, for temporary relations, back to gistutil.c, so that
> the function in xlog.c only deals with unlogged relations. It's debatable
> if that's better, but IMHO it is. Also, the unloggedLSN counter is now
> reset to 1 at crash recovery. There's no fundamental reason it needs to be
> reset, rather than just continue from the last shutdowned value like
> nothing happened, but it seems cleaner that way.
>
> I'm happy with this now, but please take one more look before I commit
> this.
>

This morning I had a look over this. But it seems that you have already
committed it.

Changes are fine and even better.
No issues with my unit testing too.

Thanks for the commit.

>
> - Heikki
>

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are
not the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.