Re: init_sequence spill to hash table

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-15 09:43:15
Message-ID: 20131115094315.GA23517@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-11-15 14:22:30 +1300, David Rowley wrote:
> On Fri, Nov 15, 2013 at 3:12 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>
> > Hi,
> >
> > On 2013-11-13 22:55:43 +1300, David Rowley wrote:
> > > Here http://www.postgresql.org/message-id/24278.1352922571@sss.pgh.pa.us there
> > > was some talk about init_sequence being a bottleneck when many sequences
> > > are used in a single backend.
> > >
> > > The attached I think implements what was talked about in the above link
> > > which for me seems to double the speed of a currval() loop over 30000
> > > sequences. It goes from about 7 seconds to 3.5 on my laptop.
> >
> > I think it'd be a better idea to integrate the sequence caching logic
> > into the relcache. There's a comment about it:
> > * (We can't
> > * rely on the relcache, since it's only, well, a cache, and may decide to
> > * discard entries.)
> > but that's not really accurate anymore. We have the infrastructure for
> > keeping values across resets and we don't discard entries.
> >
> >
> I just want to check this idea against an existing todo item to move
> sequences into a single table, as I think by the sounds of it this binds
> sequences being relations even closer together.

> This had been on the back of my mind while implementing the hash table
> stuff for init_sequence and again when doing my benchmarks where I created
> 30000 sequences and went through the pain of having a path on my file
> system with 30000 8k files.

Well. But in which real-world use cases is that actually the bottleneck?

> 1. The search_path stuff makes this a bit more complex. It sounds like this
> would require some duplication of the search_path logic.

I'd assumed that if we were to do this, the sequences themselves would
still continue to live in pg_class. Just instead of a relfilenode
containing their state it would be stored in an extra table.

> 2. There is also the problem with tracking object dependency.
>
> Currently:
> create sequence t_a_seq;
> create table t (a int not null default nextval('t_a_seq'));
> alter sequence t_a_seq owned by t.a;
> drop table t;
> drop sequence t_a_seq; -- already deleted by drop table t
> ERROR: sequence "t_a_seq" does not exist
>
> Moving sequences to a single table sounds like a special case for this
> logic.

There should already be code for such dependencies.
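
As a rough sketch of where that shows up today, assuming the table t and
the sequence t_a_seq from your example exist and have not been dropped
yet, the ownership link can be inspected in pg_depend. The column aliases
are only illustrative:

```sql
-- Look up the dependency recorded by ALTER SEQUENCE ... OWNED BY.
-- Assumes t and t_a_seq from the quoted example are still present.
SELECT d.objid::regclass    AS seq_name,
       d.refobjid::regclass AS owning_table,
       a.attname            AS owning_column,
       d.deptype            -- an OWNED BY link should appear as 'a' (auto)
FROM pg_depend d
JOIN pg_attribute a
  ON a.attrelid = d.refobjid
 AND a.attnum   = d.refobjsubid
WHERE d.classid    = 'pg_class'::regclass
  AND d.refclassid = 'pg_class'::regclass
  AND d.objid      = 't_a_seq'::regclass;
```

As long as the sequence keeps its pg_class entry, a row like this could
presumably stay as it is, whichever place the sequence's state is stored.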

4) Scalability problems: The single block a sequence uses can already be
a major contention issue when you have parallel inserts into the same
table. That is a workload which, unlike a couple thousand unrelated
sequences, I actually think is realistic. So we'd need to force one
sequence tuple per block, which we currently cannot do.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
