Re: Syntax for partitioning

From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Subject: Re: Syntax for partitioning
Date: 2011-11-09 21:05:34
Message-ID: m2lirpj9b5.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Thom Brown <thom(at)linux(dot)com> writes:
> Whenever I cross the topic of
> partitioning in PostgreSQL, it's always a tad embarrassing to explain
> that it's still hacky compared to other database systems (and this
> came up again last week), so this is of particular interest to me. At

The more I think about this problem, the more I think that the reason
we still don't have declarative partitioning is that it basically
sucks. Other vendors offer it because they couldn't do better: they are
just asking the DBA to implement by hand a service the system should be
able to take care of itself.

Who knows better than PostgreSQL which parts of the data are queried
most often and how best to lay them out on disk to ensure QoS? If you
think that's the DBA, go ask Tom to implement query hints…

More seriously, partitioning in PostgreSQL could be mostly transparent
to users and "just work": it's all about data storage locality, and we
already have a sub-relation storage model. By using segment exclusion
and some level of automatic clustering (physical organization) of the
data, we could have all the benefits of partitioning without the hassle
of maintaining yet another explicit level of data definition.

In particular, what part of the declarative partitioning system is
intended to take care of creating the right partitions before new data
are sent to them?

In a first implementation, we could decide to "partition" the data over
an index that the DBA picks, and then maintain a "segment index": a map
of how the data are distributed across the physical segments, for the
indexed columns. The granularity could vary, maybe dynamically, so that
you start with a block-level map for very small tables and grow up to a
segment-level map for huge tables spanning thousands of 1GB segments.
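
To make that a little more concrete, here is a minimal sketch (plain C,
not actual PostgreSQL code: SegmentMapEntry and its fields are names I'm
making up for illustration) of what one entry of such a segment index
could record for a single indexed column:

  #include <stdbool.h>
  #include <stdint.h>

  /* One entry of the hypothetical "segment index": the value range
   * observed in one physical unit (a block range for small tables,
   * a whole 1GB segment for big ones) for one indexed column. */
  typedef struct SegmentMapEntry
  {
      uint32_t segno;      /* physical segment (or block range) number */
      int64_t  min_val;    /* smallest indexed value stored there */
      int64_t  max_val;    /* largest indexed value stored there */
      bool     has_nulls;  /* does the unit contain NULLs? */
  } SegmentMapEntry;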

Then the system has to organize data modifications so as to keep the
range covered by each map entry as small as possible. The executor then
has to be able to consult that map at query time and exclude whole
segments of data (or blocks, for small tables) whenever the segment
index shows they can't match. Some care is needed: depending on the size
of the portions you skip, moving the disk head away from the ongoing
seqscan's access pattern might not actually be a win, but we already
have GUCs for that kind of trade-off.
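
Again purely as an illustrative sketch (reusing the made-up
SegmentMapEntry above, not real executor code), the exclusion test for a
simple range qual could be as cheap as an overlap check:

  /* A unit can be skipped when its value range does not overlap the
   * scan's qual range [qual_lo, qual_hi]. NULLs never satisfy a range
   * qual, so has_nulls does not matter for this particular test. */
  static bool
  segment_can_be_skipped(const SegmentMapEntry *entry,
                         int64_t qual_lo, int64_t qual_hi)
  {
      return entry->max_val < qual_lo || entry->min_val > qual_hi;
  }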

We might also need some internal facilities to lock per segment (or
rather per map entry) instead of per table, so that we could have
something like WHERE clause support for TRUNCATE. A sketch of what I
mean follows.
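
Here is what that could look like, with the same made-up structures as
above; truncate_segment() is only a placeholder for whatever internal
facility would reclaim one segment under a per-entry lock:

  /* Hypothetical "TRUNCATE ... WHERE" driver: only map entries that
   * fall entirely inside the predicate range, and contain no NULLs,
   * can be dropped wholesale. */
  static void
  truncate_where(SegmentMapEntry *map, int nentries,
                 int64_t pred_lo, int64_t pred_hi,
                 void (*truncate_segment)(uint32_t segno))
  {
      for (int i = 0; i < nentries; i++)
      {
          if (!map[i].has_nulls &&
              map[i].min_val >= pred_lo &&
              map[i].max_val <= pred_hi)
              truncate_segment(map[i].segno);
      }
  }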

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
