Quick Links

Re: PerformPortalClose warning in 7.3

Lists:	pgsql-generalpgsql-interfacespgsql-performance

From:	Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Docs: GIST
Date:	2002-12-11 18:07:09
Message-ID:	5.1.0.14.1.20021212010621.0273d040@mbox.jaring.my
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

I'm a bit confused.

In 7.3 is it possible to use GIST without using any of the stuff in
contrib/? If it is, how can it be done and in which cases should it be done?

The pgsql docs about indexes keep talking about GIST here and there, but I
can't seem to use them on anything. And there's no gist in the "ops" and
access method listing.

Having the docs say Postgresql provides GIST as one of the four access
methods, GIST supports multicolumn indexes, GIST etc, is just confusing if
the docs pertaining to indexes don't also say that in a default postgresql
installation you cannot create an index using GIST (if you can actually
create a GIST index "out of box", how??).

Another thing: is Eugene Selkov's 1998 message on GIST indexes in the 7.3
docs (see GIST Indexes) still valid? There's mention of Postgresql 6.3 and
postgres95 there too.

BTW, 7.3 is GREAT! Multiple col/row returns, prepare queries, schemas etc.
Also set enable_seq_scan=off can get rolled back to whatever it was before
now right? Cool, coz I have to force index use for a particular select.

Thanks to the postgresql dev team and everyone involved!

Cheerio,
Link.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Docs: GIST
Date:	2002-12-12 05:41:33
Message-ID:	3455.1039671693@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my> writes:
> I'm a bit confused.
> In 7.3 is it possible to use GIST without using any of the stuff in
> contrib/?

No, because there are no GIST opclasses in the standard installation.
They are only in contrib.

Yes, that's a bit silly. As GIST improves out of the "academic toy"
category into the "production tool" category, I expect we will migrate
GIST opclasses into the standard installation.

regards, tom lane

From:	Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Docs: GIST
Date:	2002-12-12 21:02:28
Message-ID:	5.1.0.14.1.20021213043655.027360b0@mbox.jaring.my
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

I did figure it out eventually but it'll be clearer to mention that in the
docs - e.g. the only way to use GIST is to use the stuff in contrib. Coz I
had a bit of wishful thinking - thought that maybe some bits of GIST might
have at least become useable by default in 7.3 e.g. the simpler stuff (the
docs didn't quite contradict that).

Definitely not asking for it to be rushed in tho. Software is more reliable
when the developers know what they are doing, and they get to release stuff
when they think it's ready, not when others say it is.

Cheerio,
Link.

At 12:41 AM 12/12/02 -0500, Tom Lane wrote:

>Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my> writes:
> > I'm a bit confused.
> > In 7.3 is it possible to use GIST without using any of the stuff in
> > contrib/?
>
>No, because there are no GIST opclasses in the standard installation.
>They are only in contrib.

From:	Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
To:	pgsql-general(at)postgresql(dot)org
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	CLUSTER command
Date:	2002-12-12 21:31:46
Message-ID:	3DF90042.E5290B7F@nsd.ca
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

Hi all,

I just read about the cluster command and was a little (very)
disapointed.
Clustered tables do not remain clustered after inserts.
Clustered tables are usefull when the table is very large and there are
few different keys.

Because the table file is already extended (2G limit) using different
files extension (.N)
how complicated (modifying the code) would it be to have the table files
split according to the cluster key?

This would:

Greatly improve performance when the cluster key in included in search
criteria.
Allow for a much larger table before a file has to be split (.N).
Simplify the management of symblinks (that's something else we need to
look at).
The index file for that field would no longer be required.

Of course, there should be only one cluster key per table.
The length the "key" should be short and the number of unique key should
be low as well.

SO... ?

JLL

From:	Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
To:	pgsql-general(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject:	Re: [PERFORM] CLUSTER command
Date:	2002-12-12 21:40:24
Message-ID:	3DF90248.9BAF931C@nsd.ca
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

Oh, and something else,

I think the syntax should be:

Cluster <table> on <attribute>

Maybe inheritance can be use here.
The problem is creating the new "table" when a new key is detected.
I know, I can use rules, but the optimiser is not aware of the
clustering.

Enough from me for now.

What do you think?

JLL

Jean-Luc Lachance wrote:
>
> Hi all,
>
> I just read about the cluster command and was a little (very)
> disapointed.
> Clustered tables do not remain clustered after inserts.
> Clustered tables are usefull when the table is very large and there are
> few different keys.
>
> Because the table file is already extended (2G limit) using different
> files extension (.N)
> how complicated (modifying the code) would it be to have the table files
> split according to the cluster key?
>
> This would:
>
> Greatly improve performance when the cluster key in included in search
> criteria.
> Allow for a much larger table before a file has to be split (.N).
> Simplify the management of symblinks (that's something else we need to
> look at).
> The index file for that field would no longer be required.
>
> Of course, there should be only one cluster key per table.
> The length the "key" should be short and the number of unique key should
> be low as well.
>
> SO... ?
>
> JLL
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org

From:	Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To:	Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
Cc:	<pgsql-general(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: CLUSTER command
Date:	2002-12-12 22:03:56
Message-ID:	20021212135913.Q11714-100000@megazone23.bigpanda.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

On Thu, 12 Dec 2002, Jean-Luc Lachance wrote:

> Hi all,
>
> I just read about the cluster command and was a little (very)
> disapointed.
> Clustered tables do not remain clustered after inserts.
> Clustered tables are usefull when the table is very large and there are
> few different keys.
>
>
> Because the table file is already extended (2G limit) using different
> files extension (.N)
> how complicated (modifying the code) would it be to have the table files
> split according to the cluster key?

I'd vote against changing the existing CLUSTER since the existing CLUSTER
while not great does handle many different key values fairly well as well
and this solution wouldn't. Many different key values are still
useful to cluster if you're doing searches over ranges since it lowers the
number of heap file reads necessary. If done this should probably be
separate from the existing cluster or at least both versions should be
possible.

From:	Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
To:	Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject:	Re: CLUSTER command
Date:	2002-12-12 22:15:37
Message-ID:	3DF90A89.61040846@nsd.ca
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

The current cluster command is equivalant to:

create b as select * from a order by i;

So you would not be loosing anything.

Stephan Szabo wrote:
>
> On Thu, 12 Dec 2002, Jean-Luc Lachance wrote:
>
> > Hi all,
> >
> > I just read about the cluster command and was a little (very)
> > disapointed.
> > Clustered tables do not remain clustered after inserts.
> > Clustered tables are usefull when the table is very large and there are
> > few different keys.
> >
> >
> > Because the table file is already extended (2G limit) using different
> > files extension (.N)
> > how complicated (modifying the code) would it be to have the table files
> > split according to the cluster key?
>
> I'd vote against changing the existing CLUSTER since the existing CLUSTER
> while not great does handle many different key values fairly well as well
> and this solution wouldn't. Many different key values are still
> useful to cluster if you're doing searches over ranges since it lowers the
> number of heap file reads necessary. If done this should probably be
> separate from the existing cluster or at least both versions should be
> possible.

From:	johnnnnnn <john(at)phaedrusdeinus(dot)org>
To:	Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject:	Re: [GENERAL] CLUSTER command
Date:	2002-12-12 22:26:41
Message-ID:	20021212222641.GA8278@performics.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

On Thu, Dec 12, 2002 at 02:03:56PM -0800, Stephan Szabo wrote:
> I'd vote against changing the existing CLUSTER since the existing
> CLUSTER while not great does handle many different key values fairly
> well as well and this solution wouldn't.

I would agree. What's being proposed sounds much more like table
partitioning than clustering.

That's not to say that the existing CLUSTER couldn't be improved, at
the very least to the point where it allows inserts to respect the
clustered structure. That's a post for another thread, though.

-johnnnnnnnnnnn

From:	Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To:	Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
Cc:	<pgsql-general(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: CLUSTER command
Date:	2002-12-12 22:27:02
Message-ID:	20021212142547.J12742-100000@megazone23.bigpanda.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

On Thu, 12 Dec 2002, Jean-Luc Lachance wrote:

> The current cluster command is equivalant to:
>
> create b as select * from a order by i;
>
> So you would not be loosing anything.

Except for the fact that the CLUSTER is intended (although
I don't know if it does yet) to retain things like constraints
and other indexes which the above doesn't.

From:	Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
To:	johnnnnnn <john(at)phaedrusdeinus(dot)org>
Cc:	Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, pgsql-performance(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject:	Re: [PERFORM] CLUSTER command
Date:	2002-12-12 22:39:44
Message-ID:	3DF91030.A372DF7C@nsd.ca
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

OK fine,

Let's create a new command:

PARTITION <table> ON <attribute>

I did not want to start a fight. You can keep the CLUSTER command as it
is.

I still think clustering/partitioning would be a great idea.
This is what I want to talk about. Look at the original post for the
reasons.

JLL

johnnnnnn wrote:
>
> On Thu, Dec 12, 2002 at 02:03:56PM -0800, Stephan Szabo wrote:
> > I'd vote against changing the existing CLUSTER since the existing
> > CLUSTER while not great does handle many different key values fairly
> > well as well and this solution wouldn't.
>
> I would agree. What's being proposed sounds much more like table
> partitioning than clustering.
>
> That's not to say that the existing CLUSTER couldn't be improved, at
> the very least to the point where it allows inserts to respect the
> clustered structure. That's a post for another thread, though.
>
> -johnnnnnnnnnnn
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

From:	Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my>
To:	Jean-Luc Lachance <jllachan(at)nsd(dot)ca>, pgsql-general(at)postgresql(dot)org
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: CLUSTER command
Date:	2002-12-12 22:49:12
Message-ID:	5.1.0.14.1.20021213063727.0278b270@mbox.jaring.my
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

Splitting table files by indexed value may not help if the operating system
doesn't manage to keep the tables unfragmented on disk. I suppose the O/S
should know how to do it though.

Cheerio,
Link.

At 04:31 PM 12/12/02 -0500, Jean-Luc Lachance wrote:

>Hi all,
>
>I just read about the cluster command and was a little (very)
>disapointed.
>Clustered tables do not remain clustered after inserts.
>Clustered tables are usefull when the table is very large and there are
>few different keys.
>
>
>Because the table file is already extended (2G limit) using different
>files extension (.N)
>how complicated (modifying the code) would it be to have the table files
>split according to the cluster key?

From:	Michael Engelhart <mengelhart(at)mac(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	PerformPortalClose warning in 7.3
Date:	2002-12-12 22:50:55
Message-ID:	2C6CDB6A-0E24-11D7-8E88-000393A48A3C@mac.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

Hi -
I've been running PostgreSQL 7.3 on Mac OS X 10.2 since it was released
and it's been running fine. I'm using pyPgSQL 2.3 for client side
programming which also was working great until tonight. Now whenever
I do any query of any type, I get warnings like this:

WARNING: PerformPortalClose: portal "pgsql_00179f10" not found

It "appears" that everything is still working the way it was but it's a
bit discomforting to have these show up on my screen while running my
applications.

Anyone that can explain this?

Here's a tiny bit of Python sample code that I used to make sure it
wasn't my other code causing the problems

from pyPgSQL import PgSQL

dbname = "template1"
conn = PgSQL.connect(database=dbname)
cursor = conn.cursor()
sql = "SELECT now()";
cursor.execute(sql)
res = cursor.fetchall()
for i in res:
print i
cursor.close()
conn.commit()

strangely if I remove the last 2 lines (cursor.close() and
conn.commit()) I don't get the errors.

Also I don't notice that I don't have this problem with psql command
line either. Is this the Python API causing this?

Thanks for any help

Mike

From:	johnnnnnn <john(at)phaedrusdeinus(dot)org>
To:	pgsql-performance(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject:	Re: [GENERAL] CLUSTER command
Date:	2002-12-12 23:00:02
Message-ID:	20021212230002.GC8278@performics.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

On Thu, Dec 12, 2002 at 05:39:44PM -0500, Jean-Luc Lachance wrote:
> Let's create a new command:
>
> PARTITION <table> ON <attribute>
<snip>
> Because the table file is already extended (2G limit) using
> different files extension (.N)
> how complicated (modifying the code) would it be to have the table
> files split according to the cluster key?

I think the code changes would be complicated. Just at a 30-second
consideration, this would need to touch:
- all sql (selects, inserts, updates, deletes)
- vacuuming
- indexing
- statistics gathering
- existing clustering

That's not to say it's not worthwhile to look into, but it's big.

All of that aside, a view over unions is possible now:

create table u1 (...);
create table u2 (...);
create table u3 (...);

create view uv as (select "A" as partition_key, ... from u1
union all
select "B" as partition_key, ... from u2
union all
select "C" as partition_key, ... from u3);

That keeps the tables in different files on-disk while still allowing
you to query against all of them. You need to index them separately
and logic is necessary when changing data.

Hope that helps.

-johnnnnnnnnnn

From:	Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To:	johnnnnnn <john(at)phaedrusdeinus(dot)org>
Cc:	<pgsql-performance(at)postgresql(dot)org>, <pgsql-general(at)postgresql(dot)org>
Subject:	Re: [PERFORM] CLUSTER command
Date:	2002-12-13 00:03:47
Message-ID:	20021212154146.T13718-100000@megazone23.bigpanda.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

On Thu, 12 Dec 2002, johnnnnnn wrote:

> On Thu, Dec 12, 2002 at 05:39:44PM -0500, Jean-Luc Lachance wrote:
> > Let's create a new command:
> >
> > PARTITION <table> ON <attribute>
> <snip>
> > Because the table file is already extended (2G limit) using
> > different files extension (.N)
> > how complicated (modifying the code) would it be to have the table
> > files split according to the cluster key?
>

> I think the code changes would be complicated. Just at a 30-second
> consideration, this would need to touch:
> - all sql (selects, inserts, updates, deletes)
> - vacuuming
> - indexing
> - statistics gathering
> - existing clustering

I think his idea was to treat it similarly to the way that the
system treats tables >2G with .N files. The only thing is that
I believe the code that deals with that wouldn't be particularly
easy to change to do it though, but I've only taken a cursory look at
what I think is the place that does that(storage/smgr/md.c). Some sort of
good partitioning system would be nice though.

> create table u1 (...);
> create table u2 (...);
> create table u3 (...);
>
> create view uv as (select "A" as partition_key, ... from u1
> union all
> select "B" as partition_key, ... from u2
> union all
> select "C" as partition_key, ... from u3);
>
> That keeps the tables in different files on-disk while still allowing
> you to query against all of them. You need to index them separately
> and logic is necessary when changing data.

Unfortunately, I think that the optimizer isn't going to do what you'd
hope here and scan only the appropriate table if you were to say
partition_key='A' and foo='bar'. I'd love to be shown that I'm wrong, but
the best I could see hoping for would be that if partition_key was part of
u1-u3 and there was an index on partition_key,foo that it could use that
and do minimal work on the other tables.

In addition, doing something like the above is a nightmare if you don't
know beforehand what the partitions should be (for example if you know
there aren't alot of distinct values, but you don't know what they are) or
for that matter even with 10-15 partitions, writing the rules and such
would probably be really error prone.

From:	Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To:	Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc:	johnnnnnn <john(at)phaedrusdeinus(dot)org>, pgsql-performance(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject:	Re: [PERFORM] CLUSTER command
Date:	2002-12-13 00:47:19
Message-ID:	20021213004719.GA19217@dcc.uchile.cl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

On Thu, Dec 12, 2002 at 04:03:47PM -0800, Stephan Szabo wrote:
> On Thu, 12 Dec 2002, johnnnnnn wrote:
>
> > I think the code changes would be complicated. Just at a 30-second
> > consideration, this would need to touch:
> > - all sql (selects, inserts, updates, deletes)
> > - vacuuming
> > - indexing
> > - statistics gathering
> > - existing clustering
>
> I think his idea was to treat it similarly to the way that the
> system treats tables >2G with .N files. The only thing is that
> I believe the code that deals with that wouldn't be particularly
> easy to change to do it though, but I've only taken a cursory look at
> what I think is the place that does that(storage/smgr/md.c). Some sort of
> good partitioning system would be nice though.

I don't think this is doable without a huge amount of work. The storage
manager doesn't know anything about what is in a page, let alone a
tuple. And it shouldn't, IMHO. Upper levels don't know how are pages
organized in disk; they don't know about .1 segments and so on, and they
shouldn't.

I think this kind of partition doesn't buy too much. I would really
like to have some kind of auto-clustering, but it should be implemented
in some upper level; e.g., by leaving some empty space in pages for
future tuples, and arranging the whole heap again when it runs out of
free space somewhere. Note that this is very far from the storage
manager.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"La realidad se compone de muchos sueños, todos ellos diferentes,
pero en cierto aspecto, parecidos..." (Yo, hablando de sueños eróticos)

From:	"Charles H(dot) Woloszynski" <chw(at)clearmetrix(dot)com>
To:	Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc:	pgsql-performance(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject:	Re: [PERFORM] CLUSTER command
Date:	2002-12-13 01:06:35
Message-ID:	3DF9329B.1020908@clearmetrix.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

I think Oracle does something like this with its clustering. You set a
%fill and Oracle uses this when doing inserts into a segment and when to
add a new one. There is also some control over the grouping of data
within a page. I don't have an Oracle manual present, but I think the
clustering works on a specific index.

I agree that adding auto-clustering would be a very good thing and that
we can learn about functionality by studying what other applications
have already done and if/how those strategies were successful.

Charlie

Alvaro Herrera wrote:

>On Thu, Dec 12, 2002 at 04:03:47PM -0800, Stephan Szabo wrote:
>
>
>>On Thu, 12 Dec 2002, johnnnnnn wrote:
>>
>>
>>
>>>I think the code changes would be complicated. Just at a 30-second
>>>consideration, this would need to touch:
>>>- all sql (selects, inserts, updates, deletes)
>>>- vacuuming
>>>- indexing
>>>- statistics gathering
>>>- existing clustering
>>>
>>>
>>I think his idea was to treat it similarly to the way that the
>>system treats tables >2G with .N files. The only thing is that
>>I believe the code that deals with that wouldn't be particularly
>>easy to change to do it though, but I've only taken a cursory look at
>>what I think is the place that does that(storage/smgr/md.c). Some sort of
>>good partitioning system would be nice though.
>>
>>
>
>I don't think this is doable without a huge amount of work. The storage
>manager doesn't know anything about what is in a page, let alone a
>tuple. And it shouldn't, IMHO. Upper levels don't know how are pages
>organized in disk; they don't know about .1 segments and so on, and they
>shouldn't.
>
>I think this kind of partition doesn't buy too much. I would really
>like to have some kind of auto-clustering, but it should be implemented
>in some upper level; e.g., by leaving some empty space in pages for
>future tuples, and arranging the whole heap again when it runs out of
>free space somewhere. Note that this is very far from the storage
>manager.
>
>
>

Charles H. Woloszynski

ClearMetrix, Inc.
115 Research Drive
Bethlehem, PA 18015

tel: 610-419-2210 x400
fax: 240-371-3256
web: www.clearmetrix.com

From:	Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To:	Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc:	johnnnnnn <john(at)phaedrusdeinus(dot)org>, <pgsql-performance(at)postgresql(dot)org>, <pgsql-general(at)postgresql(dot)org>
Subject:	Re: [PERFORM] CLUSTER command
Date:	2002-12-13 02:11:50
Message-ID:	20021212175208.B15052-100000@megazone23.bigpanda.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

On Thu, 12 Dec 2002, Alvaro Herrera wrote:

> On Thu, Dec 12, 2002 at 04:03:47PM -0800, Stephan Szabo wrote:
> > On Thu, 12 Dec 2002, johnnnnnn wrote:
> >
> > > I think the code changes would be complicated. Just at a 30-second
> > > consideration, this would need to touch:
> > > - all sql (selects, inserts, updates, deletes)
> > > - vacuuming
> > > - indexing
> > > - statistics gathering
> > > - existing clustering
> >
> > I think his idea was to treat it similarly to the way that the
> > system treats tables >2G with .N files. The only thing is that
> > I believe the code that deals with that wouldn't be particularly
> > easy to change to do it though, but I've only taken a cursory look at
> > what I think is the place that does that(storage/smgr/md.c). Some sort of
> > good partitioning system would be nice though.
>
> I don't think this is doable without a huge amount of work. The storage
> manager doesn't know anything about what is in a page, let alone a
> tuple. And it shouldn't, IMHO. Upper levels don't know how are pages
> organized in disk; they don't know about .1 segments and so on, and they
> shouldn't.

Which is part of why I said it wouldn't be easy to change to do that,
there's no good way to communicate that information. Like I said, I
didn't look deeply, but I had to look though, because you can never tell
with bits of old university code to do mostly what you want that haven't
been exercised in years floating around.

> I think this kind of partition doesn't buy too much. I would really
> like to have some kind of auto-clustering, but it should be implemented
> in some upper level; e.g., by leaving some empty space in pages for
> future tuples, and arranging the whole heap again when it runs out of
> free space somewhere. Note that this is very far from the storage
> manager.

Auto clustering would be nice.

I think Jean-Luc's suggested partitioning mechanism has certain usage
patterns that it's a win for and most others that it's not. Since the
usage pattern I can think of (very large table with a small number of
breakdowns where your conditions are primarily on those breakdowns) aren't
even remotely in the domain of things I've worked with, I can't say
whether it'd end up really being a win to avoid the index reads for the
table.

From:	Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
To:	Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc:	johnnnnnn <john(at)phaedrusdeinus(dot)org>, pgsql-performance(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject:	Re: [PERFORM] CLUSTER command
Date:	2002-12-13 16:42:25
Message-ID:	3DFA0DF1.DE3B5462@nsd.ca
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

Stephan,

Someone commented earlier about the separation/abstraction of the
storage manager.
I agree that it should not be done at the storage level.

Maybe a better idea, would be to create a new pg_partition table that
would have the functionality of an index on the key field and also be
used to point to a file/table ID.

That would be alot more work to code on thet planner though.

If a newly inherited table could also inherite the constraints and
indecies of its parent maybe things would be easier.

JLL

Stephan Szabo wrote:
>
> On Thu, 12 Dec 2002, johnnnnnn wrote:
>
> > On Thu, Dec 12, 2002 at 05:39:44PM -0500, Jean-Luc Lachance wrote:
> > > Let's create a new command:
> > >
> > > PARTITION <table> ON <attribute>
> > <snip>
> > > Because the table file is already extended (2G limit) using
> > > different files extension (.N)
> > > how complicated (modifying the code) would it be to have the table
> > > files split according to the cluster key?
> >
>
> > I think the code changes would be complicated. Just at a 30-second
> > consideration, this would need to touch:
> > - all sql (selects, inserts, updates, deletes)
> > - vacuuming
> > - indexing
> > - statistics gathering
> > - existing clustering
>
> I think his idea was to treat it similarly to the way that the
> system treats tables >2G with .N files. The only thing is that
> I believe the code that deals with that wouldn't be particularly
> easy to change to do it though, but I've only taken a cursory look at
> what I think is the place that does that(storage/smgr/md.c). Some sort of
> good partitioning system would be nice though.
>
> > create table u1 (...);
> > create table u2 (...);
> > create table u3 (...);
> >
> > create view uv as (select "A" as partition_key, ... from u1
> > union all
> > select "B" as partition_key, ... from u2
> > union all
> > select "C" as partition_key, ... from u3);
> >
> > That keeps the tables in different files on-disk while still allowing
> > you to query against all of them. You need to index them separately
> > and logic is necessary when changing data.
>
> Unfortunately, I think that the optimizer isn't going to do what you'd
> hope here and scan only the appropriate table if you were to say
> partition_key='A' and foo='bar'. I'd love to be shown that I'm wrong, but
> the best I could see hoping for would be that if partition_key was part of
> u1-u3 and there was an index on partition_key,foo that it could use that
> and do minimal work on the other tables.
>
> In addition, doing something like the above is a nightmare if you don't
> know beforehand what the partitions should be (for example if you know
> there aren't alot of distinct values, but you don't know what they are) or
> for that matter even with 10-15 partitions, writing the rules and such
> would probably be really error prone.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Michael Engelhart <mengelhart(at)mac(dot)com>
Cc:	PostgreSQL-interfaces <pgsql-interfaces(at)postgresql(dot)org>
Subject:	Re: [GENERAL] PerformPortalClose warning in 7.3
Date:	2002-12-14 22:58:30
Message-ID:	200212142258.gBEMwUH26494@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

I tried to reproduce the problem here but it seems my python is too old.
I am CC'ing this to the interfaces list in case someone there knows or
can test it.

---------------------------------------------------------------------------

Michael Engelhart wrote:
> Hi -
> I've been running PostgreSQL 7.3 on Mac OS X 10.2 since it was released
> and it's been running fine. I'm using pyPgSQL 2.3 for client side
> programming which also was working great until tonight. Now whenever
> I do any query of any type, I get warnings like this:
>
> WARNING: PerformPortalClose: portal "pgsql_00179f10" not found
>
> It "appears" that everything is still working the way it was but it's a
> bit discomforting to have these show up on my screen while running my
> applications.
>
> Anyone that can explain this?
>
> Here's a tiny bit of Python sample code that I used to make sure it
> wasn't my other code causing the problems
>
> from pyPgSQL import PgSQL
>
> dbname = "template1"
> conn = PgSQL.connect(database=dbname)
> cursor = conn.cursor()
> sql = "SELECT now()";
> cursor.execute(sql)
> res = cursor.fetchall()
> for i in res:
> print i
> cursor.close()
> conn.commit()
>
> strangely if I remove the last 2 lines (cursor.close() and
> conn.commit()) I don't get the errors.
>
> Also I don't notice that I don't have this problem with psql command
> line either. Is this the Python API causing this?
>
> Thanks for any help
>
> Mike
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

From:	Gerhard Haering <haering_postgresql(at)gmx(dot)de>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: PerformPortalClose warning in 7.3
Date:	2002-12-23 01:18:46
Message-ID:	slrnb0cp0i.aji.haering_postgresql@lilith.my-fqdn.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

Michael Engelhart wrote in gmane.comp.db.postgresql.general:
> Hi -
> I've been running PostgreSQL 7.3 on Mac OS X 10.2 since it was released
> and it's been running fine. I'm using pyPgSQL 2.3 for client side
> programming which also was working great until tonight. Now whenever
> I do any query of any type, I get warnings like this:
>
> WARNING: PerformPortalClose: portal "pgsql_00179f10" not found
>
> It "appears" that everything is still working the way it was but it's a
> bit discomforting to have these show up on my screen while running my
> applications.
>
> Anyone that can explain this?
>
> Here's a tiny bit of Python sample code that I used to make sure it
> wasn't my other code causing the problems
>
> from pyPgSQL import PgSQL
>
> dbname = "template1"
> conn = PgSQL.connect(database=dbname)
> cursor = conn.cursor()
> sql = "SELECT now()";
> cursor.execute(sql)
> res = cursor.fetchall()
> for i in res:
> print i
> cursor.close()
> conn.commit()

Actually, pyPgSQL is using PostgreSQL portals behind your back. This
is a feature!

To show this, we use the undocumented, but very handy toggleShowQuery
flag. The effect is that we can see what SQL pyPgSQL sends to the
backend using libpq (the lines staring with QUERY: below):

#v+
gerhard(at)gargamel:~$ python
Python 2.2.2 (#1, Nov 30 2002, 23:19:58)
[GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyPgSQL import PgSQL
>>> con = PgSQL.connect()
>>> con.conn.toggleShowQuery
'On'
>>> cursor = con.cursor()
QUERY: BEGIN WORK
>>> cursor.execute("select * from test")
QUERY: DECLARE "PgSQL_0811F1EC" CURSOR FOR select * from test
QUERY: FETCH 1 FROM "PgSQL_0811F1EC"
QUERY: SELECT typname, -1 , typelem FROM pg_type WHERE oid = 23
QUERY: SELECT typname, -1 , typelem FROM pg_type WHERE oid = 1043
>>> result = cursor.fetchmany(5)
QUERY: FETCH 4 FROM "PgSQL_0811F1EC"
>>> result
[[None, 'A'], [None, 'B'], [None, 'C'], [None, 'F'], [None, 'F']]
>>> con.commit()
QUERY: CLOSE PgSQL_0811F1EC
QUERY: COMMIT WORK
>>>
#v-

This gives me a warning like this:

#v+
WARNING: PerformPortalClose: portal "pgsql_0811f1ec" not found
#v-

As far as I can see, the SQL pyPgSQL emits is perfectly ok. But I'd be
glad to hear a clarification.

> strangely if I remove the last 2 lines (cursor.close() and
> conn.commit()) I don't get the errors.
>
> Also I don't notice that I don't have this problem with psql command
> line either. Is this the Python API causing this?

If you use the same SQL statements using portals in psql, you get the
same warning (obviously). I just tried.

Gerhard (pyPgSQL developer)
--
Favourite database: http://www.postgresql.org/
Favourite programming language: http://www.python.org/
Combine the two: http://pypgsql.sf.net/
Embedded database for Python: http://pysqlite.sf.net/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gerhard Haering <haering_postgresql(at)gmx(dot)de>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: PerformPortalClose warning in 7.3
Date:	2002-12-23 02:36:27
Message-ID:	28406.1040610987@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-interfaces pgsql-performance

Gerhard Haering <haering_postgresql(at)gmx(dot)de> writes:
> To show this, we use the undocumented, but very handy toggleShowQuery
> flag. The effect is that we can see what SQL pyPgSQL sends to the
> backend using libpq (the lines staring with QUERY: below):

> QUERY: DECLARE "PgSQL_0811F1EC" CURSOR FOR select * from test
> ...
> QUERY: CLOSE PgSQL_0811F1EC

This looks like a pyPgSQL bug to me. If it's going to use a mixed-case
name for the cursor then it must either always double-quote the name or
never do so. Failing to double-quote in the CLOSE command is wrong.

regards, tom lane