Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

Lists: pgsql-hackerspgsql-patches
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-patches(at)postgreSQL(dot)org
Subject: Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-27 20:07:29
Message-ID: 12059.1201464449@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Per today's -hackers discussion, add a GUC variable to allow clients to
disable the new synchronized-scanning behavior, and make pg_dump disable
sync scans so that it will reliably preserve row ordering. This is a
pretty trivial patch, but seeing how late we are in the 8.3 release
cycle, I thought I'd better post it for comment anyway.

regards, tom lane

Attachment Content-Type Size
unknown_filename text/plain 7.0 KB

From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-27 20:54:59
Message-ID: 36e682920801271254yf3a4901v899a972e58cd3076@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Jan 27, 2008 3:07 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Per today's -hackers discussion, add a GUC variable to allow clients to
> disable the new synchronized-scanning behavior, and make pg_dump disable
> sync scans so that it will reliably preserve row ordering. This is a
> pretty trivial patch, but seeing how late we are in the 8.3 release
> cycle, I thought I'd better post it for comment anyway.

+1

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah(dot)harris(at)enterprisedb(dot)com
Edison, NJ 08837 | http://www.enterprisedb.com/


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-27 21:54:58
Message-ID: 87hcgzrl1p.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Jonah H. Harris" <jonah(dot)harris(at)gmail(dot)com> writes:

> On Jan 27, 2008 3:07 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Per today's -hackers discussion, add a GUC variable to allow clients to
>> disable the new synchronized-scanning behavior, and make pg_dump disable
>> sync scans so that it will reliably preserve row ordering. This is a
>> pretty trivial patch, but seeing how late we are in the 8.3 release
>> cycle, I thought I'd better post it for comment anyway.
>
> +1

I liked the "synchronized_sequential_scans" idea myself.

But otherwise, sure.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!


From: "Gevik Babakhani" <pgdev(at)xs4all(dot)nl>
To: "'Gregory Stark'" <stark(at)enterprisedb(dot)com>, "'Jonah H(dot) Harris'" <jonah(dot)harris(at)gmail(dot)com>
Cc: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-27 23:09:51
Message-ID: 000601c86139$bc5b2860$0a01a8c0@gevmus
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


> >> Per today's -hackers discussion, add a GUC variable to
> allow clients
> >> to disable the new synchronized-scanning behavior, and
> make pg_dump
> >> disable sync scans so that it will reliably preserve row
> ordering.
> >> This is a pretty trivial patch, but seeing how late we are
> in the 8.3
> >> release cycle, I thought I'd better post it for comment anyway.
> >
> > +1
>
> I liked the "synchronized_sequential_scans" idea myself.
>
> But otherwise, sure.
>

this will save some hackwork I have to do for a import/export project I am
doing.

+1, please

Regards,
Gevik Babakhani
------------------------------------------------
PostgreSQL NL http://www.postgresql.nl
TrueSoftware BV http://www.truesoftware.nl
------------------------------------------------


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 00:37:42
Message-ID: 15573.1201480662@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> I liked the "synchronized_sequential_scans" idea myself.

The name is still open for discussion --- it's an easy
search-and-replace in the patch ...

regards, tom lane


From: Neil Conway <neilc(at)samurai(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 01:34:27
Message-ID: 1201484067.1204.102.camel@goldbach
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sun, 2008-01-27 at 21:54 +0000, Gregory Stark wrote:
> I liked the "synchronized_sequential_scans" idea myself.

I think that's a bit too long. How about "synchronized_scans", or
"synchronized_seqscans"?

-Neil


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 02:04:03
Message-ID: 16296.1201485843@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

[ redirecting thread to -hackers ]

Neil Conway <neilc(at)samurai(dot)com> writes:
> On Sun, 2008-01-27 at 21:54 +0000, Gregory Stark wrote:
>> I liked the "synchronized_sequential_scans" idea myself.

> I think that's a bit too long. How about "synchronized_scans", or
> "synchronized_seqscans"?

We have enable_seqscan already, so that last choice seems to fit in.

regards, tom lane


From: Michael Glaesemann <grzm(at)seespotcode(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 04:09:57
Message-ID: 690C8C70-E633-46C3-BE0C-36F455F08636@seespotcode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Jan 27, 2008, at 21:04 , Tom Lane wrote:

> [ redirecting thread to -hackers ]
>
> Neil Conway <neilc(at)samurai(dot)com> writes:
>> On Sun, 2008-01-27 at 21:54 +0000, Gregory Stark wrote:
>>> I liked the "synchronized_sequential_scans" idea myself.
>
>> I think that's a bit too long. How about "synchronized_scans", or
>> "synchronized_seqscans"?
>
> We have enable_seqscan already, so that last choice seems to fit in.

Would it make sense to match the plural as well?

Michael Glaesemann
grzm seespotcode net


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Glaesemann <grzm(at)seespotcode(dot)net>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 04:22:23
Message-ID: 17505.1201494143@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Michael Glaesemann <grzm(at)seespotcode(dot)net> writes:
>> Neil Conway <neilc(at)samurai(dot)com> writes:
>>> I think that's a bit too long. How about "synchronized_scans", or
>>> "synchronized_seqscans"?

> Would it make sense to match the plural as well?

Actually, the best suggestion I've seen so far is Guillaume's
"synchronize_seqscans" --- make it a verb phrase.

regards, tom lane


From: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgreSQL(dot)org
Subject: Re: Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 06:27:46
Message-ID: 479D75E2.7080002@pws.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Per today's -hackers discussion, add a GUC variable to allow clients to
> disable the new synchronized-scanning behavior, and make pg_dump disable
> sync scans so that it will reliably preserve row ordering. This is a
> pretty trivial patch, but seeing how late we are in the 8.3 release
> cycle, I thought I'd better post it for comment anyway.
>
> regards, tom lane
>
>
> Index: src/bin/pg_dump/pg_dump.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_dump.c,v
> retrieving revision 1.481
> diff -c -r1.481 pg_dump.c
> *** src/bin/pg_dump/pg_dump.c 1 Jan 2008 19:45:55 -0000 1.481
> --- src/bin/pg_dump/pg_dump.c 27 Jan 2008 20:00:18 -0000
> ***************
> *** 553,558 ****
> --- 553,572 ----
> do_sql_command(g_conn, "SET DATESTYLE = ISO");
>
> /*
> + * If supported, set extra_float_digits so that we can dump float data
> + * exactly (given correctly implemented float I/O code, anyway)
> + */
> + if (g_fout->remoteVersion >= 70400)
> + do_sql_command(g_conn, "SET extra_float_digits TO 2");
> +
> + /*
> + * If synchronized scanning is supported, disable it, to prevent
> + * unpredictable changes in row ordering across a dump and reload.
> + */
> + if (g_fout->remoteVersion >= 80300)
> + do_sql_command(g_conn, "SET synchronized_scanning TO off");
> +
> + /*
> * Start serializable transaction to dump consistent data.
> */
> do_sql_command(g_conn, "BEGIN");
>
Hi,

Can somebody explain why it's important to load with
synchronized_scanning off?

do_sql_command(g_conn, "SET synchronized_scanning TO off");

Thanks

Russell Smith


From: Neil Conway <neilc(at)samurai(dot)com>
To: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgreSQL(dot)org
Subject: Re: Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 06:54:24
Message-ID: 1201503264.1204.140.camel@goldbach
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, 2008-01-28 at 17:27 +1100, Russell Smith wrote:
> Can somebody explain why it's important to load with
> synchronized_scanning off?

*Loading* with synchronized scanning off is not important (and is not
implemented by the patch).

*Dumping* with synchronized scanning off is necessary to ensure that the
order of the rows in the pg_dump matches the on-disk order of the rows
in the table, which is important if you want to preserve the clustering
of the table data on restore.

See the -hackers thread:

http://markmail.org/message/qbytsco6oj2qkxsa

-Neil


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Russell Smith" <mr-russ(at)pws(dot)com(dot)au>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 06:55:16
Message-ID: 1d4e0c10801272255s69fb7e78qbbb49ae4c80a9be@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi Russell,

On Jan 28, 2008 7:27 AM, Russell Smith <mr-russ(at)pws(dot)com(dot)au> wrote:
> Can somebody explain why it's important to load with
> synchronized_scanning off?
>
> do_sql_command(g_conn, "SET synchronized_scanning TO off");

It's the start point of this patch. See this thread [
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00987.php ]
for more information.

--
Guillaume


From: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org, neilc(at)samurai(dot)com
Subject: Re: Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 08:59:18
Message-ID: 479D9966.6000108@pws.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Guillaume Smet wrote:
> Hi Russell,
>
> On Jan 28, 2008 7:27 AM, Russell Smith <mr-russ(at)pws(dot)com(dot)au> wrote:
>
>> Can somebody explain why it's important to load with
>> synchronized_scanning off?
>>
>> do_sql_command(g_conn, "SET synchronized_scanning TO off");
>>
>
> It's the start point of this patch. See this thread [
> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00987.php ]
> for more information
Sorry, total brain fade in interpreting the patch.

g_conn is our connection to the database, not the command we are dumping.

Sorry

Russell


From: "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Neil Conway" <neilc(at)samurai(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 11:25:45
Message-ID: E1539E0ED7043848906A8FF995BDA57902B63227@m0143.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


> >> I liked the "synchronized_sequential_scans" idea myself.
>
> > I think that's a bit too long. How about "synchronized_scans", or
> > "synchronized_seqscans"?
>
> We have enable_seqscan already, so that last choice seems to fit in.

Yes looks good, how about synchronized_seqscan without plural ?

Andreas


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 17:14:39
Message-ID: 1201540479.4257.737.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sun, 2008-01-27 at 21:04 -0500, Tom Lane wrote:
> [ redirecting thread to -hackers ]
>
> Neil Conway <neilc(at)samurai(dot)com> writes:
> > On Sun, 2008-01-27 at 21:54 +0000, Gregory Stark wrote:
> >> I liked the "synchronized_sequential_scans" idea myself.
>
> > I think that's a bit too long. How about "synchronized_scans", or
> > "synchronized_seqscans"?
>
> We have enable_seqscan already, so that last choice seems to fit in.

If we're going to have a GUC, we may as well make it as useful as
possible.

Currently we set synch scan on when the table is larger than 25% of
shared_buffers. So increasing shared_buffers can actually turn this
feature off.

Rather than having a boolean GUC, we should have a number and make the
parameter "synchronised_scan_threshold". This would then be the size of
a table above which we would perform synch scans. If its set to -1, then
this would be the same as "off" in all cases. The default value would be
25% of shared_buffers. (Think we can only do that at initdb time
currently).

If we do that, its clearly different from the enable_* parameters, so
the name is easier to decide ;-)

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Hans-Juergen Schoenig <postgres(at)cybertec(dot)at>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 18:44:58
Message-ID: 93EDC9E5-06FD-4EE0-AF0D-117310024B81@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


On Jan 28, 2008, at 6:14 PM, Simon Riggs wrote:

> On Sun, 2008-01-27 at 21:04 -0500, Tom Lane wrote:
>> [ redirecting thread to -hackers ]
>>
>> Neil Conway <neilc(at)samurai(dot)com> writes:
>>> On Sun, 2008-01-27 at 21:54 +0000, Gregory Stark wrote:
>>>> I liked the "synchronized_sequential_scans" idea myself.
>>
>>> I think that's a bit too long. How about "synchronized_scans", or
>>> "synchronized_seqscans"?
>>
>> We have enable_seqscan already, so that last choice seems to fit in.
>
> If we're going to have a GUC, we may as well make it as useful as
> possible.
>
> Currently we set synch scan on when the table is larger than 25% of
> shared_buffers. So increasing shared_buffers can actually turn this
> feature off.
>
> Rather than having a boolean GUC, we should have a number and make the
> parameter "synchronised_scan_threshold". This would then be the
> size of
> a table above which we would perform synch scans. If its set to -1,
> then
> this would be the same as "off" in all cases. The default value
> would be
> 25% of shared_buffers. (Think we can only do that at initdb time
> currently).
>
> If we do that, its clearly different from the enable_* parameters, so
> the name is easier to decide ;-)

+1
This is in fact a lot more flexible and transparent.
It gives us a lot more control over the process and it is easy to
explain / understand.

best regards,

hans

--
Cybertec Schönig & Schönig GmbH
PostgreSQL Solutions and Support
Gröhrmühlgasse 26, 2700 Wiener Neustadt
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 21:21:44
Message-ID: 7380.1201555304@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> Rather than having a boolean GUC, we should have a number and make the
> parameter "synchronised_scan_threshold".

This would open up a can of worms I'd prefer not to touch, having to do
with whether the buffer-access-strategy behavior should track that or
not. As the note in heapam.c says,

* If the table is large relative to NBuffers, use a bulk-read access
* strategy and enable synchronized scanning (see syncscan.c). Although
* the thresholds for these features could be different, we make them the
* same so that there are only two behaviors to tune rather than four.

It's a bit late in the cycle to be revisiting that choice. Now we do
already have three behaviors to worry about (BAS on and syncscan off)
but throwing in a randomly settable knob will take it back to four,
and we have no idea how that fourth case will behave. The other tack we
could take (having the one GUC variable control both thresholds) is
not good since it will result in pg_dump trashing the buffer cache.

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 22:39:41
Message-ID: 1201559981.4257.799.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, 2008-01-28 at 16:21 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Rather than having a boolean GUC, we should have a number and make the
> > parameter "synchronised_scan_threshold".
>
> This would open up a can of worms I'd prefer not to touch, having to do
> with whether the buffer-access-strategy behavior should track that or
> not. As the note in heapam.c says,
>
> * If the table is large relative to NBuffers, use a bulk-read access
> * strategy and enable synchronized scanning (see syncscan.c). Although
> * the thresholds for these features could be different, we make them the
> * same so that there are only two behaviors to tune rather than four.
>
> It's a bit late in the cycle to be revisiting that choice. Now we do
> already have three behaviors to worry about (BAS on and syncscan off)
> but throwing in a randomly settable knob will take it back to four,
> and we have no idea how that fourth case will behave. The other tack we
> could take (having the one GUC variable control both thresholds) is
> not good since it will result in pg_dump trashing the buffer cache.

OK, good points.

I'm still concerned that the thresholds gets higher as we increase
shared_buffers. We may be removing performance features as fast as we
gain performance when we set shared_buffers higher.

Might we agree that the threshold should be fixed at 8MB, rather than
varying upwards as we try to tune?

The objective of having a tuning hook would have been simply to
normalise the effects of increasing shared_buffers anyway, so fixing it
would solve the problem I see.

(8MB is the default, based upon 25% of 32MB).

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgreSQL(dot)org
Subject: Re: Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 22:58:13
Message-ID: 1201561093.10057.655.camel@dogma.ljc.laika.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sun, 2008-01-27 at 15:07 -0500, Tom Lane wrote:
> Per today's -hackers discussion, add a GUC variable to allow clients to
> disable the new synchronized-scanning behavior, and make pg_dump disable
> sync scans so that it will reliably preserve row ordering. This is a
> pretty trivial patch, but seeing how late we are in the 8.3 release
> cycle, I thought I'd better post it for comment anyway.

I apologize for the late reply, but I have one comment I'd like to add.

> + if (g_fout->remoteVersion >= 80300)
> + do_sql_command(g_conn, "SET synchronized_scanning TO off");
> +
> + /*
> * Start serializable transaction to dump consistent data.
> */

I think that pg_dump is a reasonable use case for synchoronized scans
when the table has not been clustered. It could potentially make pg_dump
have much less of a performance impact when run against an active
system.

I think it's worth considering enabling sync scans for non-clustered
tables if it would not interfere with the release. Of course, a painless
8.3 release is the top priority.

Regards,
Jeff Davis


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Neil Conway" <neilc(at)samurai(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-28 23:13:18
Message-ID: 479E618E.5050309@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:
> On Mon, 2008-01-28 at 16:21 -0500, Tom Lane wrote:
>> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>>> Rather than having a boolean GUC, we should have a number and make the
>>> parameter "synchronised_scan_threshold".
>> This would open up a can of worms I'd prefer not to touch, having to do
>> with whether the buffer-access-strategy behavior should track that or
>> not. As the note in heapam.c says,
>>
>> * If the table is large relative to NBuffers, use a bulk-read access
>> * strategy and enable synchronized scanning (see syncscan.c). Although
>> * the thresholds for these features could be different, we make them the
>> * same so that there are only two behaviors to tune rather than four.
>>
>> It's a bit late in the cycle to be revisiting that choice. Now we do
>> already have three behaviors to worry about (BAS on and syncscan off)
>> but throwing in a randomly settable knob will take it back to four,
>> and we have no idea how that fourth case will behave. The other tack we
>> could take (having the one GUC variable control both thresholds) is
>> not good since it will result in pg_dump trashing the buffer cache.
>
> OK, good points.
>
> I'm still concerned that the thresholds gets higher as we increase
> shared_buffers. We may be removing performance features as fast as we
> gain performance when we set shared_buffers higher.
>
> Might we agree that the threshold should be fixed at 8MB, rather than
> varying upwards as we try to tune?

Synchronized scans, and the bulk-read strategy, don't help if the table
fits in cache. If it fits in shared buffers, you're better off keeping
it there, than swap pages between the OS cache and shared buffers, or
spend any effort synchronizing scans. That's why we agreed back then
that the threshold should be X% of shared_buffers.

It's a good point that we don't want pg_dump to screw up the cluster
order, but that's the only use case I've seen this far for disabling
sync scans. Even that wouldn't matter much if our estimate for
"clusteredness" didn't get screwed up by a table that looks like this:
"5 6 7 8 9 1 2 3 4"

Now, maybe there's more use cases where you'd want to tune the
threshold, but I'd like to see some before we add more knobs.

To benefit from a lower threshold, you'd need to have a table large
enough that its cache footprint matters, but is still smaller than 25%
of shared_buffers, and have seq scans on it. In that scenario, you might
benefit from a lower threshold, because that would leave some
shared_buffers free for other use. Even that is quite hand-wavey; the
buffer cache LRU algorithm handles that kind of scenarios reasonably
well already, and whether or not

To benefit from a larger threshold, you'd need to have a table larger
than 25% of shared_buffers, but still smaller than shared_buffers, and
seq scan it often enough that you want to keep it in shared buffers. If
you're frequently seq scanning a table of that size, you're most likely
suffering from a bad plan. Even then, the performance difference
shouldn't be that great, the table surely fits in OS cache anyway, with
typical shared_buffers settings.

Tables that are seq scanned are typically very small, like a summary
table with just a few rows, or huge tables in a data warehousing
system. Between the extremes, I don't think the threshold actually has a
very big impact.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-28 23:41:05
Message-ID: 1201563665.10057.667.camel@dogma.ljc.laika.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, 2008-01-28 at 23:13 +0000, Heikki Linnakangas wrote:
> It's a good point that we don't want pg_dump to screw up the cluster
> order, but that's the only use case I've seen this far for disabling
> sync scans. Even that wouldn't matter much if our estimate for
> "clusteredness" didn't get screwed up by a table that looks like this:
> "5 6 7 8 9 1 2 3 4"

It doesn't seem like there is any reason for the estimate to get
confused, but it apparently does. I loaded a test table with a similar
distribution to your example, and it shows a correlation of about -0.5,
but it should be as good as something near -1 or +1.

I am not a statistics expert, but it seems like a better measurement
would be: "what is the chance that, if the tuples are close together in
index order, the corresponding heap tuples are close together?".

The answer to that question in your example is "very likely", so there
would be no problem.

Is there a reason we don't do this?

Regards,
Jeff Davis


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-28 23:51:54
Message-ID: 1201564314.4257.815.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, 2008-01-28 at 23:13 +0000, Heikki Linnakangas wrote:
> Tables that are seq scanned are typically very small, like a summary
> table with just a few rows, or huge tables in a data warehousing
> system. Between the extremes, I don't think the threshold actually has
> a very big impact.

And if you have a partitioned table with partitions inconveniently
sized? You'd need to *reduce* shared_buffers specifically to get synch
scans and BAS to kick in. Or increase partition size. Both of which
reduce the impact of the benefits we've added.

I don't think the argument that "a table is smaller than shared buffers
therefore it is already in shared buffers" holds true in all cases. I/O
does matter.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-29 00:13:09
Message-ID: 479E6F95.4010901@cheapcomplexdevices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeff Davis wrote:
> On Mon, 2008-01-28 at 23:13 +0000, Heikki Linnakangas wrote:
>
>> "clusteredness" didn't get screwed up by a table that looks like this:
>> "5 6 7 8 9 1 2 3 4"
>>
> ...test table with a similar
> distribution to your example, and it shows a correlation of about -0.5,
> but it should be as good as something near -1 or +1.
>
> I am not a statistics expert, but it seems like a better measurement
> would be: "what is the chance that, if the tuples are close together in
> index order, the corresponding heap tuples are close together?".
>
Same applies for data clustered by zip-code.

All rows for any State or City or County or SchoolZone
are "close together" on the same pages; yet postgres's
stats think they're totally unclustered.
> The answer to that question in your example is "very likely", so there
> would be no problem.
> Is there a reason we don't do this?
>
I've been tempted to do things like

update pg_statistic set stanumbers3='{1.0}' where starelid=2617 and
staattnum=7;

after every analyze when I have data like this from tables clustered
by zip. Seems it'd help more plans than it hurts, but haven't been
brave enough to try in production.


From: Kris Jurka <books(at)ejurka(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgreSQL(dot)org
Subject: Re: Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-29 05:09:28
Message-ID: Pine.BSO.4.64.0801290006040.1881@leary.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, 28 Jan 2008, Jeff Davis wrote:

> I think that pg_dump is a reasonable use case for synchoronized scans
> when the table has not been clustered. It could potentially make pg_dump
> have much less of a performance impact when run against an active
> system.
>

One of the advantages I see with maintaining table dump order is that
rsyncing backups to remote locations will work better.

Kris Jurka


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Neil Conway" <neilc(at)samurai(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanningGUCvariable
Date: 2008-01-29 08:20:41
Message-ID: 479EE1D9.9090106@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeff Davis wrote:
> On Mon, 2008-01-28 at 23:13 +0000, Heikki Linnakangas wrote:
>> It's a good point that we don't want pg_dump to screw up the cluster
>> order, but that's the only use case I've seen this far for disabling
>> sync scans. Even that wouldn't matter much if our estimate for
>> "clusteredness" didn't get screwed up by a table that looks like this:
>> "5 6 7 8 9 1 2 3 4"
>
> It doesn't seem like there is any reason for the estimate to get
> confused, but it apparently does. I loaded a test table with a similar
> distribution to your example, and it shows a correlation of about -0.5,
> but it should be as good as something near -1 or +1.
>
> I am not a statistics expert, but it seems like a better measurement
> would be: "what is the chance that, if the tuples are close together in
> index order, the corresponding heap tuples are close together?".
>
> The answer to that question in your example is "very likely", so there
> would be no problem.
>
> Is there a reason we don't do this?

It has been discussed before, but no-one has come up with a good
measurement for that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Neil Conway" <neilc(at)samurai(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-29 09:40:40
Message-ID: E1539E0ED7043848906A8FF995BDA57902C23E73@m0143.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


> It's a good point that we don't want pg_dump to screw up the cluster
> order, but that's the only use case I've seen this far for disabling
> sync scans. Even that wouldn't matter much if our estimate for
> "clusteredness" didn't get screwed up by a table that looks
> like this:
> "5 6 7 8 9 1 2 3 4"

I do think the guc to turn it off is useful, only I don't understand the
reasoning that pg_dump needs it to maintain the basic clustered
property.

Sorry, but I don't grok this at all.
Why the heck would we care if we have 2 parts of the table perfectly
clustered,
because we started in the middle ? Surely our stats collector should
recognize
such a table as perfectly clustered. Does it not ? We are talking about
one
breakage in the readahead logic here, this should only bring the
clustered property
from 100% to some 99.99% depending on table size vs readahead window.

Andreas


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Neil Conway" <neilc(at)samurai(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-29 10:55:38
Message-ID: 87r6g0vr2t.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


"Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at> writes:

> Sorry, but I don't grok this at all. Why the heck would we care if we have 2
> parts of the table perfectly clustered, because we started in the middle ?
> Surely our stats collector should recognize such a table as perfectly
> clustered. Does it not ? We are talking about one breakage in the readahead
> logic here, this should only bring the clustered property from 100% to some
> 99.99% depending on table size vs readahead window.

Well clusteredness is used or could be used for a few different heuristics,
not all of which this would be quite as well satisfied as readahead. But for
the most common application, namely trying to figure out whether index probes
for sequential ids will be sequential i/o or random i/o you're right.

Currently the statistic we use to estimate this is the correlation of the
column value with the physical location on disk. That's not a perfect metric
for estimating how much random i/o would be needed to scan the table in index
order though.

It would be great if Postgres picked up a serious statistics geek who could
pipe up in discussions like this with "how about using the Euler-Jacobian
Centroid" or some such thing. If you have any suggestions of what metric to
use and how to calculate the info we need from it that would be great.

One suggestion from a long way back was scanning the index and counting how
many times the item pointer moves backward to an earlier block. That would
still require a full index scan though. And it doesn't help for columns which
aren't indexed though I'm not sure we need this info for columns which aren't
indexed. It's also not clear how to interpolate from that the amount of random
access a given query would perform.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Kris Jurka" <books(at)ejurka(dot)com>
Cc: "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-patches(at)postgreSQL(dot)org>
Subject: Re: Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-29 11:04:48
Message-ID: 87myqovqnj.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


"Kris Jurka" <books(at)ejurka(dot)com> writes:

> On Mon, 28 Jan 2008, Jeff Davis wrote:
>
>> I think that pg_dump is a reasonable use case for synchoronized scans
>> when the table has not been clustered. It could potentially make pg_dump
>> have much less of a performance impact when run against an active
>> system.
>
> One of the advantages I see with maintaining table dump order is that rsyncing
> backups to remote locations will work better.

I can't see what scenario you're talking about here. pg_dump your live
database, restore it elsewhere, then shut down the production database and run
rsync from the live database to the restored one? Why not just run rsync for
the initial transfer?

I can't see that working well for a real database and a database loaded from a
pg_dump anyways. Every dead record will introduce skew, plus page headers, and
every record will have a different system data such as xmin for one.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Kris Jurka <books(at)ejurka(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgreSQL(dot)org
Subject: Re: Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-29 11:26:44
Message-ID: 479F0D74.5080409@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gregory Stark wrote:
> "Kris Jurka" <books(at)ejurka(dot)com> writes:
>
>> On Mon, 28 Jan 2008, Jeff Davis wrote:
>>
>>> I think that pg_dump is a reasonable use case for synchoronized scans
>>> when the table has not been clustered. It could potentially make pg_dump
>>> have much less of a performance impact when run against an active
>>> system.
>> One of the advantages I see with maintaining table dump order is that rsyncing
>> backups to remote locations will work better.
>
> I can't see what scenario you're talking about here. pg_dump your live
> database, restore it elsewhere, then shut down the production database and run
> rsync from the live database to the restored one? Why not just run rsync for
> the initial transfer?

take a dump (maybe in plaintext format) save it to disk and use rsync to
copy it elsewhere. the more "similiar" the dumps the more efficient
rsync can copy the data over.

Stefan


From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-29 13:31:01
Message-ID: 479F2A95.8040700@timbira.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:

> And if you have a partitioned table with partitions inconveniently
> sized? You'd need to *reduce* shared_buffers specifically to get synch
> scans and BAS to kick in. Or increase partition size. Both of which
> reduce the impact of the benefits we've added.
>
> I don't think the argument that "a table is smaller than shared buffers
> therefore it is already in shared buffers" holds true in all cases. I/O
> does matter.
>
+1. If we go with 'enable_sync_seqcans' for 8.3, and in a future release
cycle we do test the cases Simon described above and we agree we need to
do a fine tune to benefit from this feature, we will need to deprecate
'enable_sync_seqscans' and invent another one (sync_seqscans_threshold).
Looking at this perpective, IMHO we should go with the number (0.25)
instead of the boolean.

--
Euler Taveira de Oliveira
http://www.timbira.com/


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Zeugswetter Andreas ADI SD <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-29 13:40:38
Message-ID: 20080129134038.GI4201@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, Jan 29, 2008 at 10:40:40AM +0100, Zeugswetter Andreas ADI SD wrote:
>
> > It's a good point that we don't want pg_dump to screw up the cluster
> > order, but that's the only use case I've seen this far for disabling
> > sync scans. Even that wouldn't matter much if our estimate for
> > "clusteredness" didn't get screwed up by a table that looks
> > like this:
> > "5 6 7 8 9 1 2 3 4"
>
> I do think the guc to turn it off is useful, only I don't understand the
> reasoning that pg_dump needs it to maintain the basic clustered
> property.
>
> Sorry, but I don't grok this at all.
> Why the heck would we care if we have 2 parts of the table perfectly
> clustered,
> because we started in the middle ? Surely our stats collector should
> recognize
> such a table as perfectly clustered. Does it not ? We are talking about
> one
> breakage in the readahead logic here, this should only bring the
> clustered property
> from 100% to some 99.99% depending on table size vs readahead window.
>
> Andreas
>

Andreas,

I agree with your logic. If the process that PostgreSQL uses to determine
how clustered a table is that breaks with such a layout, we may need to
see what should be changed to make it work. Having had pg_dump cause a
database to grind to a halt, I would definitely like the option of using
the synchronized scans even for clustered tables.

Ken


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-29 15:10:22
Message-ID: 20985.1201619422@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Euler Taveira de Oliveira <euler(at)timbira(dot)com> writes:
> +1. If we go with 'enable_sync_seqcans' for 8.3, and in a future release
> cycle we do test the cases Simon described above and we agree we need to
> do a fine tune to benefit from this feature, we will need to deprecate
> 'enable_sync_seqscans' and invent another one (sync_seqscans_threshold).
> Looking at this perpective, IMHO we should go with the number (0.25)
> instead of the boolean.

Surely the risk-of-needing-to-deprecate argument applies ten times more
strongly to a number than a boolean.

regards, tom lane


From: "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Euler Taveira de Oliveira" <euler(at)timbira(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-29 15:37:34
Message-ID: E1539E0ED7043848906A8FF995BDA57902C23F2F@m0143.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


> > +1. If we go with 'enable_sync_seqcans' for 8.3, and in a future
release
> > cycle we do test the cases Simon described above and we agree we
need to
> > do a fine tune to benefit from this feature, we will need to
deprecate
> > 'enable_sync_seqscans' and invent another one
(sync_seqscans_threshold).
> > Looking at this perpective, IMHO we should go with the number (0.25)

> > instead of the boolean.
>
> Surely the risk-of-needing-to-deprecate argument applies ten times
more
> strongly to a number than a boolean.

Yes, I would expect the tuning to be more system than user specific.
So imho a boolean userset would couple well with a tuning guc, that
may usefully not be userset (if we later discover a need for tuning at
all).

so +1 for the bool.

synchronize[d]_seqscan sounds a bit better in my ears than the plural
synchronize_seqscans.
To me the latter somehow suggests influece on the whole cluster,
probably not
worth further discussion though, so if someone says no, ok.

Andreas


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Zeugswetter Andreas ADI SD <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-29 17:48:15
Message-ID: 1201628895.10057.677.camel@dogma.ljc.laika.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, 2008-01-29 at 10:55 +0000, Gregory Stark wrote:
> "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at> writes:
>
> > Sorry, but I don't grok this at all. Why the heck would we care if we have 2
> > parts of the table perfectly clustered, because we started in the middle ?
> > Surely our stats collector should recognize such a table as perfectly
> > clustered. Does it not ? We are talking about one breakage in the readahead
> > logic here, this should only bring the clustered property from 100% to some
> > 99.99% depending on table size vs readahead window.
>
> Well clusteredness is used or could be used for a few different heuristics,
> not all of which this would be quite as well satisfied as readahead. But for

Can you give an example? Treating a file as a circular structure does
not impose any significant cost that I can see.

> It would be great if Postgres picked up a serious statistics geek who could
> pipe up in discussions like this with "how about using the Euler-Jacobian
> Centroid" or some such thing. If you have any suggestions of what metric to
> use and how to calculate the info we need from it that would be great.

Agreed.

> One suggestion from a long way back was scanning the index and counting how
> many times the item pointer moves backward to an earlier block. That would

An interesting metric. As you say, we really need a statistician to
definitively say what the correct metrics are, and what kind of sampling
we need to make good estimates.

> still require a full index scan though. And it doesn't help for columns which
> aren't indexed though I'm not sure we need this info for columns which aren't
> indexed. It's also not clear how to interpolate from that the amount of random
> access a given query would perform.

I don't think "clusteredness" has any meaning at all in postgres for an
unindexed column. I suppose a table could be clustered without an index,
but currently there's no way to do that in postgresql.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
Cc: "Euler Taveira de Oliveira" <euler(at)timbira(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-29 19:09:13
Message-ID: 24107.1201633753@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at> writes:
> synchronize[d]_seqscan sounds a bit better in my ears than the plural
> synchronize_seqscans.

The plural seems better to me; there's no such thing as a solitary
synchronized scan, no? The whole point of the feature is to affect
the behavior of multiple scans.

BTW, so far as the rest of the thread goes, I'm not necessarily opposed
to exposing the switchover threshold as a tunable. But I think it needs
more thought to design than we can give it in time for 8.3 (because of
the interaction with the buffer access strategy stuff). Also I don't
like having pg_dump manipulating a tuning parameter. I don't see
anything wrong with having both an on/off feature switch and a tunable
in future releases. The feature switch can be justified on grounds
of backwards compatibility quite independently of whether pg_dump uses
it. Or is someone prepared to argue that there are no applications out
there that will be broken if the same query, against the same unchanging
table, yields different results from one trial to the next?

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, "Neil Conway" <neilc(at)samurai(dot)com>, "Euler Taveira de Oliveira" <euler(at)timbira(dot)com>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-29 20:35:36
Message-ID: 479F39B7.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

>>> On Tue, Jan 29, 2008 at 1:09 PM, in message <24107(dot)1201633753(at)sss(dot)pgh(dot)pa(dot)us>,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Or is someone prepared to argue that there are no applications out
> there that will be broken if the same query, against the same unchanging
> table, yields different results from one trial to the next?

If geqo kicks in, we're already there, aren't we?

Isn't an application which counts on the order of result rows
without specifying ORDER BY fundamentally broken?

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Neil Conway" <neilc(at)samurai(dot)com>, "Euler Taveira de Oliveira" <euler(at)timbira(dot)com>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-29 21:00:49
Message-ID: 25718.1201640449@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Or is someone prepared to argue that there are no applications out
>> there that will be broken if the same query, against the same unchanging
>> table, yields different results from one trial to the next?

> If geqo kicks in, we're already there, aren't we?

Yup, and that's one of the reasons we have a way to turn geqo off.
(geqo is actually a good precedent for this --- notice that it has
an on/off switch that's separate from its tuning knobs.)

> Isn't an application which counts on the order of result rows
> without specifying ORDER BY fundamentally broken?

No doubt, but if it's always worked before, people are going to be
unhappy anyway.

Also, it's not just ordering that's at stake. Try

regression=# create table foo as select x from generate_series(1,1000000) x;
SELECT
regression=# select * from foo limit 10000;
x
-------
1
2
3
4
....
regression=# select * from foo limit 10000;
x
-------
7233
7234
7235
7236
....
regression=# select * from foo limit 10000;
x
-------
14465
14466
14467
14468
....

Now admittedly we've never promised LIMIT without ORDER BY to be
well-defined either, but not everybody reads the fine print.
This case is particularly nasty because at smaller LIMIT values
the result *is* consistent, so you might never notice the problem
while testing.

regards, tom lane


From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Zeugswetter Andreas ADI SD <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-29 21:59:05
Message-ID: 479FA1A9.1060200@cheapcomplexdevices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>>> Or is someone prepared to argue that there are no applications out
>>> there that will be broken if the same query, against the same unchanging
>>> table, yields different results from one trial to the next?
>>>
Won't even autovacuum "analyze" cause this too if the
new stats changes the plan?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Zeugswetter Andreas ADI SD <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Neil Conway <neilc(at)samurai(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-29 22:03:41
Message-ID: 26480.1201644221@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> writes:
> Tom Lane wrote:
> Or is someone prepared to argue that there are no applications out
> there that will be broken if the same query, against the same unchanging
> table, yields different results from one trial to the next?
>
> Won't even autovacuum "analyze" cause this too if the
> new stats changes the plan?

Given that the table is unchanging, that's moderately unlikely to happen
(especially for "select * from foo" ;-))

regards, tom lane


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanningGUCvariable
Date: 2008-01-29 22:33:39
Message-ID: F0238EBA67824444BC1CB4700960CB48048E0F48@dmpeints002.isotach.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeff Davis wrote
> > Well clusteredness is used or could be used for a few
> different heuristics,
> > not all of which this would be quite as well satisfied as
> readahead. But for
>
> Can you give an example? Treating a file as a circular structure does
> not impose any significant cost that I can see.

(Pure speculation follows... if you prefer facts, skip this noise)

The data used to create pg_stats.correlation is involved in estimating the cost of an index scan.
It could also be used in estimating the cost of a sequential scan, if the query includes a limit.

Consider:
select * from huge_table_clustered_by_A where A<most_As limit 1000

If the correlation for A is close to 1, a sequential scan should be cheaper than an index scan.

(If the query also included an order by clause, the sequential scan would have to read the entire table to ensure it had found the top 1000, instead of any old 1000 returned in order)

If A is a circular structure, you would have to know where it started, and include this info in the dump/restore (or lose A's correlation).

Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way.

__________________________________________________________________
This email has been scanned by the DMZGlobal Business Quality
Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>, "Euler Taveira de Oliveira" <euler(at)timbira(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-30 02:07:10
Message-ID: 1d4e0c10801291807t4bdc3634v3c360b76211344ca@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Jan 29, 2008 8:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at> writes:
> > synchronize[d]_seqscan sounds a bit better in my ears than the plural
> > synchronize_seqscans.
>
> The plural seems better to me; there's no such thing as a solitary
> synchronized scan, no? The whole point of the feature is to affect
> the behavior of multiple scans.

+1. The plural is important IMHO.

> BTW, so far as the rest of the thread goes, I'm not necessarily opposed
> to exposing the switchover threshold as a tunable. But I think it needs
> more thought to design than we can give it in time for 8.3 (because of
> the interaction with the buffer access strategy stuff).

+1. The current patch is simple and so far in the cycle, I really
think we should keep it that way.

> The feature switch can be justified on grounds
> of backwards compatibility quite independently of whether pg_dump uses
> it. Or is someone prepared to argue that there are no applications out
> there that will be broken if the same query, against the same unchanging
> table, yields different results from one trial to the next?

As I stated earlier, I don't really like this argument (we already
broke badly designed applications a few times in the past) but we
really need a way to guarantee that the execution of a query is stable
and doesn't depend on external factors. And the original problem was
to guarantee that pg_dump builds a dump as identical as possible to
the existing data by ignoring external factors. It's now the case with
your patch.
The fact that it allows us not to break existing applications relying
too much on physical ordering is a nice side effect though :).

--
Guillaume


From: "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
To: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Euler Taveira de Oliveira" <euler(at)timbira(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-30 09:56:47
Message-ID: E1539E0ED7043848906A8FF995BDA57902C23FC1@m0143.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


> > The plural seems better to me; there's no such thing as a solitary
> > synchronized scan, no? The whole point of the feature is to affect
> > the behavior of multiple scans.
>
> +1. The plural is important IMHO.

ok, good.

> As I stated earlier, I don't really like this argument (we already
> broke badly designed applications a few times in the past) but we
> really need a way to guarantee that the execution of a query is stable
> and doesn't depend on external factors. And the original problem was
> to guarantee that pg_dump builds a dump as identical as possible to
> the existing data by ignoring external factors. It's now the case with
> your patch.
> The fact that it allows us not to break existing applications relying
> too much on physical ordering is a nice side effect though :).

One more question. It would be possible that a session that turned off
the synchronized_seqscans still be a pack leader for other later
sessions.
Do/should we consider that ?

The procedure would be:
start from page 0
iff no other pack is present fill the current scan position for others

Andreas


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Zeugswetter Andreas ADI SD <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
Cc: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-30 13:25:18
Message-ID: 20080130132518.GS4201@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, Jan 30, 2008 at 10:56:47AM +0100, Zeugswetter Andreas ADI SD wrote:
>
> > > The plural seems better to me; there's no such thing as a solitary
> > > synchronized scan, no? The whole point of the feature is to affect
> > > the behavior of multiple scans.
> >
> > +1. The plural is important IMHO.
>
> ok, good.
>
> > As I stated earlier, I don't really like this argument (we already
> > broke badly designed applications a few times in the past) but we
> > really need a way to guarantee that the execution of a query is stable
> > and doesn't depend on external factors. And the original problem was
> > to guarantee that pg_dump builds a dump as identical as possible to
> > the existing data by ignoring external factors. It's now the case with
> > your patch.
> > The fact that it allows us not to break existing applications relying
> > too much on physical ordering is a nice side effect though :).
>
> One more question. It would be possible that a session that turned off
> the synchronized_seqscans still be a pack leader for other later
> sessions.
> Do/should we consider that ?
>
> The procedure would be:
> start from page 0
> iff no other pack is present fill the current scan position for others
>

I think that allowing other scans to use the scan started by a query that
disabled the sync scans would have value. It would prevent these types
of queries from completely tanking the I/O.

+1

Ken


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Zeugswetter Andreas ADI SD <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
Cc: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-30 16:00:37
Message-ID: 20080130160037.GF4536@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Zeugswetter Andreas ADI SD escribió:
>
> > > The plural seems better to me; there's no such thing as a solitary
> > > synchronized scan, no? The whole point of the feature is to affect
> > > the behavior of multiple scans.
> >
> > +1. The plural is important IMHO.
>
> ok, good.

Hmm, if you guys are going to add another GUC variable, please hurry
because we have to translate the description text.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-30 16:36:33
Message-ID: 1201710993.4453.92.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, 2008-01-28 at 16:21 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Rather than having a boolean GUC, we should have a number and make the
> > parameter "synchronised_scan_threshold".
>
> This would open up a can of worms I'd prefer not to touch, having to do
> with whether the buffer-access-strategy behavior should track that or
> not. As the note in heapam.c says,
>
> * If the table is large relative to NBuffers, use a bulk-read access
> * strategy and enable synchronized scanning (see syncscan.c). Although
> * the thresholds for these features could be different, we make them the
> * same so that there are only two behaviors to tune rather than four.
>
> It's a bit late in the cycle to be revisiting that choice. Now we do
> already have three behaviors to worry about (BAS on and syncscan off)
> but throwing in a randomly settable knob will take it back to four,
> and we have no idea how that fourth case will behave. The other tack we
> could take (having the one GUC variable control both thresholds) is
> not good since it will result in pg_dump trashing the buffer cache.

I'm still not very happy with any of the options here.

BAS is great if you didn't want to trash the cache, but its also
annoying to people that really did want to load a large table into
cache. However we set it, we're going to have problems because not
everybody has the same database.

We're trying to guess which data is in memory and which is on disk and
then act accordinly. The answer to that question cannot be answered
solely by how big shared_buffers is. It really ought to be a combination
of (at least) shared_buffers and total database size. I think we must
either put some more intelligence into the setting of the threshold, or
give it to the user as a parameter, possibly as a parameter not
mentioned in the sample .conf.

If we set the threshold unintelligently or in a way that cannot be
overridden, we will still get weird bug reports from people who set
shared_buffers higher and got a performance drop.

We need to make a final decision on this quickly, so I'll say no more on
this for 8.3 to help that process.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Zeugswetter Andreas ADI SD <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>, Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-30 17:19:22
Message-ID: 14021.1201713562@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Hmm, if you guys are going to add another GUC variable, please hurry
> because we have to translate the description text.

Yeah, I'm going to put it in today --- just the on/off switch.
Any discussions of exposing threshold parameters will have to wait
for 8.4.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-30 17:22:28
Message-ID: 14071.1201713748@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> I'm still not very happy with any of the options here.

> BAS is great if you didn't want to trash the cache, but its also
> annoying to people that really did want to load a large table into
> cache. However we set it, we're going to have problems because not
> everybody has the same database.

That argument leads immediately to the conclusion that you need
per-table control over the behavior. Which maybe you do, but it's
far too late to be proposing it for 8.3. We should put this whole
area of more-control-over-BAS-and-syncscan on the TODO agenda.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
Cc: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>, "Euler Taveira de Oliveira" <euler(at)timbira(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-30 18:07:50
Message-ID: 14808.1201716470@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at> writes:
> One more question. It would be possible that a session that turned off
> the synchronized_seqscans still be a pack leader for other later
> sessions.
> Do/should we consider that ?

Seems like a reasonable thing to consider ... for 8.4. I'm not willing
to go poking the syncscan code that much at this late point in the 8.3
cycle. (I'm not sure if it's been mentioned yet on -hackers, but the
current plan is to freeze 8.3.0 tomorrow evening.)

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Zeugswetter Andreas ADI SD <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>, Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-30 18:15:32
Message-ID: 1201716932.4453.138.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, 2008-01-30 at 13:07 -0500, Tom Lane wrote:
> "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at> writes:
> > One more question. It would be possible that a session that turned off
> > the synchronized_seqscans still be a pack leader for other later
> > sessions.
> > Do/should we consider that ?
>
> Seems like a reasonable thing to consider ... for 8.4.

Definitely. I thought about this the other day and decided it had some
strange behaviour in some circumstances, so wouldn't be desirable
overall.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-30 18:42:04
Message-ID: 47A0C4FC.5010806@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>> I'm still not very happy with any of the options here.
>
>> BAS is great if you didn't want to trash the cache, but its also
>> annoying to people that really did want to load a large table into
>> cache. However we set it, we're going to have problems because not
>> everybody has the same database.
>
> That argument leads immediately to the conclusion that you need
> per-table control over the behavior.

It's even worse than that. Elsewhere in this thread Simon mentioned a
partitioned table, where each partition on its own is smaller than the
threshold, but you're seq scanning several partitions and the total size
of the seq scans is larger than memory size. In that scenario, you would
want BAS and synchronized scans, but even a per-table setting wouldn't
cut it.

One idea would be to look at the access plan and estimate the total size
of all scans in the plan. Another idea would be to switch to a more seq
scan resistant cache replacement algorithm, and forget about the threshold.

For synchronized scans to help in the partitioned situation, I guess
you'd want to synchronize across partitions. If someone is already
scanning partition 5, you'd want to start from that partition and join
the pack, instead of starting from partition 1.

I think we'll be wiser after we see some real world use of what we have
there now..

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-30 20:25:02
Message-ID: 1201724702.4453.159.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, 2008-01-30 at 18:42 +0000, Heikki Linnakangas wrote:
> Tom Lane wrote:
> > Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> >> I'm still not very happy with any of the options here.
> >
> >> BAS is great if you didn't want to trash the cache, but its also
> >> annoying to people that really did want to load a large table into
> >> cache. However we set it, we're going to have problems because not
> >> everybody has the same database.
> >
> > That argument leads immediately to the conclusion that you need
> > per-table control over the behavior.
>
> It's even worse than that. Elsewhere in this thread Simon mentioned a
> partitioned table, where each partition on its own is smaller than the
> threshold, but you're seq scanning several partitions and the total size
> of the seq scans is larger than memory size. In that scenario, you would
> want BAS and synchronized scans, but even a per-table setting wouldn't
> cut it.

> For synchronized scans to help in the partitioned situation, I guess
> you'd want to synchronize across partitions. If someone is already
> scanning partition 5, you'd want to start from that partition and join
> the pack, instead of starting from partition 1.

You're right, but in practice its not quite that bad with the
multi-table route. When you have partitions you generally exclude most
of them, with typically 1-2 per query, usually different ones.

If you were scanning lots of partitions in sequence so frequently that
you'd get benefit from synch scans then your partitioning scheme isn't
working for you - and that is the worst problem by far.

But yes, it does need to be addressed.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Neil Conway" <neilc(at)samurai(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date: 2008-01-30 20:55:22
Message-ID: 47A0E43A.7050106@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:
> On Wed, 2008-01-30 at 18:42 +0000, Heikki Linnakangas wrote:
>> It's even worse than that. Elsewhere in this thread Simon mentioned a
>> partitioned table, where each partition on its own is smaller than the
>> threshold, but you're seq scanning several partitions and the total size
>> of the seq scans is larger than memory size. In that scenario, you would
>> want BAS and synchronized scans, but even a per-table setting wouldn't
>> cut it.
>
>> For synchronized scans to help in the partitioned situation, I guess
>> you'd want to synchronize across partitions. If someone is already
>> scanning partition 5, you'd want to start from that partition and join
>> the pack, instead of starting from partition 1.
>
> You're right, but in practice its not quite that bad with the
> multi-table route. When you have partitions you generally exclude most
> of them, with typically 1-2 per query, usually different ones.

Yep. And in that case, you *don't'* want BAS or sync scans to kick in,
because you're only accessing a relatively small chunk of data, and it's
worthwhile to cache it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-31 01:33:33
Message-ID: 200801310133.m0V1XXk20450@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > I'm still not very happy with any of the options here.
>
> > BAS is great if you didn't want to trash the cache, but its also
> > annoying to people that really did want to load a large table into
> > cache. However we set it, we're going to have problems because not
> > everybody has the same database.
>
> That argument leads immediately to the conclusion that you need
> per-table control over the behavior. Which maybe you do, but it's
> far too late to be proposing it for 8.3. We should put this whole
> area of more-control-over-BAS-and-syncscan on the TODO agenda.

Another question --- why don't we just turn off synchronized_seqscans
when we do COPY TO? That would fix pg_dump and be transparent.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-31 01:38:31
Message-ID: 200801310138.m0V1cVW21111@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian wrote:
> Tom Lane wrote:
> > Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > > I'm still not very happy with any of the options here.
> >
> > > BAS is great if you didn't want to trash the cache, but its also
> > > annoying to people that really did want to load a large table into
> > > cache. However we set it, we're going to have problems because not
> > > everybody has the same database.
> >
> > That argument leads immediately to the conclusion that you need
> > per-table control over the behavior. Which maybe you do, but it's
> > far too late to be proposing it for 8.3. We should put this whole
> > area of more-control-over-BAS-and-syncscan on the TODO agenda.
>
> Another question --- why don't we just turn off synchronized_seqscans
> when we do COPY TO? That would fix pg_dump and be transparent.

Sorry, I was unclear. I meant don't have a GUC at all but just set an
internal variable to turn off synchronized sequential scans when we do
COPY TO.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-31 01:41:20
Message-ID: 28144.1201743680@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Another question --- why don't we just turn off synchronized_seqscans
> when we do COPY TO? That would fix pg_dump and be transparent.

Enforcing this from the server side seems a pretty bad idea. Note that
there were squawks about having pg_dump behave this way at all; if the
control is on the pg_dump side then at least we have the chance to make
it a user option later.

Also, you forgot about pg_dump -d.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-31 02:15:35
Message-ID: 200801310215.m0V2FZG15153@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Another question --- why don't we just turn off synchronized_seqscans
> > when we do COPY TO? That would fix pg_dump and be transparent.
>
> Enforcing this from the server side seems a pretty bad idea. Note that
> there were squawks about having pg_dump behave this way at all; if the
> control is on the pg_dump side then at least we have the chance to make
> it a user option later.
>
> Also, you forgot about pg_dump -d.

OK, but keep in mind if we use synchronized_seqscans in pg_dump we will
have to recognize that GUC forever.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-31 05:42:07
Message-ID: 415.1201758127@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> OK, but keep in mind if we use synchronized_seqscans in pg_dump we will
> have to recognize that GUC forever.

No, because it's being used on the query side, not in the emitted dump.
We have *never* promised that pg_dump version N could dump from server
version N+1 .., in fact, personally I'd like to make that case be a hard
error, rather than something people could override with -i.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-31 13:42:12
Message-ID: 200801311342.m0VDgCN16707@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > OK, but keep in mind if we use synchronized_seqscans in pg_dump we will
> > have to recognize that GUC forever.
>
> No, because it's being used on the query side, not in the emitted dump.
> We have *never* promised that pg_dump version N could dump from server
> version N+1 .., in fact, personally I'd like to make that case be a hard
> error, rather than something people could override with -i.

Oh, yea, interesting, it is part of the connection. That will help.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-31 14:20:54
Message-ID: 20080131142054.GB7471@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> Tom Lane wrote:

> > in fact, personally I'd like to make that case be a hard error,
> > rather than something people could override with -i.

+1 to this idea. TODO for 8.4?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-31 14:33:53
Message-ID: 1201790033.4453.291.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 2008-01-31 at 11:20 -0300, Alvaro Herrera wrote:
> > Tom Lane wrote:
>
> > > in fact, personally I'd like to make that case be a hard error,
> > > rather than something people could override with -i.
>
> +1 to this idea. TODO for 8.4?

-1 without some more planning about the effects and implications.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)
Date: 2008-01-31 14:47:26
Message-ID: 20080131144726.GB8602@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs escribió:
> On Thu, 2008-01-31 at 11:20 -0300, Alvaro Herrera wrote:
> > > Tom Lane wrote:
> >
> > > > in fact, personally I'd like to make that case be a hard error,
> > > > rather than something people could override with -i.
> >
> > +1 to this idea. TODO for 8.4?
>
> -1 without some more planning about the effects and implications.

Effect: we would stop receiving complaints that an old pg_dump can talk
to a server that most likely is incompatible with it. People would
learn on the spot that they must install the newer pg_dump.

Implication: you cannot dump a newer database and expect it to load on
an older server (it would work for certain versions, but not all). So
if people have a bleeding-edge test server, they cannot migrate stuff
from it to their older production server.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: {**Spam**} Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-31 15:05:16
Message-ID: 200801311605.18796.dfontaine@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi,

Le jeudi 31 janvier 2008, Tom Lane a écrit :
> We have *never* promised that pg_dump version N could dump from server
> version N+1 .., in fact, personally I'd like to make that case be a hard
> error, rather than something people could override with -i.

Are you thinking about next major or minor version ? All the same?
Is there some real good reason not to dump say 8.2.6 server with the pg_dump
from an 8.2.5 installation?

--
dim


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: {**Spam**} Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-31 15:13:50
Message-ID: 200801311513.m0VFDoR24185@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Dimitri Fontaine wrote:
-- Start of PGP signed section.
> Hi,
>
> Le jeudi 31 janvier 2008, Tom Lane a ?crit?:
> > We have *never* promised that pg_dump version N could dump from server
> > version N+1 .., in fact, personally I'd like to make that case be a hard
> > error, rather than something people could override with -i.
>
> Are you thinking about next major or minor version ? All the same?
> Is there some real good reason not to dump say 8.2.6 server with the pg_dump
> from an 8.2.5 installation?

They are talking about cross-major dumps, 8.2 pg_dump dumping an 8.3
database. They are saying that should be disallowed.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)
Date: 2008-01-31 15:13:55
Message-ID: 200801311613.56828.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Am Donnerstag, 31. Januar 2008 schrieb Alvaro Herrera:
> Effect: we would stop receiving complaints that an old pg_dump can talk
> to a server that most likely is incompatible with it. People would
> learn on the spot that they must install the newer pg_dump.

I think a more moderate measure might be to clarify the error message

"aborting because of version mismatch (Use the -i option to proceed
anyway.)\n"

I'm not sure how many of the mentioned complaints we are getting, but the
error message might make people think that it's complaining because the
versions are not equal rather than that the versions are known not
compatible.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)
Date: 2008-01-31 15:14:58
Message-ID: 200801311514.m0VFEwY24520@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Peter Eisentraut wrote:
> Am Donnerstag, 31. Januar 2008 schrieb Alvaro Herrera:
> > Effect: we would stop receiving complaints that an old pg_dump can talk
> > to a server that most likely is incompatible with it. People would
> > learn on the spot that they must install the newer pg_dump.
>
> I think a more moderate measure might be to clarify the error message
>
> "aborting because of version mismatch (Use the -i option to proceed
> anyway.)\n"
>
> I'm not sure how many of the mentioned complaints we are getting, but the
> error message might make people think that it's complaining because the
> versions are not equal rather than that the versions are known not
> compatible.

Agreed --- that error message is just prompting people to try something
they shouldn't and not explaining why.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: {**Spam**} Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-31 15:25:54
Message-ID: 7952.1201793154@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Dimitri Fontaine <dfontaine(at)hi-media(dot)com> writes:
> Le jeudi 31 janvier 2008, Tom Lane a crit:
>> We have *never* promised that pg_dump version N could dump from server
>> version N+1 .., in fact, personally I'd like to make that case be a hard
>> error, rather than something people could override with -i.

> Are you thinking about next major or minor version ? All the same?
> Is there some real good reason not to dump say 8.2.6 server with the pg_dump
> from an 8.2.5 installation?

I'm thinking next major. In principle there could be cases where a
minor update could break pg_dump, but it seems unlikely enough that
it's not reasonable to embed such a policy in the code. As for
next major, though, the mere existence of the -i switch is a foot-gun
with no significant value.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)
Date: 2008-01-31 15:28:59
Message-ID: 8023.1201793339@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Am Donnerstag, 31. Januar 2008 schrieb Alvaro Herrera:
>> Effect: we would stop receiving complaints that an old pg_dump can talk
>> to a server that most likely is incompatible with it. People would
>> learn on the spot that they must install the newer pg_dump.

> I think a more moderate measure might be to clarify the error message
> "aborting because of version mismatch (Use the -i option to proceed
> anyway.)\n"

I would be satisfied with that if I thought people would actually read
the message. My complaint is really directed at certain admin packages
(and they know who they are) that invoke pg_dump *by default*, behind
the user's back, with -i.

regards, tom lane


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: {**Spam**} Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-31 16:00:26
Message-ID: 200801311700.29497.dfontaine@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Le jeudi 31 janvier 2008, Tom Lane a écrit :
> I'm thinking next major. In principle there could be cases where a
> minor update could break pg_dump, but it seems unlikely enough that
> it's not reasonable to embed such a policy in the code. As for
> next major, though, the mere existence of the -i switch is a foot-gun
> with no significant value.

+2 then :)

1. Current behavior is to issue the « -i warning » even when having minor
version mismatch, getting rid of this would be great... even more so now
we know there's almost no risk here.

2. Major version mismatch seems to be one of the major migration pitfalls out
there. The famous "You have to dump with the newer pg_dump version against
the current production setup" cookbook entry will certainly be better
embedded into the code.

Regards,
--
dim


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)
Date: 2008-01-31 16:02:03
Message-ID: 200801311602.m0VG23C28750@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > Am Donnerstag, 31. Januar 2008 schrieb Alvaro Herrera:
> >> Effect: we would stop receiving complaints that an old pg_dump can talk
> >> to a server that most likely is incompatible with it. People would
> >> learn on the spot that they must install the newer pg_dump.
>
> > I think a more moderate measure might be to clarify the error message
> > "aborting because of version mismatch (Use the -i option to proceed
> > anyway.)\n"
>
> I would be satisfied with that if I thought people would actually read
> the message. My complaint is really directed at certain admin packages
> (and they know who they are) that invoke pg_dump *by default*, behind
> the user's back, with -i.

Oh? That isn't good.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: "Dimitri Fontaine" <dfontaine(at)hi-media(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: {**Spam**} Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-31 16:08:38
Message-ID: 87ve5a5661.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Bruce Momjian" <bruce(at)momjian(dot)us> writes:

> Dimitri Fontaine wrote:
> -- Start of PGP signed section.
>> Hi,
>>
>> Le jeudi 31 janvier 2008, Tom Lane a ?crit?:
>> > We have *never* promised that pg_dump version N could dump from server
>> > version N+1 .., in fact, personally I'd like to make that case be a hard
>> > error, rather than something people could override with -i.
>>
>> Are you thinking about next major or minor version ? All the same?
>> Is there some real good reason not to dump say 8.2.6 server with the pg_dump
>> from an 8.2.5 installation?
>
> They are talking about cross-major dumps, 8.2 pg_dump dumping an 8.3
> database. They are saying that should be disallowed.

And just to be clearly *only* cross-major dumps with an *older* pg_dump than
the database. Dumping with a newer pg_dump than the database is the
recommended way to do cross-major dumps.

Perhaps we can have something like --force-unsupported-incompatible-connections

1/2 :)

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Dave Page <dpage(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)
Date: 2008-02-05 15:27:36
Message-ID: 20080205152736.GA24114@svr2.hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, Jan 31, 2008 at 11:02:03AM -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > > Am Donnerstag, 31. Januar 2008 schrieb Alvaro Herrera:
> > >> Effect: we would stop receiving complaints that an old pg_dump can talk
> > >> to a server that most likely is incompatible with it. People would
> > >> learn on the spot that they must install the newer pg_dump.
> >
> > > I think a more moderate measure might be to clarify the error message
> > > "aborting because of version mismatch (Use the -i option to proceed
> > > anyway.)\n"
> >
> > I would be satisfied with that if I thought people would actually read
> > the message. My complaint is really directed at certain admin packages
> > (and they know who they are) that invoke pg_dump *by default*, behind
> > the user's back, with -i.
>
> Oh? That isn't good.

Right. Dave - why do we do that? ;-)

//Magnus


From: "Dave Page" <dpage(at)postgresql(dot)org>
To: "Magnus Hagander" <magnus(at)hagander(dot)net>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)
Date: 2008-02-05 16:18:29
Message-ID: 937d27e10802050818n6cc3a017qec047200f3a994ed@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Feb 5, 2008 3:27 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Thu, Jan 31, 2008 at 11:02:03AM -0500, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > I would be satisfied with that if I thought people would actually read
> > > the message. My complaint is really directed at certain admin packages
> > > (and they know who they are) that invoke pg_dump *by default*, behind
> > > the user's back, with -i.
> >
> > Oh? That isn't good.
>
> Right. Dave - why do we do that? ;-)

I didn't realise we did until Tom mentioned it - I didn't write that code.

Please go ahead and remove the -i - it's not like users cannot cannot
specify which set of pg utilities to use if they need a specific
version.

/D


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Dave Page <dpage(at)postgresql(dot)org>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)
Date: 2008-02-05 18:11:41
Message-ID: 47A8A6DD.2000208@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Dave Page wrote:
> On Feb 5, 2008 3:27 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Thu, Jan 31, 2008 at 11:02:03AM -0500, Bruce Momjian wrote:
>>> Tom Lane wrote:
>>>> I would be satisfied with that if I thought people would actually read
>>>> the message. My complaint is really directed at certain admin packages
>>>> (and they know who they are) that invoke pg_dump *by default*, behind
>>>> the user's back, with -i.
>>> Oh? That isn't good.
>> Right. Dave - why do we do that? ;-)
>
> I didn't realise we did until Tom mentioned it - I didn't write that code.
>
> Please go ahead and remove the -i - it's not like users cannot cannot
> specify which set of pg utilities to use if they need a specific
> version.

Ok, done!

//Magnus


From: "Dave Page" <dpage(at)postgresql(dot)org>
To: "Magnus Hagander" <magnus(at)hagander(dot)net>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)
Date: 2008-02-05 19:45:21
Message-ID: 937d27e10802051145t569e3640xba6989922ba33009@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Feb 5, 2008 6:11 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:

> > Please go ahead and remove the -i - it's not like users cannot cannot
> > specify which set of pg utilities to use if they need a specific
> > version.
>
> Ok, done!

Thanks.

/D


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: pg_dump -i wording
Date: 2008-03-23 01:10:22
Message-ID: 200803230110.m2N1AM306382@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:
> On Thu, 2008-01-31 at 11:20 -0300, Alvaro Herrera wrote:
> > > Tom Lane wrote:
> >
> > > > in fact, personally I'd like to make that case be a hard error,
> > > > rather than something people could override with -i.
> >
> > +1 to this idea. TODO for 8.4?
>
> -1 without some more planning about the effects and implications.

I have developed the attached patch with improves wording for the
pg_dump -i (ignore version) option.

The previous wording just said "proceed" which was very neutral. The
new wording says "skip", which is more active and more likely for people
to understand it isn't something to do lightly.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
/pgpatches/pg_dump text/x-diff 6.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: pg_dump -i wording
Date: 2008-03-23 01:21:51
Message-ID: 25613.1206235311@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> I have developed the attached patch with improves wording for the
> pg_dump -i (ignore version) option.

I think this is going in exactly the wrong direction --- it makes
both the documentation and the warning message less scary not more
so.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: pg_dump -i wording
Date: 2008-03-25 18:04:37
Message-ID: 200803251804.m2PI4bA07117@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > I have developed the attached patch with improves wording for the
> > pg_dump -i (ignore version) option.
>
> I think this is going in exactly the wrong direction --- it makes
> both the documentation and the warning message less scary not more
> so.

OK, updated pg_dump -i wording, more scary.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
/pgpatches/pg_dump text/x-diff 7.8 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: pg_dump -i wording
Date: 2008-03-26 14:33:45
Message-ID: 200803261433.m2QEXjr19797@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > I have developed the attached patch with improves wording for the
> > > pg_dump -i (ignore version) option.
> >
> > I think this is going in exactly the wrong direction --- it makes
> > both the documentation and the warning message less scary not more
> > so.
>
> OK, updated pg_dump -i wording, more scary.

Updated patch applied, with improved wording when the -i option is
specified.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
/rtmp/diff text/x-diff 7.8 KB