Re: Recreate primary key without dropping foreign keys?

Lists: pgsql-admin pgsql-general
From: Chris Ernst <cernst(at)zvelo(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Recreate primary key without dropping foreign keys?
Date: 2012-04-16 00:41:05
Message-ID: 4F8B6AA1.1040709@zvelo.com
Lists: pgsql-admin pgsql-general

Hi all,

In PostgreSQL 9.1.3, I have a few fairly large tables with bloated
primary key indexes. I'm trying to replace them using newly created
unique indexes as outlined in the docs. Something like:

CREATE UNIQUE INDEX CONCURRENTLY dist_id_temp_idx ON distributors (dist_id);
ALTER TABLE distributors DROP CONSTRAINT distributors_pkey,
ADD CONSTRAINT distributors_pkey PRIMARY KEY USING INDEX
dist_id_temp_idx;

However, the initial drop of the primary key constraint fails because
there are a whole bunch of foreign keys depending on it.

I've done some searching and haven't found a workable solution. Is
there any way to swap in the new index for the primary key constraint
without dropping all dependent foreign keys? Or am I pretty much stuck
with dropping and recreating all of the foreign keys?

Thanks in advance.

Chris Ernst
Data Operations Engineer
Zvelo, Inc.
http://zvelo.com/


From: Frank Lanitz <frank(at)frank(dot)uvena(dot)de>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Recreate primary key without dropping foreign keys?
Date: 2012-04-16 04:57:17
Message-ID: 20120416065717.70a31a6361ac5d7bdad416e8@frank.uvena.de
Lists: pgsql-admin pgsql-general

On Sun, 15 Apr 2012 18:41:05 -0600
Chris Ernst <cernst(at)zvelo(dot)com> wrote:

> Hi all,
>
> In PostgreSQL 9.1.3, I have a few fairly large tables with bloated
> primary key indexes. I'm trying to replace them using newly created
> unique indexes as outlined in the docs. Something like:
>
> CREATE UNIQUE INDEX CONCURRENTLY dist_id_temp_idx ON distributors
> (dist_id); ALTER TABLE distributors DROP CONSTRAINT distributors_pkey,
> ADD CONSTRAINT distributors_pkey PRIMARY KEY USING INDEX
> dist_id_temp_idx;
>
> However, the initial drop of the primary key constraint fails because
> there are a whole bunch of foreign keys depending on it.
>
> I've done some searching and haven't found a workable solution. Is
> there any way to swap in the new index for the primary key constraint
> without dropping all dependent foreign keys? Or am I pretty much
> stuck with dropping and recreating all of the foreign keys?

REINDEX is not working here?

Cheers,
Frank
--
Frank Lanitz <frank(at)frank(dot)uvena(dot)de>


From: Chris Ernst <cernst(at)zvelo(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Recreate primary key without dropping foreign keys?
Date: 2012-04-16 08:32:28
Message-ID: 4F8BD91C.6070405@zvelo.com
Lists: pgsql-admin pgsql-general

On 04/15/2012 10:57 PM, Frank Lanitz wrote:
> On Sun, 15 Apr 2012 18:41:05 -0600 Chris Ernst <cernst(at)zvelo(dot)com>
> wrote:
>
>> Hi all,
>>
>> In PostgreSQL 9.1.3, I have a few fairly large tables with
>> bloated primary key indexes. I'm trying to replace them using
>> newly created unique indexes as outlined in the docs. Something
>> like:
>>
>> CREATE UNIQUE INDEX CONCURRENTLY dist_id_temp_idx ON
>> distributors (dist_id); ALTER TABLE distributors DROP CONSTRAINT
>> distributors_pkey, ADD CONSTRAINT distributors_pkey PRIMARY KEY
>> USING INDEX dist_id_temp_idx;
>>
>> However, the initial drop of the primary key constraint fails
>> because there are a whole bunch of foreign keys depending on it.
>>
>> I've done some searching and haven't found a workable solution.
>> Is there any way to swap in the new index for the primary key
>> constraint without dropping all dependent foreign keys? Or am I
>> pretty much stuck with dropping and recreating all of the foreign
>> keys?
>
> REINDEX is not working here?

Hi Frank,

Thanks, but REINDEX is not an option as it would take an exclusive
lock on the table for several hours.

For all of the other indexes, I create a new index concurrently, drop
the old and swap in the new. But the primary key is a bit trickier
because I can't drop the primary key index without dropping the
primary key constraint and I can't drop the primary key constraint
without dropping all of the foreign keys that reference that column.

- Chris


From: Frank Lanitz <frank(at)frank(dot)uvena(dot)de>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Recreate primary key without dropping foreign keys?
Date: 2012-04-16 08:39:14
Message-ID: 4F8BDAB2.7010209@frank.uvena.de
Lists: pgsql-admin pgsql-general

Am 16.04.2012 10:32, schrieb Chris Ernst:
> On 04/15/2012 10:57 PM, Frank Lanitz wrote:
>> On Sun, 15 Apr 2012 18:41:05 -0600 Chris Ernst <cernst(at)zvelo(dot)com>
>> wrote:
>>
>>> Hi all,
>>>
>>> In PostgreSQL 9.1.3, I have a few fairly large tables with
>>> bloated primary key indexes. I'm trying to replace them using
>>> newly created unique indexes as outlined in the docs. Something
>>> like:
>>>
>>> CREATE UNIQUE INDEX CONCURRENTLY dist_id_temp_idx ON
>>> distributors (dist_id); ALTER TABLE distributors DROP CONSTRAINT
>>> distributors_pkey, ADD CONSTRAINT distributors_pkey PRIMARY KEY
>>> USING INDEX dist_id_temp_idx;
>>>
>>> However, the initial drop of the primary key constraint fails
>>> because there are a whole bunch of foreign keys depending on it.
>>>
>>> I've done some searching and haven't found a workable solution.
>>> Is there any way to swap in the new index for the primary key
>>> constraint without dropping all dependent foreign keys? Or am I
>>> pretty much stuck with dropping and recreating all of the foreign
>>> keys?
>>
>> REINDEX is not working here?
>
> Hi Frank,
>
> Thanks, but REINDEX is not an option as it would take an exclusive
> lock on the table for several hours.

Well, from my limited view I guess any index-rebuilding action would
require such a lock, as it's the primary key with uniqueness. I'd think
of a complete reinit of the cluster with pg_dump and a restore, but this
would also need downtime, at least for write access.

Why is the index so bloated?

Cheers,
Frank


From: Chris Ernst <cernst(at)zvelo(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Recreate primary key without dropping foreign keys?
Date: 2012-04-16 13:54:22
Message-ID: 4F8C248E.7070307@zvelo.com
Lists: pgsql-admin pgsql-general

On 04/16/2012 02:39 AM, Frank Lanitz wrote:
> Am 16.04.2012 10:32, schrieb Chris Ernst:
>> On 04/15/2012 10:57 PM, Frank Lanitz wrote:
>>> On Sun, 15 Apr 2012 18:41:05 -0600 Chris Ernst <cernst(at)zvelo(dot)com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> In PostgreSQL 9.1.3, I have a few fairly large tables with
>>>> bloated primary key indexes. I'm trying to replace them using
>>>> newly created unique indexes as outlined in the docs. Something
>>>> like:
>>>>
>>>> CREATE UNIQUE INDEX CONCURRENTLY dist_id_temp_idx ON
>>>> distributors (dist_id); ALTER TABLE distributors DROP CONSTRAINT
>>>> distributors_pkey, ADD CONSTRAINT distributors_pkey PRIMARY KEY
>>>> USING INDEX dist_id_temp_idx;
>>>>
>>>> However, the initial drop of the primary key constraint fails
>>>> because there are a whole bunch of foreign keys depending on it.
>>>>
>>>> I've done some searching and haven't found a workable solution.
>>>> Is there any way to swap in the new index for the primary key
>>>> constraint without dropping all dependent foreign keys? Or am I
>>>> pretty much stuck with dropping and recreating all of the foreign
>>>> keys?
>>>
>>> REINDEX is not working here?
>>
>> Hi Frank,
>>
>> Thanks, but REINDEX is not an option as it would take an exclusive
>> lock on the table for several hours.
>
> Well, from my little view I guess all rebuilding index action would
> require such, as its the primary key with uniqueness. I'd think of a
> complete reinit of the cluster with pg_dump and restoring, but this
> would also need a downtime at least for write access.
>
> Why is the index so bloated?

As in my original post, you can create a unique index concurrently and
then replace the primary key index with it. This way, the index
creation doesn't require an exclusive lock. You only need a very brief
exclusive lock to drop and recreate the primary key constraint using the
new index.

However, the index creation is not the issue here. That part is done.
The issue is that there are several foreign keys depending on the
primary key index that I want to drop and replace with the newly built
unique index. I would prefer not to drop and recreate all of the
foreign keys as that would require many hours of down time as well (the
very situation I was trying to avoid by building the index concurrently
and swapping it in).

I believe the index bloat is due to a combination of under-aggressive
autovacuum settings and recently deleting about 30% of the table.

- Chris


From: amador alvarez <aalvarez(at)d2(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Recreate primary key without dropping foreign keys?
Date: 2012-04-17 01:02:18
Message-ID: 4F8CC11A.5020003@d2.com
Lists: pgsql-admin pgsql-general

How about deferring the FKs while recreating the PK?
Or using a temporary parallel table for the other tables' FKs to point
at, and swapping it in during the recreation.
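
For reference, a rough sketch of what the deferrable route looks like
(table and constraint names are illustrative only; a constraint has to be
created as DEFERRABLE for SET CONSTRAINTS to work on it):

ALTER TABLE orders ADD CONSTRAINT orders_dist_id_fkey
  FOREIGN KEY (dist_id) REFERENCES distributors (dist_id)
  DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
SET CONSTRAINTS orders_dist_id_fkey DEFERRED;
-- work that should only be checked against the FK at commit time
COMMIT;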

Cheers,
A.A

On 04/16/2012 06:54 AM, Chris Ernst wrote:
> On 04/16/2012 02:39 AM, Frank Lanitz wrote:
>> Am 16.04.2012 10:32, schrieb Chris Ernst:
>>> On 04/15/2012 10:57 PM, Frank Lanitz wrote:
>>>> On Sun, 15 Apr 2012 18:41:05 -0600 Chris Ernst<cernst(at)zvelo(dot)com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> In PostgreSQL 9.1.3, I have a few fairly large tables with
>>>>> bloated primary key indexes. I'm trying to replace them using
>>>>> newly created unique indexes as outlined in the docs. Something
>>>>> like:
>>>>>
>>>>> CREATE UNIQUE INDEX CONCURRENTLY dist_id_temp_idx ON
>>>>> distributors (dist_id); ALTER TABLE distributors DROP CONSTRAINT
>>>>> distributors_pkey, ADD CONSTRAINT distributors_pkey PRIMARY KEY
>>>>> USING INDEX dist_id_temp_idx;
>>>>>
>>>>> However, the initial drop of the primary key constraint fails
>>>>> because there are a whole bunch of foreign keys depending on it.
>>>>>
>>>>> I've done some searching and haven't found a workable solution.
>>>>> Is there any way to swap in the new index for the primary key
>>>>> constraint without dropping all dependent foreign keys? Or am I
>>>>> pretty much stuck with dropping and recreating all of the foreign
>>>>> keys?
>>>> REINDEX is not working here?
>>> Hi Frank,
>>>
>>> Thanks, but REINDEX is not an option as it would take an exclusive
>>> lock on the table for several hours.
>> Well, from my little view I guess all rebuilding index action would
>> require such, as its the primary key with uniqueness. I'd think of a
>> complete reinit of the cluster with pg_dump and restoring, but this
>> would also need a downtime at least for write access.
>>
>> Why is the index so bloated?
> As in my original post, you can create a unique index concurrently and
> then replace the primary key index with it. This way, the index
> creation doesn't require an exclusive lock. You only need a very brief
> exclusive lock to drop and recreate the primary key constraint using the
> new index.
>
> However, the index creation is not the issue here. That part is done.
> The issue is that there are several foreign keys depending on the
> primary key index that I want to drop and replace with the newly built
> unique index. I would prefer not to drop and recreate all of the
> foreign keys as that would require many hours of down time as well (the
> very situation I was trying to avoid by building the index concurrently
> and swapping it in).
>
> I believe the index bloat is due to a combination of under aggressive
> autovacuum settings and recently deleting about 30% of the table.
>
> - Chris
>


From: Chris Ernst <cernst(at)zvelo(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Recreate primary key without dropping foreign keys?
Date: 2012-04-17 02:55:17
Message-ID: 4F8CDB95.3040407@zvelo.com
Lists: pgsql-admin pgsql-general

On 04/16/2012 07:02 PM, amador alvarez wrote:
> How about deferring the FK's while recreating the PK ?
> or using a temporary parallel table to be pointed by the other tables
> (FK) and swap it up on the recreation.

Hmm.. Interesting. But it appears that you have to declare the foreign
key as deferrable at creation. Is there any way to set an existing
foreign key as deferrable?

- Chris


From: Igor Neyman <ineyman(at)perceptron(dot)com>
To: Chris Ernst <cernst(at)zvelo(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Recreate primary key without dropping foreign keys?
Date: 2012-04-17 13:43:59
Message-ID: A76B25F2823E954C9E45E32FA49D70EC03DACFB6@mail.corp.perceptron.com
Lists: pgsql-admin pgsql-general

> -----Original Message-----
> From: Chris Ernst [mailto:cernst(at)zvelo(dot)com]
> Sent: Monday, April 16, 2012 10:55 PM
> To: pgsql-admin(at)postgresql(dot)org
> Subject: Re: Recreate primary key without dropping foreign keys?
>
> On 04/16/2012 07:02 PM, amador alvarez wrote:
> > How about deferring the FK's while recreating the PK ?
> > or using a temporary parallel table to be pointed by the other tables
> > (FK) and swap it up on the recreation.
>
> Hmm.. Interesting. But it appears that you have to declare the
> foreign
> key as deferrable at creation. Is there any way to set an existing
> foreign key as deferrable?
>
> - Chris

Maybe this (from the docs) would help:

"ADD table_constraint [ NOT VALID ]

This form adds a new constraint to a table using the same syntax as CREATE TABLE, plus the option NOT VALID, which is currently only allowed for foreign key constraints. If the constraint is marked NOT VALID, the potentially-lengthy initial check to verify that all rows in the table satisfy the constraint is skipped. The constraint will still be enforced against subsequent inserts or updates (that is, they'll fail unless there is a matching row in the referenced table). But the database will not assume that the constraint holds for all rows in the table, until it is validated by using the VALIDATE CONSTRAINT option."

Using this option you can drop and recreate corresponding FKs in a very short time, and start using them, while postponing to run "VALIDATE CONSTRAINT" for later.
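
A sketch of how the whole swap could go with that option (the
distributors/orders tables and the constraint names are illustrative
only, and there would be one ADD / VALIDATE per referencing table):

BEGIN;
ALTER TABLE orders DROP CONSTRAINT orders_dist_id_fkey;
ALTER TABLE distributors DROP CONSTRAINT distributors_pkey,
  ADD CONSTRAINT distributors_pkey PRIMARY KEY USING INDEX dist_id_temp_idx;
ALTER TABLE orders ADD CONSTRAINT orders_dist_id_fkey
  FOREIGN KEY (dist_id) REFERENCES distributors (dist_id) NOT VALID;
COMMIT;

-- later, when convenient:
ALTER TABLE orders VALIDATE CONSTRAINT orders_dist_id_fkey;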

It's similar to Oracle's adding an FK with the "NOCHECK" option, though IIRC in that case there is no need to run "VALIDATE CONSTRAINT" later.

Regards,
Igor Neyman


From: amador alvarez <aalvarez(at)d2(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Recreate primary key without dropping foreign keys?
Date: 2012-04-17 19:36:30
Message-ID: 4F8DC63E.7000004@d2.com
Lists: pgsql-admin pgsql-general


Unfortunately, I checked and the deferrable option does not let us drop
the PK (on postgres 8.4) while the FKs remain; I did not try the NOT
VALID option, as it is not supported on postgres 8.
So unless you are on a 9.x release, or you try the parallel-table
approach, you have to follow the manual procedure:

Generate the new index
Drop the FKs
Drop the PK
Recreate the PK, switching to the new index
Recreate the FKs

Can you afford a brief window where user access to the database is cut off?

On 04/17/2012 06:43 AM, Igor Neyman wrote:
>> -----Original Message-----
>> From: Chris Ernst [mailto:cernst(at)zvelo(dot)com]
>> Sent: Monday, April 16, 2012 10:55 PM
>> To: pgsql-admin(at)postgresql(dot)org
>> Subject: Re: Recreate primary key without dropping foreign keys?
>>
>> On 04/16/2012 07:02 PM, amador alvarez wrote:
>>> How about deferring the FK's while recreating the PK ?
>>> or using a temporary parallel table to be pointed by the other tables
>>> (FK) and swap it up on the recreation.
>> Hmm.. Interesting. But it appears that you have to declare the
>> foreign
>> key as deferrable at creation. Is there any way to set an existing
>> foreign key as deferrable?
>>
>> - Chris
> May be this (from the docs) would help:
>
> "ADD table_constraint [ NOT VALID ]
>
> This form adds a new constraint to a table using the same syntax as CREATE TABLE, plus the option NOT VALID, which is currently only allowed for foreign key constraints. If the constraint is marked NOT VALID, the potentially-lengthy initial check to verify that all rows in the table satisfy the constraint is skipped. The constraint will still be enforced against subsequent inserts or updates (that is, they'll fail unless there is a matching row in the referenced table). But the database will not assume that the constraint holds for all rows in the table, until it is validated by using the VALIDATE CONSTRAINT option."
>
> Using this option you can drop and recreate corresponding FKs in a very short time, and start using them, while postponing to run "VALIDATE CONSTRAINT" for later.
>
> It's similar to Oracle's adding FK with "NOCHECK" option, but if IRC there is no need to run "VALIDATE CONSTRAINT" later.
>
> Regards,
> Igor Neyman
>


From: Chris Ernst <cernst(at)zvelo(dot)com>
To: Igor Neyman <ineyman(at)perceptron(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Recreate primary key without dropping foreign keys?
Date: 2012-04-18 00:03:37
Message-ID: 4F8E04D9.6010402@zvelo.com
Lists: pgsql-admin pgsql-general

On 04/17/2012 07:43 AM, Igor Neyman wrote:
>> -----Original Message-----
>> From: Chris Ernst [mailto:cernst(at)zvelo(dot)com]
>> Sent: Monday, April 16, 2012 10:55 PM
>> To: pgsql-admin(at)postgresql(dot)org
>> Subject: Re: Recreate primary key without dropping foreign keys?
>>
>> On 04/16/2012 07:02 PM, amador alvarez wrote:
>>> How about deferring the FK's while recreating the PK ?
>>> or using a temporary parallel table to be pointed by the other tables
>>> (FK) and swap it up on the recreation.
>>
>> Hmm.. Interesting. But it appears that you have to declare the
>> foreign
>> key as deferrable at creation. Is there any way to set an existing
>> foreign key as deferrable?
>>
>> - Chris
>
> May be this (from the docs) would help:
>
> "ADD table_constraint [ NOT VALID ]
>
> This form adds a new constraint to a table using the same syntax as CREATE TABLE, plus the option NOT VALID, which is currently only allowed for foreign key constraints. If the constraint is marked NOT VALID, the potentially-lengthy initial check to verify that all rows in the table satisfy the constraint is skipped. The constraint will still be enforced against subsequent inserts or updates (that is, they'll fail unless there is a matching row in the referenced table). But the database will not assume that the constraint holds for all rows in the table, until it is validated by using the VALIDATE CONSTRAINT option."
>
> Using this option you can drop and recreate corresponding FKs in a very short time, and start using them, while postponing to run "VALIDATE CONSTRAINT" for later.
>
> It's similar to Oracle's adding FK with "NOCHECK" option, but if IRC there is no need to run "VALIDATE CONSTRAINT" later.

Hi Igor,

Oooooo... I like the sound of this. I'll give this a shot in the test
environment and report back my findings.

Thanks a bunch!

- Chris


From: Evan Rempel <erempel(at)uvic(dot)ca>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Change request - log line prefix
Date: 2012-05-31 04:05:23
Message-ID: F1226B1821911E4BB69462DAF68098536E32333755@EMC6.uvic.ca
Lists: pgsql-admin pgsql-general

I am part of a team that fills an operational role administering 1000+ servers and
hundreds of applications. Of course we need to "read" all of our logs, and must use computers to
help us. In filtering PostgreSQL logs there is one thing that makes life difficult for us admins.

Nice things about the PostgreSQL logs:

- user definable prefix
- each log line after the prefix contains a log line status such as:
ERROR:
FATAL:
LOG:
NOTICE:
WARNING:
STATEMENT:

- the configurable compile time option to set the wrap column for the log lines.

Now for the bad things

Even when the wrap column is set to a very large value (32k), STATEMENT lines still wrap according to the line breaks in
the original SQL statement.

Wrapped lines no longer have the prefix - difficult to grep the log for everything pertaining to a particular database or user

Wrapped lines no longer have the log line status - difficult to auto-ignore all NOTICE status log lines when they wrap, or
ignore all user STATEMENT lines because they almost always wrap.

In conclusion, I would like to see a logging change that included the prefix on EVERY line, and included the STATUS on every line.

Comments?

If everyone :-) is in agreement can the authors just "get it done"?

Thanks for your time.

Evan Rempel
Systems administrator
University of Victoria.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Evan Rempel <erempel(at)uvic(dot)ca>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Change request - log line prefix
Date: 2012-05-31 05:06:35
Message-ID: 28335.1338440795@sss.pgh.pa.us
Lists: pgsql-admin pgsql-general

Evan Rempel <erempel(at)uvic(dot)ca> writes:
> Even when the wrap column is set to a very large value (32k) STATEMENT lines still wrap according to the line breaks in
> the original SQL statement.
> Wrapped line no longer have the prefix - difficult to grep the log for everything pertaining to a particular database or user
> Wrapped lines no longer have the log line status - difficult to auto-ignore all NOTICE status log lines when they wrap, or
> ignore all user STATEMENT lines because they almost always wrap.

I think your life would be better if you used CSV log format.

> In conclusion, I would like to see a logging change that included the prefix on EVERY line, and included the STATUS on every line.

This doesn't really sound like an improvement to me. It's going to make
the logs bulkier, but they're still not automatically parseable in any
meaningful sense. CSV is the way to go if you want machine-readable logs.
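
A minimal sketch of what that takes in postgresql.conf (the file name
pattern here is only an example; the logging collector has to be on for
csvlog to work):

log_destination = 'csvlog'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'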

regards, tom lane


From: Chris Angelico <rosuav(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Change request - log line prefix
Date: 2012-05-31 05:19:15
Message-ID: CAPTjJmpphKtg6-PwbanZSKPZuZvGLo18Q4T9XNav6m2tSkx1dg@mail.gmail.com
Lists: pgsql-admin pgsql-general

On Thu, May 31, 2012 at 2:05 PM, Evan Rempel <erempel(at)uvic(dot)ca> wrote:
> Even when the wrap column is set to a very large value (32k) STATEMENT lines still wrap according to the line breaks in
> the original SQL statement.

The problem isn't so much the wrapping, it seems, as that your
statements' line breaks are being propagated through. So as a possible
alternative solution, perhaps there could be an option to replace
newlines with spaces before the line goes to the log?

ChrisA


From: Evan Rempel <erempel(at)uvic(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Change request - log line prefix
Date: 2012-05-31 05:47:05
Message-ID: E4C9ECD7-D6F9-4557-8A8F-576B7F4DC7AF@uvic.ca
Lists: pgsql-admin pgsql-general

Can this be done with the syslog destination?

Evan Rempel
Systems Administrator
University of Victoria

On 2012-05-30, at 10:37 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Evan Rempel <erempel(at)uvic(dot)ca> writes:
>> Even when the wrap column is set to a very large value (32k) STATEMENT lines still wrap according to the line breaks in
>> the original SQL statement.
>> Wrapped line no longer have the prefix - difficult to grep the log for everything pertaining to a particular database or user
>> Wrapped lines no longer have the log line status - difficult to auto-ignore all NOTICE status log lines when they wrap, or
>> ignore all user STATEMENT lines because they almost always wrap.
>
> I think your life would be better if you used CSV log format.
>
>> In conclusion, I would like to see a logging change that included the prefix on EVERY line, and included the STATUS on every line.
>
> This doesn't really sound like an improvement to me. It's going to make
> the logs bulkier, but they're still not automatically parseable in any
> meaningful sense. CSV is the way to go if you want machine-readable logs.
>
> regards, tom lane


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Evan Rempel <erempel(at)uvic(dot)ca>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Change request - log line prefix
Date: 2012-05-31 09:37:28
Message-ID: 20120531093728.GA25109@depesz.com
Lists: pgsql-admin pgsql-general

On Wed, May 30, 2012 at 09:05:23PM -0700, Evan Rempel wrote:
> I am part of a team that fills an operational roll administering 1000+ servers and
> 100's of applications. Of course we need to "read" all of our logs, and must use computers to
> help us. In filtering postgreSQL logs there is one thing that makes life difficult for us admins.

consider using pg.grep:
http://www.depesz.com/2012/01/23/some-new-tools-for-postgresql-or-around-postgresql/

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/


From: Stuart Bishop <stuart(at)stuartbishop(dot)net>
To: Chris Angelico <rosuav(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Change request - log line prefix
Date: 2012-05-31 13:54:43
Message-ID: CADmi=6MxWPkPcEdUHNZp+S2P_5Z5_hsNbWS6D+zbRd==Vt9rvA@mail.gmail.com
Lists: pgsql-admin pgsql-general

On Thu, May 31, 2012 at 12:19 PM, Chris Angelico <rosuav(at)gmail(dot)com> wrote:
> On Thu, May 31, 2012 at 2:05 PM, Evan Rempel <erempel(at)uvic(dot)ca> wrote:
>> Even when the wrap column is set to a very large value (32k) STATEMENT lines still wrap according to the line breaks in
>> the original SQL statement.
>
> The problem isn't so much the wrapping, it seems, as that your
> statements' line breaks are being propagated through. So as a possible
> alternative solution, perhaps there could be an option to replace
> newlines with spaces before the line goes to the log?

I'd certainly like to see this or similar (encode the queries into a
single line of ASCII, lossy is OK). I like my logs both readable and
greppable.

--
Stuart Bishop <stuart(at)stuartbishop(dot)net>
http://www.stuartbishop.net/


From: Evan Rempel <erempel(at)uvic(dot)ca>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: is there a select for update insert if not exist type command?
Date: 2012-06-09 22:41:34
Message-ID: F1226B1821911E4BB69462DAF68098536E3233376B@EMC6.uvic.ca
Lists: pgsql-admin pgsql-general

I have a project where I will have two clients essentially doing the
same things at the same time. The idea is that if one has already done the
work, then the second one does not need to do it.

I was hoping that adding a task-related unique identifier to a table
could be used to coordinate these clients, something like a primary key and using
select for update.

The challenge I have is during the initial insert. One of the two clients will cause PostgreSQL
to log an error, which I would rather avoid (it just seems dirty).

Here is the timeline:

Both clients A and B become aware of a task to do.

Client A or client B issues the "select for update ... if not exist do insert" type command.
The other client gets blocked on the "select for update".

The first client finishes its insert/updates to record that it has dealt with the task.

The second client gets unblocked and reads the record, realizing that the first client has already dealt with the task.

It is the "select for update ... if not exist do insert" type command that I am ignorant of how to code.

Anyone care to school me?

Evan.


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Evan Rempel <erempel(at)uvic(dot)ca>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: is there a select for update insert if not exist type command?
Date: 2012-06-09 23:35:22
Message-ID: 20120609193522.51e1cdf1.wmoran@potentialtech.com
Lists: pgsql-admin pgsql-general

On Sat, 9 Jun 2012 15:41:34 -0700 Evan Rempel <erempel(at)uvic(dot)ca> wrote:

> I have a project where I will have two clients essentially doing the
> same things at the same time. The idea is that if one has already done the
> work, then the second one does not need to do it.
>
> I was hoping that adding a task related unique identifier to a table
> could be used to coordinate these client, something like a primary key and using
> select for update.
>
> The challenge I have is during the initial insert. One of the two clients will cause postgresql
> to log an error, which I would rather avoid (just seems dirty).
>
> Here is the time line;
>
> Both clients A and B becomes aware to do a task
>
> Client A or client B issues the "select for update ... if not exist do insert" type command
> The other client gets blocked on the "select for update.
>
> First client finishes insert/updates to record that it has delt with the task
>
> second client gets unblocked and reads the record realizing that the first client delt with the task already.
>
>
> It is the "select for update ... if not exist do insert" type command that I am ignorant of how to code.
>
> Anyone care to school me?

It's amazing to me how often I have this conversation ...

How would you expect SELECT FOR UPDATE to work when you're checking to see
if you can insert a row? If the row doesn't exist, there's nothing to
lock against, and thus it doesn't help anything. FOR UPDATE is only
useful if you're UPDATING a row.

That being given, there are a number of ways to solve your problem. Which
one you use depends on a number of factors.

If it's x number of processes all contending for one piece of work, you could
just exclusive lock the entire table, and do the check/insert with the
table locked. This essentially creates a wait queue.

If the processes need to coordinate around doing several pieces of work, you
can put a row in for each piece of work with a boolean field indicating
whether a process is currently working on it. Then you can SELECT FOR
UPDATE a particular row representing work to be done, and if the boolean
isn't already true, set it to true and start working. In my experience,
you'll benefit from going a few steps forward and storing some information
about what's being done on it (like the PID of the process working on it,
and the time it started processing) -- it just makes problems easier to
debug later.
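
A minimal sketch of that second approach (table and column names here
are made up for illustration):

BEGIN;
-- lock the row for this piece of work; blocks if another process holds it
SELECT in_progress FROM work_items WHERE work_id = 42 FOR UPDATE;
-- if in_progress came back false, claim the work and record who/when
UPDATE work_items
   SET in_progress = true,
       worker_pid  = pg_backend_pid(),
       started_at  = now()
 WHERE work_id = 42;
COMMIT;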

There are other approaches as well, but those are the two that come to
mind.

Not sure what your experience level is, but I'll point out that these
kinds of things only work well if your transaction management is
correct. I have seen people struggle to get these kinds of things working
because they didn't really understand how transactions and locking interact,
or they were using some sort of abstraction layer that does transaction
stuff in such an opaque way that they couldn't figure out what was actually
happening.

Hope this helps.

--
Bill Moran <wmoran(at)potentialtech(dot)com>


From: Edson Richter <edsonrichter(at)hotmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: is there a select for update insert if not exist type command?
Date: 2012-06-09 23:57:50
Message-ID: BLU0-SMTP3246D2E75F7F7E8F7A60C6ACFF10@phx.gbl
Lists: pgsql-admin pgsql-general

You will find this reading a good starting point:
http://www.cs.uiuc.edu/class/fa07/cs411/lectures/cs411-f07-tranmgr-3.pdf

There is no one-size-fits-all cookbook for this; you will have to learn
the theory of transaction management and locking in transactional
databases and work out your own solution.

Wish you the best,

Edson.

Em 09/06/2012 19:41, Evan Rempel escreveu:
> I have a project where I will have two clients essentially doing the
> same things at the same time. The idea is that if one has already done the
> work, then the second one does not need to do it.
>
> I was hoping that adding a task related unique identifier to a table
> could be used to coordinate these client, something like a primary key and using
> select for update.
>
> The challenge I have is during the initial insert. One of the two clients will cause postgresql
> to log an error, which I would rather avoid (just seems dirty).
>
> Here is the time line;
>
> Both clients A and B becomes aware to do a task
>
> Client A or client B issues the "select for update ... if not exist do insert" type command
> The other client gets blocked on the "select for update.
>
> First client finishes insert/updates to record that it has delt with the task
>
> second client gets unblocked and reads the record realizing that the first client delt with the task already.
>
>
> It is the "select for update ... if not exist do insert" type command that I am ignorant of how to code.
>
> Anyone care to school me?
>
> Evan.


From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Evan Rempel'" <erempel(at)uvic(dot)ca>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: is there a select for update insert if not exist type command?
Date: 2012-06-10 00:10:38
Message-ID: 001a01cd469d$791b6260$6b522720$@yahoo.com
Lists: pgsql-admin pgsql-general

> -----Original Message-----
>
> Both clients A and B becomes aware to do a task
>

Ideally you would have this awareness manifested as an INSERT into some
kind of job table. The clients can issue the "SELECT FOR UPDATE" + "UPDATE"
commands to indicate that they are going to be responsible for said task.
You seem to be combining "something needs to be done" with "I am able to do
that something". You may not have a choice depending on your situation but
it is something to think about - how can I just focus on implementing the
"something needs to be done" part.

If you want to avoid the errors appearing in the logs or client you could
just wrap the INSERT command into a function and trap the duplicate key
exception.
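
A minimal sketch of that idea (the function and table names are made
up; the point is that trapping unique_violation inside the function
keeps the duplicate-key error out of the log and away from the client):

CREATE OR REPLACE FUNCTION claim_task(p_task_key text) RETURNS boolean AS $$
BEGIN
    INSERT INTO tasks (task_key) VALUES (p_task_key);
    RETURN true;   -- this session created the row and owns the task
EXCEPTION WHEN unique_violation THEN
    RETURN false;  -- another session already inserted it
END;
$$ LANGUAGE plpgsql;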

It is hard to give suggestions when you are as vague as "becomes aware to do
a task". Ideally even if you have multiple clients monitoring for "aware
state" only one client should ever actually realize said awareness for a
given task. In effect you want to serialize the monitoring routine at this
level, insert the "something needs to be done" record, then serialize (for
update) the "I am able to do that something" action.

David J.


From: Evan Rempel <erempel(at)uvic(dot)ca>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: is there a select for update insert if not exist type command?
Date: 2012-06-10 02:40:12
Message-ID: F1226B1821911E4BB69462DAF68098536E3233376C@EMC6.uvic.ca
Lists: pgsql-admin pgsql-general

Thanks for the input. Dave also replied indicating that without more details it
is difficult to really help. I was intentionally vague to see if there was some
SQL-standard way, like MySQL's "insert ... on duplicate key update ..." syntax, or
the proposed MSSQL merge command.

Since there isn't, I'll give a lot more detail without writing a book.

We are working on a project where a syslog stream arrives and is parsed in
real time. Every log message is considered to be an "event". Events can usually
be ignored because they are "normal behaviour". Some events indicate a problem
and should create an incident. Repetitions of the same event should not create a new
incident if the current incident is not resolved.

For redundancy, two independent systems will consume and process these events.

Now for the part where postgresql comes in.

When an event that should create an incident is encountered, only one incident
should be created. The incident details are not known until such time as the event
occurs, so no pre-population of tables can occur. So only one of the two servers should
perform the insert, and then update it with details of successive events, possibly ticketing
system identification, date/time of last pager message that went out.

Once the incident is placed into postgresql, everything is easy, "select for update",
determine all that should take place like paging, updating tickets, recording date/time
of last alert sent to administrators.

It is just that first insert that is the challenge. One system does the insert, only to have the
other do the "select for update". I would like to have an insert that is locked AND visible to
other sessions.

Exclusive lock on the table is an idea, but it serializes ALL new incident creation, and we
only NEED to serialize for the same incident identifier. Since all of the systems
will be processing the live log stream, in all likelihood they will always be
working on the same data and thus the same incident, and so will always be locking the same
piece of data anyway; the full table lock may not be any worse.

I was thinking of using an advisory lock, which would also serialize everything, just like a table lock,
but again, that may not be a problem since all processes work on the same data at the same time
anyways.

I could also use a custom "my_locks" table that just has rows with unique values that I do a
"select for update" on to serialize everything. Again, no functional difference from table or advisory locks.

A couple of follow-on questions:

Is there anything inherently atomic about a stored procedure?
Does the stored procedure simply run within the transaction context of where it is called from?

BEGIN;
SELECT storedProc1();
SELECT storedProc2();
COMMIT;

Would the actions of both stored procedures be a single atomic action?

Thanks again for lending me your experience, it can, and is, saving me days.

Evan.

________________________________________
From: Bill Moran [wmoran(at)potentialtech(dot)com]
Sent: Saturday, June 09, 2012 4:35 PM
To: Evan Rempel
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] is there a select for update insert if not exist type command?

On Sat, 9 Jun 2012 15:41:34 -0700 Evan Rempel <erempel(at)uvic(dot)ca> wrote:

> I have a project where I will have two clients essentially doing the
> same things at the same time. The idea is that if one has already done the
> work, then the second one does not need to do it.
>
> I was hoping that adding a task related unique identifier to a table
> could be used to coordinate these client, something like a primary key and using
> select for update.
>
> The challenge I have is during the initial insert. One of the two clients will cause postgresql
> to log an error, which I would rather avoid (just seems dirty).
>
> Here is the time line;
>
> Both clients A and B becomes aware to do a task
>
> Client A or client B issues the "select for update ... if not exist do insert" type command
> The other client gets blocked on the "select for update.
>
> First client finishes insert/updates to record that it has delt with the task
>
> second client gets unblocked and reads the record realizing that the first client delt with the task already.
>
>
> It is the "select for update ... if not exist do insert" type command that I am ignorant of how to code.
>
> Anyone care to school me?

It's amazing to me how often I have this conversation ...

How would you expect SELECT FOR UPDATE to work when you're checking to see
if you can insert a row? If the row doesn't exist, there's nothing to
lock against, and thus it doesn't help anything. FOR UPDATE is only
useful if you're UPDATING a row.

That being given, there are a number of ways to solve your problem. Which
one you use depends on a number of factors.

If it's x number of processes all contending for one piece of work, you could
just exclusive lock the entire table, and do the check/insert with the
table locked. This essentially creates a wait queue.

If the processes need to coordinate around doing several pieces of work, you
can put a row in for each piece of work with a boolean field indicating
whether a process is currently working on it. Then you can SELECT FOR
UPDATE a particular row representing work to be done, and if the boolean
isn't already true, set it to true and start working. In my experience,
you'll benefit from going a few steps forward and storing some information
about what's being done on it (like the PID of the process working on it,
and the time it started processing) -- it just makes problems easier to
debug later.

There are other approaches as well, but those are the two that come to
mind.

Not sure what your experience level is, but I'll point out that these
kinds of things only work well if you're transaction management is
correct. I have seen people struggle to get these kind of things working
because they didn't really understand how transactions and locking interact,
or they were using some sort of abstraction layer that does transaction
stuff in such an opaque way that they couldn't figure out what was actually
happening.

Hope this helps.

--
Bill Moran <wmoran(at)potentialtech(dot)com>


From: Chris Travers <chris(dot)travers(at)gmail(dot)com>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: Evan Rempel <erempel(at)uvic(dot)ca>, pgsql-general(at)postgresql(dot)org
Subject: Re: is there a select for update insert if not exist type command?
Date: 2012-06-10 02:43:46
Message-ID: CAKt_ZfuoUW1mrFv74Ye3jH9UFOqCx-QJ2GdJEfcMjyiZf0uC+w@mail.gmail.com
Lists: pgsql-admin pgsql-general

Depending on the version of Pg there are two possible solutions to
this problem. The first (the old solution), which really only works well
one row at a time, is a stored procedure that does something
like:

UPDATE foo SET bar = baz WHERE id = in_id;

IF NOT FOUND THEN
    INSERT INTO foo (id, bar) VALUES (in_id, baz);
END IF;

The newer way, which can be done in SQL with Pg 9.1, is to use writable
common table expressions. See
http://vibhorkumar.wordpress.com/2011/10/26/upsertmerge-using-writable-cte-in-postgresql-9-1/
for an example by Vibhor Kumar.
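
A sketch of the writable-CTE upsert pattern that post describes (foo,
bar, and the values are placeholders; note that two sessions racing on
the same key can still both take the INSERT branch, so a unique
constraint and retry logic still matter):

WITH upd AS (
    UPDATE foo SET bar = 'new value' WHERE id = 1
    RETURNING id
)
INSERT INTO foo (id, bar)
SELECT 1, 'new value'
WHERE NOT EXISTS (SELECT 1 FROM upd);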

Best Wishes,
Chris Travers


From: Edson Richter <edsonrichter(at)hotmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: is there a select for update insert if not exist type command?
Date: 2012-06-10 02:59:57
Message-ID: BLU0-SMTP14970228085F5A9C00A6000CFF00@phx.gbl
Lists: pgsql-admin pgsql-general

One of the possible strategies that comes to my mind is:

1) Log your "syslog stream" into a PostgreSQL database (no need to record
every message, just a simple table with the event key and a "processed"
flag field)
2) When a problem event arrives, the first server should "select for
update" the syslog event row, then update the "processed" field (from
"0" to "1", or false to true, something like that).

When the second server tries to get the "select for update" on the events
table, it will fail, and then it can move ahead in the log looking for
other errors that need attention.
I cannot say how adversely this strategy would affect your system (or if
there is other contention involved, like page locks), but it seems very
logical to me.
I've used this to distribute processing on users' desktops for a massive
processing system with success (but using MySQL for storing data).
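
A sketch of that flow (the table and key are made up; NOWAIT is an
assumption about how the second server "fails" immediately instead of
blocking):

BEGIN;
SELECT processed FROM syslog_events
 WHERE event_key = 'disk-full:host42'
   FOR UPDATE NOWAIT;  -- errors out at once if another server holds the row lock
UPDATE syslog_events SET processed = true
 WHERE event_key = 'disk-full:host42';
COMMIT;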

Regards,

Edson

Em 09/06/2012 23:40, Evan Rempel escreveu:
> Thanks for the input. Dave also replied indicating that without more details it
> is difficult the really help. I was intentionally vague to see if there was some
> SQL standard way like to mysql "insert ... on duplicate update ... " syntax, or
> the proposed MSSQL merge command.
>
> Since not, I'll give a lot more detail without writing a book.
>
> We are working on a project where a syslog stream arrives and is parsed in
> real time. Every log message is considered to be an "event". Events can usually
> be ignored because they are "normal behaviour". Some events indicate a problem
> and should create an incident. Repetitions of the same event should not create a new
> incident if the current incident is not resolved.
>
> For redundancy, two independent systems will consume and process these events.
>
>
> Now for the part where postgresql comes in.
>
> When an event that should create an incident is encountered, only one incident
> should be created. The incident details are not known until such time as the event
> occurs, so no pre-population of tables can occur. So only one of the two server should
> perform the insert, and then update it with details of successive events, possibly ticketing
> system identification, date/time of last pager message that went out.
>
> Once the incident is placed into postgresql, everything is easy, "select for update",
> determine all that should take place like paging, updating tickets, recording date/time
> of last alert sent to administrators.
>
> It is just that first insert that is the challenge. One system does the insert, only to have the
> other do the "select for update". I would like to have an insert that is locked AND visible to
> other sessions.
>
> Exclusive lock on the table is an idea, but it serializes ALL new incident creation, and we
> only NEED to serialize for the same incident identifier. Since both (all) of the systems
> that will be processing the live log stream, in all likelihood all of the servers will always be
> working on the same data and thus the same incident, so will always be locking the same
> piece of data anyway, so the full table lock may not be any worse.
>
> I was thinking of using and advisory lock, which would also serialize everything, just like a table lock,
> but again, that may not be a problem since all processes work on the same data at the same time
> anyways.
>
> I could also use a custom "my_locks" table that just has rows with unique values that I do a
> "select for update" on to serialize everything. Again, no functional difference from table or advisory locks.
>
>
> A follow on questions;
>
> Is there anything inherently atomic about a stored procedure?
> Does the stored procedure simply run within the transaction context of where it is called from?
>
> begin transaction
> - select storedProc1
> - select storedProc2
> commit
>
> Would the actions of both stored procedures would be a single atomic action?
>
> Thanks again for lending me your experience, it can, and is, saving me days.
>
> Evan.
>
> ________________________________________
> From: Bill Moran [wmoran(at)potentialtech(dot)com]
> Sent: Saturday, June 09, 2012 4:35 PM
> To: Evan Rempel
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] is there a select for update insert if not exist type command?
>
> On Sat, 9 Jun 2012 15:41:34 -0700 Evan Rempel<erempel(at)uvic(dot)ca> wrote:
>
>> I have a project where I will have two clients essentially doing the
>> same things at the same time. The idea is that if one has already done the
>> work, then the second one does not need to do it.
>>
>> I was hoping that adding a task related unique identifier to a table
>> could be used to coordinate these client, something like a primary key and using
>> select for update.
>>
>> The challenge I have is during the initial insert. One of the two clients will cause postgresql
>> to log an error, which I would rather avoid (just seems dirty).
>>
>> Here is the time line;
>>
>> Both clients A and B becomes aware to do a task
>>
>> Client A or client B issues the "select for update ... if not exist do insert" type command
>> The other client gets blocked on the "select for update.
>>
>> First client finishes insert/updates to record that it has delt with the task
>>
>> second client gets unblocked and reads the record realizing that the first client delt with the task already.
>>
>>
>> It is the "select for update ... if not exist do insert" type command that I am ignorant of how to code.
>>
>> Anyone care to school me?
> It's amazing to me how often I have this conversation ...
>
> How would you expect SELECT FOR UPDATE to work when you're checking to see
> if you can insert a row? If the row doesn't exist, there's nothing to
> lock against, and thus it doesn't help anything. FOR UPDATE is only
> useful if you're UPDATING a row.
>
> That being given, there are a number of ways to solve your problem. Which
> one you use depends on a number of factors.
>
> If it's x number of processes all contending for one piece of work, you could
> just exclusive lock the entire table, and do the check/insert with the
> table locked. This essentially creates a wait queue.
>
> If the processes need to coordinate around doing several pieces of work, you
> can put a row in for each piece of work with a boolean field indicating
> whether a process is currently working on it. Then you can SELECT FOR
> UPDATE a particular row representing work to be done, and if the boolean
> isn't already true, set it to true and start working. In my experience,
> you'll benefit from going a few steps forward and storing some information
> about what's being done on it (like the PID of the process working on it,
> and the time it started processing) -- it just makes problems easier to
> debug later.
>
> There are other approaches as well, but those are the two that come to
> mind.
>
> Not sure what your experience level is, but I'll point out that these
> kinds of things only work well if you're transaction management is
> correct. I have seen people struggle to get these kind of things working
> because they didn't really understand how transactions and locking interact,
> or they were using some sort of abstraction layer that does transaction
> stuff in such an opaque way that they couldn't figure out what was actually
> happening.
>
> Hope this helps.
>
> --
> Bill Moran<wmoran(at)potentialtech(dot)com>