Re: what can depend on index

Lists: pgsql-general
From: Szymon Guz <mabewlun(at)gmail(dot)com>
To: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: what can depend on index
Date: 2010-10-26 08:13:04
Message-ID: AANLkTimsg68Y2ZaLcWXPnt+KjsJhNCyjOpPbAzLFD8Ah@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi,
today I noticed that in the documentation there is DROP INDEX CASCADE.
I've got one question: what is that for? What can depend on index?

regards
Szymon


From: David Fetter <david(at)fetter(dot)org>
To: Szymon Guz <mabewlun(at)gmail(dot)com>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: what can depend on index
Date: 2010-10-26 14:21:58
Message-ID: 20101026142158.GE8041@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Oct 26, 2010 at 10:13:04AM +0200, Szymon Guz wrote:
> Hi,
> today I noticed that in the documentation there is DROP INDEX
> CASCADE. I've got one question: what is that for? What can depend
> on index?

A foreign key can, if the index is unique.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Szymon Guz <mabewlun(at)gmail(dot)com>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: what can depend on index
Date: 2010-10-26 14:29:45
Message-ID: 24385.1288103385@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Szymon Guz <mabewlun(at)gmail(dot)com> writes:
> today I noticed that in the documentation there is DROP INDEX CASCADE.
> I've got one question: what is that for? What can depend on index?

Unique constraints and foreign key constraints, for starters.

But even if there weren't anything, we'd still provide the CASCADE
syntax for future-proofing purposes.

regards, tom lane


From: Szymon Guz <mabewlun(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: what can depend on index
Date: 2010-10-26 14:37:47
Message-ID: AANLkTi=7Rb9xr6rANagoML8KtWd9NrPWbthKpBaRHHc1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 26 October 2010 16:21, David Fetter <david(at)fetter(dot)org> wrote:

> On Tue, Oct 26, 2010 at 10:13:04AM +0200, Szymon Guz wrote:
> > Hi,
> > today I noticed that in the documentation there is DROP INDEX
> > CASCADE. I've got one question: what is that for? What can depend
> > on index?
>
> A foreign key can, if the index is unique.
>
>
>
Thanks for the answer.

regards
Szymon


From: Szymon Guz <mabewlun(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: what can depend on index
Date: 2010-10-26 14:37:58
Message-ID: AANLkTi=u0ELrPV8s09kvVHrhBeqj-TcMqQ3UDvWAPjRP@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 26 October 2010 16:29, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Szymon Guz <mabewlun(at)gmail(dot)com> writes:
> > today I noticed that in the documentation there is DROP INDEX CASCADE.
> > I've got one question: what is that for? What can depend on index?
>
> Unique constraints and foreign key constraints, for starters.
>
> But even if there weren't anything, we'd still provide the CASCADE
> syntax for future-proofing purposes.
>
> regards, tom lane
>

Thanks for the answer, I didn't know about the foreign key, I only thought
about the future use.

regards
Szymon


From: Ozz Nixon <ozznixon(at)gmail(dot)com>
To: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Cc: Ozz Nixon <ozznixon(at)gmail(dot)com>
Subject: Why Select Count(*) from table - took over 20 minutes?
Date: 2010-10-26 17:18:41
Message-ID: D4718551-6E2A-4840-8BCF-20901EA0E570@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I am the only user on this system right now, and one table select count(*) took over 20 minutes:

wikitags exists and has 58,988,656 records.

Structure (in pascal) is:

quer.SQL.Add('create table '+DBTags+' (');
quer.SQL.Add(' pagename '+SQL_TITLE+'(100) not null,');
quer.SQL.Add(' tagword '+SQL_TITLE+'(15) not null,');
quer.SQL.Add(' soundex2 '+SQL_TITLE+'(4) not null,');
quer.SQL.Add(' metaphone '+SQL_TITLE+'(15) not null,');
quer.SQL.Add(' metaphone2 '+SQL_TITLE+'(22) not null,');
quer.SQL.Add(' carverphone '+SQL_TITLE+'(22) not null,');
quer.SQL.Add(' instances '+SQL_INT32+' not null,');
if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add(' constraint '+DBTags+'_PK');
quer.SQL.Add(' primary key(pagename, tagword, instances)');
quer.SQL.Add(')');

where SQL_TITLE = 'varchar', SQL_IN32 = 'int'

I have hung off indexes for each column, to resolve my previous "performance" issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a write once, read many... *never* update, nor delete.

Any suggestions?


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Ozz Nixon <ozznixon(at)gmail(dot)com>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Why Select Count(*) from table - took over 20 minutes?
Date: 2010-10-26 17:26:31
Message-ID: 20101026132631.d23b0a39.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

In response to Ozz Nixon <ozznixon(at)gmail(dot)com>:

> I am the only user on this system right now, and one table select count(*) took over 20 minutes:
>
> wikitags exists and has 58,988,656 records.
>
> Structure (in pascal) is:
>
> quer.SQL.Add('create table '+DBTags+' (');
> quer.SQL.Add(' pagename '+SQL_TITLE+'(100) not null,');
> quer.SQL.Add(' tagword '+SQL_TITLE+'(15) not null,');
> quer.SQL.Add(' soundex2 '+SQL_TITLE+'(4) not null,');
> quer.SQL.Add(' metaphone '+SQL_TITLE+'(15) not null,');
> quer.SQL.Add(' metaphone2 '+SQL_TITLE+'(22) not null,');
> quer.SQL.Add(' carverphone '+SQL_TITLE+'(22) not null,');
> quer.SQL.Add(' instances '+SQL_INT32+' not null,');
> if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add(' constraint '+DBTags+'_PK');
> quer.SQL.Add(' primary key(pagename, tagword, instances)');
> quer.SQL.Add(')');
>
> where SQL_TITLE = 'varchar', SQL_IN32 = 'int'
>
> I have hung off indexes for each column, to resolve my previous "performance" issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a write once, read many... *never* update, nor delete.
>
> Any suggestions?

Generate the count one time and store it somewhere for quick retrieval.

In an MVCC database, count(*) is designed to be accurate, which requires
a scan of the entire table (which appears to take about 20 mins on your
hardware).

MVCC just isn't optimized for a table that never changes. However, it's
easy to cache that value, since it never changes the cache never needs
to be updated.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Why Select Count(*) from table - took over 20 minutes?
Date: 2010-10-26 17:33:05
Message-ID: 4CC710D1.3040001@hogranch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 10/26/10 10:18 AM, Ozz Nixon wrote:
> I am the only user on this system right now, and one table select count(*) took over 20 minutes:
>
> wikitags exists and has 58,988,656 records.
>
> Structure (in pascal) is:
>
> quer.SQL.Add('create table '+DBTags+' (');
> quer.SQL.Add(' pagename '+SQL_TITLE+'(100) not null,');
> quer.SQL.Add(' tagword '+SQL_TITLE+'(15) not null,');
> quer.SQL.Add(' soundex2 '+SQL_TITLE+'(4) not null,');
> quer.SQL.Add(' metaphone '+SQL_TITLE+'(15) not null,');
> quer.SQL.Add(' metaphone2 '+SQL_TITLE+'(22) not null,');
> quer.SQL.Add(' carverphone '+SQL_TITLE+'(22) not null,');
> quer.SQL.Add(' instances '+SQL_INT32+' not null,');
> if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add(' constraint '+DBTags+'_PK');
> quer.SQL.Add(' primary key(pagename, tagword, instances)');
> quer.SQL.Add(')');
>
> where SQL_TITLE = 'varchar', SQL_IN32 = 'int'
>
> I have hung off indexes for each column, to resolve my previous "performance" issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a write once, read many... *never* update, nor delete.

count(*) has to read the whole table to get the accurate count. The
reason for this is that different clients can see different versions of
that table, for instance, if client A is already in a transaction, and
client B then does an INSERT, the two clients will see different values
for the count.


From: Alan Hodgson <ahodgson(at)simkin(dot)ca>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Why Select Count(*) from table - took over 20 minutes?
Date: 2010-10-26 17:33:51
Message-ID: 201010261033.51267.ahodgson@simkin.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On October 26, 2010 10:18:41 am Ozz Nixon wrote:
> I have hung off indexes for each column, to resolve my previous
> "performance" issue from 3+ weeks ago. However, COUNT() is still dog slow
> - this table is a write once, read many... *never* update, nor delete.
>
> Any suggestions?

If you need to do count(*) on 60 million row tables, you will probably need
faster hardware.


From: Diego Schulz <dschulz(at)gmail(dot)com>
To: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Why Select Count(*) from table - took over 20 minutes?
Date: 2010-10-26 20:30:08
Message-ID: AANLkTinYVGq-wUbdUwULkt8STB9YkLr1hF0JKigcjqYZ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Oct 26, 2010 at 2:18 PM, Ozz Nixon <ozznixon(at)gmail(dot)com> wrote:
> I am the only user on this system right now, and one table select count(*) took over 20 minutes:
>
> wikitags exists and has 58,988,656 records.
>
> Structure (in pascal) is:
>
>   quer.SQL.Add('create table '+DBTags+' (');
>   quer.SQL.Add('   pagename '+SQL_TITLE+'(100) not null,');
>   quer.SQL.Add('   tagword '+SQL_TITLE+'(15) not null,');
>   quer.SQL.Add('   soundex2 '+SQL_TITLE+'(4) not null,');
>   quer.SQL.Add('   metaphone '+SQL_TITLE+'(15) not null,');
>   quer.SQL.Add('   metaphone2 '+SQL_TITLE+'(22) not null,');
>   quer.SQL.Add('   carverphone '+SQL_TITLE+'(22) not null,');
>   quer.SQL.Add('   instances '+SQL_INT32+' not null,');
>   if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add('   constraint '+DBTags+'_PK');
>   quer.SQL.Add('   primary key(pagename, tagword, instances)');
>   quer.SQL.Add(')');
>
> where SQL_TITLE = 'varchar', SQL_IN32 = 'int'
>
> I have hung off indexes for each column, to resolve my previous "performance" issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a write once, read many... *never* update, nor delete.
>
> Any suggestions?
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

You could try something like what's suggested in this blog post:
http://jakub.fedyczak.net/post/26

I didn't actually tried it, but I think it should work ok.

cheers,

diego


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Diego Schulz <dschulz(at)gmail(dot)com>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Why Select Count(*) from table - took over 20 minutes?
Date: 2010-10-27 11:15:08
Message-ID: AANLkTimNQ+9821c---rX9z4w6VbyQ8UvCCMipeUri4et@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Oct 26, 2010 at 4:30 PM, Diego Schulz <dschulz(at)gmail(dot)com> wrote:
> On Tue, Oct 26, 2010 at 2:18 PM, Ozz Nixon <ozznixon(at)gmail(dot)com> wrote:
>> I am the only user on this system right now, and one table select count(*) took over 20 minutes:
>>
>> wikitags exists and has 58,988,656 records.
>>
>> Structure (in pascal) is:
>>
>>   quer.SQL.Add('create table '+DBTags+' (');
>>   quer.SQL.Add('   pagename '+SQL_TITLE+'(100) not null,');
>>   quer.SQL.Add('   tagword '+SQL_TITLE+'(15) not null,');
>>   quer.SQL.Add('   soundex2 '+SQL_TITLE+'(4) not null,');
>>   quer.SQL.Add('   metaphone '+SQL_TITLE+'(15) not null,');
>>   quer.SQL.Add('   metaphone2 '+SQL_TITLE+'(22) not null,');
>>   quer.SQL.Add('   carverphone '+SQL_TITLE+'(22) not null,');
>>   quer.SQL.Add('   instances '+SQL_INT32+' not null,');
>>   if SQL_NAME_PRIMARY_KEYS then quer.SQL.Add('   constraint '+DBTags+'_PK');
>>   quer.SQL.Add('   primary key(pagename, tagword, instances)');
>>   quer.SQL.Add(')');
>>
>> where SQL_TITLE = 'varchar', SQL_IN32 = 'int'
>>
>> I have hung off indexes for each column, to resolve my previous "performance" issue from 3+ weeks ago. However, COUNT() is still dog slow - this table is a write once, read many... *never* update, nor delete.
>>
>> Any suggestions?
>> --
>> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
> You could try something like what's suggested in this blog post:
> http://jakub.fedyczak.net/post/26
>
> I didn't actually tried it, but I think it should work ok.

Before you try that, you should ask yourself if you really need a 100%
accurate count. A reasonable approximation is maintained via the
stats system (pg_class.reltuples) that will often do and is free.

merlin


From: Ozz Nixon <ozznixon(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Diego Schulz <dschulz(at)gmail(dot)com>, PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Why Select Count(*) from table - took over 20 minutes?
Date: 2010-10-28 15:36:21
Message-ID: B6CED44A-2607-4AD1-8172-024A649D9353@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

How/where do I query this?

My script does not need a 100% accurate count - just a recently valid count - so I can verify the web crawlers are still crawling :-)

On Oct 27, 2010, at 7:15 AM, Merlin Moncure wrote:

> pg_class.reltuples


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Ozz Nixon <ozznixon(at)gmail(dot)com>
Cc: Diego Schulz <dschulz(at)gmail(dot)com>, PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Why Select Count(*) from table - took over 20 minutes?
Date: 2010-10-28 16:04:50
Message-ID: AANLkTin+XdaRAd+D-3D3Uvj5Kd0gr8TfYn2oTfOHcpiT@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Oct 28, 2010 at 11:36 AM, Ozz Nixon <ozznixon(at)gmail(dot)com> wrote:
> How/where do I query this?
>
> My script does not need a 100% accurate count - just a recently valid count - so I can verify the web crawlers are still crawling :-)

you can do this:
select reltuples from pg_class where relname = 'your_table' and relkind = 'r';

that will give you accurate count as of the last analyze, which is
going to be driven by table usage and/or manual analyze. Probably
much better in your particular case is to do this:

select * from pg_stat_all_tables where relname = 'your_table';

and look at the n_tup_ins, del, etc. and make sure they are changing
(those numbers are reset when server resets, fyi).

merlin


From: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Why Select Count(*) from table - took over 20 minutes?
Date: 2010-11-03 10:15:23
Message-ID: iarcnr$sp4$1@reversiblemaps.ath.cx
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 2010-10-26, John R Pierce <pierce(at)hogranch(dot)com> wrote:
>
> count(*) has to read the whole table to get the accurate count. The
> reason for this is that different clients can see different versions of
> that table, for instance, if client A is already in a transaction, and
> client B then does an INSERT, the two clients will see different values
> for the count.

They may or may not. the default transaction isolation level "read commited"
allows a session to see most changes that were committed externally after the
start of the transaction.

Tlso the index may include deleted rows.
which is another reason count(*) does a table scan.

--
ɹǝpun uʍop ɯoɹɟ sƃuıʇǝǝɹ⅁