Re: Index speeds up one row table (why)?

From: Dave E Martin XXIII <postgresql-to(dot)dave(at)dave(dot)to>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Index speeds up one row table (why)?
Date: 2003-05-31 22:56:56
Message-ID: 3ED93338.80806@dave.to
Lists: pgsql-bugs pgsql-performance

Tom Lane writes:

>Bruno Wolff III <bruno(at)wolff(dot)to> writes:
>> It probably has one visible row in it. If it changes a lot, there
>> may be lots of deleted tuples in it. That would explain why an
>> index scan speeds things up.

>Right, every UPDATE on unique_ids generates a dead row, and a seqscan
>has no alternative but to wade through them all. When a unique index is
>present, the indexscan code knows that after it's fetched one live tuple
...
>More-frequent vacuums would be a much more reliable solution,

The index I created wasn't unique (though it should have been), but
perhaps much of the same reasoning still applies.
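
For reference, a sketch of what the unique version would look like
(assuming table_name is the only indexed column, and dropping the
existing non-unique index first):

drop index bigint_unique_ids__table_name;
create unique index bigint_unique_ids__table_name
    on bigint_unique_ids (table_name);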

Also, I could have sworn I tried a vacuum before, and it didn't make a
difference, although experimenting just now, it did. The data collection
rate is considerably slower at the moment, though, so perhaps last time
the table simply got "inefficient" again very quickly during or
immediately after the vacuum (or I wasn't where I thought I was when I
vacuumed). I'll have to experiment with this a bit more when the data
generation rate is high again.

(ok, experimented a bit more just now)
Hm, it appears that the degradation occurs with the index as well. I
guess that when I created the index, it just initially did better
because it got to skip all the already-dead rows at creation time. But
this is disturbing: I do a vacuum, and the access times are better, but
still horrible:

explain analyze select next_id from bigint_unique_ids where
table_name='CONNECTION_DATA';

Index Scan using bigint_unique_ids__table_name on bigint_unique_ids
(cost=0.00..8.01 rows=1 width=8) (actual time=13.77..844.14 rows=1 loops=1)
Index Cond: (table_name = 'CONNECTION_DATA'::text)
Total runtime: 844.36 msec
(3 rows)

vacuum; -- takes about 10 minutes
VACUUM

explain analyze select next_id from bigint_unique_ids where
table_name='CONNECTION_DATA';
Index Scan using bigint_unique_ids__table_name on bigint_unique_ids
(cost=0.00..84.01 rows=1 width=8) (actual time=0.17..99.94 rows=1 loops=1)
Index Cond: (table_name = 'CONNECTION_DATA'::text)
Total runtime: 100.09 msec

vacuum; -- takes about 2 minutes
Index Scan using bigint_unique_ids__table_name on bigint_unique_ids
(cost=0.00..179.01 rows=1 width=8) (actual time=0.45..219.05 rows=1 loops=1)
Index Cond: (table_name = 'CONNECTION_DATA'::text)
Total runtime: 219.20 msec

-- ACK, worse; ran twice more, got 212.5 ms and 394.39 ms

vacuum bigint_unique_ids; -- try specific table only, takes about 5 seconds
Index Scan using bigint_unique_ids__table_name on bigint_unique_ids
(cost=0.00..163.01 rows=1 width=8) (actual time=0.23..143.59 rows=1 loops=1)
Index Cond: (table_name = 'CONNECTION_DATA'::text)
Total runtime: 143.72 msec

vacuum full bigint_unique_ids; -- try full, takes about 3 seconds.
Seq Scan on bigint_unique_ids (cost=0.00..1.02 rows=1 width=8) (actual
time=0.10..0.10 rows=1 loops=1)
Filter: (table_name = 'CONNECTION_DATA'::text)
Total runtime: 0.25 msec

-- ah, much much much better.

So apparently vacuum by itself isn't going to be sufficient; I'm going
to need vacuum fulls? Or if I do vacuums often enough (that should allow
old rows to be overwritten?) will that do it? I'm a bit hazy on why
vacuum isn't doing just as well as vacuum full; I thought the only
difference was that full releases space back to the operating system
(and presumably defragments existing data, but for one row, this
shouldn't matter?).
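
One way to watch this (a sketch using the system catalogs; relpages and
reltuples are only refreshed by things like vacuum and analyze) would be
to check the table's page count after each vacuum:

select relname, relpages, reltuples from pg_class
    where relname = 'bigint_unique_ids';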

wait several minutes:

Seq Scan on bigint_unique_ids (cost=0.00..39.01 rows=1 width=8) (actual
time=2.97..2.98 rows=1 loops=1)
Filter: (table_name = 'CONNECTION_DATA'::text)
Total runtime: 3.13 msec
reindex index bigint_unique_ids__table_name;
REINDEX

Index Scan using bigint_unique_ids__table_name on bigint_unique_ids
(cost=0.00..5.97 rows=1 width=8) (actual time=0.11..0.20 rows=1 loops=1)
Index Cond: (table_name = 'CONNECTION_DATA'::text)
Total runtime: 0.30 msec

It appears reindex has the same speed-up effect (and in this case made
it switch back from a seq scan to an index scan).

Let me throw in this too, if it's helpful:

vacuum verbose bigint_unique_ids;
INFO: --Relation public.bigint_unique_ids--
INFO: Index bigint_unique_ids__table_name: Pages 29; Tuples 1: Deleted
5354.
CPU 0.01s/0.04u sec elapsed 0.05 sec.
INFO: Removed 11348 tuples in 79 pages.
CPU 0.00s/0.02u sec elapsed 0.02 sec.
INFO: Pages 79: Changed 1, Empty 0; Tup 1: Vac 11348, Keep 0, UnUsed 0.
Total CPU 0.03s/0.06u sec elapsed 0.14 sec.
INFO: --Relation pg_toast.pg_toast_21592--
INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

vacuum full verbose bigint_unique_ids;
INFO: --Relation public.bigint_unique_ids--
INFO: Pages 79: Changed 1, reaped 79, Empty 0, New 0; Tup 1: Vac 297,
Keep/VTL 0/0, UnUsed 11157, MinLen 52, MaxLen 52; Re-using: Free/Avail.
Space 599716/22724; EndEmpty/Avail. Pages 76/3.
CPU 0.01s/0.00u sec elapsed 0.01 sec.
INFO: Index bigint_unique_ids__table_name: Pages 29; Tuples 1: Deleted 297.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: Rel bigint_unique_ids: Pages: 79 --> 1; Tuple(s) moved: 1.
CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO: Index bigint_unique_ids__table_name: Pages 29; Tuples 1: Deleted 1.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: --Relation pg_toast.pg_toast_21592--
INFO: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0,
Keep/VTL 0/0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space
0/0; EndEmpty/Avail. Pages 0/0.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: Index pg_toast_21592_index: Pages 1; Tuples 0.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
VACUUM


From: Rod Taylor <rbt(at)rbt(dot)ca>
To: Dave E Martin XXIII <postgresql-to(dot)dave(at)dave(dot)to>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Index speeds up one row table (why)?
Date: 2003-06-01 02:29:33
Message-ID: 1054434573.11968.42.camel@jester
Lists: pgsql-bugs pgsql-performance

> vacuum verbose bigint_unique_ids;
> INFO: --Relation public.bigint_unique_ids--
> INFO: Index bigint_unique_ids__table_name: Pages 29; Tuples 1: Deleted
> 5354.
> CPU 0.01s/0.04u sec elapsed 0.05 sec.
> INFO: Removed 11348 tuples in 79 pages.
> CPU 0.00s/0.02u sec elapsed 0.02 sec.
> INFO: Pages 79: Changed 1, Empty 0; Tup 1: Vac 11348, Keep 0, UnUsed 0.
> Total CPU 0.03s/0.06u sec elapsed 0.14 sec.

Vacuum (regular, not full) frequently enough that the 'Pages' value
doesn't increase past 1 and you'll be fine. A sequential scan on a very
small table is what you want to have.

In this particular case, vacuum removed over 11000 dead versions of the
tuple.

An 8 k page will hold approx 140 tuples based on your structure. So,
for every ~100 updates you'll want to run vacuum (regular, not full) on
the table.
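
(That estimate follows from the vacuum full verbose output in your
earlier message: the tuples are a uniform 52 bytes (MinLen/MaxLen 52),
and 8192 bytes per page divided by 52 bytes plus a few bytes of
per-tuple item-pointer overhead works out to roughly 140.)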

--
Rod Taylor <rbt(at)rbt(dot)ca>

PGP Key: http://www.rbt.ca/rbtpub.asc


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Dave E Martin XXIII <postgresql-to(dot)dave(at)dave(dot)to>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Index speeds up one row table (why)?
Date: 2003-06-01 05:33:39
Message-ID: 20030601053339.GB16259@wolff.to
Lists: pgsql-bugs pgsql-performance

On Sat, May 31, 2003 at 16:56:56 -0600,
Dave E Martin XXIII <postgresql-to(dot)dave(at)dave(dot)to> wrote:
>
> (ok, experimented a bit more just now)
> Hm, it appears that the degradation occurs with the index as well. I
> guess that when I created the index, it just initially did better
> because it got to skip all the already-dead rows at creation time. But
> this is disturbing: I do a vacuum, and the access times are better, but
> still horrible:

You really don't want to use an index, so this probably doesn't matter
for the current application. The problem is that when the data inserted
into an index just increases (or decreases) in value, space from deleted
entries doesn't get reused. I believe this is fixed in 7.4. This case
would apply to indexes based on counters, dates, or times where new
values are added and old values get deleted.
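
A quick way to see the effect (a sketch; an index's page count shows up
in pg_class just like a table's) is to watch relpages for the index,
which in the vacuum verbose output above stays at 29 even though only
one live tuple remains:

select relname, relpages from pg_class
    where relname = 'bigint_unique_ids__table_name';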


From: Dave E Martin XXIII <postgresql-to(dot)dave(at)dave(dot)to>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Index speeds up one row table (why)?
Date: 2003-06-01 07:20:03
Message-ID: 3ED9A923.90500@dave.to
Lists: pgsql-bugs pgsql-performance

Rod Taylor wrote:

>An 8 k page will hold approx 140 tuples based on your structure. So,
>for every ~100 updates you'll want to run vacuum (regular, not full) on
>the table

Alas, for this application, that means a vacuum once every 5 seconds or
so. I'll see if I can set up a separate little task to do that (I assume
at this rate, it's better to just keep a connection open than
setup/teardown). I don't suppose there is a way to get a trigger to do a
vacuum (which doesn't want to be in a transaction) (thinking it could
check for id mod 100 = 0 or something)? I also assume a few pages isn't
going to be that bad (just don't let it get to 11000 8).


From: Rod Taylor <rbt(at)rbt(dot)ca>
To: Dave E Martin XXIII <postgresql-to(dot)dave(at)dave(dot)to>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Index speeds up one row table (why)?
Date: 2003-06-01 12:14:15
Message-ID: 1054469655.11968.109.camel@jester
Lists: pgsql-bugs pgsql-performance

sending thread to -performance from -bugs

On Sun, 2003-06-01 at 03:20, Dave E Martin XXIII wrote:
> Rod Taylor wrote:
>
> >An 8 k page will hold approx 140 tuples based on your structure. So,
> >for every ~100 updates you'll want to run vacuum (regular, not full) on
> >the table
>
> Alas, for this application, that means a vacuum once every 5 seconds or
> so. I'll see if I can set up a separate little task to do that (I assume
> at this rate, it's better to just keep a connection open than
> setup/teardown). I don't suppose there is a way to get a trigger to do a
> vacuum (which doesn't want to be in a transaction) (thinking it could
> check for id mod 100 = 0 or something)? I also assume a few pages isn't
> going to be that bad (just don't let it get to 11000 8).

Sorry... Vacuum cannot be triggered -- nor would you want it to be.
There really isn't anything wrong with vacuuming once every 5 seconds or
so, as it'll take a very short time if there is only a page or so to
deal with.

Set up a script to connect, issue a vacuum, count to 5, issue a vacuum,
count to 5, etc.

At more than one page you will want an index, but having an index is
only going to slow things down in the long run, as indexes will not
shrink with a vacuum in 7.3 (7.4 puts an effort towards correcting
this). This means you'll be running REINDEX every couple of minutes,
which of course locks the table, where standard vacuum does not.

--
Rod Taylor <rbt(at)rbt(dot)ca>

PGP Key: http://www.rbt.ca/rbtpub.asc


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Dave E Martin XXIII <postgresql-to(dot)dave(at)dave(dot)to>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Index speeds up one row table (why)?
Date: 2003-06-01 19:04:09
Message-ID: 20030601190409.GA7853@wolff.to
Lists: pgsql-bugs pgsql-performance

On Sun, Jun 01, 2003 at 01:20:03 -0600,
Dave E Martin XXIII <postgresql-to(dot)dave(at)dave(dot)to> wrote:
> Rod Taylor wrote:
>
> >An 8 k page will hold approx 140 tuples based on your structure. So,
> >for every ~100 updates you'll want to run vacuum (regular, not full) on
> >the table
>
> Alas, for this application, that means a vacuum once every 5 seconds or
> so. I'll see if I can set up a separate little task to do that (I assume
> at this rate, it's better to just keep a connection open than
> setup/teardown). I don't suppose there is a way to get a trigger to do a
> vacuum (which doesn't want to be in a transaction) (thinking it could
> check for id mod 100 = 0 or something)? I also assume a few pages isn't
> going to be that bad (just don't let it get to 11000 8).

Maybe you should reconsider how badly you want the app to be totally database
agnostic? Using a sequence might be less of a contortion than using vacuum
a few times a minute. You are likely to have similar performance issues
with other databases, so this section of code may not turn out to be very
portable in any case.
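
For example, a minimal sketch of the sequence approach (the sequence
name here is illustrative):

create sequence connection_data_id_seq;
select nextval('connection_data_id_seq');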


From: Dave E Martin XXIII <postgresql-to(dot)dave(at)dave(dot)to>
To: Bruno Wolff III <bruno(at)wolff(dot)to>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Index speeds up one row table (why)?
Date: 2003-06-02 07:14:30
Message-ID: 3EDAF956.9010307@dave.to
Lists: pgsql-bugs pgsql-performance

Bruno Wolff III wrote:

>Maybe you should reconsider how badly you want the app to be totally database
>agnostic? Using a sequence might be less of a contortion than using vacuum
>a few times a minute. You are likely to have similar performance issues
>with other databases, so this section of code may not turn out to be very
>portable in any case.
>
>
Maybe I can further abstract out the generate-unique-id portion, since
unique-id generation does seem to be a pretty common database extension
(for some reason...), and then provide a generic schema definition and
a postgresql-specific one (along with whatever others I can drum up).
The generic one will rely on the software to come up with the unique id
in the fashion I'm currently doing.

Speaking of which, is there a better way than what I'm currently doing
(when the database doesn't have any such support)? I've heard of one
method based on something like "select max(id)+1 from table", but this
seems error prone; at the very least, you'd have to have a unique index
and be prepared to redo on failure, which could get messy if it's a big
transaction, and frequent if there is a lot of concurrent inserting
going on.
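
For concreteness, a sketch of that pattern (the table and column names
here are illustrative); the whole transaction has to be redone if the
insert fails with a unique-violation error:

begin;
insert into connection_data (id)
    select coalesce(max(id), 0) + 1 from connection_data;
commit;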


From: Peter Childs <blue(dot)dragon(at)blueyonder(dot)co(dot)uk>
To:
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Index speeds up one row table (why)?
Date: 2003-06-02 07:51:49
Message-ID: Pine.LNX.4.44.0306020843500.25203-100000@RedDragon.Childs
Lists: pgsql-bugs pgsql-performance

On Mon, 2 Jun 2003, Dave E Martin XXIII wrote:

> Bruno Wolff III wrote:
>
> >Maybe you should reconsider how badly you want the app to be totally database
> >agnostic? Using a sequence might be less of a contortion than using vacuum
> >a few times a minute. You are likely to have similar performance issues
> >with other databases, so this section of code may not turn out to be very
> >portable in any case.
> >
> >
> Maybe I can further abstract out the generate-unique-id portion, since
> unique-id generation does seem to be a pretty common database extension
> (for some reason...), and then provide a generic schema definition and
> a postgresql-specific one (along with whatever others I can drum up).
> The generic one will rely on the software to come up with the unique id
> in the fashion I'm currently doing.
>
> Speaking of which, is there a better way than what I'm currently doing
> (when the database doesn't have any such support)? I've heard of one
> method based on something like "select max(id)+1 from table", but this
> seems error prone; at the very least, you'd have to have a unique index
> and be prepared to redo on failure, which could get messy if it's a big
> transaction, and frequent if there is a lot of concurrent inserting
> going on.
>
For a generic solution you could have an extra table that feeds you
ids and update it every time you take a value. (Maybe a trigger could
be used?) Due to row locking during transactions, no two concurrent
requests would get the same answer. Implementation could be
interesting but relatively simple.

BEGIN;
-- lock the counter row so concurrent transactions block until we commit
SELECT id FROM unqid WHERE name='seq_name' FOR UPDATE;
UPDATE unqid SET id=id+1 WHERE name='seq_name';
COMMIT;

Peter Childs