Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Stanislaw Pankevich <s(dot)pankevich(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Date: 2012-07-18 14:33:42
Message-ID: CAHyXU0xH34ma+=J5CXyDsEqec6PeOeDFwsrBxtu_cRziEgvpfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Jul 3, 2012 at 10:22 AM, Stanislaw Pankevich
<s(dot)pankevich(at)gmail(dot)com> wrote:
> Hello,
>
> My question below is almost exact copy of the on on SO:
> http://stackoverflow.com/questions/11311079/postgresql-db-30-tables-with-number-of-rows-100-not-huge-the-fastest-way
>
> The post on SO caused a few answers, all as one stating "DO ONLY TRUNCATION
> - this is the fast".
>
> Also I think I've met some amount of misunderstanding of what exactly do I
> want. I would appreciate it great, if you try, as people whom I may trust in
> performance question.
>
> Here goes the SO subject, formulating exact task I want to accomplish, this
> procedure is intended to be run beetween after or before each test, ensure
> database is cleaned enough and has reset unique identifiers column (User.id
> of the first User should be nor the number left from previous test in a test
> suite but 1). Here goes the message:
>
> ==== PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the
> fastest way to clean each non-empty table and reset unique identifier column
> of empty ones ====
>
> I wonder, what is the fastest way to accomplish this kind of task in
> PostgreSQL. I am interested in the fastest solutions ever possible.
>
> I found myself such kind of solution for MySQL, it performs much faster than
> just truncation of tables one by one. But anyway, I am interested in the
> fastest solutions for MySQL too. See my result here, of course it it for
> MySQL only: https://github.com/bmabey/database_cleaner/issues/126
>
> I have following assumptions:
>
> I have 30-100 tables. Let them be 30.
>
> Half of the tables are empty.
>
> Each non-empty table has, say, no more than 100 rows. By this I mean,
> tables are NOT large.
>
> I need an optional possibility to exclude 2 or 5 or N tables from this
> procedure.
>
> I cannot! use transactions.
>
> I need the fastest cleaning strategy for such case working on PostgreSQL
> both 8 and 9.
>
> I see the following approaches:
>
> 1) Truncate each table. It is too slow, I think, especially for empty
> tables.
>
> 2) Check each table for emptiness by more faster method, and then if it is
> empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL)
> to initial state (1), i.e to restore its last_value from sequence (the same
> AUTO_INCREMENT analog) back to 1, otherwise run truncate on it.
>
> I use Ruby code to iterate through all tables, calling code below on each of
> them, I tried to setup SQL code running against each table like:
>
> DO $$DECLARE r record;
> BEGIN
> somehow_captured = SELECT last_value from #{table}_id_seq
> IF (somehow_captured == 1) THEN
> == restore initial unique identifier column value here ==
> END
>
> IF (somehow_captured > 1) THEN
> TRUNCATE TABLE #{table};
> END IF;
> END$$;

This didn't work because you can't use variables for table names in
non-dynamic (that is, executed as a string) statements. You'd probably
want:

EXECUTE 'TRUNCATE TABLE ' || #{table};

As to performance, TRUNCATE in postgres (just like mysql) has the nice
property that the speed of truncation is mostly not dependent on table
size: truncating a table with 100 records is not very much faster than
truncating a table with millions of records. For very small tables,
it might be faster to simply fire off a delete.

merlin

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Nick Hofstede 2012-07-18 16:10:30 optimizing queries using IN and EXISTS
Previous Message Campbell, Lance 2012-07-18 14:27:16 monitoring suggestions