Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jan Lentfer <Jan(dot)Lentfer(at)web(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>
Subject: Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date: 2014-07-31 05:29:27
Message-ID: CAA4eK1+kFar=0M2Hp5FjXKRvLeCFHvuc_aPXwF3dqn1-R2a2ig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 18, 2014 at 10:22 AM, Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>
wrote:
> On 16 July 2014 12:13, Magnus Hagander Wrote,
> >Yeah, those are exactly my points. I think it would be significantly
simpler to do it that way, rather than forking and threading. And also
easier to make portable...
>
> >(and as a optimization on Alvaros suggestion, you can of course reuse
the initial connection as one of the workers as long as you got the full
list of tasks from it up front, which I think you do anyway in order to
sorting of tasks...)
>
> I have modified the patch as per the suggestion,
>
> Now in beginning we create all connections, and first connection we use
for getting table list in beginning, After that all connections will be
involved in vacuum task.
>
> Please have a look and provide your opinion…

1.
+ connSlot = (ParallelSlot*)pg_malloc(parallel * sizeof(ParallelSlot));
+ for (i = 0; i < parallel; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
Here it seems to me that you are opening connections before
getting or checking tables list, so in case you have lesser
number of tables, won't the extra connections be always idle.
Simple case to erify the same is with below example

vacuumdb -t t1 -d postgres -j 4

2.
+ res = executeQuery(conn,
+ "select relname, nspname from pg_class c, pg_namespace ns"
+ " where relkind= \'r\' and c.relnamespace = ns.oid"
+ " order by relpages desc",
+ progname, echo);

Here it is just trying to get the list of relations, however
Vacuum command processes materialized views as well, so
I think here the list should include materialized views as well
unless you have any specific reason for not including those.

3. In function vacuum_parallel(), if user has not provided list of tables,
then it is retrieving all the tables in database and then in
run_parallel_vacuum(),
it tries to do Vacuum for each of table using Async mechanism, now
consider a case when after getting list if any table is dropped by user
from some other session, then patch will error out. However without patch
or Vacuum command will internally ignore such a case and complete
the Vacuum for other tables. Don't you think the patch should maintain
the existing behaviour?

4.
+ <term><option>-j <replaceable
class="parameter">jobs</replaceable></></term>
+ Number of parallel process to perform the operation.

Change this description as per new implementation. Also I think
there is a need of some explanation for this new option.

5.
It seems there is no change in below function decalration:
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password,
const char *progname, bool echo);

6.
+ printf(_(" -j, --jobs=NUM use this many parallel jobs
to vacuum\n"));
Change the description as per new implementation.

7.
/* This will give the free connection slot, if no slot is free it will
wait for atleast one slot to get free.*/

Multiline comments should be written like (refer other places)
/*
* This will give the free connection slot, if no slot is free it will
* wait for atleast one slot to get free.
*/
Kindly correct at other places if similar instance exist in patch.

8.
Isn't it a good idea to check performance of this new patch
especially for some worst cases like when there is not much
to vacuum in the tables inside a database. The reason I wanted
to check is that because with new algorithm (for a vacuum of database,
now it will get the list of tables and perform vacuum on individual
tables) we have to repeat certain actions in server side like
allocation/deallocataion of context, sending stats which would have
been otherwise done once.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2014-07-31 05:59:45 Re: Production block comparison facility
Previous Message Bruce Momjian 2014-07-31 03:05:52 Re: 9.4 docs current as of