Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

From: Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jan Lentfer <Jan(dot)Lentfer(at)web(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>
Subject: Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date: 2014-12-08 02:03:49
Message-ID: 4205E661176A124FAF891E0A6BA913526638CA9E@szxeml509-mbs.china.huawei.com
Lists: pgsql-hackers

On 06 December 2014 20:01 Amit Kapila Wrote

>I wanted to understand what exactly the above loop is doing.

>a.
>first of all the comment on top of it says "Some of the slot
>are free, ...", if some slot is free, then why do you want
>to process the results? (Do you mean to say that *None* of
>the slot is free....?)

This comment is wrong; I will remove it.

>b.
>IIUC, you have called function select_loop(maxFd, &slotset)
>to check if socket descriptor is readable, if yes then why
>in do..while loop the same maxFd is checked always, don't
>you want to check different socket descriptors? I am not sure
>if I am missing something here

select_loop(maxFd, &slotset)

maxFd is the highest descriptor in the set, and slotset contains all of the descriptors, so select_loop returns as soon as any of the descriptors receives a message. Once select_loop returns,
we need to check which descriptors have received a message from the server, so we loop over the slots and process the results.

So it is not only for maxFd; it is for all the descriptors. It is in a do..while loop because select_loop may return because of some intermediate message on any of the sockets while the query is still not complete.
If no slot has become free (which we check in the for loop below), we go back to select_loop again.
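
To illustrate the point about maxFd, here is a minimal sketch using plain select() rather than the patch's select_loop(). The function name wait_any_readable and the use of bare file descriptors instead of libpq connections are assumptions made only for this example. The first argument to select() is just the upper bound of the descriptor range to scan; every descriptor added to the set is watched.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: wait until at least one
 * descriptor in fds[0..n-1] is readable.  On return, *ready contains only
 * the descriptors that are readable.  Returns the number of readable
 * descriptors, or -1 on error (including interruption by a signal).
 */
int
wait_any_readable(const int *fds, int n, fd_set *ready)
{
    int max_fd = -1;

    FD_ZERO(ready);
    for (int i = 0; i < n; i++)
    {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], ready);
        if (fds[i] > max_fd)
            max_fd = fds[i];
    }

    /* max_fd + 1 only bounds the range; all members of the set are checked. */
    return select(max_fd + 1, ready, NULL, NULL, NULL);
}
```

If data arrives only on the lowest-numbered descriptor, the call still returns, and FD_ISSET() is true for that descriptor and false for the one equal to max_fd.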

>c.
>After checking the socket descriptor for maxFd why you want
>to run the below for loop for all slots?
>for (i = 0; i < max_slot; i++)
After select_loop returns, we may have received results on multiple connections, so for each such connection we consume the input and check whether it is still busy. If it is, there is nothing to do; if it has finished, we process the result and mark the connection free.
If any connection has become free, we break out of the do..while loop.
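
The control flow described above can be sketched without libpq. Everything in this example is an assumption for illustration: the Slot struct, the value of NO_SLOT, and a toy protocol in which each message is one byte, 'D' means the command is done, and any other byte is an intermediate message. The read() call stands in for PQconsumeInput(), and the 'D' check stands in for PQisBusy() returning false. The set is rebuilt before each select() because select() overwrites it.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

#define NO_SLOT (-1)            /* assumed value, for this sketch only */

typedef struct
{
    int  fd;                    /* stand-in for the connection's socket */
    bool is_free;               /* true once the slot's command has finished */
} Slot;

/* Block until at least one busy slot finishes; return the first such slot. */
int
get_idle_slot(Slot *slots, int nslots)
{
    int first_free = -1;

    assert(nslots > 0);

    do
    {
        fd_set readable;
        int    max_fd = -1;

        FD_ZERO(&readable);
        for (int i = 0; i < nslots; i++)
        {
            FD_SET(slots[i].fd, &readable);
            if (slots[i].fd > max_fd)
                max_fd = slots[i].fd;
        }

        if (select(max_fd + 1, &readable, NULL, NULL, NULL) < 0)
            return NO_SLOT;

        /* Several slots may have received data, so check all of them. */
        for (int i = 0; i < nslots; i++)
        {
            char msg;

            if (!FD_ISSET(slots[i].fd, &readable))
                continue;       /* nothing arrived on this slot */

            if (read(slots[i].fd, &msg, 1) != 1)
                return NO_SLOT; /* EOF or read error */

            if (msg != 'D')
                continue;       /* intermediate message: still busy */

            slots[i].is_free = true;
            if (first_free < 0)
                first_free = i;
        }
    } while (first_free < 0);   /* only intermediate messages: wait again */

    return first_free;
}
```

When a slot receives only an intermediate message, the inner loop finds no free slot and the outer loop goes back to select(), which is the reason for the do..while.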

From: Amit Kapila [mailto:amit(dot)kapila16(at)gmail(dot)com]
Sent: 06 December 2014 20:01
To: Dilip kumar
Cc: Magnus Hagander; Alvaro Herrera; Jan Lentfer; Tom Lane; PostgreSQL-development; Sawada Masahiko; Euler Taveira
Subject: Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

On Mon, Dec 1, 2014 at 12:18 PM, Dilip kumar <dilip(dot)kumar(at)huawei(dot)com> wrote:
>
> On 24 November 2014 11:29, Amit Kapila Wrote,
>

I have verified that all previous comments are addressed and
the new version is much better than previous version.

>
> here we are setting each target once and doing for all the tables..
>
Hmm, theoretically I think the new behaviour could lead to more I/O in
certain cases as compared to the existing behaviour. The reason for more I/O
is that in the new behaviour, while doing Analyze for a particular table at
different targets, it does Analyze of different tables in between,
so the pages in shared buffers or OS cache for a particular table need to
be reloaded again for a new target, whereas currently it will do all stages
of Analyze for a particular table in one go, which means that each stage
of Analyze could benefit from the pages of a table loaded by the previous
stage. If you agree, then we should try to avoid this change in the new
behaviour.

>
> Please provide your opinion.

I have a few questions regarding the function GetIdleSlot()

+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+             const char *progname, bool completedb)
{
..
+     /*
+      * Some of the slot are free, Process the results for slots whichever
+      * are free
+      */
+
+     do
+     {
+         SetCancelConn(pSlot[0].connection);
+
+         i = select_loop(maxFd, &slotset);
+
+         ResetCancelConn();
+
+         if (i < 0)
+         {
+             /*
+              * This can only happen if user has sent the cancel request using
+              * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result.
+              */
+
+             GetQueryResult(pSlot[0].connection, dbname, progname,
+                            completedb);
+             return NO_SLOT;
+         }
+
+         Assert(i != 0);
+
+         for (i = 0; i < max_slot; i++)
+         {
+             if (!FD_ISSET(pSlot[i].sock, &slotset))
+                 continue;
+
+             PQconsumeInput(pSlot[i].connection);
+             if (PQisBusy(pSlot[i].connection))
+                 continue;
+
+             pSlot[i].isFree = true;
+
+             if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+                                 completedb))
+                 return NO_SLOT;
+
+             if (firstFree < 0)
+                 firstFree = i;
+         }
+     } while (firstFree < 0);
}

I wanted to understand what exactly the above loop is doing.

a.
first of all the comment on top of it says "Some of the slot
are free, ...", if some slot is free, then why do you want
to process the results? (Do you mean to say that *None* of
the slot is free....?)

b.
IIUC, you have called function select_loop(maxFd, &slotset)
to check if socket descriptor is readable, if yes then why
in do..while loop the same maxFd is checked always, don't
you want to check different socket descriptors? I am not sure
if I am missing something here

c.
After checking the socket descriptor for maxFd why you want
to run the below for loop for all slots?
for (i = 0; i < max_slot; i++)

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
