Re: pg_background (and more parallelism infrastructure patches)

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_background (and more parallelism infrastructure patches)
Date: 2014-10-29 19:28:39
Message-ID: 20141029192839.GG17724@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-10-22 19:03:28 -0400, Robert Haas wrote:
> On Wed, Oct 8, 2014 at 6:32 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I got to ask: Why is it helpful that we have this in contrib? I have a
> > good share of blame to bear for that, but I think we need to stop
> > dilluting contrib evermore with test programs. These have a place, but I
> > don't think it should be contrib.
>
> I don't think pg_background is merely a test program: I think it's a
> quite useful piece of functionality. It can be used for running
> VACUUM from a procedure, which is something people have asked for more
> than once, or for simulating an autonomous transaction. Granted,
> it'll be a lot slower than a real autonomous transaction, but it's
> still better than doing it via dblink, because you don't have to futz
> with authentication.

Fair enough.

> > Hm. So every user can do this once the extension is created as the
> > functions are most likely to be PUBLIC. Is this a good idea?
>
> Why not? If they can log in, they could start separate sessions with
> similar effect.

Connection limits and revoked connection permissions are possible
reasons.

> > I'm unsure right now about the rules surrounding this, but shouldn't we
> > check that the user is allowed to execute these? And shouldn't we fall
> > back to non binary functions if no binary ones are available?
>
> I can't see any reason to do either of those things. I'm not aware
> that returning data in binary format is in any way intended to be a
> security-restricted operation, or that we have any data types that
> actually matter without send and receive functions. If we do, I think
> the solution is to add them, not make this more complicated.

We do have a couple of types that don't have binary send/recv
functions. And there are many more out there. I don't think we can make
that a hard prerequisite.

> >> + /*
> >> + * Limit the maximum error level to ERROR. We don't want
> >> + * a FATAL inside the background worker to kill the user
> >> + * session.
> >> + */
> >> + if (edata.elevel > ERROR)
> >> + edata.elevel = ERROR;
> >
> > Hm. But we still should report that it FATALed? Maybe add error context
> > notice about it? Not nice, but I don't have a immediately better idea. I
> > think it generally would be rather helpful to add the information that
> > this wasn't originally an error triggered by this process. The user
> > might otherwise be confused when looking for the origin of the error in
> > the log.
>
> Yeah, I was wondering if we needed some kind of annotation here. What
> I'm wondering about is appending something to the errcontext, perhaps
> "background worker, PID %d".

Can't we just add another error context? That seems cleaner to me. It
should be sufficient to push something onto the relevant stack.

> >> + case 'A':
> >> + {
> >> + /* Propagate NotifyResponse. */
> >> + pq_putmessage(msg.data[0], &msg.data[1], nbytes - 1);
> >> + break;
> >
> > Hm. Are we sure to be in a situation where the client expects these? And
> > are we sure their encoding is correct? The other data goe through
> > input/output methods checking for that, but here we don't. And the other
> > side AFAICS could have done a SET client_encoding.
>
> I think there's no problem at the protocol level; I think the server
> can send NotifyResponse pretty much whenever.

I think the server not having done so for pretty much forever will still
confuse clients.

> It could be argued that this is a POLA violation, but dropping the
> notify messages on the floor (which seems to be the other option)
> doesn't seem like superior. So I think this is mostly a matter of
> documentation.

Maybe.

> >> +Datum
> >> +pg_background_detach(PG_FUNCTION_ARGS)
> >> +{
> >> + int32 pid = PG_GETARG_INT32(0);
> >> + pg_background_worker_info *info;
> >> +
> >> + info = find_worker_info(pid);
> >> + if (info == NULL)
> >> + ereport(ERROR,
> >> + (errcode(ERRCODE_UNDEFINED_OBJECT),
> >> + errmsg("PID %d is not attached to this session", pid)));
> >> + dsm_detach(info->seg);
> >> +
> >> + PG_RETURN_VOID();
> >> +}
> >
> > So there 's really no limit of who is allowed to do stuff like
> > this. I.e. any SECURITY DEFINER and such may do the same.
>
> Do you think we need a restriction? It's not obvious to me that there
> are any security-relevant consequences to this, but it's an important
> question, and I might be missing something.

Well, a security definer function might have started the worker. And
then the plain user kills it. Or the other way round.

> >> + /* Establish signal handlers. */
> >> + pqsignal(SIGTERM, handle_sigterm);
> >
> > Hm. No SIGINT?
>
> Nope; bgworker.c sets it to StatementCancelHandler, which should be
> fine.

Ah, right.

> >> + /* Find data structures in dynamic shared memory. */
> >> + fdata = shm_toc_lookup(toc, PG_BACKGROUND_KEY_FIXED_DATA);
> >> + sql = shm_toc_lookup(toc, PG_BACKGROUND_KEY_SQL);
> >> + gucstate = shm_toc_lookup(toc, PG_BACKGROUND_KEY_GUC);
> >> + mq = shm_toc_lookup(toc, PG_BACKGROUND_KEY_QUEUE);
> >> + shm_mq_set_sender(mq, MyProc);
> >> + responseq = shm_mq_attach(mq, seg, NULL);
> >
> > Don't these need to ensure that values have been found? shm_toc_lookup
> > returns NULL for unknown itmes and such and such?
>
> Meh. The launching process would have errored out if it hadn't been
> able to set up the segment correctly. We could add some Assert()
> statements if you really feel strongly about it, but it seems fairly
> pointless to me. Any situation where those pointers come back NULL is
> presumably going to be some sort of really stupid bug that will be
> found even by trivial testing.

If we write code like this we really should have made it an error to
look something up that doesn't exist in the toc. But whatever.

> >> + /* Restore GUC values from launching backend. */
> >> + StartTransactionCommand();
> >> + RestoreGUCState(gucstate);
> >> + CommitTransactionCommand();
> >
> > I haven't read the guc save patch, but is it a) required to this in a
> > transaction? We normally reload the config even without. b) correct to
> > do? What's with SET LOCAL variables?
>
> (a) Yeah, it doesn't work without that. I forget what breaks, but if
> you taking those out, it will blow up.

That seems bad. Because we actually parse gucs without a surrounding
transaction in some cases. But maybe it's just NOT_IN_FILE (or however
they're called) variables that are problematic?

Btw, how are we dealing with connection gucs?

> (b) Do those need special handling for some reason?

Well, the CommitTransactionCommand() will reset them (via AtEOXact_GUC),
no? Or am I missing something?

> >> + /* Post-execution cleanup. */
> >> + disable_timeout(STATEMENT_TIMEOUT, false);
> >> + CommitTransactionCommand();
> >
> > So, we're allowed to do nearly arbitrary nastyness here...
>
> Can you be more specific about the nature of your concern? This is no
> different than finish_xact_command().

What I was thinking about was that the background process can do stuff
that writes.

> >> + /*
> >> + * Parse the SQL string into a list of raw parse trees.
> >> + *
> >> + * Because we allow statements that perform internal transaction control,
> >> + * we can't do this in TopTransactionContext; the parse trees might get
> >> + * blown away before we're done executing them.
> >> + */
> >> + parsecontext = AllocSetContextCreate(TopMemoryContext,
> >> + "pg_background parse/plan",
> >> + ALLOCSET_DEFAULT_MINSIZE,
> >> + ALLOCSET_DEFAULT_INITSIZE,
> >> + ALLOCSET_DEFAULT_MAXSIZE);
> >
> > Not that it hugely matters, but shouldn't this rather be
> > TopTransactionContext?
>
> No, for the reasons explained in the command that you quoted right, uh, there.

Man. You expect me to read comments? .oO(hide)

> >> + bool snapshot_set = false;
> >> + Portal portal;
> >> + DestReceiver *receiver;
> >> + int16 format = 1;
> >> +
> >> + /*
> >> + * We don't allow transaction-control commands like COMMIT and ABORT
> >> + * here. The entire SQL statement is executed as a single transaction
> >> + * which commits if no errors are encountered.
> >> + */
> >> + if (IsA(parsetree, TransactionStmt))
> >> + ereport(ERROR,
> >> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >> + errmsg("transaction control statements are not allowed in pg_background")));
> >
> > Hm. I don't think that goes far enough. This allows commands that
> > internally stop/start transactions like CREATE INDEX CONCURRETNLY. Are
> > you sure that's working right now?
>
> I tested VACUUM not of a specific table and that seems to work
> correctly. I could try CIC, but I think the issues are the same. If
> we only get one parse tree, then isTopLevel will be true and it's safe
> for commands to do their own transaction control. If we get multiple
> parse trees, then PreventTransactionChain will do its thing. This is
> not novel territory.

It's not novel, but damned easy to get wrong.

> > Hm. This is a fair amount of code copied from postgres.c.
>
> Yes. I'm open to suggestions, but I don't immediately see a better way.

I don't immediately see something better either. But I definitely
dislike the amount of code it duplicates.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2014-10-29 19:31:50 Re: Directory/File Access Permissions for COPY and Generic File Access Functions
Previous Message Asif Naeem 2014-10-29 19:27:35 Re: Add shutdown_at_recovery_target option to recovery.conf