Re: pg_dump refactor patch to remove global variables

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump refactor patch to remove global variables
Date: 2014-09-05 09:19:00
Message-ID: CACw0+10C0FcCBJYqTPPTrJiNNKqNsSfXfHyMxHjQ5V4NCGe05w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Aug 30, 2014 at 11:12 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> The general idea of this patch is not disputed, I think.

Ok, that's good news.

> - Why is quote_all_identifiers left behind as a global variable?

As I said, it's really used everywhere. I'm not opposed to passing it
around (which would be straightforward) but it would really blow up
the patch size and it would be a rather mechanical patch. I'd rather
do that as a separate patch, otherwise all other changes would get
lost in the noise.

> - Shouldn't pg_dumpall also use DumpOptions?

It could, it would mostly be a cosmetic change, since pg_dumpall is
only a wrapper that invokes the pg_dump command. pg_dumpall's argument
handling is isolated from the rest, so it could use DumpOptions or
not...

> - What about binary_upgrade?

I thought of the binary upgrade more as a different environment that
the program is run in (as opposed to a traditional user switch) so I
left it aside, but it turns out that it doesn't require that much more
code to support it and definitely it enables more code refactoring if
it's treated like the other flags, so the new patch also removes the
global binary_upgrade.

> - I think some of the variables in pg_dump's main() don't need to be
> moved into DumpOptions. For example, compressLevel is only looked at
> once in main() and then forgotten. We don't need to pass that around
> everywhere. The same goes for dumpencoding and possibly others.

That's partly true, there are two groups of variables that we could
remove from the DumpOptions, one is the group of options that are only
used in the huge main() of pg_dump.c and the second group is variables
that go one function further down into setup_connection().
dumpencoding for example goes down into setup_connection(). So do
use_role, no_synchronized_snapshots, serializable_deferrable and
numWorkers.

Previously dumpencoding and use_role were passed as additional
arguments to setup_connection(), being NULL when there was no change
to the encoding (which I found quite ugly). I would say we should
treat all variables of that group the same, so either all of them
become local variables in main() and are passed as parameters to
setup_connection() or we just pass the DumpOptions struct and put them
in there (as is done in my patch now)... Or we create a new struct
ConnectionOpts or so...

Then there's another group of options that is used only in main() but
is then used to populate the RestoreOptions struct. compressLevel is
one of them.

The thing about creating all those different groups is that it makes
refactoring a bit harder. In the end we would group variables not by
their meaning and purpose but by their usage pattern in the code.

> - The forward declaration of struct _dumpOptions in pg_backup.h seems
> kind of useless. You could move some things around so that that's not
> necessary.

Agreed, fixed.

> - NewDumpOptions() and NewRestoreOptions() are both in
> pg_backup_archiver.c, but NewRestoreOptions() is in pg_backup.h whereas
> NewDumpOptions() is being put into pg_backup_archiver.h. None of that
> makes too much sense, but it could be made more consistent.

I would have created the prototype in the respective header of the .c
file that holds the function definition, but that's a bit fuzzy around
pg_dump. I have now moved the DumpOptions over to pg_backup.h, because
pg_backup_archiver.h includes pg_backup.h so that makes pg_backup.h
the lower header.

> - In dumpOptionsFromRestoreOptions() you are building the return value
> in a local variable that is not zeroed. You should probably use
> NewDumpOptions() there.

The idea was to avoid malloc()ing and free()ing and to instead just
create those dumpOptions on the stack, because they're only used for a
short time and a small chunk of data, but I assume it's more
consistent to do it as you propose with NewDumpOptions(). So this is
fixed in the new patch.

> Also, looking at dumpOptionsFromRestoreOptions() and related code makes
> me think that these should perhaps really be the same structure. Have
> you investigated that?

I have (as mentioned in my original email about this patch), but that
would be a much larger change than what I had envisioned.

New patch attached.

Joachim

Attachment Content-Type Size
pg_dump_refactor_globals.3.diff text/plain 127.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Tiikkaja 2014-09-05 09:20:18 Re: Allowing implicit 'text' -> xml|json|jsonb
Previous Message Atri Sharma 2014-09-05 09:18:45 Re: Join push-down support for foreign tables