Re: memory usage of pg_upgrade

Lists: pgsql-hackers
From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: memory usage of pg_upgrade
Date: 2013-09-09 22:20:54
Message-ID: CAMkU=1z3T7s7RpB37ZBuV9Qx=AaUNRc_-nVo1GkH7UUA7MwmoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

pg_upgrade reserves 5 times MAXPGPATH, or 5120 characters, for the
tablespace name of every object (table, toast table, index) in the
database being upgraded. This adds up pretty quickly when there is a
very large number of objects. It could be changed to char* to a
separately allocated name that takes only as much space it needs. But
maybe it would be better to point into os_info.old_tablespaces or
something like that, as surely there are not going to be one
independent file space per object.

typedef struct
{
...
char tablespace[MAXPGPATH];
} RelInfo;

The struct FileNameMap has 4 more .

Since there seems to be some interest in improving the scalability of
pg_upgrade, this is one of the things to consider fixing. What is the
best way to do it?

Cheers,

Jeff


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: memory usage of pg_upgrade
Date: 2013-09-09 22:39:39
Message-ID: 522E4E2B.2030209@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 09/09/2013 06:20 PM, Jeff Janes wrote:
> pg_upgrade reserves 5 times MAXPGPATH, or 5120 characters, for the
> tablespace name of every object (table, toast table, index) in the
> database being upgraded. This adds up pretty quickly when there is a
> very large number of objects. It could be changed to char* to a
> separately allocated name that takes only as much space it needs. But
> maybe it would be better to point into os_info.old_tablespaces or
> something like that, as surely there are not going to be one
> independent file space per object.
>
>
> typedef struct
> {
> ...
> char tablespace[MAXPGPATH];
> } RelInfo;
>
> The struct FileNameMap has 4 more .
>
> Since there seems to be some interest in improving the scalability of
> pg_upgrade, this is one of the things to consider fixing. What is the
> best way to do it?

Send in a patch :-)

We recently ripped out some uses of statically sized strings in the
parallel code and replaced them with pointers to palloc'ed strings. So
there is good precedent for this. See
<https://github.com/postgres/postgres/commit/910d3a458c15c1b4cc518ba480be2f712f42f179>

In the case of tablespaces, I should have thought you could keep a hash
table of the names and just store an entry id in the table structure.
But that's just my speculation without actually looking at the code, so
don't take my word for it :-)

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: memory usage of pg_upgrade
Date: 2013-09-09 23:39:00
Message-ID: 20130909233900.GC32173@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 9, 2013 at 06:39:39PM -0400, Andrew Dunstan wrote:
>
> On 09/09/2013 06:20 PM, Jeff Janes wrote:
> >pg_upgrade reserves 5 times MAXPGPATH, or 5120 characters, for the
> >tablespace name of every object (table, toast table, index) in the
> >database being upgraded. This adds up pretty quickly when there is a
> >very large number of objects. It could be changed to char* to a
> >separately allocated name that takes only as much space it needs. But
> >maybe it would be better to point into os_info.old_tablespaces or
> >something like that, as surely there are not going to be one
> >independent file space per object.
> >
> >
> >typedef struct
> >{
> > ...
> > char tablespace[MAXPGPATH];
> >} RelInfo;
> >
> >The struct FileNameMap has 4 more .
> >
> >Since there seems to be some interest in improving the scalability of
> >pg_upgrade, this is one of the things to consider fixing. What is the
> >best way to do it?
>
>
> Send in a patch :-)
>
> We recently ripped out some uses of statically sized strings in the
> parallel code and replaced them with pointers to palloc'ed strings.
> So there is good precedent for this. See <https://github.com/postgres/postgres/commit/910d3a458c15c1b4cc518ba480be2f712f42f179>
>
> In the case of tablespaces, I should have thought you could keep a
> hash table of the names and just store an entry id in the table
> structure. But that's just my speculation without actually looking
> at the code, so don't take my word for it :-)

Yes, please feel free to improve the code. I improved pg_upgrade CPU
usage for a lerge number of objects, but never thought to look at memory
usage. It would be a big win to just palloc/pfree the memory, rather
than allocate tones of memory. If you don't get to it, I will in a few
weeks.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: memory usage of pg_upgrade
Date: 2014-02-04 02:14:10
Message-ID: 20140204021410.GA24552@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 9, 2013 at 07:39:00PM -0400, Bruce Momjian wrote:
> > In the case of tablespaces, I should have thought you could keep a
> > hash table of the names and just store an entry id in the table
> > structure. But that's just my speculation without actually looking
> > at the code, so don't take my word for it :-)
>
> Yes, please feel free to improve the code. I improved pg_upgrade CPU
> usage for a lerge number of objects, but never thought to look at memory
> usage. It would be a big win to just palloc/pfree the memory, rather
> than allocate tones of memory. If you don't get to it, I will in a few
> weeks.

Thanks you for pointing out this problem. I have researched the cause
and the major problem was that I was allocating the maximum path length
in a struct rather than allocating just the length I needed, and was not
reusing string pointers that I knew were not going to change.

The updated attached patch significantly decreases memory consumption:

tables orig patch % decrease
----
1 27,168 kB 27,168 kB 0
1k 46,136 kB 27,920 kB 39
2k 65,224 kB 28,796 kB 56
4k 103,276 kB 30,472 kB 70
8k 179,512 kB 33,900 kB 81
16k 331,860 kB 40,788 kB 88
32k 636,544 kB 54,572 kB 91
64k 1,245,920 kB 81,876 kB 93

As you can see, a database with 64k tables shows a 93% decrease in
memory use. I plan to apply this for PG 9.4.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

Attachment Content-Type Size
pg_upgrade.diff text/x-diff 10.6 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: memory usage of pg_upgrade
Date: 2014-02-12 21:35:40
Message-ID: 20140212213540.GE12551@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 3, 2014 at 09:14:10PM -0500, Bruce Momjian wrote:
> On Mon, Sep 9, 2013 at 07:39:00PM -0400, Bruce Momjian wrote:
> > > In the case of tablespaces, I should have thought you could keep a
> > > hash table of the names and just store an entry id in the table
> > > structure. But that's just my speculation without actually looking
> > > at the code, so don't take my word for it :-)
> >
> > Yes, please feel free to improve the code. I improved pg_upgrade CPU
> > usage for a lerge number of objects, but never thought to look at memory
> > usage. It would be a big win to just palloc/pfree the memory, rather
> > than allocate tones of memory. If you don't get to it, I will in a few
> > weeks.
>
> Thanks you for pointing out this problem. I have researched the cause
> and the major problem was that I was allocating the maximum path length
> in a struct rather than allocating just the length I needed, and was not
> reusing string pointers that I knew were not going to change.
>
> The updated attached patch significantly decreases memory consumption:
>
> tables orig patch % decrease
> ----
> 1 27,168 kB 27,168 kB 0
> 1k 46,136 kB 27,920 kB 39
> 2k 65,224 kB 28,796 kB 56
> 4k 103,276 kB 30,472 kB 70
> 8k 179,512 kB 33,900 kB 81
> 16k 331,860 kB 40,788 kB 88
> 32k 636,544 kB 54,572 kB 91
> 64k 1,245,920 kB 81,876 kB 93
>
> As you can see, a database with 64k tables shows a 93% decrease in
> memory use. I plan to apply this for PG 9.4.

Patch applied.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +