Re: MMAP Buffers

Lists: pgsql-hackers
From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: MMAP Buffers
Date: 2011-04-15 10:32:37
Message-ID: ac8eade5f27608bed98d40c437b42dec@mail.softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

If I may, I want to share some concept to use mmap in PG. It's far, far
away from perfect, but it's keeps WAL before data. As well I crated
table, with index, inserted few values, and I done vacuum full on this
table. Db inits welcome from orginal sources.

Performance of read (if backend is loaded) is really good, query time
goes down from 450ms to about 410ms. Update may be slower - but work is
in progress (I will start with write, as I went to point when simple
updates may be performed). Even that I didn't covered all aspects off
updating, it's simple to do it, just to call PreopareBufferToUpdate
before modifing buffer, ofc some ideas of increasing this are still in
my head.

Any comments, suggestions welcome.

I didn't included this, as diff, because of ~150kb size (mainly
configure scripts, which are included in SVC). Due to this, You may
download it from
http://softperience.eu/downloads/pg_mmap_20110415.diff.bz2 (Legal: Work
under PostgreSQL BSD Lincense). Patch is just GIT diff, later I will try
to grab some git.

Regards and have a nice day,
Radek.

P.S. This problem about assert with signals, I wrote... I merged with
last master, and rebuilded code. I think, I forgot to rebuild it after
previous merge.


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-15 11:33:37
Message-ID: 4DA82D11.8030808@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.04.2011 13:32, Radosław Smogura wrote:
> If I may, I want to share some concept to use mmap in PG. It's far, far
> away from perfect, but it's keeps WAL before data. As well I crated
> table, with index, inserted few values, and I done vacuum full on this
> table. Db inits welcome from orginal sources.
>
> Performance of read (if backend is loaded) is really good, query time
> goes down from 450ms to about 410ms. Update may be slower - but work is
> in progress (I will start with write, as I went to point when simple
> updates may be performed). Even that I didn't covered all aspects off
> updating, it's simple to do it, just to call PreopareBufferToUpdate
> before modifing buffer, ofc some ideas of increasing this are still in
> my head.
>
> Any comments, suggestions welcome.

The patch is quite hard to read because of random whitespace changes and
other stylistic issues, but I have a couple of high-level questions on
the design:

* Does each process have its own mmappings, or are the mmapping done to
shared buffers?

* How do you handle locking? Do you still need to allocate a shared
buffer for each mmapped page?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-15 12:12:37
Message-ID: 45293d0b173a01a8c6997ad6e5d3bc82@mail.softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 15 Apr 2011 14:33:37 +0300, Heikki Linnakangas wrote:
> On 15.04.2011 13:32, Radosław Smogura wrote:
>> If I may, I want to share some concept to use mmap in PG. It's far,
>> far
>> away from perfect, but it's keeps WAL before data. As well I crated
>> table, with index, inserted few values, and I done vacuum full on
>> this
>> table. Db inits welcome from orginal sources.
>>
>> Performance of read (if backend is loaded) is really good, query
>> time
>> goes down from 450ms to about 410ms. Update may be slower - but work
>> is
>> in progress (I will start with write, as I went to point when simple
>> updates may be performed). Even that I didn't covered all aspects
>> off
>> updating, it's simple to do it, just to call PreopareBufferToUpdate
>> before modifing buffer, ofc some ideas of increasing this are still
>> in
>> my head.
>>
>> Any comments, suggestions welcome.
>
> The patch is quite hard to read because of random whitespace changes
> and other stylistic issues, but I have a couple of high-level
> questions on the design:
Yes, but, hmm... in Netbeans I had really long gaps (probably 8 spaces,
from tabs), so deeper "ifs", comments at the and of variables, went of
out my screen. I really wanted to not format this, but sometimes I
needed.

> * Does each process have its own mmappings, or are the mmapping done
> to shared buffers?
Those are MAP_SHARED mappings, but each process has it's own pointer to
this.

> * How do you handle locking?
I do not do locking... I do different thing (worst)... When buffer
should be updated, it gets shared buffer, content is copied (so
situation almost same like fread), and depending on situation content is
used directly or pages between mmaping and shared (updatable) regions
are swapped - it keeps tuple pointers, etc. I really would be happy if
such method (lock flushing to file) could exists.

> Do you still need to allocate a shared buffer for each mmapped page?
Currently each mmaped page has additional shared buffer, but it's
almost ready to use independent pool of shared buffers. This will be
good, as mmaped buffers could cover whole system cache, keeping maybe
10%-20% of this size for write in SHMEM.

I think about MAP_PRIVATE, but those has some pluses and minuses, e.g.
MAP_SHARED may be, for less critical systems, simplier equipped with GUC
mmap_direct_write=true.

Regards,
Radek


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-15 12:26:26
Message-ID: 4DA83972.1000507@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/15/2011 08:12 AM, Radosław Smogura wrote:
>> The patch is quite hard to read because of random whitespace changes
>> and other stylistic issues, but I have a couple of high-level
>> questions on the design:
>
> Yes, but, hmm... in Netbeans I had really long gaps (probably 8
> spaces, from tabs), so deeper "ifs", comments at the and of variables,
> went of out my screen. I really wanted to not format this, but
> sometimes I needed.
>
>

Netbeans is possibly not very well suited to working on postgres code.
AFAIK emacs and/or vi(m) are used by almost all the major developers.

cheers

andrew


From: Joshua Berkus <josh(at)agliodbs(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-15 16:55:04
Message-ID: 2140729944.261277.1302886504155.JavaMail.root@mail-1.01.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Radoslaw,

10% improvement isn't very impressive from a switch to mmap. What workload did you test with? What I'd really like to see is testing with databases which are 50%, 90% and 200% the size of RAM ... that's where I'd expect the greatest gain from limiting copying.

> Netbeans is possibly not very well suited to working on postgres code.
> AFAIK emacs and/or vi(m) are used by almost all the major developers.

Guys, can we *please* focus on the patch for now, rather than the formatting, which is fixable with sed?
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Joshua Berkus <josh(at)agliodbs(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-15 17:36:44
Message-ID: 201104151936.44487.rsmogura@softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua Berkus <josh(at)agliodbs(dot)com> Friday 15 April 2011 18:55:04
> Radoslaw,
>
> 10% improvement isn't very impressive from a switch to mmap. What workload
> did you test with? What I'd really like to see is testing with databases
> which are 50%, 90% and 200% the size of RAM ... that's where I'd expect
> the greatest gain from limiting copying.
I think 10% is quite good, as my stand-alone test of mmap vs. read shown that
speed up of copying 100MB data to mem may be from ~20ms to ~100ms (depends on
destination address). Of course deeper, system test simulating real usage will
say more. In any case after good deals with writes, I will speed up reads. I
think to bypass smgr/md much more and to expose shared id's (1,2,3...) for
each file segment.

Going to topic...

In attachment I sent test-scripts which I used to fill data, nothing complex
(left from 2nd level caches).

Query I've used to measure was SELECT count(substr(content, 1, 1)) FROM
testcase1 WHERE multi_id > 50000;

Timings ware taken from psql.

I didn't made load (I have about 2GB of free sapce at /home, and 4GB RAM) and
stress (I'm not quite ready to try concurrent updates of same page - may fail,
notice is and place to fix is in code) tests yet.

> > Netbeans is possibly not very well suited to working on postgres code.
> > AFAIK emacs and/or vi(m) are used by almost all the major developers.
>
> Guys, can we *please* focus on the patch for now, rather than the
> formatting, which is fixable with sed?
Netbeans is quite good, of course it depends who likes what. Just try 7.0 RC
2.

Regards,
Radek

Attachment Content-Type Size
test-scritps_20110319_0026.tar.bz2 application/x-bzip-compressed-tar 2.4 KB

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Joshua Berkus <josh(at)agliodbs(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 05:48:23
Message-ID: 4DA92DA7.6050708@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua Berkus wrote:
> Guys, can we *please* focus on the patch for now, rather than the formatting, which is fixable with sed?
>

Never, and that's not true. Heikki was being nice; I wouldn't have even
slogged through it long enough to ask the questions he did before
kicking it back as unusable. A badly formatted patch makes it
impossible to evaluate whether the changes from a submission are
reasonable or not without the reviewer fixing it first. And you can't
automate correcting it, it takes a lot of tedious manual work. Start
doing a patch review every CommitFest cycle and you very quickly realize
it's not an ignorable problem. And lack of discipline in minimizing
one's diff is always a sign of other code quality issues.

Potential contributors to PostgreSQL should know that a badly formatted
patch faces an automatic rejection, because no reviewer can work with it
easily. This fact is not a mystery; in fact it's documented at
http://wiki.postgresql.org/wiki/Submitting_a_Patch : "The easiest way
to get your patch rejected is to make lots of unrelated changes, like
reformatting lines, correcting comments you felt were poorly worded etc.
Each patch should have the minimum set of changes required to fulfil the
single stated objective." I think I'll go improve that text
next--something like "Ways to get your patch rejected" should be its own
section.

The problem here isn't whether someone used an IDE or not, it's that
this proves they didn't read their own patch before submitting it.
Reading one's own diff and reflecting on what you've changed is one of
the extremely underappreciated practices of good open-source software
development. Minimizing the size of that diff is perhaps the most
important thing someone can do in order to make their changes to a piece
of software better. Not saying something that leads in that direction
would be a disservice to the submitter.

P.S. You know what else I feel should earn an automatic rejection
without any reviewer even looking at the code? Submitting a patch that
claims to improve performance and not attaching the test case you used,
along with detailed notes about before/after tests on your own
hardware. A hand wave "it's faster" is never good enough, and it's
extremely wasteful of our limited reviewer resources to try and
duplicate what the submitter claimed. Going to add something about that
to the submission guidelines too.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 06:24:38
Message-ID: 69818AF1-FFFA-4015-BEF3-B3AB73F158BA@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Apr 16, 2011, at 1:48 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> P.S. You know what else I feel should earn an automatic rejection without any reviewer even looking at the code?

Greg is absolutely right. And to the two he listed, let me add another of my own gripes: failing to provide submission notes that explain how the patch works, and how it addresses the conceptually difficult issues raised previously. The OP says that this patch maintains the WAL-before-data rule without any explanation of how it accomplishes that seemingly quite amazing feat. I assume I'm going to have to read this patch at some point to refute this assertion, and I think that sucks. I am pretty nearly 100% confident that this approach is utterly doomed, and I don't want to spend a lot of time on it unless someone can provide me with a compelling explanation of why my confidence is misplaced. But spending a lot of time on it is exactly what I'm going to have to do, because reading a undocumented patch full of spurious garbage to refute a hand-wavy claim of correctness is time-consuming, and if I give up on it without reading it, someone will yell "unfair, unfair!"

None of this is to say that I don't appreciate Radoslaw's interest in contributing, because I very much do. But I also think it's important to realize that we have a finite number of reviewers and they have finite time. Trying to minimize the amount of time that it takes someone to review or commit your patch is a service to the whole community, and we should acknowledge that it has value and appreciate the people who consistently do it.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 06:36:08
Message-ID: 23881.1302935768@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Smith <greg(at)2ndquadrant(dot)com> writes:
> Reading one's own diff and reflecting on what you've changed is one of
> the extremely underappreciated practices of good open-source software
> development. Minimizing the size of that diff is perhaps the most
> important thing someone can do in order to make their changes to a piece
> of software better. Not saying something that leads in that direction
> would be a disservice to the submitter.

A couple further comments on that thought:

* Another argument for avoiding unnecessary changes is that the larger
your patch's change footprint, the more likely it is to create merge
conflicts for people working on other patches. Now, if they're
necessary changes, that's the price of parallelism in development.
But gratuitous whitespace changes add nothing and they do have costs.

* On the other side of the coin, I have seen many a patch that was
written to minimize the length of the diff to the detriment of
readability or maintainability of the resulting code, and that's *not*
a good tradeoff. Always do what makes the most sense from a long-run
perspective.

I keep wanting to do a talk arguing that everything you need to know
about good patch style can be derived from the mantra "Make the patch
look like the code had always been there". If the functionality had
been designed in on day one, where would it be placed and how would it
be coded? You might be able to make the patch diff shorter with some
shortcut or other, but that's not the way to do it.

regards, tom lane


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Rados?aw Smogura <rsmogura(at)softperience(dot)eu>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 07:06:46
Message-ID: 4DA94006.6050905@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> * On the other side of the coin, I have seen many a patch that was
> written to minimize the length of the diff to the detriment of
> readability or maintainability of the resulting code, and that's *not*
> a good tradeoff.
>

Sure. that's possible. But based on the reviews I've done, I'd say that
the fact someone is even aware that minimizing their diff is something
important to consider automatically puts them far ahead of the average
new submitter. There are a high percentage of patches where the
submitter generates a diff and sents it without even looking at it.
That a person would look at their diff and go too far without trying to
make it small doesn't happen nearly as much.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Rados?aw Smogura <rsmogura(at)softperience(dot)eu>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 07:12:23
Message-ID: 4DA94157.1010401@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> The OP says that this patch maintains the WAL-before-data rule without any explanation of how it accomplishes that seemingly quite amazing feat. I assume I'm going to have to read this patch at some point to refute this assertion, and I think that sucks.

I don't think you have to read any patch that doesn't follow the
submission guidelines. The fact that you do is a great contribution to
the community. But if I were suggesting how your time would be best
spent improving PostgreSQL, "reviewing patches that don't meet coding
standards" would be at the bottom of the list. There's always something
better for the project you could be working on instead.

I just added
http://wiki.postgresql.org/wiki/Submitting_a_Patch#Reasons_your_patch_might_be_returned
, recycling some existing text, adding some new suggestions.

I hope I got the tone of that text right. The intention was to have a
polite but clear place to point submitters to when their suggestion
doesn't meet the normal standards here, such that they might even get
bounced before even entering normal CommitFest review. This MMAP patch
looks like it has all 5 of the problems mentioned on that now more
focused list.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 11:00:19
Message-ID: BANLkTinZyTLJcq3q_9RUOn8c-0kT70=urw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Apr 16, 2011 at 7:24 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> The OP says that this patch maintains the WAL-before-data rule without any explanation of how it accomplishes that seemingly quite amazing feat.  I assume I'm going to have to read this patch at some point to refute this assertion, and I think that sucks. I am pretty nearly 100% confident that this approach is utterly doomed, and I don't want to spend a lot of time on it unless someone can provide me with a compelling explanation of why my confidence is misplaced.

Fwiw he did explain how he did that. Or at least I think he did --
it's possible I read what I expected because what he came up with is
something I've recently been thinking about.

What he did, I gather, is treat the mmapped buffers as a read-only
copy of the data. To actually make any modifications he copies it into
shared buffers and treats them like normal. When the buffers get
flushed from memory they get written and then the pointers get
repointed back at the mmapped copy. Effectively this means the shared
buffers get extended to include all of the filesystem cache instead of
having to evict buffers from shared buffers just because you want to
read another one that's already in filesystem cache.

It doesn't save the copying between filesystem cache and shared
buffers for buffers that are actually being written to. But it does
save some amount of other copies on read-only traffic and it can even
save some i/o. It does require a function call before each buffer
modification where the pattern is currently <lock buffer>, <mutate
buffer>, <mark buffer dirty>. From what he describes he needs to add a
<prepare buffer for mutation> between the lock and mutate.

I think it's an interesting experiment and it's good to know how to
solve some of the subproblems. Notably, how do you extend files or
drop them atomically across processes? And how do you deal with
getting the mappings to be the same across all the processes or deal
with them being different? But I don't think it's a great long-term
direction. It just seems clunky to have to copy things from mmapped
buffers to local buffers and back. Perhaps the performance testing
will show that clunkiness is well worth it but we'll need to see that
for a wide variety of workloads to judge that.

--
greg


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 11:50:10
Message-ID: 201104161350.10576.rsmogura@softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> Saturday 16 April 2011 13:00:19
> On Sat, Apr 16, 2011 at 7:24 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > The OP says that this patch maintains the WAL-before-data rule without
> > any explanation of how it accomplishes that seemingly quite amazing
> > feat. I assume I'm going to have to read this patch at some point to
> > refute this assertion, and I think that sucks. I am pretty nearly 100%
> > confident that this approach is utterly doomed, and I don't want to
> > spend a lot of time on it unless someone can provide me with a
> > compelling explanation of why my confidence is misplaced.
>
> Fwiw he did explain how he did that. Or at least I think he did --
> it's possible I read what I expected because what he came up with is
> something I've recently been thinking about.
>
> What he did, I gather, is treat the mmapped buffers as a read-only
> copy of the data. To actually make any modifications he copies it into
> shared buffers and treats them like normal. When the buffers get
> flushed from memory they get written and then the pointers get
> repointed back at the mmapped copy. Effectively this means the shared
> buffers get extended to include all of the filesystem cache instead of
> having to evict buffers from shared buffers just because you want to
> read another one that's already in filesystem cache.
>
> It doesn't save the copying between filesystem cache and shared
> buffers for buffers that are actually being written to. But it does
> save some amount of other copies on read-only traffic and it can even
> save some i/o. It does require a function call before each buffer
> modification where the pattern is currently <lock buffer>, <mutate
> buffer>, <mark buffer dirty>. From what he describes he needs to add a
> <prepare buffer for mutation> between the lock and mutate.
>
> I think it's an interesting experiment and it's good to know how to
> solve some of the subproblems. Notably, how do you extend files or
> drop them atomically across processes? And how do you deal with
> getting the mappings to be the same across all the processes or deal
> with them being different? But I don't think it's a great long-term
> direction. It just seems clunky to have to copy things from mmapped
> buffers to local buffers and back. Perhaps the performance testing
> will show that clunkiness is well worth it but we'll need to see that
> for a wide variety of workloads to judge that.

In short words, I swap, exchange (clash of terms) VM pages to prevent pointers
(only if needed). I tried to directly point to new memory area, but I saw that
some parts of code really depends on memory pointed by original pointers, e.g.
Vaccumm uses hint bits setted by previous scan (it depends on this if bit is
set or not! so for it it's not only hint). Just from this case I can't assume
there is no more such places, so VM pages swap does it for me.

Stand alone tests shows for me that this process (with copy from mmap) is
2x-3x time longer then previous. But until someone will not update whole
table, then benefit will be taken from pre-update scan, index scans, larger
availability of memory (you don't eat cache memory to keep copy of cache in
ShMem). Everything may be slower when database fits in ShMem, and similarly
(2nd level bufferes may increase performance slightly).

I reserve memory for whole segment even if file is smaller. Extending is by
wirte one byte at the end of block (here may come deal with Unfiorm Buffer
Caches, if I remember name well). For current processors, and current
implementation database size is limited to about 260TB (no dynamic segment
reservation is performed).

Truncation not implemented.

Each buffer descriptor has tagVersion to simple check if buffer tag has
changed. Descriptors (partially) are mirrored in local memory, and versions
are checked. Currently each re-read (is pointed to smgr/md), but introduce
shared segment id, and assuming each segment has constant maximum number of
blocks, will make it faster (this will be something like current buffer tag),
even version field will be unneeded.

I saw problems with vacuum, as it reopens relation and I got mappings of same
file twice (minor problem). Important will be about deletion, when pointers
must invalidated in "good way".

Regards,
Radek.


From: Marko Kreen <markokr(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 11:53:07
Message-ID: BANLkTi=qeFc_yWTTazwfamC+b_N+NM_snQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Apr 16, 2011 at 8:48 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Joshua Berkus wrote:
>>
>> Guys, can we *please* focus on the patch for now, rather than the
>> formatting, which is fixable with sed?
>>
>
> Never, and that's not true.  Heikki was being nice; I wouldn't have even
> slogged through it long enough to ask the questions he did before kicking it
> back as unusable.  A badly formatted patch makes it impossible to evaluate
> whether the changes from a submission are reasonable or not without the
> reviewer fixing it first.  And you can't automate correcting it, it takes a
> lot of tedious manual work.  Start doing a patch review every CommitFest
> cycle and you very quickly realize it's not an ignorable problem.  And lack
> of discipline in minimizing one's diff is always a sign of other code
> quality issues.
>
> Potential contributors to PostgreSQL should know that a badly formatted
> patch faces an automatic rejection, because no reviewer can work with it
> easily.  This fact is not a mystery; in fact it's documented at
> http://wiki.postgresql.org/wiki/Submitting_a_Patch :  "The easiest way to
> get your patch rejected is to make lots of unrelated changes, like
> reformatting lines, correcting comments you felt were poorly worded etc.
> Each patch should have the minimum set of changes required to fulfil the
> single stated objective."  I think I'll go improve that text next--something
> like "Ways to get your patch rejected" should be its own section.
>
> The problem here isn't whether someone used an IDE or not, it's that this
> proves they didn't read their own patch before submitting it.  Reading one's
> own diff and reflecting on what you've changed is one of the extremely
> underappreciated practices of good open-source software development.
>  Minimizing the size of that diff is perhaps the most important thing
> someone can do in order to make their changes to a piece of software better.
>  Not saying something that leads in that direction would be a disservice to
> the submitter.
>
> P.S. You know what else I feel should earn an automatic rejection without
> any reviewer even looking at the code?  Submitting a patch that claims to
> improve performance and not attaching the test case you used, along with
> detailed notes about before/after tests on your own hardware.  A hand wave
> "it's faster" is never good enough, and it's extremely wasteful of our
> limited reviewer resources to try and duplicate what the submitter claimed.
>  Going to add something about that to the submission guidelines too.

Give the OP a break - he was not "re-styling", he was clearly trying to make
crappily indented Postgres code readable in his editor. Reading the patch
would not matter because the original code would still be crappily indented.

Yes, such patch is bad, but what should the proper response be in such
situation?

Hint: it can be both polite and short.

--
marko


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 15:02:32
Message-ID: 814.1302966152@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> What he did, I gather, is treat the mmapped buffers as a read-only
> copy of the data. To actually make any modifications he copies it into
> shared buffers and treats them like normal. When the buffers get
> flushed from memory they get written and then the pointers get
> repointed back at the mmapped copy.

That seems much too late --- won't other processes still be looking at
the stale mmap'ed version of the page until a write-out happens?

I'm pretty concerned about the memory efficiency of this too, since it
seems like it's making it *guaranteed*, not just somewhat probable,
that there are two copies in RAM of every database page that's been
modified since the last checkpoint (or so).

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 17:24:27
Message-ID: 1302974667.15043.0.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2011-04-15 at 12:32 +0200, Radosław Smogura wrote:
> I didn't included this, as diff, because of ~150kb size (mainly
> configure scripts, which are included in SVC). Due to this, You may
> download it from
> http://softperience.eu/downloads/pg_mmap_20110415.diff.bz2 (Legal:
> Work
> under PostgreSQL BSD Lincense). Patch is just GIT diff, later I will
> try
> to grab some git.

Btw., about 87% of this patch are diffs against configure and
pg_config.h.in, which are useless. If you strip those out, your patch
will be small enough to submit inline.


From: Joshua Berkus <josh(at)agliodbs(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-16 18:09:25
Message-ID: 1453152393.293937.1302977365075.JavaMail.root@mail-1.01.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

All,

> Never, and that's not true. Heikki was being nice; I wouldn't have
> even
> slogged through it long enough to ask the questions he did before
> kicking it back as unusable. A badly formatted patch makes it
> impossible to evaluate whether the changes from a submission are
> reasonable or not without the reviewer fixing it first.

Then you can say that politely and firmly with direct reference to the problem, rather than making the submitter feel bad.

"Thank you for taking on testing an idea we've talked about on this list for a long time and not had the energy to test. However, I'm having a hard time evaluating your patch for a few reasons ...(give reasons). Would it be possible for you to resolve these and resubmit so that I can give the patch a good evaluation?"

... and once *one* person on this list has made such a comment, there is no need for two other hackers to pile on the reformat-your-patch bandwagon.

Our project has an earned reputation for being rejection-happy curmudgeons. This is something I heard more than once at MySQLConf, including from one student who chose to work on Drizzle instead of PostgreSQL for that reason. I think that we could stand to go out of our way to be helpful to first-time submitters.

That doesn't mean that we have to accept patches mangled by using an IDE designed for Java, and which lack test cases. However, we can be nice about it.

--
Josh Berkus
Niceness Nazi


From: Joshua Berkus <josh(at)agliodbs(dot)com>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 18:33:10
Message-ID: 309059930.294300.1302978790081.JavaMail.root@mail-1.01.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Radoslaw,

> I think 10% is quite good, as my stand-alone test of mmap vs. read
> shown that
> speed up of copying 100MB data to mem may be from ~20ms to ~100ms
> (depends on
> destination address). Of course deeper, system test simulating real
> usage will
> say more. In any case after good deals with writes, I will speed up
> reads. I
> think to bypass smgr/md much more and to expose shared id's (1,2,3...)
> for
> each file segment.

Well, given the risks to durability and stability associated with using MMAP, I doubt anyone would even consider it for a 10% throughput improvement. However, I don't think the test you used demonstrates the best case for MMAP as a performance improvement.

> In attachment I sent test-scripts which I used to fill data, nothing
> complex
> (left from 2nd level caches).
>
> Query I've used to measure was SELECT count(substr(content, 1, 1))
> FROM
> testcase1 WHERE multi_id > 50000;
>
> Timings ware taken from psql.
>
> I didn't made load (I have about 2GB of free sapce at /home, and 4GB
> RAM) and
> stress (I'm not quite ready to try concurrent updates of same page -
> may fail,
> notice is and place to fix is in code) tests yet.

Yes, but this test case doesn't offer much advantage to MMAP. Where I expect it would shine would be cases where the database is almost as big as, or much bigger than RAM ... where the extra data copying by current code is both frequent and wastes buffer space we need to use. As well as concurrent reads from the same rows.

You can write a relatively simple custom script using pgBench to test this; you don't need a big complicated benchmark. Once we get over the patch cleanup issues, I might be able to help with this.

> Netbeans is quite good, of course it depends who likes what. Just try
> 7.0 RC
> 2.

I don't know if you've followed the formatting discussion, but apparently there's an issue with Netbeans re-indenting lines you didn't even edit. It makes your patch hard to read or apply. I expect that Netbeans has some method to reconfigure indenting, etc.; do you think you could configure it to PostgresQL standards so that this doesn't get in the way of evaluation of your ideas?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Joshua Berkus <josh(at)agliodbs(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-16 19:19:53
Message-ID: 4DA9EBD9.7060902@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua Berkus wrote:
> Then you can say that politely and firmly with direct reference to the problem, rather than making the submitter feel bad.
>

That's exactly what happened. And then you responded that it was
possible to use a patch without fixing the formatting first. That's not
true, and those of us who do patch review are tired of even trying.

> Our project has an earned reputation for being rejection-happy curmudgeons. This is something I heard more than once at MySQLConf, including from one student who chose to work on Drizzle instead of PostgreSQL for that reason. I think that we could stand to go out of our way to be helpful to first-time submitters.
>

I'll trade you anecdotes by pointing out that I heard from half a dozen
business people that the heavy emphasis on quality control and standards
was the reason they were looking into leaving MySQL derived
distributions for PostgreSQL.

I've spent days of time working on documentation to help new submitters
get their patches improve to where they meet this community's
standards. This thread just inspired another round of that. What
doesn't help is ever telling someone they can ignore those and still do
something useful we're interested in.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Marko Kreen <markokr(at)gmail(dot)com>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 19:41:19
Message-ID: BANLkTinv387tqTSc=x-CfiR1Y-SbdGn8dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 15, 2011 at 3:12 PM, Radosław Smogura
<rsmogura(at)softperience(dot)eu> wrote:
> On Fri, 15 Apr 2011 14:33:37 +0300, Heikki Linnakangas wrote:
>> The patch is quite hard to read because of random whitespace changes
>> and other stylistic issues, but I have a couple of high-level
>> questions on the design:
>
> Yes, but, hmm... in Netbeans I had really long gaps (probably 8 spaces, from
> tabs), so deeper "ifs", comments at the and of variables, went of out my
> screen. I really wanted to not format this, but sometimes I needed.

Seems no one else has mentioned it yet -- Postgres uses non-standard
tab-width of 4, instead of 8. If you see ugly code in Postgres, thats why...

--
marko


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 22:43:31
Message-ID: 201104170043.32020.rsmogura@softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Saturday 16 April 2011 17:02:32
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > What he did, I gather, is treat the mmapped buffers as a read-only
> > copy of the data. To actually make any modifications he copies it into
> > shared buffers and treats them like normal. When the buffers get
> > flushed from memory they get written and then the pointers get
> > repointed back at the mmapped copy.
>
> That seems much too late --- won't other processes still be looking at
> the stale mmap'ed version of the page until a write-out happens?
No, no, no :) I wanted to do this, but from above reason I skipped it. I swap
VM pages, I do remap, in place where the shared buffer was I put mmaped page,
and in place where mmaped page was I put shared page (in certain cases, which
should be optimized by e. g. read for update, for initial read of page in
process I directly points to shared buffer), it can be imagined as I affects
TLB. This what I call "VM swap" is remapping, so I don't change pointers, I
change only where this pointers points in physical memory, preserving same
pointer in Virtual Memory.

if 0x1 is start of buffer 1 (at relation 1, block 1)
I have
0x1 - 0x1 + BLCKSZ -> mmaped area
0xfffff1000 - 0xfffff1000 + BLCKSZ -> Shmem

SWAP
0x1 - 0x1 + BLCKSZ -> Shmem
0xfffff1000 - 0xfffff1000 + BLCKSZ -> mmaped area

It's reason I putted in crash reports /proc/{pid}/maps. For e. g. maps after
swap looks like (from crash report):

[...]
#Data mappings
7fe69b7e3000-7fe69b7ef000 r--s 00000000 08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
7fe69b7ef000-7fe69b7f1000 rw-s 00148000 00:04 8880132
/SYSV0052ea91 (deleted)
7fe69b7f1000-7fe6db7e3000 r--s 0000e000 08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
[...]
#SysV shmem mappings

7fec60788000-7fec6078c000 rw-s 00144000 00:04 8880132
/SYSV0052ea91 (deleted)
7fec6078c000-7fec6078e000 r--s 0000c000 08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
7fec6078e000-7fec6079c000 rw-s 0014a000 00:04 8880132
/SYSV0052ea91 (deleted)
[...]

Without swap 12516 should be mapped to one VM region of size equal to
BLCKSZ*BLOCKS_PER_SEGMENT (which is about 1GB).

When process reads buffer (or after taking lock), the shared buffer descriptor
is checked if page was modified (currently is it dirty) if yes do swap, if
page is currently in use, or use directly SysV shared areas if pages is just
pinned to process.

Regards,
Radek

> I'm pretty concerned about the memory efficiency of this too, since it
> seems like it's making it *guaranteed*, not just somewhat probable,
> that there are two copies in RAM of every database page that's been
> modified since the last checkpoint (or so).
>
> regards, tom lane


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 23:19:21
Message-ID: 4DAA23F9.8030504@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Radosław Smogura wrote:
> Yes, but, hmm... in Netbeans I had really long gaps (probably 8
> spaces, from tabs), so deeper "ifs", comments at the and of variables,
> went of out my screen. I really wanted to not format this, but
> sometimes I needed.

The guide at
http://www.open-source-editor.com/editors/how-to-make-netbeans-use-tabs-for-indention.html
seems to cover how to fix this in Netbeans. You want it to look like
that screen shot: 4 spaces per indent with matching tab size of 4, and
"Expand Tabs to Spaces" unchecked.

Generally, if you look at the diff you've created, and your new code
doesn't line up right with what's already there, that means the
tab/space setup isn't quite right when you were editing. Reading the
diff is useful for catching all sorts of other issues, too, so it's just
generally a good practice. As Peter already mentioned, the big problem
here is that you checked in a modified configure file.

I also note that you use C++ style "//" comments, which aren't allowed
under the coding guidelines--even though they work fine on many common
platforms.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-16 23:35:45
Message-ID: 12299.1302996945@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

=?utf-8?q?Rados=C5=82aw_Smogura?= <rsmogura(at)softperience(dot)eu> writes:
> No, no, no :) I wanted to do this, but from above reason I skipped it. I swap
> VM pages, I do remap, in place where the shared buffer was I put mmaped page,
> and in place where mmaped page was I put shared page (in certain cases, which
> should be optimized by e. g. read for update, for initial read of page in
> process I directly points to shared buffer), it can be imagined as I affects
> TLB. This what I call "VM swap" is remapping, so I don't change pointers, I
> change only where this pointers points in physical memory, preserving same
> pointer in Virtual Memory.

... Huh? Are you saying that you ask the kernel to map each individual
shared buffer separately? I can't believe that's going to scale to
realistic applications.

regards, tom lane


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-17 01:12:25
Message-ID: BANLkTimXoLC38Gfkv8Y735UDNdG5SvjLFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Apr 16, 2011 at 3:19 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Joshua Berkus wrote:
>>
>> Then you can say that politely and firmly with direct reference to the
>> problem, rather than making the submitter feel bad.
>>
>
> That's exactly what happened.  And then you responded that it was possible
> to use a patch without fixing the formatting first.  That's not true, and
> those of us who do patch review are tired of even trying.

It would be worth a lot if we could get it enough easier to use
pgindent, so that that could help *anyone* fix the formatting, as
opposed to being something that Bruce runs once in a long while.

If you can say, "here, run 'tools/frobozz/pg_indent' against each of
your files, then resubmit the patch," and have at least a fighting
chance of that being *nearly* right, that is a much nicer response to
give those folks.

Alternately, it would be nice if you could say, "I ran pgindent
against your files, here's the revised patch, please do that yourself
in future"

When application of formatting policy is near-nondeterministic, that's no fun!
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-17 01:23:13
Message-ID: BANLkTi=1+CeCsovPXe4uXS27zqFzJibnrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Apr 16, 2011 at 9:12 PM, Christopher Browne <cbbrowne(at)gmail(dot)com> wrote:
> On Sat, Apr 16, 2011 at 3:19 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>> Joshua Berkus wrote:
>>>
>>> Then you can say that politely and firmly with direct reference to the
>>> problem, rather than making the submitter feel bad.
>>>
>>
>> That's exactly what happened.  And then you responded that it was possible
>> to use a patch without fixing the formatting first.  That's not true, and
>> those of us who do patch review are tired of even trying.
>
> It would be worth a lot if we could get it enough easier to use
> pgindent, so that that could help *anyone* fix the formatting, as
> opposed to being something that Bruce runs once in a long while.
>
> If you can say, "here, run 'tools/frobozz/pg_indent' against each of
> your files, then resubmit the patch," and have at least a fighting
> chance of that being *nearly* right, that is a much nicer response to
> give those folks.
>
> Alternately, it would be nice if you could say, "I ran pgindent
> against your files, here's the revised patch, please do that yourself
> in future"

I agree.

But it turns out that it doesn't really matter. Whitespace or no
whitespace, if you don't read the diff before you hit send, it's
likely to contain some irrelevant cruft, whether whitespace changes or
otherwise.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-17 01:24:44
Message-ID: 4DAA415C.6010405@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/16/2011 09:12 PM, Christopher Browne wrote:
> On Sat, Apr 16, 2011 at 3:19 PM, Greg Smith<greg(at)2ndquadrant(dot)com> wrote:
>> Joshua Berkus wrote:
>>> Then you can say that politely and firmly with direct reference to the
>>> problem, rather than making the submitter feel bad.
>>>
>> That's exactly what happened. And then you responded that it was possible
>> to use a patch without fixing the formatting first. That's not true, and
>> those of us who do patch review are tired of even trying.
> It would be worth a lot if we could get it enough easier to use
> pgindent, so that that could help *anyone* fix the formatting, as
> opposed to being something that Bruce runs once in a long while.
>
> If you can say, "here, run 'tools/frobozz/pg_indent' against each of
> your files, then resubmit the patch," and have at least a fighting
> chance of that being *nearly* right, that is a much nicer response to
> give those folks.
>
> Alternately, it would be nice if you could say, "I ran pgindent
> against your files, here's the revised patch, please do that yourself
> in future"
>
> When application of formatting policy is near-nondeterministic, that's no fun!

What makes you think this isn't possible to run pgindent? There are no
secret incantations.

But it's probably overkill. emacs' indent-region gives you about a 90%
result or better if you're set up correctly.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joshua Berkus <josh(at)agliodbs(dot)com>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-17 01:31:46
Message-ID: BANLkTi=6s8dJfgZjtRF6-XXmoxx7q2-=MQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Apr 16, 2011 at 2:33 PM, Joshua Berkus <josh(at)agliodbs(dot)com> wrote:
> Well, given the risks to durability and stability associated with using MMAP, I doubt anyone would even consider it for a 10% throughput improvement.  However, I don't think the test you used demonstrates the best case for MMAP as a performance improvement.

Actually, I'd walk through fire for a 10% performance improvement if
it meant only a *risk* to stability. The problem is that this is
likely unfixably broken. In particular, I think the first sentence of
Tom's response hit it right on the nose, and mirrors my own thoughts
on the subject. To have any chance of working, you'd need to track
buffer pins and shared/exclusive content locks for the pages that were
being accessed outside of shared buffers; otherwise someone might be
looking at a stale copy of the page.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joshua Berkus <josh(at)agliodbs(dot)com>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-17 01:39:03
Message-ID: BANLkTin9yF-JfbOGBDf82KMvA8bVatufzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Apr 16, 2011 at 9:31 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Apr 16, 2011 at 2:33 PM, Joshua Berkus <josh(at)agliodbs(dot)com> wrote:
>> Well, given the risks to durability and stability associated with using MMAP, I doubt anyone would even consider it for a 10% throughput improvement.  However, I don't think the test you used demonstrates the best case for MMAP as a performance improvement.
>
> Actually, I'd walk through fire for a 10% performance improvement if
> it meant only a *risk* to stability.  The problem is that this is
> likely unfixably broken.  In particular, I think the first sentence of
> Tom's response hit it right on the nose, and mirrors my own thoughts
> on the subject.  To have any chance of working, you'd need to track
> buffer pins and shared/exclusive content locks for the pages that were
> being accessed outside of shared buffers; otherwise someone might be
> looking at a stale copy of the page.

Of course, maybe the patch is doing that. Rereading the thread, I
grow increasingly confused about what this is actually supposed to do
and how it's supposed to work and why it's supposedly better than what
we do now. But please, everyone feel free to continue bashing me for
wanting a readable patch with some understandable submission notes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-17 06:16:57
Message-ID: BANLkTim5rDbeJO=BBv-zMjmYjLiwHkRzpA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 4/16/11, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> What makes you think this isn't possible to run pgindent? There are no
> secret incantations.

A while ago I spent a few hours trying to run it and gave up. I think
it was something about needing some obscure BSD version of some tool
which conflicted with just about everything else on the system. I can
try again and report back if anyone cares.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Smith <greg(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-17 07:40:59
Message-ID: BANLkTim=EwJ=r8jgRppz-zBnW91z0BO1yg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Apr 17, 2011 8:17 AM, "Jeff Janes" <jeff(dot)janes(at)gmail(dot)com> wrote:
>
> On 4/16/11, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> >
> >
> > What makes you think this isn't possible to run pgindent? There are no
> > secret incantations.
>
> A while ago I spent a few hours trying to run it and gave up. I think
> it was something about needing some obscure BSD version of some tool
> which conflicted with just about everything else on the system. I can
> try again and report back if anyone cares.

It does rely on BSD indent. For that very reason, we provide the source for
it, since most people have gnu indent. It's trivial to build, though, and
works just fine as a local build, and you can keep gnu indent as the main
one on your system - no conflicts.

It used to be a PITA due to the typedef list, but that has been fixed.
Perhaps we just need to document it a bit more...

/Magnus


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-17 08:08:44
Message-ID: 4DAAA00C.5010509@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
> What makes you think this isn't possible to run pgindent? There are no
> secret incantations.

The first hit newbies find looking for info about pgident is
http://blog.hagander.net/archives/185-pgindent-vs-dash.html which sure
looks like secret incantations to me. The documentation
src/tools/pgindent/README reads like a magic spell too:

find . -name '*.[ch]' -type f -print | \
egrep -v -f src/tools/pgindent/exclude_file_patterns | \
xargs -n100 pgindent src/tools/pgindent/typedefs.list

And it doesn't actually work as written unless you've installed
pgindent, entab/detab, and the specially patched NetBSD indent into the
system PATH somewhere--unreasonable given that this may be executing on
a source only tree that has never been installed.. The fact that the
documention is only in the README and not with the rest of the code
conventions isn't helping either.

The last time I tried to do this a few years ago I failed miserably and
never came back. I know way more about building software now though,
and just got this to work for the first time. Attached is a WIP wrapper
script for running pgident that builds all the requirements into
temporary directories, rather than expecting you to install anything
system-wide or into a PostgreSQL destination directory. Drop this into
src/tools/pgindent, make it executable, and run it from that directory.
Should do the right thing on any system that has "make" as an alias for
"gmake" (TODO to be better about that in the file, with some other
nagging things).

When I just ran it against master I got a bunch of modified files, but
most of them look like things that have been touched recently so I think
it did the right thing. A test of my work here from someone who isn't
running this for the first time would be helpful. If this works well
enough, I think it would make a good helper script to include in the
distribution. The loose ends to fix I can take care of easily enough
once basic validation is finished.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

Attachment Content-Type Size
run-pgindent text/plain 1.4 KB

From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-17 08:11:55
Message-ID: 201104171011.55771.rsmogura@softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Sunday 17 April 2011 01:35:45
> =?utf-8?q?Rados=C5=82aw_Smogura?= <rsmogura(at)softperience(dot)eu> writes:
> > No, no, no :) I wanted to do this, but from above reason I skipped it. I
> > swap VM pages, I do remap, in place where the shared buffer was I put
> > mmaped page, and in place where mmaped page was I put shared page (in
> > certain cases, which should be optimized by e. g. read for update, for
> > initial read of page in process I directly points to shared buffer), it
> > can be imagined as I affects TLB. This what I call "VM swap" is
> > remapping, so I don't change pointers, I change only where this pointers
> > points in physical memory, preserving same pointer in Virtual Memory.
>
> ... Huh? Are you saying that you ask the kernel to map each individual
> shared buffer separately? I can't believe that's going to scale to
> realistic applications.
>
> regards, tom lane

No, I do
mrempa(mmap_buff_A, MAP_FIXED, temp);
mremap(shared_buff_Y, MAP_FIXED, mmap_buff_A),
mrempa(tmp, MAP_FIXED, mmap_buff_A).

This is this additional overhead - and may have some disadvantages. All
regions SysV / Posix MMAP are mapped before.

I couldn't believe too, but as I done some work about read, I was in dead
corner:
1. Create Read Before Buffer (connect with XLOG) that will store each page
before modification (page should be flushed and synced to log)
2. Rewrite whole db to repoint pointers or similar stuff (I done few steps for
this).
3. Or find something different.

I couldn't believe too, it's way I still work on it. I saw it gains speed for
few simple updates. I'm not quite sure why it gets it. I only may think it
was from "pre update reads". But full checks will go after some good point of
updates.

Regards,
Radek


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-17 08:15:53
Message-ID: 4DAAA1B9.6080702@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> But it turns out that it doesn't really matter. Whitespace or no
> whitespace, if you don't read the diff before you hit send, it's
> likely to contain some irrelevant cruft, whether whitespace changes or
> otherwise.
>

Right. Presuming that pgident will actually solve anything leaps over
two normally incorrect assumptions:

-That the main tree was already formatted with pgident before you
started, so no stray diffs will result from it touching things the
submitter isn't even involved in.

-There is no larger code formatting or diff issues except for spacing.

This has been a nagging loose end for a while, so I'd like to see
pgindent's rough edges get sorted out so it's easier to use. But
whitespace errors because of bad editors are normally just a likely sign
of a patch with bigger problems, rather than something that can get
fixed and then submissions is good. There is no substitute for the
discipline of reading your own diff before submission. I'll easily
obsess over mine for an hour before I submit something major, and that
time is always well spent.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-17 08:26:21
Message-ID: 4DAAA42D.20309@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/17/2011 02:16 AM, Jeff Janes wrote:
> On 4/16/11, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>>
>> What makes you think this isn't possible to run pgindent? There are no
>> secret incantations.
> A while ago I spent a few hours trying to run it and gave up. I think
> it was something about needing some obscure BSD version of some tool
> which conflicted with just about everything else on the system. I can
> try again and report back if anyone cares.
>

A few hours? Seriously?

Here's what I just did, starting from scratch. It took me a few minutes.

* wget ftp://ftp.postgresql.org/pub/dev/indent.netbsd.patched.tgz
* mkdir bsdindent && cd bdsindent && tar -z -xf
.../indent.netbsd.patched.tgz
* make
* mv indent indent_for_pg
* sudo install -s -o bin -g bin indent_for_pg /usr/local/bin
* cd ../pg_head/src/tools/entab
* sudo make install
* cd ../pgindent
* sed -i 's/INDENT=indent/INDENT=indent_for_pg/' pgindent
* sudo install -s -o bin -g bin pgindent /usr/local/bin

Now we could certainly make this quite a bit slicker. Apart from
anything else, we should change the indent source code tarball so it
unpacks into its own directory. Having it unpack into the current
directory is ugly and unfriendly. And we should get rid of the "make
clean" line in the install target of entab's makefile, which just seems
totally ill-conceived.

It might also be worth setting it up so that instead of having to pass a
path to a typedefs file on the command line, we default to a file
sitting in, say, /usr/local/etc. Then you'd just be able to say
"pgindent my_file.c".

But it shouldn't take anyone hours to set up.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-17 08:31:59
Message-ID: 4DAAA57F.4090508@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/17/2011 04:08 AM, Greg Smith wrote:
> Andrew Dunstan wrote:
>> What makes you think this isn't possible to run pgindent? There are
>> no secret incantations.
>
> The first hit newbies find looking for info about pgident is
> http://blog.hagander.net/archives/185-pgindent-vs-dash.html which sure
> looks like secret incantations to me. The documentation
> src/tools/pgindent/README reads like a magic spell too:
>
> find . -name '*.[ch]' -type f -print | \
> egrep -v -f src/tools/pgindent/exclude_file_patterns | \
> xargs -n100 pgindent src/tools/pgindent/typedefs.list

That's the incantation for indenting the whole of the source code. But
very few people want to do that. Most people just want to indent a
single file, for which the incantation is "pgindent path_to_typedefs
my_file.c". See in another message my suggestion for defaulting the
typedefs arg, so you'd just be able to say "pg_indent my_file.c".

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-17 13:51:59
Message-ID: 4DAAF07F.4000005@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/17/2011 04:26 AM, Andrew Dunstan wrote:
>
>
> Now we could certainly make this quite a bit slicker. Apart from
> anything else, we should change the indent source code tarball so it
> unpacks into its own directory. Having it unpack into the current
> directory is ugly and unfriendly. And we should get rid of the "make
> clean" line in the install target of entab's makefile, which just
> seems totally ill-conceived.
>
> It might also be worth setting it up so that instead of having to pass
> a path to a typedefs file on the command line, we default to a file
> sitting in, say, /usr/local/etc. Then you'd just be able to say
> "pgindent my_file.c".
>
>

OK, I have most of these bits.

A new tarball of indent is available at
<http://developer.postgresql.org/~andrew/indent.netbsd.patched.tgz> and
if everyone agrees I'll push it out to the mirrors.

Attached are two patches, one to remove some infelicity in the entab
makefile, and the other to allow skipping specifying the typedefs file
location either by setting it in an environment variable or by putting
it in a hard coded location.

cheers

andrew

Attachment Content-Type Size
entab.patch text/x-patch 390 bytes
pgindent.patch text/x-patch 933 bytes

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-17 14:27:20
Message-ID: 4DAAF8C8.5090001@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/17/2011 09:51 AM, Andrew Dunstan wrote:
>
>
> On 04/17/2011 04:26 AM, Andrew Dunstan wrote:
>>
>>
>> Now we could certainly make this quite a bit slicker. Apart from
>> anything else, we should change the indent source code tarball so it
>> unpacks into its own directory. Having it unpack into the current
>> directory is ugly and unfriendly. And we should get rid of the "make
>> clean" line in the install target of entab's makefile, which just
>> seems totally ill-conceived.
>>
>> It might also be worth setting it up so that instead of having to
>> pass a path to a typedefs file on the command line, we default to a
>> file sitting in, say, /usr/local/etc. Then you'd just be able to say
>> "pgindent my_file.c".
>>
>>
>
> OK, I have most of these bits.
>
> A new tarball of indent is available at
> <http://developer.postgresql.org/~andrew/indent.netbsd.patched.tgz>
> and if everyone agrees I'll push it out to the mirrors.
>
> Attached are two patches, one to remove some infelicity in the entab
> makefile, and the other to allow skipping specifying the typedefs file
> location either by setting it in an environment variable or by putting
> it in a hard coded location.
>
>

... and this one has a typo fixed.

cheers

andrew

Attachment Content-Type Size
pgindent.patch text/x-patch 932 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-17 15:48:56
Message-ID: 24855.1303055336@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

=?utf-8?q?Rados=C5=82aw_Smogura?= <rsmogura(at)softperience(dot)eu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Sunday 17 April 2011 01:35:45
>> ... Huh? Are you saying that you ask the kernel to map each individual
>> shared buffer separately? I can't believe that's going to scale to
>> realistic applications.

> No, I do
> mrempa(mmap_buff_A, MAP_FIXED, temp);
> mremap(shared_buff_Y, MAP_FIXED, mmap_buff_A),
> mrempa(tmp, MAP_FIXED, mmap_buff_A).

There's no mremap() in the Single Unix Spec, nor on my ancient HPUX box,
nor on my quite-up-to-date OS X box. The Linux man page for it says
"This call is Linux-specific, and should not be used in programs
intended to be portable." So if the patch is dependent on that call,
it's dead on arrival from a portability standpoint.

But in any case, you didn't explain how use of mremap() avoids the
problem of the kernel having to maintain a separate page-mapping-table
entry for each individual buffer. (Per process, yet.) If that's what's
happening, it's going to be a significant performance penalty as well as
(I suspect) a serious constraint on how many buffers can be managed.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joshua Berkus <josh(at)agliodbs(dot)com>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: The big picture for patch submission (was Re: MMAP Buffers)
Date: 2011-04-17 16:47:43
Message-ID: 25628.1303058863@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> ... But please, everyone feel free to continue bashing me for
> wanting a readable patch with some understandable submission notes.

What he said. All this obsessing over whether the mmap patch could or
should have been run through pgindent is missing the big picture.
Namely, that no design documentation or theory-of-operation was offered,
and people are trying to extract that information from the code, and
it's just too messy for that to be feasible. (The patch isn't really
short of comments, but half of the comments seem to be TODOs or author's
questions to himself about whether something will work, and so they just
aren't particularly helpful to someone trying to understand what the
patch does or whether it will work.)

I think that rather than complaining about formatting, we should be
complaining about not following the overall patch submission process
and not providing adequate documentation. Most of the questions that
people are asking right now could have been answered on the strength of
a design sketch, before any code had been written at all. For a patch
as complicated and invasive as this, there should be a design sketch,
which perhaps gets fleshed out into a README file in the final patch.

The Submitting_a_Patch wiki page does touch on the point of getting some
early design feedback before you even try to write a patch, but I think
it could do with more emphasis on the issue.

regards, tom lane


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-17 17:26:31
Message-ID: 201104171926.31900.rsmogura@softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Sunday 17 April 2011 17:48:56
> =?utf-8?q?Rados=C5=82aw_Smogura?= <rsmogura(at)softperience(dot)eu> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Sunday 17 April 2011 01:35:45
> >
> >> ... Huh? Are you saying that you ask the kernel to map each individual
> >> shared buffer separately? I can't believe that's going to scale to
> >> realistic applications.
> >
> > No, I do
> > mrempa(mmap_buff_A, MAP_FIXED, temp);
> > mremap(shared_buff_Y, MAP_FIXED, mmap_buff_A),
> > mrempa(tmp, MAP_FIXED, mmap_buff_A).
>
> There's no mremap() in the Single Unix Spec, nor on my ancient HPUX box,
> nor on my quite-up-to-date OS X box. The Linux man page for it says
> "This call is Linux-specific, and should not be used in programs
> intended to be portable." So if the patch is dependent on that call,
> it's dead on arrival from a portability standpoint.
Good point. This is from initial concept, and actually I done this to do not
leave gaps in VM in which library or something could be mmaped. Last time I
think about using mmap to replace just one VM page.

> But in any case, you didn't explain how use of mremap() avoids the
> problem of the kernel having to maintain a separate page-mapping-table
> entry for each individual buffer. (Per process, yet.) If that's what's
> happening, it's going to be a significant performance penalty as well as
> (I suspect) a serious constraint on how many buffers can be managed.
>
> regards, tom lane
Kernel merges vm_structs. So mappings are compacted. I'm not kernel
specialist, but skipping memory consumption, for not compacted mappings,
kernel uses btrees for dealing with TLB, so it should not matter if there is
100 vm_structs or 100000 vm_structs.

Swap isn't made everywhere. When buffer is initialy read (privaterefcount
==1), then any access to this buffer will directly point to latest valid area.
If it has assigned shmem area then this will be used. I plan to add
"readbuffer for update" to prevent swaps, when it's almost sure that buffer
will be used for update.

I measured performance of page modifications (with unpining, full process on
stand alone unit test) it's 2x-3x more time of normal page reads, but this
result may not be sure, as I saw memcpy to memory above 2GB is slower then
memcpy to first 2GB (this may be idea to try to put some shared structs <
2GB).

I know that this patch is big question. Sometimes I'm optimistic, and
sometimes I'm pessimistic about final result.

Regards,
Radek


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: MMAP Buffers
Date: 2011-04-17 18:02:11
Message-ID: 201104172002.11811.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sunday 17 April 2011 19:26:31 Radosław Smogura wrote:
> Kernel merges vm_structs. So mappings are compacted. I'm not kernel
> specialist, but skipping memory consumption, for not compacted mappings,
> kernel uses btrees for dealing with TLB, so it should not matter if there
> is 100 vm_structs or 100000 vm_structs.
But the CPUs TLB cache has maybe 16/256 (1lvl, 2nd) to 64/512 entries. That
will mean that there will be cachemisses all over.
Additionally your scheme requires flushing it regularly...

Andres


From: Joshua Berkus <josh(at)agliodbs(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-17 19:03:52
Message-ID: 919444343.6842.1303067032082.JavaMail.root@mail-1.01.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert,

> Actually, I'd walk through fire for a 10% performance improvement if
> it meant only a *risk* to stability.

Depends on the degree of risk. MMAP has the potential to introduce instability into areas of the code which have been completely reliable for years. Adding 20 new coredump cases with data loss for a 10% improvement seems like a poor bargain to me. It doesn't help that the only DB to rely heavily on MMAP (MongoDB) is OSSDB's paragon of data loss.

However, in the case where the database is larger than RAM ... or better, 90% of RAM ... MMAP has the theoretical potential to improve performance quite a bit more than 10% ... try up to 900% on some queries. However, I'd like to prove that in a test before we bother even debating the fundamental obstacles to using MMAP. It's possible that these theoretical performance benefits will not materialize, even without data safeguards.

> The problem is that this is
> likely unfixably broken. In particular, I think the first sentence of
> Tom's response hit it right on the nose, and mirrors my own thoughts
> on the subject. To have any chance of working, you'd need to track
> buffer pins and shared/exclusive content locks for the pages that were
> being accessed outside of shared buffers; otherwise someone might be
> looking at a stale copy of the page.

Nothing is unfixable. The question is whether it's worth the cost. Let me see if I can build a tree with Radislaw's patch, and do some real performance tests.

I, for one, am glad he did this work. We've discussed MMAP in the code off and on for years, but nobody wanted to do the work to test it. Now someone has, and we can decide whether it's worth pursuing based on the numbers.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joshua Berkus <josh(at)agliodbs(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-17 19:28:54
Message-ID: 29531.1303068534@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua Berkus <josh(at)agliodbs(dot)com> writes:
> I, for one, am glad he did this work. We've discussed MMAP in the code off and on for years, but nobody wanted to do the work to test it. Now someone has, and we can decide whether it's worth pursuing based on the numbers.

Well, the troubling issue is that it's not clear whether this patch is
realistic enough to think that performance measurements based on it
are representative of the whole idea of using mmap. The business of
remapping individual buffers in order to transition them to writable
state seems likely to me to be a huge performance penalty --- first
there's the direct cost of having to incur a kernel call each time we
do that, and second there's the distributed cost of asking the kernel
to manage thousands or millions of tiny mappings.

IOW, if this patch shows little or no performance improvement (as seems
likely to happen at scale), that doesn't prove that mmap in general
isn't potentially interesting, only that this isn't the right way to
approach it.

Still, if you do some tests and don't find a win, that might save time
compared to actually trying to understand and vet the patch ...

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-17 20:01:55
Message-ID: BANLkTi=A0k2zbo5jdXctr0ZEv0EVVo7pgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Apr 17, 2011 at 11:48 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> =?utf-8?q?Rados=C5=82aw_Smogura?= <rsmogura(at)softperience(dot)eu> writes:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Sunday 17 April 2011 01:35:45
>>> ... Huh?  Are you saying that you ask the kernel to map each individual
>>> shared buffer separately?  I can't believe that's going to scale to
>>> realistic applications.
>
>> No, I do
>> mrempa(mmap_buff_A, MAP_FIXED, temp);
>> mremap(shared_buff_Y, MAP_FIXED, mmap_buff_A),
>> mrempa(tmp, MAP_FIXED, mmap_buff_A).
>
> There's no mremap() in the Single Unix Spec, nor on my ancient HPUX box,
> nor on my quite-up-to-date OS X box.  The Linux man page for it says
> "This call is Linux-specific, and should not be used in programs
> intended to be portable."  So if the patch is dependent on that call,
> it's dead on arrival from a portability standpoint.
>
> But in any case, you didn't explain how use of mremap() avoids the
> problem of the kernel having to maintain a separate page-mapping-table
> entry for each individual buffer.  (Per process, yet.)  If that's what's
> happening, it's going to be a significant performance penalty as well as
> (I suspect) a serious constraint on how many buffers can be managed.

I share your suspicions, although no harm in measuring it.

But I don't understand is how this approach avoids the problem of
different processes seeing different buffer contents. If backend A
has the buffer mmap'd and backend B wants to modify it (and changes
the mapping), backend A is still looking at the old buffer contents,
isn't it? And then things go boom.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: MMAP Buffers
Date: 2011-04-17 20:09:24
Message-ID: 201104172209.24587.rsmogura@softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> Sunday 17 April 2011 20:02:11
> On Sunday 17 April 2011 19:26:31 Radosław Smogura wrote:
> > Kernel merges vm_structs. So mappings are compacted. I'm not kernel
> > specialist, but skipping memory consumption, for not compacted mappings,
> > kernel uses btrees for dealing with TLB, so it should not matter if
> > there is 100 vm_structs or 100000 vm_structs.
>
> But the CPUs TLB cache has maybe 16/256 (1lvl, 2nd) to 64/512 entries. That
> will mean that there will be cachemisses all over.
> Additionally your scheme requires flushing it regularly...
>
> Andres

I only know Phenom has 4096 entries I think and this covers 16MB of memory.
But I was taking about memory usage of struct vm_struct in kernel. I tries as
well with huge pages, but I can't write really fast allocator for this, it's
slower then malloc, maybe from different reasons.

Regards,
Radek


From: Andres Freund <andres(at)anarazel(dot)de>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: MMAP Buffers
Date: 2011-04-17 20:24:07
Message-ID: 201104172224.07686.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sunday 17 April 2011 22:09:24 Radosław Smogura wrote:
> I only know Phenom has 4096 entries I think and this covers 16MB of
> memory.
The numbers I cited where intels before and after core2.

Andres


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-17 21:32:18
Message-ID: 201104172332.18381.rsmogura@softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> Sunday 17 April 2011 22:01:55
> On Sun, Apr 17, 2011 at 11:48 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > =?utf-8?q?Rados=C5=82aw_Smogura?= <rsmogura(at)softperience(dot)eu> writes:
> >> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Sunday 17 April 2011 01:35:45
> >>
> >>> ... Huh? Are you saying that you ask the kernel to map each individual
> >>> shared buffer separately? I can't believe that's going to scale to
> >>> realistic applications.
> >>
> >> No, I do
> >> mrempa(mmap_buff_A, MAP_FIXED, temp);
> >> mremap(shared_buff_Y, MAP_FIXED, mmap_buff_A),
> >> mrempa(tmp, MAP_FIXED, mmap_buff_A).
> >
> > There's no mremap() in the Single Unix Spec, nor on my ancient HPUX box,
> > nor on my quite-up-to-date OS X box. The Linux man page for it says
> > "This call is Linux-specific, and should not be used in programs
> > intended to be portable." So if the patch is dependent on that call,
> > it's dead on arrival from a portability standpoint.
> >
> > But in any case, you didn't explain how use of mremap() avoids the
> > problem of the kernel having to maintain a separate page-mapping-table
> > entry for each individual buffer. (Per process, yet.) If that's what's
> > happening, it's going to be a significant performance penalty as well as
> > (I suspect) a serious constraint on how many buffers can be managed.
>
> I share your suspicions, although no harm in measuring it.
>
> But I don't understand is how this approach avoids the problem of
> different processes seeing different buffer contents. If backend A
> has the buffer mmap'd and backend B wants to modify it (and changes
> the mapping), backend A is still looking at the old buffer contents,
> isn't it? And then things go boom.

Each process has simple "mirror" of shared descriptors.

I "believe" that modifications to buffer content may be only done when holding
exclusive lock (with some simple exceptions) (+ MVCC), actually I saw only two
things that can change already loaded data and cause damage, you have
described (setting hint bits during scan, and vacuum - 1st may only cause, I
think, that two processes will ask for same transaction statuses <except
vacuum>, 2nd one is impossible as vacumm requires exclusive pin). When buffer
tag is changed the version of buffer is bumped up, and checked against local
version - this about reading buffer.

In other cases after obtaining lock check is done if buffer has associated
updatable buffer and if local "mirror" has it too, then swap should take
place.

Logic about updatable buffers is similar to "shared buffers", each updatable
buffer has pin count, and updatable buffer can't be free if someone uses it,
but in contrast to "normal buffers", updatable buffers doesn't have any
support for locking etc. Updatable buffers exists only on free list, or when
associated with buffer.

In future, I will change version to shared segment id, something like
relation's oid + block, but ids will have continuous numbering 1,2,3..., so I
will be able to bypass smgr/md during read, and tag version check - this looks
like faster solution.

Regards,
Radek


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-18 01:06:17
Message-ID: BANLkTimUhX0TxSN3_-rh1xo9aE_3fJZyqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Apr 17, 2011 at 5:32 PM, Radosław Smogura
<rsmogura(at)softperience(dot)eu> wrote:
> Each process has simple "mirror" of shared descriptors.
>
> I "believe" that modifications to buffer content may be only done when holding
> exclusive lock (with some simple exceptions) (+ MVCC), actually I saw only two
> things that can change already loaded data and cause damage, you have
> described (setting hint bits during scan, and vacuum - 1st may only cause, I
> think, that two processes will ask for same transaction statuses <except
> vacuum>, 2nd one is impossible as vacumm requires exclusive pin). When buffer
> tag is changed the version of buffer is bumped up, and checked against local
> version - this about reading buffer.

Yes, an exclusive lock is required for substantive content changes.
But if vacuum cleaning up the buffer is an issue for your patch, then
it's probably also a problem if someone grabs an exclusive content
lock and deletes the tuple (by setting XMAX) and some other backend
later sees the old buffer contents after having in the meanwhile taken
a new snapshot; or if likewise someone grabs an exclusive-lock, adds a
tuple, and then your backend takes a new snapshot and then sees the
old buffer contents. Basically, any time someone grabs an
exclusive-lock and releases it, it's necessary for all observers to
see the updated contents by the time the exclusive lock is released.

> In other cases after obtaining lock check is done if buffer has associated
> updatable buffer and if local "mirror" has it too, then swap should take
> place.

I think this check would have to be done every time someone
share-locks the buffer, which seems rather expensive.

> Logic about updatable buffers is similar to "shared buffers", each updatable
> buffer has pin count, and updatable buffer can't be free if someone uses it,
> but in contrast to "normal buffers", updatable buffers doesn't have any
> support for locking etc. Updatable buffers exists only on free list, or when
> associated with buffer.

I don't see how you're going to get away with removing buffer locks.
They exist for a reason, and adding mmap() to the mix is going to
require MORE locking, not less.

> In future, I will change version to shared segment id, something like
> relation's oid + block, but ids will have continuous numbering 1,2,3..., so I
> will be able to bypass smgr/md during read, and tag version check - this looks
> like faster solution.

I don't understand this part at all.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-18 04:48:06
Message-ID: 4DABC286.3070102@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
> Now we could certainly make this quite a bit slicker. Apart from
> anything else, we should change the indent source code tarball so it
> unpacks into its own directory. Having it unpack into the current
> directory is ugly and unfriendly. And we should get rid of the "make
> clean" line in the install target of entab's makefile, which just
> seems totally ill-conceived.

I think the script I submitted upthread has most of the additional
slickness needed here. Looks like we both were working on documenting a
reasonable way to do this at the same time the other day. The idea of
any program here relying on being able to write to /usr/local/bin as
your example did makes this harder for people to run; that's why I made
everything in the build tree and just pushed the appropriate directories
into the PATH.

Since I see providing a script to automate this whole thing as the
preferred way to make this easier, re-packaging the indent source
tarball to extract to a directory doesn't seem worth the backwards
compatibility trouble it will introduce. Improving the entab makefile I
don't have an opinion on.

> It might also be worth setting it up so that instead of having to pass
> a path to a typedefs file on the command line, we default to a file
> sitting in, say, /usr/local/etc. Then you'd just be able to say
> "pgindent my_file.c".

OK, so I need to update my script to handle either indenting a single
file, or doing all of them.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-18 05:47:36
Message-ID: 201104180747.36265.rsmogura@softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> Monday 18 April 2011 03:06:17
> On Sun, Apr 17, 2011 at 5:32 PM, Radosław Smogura
>
> <rsmogura(at)softperience(dot)eu> wrote:
> > Each process has simple "mirror" of shared descriptors.
> >
> > I "believe" that modifications to buffer content may be only done when
> > holding exclusive lock (with some simple exceptions) (+ MVCC), actually
> > I saw only two things that can change already loaded data and cause
> > damage, you have described (setting hint bits during scan, and vacuum -
> > 1st may only cause, I think, that two processes will ask for same
> > transaction statuses <except vacuum>, 2nd one is impossible as vacumm
> > requires exclusive pin). When buffer tag is changed the version of
> > buffer is bumped up, and checked against local version - this about
> > reading buffer.
>
> Yes, an exclusive lock is required for substantive content changes.
> But if vacuum cleaning up the buffer is an issue for your patch, then
> it's probably also a problem if someone grabs an exclusive content
> lock and deletes the tuple (by setting XMAX) and some other backend
> later sees the old buffer contents after having in the meanwhile taken
> a new snapshot; or if likewise someone grabs an exclusive-lock, adds a
> tuple, and then your backend takes a new snapshot and then sees the
> old buffer contents. Basically, any time someone grabs an
> exclusive-lock and releases it, it's necessary for all observers to
> see the updated contents by the time the exclusive lock is released.
> > In other cases after obtaining lock check is done if buffer has
> > associated updatable buffer and if local "mirror" has it too, then swap
> > should take place.
>
> I think this check would have to be done every time someone
> share-locks the buffer, which seems rather expensive.
I don't treat as issues, but it's disadvantage.

> > Logic about updatable buffers is similar to "shared buffers", each
> > updatable buffer has pin count, and updatable buffer can't be free if
> > someone uses it, but in contrast to "normal buffers", updatable buffers
> > doesn't have any support for locking etc. Updatable buffers exists only
> > on free list, or when associated with buffer.
>
> I don't see how you're going to get away with removing buffer locks.
> They exist for a reason, and adding mmap() to the mix is going to
> require MORE locking, not less.
>
> > In future, I will change version to shared segment id, something like
> > relation's oid + block, but ids will have continuous numbering 1,2,3...,
> > so I will be able to bypass smgr/md during read, and tag version check -
> > this looks like faster solution.
>
> I don't understand this part at all.
Versioning is witch approach where I thought about really often changes of
mmaped areas, I allocated part of segments, but now the segment is mmaped with
reservation, to it's full possible size, addresses of segments can't change
(problem is only with segment deletion).

Regards,
Radek


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Christopher Browne <cbbrowne(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-18 06:07:50
Message-ID: 3694.1303106870@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Smith <greg(at)2ndquadrant(dot)com> writes:
> The last time I tried to do this a few years ago I failed miserably and
> never came back. I know way more about building software now though,
> and just got this to work for the first time.

BTW, another thing that should be in the try-try-again category is
seeing how close we could get to pgindent's results with GNU indent.
It seems clear to me that a process based on GNU indent would be a
lot easier for a lot of people. We tried that once before, and couldn't
get close enough to want to consider switching, but maybe it just needs
a more determined effort and/or more recent versions of GNU indent.
(ISTR that we hit some things that seemed to be outright bugs in GNU
indent, but this was quite a few years ago.)

regards, tom lane


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MMAP Buffers
Date: 2011-04-18 11:56:25
Message-ID: 25133b993f880f358728feac47aa243e@mail.softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 17 Apr 2011 21:06:17 -0400, Robert Haas wrote:
> On Sun, Apr 17, 2011 at 5:32 PM, Radosław Smogura
> <rsmogura(at)softperience(dot)eu> wrote:
>> Each process has simple "mirror" of shared descriptors.
>>
>> I "believe" that modifications to buffer content may be only done
>> when holding
>> exclusive lock (with some simple exceptions) (+ MVCC), actually I
>> saw only two
>> things that can change already loaded data and cause damage, you
>> have
>> described (setting hint bits during scan, and vacuum - 1st may only
>> cause, I
>> think, that two processes will ask for same transaction statuses
>> <except
>> vacuum>, 2nd one is impossible as vacumm requires exclusive pin).
>> When buffer
>> tag is changed the version of buffer is bumped up, and checked
>> against local
>> version - this about reading buffer.
>
> Yes, an exclusive lock is required for substantive content changes.
> But if vacuum cleaning up the buffer is an issue for your patch, then
> it's probably also a problem if someone grabs an exclusive content
> lock and deletes the tuple (by setting XMAX) and some other backend
> later sees the old buffer contents after having in the meanwhile
> taken
> a new snapshot; or if likewise someone grabs an exclusive-lock, adds
> a
> tuple, and then your backend takes a new snapshot and then sees the
> old buffer contents. Basically, any time someone grabs an
> exclusive-lock and releases it, it's necessary for all observers to
> see the updated contents by the time the exclusive lock is released.
>
>> In other cases after obtaining lock check is done if buffer has
>> associated
>> updatable buffer and if local "mirror" has it too, then swap should
>> take
>> place.
>
> I think this check would have to be done every time someone
> share-locks the buffer, which seems rather expensive.
>
>> Logic about updatable buffers is similar to "shared buffers", each
>> updatable
>> buffer has pin count, and updatable buffer can't be free if someone
>> uses it,
>> but in contrast to "normal buffers", updatable buffers doesn't have
>> any
>> support for locking etc. Updatable buffers exists only on free list,
>> or when
>> associated with buffer.
>
> I don't see how you're going to get away with removing buffer locks.
> They exist for a reason, and adding mmap() to the mix is going to
> require MORE locking, not less.
>
>> In future, I will change version to shared segment id, something
>> like
>> relation's oid + block, but ids will have continuous numbering
>> 1,2,3..., so I
>> will be able to bypass smgr/md during read, and tag version check -
>> this looks
>> like faster solution.
>
> I don't understand this part at all.

To my previous post I want to clarify that "updatable buffers" are
implemented in shared memory, so there is no way that process has own
copy of data.

Regards,
Radek.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-18 12:29:22
Message-ID: 4DAC2EA2.20709@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/18/2011 12:48 AM, Greg Smith wrote:
> Andrew Dunstan wrote:
>> Now we could certainly make this quite a bit slicker. Apart from
>> anything else, we should change the indent source code tarball so it
>> unpacks into its own directory. Having it unpack into the current
>> directory is ugly and unfriendly. And we should get rid of the "make
>> clean" line in the install target of entab's makefile, which just
>> seems totally ill-conceived.
>
> I think the script I submitted upthread has most of the additional
> slickness needed here. Looks like we both were working on documenting
> a reasonable way to do this at the same time the other day. The idea
> of any program here relying on being able to write to /usr/local/bin
> as your example did makes this harder for people to run; that's why I
> made everything in the build tree and just pushed the appropriate
> directories into the PATH.
>
> Since I see providing a script to automate this whole thing as the
> preferred way to make this easier, re-packaging the indent source
> tarball to extract to a directory doesn't seem worth the backwards
> compatibility trouble it will introduce. Improving the entab makefile
> I don't have an opinion on.

Personally, I want pgindent installed in /usr/local/ or similar. That
way I can have multiple trees and it will work in all of them without my
having to build it for each. What I don't want is for the installed
patched BSD indent to conflict with the system's indent, which is why I
renamed it. If you still think that's a barrier to easy use, then I
think we need a way to provide hooks in the makefiles for specifying the
install location, so we can both be satisfied.

Since there's no script I know of other than your prototype, I don't
think repackaging is likely to break anything. That makes it worth doing
*now* rather than later.

But frankly, I'd rather do without an extra script if possible.

>
>> It might also be worth setting it up so that instead of having to
>> pass a path to a typedefs file on the command line, we default to a
>> file sitting in, say, /usr/local/etc. Then you'd just be able to say
>> "pgindent my_file.c".
>
> OK, so I need to update my script to handle either indenting a single
> file, or doing all of them.

Yes, very much.

cheers

andrew


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-18 17:46:30
Message-ID: 4DAC78F6.3020003@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/17/2011 11:07 PM, Tom Lane wrote:
>
> BTW, another thing that should be in the try-try-again category is
> seeing how close we could get to pgindent's results with GNU indent.
> It seems clear to me that a process based on GNU indent would be a
> lot easier for a lot of people. We tried that once before, and couldn't
> get close enough to want to consider switching, but maybe it just needs
> a more determined effort and/or more recent versions of GNU indent.
> (ISTR that we hit some things that seemed to be outright bugs in GNU
> indent, but this was quite a few years ago.)

That seems like a definite win possibility there.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Developement
Organizers of the PostgreSQL Conference - http://www.postgresqlconference.org/
@cmdpromptinc - @postgresconf - 509-416-6579


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-18 22:20:30
Message-ID: 4DACB92E.8030207@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/18/2011 01:46 PM, Joshua D. Drake wrote:
> On 04/17/2011 11:07 PM, Tom Lane wrote:
>>
>> BTW, another thing that should be in the try-try-again category is
>> seeing how close we could get to pgindent's results with GNU indent.
>> It seems clear to me that a process based on GNU indent would be a
>> lot easier for a lot of people. We tried that once before, and couldn't
>> get close enough to want to consider switching, but maybe it just needs
>> a more determined effort and/or more recent versions of GNU indent.
>> (ISTR that we hit some things that seemed to be outright bugs in GNU
>> indent, but this was quite a few years ago.)
>
> That seems like a definite win possibility there.
>
>

If you're aware of any changes in GNU indent that would overcome the
previous issues, then by all means spend the time on it. If not, it
seems a bit like the definition of insanity ("repeating an experiment
with the expectation of a different result").

cheers

andrew


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Joshua D(dot) Drake <jd(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-18 22:28:36
Message-ID: 1303165621-sup-7675@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Andrew Dunstan's message of lun abr 18 19:20:30 -0300 2011:
>
> On 04/18/2011 01:46 PM, Joshua D. Drake wrote:
> > On 04/17/2011 11:07 PM, Tom Lane wrote:
> >>
> >> BTW, another thing that should be in the try-try-again category is
> >> seeing how close we could get to pgindent's results with GNU indent.
> >> It seems clear to me that a process based on GNU indent would be a
> >> lot easier for a lot of people. We tried that once before, and couldn't
> >> get close enough to want to consider switching, but maybe it just needs
> >> a more determined effort and/or more recent versions of GNU indent.
> >> (ISTR that we hit some things that seemed to be outright bugs in GNU
> >> indent, but this was quite a few years ago.)
> >
> > That seems like a definite win possibility there.
>
> If you're aware of any changes in GNU indent that would overcome the
> previous issues, then by all means spend the time on it. If not, it
> seems a bit like the definition of insanity ("repeating an experiment
> with the expectation of a different result").

The source of GNU indent itself is 3x what it was when the experiment
was last reported. (I checked this about a year ago with an eye on
"repeating the experiment" but then I failed to actually do it.) It
seems fair to say that, yes, it has changed a bit in the meantime.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-18 23:25:33
Message-ID: 18543.1303169133@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Andrew Dunstan's message of lun abr 18 19:20:30 -0300 2011:
>> On 04/17/2011 11:07 PM, Tom Lane wrote:
>>> ... but maybe it just needs
>>> a more determined effort and/or more recent versions of GNU indent.

>> If you're aware of any changes in GNU indent that would overcome the
>> previous issues, then by all means spend the time on it. If not, it
>> seems a bit like the definition of insanity ("repeating an experiment
>> with the expectation of a different result").

> The source of GNU indent itself is 3x what it was when the experiment
> was last reported. (I checked this about a year ago with an eye on
> "repeating the experiment" but then I failed to actually do it.) It
> seems fair to say that, yes, it has changed a bit in the meantime.

Also, my recollection of the previous go-round is that we gave up rather
quickly. Maybe by now there is somebody willing to put more than
minimal effort into getting the options just so.

regards, tom lane


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-04-22 07:20:46
Message-ID: 4DB12C4E.4020604@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
> Personally, I want pgindent installed in /usr/local/ or similar. That
> way I can have multiple trees and it will work in all of them without
> my having to build it for each. What I don't want is for the installed
> patched BSD indent to conflict with the system's indent, which is why
> I renamed it. If you still think that's a barrier to easy use, then I
> think we need a way to provide hooks in the makefiles for specifying
> the install location, so we can both be satisfied.

I don't think there's a conflict here, because the sort of uses I'm
worried about don't want to install the thing at all; just to run it. I
don't really care what "make install" does because I never intend to run
it; dumping into /usr/local is completely reasonable for the people who do.

> Since there's no script I know of other than your prototype, I don't
> think repackaging is likely to break anything. That makes it worth
> doing *now* rather than later.
>
> But frankly, I'd rather do without an extra script if possible.

Fine, the renaming bit I'm not really opposed to. The odds there's
anyone using this thing that isn't reading this message exchange is
pretty low. There is the documentation backport issue if you make any
serious change it though. Maybe put the new version in another
location, leave the old one where it is?

There's a fair number of steps to this though. It's possible to
automate them all such that running the program is trivial. I don't
know how we'd ever get that same ease of use without some sort of
scripting for the whole process. Could probably do it in a makefile
instead, but I don't know if that's really any better. The intersection
between people who want to run this and people who don't have bash
available is pretty slim I think. I might re-write in Perl to make it
more portable, but I think that will be at the expense of making it
harder for people to tweak if it doesn't work out of the box. More
code, too.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-10-11 22:07:04
Message-ID: 201110112207.p9BM74411934@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
> Attached are two patches, one to remove some infelicity in the entab
> makefile, and the other to allow skipping specifying the typedefs file

I have applied the 'entab' Makefile fix.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Formatting Curmudgeons WAS: MMAP Buffers
Date: 2011-10-12 20:42:33
Message-ID: 201110122042.p9CKgYM25299@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
> Now we could certainly make this quite a bit slicker. Apart from
> anything else, we should change the indent source code tarball so it
> unpacks into its own directory. Having it unpack into the current

Yes, done.

> directory is ugly and unfriendly. And we should get rid of the "make
> clean" line in the install target of entab's makefile, which just seems
> totally ill-conceived.

Yes, fixed.

> It might also be worth setting it up so that instead of having to pass a
> path to a typedefs file on the command line, we default to a file
> sitting in, say, /usr/local/etc. Then you'd just be able to say
> "pgindent my_file.c".

Yes, also done.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +