From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: b8flowerfire <b8flowerfire(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller
Date: 2014-06-10 17:43:57
Message-ID: CAMkU=1y3KGb3_SLN8H5X46B3Q-Xupnr=ogAF_hcQbYGW0Un1og@mail.gmail.com
Lists: pgsql-hackers
On Tue, Jun 10, 2014 at 5:17 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> The problem case is when you have 1 batch and the increased memory
> consumption causes you to switch to 2 batches. That's expensive. It
> seems clear based on previous testing that *on the average*
> NTUP_PER_BUCKET = 1 will be better, but in the case where it causes an
> increase in the number of batches it will be much worse - particularly
> because the only way we ever increase the number of batches is to
> double it, which is almost always going to be a huge loss.
>
Is there a reason we don't do hybrid hashing, where if 80% fits in memory
then we write out only the 20% that doesn't? And then, when probing the
table with the other input, the 80% that land in in-memory buckets get
handled immediately, and only the 20% that land in the on-disk buckets get
written out for the next pass?

Obviously no one has implemented it yet, but is there a fundamental reason
for that, or is it just a round-tuit problem?
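For what it's worth, a minimal sketch of the idea in Python (purely
illustrative, with hypothetical names; this is not PostgreSQL's executor
code): partition the build side, keep whole partitions resident until a
memory budget is exhausted, join probe rows against resident partitions
immediately, and defer only the rows that hash to spilled partitions:

```python
from collections import defaultdict

def hybrid_hash_join(build_rows, probe_rows, key, mem_limit, n_parts=8):
    """Equi-join on key(row). Partitions of the build side are kept in
    memory while they fit within mem_limit rows; the rest are "spilled"
    and handled in a second, per-partition pass."""
    part = lambda row: hash(key(row)) % n_parts

    # Partition the build input.
    build_parts = defaultdict(list)
    for row in build_rows:
        build_parts[part(row)].append(row)

    # Keep whole partitions resident until the memory budget runs out.
    resident, spilled, used = {}, set(), 0
    for p, rows in build_parts.items():
        if used + len(rows) <= mem_limit:
            table = defaultdict(list)
            for r in rows:
                table[key(r)].append(r)
            resident[p] = table
            used += len(rows)
        else:
            spilled.add(p)  # would be written to temp files in a real system

    # Probe: rows hitting a resident partition join immediately; only rows
    # hashing to a spilled partition are set aside for the next pass.
    out, deferred = [], defaultdict(list)
    for row in probe_rows:
        p = part(row)
        if p in resident:
            out.extend((b, row) for b in resident[p].get(key(row), []))
        else:
            deferred[p].append(row)

    # Second pass over only the spilled fraction of both inputs.
    for p in spilled:
        table = defaultdict(list)
        for r in build_parts[p]:
            table[key(r)].append(r)
        for row in deferred.get(p, []):
            out.extend((b, row) for b in table.get(key(row), []))
    return out
```

The point of the sketch is that the second pass touches only the spilled
fraction of each input, rather than re-writing everything when the batch
count doubles.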
Cheers,
Jeff