patch review : Add ability to constrain backend temporary file space

Lists: pgsql-hackers
From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: patch review : Add ability to constrain backend temporary file space
Date: 2011-05-31 21:24:48
Message-ID: BANLkTi=-MTigvB+q6AxXH6-hOgGVkwu2Tw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

here is a partial review of your patch, better than keeping it
sleeping in the commitfest queue I hope.

Submission review
================

* The patch is not in context diff format.
* The patch apply, but contains some extra whitespace.
* Documentation is here but not explicit about 'temp tables',
maybe worth adding that this won't limit temporary table size ?
* There is no test provided. One can be expected to check that the
feature work.

Code review
=========

* in fd.c, I think that "temporary_files_size -=
(double)vfdP->fileSize;" should be done later in the function once we
have successfully unlink the file, not before.

* I am not sure it is better to add a fileSize like you did or use
relationgetnumberofblock() when file is about to be truncated or
unlinked, this way the seekPos should be enough to increase the global
counter.

* temporary_files_size, I think it is better to have a number of pages
à la postgresql than a kilobyte size

* max_temp_files_size, I'll prefer an approach like shared_buffers
GUC: you can use pages, or KB, MB, ...

Simple Feature test
==============

either explain buffers is wrong or the patch is wrong:
cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Sort (cost=10260.02..10495.82 rows=94320 width=4) (actual
time=364.373..518.940 rows=100000 loops=1)
Sort Key: generate_series
Sort Method: external merge Disk: 1352kB
Buffers: local hit=393, temp read=249 written=249
-> Seq Scan on foo (cost=0.00..1336.20 rows=94320 width=4)
(actual time=0.025..138.754 rows=100000 loops=1)
Buffers: local hit=393
Total runtime: 642.874 ms
(7 rows)

cedric=# set max_temp_files_size to 1900;
SET
cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
ERROR: aborting due to exceeding max temp files size
STATEMENT: explain (analyze,buffers) select * from foo order by 1 desc ;
ERROR: aborting due to exceeding max temp files size

Do you have some testing method I can apply to track that without
explain (analyze, buffers) before going to low-level monitoring ?

Architecture review
==============

max_temp_files_size is used for the global space used per backend.
Based on how work_mem work I expect something like "work_disk" to
limit per file and maybe a backend_work_disk (and yes maybe a
backend_work_mem ?!) per backend.
So I propose to rename the current GUC to something like backend_work_disk.

Patch is not large and easy to read.
I like the idea and it sounds useful.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-01 00:27:46
Message-ID: 4DE58782.2010001@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/06/11 09:24, Cédric Villemain wrote:
> Hello
>
> here is a partial review of your patch, better than keeping it
> sleeping in the commitfest queue I hope.
>
> Submission review
> ================
>
> * The patch is not in context diff format.
> * The patch apply, but contains some extra whitespace.
> * Documentation is here but not explicit about 'temp tables',
> maybe worth adding that this won't limit temporary table size ?
> * There is no test provided. One can be expected to check that the
> feature work.
>
> Code review
> =========
>
> * in fd.c, I think that "temporary_files_size -=
> (double)vfdP->fileSize;" should be done later in the function once we
> have successfully unlink the file, not before.
>
> * I am not sure it is better to add a fileSize like you did or use
> relationgetnumberofblock() when file is about to be truncated or
> unlinked, this way the seekPos should be enough to increase the global
> counter.
>
> * temporary_files_size, I think it is better to have a number of pages
> à la postgresql than a kilobyte size
>
> * max_temp_files_size, I'll prefer an approach like shared_buffers
> GUC: you can use pages, or KB, MB, ...
>
>
> Simple Feature test
> ==============
>
> either explain buffers is wrong or the patch is wrong:
> cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
> Sort (cost=10260.02..10495.82 rows=94320 width=4) (actual
> time=364.373..518.940 rows=100000 loops=1)
> Sort Key: generate_series
> Sort Method: external merge Disk: 1352kB
> Buffers: local hit=393, temp read=249 written=249
> -> Seq Scan on foo (cost=0.00..1336.20 rows=94320 width=4)
> (actual time=0.025..138.754 rows=100000 loops=1)
> Buffers: local hit=393
> Total runtime: 642.874 ms
> (7 rows)
>
> cedric=# set max_temp_files_size to 1900;
> SET
> cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
> ERROR: aborting due to exceeding max temp files size
> STATEMENT: explain (analyze,buffers) select * from foo order by 1 desc ;
> ERROR: aborting due to exceeding max temp files size
>
> Do you have some testing method I can apply to track that without
> explain (analyze, buffers) before going to low-level monitoring ?
>
> Architecture review
> ==============
>
> max_temp_files_size is used for the global space used per backend.
> Based on how work_mem work I expect something like "work_disk" to
> limit per file and maybe a backend_work_disk (and yes maybe a
> backend_work_mem ?!) per backend.
> So I propose to rename the current GUC to something like backend_work_disk.
>
> Patch is not large and easy to read.
> I like the idea and it sounds useful.
>

Hi Cédric,

Thanks for reviewing!

The diff format is odd - I'm sure I told git to do a context diff,
however running 'file' on the v2 patch results it it saying 'HTML
document'... hmm, I'll check more carefully for the next one I do.

Yeah, I agree about explicitly mentioning how it does not constraint
temp tables.

Hmm, test - good idea - I'll look into it.

Re the code comments - I agree with most of them. However with respect
to the Guc units, I copied the setup for work_mem as that seemed the
most related. Also I used bytes for the internal variable in fd.c to
limit the number of arithmetic operations while comparing.

For testing I set 'log_temp_files = 0' in postgresql.conf and the
numbers seemed to agree exactly - I didn't think to use EXPLAIN +
buffers... not sure why they would disagree. Have a go with
log_temp_files set and see what you find!

I like yhour suggesting for the name. Given that 'work_mem' is per
backend, I'm leaning towards the idea of 'work_disk' being sufficiently
descriptive.

Thanks again, and I'll come up with a v3 patch.

Mark


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-01 00:32:31
Message-ID: 4DE5889F.1000005@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/06/11 12:27, Mark Kirkwood wrote:
>
> Re the code comments - I agree with most of them. However with respect
> to the Guc units, I copied the setup for work_mem as that seemed the
> most related.

Also - forget to mention - I *thought* you could specify the temp files
size GUC as KB, MB, GB or plain old pages, but I'll check this, in case
I've busted it somehow.

Cheers

Mark


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-01 00:46:18
Message-ID: 4DE58BDA.9010102@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/06/11 12:32, Mark Kirkwood wrote:
> On 01/06/11 12:27, Mark Kirkwood wrote:
>>
>> Re the code comments - I agree with most of them. However with
>> respect to the Guc units, I copied the setup for work_mem as that
>> seemed the most related.
>
> Also - forget to mention - I *thought* you could specify the temp
> files size GUC as KB, MB, GB or plain old pages, but I'll check this,
> in case I've busted it somehow.
>

Doh, ignore me here :-( You are correct - I've specified GUC_UNITS_KB
(copying work_mem) - it is easy enough to allow pages like shared_buffers.

Cheers

Mark


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-01 23:35:55
Message-ID: 4DE6CCDB.9090608@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/06/11 09:24, Cédric Villemain wrote:
>
> Submission review
> ================
>
> * The patch is not in context diff format.
> * The patch apply, but contains some extra whitespace.
> * Documentation is here but not explicit about 'temp tables',
> maybe worth adding that this won't limit temporary table size ?
> * There is no test provided. One can be expected to check that the
> feature work.
>

I've created a new patch (attached) with 'git diff -c' so hopefully the
format is ok now. I've added a paragraph about how temporary *table*
storage is not constrained, only temp work files. I made the point that
the *query* that creates a temp table will have its work files
constrained too. I've added a test too.

The major visible change is that the guc has been renamed to
'work_disk', to gel better with the idea that it is the disk spill for
'work_mem'.

> Code review
> =========
>
> * in fd.c, I think that "temporary_files_size -=
> (double)vfdP->fileSize;" should be done later in the function once we
> have successfully unlink the file, not before.
>

Agreed, I've moved it.

> * I am not sure it is better to add a fileSize like you did or use
> relationgetnumberofblock() when file is about to be truncated or
> unlinked, this way the seekPos should be enough to increase the global
> counter.
>

The temp files are not relations so I'd have to call stat I guess. Now
truncate/unlink can happen quite a lot (e.g hash with many files) and I
wanted to avoid adding too many library calls to this code for
performance reasons, so on balance I'm thinking it is gonna be more
efficient to remember the size in the Vfd.

> * temporary_files_size, I think it is better to have a number of pages
> à la postgresql than a kilobyte size
>

The temp file related stuff is worked on in bytes or kbytes in the fd.c
and other code (e.g log_temp_files), so it seems more natural to stay in
kbytes. Also work_disk is really only a spillover from work_mem (in
kbytes) so seems logical to match its units.

> * max_temp_files_size, I'll prefer an approach like shared_buffers
> GUC: you can use pages, or KB, MB, ...
>

We can use KB, MB, GB - just not pages, again like work_mem. These files
are not relations so I'm not sure pages is an entirely appropriate unit
for them.

> Simple Feature test
> ==============
>
> either explain buffers is wrong or the patch is wrong:
> cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
> Sort (cost=10260.02..10495.82 rows=94320 width=4) (actual
> time=364.373..518.940 rows=100000 loops=1)
> Sort Key: generate_series
> Sort Method: external merge Disk: 1352kB
> Buffers: local hit=393, temp read=249 written=249
> -> Seq Scan on foo (cost=0.00..1336.20 rows=94320 width=4)
> (actual time=0.025..138.754 rows=100000 loops=1)
> Buffers: local hit=393
> Total runtime: 642.874 ms
> (7 rows)
>
> cedric=# set max_temp_files_size to 1900;
> SET
> cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
> ERROR: aborting due to exceeding max temp files size
> STATEMENT: explain (analyze,buffers) select * from foo order by 1 desc ;
> ERROR: aborting due to exceeding max temp files size
>
> Do you have some testing method I can apply to track that without
> explain (analyze, buffers) before going to low-level monitoring ?
>

We're looking at this...

> Architecture review
> ==============
>
> max_temp_files_size is used for the global space used per backend.
> Based on how work_mem work I expect something like "work_disk" to
> limit per file and maybe a backend_work_disk (and yes maybe a
> backend_work_mem ?!) per backend.
> So I propose to rename the current GUC to something like backend_work_disk.
>

Done - 'work_disk' it is to match 'work_mem'.

> Patch is not large and easy to read.
> I like the idea and it sounds useful.
>

Great! Cheers

Mark

Attachment Content-Type Size
temp-files-v3.patch.gz application/x-gzip 3.1 KB

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-02 02:52:18
Message-ID: 4DE6FAE2.9060608@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/06/11 11:35, Mark Kirkwood wrote:
> On 01/06/11 09:24, Cédric Villemain wrote: Simple Feature test
>> ==============
>>
>> either explain buffers is wrong or the patch is wrong:
>> cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
>> QUERY PLAN
>> -----------------------------------------------------------------------------------------------------------------
>>
>> Sort (cost=10260.02..10495.82 rows=94320 width=4) (actual
>> time=364.373..518.940 rows=100000 loops=1)
>> Sort Key: generate_series
>> Sort Method: external merge Disk: 1352kB
>> Buffers: local hit=393, temp read=249 written=249
>> -> Seq Scan on foo (cost=0.00..1336.20 rows=94320 width=4)
>> (actual time=0.025..138.754 rows=100000 loops=1)
>> Buffers: local hit=393
>> Total runtime: 642.874 ms
>> (7 rows)
>>
>> cedric=# set max_temp_files_size to 1900;
>> SET
>> cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
>> ERROR: aborting due to exceeding max temp files size
>> STATEMENT: explain (analyze,buffers) select * from foo order by 1
>> desc ;
>> ERROR: aborting due to exceeding max temp files size
>>
>> Do you have some testing method I can apply to track that without
>> explain (analyze, buffers) before going to low-level monitoring ?
>>
>
> We're looking at this...
>

Arg - I was being dense. FileWrite can write *anywhere* in the file, so
need to be smart about whether to add to the total filesize or not.

I'll update the Commitfest page as soon as it sends me my password reset
mail...

Cheers

Mark

Attachment Content-Type Size
temp-files-v4.patch.gz application/x-gzip 2.9 KB

From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-02 06:34:40
Message-ID: BANLkTin_J0xDP2mXjp8gWDAoFUcG8qQUNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jun 1, 2011 at 6:35 PM, Mark Kirkwood
<mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:
> On 01/06/11 09:24, Cédric Villemain wrote:
>>
>>  Submission review
>> ================
>>
>>     * The patch is not in context diff format.
>>     * The patch apply, but contains some extra whitespace.
>>     * Documentation is here but not explicit about 'temp tables',
>> maybe worth adding that this won't limit temporary table size ?
>>     * There is no test provided. One can be expected to check that the
>> feature work.
>>
>
> I've created a new patch (attached)

Hi Mark,

A few comments:

- why only superusers can set this? if this is a per-backend setting,
i don't see the problem in allowing normal users to restrict their own
queries

- why the calculations are done as double?
+ if (temporary_files_size / 1024.0 > (double)work_disk)

- the patch adds this to serial_schedule but no test has been added...

diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index bb654f9..325cb3d 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -127,3 +127,4 @@ test: largeobject
test: with
test: xml
test: stats
+test: resource

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-02 09:06:19
Message-ID: 4DE7528B.8040102@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/06/11 18:34, Jaime Casanova wrote:
> On Wed, Jun 1, 2011 at 6:35 PM, Mark Kirkwood
> <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:
>>
>> I've created a new patch (attached)
> Hi Mark,
>
> A few comments:
>
> - why only superusers can set this? if this is a per-backend setting,
> i don't see the problem in allowing normal users to restrict their own
> queries
>

Yeah, that's a good point, I was thinking that as a resource control
parameter it might be good to prevent casual users increasing their
limit. However the most likely use case would be ad-hoc query tools that
don't have the ability to do SET anyway. Hmm - what do you think?

> - why the calculations are done as double?
> + if (temporary_files_size / 1024.0> (double)work_disk)
>
>
>

I originally coded this with the idea that the guc would (or could) be a
double - to allow for seriously big limits in data warehousing
situations etc. But subsequent discussion led to that being discarded.
However work_disk can go to INT_MAX * 1024 bytes so I need to make sure
that whatever datatype I use can handle that - double seemed sufficient.

> - the patch adds this to serial_schedule but no test has been added...
>
> diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
> index bb654f9..325cb3d 100644
> --- a/src/test/regress/serial_schedule
> +++ b/src/test/regress/serial_schedule
> @@ -127,3 +127,4 @@ test: largeobject
> test: with
> test: xml
> test: stats
> +test: resource
>

Bugger - I think I forgot to 'git add' em before doing the diff.

I can sense a v5 coming on.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-02 13:46:46
Message-ID: BANLkTikw67ZfX2UL8H7UU5wbVEL+mhbuBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jun 1, 2011 at 7:35 PM, Mark Kirkwood
<mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:
> Done - 'work_disk' it is to match 'work_mem'.

I guess I'm bikeshedding here, but I'm not sure I really buy this
parallel. work_mem is primarily a query planner parameter; it says,
if you're going to need more memory than this, then you have to
execute the plan some other way. This new parameter is not a query
planner paramater AIUI - its job is to KILL things if they exceed the
limit. In that sense it's more like statement_timeout. I can imagine
us wanting more parameters like this too. Kill the query if it...

...takes too long (statement_timeout)
...uses too much temporary file space (the current patch)
...uses too much CPU time
...uses too much RAM
...generates too much disk I/O
...has too high an estimated cost
...others?

So I'm not sure work_disk is a great name. Actually, work_mem is
already not a great name even for what it is, but at any rate I think
this is something different.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-02 14:36:10
Message-ID: 15034.1307025370@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> So I'm not sure work_disk is a great name.

I agree. Maybe something along the lines of "temp_file_limit"?

Also, once you free yourself from the analogy to work_mem, you could
adopt some more natural unit than KB. I'd think MB would be a practical
unit size, and would avoid (at least for the near term) the need to make
the parameter a float.

regards, tom lane


From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-02 14:58:24
Message-ID: BANLkTinzSMbwLyyJ04=cuc6zjHbrifOw-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/6/2 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Wed, Jun 1, 2011 at 7:35 PM, Mark Kirkwood
> <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:
>> Done - 'work_disk' it is to match 'work_mem'.
>
> I guess I'm bikeshedding here, but I'm not sure I really buy this
> parallel.  work_mem is primarily a query planner parameter; it says,
> if you're going to need more memory than this, then you have to
> execute the plan some other way.  This new parameter is not a query
> planner paramater AIUI - its job is to KILL things if they exceed the
> limit.  In that sense it's more like statement_timeout.  I can imagine
> us wanting more parameters like this too.  Kill the query if it...
>
> ...takes too long (statement_timeout)
> ...uses too much temporary file space (the current patch)
> ...uses too much CPU time
> ...uses too much RAM
> ...generates too much disk I/O
> ...has too high an estimated cost
> ...others?

you're sorting limits for 'executor' and limits for 'planner': uses
too much CPU time VS has too high an estimated cost.

(backend)_work_(disk|mem) looks good also for the 'has too high an
estimated cost' series: limiter at the planner level should allow
planner to change its strategy, I think... But probably not something
to consider too much right now.

>
> So I'm not sure work_disk is a great name.  Actually, work_mem is
> already not a great name even for what it is, but at any rate I think
> this is something different.

I am not specially attached to a name, idea was not to use work_disk
but backend_work_disk. I agree with you anyway, and suggestion from
Tom is fine for me (temp_file_limit).

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-02 16:53:15
Message-ID: BANLkTimUs3jhReGxCam14PAjWC5qAbmEOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jun 2, 2011 at 10:58 AM, Cédric Villemain
<cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
> I am not specially attached to a name, idea was not to use work_disk
> but backend_work_disk. I agree with you anyway, and suggestion from
> Tom is fine for me (temp_file_limit).

Yeah, I like that too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-02 18:43:15
Message-ID: BANLkTinUTfwzprWzn_Ku5Vy_PS66jBnQXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jun 2, 2011 at 7:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Also, once you free yourself from the analogy to work_mem, you could
> adopt some more natural unit than KB.  I'd think MB would be a practical
> unit size, and would avoid (at least for the near term) the need to make
> the parameter a float.

As long as users can specify any unit when they input the parameter it
doesn't really matter what unit the variable is stored in. I'm not
sure the GUC infrastructure can currently handle megabytes as the
native units for a guc though.

--
greg


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-02 22:25:57
Message-ID: 4DE80DF5.1030203@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03/06/11 02:36, Tom Lane wrote:
> Robert Haas<robertmhaas(at)gmail(dot)com> writes:
>> So I'm not sure work_disk is a great name.
> I agree. Maybe something along the lines of "temp_file_limit"?
>
> Also, once you free yourself from the analogy to work_mem, you could
> adopt some more natural unit than KB. I'd think MB would be a practical
> unit size, and would avoid (at least for the near term) the need to make
> the parameter a float.
>

I agree, and I like your name suggestion (and also agree with Robert
that 'work_mem' was not such a great name).

I'd be quite happy to use MB (checks guc.h) but looks like we don't have
a GUC_UNIT_MB, not sure if adding it would be an issue (suggests on a
suitable mask if people think it is a great idea).

Cheers

Mark


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-02 23:01:43
Message-ID: 4DE81657.7090901@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/06/11 18:34, Jaime Casanova wrote:
>
>
> - the patch adds this to serial_schedule but no test has been added...
>
> diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
> index bb654f9..325cb3d 100644
> --- a/src/test/regress/serial_schedule
> +++ b/src/test/regress/serial_schedule
> @@ -127,3 +127,4 @@ test: largeobject
> test: with
> test: xml
> test: stats
> +test: resource
>

Corrected v4 patch with the test files, for completeness. Note that
discussion has moved on and there will be a v5 :-)

Attachment Content-Type Size
temp-files-v4.1.patch.gz application/x-gzip 3.1 KB

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-03 00:33:30
Message-ID: BANLkTinqUS0frHropGwiROBTcZ5h3Y2=pw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/6/2 Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>:
> On 01/06/11 09:24, Cédric Villemain wrote:
>> * I am not sure it is better to add a fileSize like you did or use
>> relationgetnumberofblock() when file is about to be truncated or
>> unlinked, this way the seekPos should be enough to increase the global
>> counter.
>>
>
> The temp files are not relations so I'd have to call stat I guess. Now
> truncate/unlink can happen quite a lot (e.g hash with many files) and I
> wanted to avoid adding too many library calls to this code for performance
> reasons, so on balance I'm thinking it is gonna be more efficient to
> remember the size in the Vfd.

I am not sure temporary relation are truncated. I have not checked
right now, but IIRC, we don't need to truncate it. And I believe it
would defeat the log_temp feature because log_temp is done on
FileClose() only.

If we are to add a field in struct vfd to keep the filesize then some
optimization may happens for logging too....


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-03 01:12:04
Message-ID: 4DE834E4.4020100@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03/06/11 12:33, Cédric Villemain wrote:
> 2011/6/2 Mark Kirkwood<mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>:
>> On 01/06/11 09:24, Cédric Villemain wrote:
>>> * I am not sure it is better to add a fileSize like you did or use
>>> relationgetnumberofblock() when file is about to be truncated or
>>> unlinked, this way the seekPos should be enough to increase the global
>>> counter.
>>>
>> The temp files are not relations so I'd have to call stat I guess. Now
>> truncate/unlink can happen quite a lot (e.g hash with many files) and I
>> wanted to avoid adding too many library calls to this code for performance
>> reasons, so on balance I'm thinking it is gonna be more efficient to
>> remember the size in the Vfd.
> I am not sure temporary relation are truncated. I have not checked
> right now, but IIRC, we don't need to truncate it. And I believe it
> would defeat the log_temp feature because log_temp is done on
> FileClose() only.
>
> If we are to add a field in struct vfd to keep the filesize then some
> optimization may happens for logging too....

We pretty much need to store the file size I think, due to needing to
work out if FileWrite is re-writing inside the file or extending it. So
maybe we could avoid a stat on unlink too (mind you it is a nice
validity check that the ongoing size accounting is correct). I was/am
keen to avoid changing too much here (heh - maybe I'm being too timid...).

Cheers

Mark


From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-14 14:52:45
Message-ID: BANLkTi=_QRbYfcm3+UQ+ai3S3EGNKEtVRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/6/3 Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>:
> On 02/06/11 18:34, Jaime Casanova wrote:
>>
>>
>> - the patch adds this to serial_schedule but no test has been added...
>>
>> diff --git a/src/test/regress/serial_schedule
>> b/src/test/regress/serial_schedule
>> index bb654f9..325cb3d 100644
>> --- a/src/test/regress/serial_schedule
>> +++ b/src/test/regress/serial_schedule
>> @@ -127,3 +127,4 @@ test: largeobject
>>  test: with
>>  test: xml
>>  test: stats
>> +test: resource
>>
>
> Corrected v4 patch with the test files, for completeness. Note that
> discussion has moved on and there will be a v5 :-)
>

Mark, can you submit your updated patch ?

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-15 00:35:32
Message-ID: 4DF7FE54.1090404@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15/06/11 02:52, Cédric Villemain wrote:
> 2011/6/3 Mark Kirkwood<mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>:
>>
>> Corrected v4 patch with the test files, for completeness. Note that
>> discussion has moved on and there will be a v5 :-)
>>
> Mark, can you submit your updated patch ?
>

Thanks for the reminder! Here is v5. The changes are:

- guc is called temp_file_limit, which seems like the best choice to
date :-)
- removed code to do with truncating files, as after testing I agree
with you that temp work files don't seem to get truncated.

I have not done anything about the business on units - so we are in KB
still - there is no MB unit avaiable in the code as yet - I'm not sure
we need one at this point, as most folk who use this feature will find
4096GB a big enough *per backend* limit. Obviously it might be a good
investment to plan to have MB, and GB as possible guc units too. Maybe
this could be a separate piece of work since any *other* resource
limiters we add might find it convenient to have these available?

Cheers

Mark

Attachment Content-Type Size
temp-files-v5.patch.gz application/x-gzip 3.1 KB

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-16 21:49:12
Message-ID: BANLkTikOhDa86fiR2KJ6aPptjpoiRT9oYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/6/15 Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>:
> On 15/06/11 02:52, Cédric Villemain wrote:
>>
>> 2011/6/3 Mark Kirkwood<mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>:
>>>
>>> Corrected v4 patch with the test files, for completeness. Note that
>>> discussion has moved on and there will be a v5 :-)
>>>
>> Mark, can you submit your updated patch ?
>>
>
> Thanks for the reminder! Here is v5. The changes are:

I have issues applying it.
Please can you remove trailing space?
Also, you can generate a cool patch like this :

get git-external-diff from postgres/src/tools to /usr/lib/git-core/
chmod +x it
export GIT_EXTERNAL_DIFF=git-external-diff
git format-patch --ext-diff origin

my log:
$ git apply temp-files-v5.patch
temp-files-v5.patch:22: trailing whitespace.
defaults to unlimited (<literal>-1</>). Values larger than zero
temp-files-v5.patch:23: trailing whitespace.
constraint the temporary file space usage to be that number of
temp-files-v5.patch:28: trailing whitespace.
the total space used by all the files produced by one backend is
temp-files-v5.patch:35: trailing whitespace.
constrain disk space used for temporary table storage. However if
temp-files-v5.patch:105: trailing whitespace.
/*
error: patch failed: src/backend/storage/file/fd.c:132
error: src/backend/storage/file/fd.c: patch does not apply
error: src/test/regress/expected/resource.out: No such file or directory
error: src/test/regress/sql/resource.sql: No such file or directory

>
> - guc is called temp_file_limit, which seems like the best choice to date
> :-)
> - removed code to do with truncating files, as after testing I agree with
> you that temp work files don't seem to get truncated.
>
> I have not done anything about the business on units - so we are in KB still
> - there is no MB unit avaiable in the code as yet - I'm not sure we need one
> at this point, as most folk who use this feature will find 4096GB a big
> enough *per backend* limit. Obviously it might be a good investment to plan
> to have MB, and GB as possible guc units too. Maybe this could be a separate
> piece of work since any *other* resource limiters we add might find it
> convenient to have these available?
>
> Cheers
>
> Mark
>

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-17 01:08:58
Message-ID: 4DFAA92A.80209@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17/06/11 09:49, Cédric Villemain wrote:
>
> I have issues applying it.
> Please can you remove trailing space?
> Also, you can generate a cool patch like this :
>
> get git-external-diff from postgres/src/tools to /usr/lib/git-core/
> chmod +x it
> export GIT_EXTERNAL_DIFF=git-external-diff
> git format-patch --ext-diff origin
>
>
> my log:
> $ git apply temp-files-v5.patch
> temp-files-v5.patch:22: trailing whitespace.
> defaults to unlimited (<literal>-1</>). Values larger than zero
> temp-files-v5.patch:23: trailing whitespace.
> constraint the temporary file space usage to be that number of
> temp-files-v5.patch:28: trailing whitespace.
> the total space used by all the files produced by one backend is
> temp-files-v5.patch:35: trailing whitespace.
> constrain disk space used for temporary table storage. However if
> temp-files-v5.patch:105: trailing whitespace.
> /*
> error: patch failed: src/backend/storage/file/fd.c:132
> error: src/backend/storage/file/fd.c: patch does not apply
> error: src/test/regress/expected/resource.out: No such file or directory
> error: src/test/regress/sql/resource.sql: No such file or directory
>

I can generate a patch that way no problem - however it does not apply
using the above method:

$ git apply /data0/download/postgres/patches/temp/temp-files-v5.1.patch
error: doc/src/sgml/config.sgml: already exists in working directory
error: src/backend/storage/file/fd.c: already exists in working directory
error: src/backend/utils/misc/guc.c: already exists in working directory
error: src/backend/utils/misc/postgresql.conf.sample: already exists in
working directory
error: src/include/utils/guc.h: already exists in working directory
error: src/include/utils/guc_tables.h: already exists in working directory
error: src/test/regress/serial_schedule: already exists in working directory

However it applies just fine using the "normal" method:

$ patch --dry-run -p1 <
/data0/download/postgres/patches/temp/temp-files-v5.1.patch
patching file doc/src/sgml/config.sgml
patching file src/backend/storage/file/fd.c
patching file src/backend/utils/misc/guc.c
patching file src/backend/utils/misc/postgresql.conf.sample
patching file src/include/utils/guc.h
patching file src/include/utils/guc_tables.h
patching file src/test/regress/expected/resource.out
patching file src/test/regress/serial_schedule
patching file src/test/regress/sql/resource.sql

Any idea?

regards

Mark


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-17 01:51:37
Message-ID: 4DFAB329.7050806@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17/06/11 13:08, Mark Kirkwood wrote:
> On 17/06/11 09:49, Cédric Villemain wrote:
>>
>> I have issues applying it.
>> Please can you remove trailing space?
>> Also, you can generate a cool patch like this :
>>
>> get git-external-diff from postgres/src/tools to /usr/lib/git-core/
>> chmod +x it
>> export GIT_EXTERNAL_DIFF=git-external-diff
>> git format-patch --ext-diff origin

I think I have the trailing spaces removed, and patch is updated for the
variable renaming recently done in fd.c

I have no idea why I can't get the git apply to work (obviously I have
exceeded by git foo by quite a ways), but it should apply for you I hope
(as it patches fine).

regards

Mark

Attachment Content-Type Size
temp-files-v5.1.patch.gz application/x-gzip 3.4 KB

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-17 08:57:53
Message-ID: BANLkTi=HiT_tHwtoOB5OdgnSyZ2=dxQveA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/6/17 Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>:
> On 17/06/11 13:08, Mark Kirkwood wrote:
>>
>> On 17/06/11 09:49, Cédric Villemain wrote:
>>>
>>> I have issues applying it.
>>> Please can you remove trailing space?
>>> Also, you can generate a cool patch like this :
>>>
>>> get git-external-diff from postgres/src/tools to /usr/lib/git-core/
>>> chmod +x it
>>> export GIT_EXTERNAL_DIFF=git-external-diff
>>> git format-patch --ext-diff origin
>
> I think I have the trailing spaces removed, and patch is updated for the
> variable renaming recently done in fd.c
>
> I have no idea why I can't get the git apply to work (obviously I have
> exceeded by git foo by quite a ways), but it should apply for you I hope (as
> it patches fine).
>

If I didn't made mistake the attached patch does not have trailling
space anymore and I did a minor cosmetic in FileClose. It is not in
the expected format required by postgresql commiters but can be
applyed with git apply...
It looks like the issue is that patch generated with the git-ext-diff
can not be git applyed (they need to use patch).

Either I did something wrong or git-ext-diff format is not so great.

I didn't test and all yet. From reading, the patch looks sane. I'll
review it later this day or this week-end.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Attachment Content-Type Size
temp-files-v5.2.patch text/x-patch 8.2 KB

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-20 13:15:40
Message-ID: BANLkTik-yUG6-gH5XOvAbeL1KbRcxF2Vxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/6/17 Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>:
> 2011/6/17 Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>:
>> On 17/06/11 13:08, Mark Kirkwood wrote:
>>>
>>> On 17/06/11 09:49, Cédric Villemain wrote:
>>>>
>>>> I have issues applying it.
>>>> Please can you remove trailing space?
>>>> Also, you can generate a cool patch like this :
>>>>
>>>> get git-external-diff from postgres/src/tools to /usr/lib/git-core/
>>>> chmod +x it
>>>> export GIT_EXTERNAL_DIFF=git-external-diff
>>>> git format-patch --ext-diff origin
>>
>> I think I have the trailing spaces removed, and patch is updated for the
>> variable renaming recently done in fd.c
>>
>> I have no idea why I can't get the git apply to work (obviously I have
>> exceeded by git foo by quite a ways), but it should apply for you I hope (as
>> it patches fine).
>>
>
> If I didn't made mistake the attached patch does not have trailling
> space anymore and I did a minor cosmetic in FileClose. It is not in
> the expected format required by postgresql commiters but can be
> applyed with git apply...
> It looks like the issue is that patch generated with the git-ext-diff
> can not be git applyed (they need to use patch).
>
> Either I did something wrong or git-ext-diff format is not so great.
>
>
> I didn't test and all yet. From reading, the patch looks sane. I'll
> review it later this day or this week-end.
>
>

Submission review
===============

I have updated the patch for cosmetic.
(I attach 2 files: 1 patch for 'git-apply' and 1 for 'patch')

I have also updated the guc.c entry (lacks of a NULL for the last
Hooks handler and make the pg_settings comment more like other GUC)

The patch includes doc and test.

Usability review
=================

The patch implements what it claims modulo the BLK size used by
PostgreSQL: the filewrite operation happens before the test on file
limit and PostgreSQL write per 8kB. The result is that a
temp_file_limit=0 does not prevent temp_file writing or creation, but
stop the writing at 8192 bytes.
I've already argue to keep the value as a BLKSZ, but I don't care to
have kB by default.
The doc may need to be updated to reflect this behavior.
Maybe we want to disable the temp_file writing completely with the
value set to 0 ? (it looks useless to me atm but someone may have a
usecase for that ?!)

I don't believe it follows any SQL specs, and so far the community did
not refuse the idea, so looks in the good way.
I believe we want it, my only grippe is that a large value don't
prevent server to use a lot of IO before the operation is aborted, I
don't see real alternative at this level though.

After initial testing some corner cases have been outline (in previous
versions of the patch), so I believe the current code fix them well as
I didn't have issue again.

Feature test
============

The feature does not work exactly as expected because the write limit
is rounded per 8kB because we write before checking. I believe if one
write a file of 1GB in one pass (instead of repetitive 8kB increment),
and the temp_file_limit is 0, then the server will write the 1GB
before aborting.
Also, the patch does not handle a !(returnCode >=0) in FileWrite. I
don't know if it is needed or not.
As the counter is per session and we can have very long session, it
may hurt....but the problem will be larger than just this counter in
such case so...is it worth the trouble to handle it ?

Performance review
===================

not done.

Coding review
=============

I have done some update to the patch to make it follow the coding guidelines.
I don't think it can exist a portability issue.

Tom did say: I'd think MB would be a practical unit size, and would
avoid (at least for the near term) the need to make the parameter a
float.

Architecture review
================

IMO, the code used to log_temp_file is partially deprecated by this
feature: the system call to stat() looks now useles and can be removed
as well as the multiple if () around temp_file_limit or log_temp_file
can be merged.

Review review
=============

I didn't test for performances, I have not tryed to overflow
temporary_files_size but looks hard to raise. (this might be possible
if the temp_file_limit is not used but the temporary_files_size
incremented, but current code does not allow to increment one without
using the feature, so it is good I think)

Mark, please double check that my little updates of your patch did not
remove anything or modify the behavior in a wrong way.

Thanks for your good job,
--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Attachment Content-Type Size
temp-files-v5.3.patch text/x-patch 8.3 KB
temp-files-v5.3-ext-diff.patch text/x-patch 9.0 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-20 13:21:24
Message-ID: BANLkTi=YbuHDjexdua+dNmNYdqaEev4mdQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 20, 2011 at 9:15 AM, Cédric Villemain
<cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
> The feature does not work exactly as expected because the write limit
> is rounded per 8kB because we write before checking. I believe if one
> write a file of 1GB in one pass (instead of repetitive 8kB increment),
> and the temp_file_limit is 0, then the server will write the 1GB
> before aborting.

Can we rearrange thing so we check first, and then write?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-20 14:39:40
Message-ID: BANLkTimnryt1sS5Mim_a98aMm=jLRu23Xw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/6/20 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Mon, Jun 20, 2011 at 9:15 AM, Cédric Villemain
> <cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
>> The feature does not work exactly as expected because the write limit
>> is rounded per 8kB because we write before checking. I believe if one
>> write a file of 1GB in one pass (instead of repetitive 8kB increment),
>> and the temp_file_limit is 0, then the server will write the 1GB
>> before aborting.
>
> Can we rearrange thing so we check first, and then write?

probably but it needs more work to catch corner cases. We may be safe
to just document that (and also in the code). The only way I see so
far to have a larger value than 8kB here is to have a plugin doing the
sort instead of the postgresql core sort algo.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-21 23:13:12
Message-ID: 4E012588.5030106@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 21/06/11 02:39, Cédric Villemain wrote:
> 2011/6/20 Robert Haas<robertmhaas(at)gmail(dot)com>:
>> On Mon, Jun 20, 2011 at 9:15 AM, Cédric Villemain
>> <cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
>>> The feature does not work exactly as expected because the write limit
>>> is rounded per 8kB because we write before checking. I believe if one
>>> write a file of 1GB in one pass (instead of repetitive 8kB increment),
>>> and the temp_file_limit is 0, then the server will write the 1GB
>>> before aborting.
>> Can we rearrange thing so we check first, and then write?
> probably but it needs more work to catch corner cases. We may be safe
> to just document that (and also in the code). The only way I see so
> far to have a larger value than 8kB here is to have a plugin doing the
> sort instead of the postgresql core sort algo.
>
>

Thanks guys - will look at moving the check, and adding some
documentation about the possible impacts of plugins (or new executor
methods) that might write in chunks bigger than blocksz.

Maybe a few days - I'm home sick ATM, plus looking after these
http://www.maftet.co.nz/kittens.html

Cheers

Mark


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-24 06:01:50
Message-ID: 4E04284E.7040309@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 22/06/11 11:13, Mark Kirkwood wrote:
> On 21/06/11 02:39, Cédric Villemain wrote:
>> 2011/6/20 Robert Haas<robertmhaas(at)gmail(dot)com>:
>>> On Mon, Jun 20, 2011 at 9:15 AM, Cédric Villemain
>>> <cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
>>>> The feature does not work exactly as expected because the write limit
>>>> is rounded per 8kB because we write before checking. I believe if one
>>>> write a file of 1GB in one pass (instead of repetitive 8kB increment),
>>>> and the temp_file_limit is 0, then the server will write the 1GB
>>>> before aborting.
>>> Can we rearrange thing so we check first, and then write?
>> probably but it needs more work to catch corner cases. We may be safe
>> to just document that (and also in the code). The only way I see so
>> far to have a larger value than 8kB here is to have a plugin doing the
>> sort instead of the postgresql core sort algo.
>>
>>
>
> Thanks guys - will look at moving the check, and adding some
> documentation about the possible impacts of plugins (or new executor
> methods) that might write in chunks bigger than blocksz.
>
> Maybe a few days - I'm home sick ATM, plus looking after these
> http://www.maftet.co.nz/kittens.html
>
>

This version moves the check *before* we write the new buffer, so
should take care of issues about really large write buffers, plugins
etc. Also I *think* that means there is no need to amend the documentation.

Cheers

Mark

P.s: Hopefully I've got a context diff this time, just realized that git
diff -c does *not* produce a context diff (doh).

Attachment Content-Type Size
temp-files-v6.patch.gz application/x-gzip 3.1 KB

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-07 21:26:31
Message-ID: CAF6yO=1PHON8FAmmG6GxR1qDrEoj-iK+wath_4jFGoP5wVS7sA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/6/24 Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>:
>
> This  version moves the check *before* we write the new buffer, so should
> take care of issues about really large write buffers, plugins etc. Also I
> *think* that means there is no need to amend the documentation.
>
> Cheers
>
> Mark
>
> P.s: Hopefully I've got a context diff this time, just realized that git
> diff -c does *not* produce a context diff (doh).
>

Mark, I have this (hopefully) final review in my TODO list, I won't be
able to check until CHAR(11) is done, and probably not before the 19th
of July.

So... if one want to make progress in the meantime, please do.
--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-10 18:44:28
Message-ID: 4E19F30C.9000602@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hackers,

This patch needs a new reviewer, per Cedric. Please help!

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: josh(at)agliodbs(dot)com
Cc: cedric(dot)villemain(dot)debian(at)gmail(dot)com, mark(dot)kirkwood(at)catalyst(dot)net(dot)nz, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-14 06:48:24
Message-ID: 20110714.154824.579528996447115832.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Hackers,
>
> This patch needs a new reviewer, per Cedric. Please help!

Hi I am the new reviewer:-)

I have looked into the v6 patches. One thing I would like to suggest
is, enhancing the error message when temp_file_limit will be exceeded.

ERROR: aborting due to exceeding temp file limit

Is it possible to add the current temp file limit to the message? For
example,

ERROR: aborting due to exceeding temp file limit 10000kB

I know the current setting of temp_file_limit can be viewd in other
ways, but I think this will make admin's or application developer's
life a little bit easier.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: josh(at)agliodbs(dot)com, cedric(dot)villemain(dot)debian(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-14 10:34:30
Message-ID: 4E1EC636.40700@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 14/07/11 18:48, Tatsuo Ishii wrote:
>
> Hi I am the new reviewer:-)
>
> I have looked into the v6 patches. One thing I would like to suggest
> is, enhancing the error message when temp_file_limit will be exceeded.
>
> ERROR: aborting due to exceeding temp file limit
>
> Is it possible to add the current temp file limit to the message? For
> example,
>
> ERROR: aborting due to exceeding temp file limit 10000kB
>
> I know the current setting of temp_file_limit can be viewd in other
> ways, but I think this will make admin's or application developer's
> life a little bit easier.

Hi Tatsuo,

Yeah, good suggestion - I agree that it would be useful to supply extra
detail, I'll amend and resubmit a new patch (along with whatever review
modifications we come up with in the meantime)!

Cheers

Mark


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: mark(dot)kirkwood(at)catalyst(dot)net(dot)nz
Cc: josh(at)agliodbs(dot)com, cedric(dot)villemain(dot)debian(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-15 02:57:09
Message-ID: 20110715.115709.411418020386079073.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> I have looked into the v6 patches. One thing I would like to suggest
>> is, enhancing the error message when temp_file_limit will be exceeded.
>>
>> ERROR: aborting due to exceeding temp file limit
>>
>> Is it possible to add the current temp file limit to the message? For
>> example,
>>
>> ERROR: aborting due to exceeding temp file limit 10000kB
>>
>> I know the current setting of temp_file_limit can be viewd in other
>> ways, but I think this will make admin's or application developer's
>> life a little bit easier.
>
> Hi Tatsuo,
>
> Yeah, good suggestion - I agree that it would be useful to supply
> extra detail, I'll amend and resubmit a new patch (along with whatever
> review modifications we come up with in the meantime)!
>
> Cheers
>
> Mark

Maybe we could add more info regarding current usage and requested
amount in addition to the temp file limit value. I mean something
like:

ERROR: aborting due to exceeding temp file limit. Current usage 9000kB, requested size 1024kB, thus it will exceed temp file limit 10000kB.

What do you think?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: josh(at)agliodbs(dot)com, cedric(dot)villemain(dot)debian(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-15 10:15:09
Message-ID: 4E20132D.1000806@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15/07/11 14:57, Tatsuo Ishii wrote:
>
> Maybe we could add more info regarding current usage and requested
> amount in addition to the temp file limit value. I mean something
> like:
>
> ERROR: aborting due to exceeding temp file limit. Current usage 9000kB, requested size 1024kB, thus it will exceed temp file limit 10000kB.
>

I modeled the original message on what happens when statement timeout is
exceeded, which doesn't state its limit in the error message at all -
actually I did wonder if there is was informal standard for *not*
stating the value of the limit that is being exceeded! However, I agree
with you and think it makes sense to include it here. I wonder if the
additional detail you are suggesting above might be better added to a
HINT - what do you think? If it is a better idea to just add it in the
message as above I can certainly do that.

Cheers

Mark


From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-15 10:55:01
Message-ID: CAF6yO=0hdSiCPuojZtFkufqRtF5_i8-AXDfSufrvxxRQOdzFuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/7/15 Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>:
> On 15/07/11 14:57, Tatsuo Ishii wrote:
>>
>> Maybe we could add more info regarding current usage and requested
>> amount in addition to the temp file limit value. I mean something
>> like:
>>
>> ERROR:  aborting due to exceeding temp file limit. Current usage 9000kB,
>> requested size 1024kB, thus it will exceed temp file limit 10000kB.
>>
>
> I modeled the original message on what happens when statement timeout is
> exceeded, which doesn't state its limit in the error message at all -
> actually I did wonder if there is was informal standard for *not* stating
> the value of the limit that is being exceeded! However, I agree with you and
> think it makes sense to include it here. I wonder if the additional detail
> you are suggesting above might be better added to a HINT - what do you
> think? If it is a better idea to just add it in the message as above I can
> certainly do that.

Remember that what will happens is probably:

ERROR: aborting due to exceeding temp file limit. Current usage 8000kB,
requested size 8008kB, thus it will exceed temp file limit 8kB.

because temp file are increased by 8kb at once, rarely more (and by
rare I mean that it can happens via an extension or in the future, not
with current core postgresql).

This kind of message can also trouble the DBA by suggesting that the
query doesn't need more than what was asking at this moment.

That's being said I have no strong opinion for the HINT message if the
documentation is clear enough to not confuse DBA.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-15 14:30:19
Message-ID: 29393.1310740219@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

=?ISO-8859-1?Q?C=E9dric_Villemain?= <cedric(dot)villemain(dot)debian(at)gmail(dot)com> writes:
>> On 15/07/11 14:57, Tatsuo Ishii wrote:
>>> Maybe we could add more info regarding current usage and requested
>>> amount in addition to the temp file limit value. I mean something
>>> like:
>>>
>>> ERROR: aborting due to exceeding temp file limit. Current usage 9000kB,
>>> requested size 1024kB, thus it will exceed temp file limit 10000kB.

> Remember that what will happens is probably:

> ERROR: aborting due to exceeding temp file limit. Current usage 8000kB,
> requested size 8008kB, thus it will exceed temp file limit 8kB.

> because temp file are increased by 8kb at once, rarely more

Yes. I think the extra detail Tatsuo proposes is useless and possibly
confusing. I don't object to stating the current limit, but the other
numbers are not likely to be helpful to anyone. (If we had some way of
knowing what the file would ultimately grow to if we allowed the query
to continue, that would be useful; but of course we don't know that.)

regards, tom lane


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: cedric(dot)villemain(dot)debian(at)gmail(dot)com
Cc: mark(dot)kirkwood(at)catalyst(dot)net(dot)nz, ishii(at)postgresql(dot)org, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-16 23:45:46
Message-ID: 20110717.084546.706994262324555350.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> I modeled the original message on what happens when statement timeout is
>> exceeded, which doesn't state its limit in the error message at all -
>> actually I did wonder if there is was informal standard for *not* stating
>> the value of the limit that is being exceeded! However, I agree with you and
>> think it makes sense to include it here. I wonder if the additional detail
>> you are suggesting above might be better added to a HINT - what do you
>> think? If it is a better idea to just add it in the message as above I can
>> certainly do that.
>
> Remember that what will happens is probably:
>
> ERROR: aborting due to exceeding temp file limit. Current usage 8000kB,
> requested size 8008kB, thus it will exceed temp file limit 8kB.
>
> because temp file are increased by 8kb at once, rarely more (and by
> rare I mean that it can happens via an extension or in the future, not
> with current core postgresql).

Could you please elaborate why "Current usage 8000kB" can bigger than
"temp file limit 8kB"? I undertstand the point that temp files are
allocated by 8kB at once, but I don't understand why those numbers you
suggested could happen. Actually I tried with the modified patches and
got:

test=# CREATE TEMPORARY TABLE resourcetemp1 AS SELECT generate_series(1,100000);
SELECT 100000
test=# SET temp_file_limit = 578;
SET
test=# SELECT count(*) FROM (select * FROM resourcetemp1 ORDER BY 1) AS a;
ERROR: aborting due to exceeding temp file limit, current usage 576kB, requested size 8192kB, thus it will exceed temp file limit 578kB

Here temp_file_limit is not specified by 8kB unit, so "current usage"
becomes 576kB, which is 8kB unit. I don't think those numbers
will terribly confuse DBAs..
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-17 03:00:30
Message-ID: 386.1310871630@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> writes:
> This version moves the check *before* we write the new buffer, so
> should take care of issues about really large write buffers, plugins
> etc.

This logic seems pretty obviously wrong:

+ if (temp_file_limit >= 0 && VfdCache[file].fdstate & FD_TEMPORARY)
+ {
+ if (VfdCache[file].seekPos + amount >= VfdCache[file].fileSize)
+ {
+ increaseSize = true;
+ if ((temporary_files_size + (double)amount) / 1024.0 > (double)temp_file_limit)
+ ereport(ERROR,
+ (errcode(ERRCODE_QUERY_CANCELED),
+ errmsg("aborting due to exceeding temp file limit")));
+ }
+ }

It's supposing that the write length ("amount") is exactly the amount by
which the file length will grow, which would only be true if seekPos is
exactly equal to fileSize before the write. I think that would often be
the case in our usage patterns, but surely code at this low level should
not be assuming that.

BTW, the definition of the GUC seems to constrain the total temp file
size to be no more than 2GB on 32-bit machines ... do we really want
it to act like that?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: cedric(dot)villemain(dot)debian(at)gmail(dot)com, mark(dot)kirkwood(at)catalyst(dot)net(dot)nz, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-17 03:07:51
Message-ID: 525.1310872071@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
>> Remember that what will happens is probably:
>>
>> ERROR: aborting due to exceeding temp file limit. Current usage 8000kB,
>> requested size 8008kB, thus it will exceed temp file limit 8kB.

> Could you please elaborate why "Current usage 8000kB" can bigger than
> "temp file limit 8kB"? I undertstand the point that temp files are
> allocated by 8kB at once, but I don't understand why those numbers you
> suggested could happen. Actually I tried with the modified patches and
> got:

> test=# CREATE TEMPORARY TABLE resourcetemp1 AS SELECT generate_series(1,100000);
> SELECT 100000
> test=# SET temp_file_limit = 578;
> SET
> test=# SELECT count(*) FROM (select * FROM resourcetemp1 ORDER BY 1) AS a;
> ERROR: aborting due to exceeding temp file limit, current usage 576kB, requested size 8192kB, thus it will exceed temp file limit 578kB

You didn't show us how you computed those numbers, but I'm really
dubious that FileWrite() has got any ability to produce numbers that
are helpful. Like Cedric, I think the write amount in any one call
is usually going to be one block.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-17 18:25:03
Message-ID: 1463.1310927103@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> writes:
> [ temp-files-v6.patch.gz ]

I've applied this patch with some editorialization, notably:

I changed the datatype of temporary_files_size to uint64 so as to avoid
worries about accumulating roundoff error over a long transaction.
This is probably just paranoia, since on most machines double can store
integers exactly up to 2^52 which is more than the largest possible
temp_file_limit, but it seemed safer.

fileSize of a temp file, and temporary_files_size, must be updated
correctly whether or not temp_file_limit is currently being enforced.
Otherwise things do not behave sanely if temp_file_limit is turned on
mid-transaction.

Fixed bogus calculation in enforcement check, per my note yesterday.

Added a new SQLSTATE code, ERRCODE_CONFIGURATION_LIMIT_EXCEEDED, because
I felt that ERRCODE_QUERY_CANCELED wasn't at all appropriate for this.
(Maybe we should think about changing the statement_timeout code to use
that too, although I'm not entirely sure that we can easily tell
statement_timeout apart from a query cancel.)

Rephrased the error message to "temporary file size exceeds
temp_file_limit (%dkB)", as per Tatsuo's first suggestion but not his
second. There's still room for bikeshedding here, of course.

Removed the reset of temporary_files_size in CleanupTempFiles. It was
inappropriate because some temp files can survive CleanupTempFiles, and
unnecessary anyway because the counter is correctly decremented when
unlinking a temp file. (This was another reason for wanting
temporary_files_size to be exact, because we're now doing dead reckoning
over the life of the session.)

Set the maximum value of temp_file_limit to INT_MAX not MAX_KILOBYTES.
There's no reason for the limit to be less on 32-bit machines than
64-bit, since we are doing arithmetic in uint64 not size_t.

Minor rewrite of documentation.

Also, I didn't add the proposed regression test, because it seems shaky
to me. In an installation with larger than default work_mem, the sort
might not spill to disk. I don't think this feature is large enough to
deserve its very own, rather expensive, regression test anyway.

regards, tom lane


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: cedric(dot)villemain(dot)debian(at)gmail(dot)com, mark(dot)kirkwood(at)catalyst(dot)net(dot)nz, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-17 20:42:39
Message-ID: 20110718.054239.22298205139730835.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> Could you please elaborate why "Current usage 8000kB" can bigger than
>> "temp file limit 8kB"? I undertstand the point that temp files are
>> allocated by 8kB at once, but I don't understand why those numbers you
>> suggested could happen. Actually I tried with the modified patches and
>> got:
>
>> test=# CREATE TEMPORARY TABLE resourcetemp1 AS SELECT generate_series(1,100000);
>> SELECT 100000
>> test=# SET temp_file_limit = 578;
>> SET
>> test=# SELECT count(*) FROM (select * FROM resourcetemp1 ORDER BY 1) AS a;
>> ERROR: aborting due to exceeding temp file limit, current usage 576kB, requested size 8192kB, thus it will exceed temp file limit 578kB
>
> You didn't show us how you computed those numbers, but I'm really
> dubious that FileWrite() has got any ability to produce numbers that
> are helpful. Like Cedric, I think the write amount in any one call
> is usually going to be one block.

Here it is(fd.c).

/*
* If performing this write will increase the file size, then abort if it will
* exceed temp_file_limit
*/
if (temp_file_limit >= 0 && VfdCache[file].fdstate & FD_TEMPORARY)
{
if (VfdCache[file].seekPos + amount >= VfdCache[file].fileSize)
{
increaseSize = true;
if ((temporary_files_size + (double)amount) / 1024.0 > (double)temp_file_limit)
ereport(ERROR,
(errcode(ERRCODE_QUERY_CANCELED),
errmsg("aborting due to exceeding temp file limit, current usage %dkB, requested size %dkB, thus it will exceed temp file limit %dkB",
(int)(temporary_files_size/1024),
amount,
temp_file_limit)));
}
}

I also attached the whole patch against fd.c at the point when Mark
proposed the changes.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Attachment Content-Type Size
fd.c.patch.gz application/octet-stream 1.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: cedric(dot)villemain(dot)debian(at)gmail(dot)com, mark(dot)kirkwood(at)catalyst(dot)net(dot)nz, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-17 21:39:02
Message-ID: 5320.1310938742@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
>> You didn't show us how you computed those numbers, but I'm really
>> dubious that FileWrite() has got any ability to produce numbers that
>> are helpful. Like Cedric, I think the write amount in any one call
>> is usually going to be one block.

> Here it is(fd.c).

> ereport(ERROR,
> (errcode(ERRCODE_QUERY_CANCELED),
> errmsg("aborting due to exceeding temp file limit, current usage %dkB, requested size %dkB, thus it will exceed temp file limit %dkB",
> (int)(temporary_files_size/1024),
> amount,
> temp_file_limit)));

The thing is that unless "amount" is really large, you're just going to
have two numbers that are very close to each other. I think this is
just useless complication, because "amount" is almost always going to
be 8kB or less.

regards, tom lane


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: ishii(at)postgresql(dot)org, cedric(dot)villemain(dot)debian(at)gmail(dot)com, mark(dot)kirkwood(at)catalyst(dot)net(dot)nz, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-17 21:53:29
Message-ID: 20110718.065329.231412901264772435.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> ereport(ERROR,
>> (errcode(ERRCODE_QUERY_CANCELED),
>> errmsg("aborting due to exceeding temp file limit, current usage %dkB, requested size %dkB, thus it will exceed temp file limit %dkB",
>> (int)(temporary_files_size/1024),
>> amount,
>> temp_file_limit)));
>
> The thing is that unless "amount" is really large, you're just going to
> have two numbers that are very close to each other. I think this is
> just useless complication, because "amount" is almost always going to
> be 8kB or less.

Oops. My bad. I should have divide "amount" by 1024. Now I understand
your point.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-17 22:11:46
Message-ID: 4E235E22.1030408@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18/07/11 06:25, Tom Lane wrote:
> Mark Kirkwood<mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> writes:
>> [ temp-files-v6.patch.gz ]
> I've applied this patch with some editorialization, notably:

Awesome, the changes look great!

Cheers

Mark


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-07-20 09:21:48
Message-ID: 4E269E2C.8040702@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

And I thought I should mention: a big thank you to the the reviewers and
interested parties, Cedric, Tatsuo, Robert and Tom who did a review +
fixes as he committed the patch - nice work guys!

regards

Mark