Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT

From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Date: 2014-10-27 12:50:39
Message-ID: CAOeZVicHLJBD7nAJynMq8QaFfAjy-pyXcL9ZXAcuaKX=RrBVaA@mail.gmail.com
Lists: pgsql-hackers

> IMHO writing all the data into a WAL would be the cleanest solution.
>
> Also, what is a small database? I don't think a static value will work,
> because the sweet spot between the current approach (forcing two
> checkpoints) and writing everything in WAL depends on the amount of dirty
> buffers that need to be checkpointed. Which is mostly driven by the size
> of shared buffers and write activity - for small shared buffers and/or
> mostly-read workload, checkpoints are cheap, so the 'small database'
> threshold (when choosing the WAL approach) is much higher than for large
> shared buffers or write-heavy workloads.
>

So are you proposing a heuristic based on the amount of dirty data in
shared buffers and on write activity? Do you have something in mind that
would work for general workloads as well?
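
For the sake of discussion, here is the rough shape of what I am
imagining (a sketch only -- none of these names or constants exist in
the tree, and the cost model is invented): compare the estimated cost of
an immediate checkpoint, driven by the volume of dirty shared buffers
and penalized for random writes, against the cost of WAL-logging the
template database, which is sequential but writes the data twice.

/*
 * Hypothetical sketch -- function name, constant and cost model are
 * made up for illustration, not existing code.
 */
#define RANDOM_WRITE_PENALTY	4.0	/* assumed random-vs-sequential ratio */

static bool
create_db_should_use_wal(uint64 template_bytes, uint64 dirty_buffer_bytes)
{
	/* Immediate checkpoint: dirty buffers flushed with mostly random I/O. */
	double	checkpoint_cost = (double) dirty_buffer_bytes * RANDOM_WRITE_PENALTY;

	/*
	 * WAL approach: template data written twice (WAL, then datafiles),
	 * but both streams are sequential.
	 */
	double	wal_cost = (double) template_bytes * 2.0;

	return wal_cost < checkpoint_cost;
}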

> So maybe if we could determine the amount of data to be checkpointed, and
> then base the decision on that, that'd work better? This would also have
> to take into account that writing into WAL is sequential, while
> checkpoints usually cause random writes all over the datafiles (which is
> more expensive).
>
> Another option might be forcing just a "spread" checkpoint, not the
> immediate one (which is what we do now). That would not fix the CREATE
> DATABASE duration (actually, it would make it longer), but it would lower
> the impact on other activity on the machine.

I believe a spread checkpoint would be the cleanest way to reduce the
amount of I/O generated. If I understand correctly, the original problem
you described was not the time CREATE DATABASE takes but rather the
amount of I/O each one generates.
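
For concreteness, if I am reading src/backend/commands/dbcommands.c
correctly, createdb() currently requests an immediate checkpoint, and
the spread variant would amount to dropping the CHECKPOINT_IMMEDIATE
flag (sketch only, untested):

/* What createdb() requests today (paraphrased from memory): */
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);

/*
 * Sketch: without CHECKPOINT_IMMEDIATE, the checkpointer paces the
 * writes according to checkpoint_completion_target.  CREATE DATABASE
 * would block longer (we still pass CHECKPOINT_WAIT), but the I/O no
 * longer arrives as one burst.
 */
RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT);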

This also leads me to wonder whether it makes sense to explore group
commits around the creation of files for a new database (within the same
backend, of course). This could be available on demand, for a user who
knows he/she is going to create a lot of databases in the near future
and is fine with a single large spike in I/O. Then again, it might be
even more broken than the current scenario, but it depends on what the
user wants...

Regards,

Atri

