Re: For the ametures. (related to "Are we losing momentum?")

From: "Dave Page" <dpage(at)vale-housing(dot)co(dot)uk>
To: "Ben Clewett" <B(dot)Clewett(at)roadrunner(dot)uk(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-17 10:01:50
Message-ID: 03AF4E498C591348A42FC93DEA9661B83AF048@mail.vale-housing.co.uk
Lists: pgsql-hackers

Hi Ben

> -----Original Message-----
> From: Ben Clewett [mailto:B(dot)Clewett(at)roadrunner(dot)uk(dot)com]
> Sent: 17 April 2003 10:45
> Cc: pgsql-hackers(at)postgresql(dot)org
> Subject: [HACKERS] For the ametures. (related to "Are we
> losing momentum?")
>
>
> I am not a hacker of PgSQL, and new to databases. I was using MySQL
> under .NET, but was annoyed by their aggressive licence agreements and
> immaturity. (Their sales personnel are also very rude. One girl once
> told me that if I didn't like their licence terms I should just use
> flat-files instead.)

Probably more powerful ;-)

> - A true Windows version which people can learn their craft on.

Coming with 7.4...

> - Tools which look like Access, to do row level data
> editing with no SQL.

http://www.pgadmin.org/

It looks more like SQL Server's Enterprise Manager but does most if not
all of what I expect you need.

> - Centrally located complete documentation in many
> consistent easy to
> read formats, of the system and *ALL* API's, including
> in-line tutorials
> and examples.

The tarball includes the complete documentation in HTML format, and in
pgAdmin there's a searchable copy in the main chm help file.

> - Data types like 'ENUM' which appeal to ametures.

Isn't that just syntactic sugar for a column with a check for specific
values on it?
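
Off the top of my head, something like this gives much the same effect in
PostgreSQL (a sketch only; the table and values are made up):

    -- an ENUM-ish column enforced with a CHECK constraint
    CREATE TABLE pet (
        id      serial PRIMARY KEY,
        species varchar(10) NOT NULL
                CHECK (species IN ('dog', 'cat', 'bird'))
    );

    INSERT INTO pet (species) VALUES ('cat');   -- accepted
    INSERT INTO pet (species) VALUES ('fish');  -- rejected by the constraint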

> - There are no administrative mandatorys. Eg, VACUUM.
> (A stand-alone
> commercial app, like an Email client, will be contrainted by
> having to
> be an app and a DBA in one.)

PostgreSQL is by no means alone in this requirement. SQL Server, for
example, has 'optimizations' that are usually performed as part of a
scheduled maintenance plan and are analogous to vacuum in some ways.
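
In PostgreSQL the periodic step itself is small; for example, run from
psql or any script:

    -- reclaim dead rows and refresh planner statistics in one pass
    VACUUM ANALYZE;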

> - The tables (not innodb) are in different files of the
> same name.
> Allowing the OS adminitrator great ability. EG, putting tables on
> separate partitions and therefore greatly speeding performance.

One reason for not doing this is that a table in PostgreSQL might span
multiple files if it exceeds a couple of gigs in size.

> - They have extensive backup support. Including now,
> concurrent backup
> without user interuption or risk of inconsistency.

So does PostgreSQL (pg_dump/pg_dumpall).
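
For example (database and file names invented, just to show the shape of
the commands):

    pg_dump mydb > mydb.sql         # one database as a plain SQL script
    pg_dump -Fc mydb > mydb.dump    # custom-format archive, restore with pg_restore
    pg_dumpall > cluster.sql        # every database plus users and groups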

Regards, Dave

PS, it's nice you decided not to go to the Dark Side :-)


From: Ben Clewett <B(dot)Clewett(at)roadrunner(dot)uk(dot)com>
To: Dave Page <dpage(at)vale-housing(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-17 11:44:07
Message-ID: 3E9E9387.2050505@roadrunner.uk.com
Lists: pgsql-hackers

Hi Dave,

A brief defence of my posting. After which I'll retire to my side of
the fence :)

>>- A true Windows version which people can learn their craft on.
>
> Coming with 7.4...

I look forward to this greatly. Maybe here I'll have the chance to fix
some problems for the greater community.

>>- Tools which look like Access, to do row level data
>>editing with no SQL.
>
>
> http://www.pgadmin.org/
>
> It looks more like SQL Server's Enterprise Manager but does most if not
> all of what I expect you need.

Sorry, my fault, an excellent program.

> The tarball includes the complete documentation in HTML format, and in
> pgAdmin there's a searchable copy in the main chm help file.

But not the APIs, and not in one central location. Some of it, the stuff
I use, is on GBorg, and in an inconsistent format. I have personally found
some documentation very fragmented, so a subtle point about a capability
gets lost: I assume all the comments are in a few pages and miss something
vital or relevant in another source. E.g., see my comment at the end. But
it's better than MSDN :)

>>- Data types like 'ENUM' which appeal to ametures.
>
> Isn't that just syntactic sugar for a column with a check for specific
> values on it?

Yes :) My point is not that PostgreSQL is lacking, only that the
ameture finds others more friendly and inviting.

Although this may be a point which is irrelevant?

My personal 'gripe', when reading through the postings, was that some
people considered those who have not the time, patience or ability to
learn PostgreSQL completely to be somehow not worthy.

I wanted to support us dumb users! :)

>>- There are no administrative mandatorys. Eg, VACUUM.
>>(A stand-alone
>>commercial app, like an Email client, will be contrainted by
>>having to
>>be an app and a DBA in one.)
>
> PostgreSQL is by no means alone in this requirement. SQL Server for
> example has 'optimizations' that are performed usually as part of a
> scheduled maintenance plan and are analagous to vacuum in some ways.

Is this a weakness in DBMSs that don't require this? (MySQL, Liant
etc.) Is there a way of building a garbage collector into the system?
My Windows PC has no 'cron'.

>>- The tables (not innodb) are in different files of the
>>same name.
>>Allowing the OS adminitrator great ability. EG, putting tables on
>>separate partitions and therefore greatly speeding performance.
>
> One reason for not doing this is that a table in PostgreSQL might span
> mutiple files if it exceeds a couple of gigs in size.

They used multiple files for tables, with a common prefix of the table
name. But they have dropped this themselves now.

I miss the way that with MySQL I could delete a table, move it, or back
it up manually using 'rm', 'mv' or 'cp'.

Working with IDE drives on PCs, you can double the performance of a DB
just by putting half the tables on a disk on another IDE chain. Adding
a DB using 'tar' is a very powerful ability.

But hey, if I missed it that much, I would not have moved! :)

>>- They have extensive backup support. Including now,
>>concurrent backup
>>without user interuption or risk of inconsistency.
>
>
> So does PostgreSQL (pg_dump/pg_dumpall).

I have used this, and it's a great command.

I could not work out from the documentation (app-pg-dump.html) whether
it takes a snapshot at the start time, or archives data at the time it
finds it. As the documentation does not clarify this very important
point, I decided it's not safe to use when the system is in use.

Can this command be used with users in the system making heavy
changes, and when it takes many hours to complete, does it produce a
valid and consistent backup?

If so, you have everything MySQL has here, and in a more useful format.

> PS, it's nice you decided not to go to the Dark Side :-)

Thanks, Ben


From: Ian Barwick <barwick(at)gmx(dot)net>
To: Ben Clewett <B(dot)Clewett(at)roadrunner(dot)uk(dot)com>, Dave Page <dpage(at)vale-housing(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-17 12:10:09
Message-ID: 200304171410.09900.barwick@gmx.net
Lists: pgsql-hackers

On Thursday 17 April 2003 13:44, Ben Clewett wrote:
> Hi Dave,
>
> A brief defence of my posting. After which I'll retire to my side of
> the fence :)
(snip)

> >>- Data types like 'ENUM' which appeal to ametures.
> >
> > Isn't that just syntactic sugar for a column with a check for specific
> > values on it?
>
> Yes :) By point is not that PostgreSQL is lacking, only that the
> ameture finds others more friendly and inviting.
>
> Although this may be a point which is irrelevent?

Probably ;-) because MySQL, too, lacks a few "user friendly" features
(like boolean datatypes).

Ian Barwick
barwick(at)gmx(dot)net


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-17 12:18:40
Message-ID: 003601c304db$7ba844c0$6501a8c0@DUNSLANE
Lists: pgsql-hackers


(please note that the word is "amateur" - it comes from French/Latin,
meaning people who do things for the love of it).

There are lots of cron clones for Windows - try a Google search.
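
For the vacuum chore specifically, the whole job can be one scheduled
entry. For illustration only (standard crontab syntax, assuming vacuumdb
is on the PATH and can connect without prompting):

    # lazy vacuum plus statistics update of every database, 03:00 nightly
    0 3 * * *   vacuumdb --all --analyze --quiet

On Windows the same command line goes into whichever cron clone or task
scheduler you pick.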

Also, there is a native Windows port of Pg 7.2.1 available - we have been
using it for a couple of months now on a small project without a single
hitch. Search this mailing list for details.

Like you, we eagerly await the official Windows port in 7.4 (then we'll
have schemas, for example).

Part of the problem that amateurs often face in dealing with things like a
DBMS is that their lack of formal training leads them to expect things to
work in some intuitive fashion, and they don't (for very good technical
reasons). As someone who in a past life had to teach relational theory and
practice, I can tell you that just getting across the idea of a Cartesian
product can be quite hard. And as a former DBA I can tell you that even
seasoned professional developers often don't/can't take the trouble to
analyse what their queries are doing and why they demand so much in
resources. Running a DBMS (*any* DBMS) which has significant requirements is
unfortunately something that requires both understanding and experience. It
never "just works".

Finally, one of the important things for my particular situation, is that Pg
comes with a BSDish license, which means we have no issues with bundling it.
AFAIK it's pretty much alone in that.

andrew


From: Andrew Sullivan <andrew(at)libertyrms(dot)info>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-17 12:44:52
Message-ID: 20030417124452.GE23252@libertyrms.info
Lists: pgsql-hackers

On Thu, Apr 17, 2003 at 11:44:07AM +0000, Ben Clewett wrote:
>
> I miss the way with MySQL I could delete a table, or move it, or back it
> up, manually using 'rm', 'mv' or 'cp'.

Under most circumstances, you can't do that _anyway_, because doing
so will break stuff unless the postmaster is stopped. I agree that
being able to put tables and files on their own platters would be a
Good Thing, but in order to make it really safe, it needs to be
managed by the postmaster. Making this difficult is sort of a
defence mechanism, therefore: if you make it too easy, people will be
shooting themselves in the foot all the time.

A

--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<andrew(at)libertyrms(dot)info> M2P 2A8
+1 416 646 3304 x110


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Dave Page <dpage(at)vale-housing(dot)co(dot)uk>
Cc: Ben Clewett <B(dot)Clewett(at)roadrunner(dot)uk(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-17 15:37:46
Message-ID: 200304171537.h3HFbkW19947@candle.pha.pa.us
Lists: pgsql-hackers

Dave Page wrote:
> > - They have extensive backup support. Including now,
> > concurrent backup
> > without user interuption or risk of inconsistency.
>
> So does PostgreSQL (pg_dump/pg_dumpall).

I have applied the following doc patch to the pg_dump documentation to
more clearly state that it can do consistent backups during concurrent
access --- too many people weren't seeing that capability.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 4.8 KB

From: Doug McNaught <doug(at)mcnaught(dot)org>
To: Ben Clewett <B(dot)Clewett(at)roadrunner(dot)uk(dot)com>
Cc: Dave Page <dpage(at)vale-housing(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-17 16:37:55
Message-ID: m38yu96pwc.fsf@varsoon.wireboard.com
Lists: pgsql-hackers

Ben Clewett <B(dot)Clewett(at)roadrunner(dot)uk(dot)com> writes:

> > So does PostgreSQL (pg_dump/pg_dumpall).
>
> I have used this, and it's a great command.
>
> I could not work out from the documentation whether it takes a
> snapshot at the start time, or archives data at the time it find's it.
> The documentation (app-pg-dump.html). As the documentation does not
> clarify this very important point, I desided it's not safe to use when
> the system is in use.

Ummm, quoting from the pg_dump manpage:

pg_dump makes consistent backups even if the database is
being used concurrently. pg_dump does not block other
users accessing the database (readers or writers).

What part of this isn't clear?

It's safe. pg_dump does all its work inside a transaction, so MVCC
rules automatically guarantee that it sees a consistent snapshot.

> Can this command can be used, with users in the system making heavy
> changes, and when takes many hours to complete, does produce a valid
> and consistent backup?

Absolutely.

> If so, you have all MySQL has here and in a more useful format.

I think MySQL's consistent hot backup has to lock tables, while PG's
doesn't...

-Doug


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Doug McNaught <doug(at)mcnaught(dot)org>
Cc: Ben Clewett <B(dot)Clewett(at)roadrunner(dot)uk(dot)com>, Dave Page <dpage(at)vale-housing(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-17 16:40:42
Message-ID: 200304171640.h3HGegL23646@candle.pha.pa.us
Lists: pgsql-hackers

Doug McNaught wrote:
> Ben Clewett <B(dot)Clewett(at)roadrunner(dot)uk(dot)com> writes:
>
>
> > > So does PostgreSQL (pg_dump/pg_dumpall).
> >
> > I have used this, and it's a great command.
> >
> > I could not work out from the documentation whether it takes a
> > snapshot at the start time, or archives data at the time it find's it.
> > The documentation (app-pg-dump.html). As the documentation does not
> > clarify this very important point, I desided it's not safe to use when
> > the system is in use.
>
> Ummm, quoting from the pg_dump manpage:
>
> pg_dump makes consistent backups even if the database is
> being used concurrently. pg_dump does not block other
> users accessing the database (readers or writers).
>
> What part of this isn't clear?
>
> It's safe. pg_dump does all its work inside a transaction, so MVCC
> rules automatically guarantee that it sees a consistent snapshot.

Too many people have missed that point --- it was too far down in the
manual page, after a long discussion about output formats. Now it is
right in the first paragraph, which should eliminate that question ---
it was almost becoming an FAQ.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Jon Jensen <jon(at)endpoint(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-17 17:38:02
Message-ID: Pine.LNX.4.50.0304171732480.1617-100000@louche.swelter.net
Lists: pgsql-hackers

On Thu, 17 Apr 2003, Dave Page wrote:

> > - Data types like 'ENUM' which appeal to ametures.
>
> Isn't that just syntactic sugar for a column with a check for specific
> values on it?

I believe it's actually different. In PostgreSQL you'd use a VARCHAR
column with CHECK constraints, which means there are actual possibly
lengthy strings in the database. In MySQL's ENUM, the table structure maps
a particular string to a bit pattern, so if you have two possible values,
'superdogfood' and 'onetwothreefourfivesixseven', your column will only
take 1 bit + overhead. Obviously no big deal until you get a few dozen
possibilities. This is also what allows the SET type to work -- it's a set
of binary flags for a named list of elements. The docs are here:

http://www.mysql.com/documentation/mysql/bychapter/manual_Reference.html#ENUM

I don't like the fact that numbers don't really work (being used as
indices rather than names), that case isn't tolerated, that invalid
entries go in as empty strings, etc., so I certainly wouldn't want to see
them emulated exactly in PostgreSQL, but I imagine that ENUM could save a
lot of disk space in certain circumstances, and SET seems useful.
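
If the disk-space angle matters, the closest PostgreSQL idiom today is
probably a small lookup table plus a smallint foreign key rather than a
CHECK on a varchar. A sketch, with invented table names:

    CREATE TABLE food_name (
        code smallint PRIMARY KEY,
        name varchar(40) NOT NULL UNIQUE
    );
    INSERT INTO food_name VALUES (1, 'superdogfood');
    INSERT INTO food_name VALUES (2, 'onetwothreefourfivesixseven');

    -- the long strings are stored once; each data row stores only 2 bytes
    CREATE TABLE feeding (
        food smallint NOT NULL REFERENCES food_name (code)
    );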

Jon


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Ben Clewett <B(dot)Clewett(at)roadrunner(dot)uk(dot)com>
Cc: Dave Page <dpage(at)vale-housing(dot)co(dot)uk>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-17 17:41:35
Message-ID: Pine.LNX.4.44.0304171506580.1617-100000@peter.localdomain
Lists: pgsql-hackers

Ben Clewett writes:

> I could not work out from the documentation whether it takes a snapshot
> at the start time, or archives data at the time it find's it. The
> documentation (app-pg-dump.html). As the documentation does not clarify
> this very important point, I desided it's not safe to use when the
> system is in use.
>
> Can this command can be used, with users in the system making heavy
> changes, and when takes many hours to complete, does produce a valid and
> consistent backup?

From the pg_dump reference page:

<para>
<application>pg_dump</application> makes consistent backups even if the
database is being used concurrently. <application>pg_dump</application>
does not block other users accessing the database (readers or
writers).
</para>

From the chapter Backup and Restore:

<para>
Dumps created by <application>pg_dump</> are internally consistent,
that is, updates to the database while <application>pg_dump</> is
running will not be in the dump. <application>pg_dump</> does not
block other operations on the database while it is working.
(Exceptions are those operations that need to operate with an
exclusive lock, such as <command>VACUUM FULL</command>.)
</para>

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: "Ben Clewett" <B(dot)Clewett(at)roadrunner(dot)uk(dot)com>, "Dave Page" <dpage(at)vale-housing(dot)co(dot)uk>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-17 19:46:44
Message-ID: 00ca01c3051a$13a2c9b0$5a00a8c0@hplaptop
Lists: pgsql-hackers

----- Original Message -----
From: "Ben Clewett" <B(dot)Clewett(at)roadrunner(dot)uk(dot)com>
> >>- There are no administrative mandatorys. Eg, VACUUM.
> >>(A stand-alone
> >>commercial app, like an Email client, will be contrainted by
> >>having to
> >>be an app and a DBA in one.)
> >
> > PostgreSQL is by no means alone in this requirement. SQL Server for
> > example has 'optimizations' that are performed usually as part of a
> > scheduled maintenance plan and are analagous to vacuum in some ways.
>
> Is this a weekness in DBMS's that don't require this? (MySQL, Liant
> etc.) Is there a way of building a guarbage collector into the system?
> My Windows PC has no 'cron'.

Work is being done to build vacuum into the backend so that cron is not
required. Hopefully it will be in 7.4.

> >>- The tables (not innodb) are in different files of the
> >>same name.
> >>Allowing the OS adminitrator great ability. EG, putting tables on
> >>separate partitions and therefore greatly speeding performance.
> >
> > One reason for not doing this is that a table in PostgreSQL might span
> > mutiple files if it exceeds a couple of gigs in size.
>
> Working with IDE drives on PC's, you can double the performace of a DB
> just by putting half the tables on a disk on another IDE chain. Adding
> a DB using 'tar' is very a powerful ability.

You can do this using symlinks, but you do have to shut down the postmaster
before you play with the files directly.
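
Roughly like this (a sketch only; the directory 16556 and the mount point
/disk2 are invented, and PGDATA is assumed to point at the data directory):

    pg_ctl stop
    mv $PGDATA/base/16556 /disk2/pgdata/16556
    ln -s /disk2/pgdata/16556 $PGDATA/base/16556
    pg_ctl start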

> >>- They have extensive backup support. Including now,
> >>concurrent backup
> >>without user interuption or risk of inconsistency.
> >
> > So does PostgreSQL (pg_dump/pg_dumpall).
>
> I have used this, and it's a great command.
>
> I could not work out from the documentation whether it takes a snapshot
> at the start time, or archives data at the time it find's it. The
> documentation (app-pg-dump.html). As the documentation does not clarify
> this very important point, I desided it's not safe to use when the
> system is in use.
>
> Can this command can be used, with users in the system making heavy
> changes, and when takes many hours to complete, does produce a valid and
> consistent backup?

Yes, it takes a snapshot as of when it starts dumping the database, so it's
consistent no matter how much activity goes on after you start pg_dump.


From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Dave Page <dpage(at)vale-housing(dot)co(dot)uk>
Cc: Ben Clewett <B(dot)Clewett(at)roadrunner(dot)uk(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-21 16:40:27
Message-ID: Pine.LNX.4.33.0304211036000.5883-100000@css120.ihs.com
Lists: pgsql-hackers

On Thu, 17 Apr 2003, Dave Page wrote:

> Hi Ben
>
> > - Data types like 'ENUM' which appeal to ametures.
>
> Isn't that just syntactic sugar for a column with a check for specific
> values on it?

Yes. But isn't serial just syntactic sugar for create sequence seqname;
then including that sequence in a default clause?
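
i.e. roughly the following (names invented), which is more or less what
"id serial" expands to:

    CREATE SEQUENCE foo_id_seq;
    CREATE TABLE foo (
        id integer NOT NULL DEFAULT nextval('foo_id_seq')
    );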

Personally, I could easily see a use for enum() covering the CHECK ... IN
constraint. MySQL users can use enum() at their leisure, PostgreSQL users
can pick it up if they want to, and underneath it all is a check
constraint the user can see with \d just like with serial.

> > - There are no administrative mandatorys. Eg, VACUUM.
> > (A stand-alone
> > commercial app, like an Email client, will be contrainted by
> > having to
> > be an app and a DBA in one.)
>
> PostgreSQL is by no means alone in this requirement. SQL Server for
> example has 'optimizations' that are performed usually as part of a
> scheduled maintenance plan and are analagous to vacuum in some ways.

But at the same time, it's one of those gotchas that would be nice to get
rid of for people who just want a simple little database on their
workstation. It might be nice to have some kind of lazy auto vacuum
daemon installed by default and configured to run every hour or so
according to postgresql.conf.

> > - The tables (not innodb) are in different files of the
> > same name.
> > Allowing the OS adminitrator great ability. EG, putting tables on
> > separate partitions and therefore greatly speeding performance.
>
> One reason for not doing this is that a table in PostgreSQL might span
> mutiple files if it exceeds a couple of gigs in size.

And let's face it, if we get tablespaces implemented, this kind of thing
goes away.


From: Ben Clewett <B(dot)Clewett(at)roadrunner(dot)uk(dot)com>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-22 08:25:22
Message-ID: 3EA4FC72.4080800@roadrunner.uk.com
Lists: pgsql-hackers

Matthew T. O'Connor wrote:

> ----- Original Message -----
> From: "Ben Clewett" <B(dot)Clewett(at)roadrunner(dot)uk(dot)com>

>>Working with IDE drives on PC's, you can double the performace of a DB
>>just by putting half the tables on a disk on another IDE chain.
>
>
> You can do this using symlinks, but you do have to shut down the postmaster
> before you play with the files directly.

I was hoping this was the case. :)

From my data/base directory, I have a tree structure of numbered files
with no obvious structure, as well as some smaller directories: 'global',
'pg_xlog' and 'pg_clog'.

If I wanted to divide the postmaster read() calls evenly across files
located over several physical disks, how would you suggest distributing
the data-space? Would it be as simple as putting each child directory
in 'data/base' on a different physical disk in a round-robin fashion
using symbolic links, or is it more involved...

data/base/1 -> /dev/hda
data/base/2 -> /dev/hdb
data/base/3 -> /dev/hdc
data/base/4 -> /dev/hda
data/base/5 -> /dev/hdb
data/base/6 -> /dev/hdc (etc)

(I have made the assumption that the postmaster serves different
connections in parallel, otherwise this would have little effect :)

Thanks,

Ben


From: Shridhar Daithankar <shridhar_daithankar(at)persistent(dot)co(dot)in>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-22 08:37:57
Message-ID: 200304221407.57485.shridhar_daithankar@nospam.persistent.co.in
Lists: pgsql-hackers

On Tuesday 22 April 2003 13:55, Ben Clewett wrote:
> If I wanted to divide the postmaster read() calls evenly to files
> located over several physical disks, how would you suggest distributing
> the data-space? Would it be as simple as putting each child directory
> in 'data/base' on a different physical disk in a round-robbin fasion
> using symbolic links: Or is it more involved...
>
> data/base/1 -> /dev/hda
> data/base/2 -> /dev/hdb
> data/base/3 -> /dev/hdc
> data/base/4 -> /dev/hda
> data/base/5 -> /dev/hdb
> data/base/6 -> /dev/hdc (etc)

Don't bother splitting across disks unless you put them on different IDE
channels as IDE channel bandwidth is shared.

If you have that many disks, put them on IDE RAID. That is a much simpler
solution.

Shridhar


From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Ben Clewett <B(dot)Clewett(at)roadrunner(dot)uk(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-22 12:23:24
Message-ID: 1051014203.15057.16.camel@zeutrh9
Lists: pgsql-hackers

On Tue, 2003-04-22 at 04:25, Ben Clewett wrote:
> Matthew T. O'Connor wrote:
> From my data/base directory, I have a tree structure of numbered files
> of no obvious structure. As well as some smaller directories, 'global',
> 'pg_xlog' and 'pg_clog'.
>
> If I wanted to divide the postmaster read() calls evenly to files
> located over several physical disks, how would you suggest distributing
> the data-space? Would it be as simple as putting each child directory
> in 'data/base' on a different physical disk in a round-robbin fasion
> using symbolic links: Or is it more involved...
>
> data/base/1 -> /dev/hda
> data/base/2 -> /dev/hdb
> data/base/3 -> /dev/hdc
> data/base/4 -> /dev/hda
> data/base/5 -> /dev/hdb
> data/base/6 -> /dev/hdc (etc)
>
> (I have made the assumption that the postmaster serves different
> connections in parallel, otherwise this would have little effect :)

Yes, connections are served in parallel. The best way to split the files
is something you have to figure out, probably based on usage. The
round-robin directory method you mentioned above falls down in that it
only splits whole databases into different locations, regardless of how
much I/O is related to those databases. You may wind up with inactive
databases on their own disk, which would yield no performance gain. It's
also probably better to get down to the file/index level rather than whole
databases, as you may have a few tables that get 90% of the work.

Hopefully some of that was helpful.


From: pgsql(at)mohawksoft(dot)com
To: shridhar_daithankar(at)persistent(dot)co(dot)in
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to 'Are we losing momentum?')
Date: 2003-04-22 14:45:11
Message-ID: 4128.68.162.220.216.1051022711.squirrel@mail.mohawksoft.com
Lists: pgsql-hackers

> On Tuesday 22 April 2003 13:55, Ben Clewett wrote:
>> If I wanted to divide the postmaster read() calls evenly to files
>> located over several physical disks, how would you suggest
>> distributing the data-space? Would it be as simple as putting each
>> child directory in 'data/base' on a different physical disk in a
>> round-robbin fasion using symbolic links: Or is it more involved...
>>
>> data/base/1 -> /dev/hda
>> data/base/2 -> /dev/hdb
>> data/base/3 -> /dev/hdc
>> data/base/4 -> /dev/hda
>> data/base/5 -> /dev/hdb
>> data/base/6 -> /dev/hdc (etc)
>
> Don't bother splitting across disks unless you put them on different
> IDE channels as IDE channel bandwidth is shared.

While that is electrically "true", it is not completely true. Modern IDE
hard disks are very advanced, with large read-ahead caches. That, combined
with IDE-DMA access, low seek times and faster spin rates, means you can
get good performance across two IDE drives on the same channel.

For instance, take two databases, one on HDA and the other on HDB.
Successive reads interleaved HDA/HDB/HDA/HDB etc. will share electrical
bandwidth (as would SCSI). AFAIK, there is no standard asynchronous command
structure for IDE; however, the internal read-ahead cache on each drive will
usually have a pretty good guess at the "next" block based on some
predictive caching algorithm.

So, the "next" read from the drive has a good chance of coming from cache.
Plus the OS may "scatter gather" larger requests into smaller successive
requests (so a pure "read-ahead" will work great). Then consider
write-caching (if you dare).

It is very true that you want to have one IDE drive per IDE channel, but
these days two drives on a channel are not as bad as they once were. This is not due
to shared electrical bandwidth of the system (all bus systems suffer this)
but because of the electrical protocol to address the drives. ATA and EIDE
have made strides in this area.

>
> If you have that many disk, put them on IDE RAID. That is a much
> simpler solution.

A hardware RAID system is obviously an "easier" solution, and
www.infortrend.com makes a very cool system, but spreading multiple
databases across multiple IDE drives and controllers will probably provide
higher overall performance if you have additional IDE channels instead of
forcing all the I/O through one controller (IDE or SCSI) channel.

Pretty good PCI/EIDE-DMA controllers are cheap, $50~$100, and you can fit a
bunch of them into a server system. Provided your OS has a reentrant driver
model, it should be possible for PostgreSQL to be performing as many I/O
operations concurrently as you have drive controllers, whereas with an
IDE->SCSI RAID controller, you may still be limited by how well your
specific driver handles concurrency within one driver instance.

The "best" solution is one hardware RAID per I/O channel per database, but
that is expensive. One IDE drive per IDE channel per database is the next
best thing. Two IDE drives per channel, one drive per database, is very
workable if you make sure that the more active databases are on separate
controllers.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ben Clewett <B(dot)Clewett(at)roadrunner(dot)uk(dot)com>
Cc: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-22 15:18:45
Message-ID: 3032.1051024725@sss.pgh.pa.us
Lists: pgsql-hackers

Ben Clewett <B(dot)Clewett(at)roadrunner(dot)uk(dot)com> writes:
> If I wanted to divide the postmaster read() calls evenly to files
> located over several physical disks, how would you suggest distributing
> the data-space?

AFAIK, the single biggest win you can get in this dimension is to put
the WAL log ($PGDATA/pg_xlog/) on a separate spindle from everything
else. At least for write-intensive databases, that can buy you
something like 2x improvement for the price of one easy symlink.
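
Concretely, that symlink amounts to something like this (server stopped
first; /fastdisk stands in for whatever spindle you dedicate to it):

    pg_ctl stop
    mv $PGDATA/pg_xlog /fastdisk/pg_xlog
    ln -s /fastdisk/pg_xlog $PGDATA/pg_xlog
    pg_ctl start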

After that, the conventional wisdom is to put indexes on a third spindle
(separate from base tables and from xlog). But the bookkeeping and
maintenance effort needed for that is really too high to make it worth
worrying about, IMHO :-(. Eventually we will have some kind of
tablespace feature to make it easy.

My recommendation at the moment would be: WAL on dedicated spindle,
everything else on the best RAID array you can set up. And buy as much
RAM as you can afford.

See past discussions in pgsql-performance for more info.

regards, tom lane


From: "D'Arcy J(dot)M(dot) Cain" <darcy(at)druid(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-22 16:32:14
Message-ID: 200304221232.14447.darcy@druid.net
Lists: pgsql-hackers

On Tuesday 22 April 2003 11:18, Tom Lane wrote:
> My recommendation at the moment would be: WAL on dedicated spindle,
> everything else on the best RAID array you can set up. And buy as much
> RAM as you can afford.

Hmm. Is this safe? Let's assume for argument's sake that the RAID is 100%
reliable and that the local spindle will eventually fail. Can I lose data by
leaving WAL on the local spindle? Or are you suggesting two full RAID
systems?

--
D'Arcy J.M. Cain <darcy(at){druid|vex}.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "D'Arcy J(dot)M(dot) Cain" <darcy(at)druid(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-22 16:51:00
Message-ID: 3843.1051030260@sss.pgh.pa.us
Lists: pgsql-hackers

"D'Arcy J.M. Cain" <darcy(at)druid(dot)net> writes:
> Hmm. Is this safe? Let's assume for argument's sake that the RAID is 100%
> reliable and that the local spindle will eventually fail. Can I lose data by
> leaving WAL on the local spindle? Or are you suggesting two full RAID
> systems?

A two-disk mirror for the WAL would be good, yes; it needs redundancy
just as much as the main storage. But the discussion was about
performance not reliability; the OP hadn't mentioned that he wanted any
redundant storage.

regards, tom lane


From: "D'Arcy J(dot)M(dot) Cain" <darcy(at)druid(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: For the ametures. (related to "Are we losing momentum?")
Date: 2003-04-23 08:37:02
Message-ID: 200304230437.02814.darcy@druid.net
Lists: pgsql-hackers

On Tuesday 22 April 2003 12:51, Tom Lane wrote:
> A two-disk mirror for the WAL would be good, yes; it needs redundancy
> just as much as the main storage. But the discussion was about
> performance not reliability; the OP hadn't mentioned that he wanted any
> redundant storage.

Ah. When you mentioned RAID I thought you were going after reliability. So I
am probably best in my environment to just leave everything on the RAID
device and let it do its job as best it can. It does seem to work
faster than using local storage, even SCSI.

--
D'Arcy J.M. Cain <darcy(at){druid|vex}.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.