Hadoop backend?

From: Paul Sheer <paulsheer(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Hadoop backend?
Date: 2009-02-21 20:17:30
Message-ID: c67e3dc60902211217p66906a35pe2cabe2c832e7b2d@mail.gmail.com
Lists: pgsql-hackers

Hadoop backend for PostgreSQL...

A problem that my client has, and one that I come across often,
is that a database always seems to be tied to a particular
physical machine, one that has to be upgraded, replaced, or
otherwise maintained.

Even if the database is replicated, that just means there are two or
more machines. Replication is also a difficult thing to manage
properly.

With a distributed data store, the data would become a logical
object: no adding or removal of machines would affect the data.
This is an ideal that would remove a tremendous maintenance
burden from many sites, or at least the ones I have worked at,
as far as I can see.

Does anyone know of plans to implement PostgreSQL over Hadoop?

Yahoo seems to be doing this:
http://glinden.blogspot.com/2008/05/yahoo-builds-two-petabyte-postgresql.html

But they store tables column-wise for their particular performance
needs. If one is doing a lot of inserts, I don't think that is the
most efficient layout, is it?

Has Yahoo put the source code for their work online?

Many thanks for any pointers.

-paul


From: pi song <pi(dot)songs(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-02-22 02:37:29
Message-ID: 1b29507a0902211837h5dbabfc0yde274011b00317ce@mail.gmail.com
Lists: pgsql-hackers

1) The Hadoop file system is heavily optimized for mostly-read workloads.
2) As of a few months ago, HDFS didn't support appending to files.

There might be a bit of an impedance mismatch in making the two go
together.

However, I think it would be a very good initiative to come up with
ideas for running Postgres on a distributed file system (it doesn't
have to be Hadoop specifically).
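
To make point 2) concrete, here is a minimal sketch against libhdfs,
Hadoop's C binding (hdfs.h ships with the Hadoop distribution). The
paths are made up, and the exact failure mode of O_APPEND depends on
the Hadoop release; the point is only that a storage layer built around
appending to and overwriting 8 kB blocks in place sits awkwardly on a
write-once file system.

/*
 * Sketch only: probing HDFS's write model through libhdfs.  Appends
 * were unsupported/disabled in Hadoop releases of this era (~0.19).
 */
#include <fcntl.h>
#include <stdio.h>
#include "hdfs.h"               /* ships with the Hadoop distribution */

int
main(void)
{
    char        buf[8192] = {0};        /* one heap-page-sized chunk */
    hdfsFS      fs = hdfsConnect("default", 0);  /* default namenode */
    hdfsFile    f;

    if (fs == NULL)
    {
        fprintf(stderr, "cannot connect to HDFS\n");
        return 1;
    }

    /* Write-once works: create a file and stream data into it. */
    f = hdfsOpenFile(fs, "/pgdata/16384.0", O_WRONLY, 0, 0, 0);
    if (f != NULL)
    {
        hdfsWrite(fs, f, buf, sizeof(buf));
        hdfsCloseFile(fs, f);
    }

    /* Re-opening for append is the part that historically failed. */
    f = hdfsOpenFile(fs, "/pgdata/16384.0", O_WRONLY | O_APPEND, 0, 0, 0);
    if (f == NULL)
        fprintf(stderr, "append not supported by this HDFS release\n");
    else
        hdfsCloseFile(fs, f);

    hdfsDisconnect(fs);
    return 0;
}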

Pi Song



From: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: pi(dot)songs(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-02-22 16:39:02
Message-ID: 136182E5-BC7E-4AEB-A2E8-4C225B2F9095@cybertec.at
Lists: pgsql-hackers

Hi ...

I think the easiest way to do this is to simply add a mechanism to
functions which allows a function to "stream" data through.
It would basically mean losing join support, as you cannot "read the
data again" in a way that is good enough for joining with the
function providing the data from Hadoop.

Hannu (I think) brought up some such concept as well some time ago.

I think a straightforward implementation would not be too hard.
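
As a rough illustration, here is a minimal sketch built on the ordinary
value-per-call set-returning-function scaffolding from funcapi.h. The
hadoop_open_scan / hadoop_next_row / hadoop_close_scan calls are
invented stand-ins for whatever client library would actually pull rows
out of Hadoop; only the SRF machinery is real PostgreSQL API.

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;

/* Hypothetical client library; these three calls are invented. */
extern void *hadoop_open_scan(const char *path);
extern char *hadoop_next_row(void *scan);   /* returns NULL at end */
extern void hadoop_close_scan(void *scan);

PG_FUNCTION_INFO_V1(hadoop_stream);

Datum
hadoop_stream(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;
    char       *row;

    if (SRF_IS_FIRSTCALL())
    {
        MemoryContext oldctx;

        funcctx = SRF_FIRSTCALL_INIT();
        oldctx = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
        /* open the remote scan once; the handle survives across calls */
        funcctx->user_fctx =
            hadoop_open_scan(text_to_cstring(PG_GETARG_TEXT_PP(0)));
        MemoryContextSwitchTo(oldctx);
    }

    funcctx = SRF_PERCALL_SETUP();

    /* stream one row back per call, never materializing the whole set */
    row = hadoop_next_row(funcctx->user_fctx);
    if (row != NULL)
        SRF_RETURN_NEXT(funcctx, PointerGetDatum(cstring_to_text(row)));

    hadoop_close_scan(funcctx->user_fctx);
    SRF_RETURN_DONE(funcctx);
}

Declared as RETURNS SETOF text, something like this could then be
queried as SELECT * FROM hadoop_stream('/logs/part-00000').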

best regards,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pi(dot)songs(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-02-22 20:47:15
Message-ID: 603c8f070902221247m4b7d412o2945de7e617ebf15@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 21, 2009 at 9:37 PM, pi song <pi(dot)songs(at)gmail(dot)com> wrote:
> 1) The Hadoop file system is heavily optimized for mostly-read workloads.
> 2) As of a few months ago, HDFS didn't support appending to files.
> There might be a bit of an impedance mismatch in making the two go
> together.
> However, I think it would be a very good initiative to come up with
> ideas for running Postgres on a distributed file system (it doesn't
> have to be Hadoop specifically).

In theory, I think you could make postgres work on any type of
underlying storage you like by writing a second smgr implementation
that would exist alongside md.c. The fly in the ointment is that
you'd need a more sophisticated implementation of this line of code,
from smgropen:

reln->smgr_which = 0; /* we only have md.c at present */

Logically, it seems like the choice of smgr should track with the
notion of a tablespace. IOW, you might want to have one tablespace
that is stored on a magnetic disk (md.c) and another that is stored on
your hypothetical distributed filesystem (hypodfs.c). I'm not sure how
hard this would be to implement, but I don't think smgropen() is in a
position to do syscache lookups, so probably not that easy.
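
To sketch what that might look like (not tested against any real tree;
the hypodfs_* functions and smgr_for_tablespace() are invented, while
mdread/mdwrite are the real md.c entry points, circa 8.3/8.4):

/*
 * Sketch only, loosely modeled on smgr.c's dispatch table.
 */
#include "postgres.h"
#include "storage/smgr.h"

typedef struct f_smgr_sketch
{
    void    (*smgr_read) (SMgrRelation reln, BlockNumber blocknum,
                          char *buffer);
    void    (*smgr_write) (SMgrRelation reln, BlockNumber blocknum,
                           char *buffer, bool isTemp);
    /* ... plus create, extend, nblocks, truncate, immedsync, ... */
} f_smgr_sketch;

extern void hypodfs_read(SMgrRelation reln, BlockNumber blocknum,
                         char *buffer);
extern void hypodfs_write(SMgrRelation reln, BlockNumber blocknum,
                          char *buffer, bool isTemp);
extern int  smgr_for_tablespace(Oid spcNode);   /* invented helper */

static const f_smgr_sketch smgrsw_sketch[] = {
    {mdread, mdwrite},              /* 0: magnetic disk (md.c) */
    {hypodfs_read, hypodfs_write}   /* 1: hypothetical distributed FS */
};

/*
 * What smgropen() would do instead of hard-coding smgr_which = 0.
 * Since smgropen() cannot do syscache lookups, the tablespace-to-smgr
 * mapping would have to come from information already available here,
 * e.g. the spcNode in the relation's RelFileNode.
 */
static void
smgr_choose(SMgrRelation reln)
{
    reln->smgr_which = smgr_for_tablespace(reln->smgr_rnode.spcNode);
}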

...Robert


From: pi song <pi(dot)songs(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-02-22 22:18:52
Message-ID: 1b29507a0902221418u4fcb57b9ub891b69efe516ccc@mail.gmail.com
Lists: pgsql-hackers

One more problem is that data placement on HDFS is implicit, meaning
you have no explicit control over it. Thus, you cannot place two sets
of data that are likely to be joined together on the same node, which
means uncontrollable latency during query processing.

Pi Song



From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pi(dot)songs(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-02-23 02:09:16
Message-ID: 603c8f070902221809w21b084ddra1b1636f31699959@mail.gmail.com
Lists: pgsql-hackers

On Sun, Feb 22, 2009 at 5:18 PM, pi song <pi(dot)songs(at)gmail(dot)com> wrote:
> One more problem is that data placement on HDFS is implicit, meaning
> you have no explicit control over it. Thus, you cannot place two sets
> of data that are likely to be joined together on the same node, which
> means uncontrollable latency during query processing.
> Pi Song

It would only be possible to have the actual PostgreSQL backends
running on a single node anyway, because they use shared memory to
hold lock tables and things. The advantage of a distributed file
system would be that you could access more storage (and more system
buffer cache) than would be possible on a single system (or perhaps
the same amount but at less cost). Assuming some sort of
per-tablespace control over the storage manager, you could put your
most frequently accessed data locally and the less frequently accessed
data into the DFS.

But you'd still have to pull all the data back to the master node to
do anything with it. Being able to actually distribute the
computation would be a much harder problem. Currently, we don't even
have the ability to bring multiple CPUs to bear on (for example) a
large sequential scan (even though all the data is on a single node).

...Robert


From: pi song <pi(dot)songs(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-02-23 04:56:59
Message-ID: 1b29507a0902222056h7c576a65k50ab572c4da601ff@mail.gmail.com
Lists: pgsql-hackers

I think the point that you can access more system cache is right, but
that doesn't mean it will be more efficient than reading from your
local disk. Take Hadoop, for example: a request for file content first
has to go to the Namenode (the file-chunk indexing service), which then
points you at the data node that actually serves the data. Assuming you
are working on a large dataset, the probability that the chunk you need
is already in the system cache is very low, so most of the time you end
up reading from a remote disk anyway.

I've got a better idea: how about making the buffer pool multilevel?
The first level is the current one; the second level represents memory
on remote machines. Things that are used less often would stay on the
second level. Has anyone thought about something like this before?

Pi Song
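
As a very rough sketch of the idea (nothing like this exists in
PostgreSQL; remote_get/remote_put stand in for whatever RPC layer would
move pages to and from memory on other machines, and dirty-page
writeback is waved away):

#include <stdbool.h>

#define BLCKSZ 8192

typedef struct BufferSketch
{
    unsigned    tag;            /* which disk block this buffer holds */
    bool        valid;
    char        page[BLCKSZ];
} BufferSketch;

/* hypothetical RPC layer to a remote node's memory (level 2) */
extern bool remote_get(unsigned tag, char *page);   /* false on miss */
extern void remote_put(unsigned tag, const char *page);

extern void read_block_from_disk(unsigned tag, char *page);

/*
 * On a level-1 miss: demote the evicted page to the remote level 2
 * instead of dropping it, then try level 2 before going to disk.
 * We assume the victim page is clean (dirty pages would need to be
 * written back first).
 */
static void
fetch_page(BufferSketch *victim, unsigned tag)
{
    if (victim->valid)
        remote_put(victim->tag, victim->page);

    if (!remote_get(tag, victim->page))
        read_block_from_disk(tag, victim->page);

    victim->tag = tag;
    victim->valid = true;
}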


From: Paul Sheer <paulsheer(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Hadoop backend?
Date: 2009-02-23 14:08:42
Message-ID: c67e3dc60902230608x12fd290dqc369b667ddb5b92f@mail.gmail.com
Lists: pgsql-hackers

> It would only be possible to have the actual PostgreSQL backends
> running on a single node anyway, because they use shared memory to
> hold lock tables and things.

This is not a problem: performance is a secondary consideration (at
least as far as the problem I was referring to goes).

The primary usefulness is to have the data be a logical entity rather
than a physical entity, so that one can maintain physical machines
without having to worry too much about where the data is.

At the moment, most databases suffer from the problem of occasionally
having to move the data from one place to another. This is a major
nightmare that happens once every few years for most DBAs.
It happens because a system needs a software or hardware upgrade, or a
disk enlarged, or because a piece of hardware fails.

I have also found it's no use having RAID or ZFS. Each of these ties
the data to an OS installation. If the OS needs to be reinstalled, all
the data has to be manually moved in a way that is, well... dangerous.

If there is only one machine running Postgres, that is fine too: I can
have a second identical machine on standby in case of a hardware
failure. That means a short amount of downtime, and most people can
live with that.

I read somewhere that replication is one of the goals of Postgres's
upcoming development efforts. Personally, I think Hadoop might be
a better solution - *shrug*.

Thoughts/comments?

-paul


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Paul Sheer <paulsheer(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hadoop backend?
Date: 2009-02-23 14:35:01
Message-ID: 603c8f070902230635p23f2ee56v26fdf1566cce2cda@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 23, 2009 at 9:08 AM, Paul Sheer <paulsheer(at)gmail(dot)com> wrote:
>> It would only be possible to have the actual PostgreSQL backends
>> running on a single node anyway, because they use shared memory to
>> hold lock tables and things.
>
> This is not a problem: performance is a secondary consideration (at
> least as far as the problem I was referring to goes).
>
> The primary usefulness is to have the data be a logical entity rather
> than a physical entity, so that one can maintain physical machines
> without having to worry too much about where the data is.
>
> At the moment, most databases suffer from the problem of occasionally
> having to move the data from one place to another. This is a major
> nightmare that happens once every few years for most DBAs.
> It happens because a system needs a software or hardware upgrade, or a
> disk enlarged, or because a piece of hardware fails.
>
> I have also found it's no use having RAID or ZFS. Each of these ties
> the data to an OS installation. If the OS needs to be reinstalled, all
> the data has to be manually moved in a way that is, well... dangerous.
>
> If there is only one machine running Postgres, that is fine too: I can
> have a second identical machine on standby in case of a hardware
> failure. That means a short amount of downtime, and most people can
> live with that.

I think the performance aspect of things has to be considered. If the
system introduces too much overhead it won't be useful in practice.
But apart from that I agree with all of this.

> I read somewhere that replication is one of the goals of Postgres's
> upcoming development efforts. Personally, I think Hadoop might be
> a better solution - *shrug*.
>
> Thoughts/comments?

I think the two are solving different problems.

...Robert


From: Andrew Chernow <ac(at)esilo(dot)com>
To: Paul Sheer <paulsheer(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Hadoop backend?
Date: 2009-02-23 14:38:19
Message-ID: 49A2B4DB.2040108@esilo.com
Lists: pgsql-hackers

Paul Sheer wrote:
> I have also found it's no use having RAID or ZFS. Each of these ties
> the data to an OS installation. If the OS needs to be reinstalled, all
> the data has to be manually moved in a way that is, well... dangerous.

How about network storage, fibre-attached? If you move the database,
you only need to redirect the LUN(s) to a new WWN.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Paul Sheer <paulsheer(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Hadoop backend?
Date: 2009-02-23 20:13:33
Message-ID: 49A3036D.9060901@bluegap.ch
Lists: pgsql-hackers

Hi,

Paul Sheer wrote:
> This is not a problem: performance is a secondary consideration (at
> least as far as the problem I was referring to goes).

Well, if you don't mind your database running... ehm... creeping
several orders of magnitude slower, you might also be interested in
Single-System Image clustering systems [1], like Beowulf, Kerrighed
[2], or OpenSSI [3]. Besides distributed filesystems, those also
provide transparent shared memory across nodes.

> The primary usefulness is to have the data be a logical entity rather
> than a physical entity, so that one can maintain physical machines
> without having to worry too much about where the data is.

There are lots of solutions offering that already. In what way would
Hadoop be better than any of the existing ones? For a slightly
different example, you can get equivalent functionality at the
block-device layer with DRBD [4], which is in successful use under
Postgres as well.

The main challenge with distributed filesystems remains reliable
failure detection and ensuring that exactly one node is alive at any
time.

> At the moment, most databases suffer from the problem of occasionally
> having to move the data from one place to another. This is a major
> nightmare that happens once every few years for most DBAs.
> It happens because a system needs a software or hardware upgrade, or a
> disk enlarged, or because a piece of hardware fails.

You are comparing to standalone nodes here, which doesn't make much
sense, IMO.

> I have also found it's no use having RAID or ZFS. Each of these ties
> the data to an OS installation. If the OS needs to be reinstalled, all
> the data has to be manually moved in a way that is, well... dangerous.

I'm thinking more of Lustre, GFS, OCFS, AFS or some such. Compare with
those!

> If there is only one machine running Postgres, that is fine too: I can
> have a second identical machine on standby in case of a hardware
> failure. That means a short amount of downtime, and most people can
> live with that.

What most people have trouble with is a master that revives and suddenly
confuses the new master (old slave).

> I read somewhere that replication is one of the goals of Postgres's
> upcoming development efforts. Personally, I think Hadoop might be
> a better solution - *shrug*.

I'm not convinced at all. The trouble is not the replication of the
data on disk; it's rather the data in shared memory that poses the hard
problems (locks, caches, etc.). The former is solved already; the
latter is a tad harder to solve. See [5] for my approach (showing my
bias).

What I do find interesting about Hadoop is the MapReduce approach, but
a lot more than writing another "storage backend" is required if you
want to make use of that for Postgres. Greenplum claims to have
implemented MapReduce for their database [6]; however, to me it looks
like that works a couple of layers above the filesystem.

Regards

Markus Wanner

[1]: Wikipedia: Single-System Image Clustering
http://en.wikipedia.org/wiki/Single-system_image

[2]: http://www.kerrighed.org/

[3]: http://www.openssi.org/

[4]: http://www.drbd.org/

[5]: Postgres-R:
http://www.postgres-r.org/

[6]: Greenplum MapReduce
http://www.greenplum.com/resources/mapreduce/


From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pi(dot)songs(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-02-23 23:03:17
Message-ID: 36e682920902231503t5009cc8co2bc10dd43de784ac@mail.gmail.com
Lists: pgsql-hackers

On Sun, Feb 22, 2009 at 3:47 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> In theory, I think you could make postgres work on any type of
> underlying storage you like by writing a second smgr implementation
> that would exist alongside md.c. The fly in the ointment is that
> you'd need a more sophisticated implementation of this line of code,
> from smgropen:
>
> reln->smgr_which = 0; /* we only have md.c at present */

I believe there is more that would need to be done nowadays. I seem to
recall that the storage manager abstraction has slowly been
dedicated/optimized for md over the past six years or so. It may even
be easier, or preferable, to write a Hadoop-specific access method,
depending on what you're looking for from Hadoop.

--
Jonah H. Harris, Senior DBA
myYearbook.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pi(dot)songs(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-02-24 00:24:43
Message-ID: 28913.1235435083@sss.pgh.pa.us
Lists: pgsql-hackers

"Jonah H. Harris" <jonah(dot)harris(at)gmail(dot)com> writes:
> I believe there is more that would need to be done nowadays. I seem to
> recall that the storage manager abstraction has slowly been
> dedicated/optimized for md over the past six years or so.

As far as I can tell, the PG storage manager API is at the wrong level
of abstraction for pretty much everything. These days, everything we do
is atop the Unix filesystem API, and anything that smgr might have been
able to do for us is getting handled in kernel filesystem code or device
drivers. (Back in the eighties, when it was more plausible for PG to do
direct device access, maybe smgr was good for something; but no more.)

It's interesting to speculate about where we could draw an abstraction
boundary that would be more useful. I don't think the MySQL guys got it
right either...

regards, tom lane


From: pi song <pi(dot)songs(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-02-24 04:20:33
Message-ID: 1b29507a0902232020l5c40e4c3jea36338fafbd244@mail.gmail.com
Lists: pgsql-hackers

> I believe there is more that would need to be done nowadays. I seem to
> recall that the storage manager abstraction has slowly been
> dedicated/optimized for md over the past six years or so. It may even
> be easier, or preferable, to write a Hadoop-specific access method,
> depending on what you're looking for from Hadoop.

I think you're very right. What Postgres needs is an access-method
abstraction. One should be able to plug in an access method for SSDs or
network file systems where appropriate. I'm not talking about the
MapReduce part of Hadoop, because I think that's a different story.
What you need for MapReduce is 1) a data store that feeds you data, and
then 2) MapReduce doing the query processing. This has nothing in
common with the Postgres query processor. If you just want data out of
Postgres, it should be easier to build a Postgres data feeder for
Hadoop (which might even already exist).

Pi Song



From: Paul Sheer <paulsheer(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pi(dot)songs(at)gmail(dot)com
Subject: Re: Hadoop backend?
Date: 2009-02-24 10:01:20
Message-ID: c67e3dc60902240201n37b96391j6210ea1236c946b7@mail.gmail.com
Lists: pgsql-hackers

> As far as I can tell, the PG storage manager API is at the wrong level
> of abstraction for pretty much everything. These days, everything we
> do is atop the Unix filesystem API, and anything that smgr might have
> been able to do for us is getting handled in kernel filesystem code or
> device drivers.

Is there a complete list of filesystem API calls somewhere that I can
get my head around? At least that would give me an idea of the scope of
the effort.

> If you just want data out of Postgres, it should be easier to build a
> Postgres data feeder for Hadoop (which might even already exist).

Do you know of a URL?

-paul


From: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pi(dot)songs(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-02-24 12:55:11
Message-ID: D62A3AB4-243C-4721-8F46-E9EA0F319CC4@cybertec.at
Lists: pgsql-hackers

Why not just stream it in via set-returning functions, and make sure
that we can mark a set-returning function as "STREAMABLE" or so (to
prevent joins, whatever)?
It is the easiest way to get it right, and it helps in many other
cases. I think that the storage manager is definitely the wrong place
to do this.

It is also easy to use more than just one backend then, if you get the
interface code right.

regards,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pi(dot)songs(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-02-24 13:08:32
Message-ID: 49A3F150.2040309@gmx.net
Lists: pgsql-hackers

Tom Lane wrote:
> It's interesting to speculate about where we could draw an abstraction
> boundary that would be more useful. I don't think the MySQL guys got it
> right either...

The supposed smgr abstraction of PostgreSQL, which more or less covers
how to get a byte onto the disk, is quite far away from what MySQL
calls a storage engine, which has things like open table, scan table,
and drop table at a logical level (meaning open the table, not open the
heap).

To my judgment, neither of these approaches is terribly useful from a
current, practical point of view.

In any case, in order to solve the "where to abstract" question, you'd
probably want to have one or two other storage APIs that you seriously
want to integrate, and then you can analyze how to unify them.
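
For comparison, the two levels side by side, with signatures
paraphrased from memory (consult smgr.h and MySQL's handler API for the
real declarations; MySQL's is actually a C++ class, shown here as plain
functions):

/*
 * PostgreSQL smgr level: moves raw 8 kB blocks, knows nothing about
 * tuples or tables (signatures circa 8.3/8.4).
 */
void smgrread(SMgrRelation reln, BlockNumber blocknum, char *buffer);
void smgrwrite(SMgrRelation reln, BlockNumber blocknum, char *buffer,
               bool isTemp);
void smgrextend(SMgrRelation reln, BlockNumber blocknum, char *buffer,
                bool isTemp);

/*
 * MySQL storage-engine handler level: logical, row-oriented table
 * operations (paraphrased).
 */
int handler_open(const char *table_name, int mode);
int handler_rnd_init(bool do_scan);         /* begin a full table scan */
int handler_rnd_next(unsigned char *row);   /* fetch the next row */
int handler_write_row(unsigned char *row);
int handler_delete_table(const char *table_name);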


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Paul Sheer <paulsheer(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-02-24 19:30:12
Message-ID: 49A44AC4.8040002@agliodbs.com
Lists: pgsql-hackers


> With a distributed data store, the data would become a logical
> object: no adding or removal of machines would affect the data.
> This is an ideal that would remove a tremendous maintenance
> burden from many sites, or at least the ones I have worked at,
> as far as I can see.

Two things:

1) Hadoop is the wrong technology. It's not designed to support
transactional operations.

2) Transactional operations are, in general, your Big Obstacle for doing
anything in the way of a distributed storage manager.

It's possible you could make both of the above "go away" if you were
planning for a data warehouse (DW) platform in which transactions
weren't important. However, that would have to become an incompatible
fork of PostgreSQL.

AFAIK, the Yahoo platform does not involve Hadoop at all.

--Josh


From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Paul Sheer <paulsheer(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-07-22 03:29:22
Message-ID: 4A668792.7060601@cheapcomplexdevices.com
Lists: pgsql-hackers

Paul Sheer wrote:
> Hadoop backend for PostgreSQL...

Resurrecting an old thread, it seems some guys at Yale implemented
something very similar to what this thread was discussing.

http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html
> It's an open source stack that includes PostgreSQL, Hadoop, and Hive,
> along with some glue between PostgreSQL and Hadoop, a catalog, a data
> loader, and an interface that accepts queries in MapReduce or SQL and
> generates query plans that are processed partly in Hadoop and partly
> in different PostgreSQL instances spread across many nodes in a
> shared-nothing cluster of machines.

Their detailed paper is here:

http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf

According to the paper, it scales very well.
