Re: Custom Scan APIs (Re: Custom Plan node)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Custom Scan APIs (Re: Custom Plan node)
Date: 2014-02-26 09:16:02
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8F7FBEC@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> > > If you're looking to just use GPU acceleration for improving
> > > individual queries, I would think that Robert's work around backend
> > > workers would be a more appropriate way to go, with the ability to
> > > move a working set of data from shared buffers and on-disk
> > > representation of a relation over to the GPU's memory, perform the
> > > operation, and then copy the results back.
> > >
> > The approach is similar to Robert's work, except that it adopts GPUs
> > instead of multicore CPUs. So, I tried to review his work to see how
> > its facilities could also be applied to my extension.
>
> Good, I'd be very curious to hear how that might solve the issue for you,
> instead of using the CustomScan approach.
>
I (plan to) use custom-scan, of course. Once a relation is referenced
and the optimizer decides that GPU acceleration is cheaper, the
associated custom-scan node reads the data from the underlying relation
(or from an in-memory cache, if one exists) and moves it into a shared
memory buffer, delivering it to the GPU-management background worker,
which launches asynchronous DMA transfers one by one.
After that, the custom-scan node receives the filtered records back via
the shared-memory buffer, so it can construct the tuples to be returned
to the upper node.
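
Roughly, and only as an illustration (every gpuscan_* helper, the queue
calls and the GpuScanContext structure below are hypothetical names for
this mail, not anything in the patch), the executor-side loop I have in
mind looks like this:

/* Illustrative only: the executor-side loop of the custom-scan node */
static TupleTableSlot *
gpuscan_exec(GpuScanContext *gctx)
{
    /* Keep feeding the GPU-management background worker until it has
     * something for us, or the underlying scan is exhausted. */
    while (!gctx->scan_done && result_queue_is_empty(gctx))
    {
        /* read the next chunk from the underlying relation,
         * or from the in-memory cache if one exists */
        ChunkBuffer *chunk = gpuscan_load_next_chunk(gctx);

        if (chunk == NULL)
        {
            gctx->scan_done = true;
            break;
        }

        /* place the chunk on a shared memory buffer; the background
         * worker picks it up and launches an asynchronous DMA transfer
         * to the GPU, one chunk at a time */
        request_queue_push(gctx, chunk);
    }

    /* receive filtered records back via shared memory and build the
     * tuple to hand to the upper plan node */
    return gpuscan_form_result_tuple(gctx);
}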

> > > "regular" PG tables, just to point out one issue, can be locked on a
> > > row-by-row basis, and we know exactly where in shared buffers to go
> > > hunt down the rows. How is that going to work here, if this is both
> a "regular"
> > > table and stored off in a GPU's memory across subsequent queries or
> > > even transactions?
> > >
> > It shall be handled on a "case-by-case" basis, I think. If a row-level
> > lock is required during the table scan, the custom-scan node shall
> > return a tuple located in the shared buffer instead of the cached
> > tuple. Of course, it is also an option for the custom-scan node to
> > evaluate the qualifiers on the GPU using the cached data and return
> > tuples identified by the ctid of the cached tuples.
> > Anyway, it is not a significant problem.
>
> I think you're being a bit too hand-wavey here, but if we're talking about
> pre-scanning the data using PG before sending it to the GPU and then only
> performing a single statement on the GPU, we should be able to deal with
> it.
It's what I want to implement.
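
To illustrate the ctid option concretely: the GPU only evaluates the
qualifiers against the cached copy and hands back the ctids of matching
rows, while the tuple actually returned upward is re-fetched from shared
buffers, so visibility and row-level locking behave exactly as for a
regular heap scan. The GpuScanState fields and gpuscan_pop_matched_ctid()
below are hypothetical; heap_fetch() and ExecStoreTuple() are the
existing backend routines.

/* Hypothetical state kept by the custom-scan node (illustrative only) */
typedef struct GpuScanState
{
    Relation        scan_rel;      /* underlying regular relation */
    Snapshot        snapshot;      /* snapshot of the running query */
    TupleTableSlot *scan_slot;     /* slot handed to the upper node */
    HeapTupleData   cur_tuple;     /* re-fetched heap tuple */
} GpuScanState;

static TupleTableSlot *
gpuscan_next_by_ctid(GpuScanState *gss)
{
    ItemPointerData ctid;
    Buffer          buffer;

    /* gpuscan_pop_matched_ctid() is a hypothetical shared-memory dequeue */
    while (gpuscan_pop_matched_ctid(gss, &ctid))
    {
        gss->cur_tuple.t_self = ctid;

        if (heap_fetch(gss->scan_rel, gss->snapshot,
                       &gss->cur_tuple, &buffer, false, NULL))
        {
            /* the slot pins the buffer itself, so release our own pin */
            ExecStoreTuple(&gss->cur_tuple, gss->scan_slot, buffer, false);
            ReleaseBuffer(buffer);
            return gss->scan_slot;
        }
        /* tuple no longer visible to this snapshot; skip it */
    }
    return NULL;        /* end of scan */
}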

> I'm worried about your ideas to try and cache things on the GPU though,
> if you're not prepared to deal with locks happening in shared memory on
> the rows you've got cached out on the GPU, or hint bits, or the visibility
> map being updated, etc...
>
It does not retain any state or information on the GPU side. Anything
related to PG internals is the CPU's job.

> > OK, I'll move the portions that will be commonly needed by other FDWs
> > into the backend code.
>
> Alright- but realize that there may be objections there on the basis that
> the code/structures which you're exposing aren't, and will not be, stable.
> I'll have to go back and look at them myself, certainly, and their history.
>
I see, but that is part of the process of getting the code merged.

> > Yes. According to the previous discussion when postgres_fdw was being
> > merged, the only things we can trust on the remote side are built-in
> > data types, functions, operators and so on.
>
> Well, we're going to need to expand that a bit for aggregates, I'm afraid,
> but we should be able to define the API for those aggregates very tightly
> based on what PG does today and require that any FDW purporting to provide
> those aggregates do it the way PG does. Note that this doesn't solve all
> the problems- we've got other issues with regard to pushing aggregates down
> into FDWs that need to be solved.
>
I see. It probably needs more detailed investigation.

> > The custom-scan node is intended to run on regular relations, not
> > only foreign tables. It means a special feature (like GPU
> > acceleration) can work transparently for most existing applications.
> > Usually, an application defines regular tables, not foreign tables,
> > for its work at installation time. That is the biggest concern for me.
>
> The line between a foreign table and a local one is becoming blurred already,
> but still, if this is the goal then I really think the background worker
> is where you should be focused, not on this Custom Scan API. Consider that,
> once we've got proper background workers, we're going to need new nodes
> which operate in parallel (or some other rejiggering of the nodes- I don't
> pretend to know exactly what Robert is thinking here, and I've apparently
> forgotten it if he's posted it
> somewhere) and those interfaces may drive changes which would impact the
> Custom Scan API- or worse, make us deprecate or regret having added it
> because now we'll need to break backwards compatibility to add in the
> parallel node capability to satisfy the more general non-GPU case.
>
The custom-scan API is a thin abstraction over the plan node interface,
not tightly coupled to a particular use case such as GPU acceleration or
remote joins (a rough sketch of what I mean is below). So, I'm quite
optimistic about its future maintainability.
Also, please recall the discussion at the last developer meeting. The
purpose of custom-scan (we didn't have a name for it at that time) is to
avoid unnecessary project forks by people who want to implement their
own special feature but have no supported facilities to enhance the
optimizer/executor.
Even once we have an in-core parallel execution feature based on CPUs,
it still makes sense to allow unique implementations that may be better
suited to a specific domain.
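
Just to illustrate what "thin" means here: the interface is essentially
a small table of callbacks mirroring the existing plan-node life cycle.
The structure and member names below are purely illustrative for this
mail, not the actual definitions in the patch.

/* Purely illustrative -- not the patch's actual structure or hook names.
 * The extension supplies callbacks that follow the regular plan-node
 * interface (path generation, plan creation, begin/exec/end/rescan);
 * nothing GPU-specific is wired into the core. */
typedef struct CustomScanProvider
{
    const char *name;

    /* add candidate paths for a base relation */
    void    (*add_scan_paths) (PlannerInfo *root, RelOptInfo *baserel);

    /* turn the chosen path into a plan node */
    Plan   *(*create_plan) (PlannerInfo *root, struct CustomPath *path);

    /* executor callbacks, one-to-one with the regular node interface */
    Node   *(*begin_scan) (struct CustomScan *plan, EState *estate, int eflags);
    TupleTableSlot *(*exec_scan) (Node *state);
    void    (*end_scan) (Node *state);
    void    (*rescan) (Node *state);
} CustomScanProvider;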

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
