Re: JIT compiler for expressions

Lists: pgsql-hackers
From: Dmitry Melnik <dm(at)ispras(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Ruben Buchatskiy <ruben(at)ispras(dot)ru>, Roman Zhuykov <zhroma(at)ispras(dot)ru>, Eugene Sharygin <eush(at)ispras(dot)ru>
Subject: JIT compiler for expressions
Date: 2016-10-28 11:47:35
Message-ID: CADviLuNjQTh99o6E0LTi0Ygks=naW8SXHmgn=8P+aaBXKXa0pA@mail.gmail.com

Hello hackers,

We'd like to present our work on adding LLVM JIT compilation of expressions
in SQL queries for PostgreSQL. The source code (based on 9.6.1) along with
a brief manual is available on our GitHub: https://github.com/ispras/postgres .
The current speedup for TPC-H Q1 is 20% (on a 40GB workload). Please feel free to
test it and tell us what you think.

Currently, our JIT is used to compile expressions in every query, so for
short-running queries JIT compilation may take longer than query execution
itself. We plan to address this by using the planner's estimates to decide whether
it is worth JIT compiling; we may also try parallel JIT compilation. But for
now we recommend testing it on a large workload so that the compilation cost
pays off (we've tested on a 40GB database for Q1).

The changes to the PostgreSQL code itself are rather small, while the biggest
part of the new code in our repository is autogenerated (these are LLVM IR
generators for PostgreSQL backend functions). The only real reason for
shipping prebuild_llvm_backend.cpp is that generating it requires a patched
LLVM version; otherwise it's generated directly from the PostgreSQL source code
(please see more on automatic backend generation at our GitHub site). With the
pre-generated cpp file, building our GitHub PGSQL version with JIT requires
only a clean, unpatched LLVM 3.7.

JIT compilation was tested on Linux, and currently 5 tests actually fail
(which shows up as 24 errors in the regression tests). It requires LLVM 3.7
(3.7.1) as a build dependency (you can specify the path to the proper
llvm-config with the --with-llvm-config= configure option; e.g. it could be
named llvm-config-3.7 on your system). Mac support is highly experimental and
wasn't tested much, but if you'd like to give it a try, you can do it with
LLVM 3.7 from MacPorts or Homebrew.

This work is part of our larger effort to implement a full JIT compiler
in PostgreSQL, where, along with JITting expressions, we've changed the
iteration model from Volcano-style to a push model and reimplemented code
generation with LLVM for most Scan/Aggregation/Join methods. That
approach gives a much better speedup (4-5x on Q1), but it requires many
code changes, so we're developing it as a PostgreSQL extension. It's not
ready for release yet, but we're now working on performance and compatibility,
as well as on making it easier to maintain by making it possible to build
both the JIT compiler and the interpreter from the same source code. More
information about our full JIT compiler and related work is available in
our presentations at LLVM Cauldron
(http://llvm.org/devmtg/2016-09/slides/Melnik-PostgreSQLLLVM.pdf ) and PGCon
(https://www.pgcon.org/2016/schedule/attachments/411_ISPRAS%20LLVM+Postgres%20Presentation.pdf ).
Also, we're going to give a lightning talk at the upcoming PGConf.EU in Tallinn
and discuss further development with the PostgreSQL community. We'd
appreciate any feedback!
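The difference between the Volcano (pull) model and a push model can be illustrated with a toy scan, filter and sum over an int array. This is a hypothetical sketch, not code from the ISP RAS extension; all type and function names are made up. In the pull version every tuple travels up through a chain of next() calls, while in the push version the scan is the driving loop and pushes each tuple into its consumers, which is the shape a code generator can fuse into one tight loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy "table scan" over a column of ints (hypothetical). */
typedef struct { const int *vals; size_t n; size_t pos; } Scan;

/* Volcano (pull) style: each operator exposes next(); the parent
 * pulls one tuple at a time through the call chain. */
static bool scan_next(Scan *s, int *out)
{
    if (s->pos >= s->n)
        return false;
    *out = s->vals[s->pos++];
    return true;
}

static bool filter_next(Scan *child, int threshold, int *out)
{
    int v;

    while (scan_next(child, &v))
    {
        if (v > threshold)
        {
            *out = v;
            return true;
        }
    }
    return false;
}

long sum_pull(const int *vals, size_t n, int threshold)
{
    Scan s = { vals, n, 0 };
    long sum = 0;
    int v;

    while (filter_next(&s, threshold, &v))   /* aggregate pulls */
        sum += v;
    return sum;
}

/* Push style: each operator exposes consume(); the producer calls
 * its parent for every tuple it emits. */
typedef struct { int threshold; long sum; } PushState;

static void agg_consume(PushState *st, int v) { st->sum += v; }

static void filter_consume(PushState *st, int v)
{
    if (v > st->threshold)
        agg_consume(st, v);                  /* push to the parent */
}

long sum_push(const int *vals, size_t n, int threshold)
{
    PushState st = { threshold, 0 };

    /* The scan is the driving loop: it produces tuples and pushes them up. */
    for (size_t i = 0; i < n; i++)
        filter_consume(&st, vals[i]);
    return st.sum;
}
```

Both functions compute the same result; for {1, 5, 10, 3} with threshold 2 they return 18.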

--
Best regards,
Dmitry Melnik
Institute for System Programming of the Russian Academy of Sciences
ISP RAS (www.ispras.ru/en/)


From: Greg Stark <stark(at)mit(dot)edu>
To: Dmitry Melnik <dm(at)ispras(dot)ru>
Cc: Eugene Sharygin <eush(at)ispras(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ruben Buchatskiy <ruben(at)ispras(dot)ru>, Roman Zhuykov <zhroma(at)ispras(dot)ru>
Subject: Re: JIT compiler for expressions
Date: 2016-10-30 07:40:56
Message-ID: CAM-w4HMcEa9SZjRyqZ7BzcMu-yE+-U68ULkFBqAPDJpYWHZ--g@mail.gmail.com

This sounds amazing.

My only comment is that LLVM 3.7 is kind of old in the accelerated world of
LLVM. If you need patches to LLVM, you won't have much success
submitting them against 3.7.

The current stable release is 3.9 and the development snapshots are 4.0. I
know LLVM moves quickly and that makes it hard to try to track the
development. If you worked with 4.0, you might find the APIs you're using
getting deprecated and rewritten several times while your project is under
development.


From: Andres Freund <andres(at)anarazel(dot)de>
To: Dmitry Melnik <dm(at)ispras(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org, Ruben Buchatskiy <ruben(at)ispras(dot)ru>, Roman Zhuykov <zhroma(at)ispras(dot)ru>, Eugene Sharygin <eush(at)ispras(dot)ru>
Subject: Re: JIT compiler for expressions
Date: 2016-10-30 07:52:34
Message-ID: 20161030075234.2hiepnibuse74ns7@alap3.anarazel.de

Hi,
On 2016-10-28 14:47:35 +0300, Dmitry Melnik wrote:
> We'd like to present our work on adding LLVM JIT compilation of expressions
> in SQL queries for PostgreSQL.

Great! I'm also working in this area, albeit with, I think, a somewhat
different approach[1]. Is your goal to integrate this work into postgres
proper, or is this more of an academic research project?

If the former, let's try to collaborate. If the latter, let's talk, so
we're not integrating something dumb ;)

[1] So far I've basically converted expression evaluation and tuple
deforming into small interpreted (switch/computed-goto opcode dispatch)
mini-languages, which can then be JITed. Adding a small handwritten
x86-64 JIT (for fun, not because I think that's a good approach) also
resulted in quite noticeable speedups. Did you experiment with JITing
tuple deforming as well? The reason I was thinking of going in this
direction is that it's a lot faster to compile such small pieces of
code, and it already gives significant speedups. There are still
function calls to postgres functions, but they're all direct function
calls instead of indirect ones.
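A minimal sketch of such a mini-language: an expression flattened into an array of steps and evaluated with switch-based opcode dispatch, assuming a stack machine over 64-bit integers. The opcodes and helper functions are hypothetical and not taken from Andres's patches. A computed-goto variant would replace the switch with a table of label addresses (a GCC/Clang extension), and a JIT could emit one straight-line fragment per step; note that the steps call the helper functions directly rather than through a function pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a tiny expression mini-language. */
typedef enum { OP_VAR, OP_CONST, OP_ADD, OP_GT, OP_DONE } Opcode;

typedef struct
{
    Opcode  op;
    int64_t arg;    /* column number for OP_VAR, literal for OP_CONST */
} Step;

/* Stand-ins for backend functions, called directly (no fmgr-style
 * indirection through a function pointer). */
static int64_t int_add(int64_t a, int64_t b) { return a + b; }
static int64_t int_gt(int64_t a, int64_t b) { return a > b; }

/* Evaluate a flat step array against one row. Assumes a well-formed
 * program whose stack depth never exceeds 16. */
int64_t
eval_expr(const Step *steps, const int64_t *row)
{
    int64_t stack[16];
    size_t  sp = 0;

    for (const Step *s = steps;; s++)
    {
        switch (s->op)          /* opcode dispatch */
        {
            case OP_VAR:
                stack[sp++] = row[s->arg];
                break;
            case OP_CONST:
                stack[sp++] = s->arg;
                break;
            case OP_ADD:
                sp--;
                stack[sp - 1] = int_add(stack[sp - 1], stack[sp]);
                break;
            case OP_GT:
                sp--;
                stack[sp - 1] = int_gt(stack[sp - 1], stack[sp]);
                break;
            case OP_DONE:
                return stack[sp - 1];
        }
    }
}
```

For instance, the steps {OP_VAR 0, OP_VAR 1, OP_ADD, OP_CONST 10, OP_GT, OP_DONE} evaluate (col0 + col1) > 10, returning 1 for the row {4, 7} and 0 for {2, 3}.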

> Currently, our JIT is used to compile expressions in every query, so for
> short-running queries JIT compilation may take longer than query execution
> itself. We plan to address it by using planner estimate to decide whether
> it worth JIT compiling, also we may try parallel JIT compilation. But for
> now we recommend testing it on a large workload in order to pay off the
> compilation (we've tested on 40GB database for Q1).

Could you give some estimates of how long JITing takes for you, say
for Q1? Different approaches here obviously have very different
tradeoffs.

> This work is a part of our greater effort on implementing full JIT compiler
> in PostgreSQL, where along with JITting expressions we've changed the
> iteration model from Volcano-style to push-model and reimplemented code
> generation with LLVM for most of Scan/Aggregation/Join methods. That
> approach gives much better speedup (x4-5 times on Q1), but it takes many
> code changes, so we're developing it as PostgreSQL extension.

FWIW, I think long term, we're going to want something like that in
core. I'm currently working on a "gradual" transformation towards that,
by *optionally* dealing with "batches" of tuples which get passed
around. Noticeable speedups, but not in the 4-5x range.
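A rough sketch of batch-at-a-time processing, assuming a fixed batch size and a toy filter over ints. BATCH_SIZE, Batch, filter_batch() and sum_batched() are hypothetical names, and this is not Andres's actual design. Each call handles a whole batch, so per-call and dispatch overhead is paid once per batch rather than once per tuple.

```c
#include <stddef.h>

#define BATCH_SIZE 4    /* hypothetical; kept small so the example is easy to trace */

typedef struct
{
    int    vals[BATCH_SIZE];
    size_t n;           /* number of valid entries in vals */
} Batch;

/* Filter a whole batch in one call, copying survivors into *out. */
size_t
filter_batch(const Batch *in, int threshold, Batch *out)
{
    out->n = 0;
    for (size_t i = 0; i < in->n; i++)
        if (in->vals[i] > threshold)
            out->vals[out->n++] = in->vals[i];
    return out->n;
}

/* Drive a scan over vals in batches and sum the surviving values. */
long
sum_batched(const int *vals, size_t n, int threshold)
{
    long sum = 0;

    for (size_t start = 0; start < n; start += BATCH_SIZE)
    {
        Batch in, out;

        in.n = (n - start < BATCH_SIZE) ? n - start : BATCH_SIZE;
        for (size_t i = 0; i < in.n; i++)
            in.vals[i] = vals[start + i];

        filter_batch(&in, threshold, &out);
        for (size_t i = 0; i < out.n; i++)
            sum += out.vals[i];
    }
    return sum;
}
```

With {1, 5, 10, 3, 7, 2} and threshold 2, the first batch contributes 18 and the second 7, for a total of 25.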

> Also we're going to give a lightning talk at upcoming PGConf.EU in Tallinn,
> and discuss the further development with PostgreSQL community. We'd
> appreciate any feedback!

Cool, let's chat a bit; I'm also here.

Greetings,

Andres Freund


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Dmitry Melnik <dm(at)ispras(dot)ru>, <pgsql-hackers(at)postgresql(dot)org>
Cc: Ruben Buchatskiy <ruben(at)ispras(dot)ru>, Roman Zhuykov <zhroma(at)ispras(dot)ru>, Eugene Sharygin <eush(at)ispras(dot)ru>
Subject: Re: JIT compiler for expressions
Date: 2016-11-18 18:22:58
Message-ID: 0bcefb1e-9c5f-0ccd-552f-194012c1836e@BlueTreble.com

On 10/28/16 6:47 AM, Dmitry Melnik wrote:
> We'd like to present our work on adding LLVM JIT compilation of
> expressions in SQL queries for PostgreSQL. The source code (based on
> 9.6.1) along with brief manual is available at our
> github: https://github.com/ispras/postgres . Current speedup for TPC-H Q1 is
> 20% (on 40GB workload). Please feel free to test it and tell us what you
> think.

For anyone looking to experiment with some of this stuff, it's possible
to get LLVM-based JIT via plpython and numba as well.

https://github.com/AustinPUG/PGDay2016/blob/master/Numba%20inside%20PostgreSQL.ipynb
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)