Re: fast read of binary data

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: eildert(dot)groeneveld(at)fli(dot)bund(dot)de
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: fast read of binary data
Date: 2012-11-12 16:57:43
Message-ID: CAHyXU0xHR5m77OxaVqGc1PhRqJHHhR-eQwPWTWfETUmEo45iMw@mail.gmail.com
Lists: pgsql-performance

On Mon, Nov 12, 2012 at 4:45 AM, Eildert Groeneveld
<eildert(dot)groeneveld(at)fli(dot)bund(dot)de> wrote:
> Dear All
>
> I am currently implementing a compressed binary storage scheme for
> genotyping data. These are basically vectors of binary data which may be
> megabytes in size.
>
> Our current implementation uses the data type bit varying.
>
> What we want to do is very simple: we want to retrieve such records from
> the database and transfer them unaltered to the client, which will do
> something (uncompressing) with them. As massive amounts of data are to be
> moved, speed is of great importance, precluding any back-and-forth
> conversions.
>
> Our current implementation uses Perl DBI; we can retrieve the data OK,
> but apparently some conversion is going on.
>
> Further, we would like to use ODBC from Fortran90 (wrapping the
> C library) for such transfers. However, all sorts of funny things happen
> here which look like conversion issues.
>
> In the old-fashioned network databases of a decade or more ago (pre-SQL
> times), this was no problem. Maybe there is someone here who knows the PG
> internals well enough to advise on how big blocks of memory (i.e. bit
> varying records) can be transferred UNALTERED between the backend and
> clients.
>
> Looking forward to your response.

The fastest/best way to transfer binary data to/from postgres is going to
mean direct coding against libpq, since most drivers wall you off from
the binary protocol (this may or may not be the case with ODBC). If I
were you I'd be writing C code to manage the database and linking the
compiled C object to the Fortran application. If the ODBC conversion
can't be made to do what you want (briefly looking, there is a 'bytea as
LO' option you may want to explore), ODBC brings nothing but complication
here unless your application has to support multiple database vendors or
you have zero C chops in-house.
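
A minimal sketch of the client side, assuming libpq is asked for binary results so the bit varying value arrives without a text round trip. It relies on the server's binary send format for bit varying: a 4-byte bit count in network byte order, followed by ceil(bit count / 8) packed data bytes, first bit in the high-order position of each byte. The function names below are hypothetical helpers, not part of libpq; the code is pure C so it can be linked into a Fortran program through a thin wrapper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one bit varying value received in
 * PostgreSQL's binary format (libpq resultFormat = 1).
 * buf/len would normally come from PQgetvalue() and PQgetlength().
 * On success, *nbits gets the bit count and *bits points into buf at
 * the packed bit data (no copy is made). Returns 0 on success, -1 if
 * the buffer is too short or its length does not match the bit count.
 */
int decode_varbit_binary(const char *buf, size_t len,
                         uint32_t *nbits, const unsigned char **bits)
{
    if (buf == NULL || len < 4)
        return -1;

    const unsigned char *p = (const unsigned char *) buf;

    /* Assemble the big-endian (network order) bit count by hand so the
       code does not depend on the host's byte order. */
    uint32_t n = ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
                 ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];

    /* The payload must hold exactly ceil(n / 8) bytes; 64-bit math
       avoids overflow when size_t is only 32 bits wide. */
    uint64_t nbytes = ((uint64_t) n + 7) / 8;
    if ((uint64_t) (len - 4) != nbytes)
        return -1;

    *nbits = n;
    *bits = p + 4;
    return 0;
}

/* Hypothetical helper: return bit i (0 = first/leftmost bit) of the
   packed data, which is stored most-significant-bit first per byte. */
int varbit_get(const unsigned char *bits, uint32_t i)
{
    return (bits[i / 8] >> (7 - (i % 8))) & 1;
}
```

With libpq, buf and len would come from PQgetvalue(res, row, col) and PQgetlength(res, row, col) on a result produced by something like PQexecParams(conn, "SELECT geno FROM genotypes WHERE id = $1", 1, NULL, paramValues, NULL, NULL, 1), where the final argument requests binary results; the table and column names are placeholders. The bits pointer refers into the PGresult, so it is only valid until PQclear(res) is called; copy the data first if the Fortran side needs to keep it.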

merlin
