Re: PGXS problem with pdftotext

Lists: pgsql-hackers
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: PGXS problem with pdftotext
Date: 2009-07-02 20:20:42
Message-ID: 4A4CD04A0200002500028307@gw.wicourts.gov

I've been wondering whether anyone else would want to use the
functions we wrote to extract text from PDF documents stored in bytea
columns. If so, I would need to sort out the problems I've been
having with builds through the PGXS techniques. Here's the directory,
after a successful build under contrib:

kgrittn(at)project-db:~/postgresql-8.3.7/contrib/pdftotext> ll
total 108
-rw-r--r-- 1 kgrittn dbas 22990 2009-04-14 17:14 libpdftotext.a
lrwxrwxrwx 1 kgrittn dbas 19 2009-04-14 17:14 libpdftotext.so -> libpdftotext.so.0.0
lrwxrwxrwx 1 kgrittn dbas 19 2009-04-14 17:14 libpdftotext.so.0 -> libpdftotext.so.0.0
-rwxr-xr-x 1 kgrittn dbas 21666 2009-04-14 17:14 libpdftotext.so.0.0
-rw-r--r-- 1 kgrittn dbas 443 2009-04-14 17:14 Makefile
-rw-r--r-- 1 kgrittn dbas 2980 2008-07-22 13:00 pdftotext.c
-rw-r--r-- 1 kgrittn dbas 14184 2009-04-14 17:14 pdftotext.o
-rw-r--r-- 1 kgrittn dbas 285 2009-04-14 17:14 pdftotext.sql
-rw-r--r-- 1 kgrittn dbas 285 2008-07-22 13:00 pdftotext.sql.in
-rw-r--r-- 1 kgrittn dbas 4658 2009-04-13 17:02 poppler_compat.cc
-rw-r--r-- 1 kgrittn dbas 355 2008-07-22 13:00 poppler_compat.h
-rw-r--r-- 1 kgrittn dbas 8208 2009-04-14 17:14 poppler_compat.o
-rw-r--r-- 1 kgrittn dbas 733 2008-07-22 13:00 README.pdftotext

Here's the Makefile contents:

MODULE_big = pdftotext
OBJS = pdftotext.o poppler_compat.o
DATA_built = pdftotext.sql
DOCS = README.pdftotext

PG_CPPFLAGS =-I/usr/include/poppler -shared -fpic
SHLIB_LINK = -lpoppler -L/usr/local/lib

ifdef USE_PGXS
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
subdir = contrib/pdftotext
top_builddir = ../..
include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
endif

If we export USE_PGXS=1 and run make ; sudo make install outside the
PostgreSQL build tree, it seems to build and deploy OK, but it can't
find the poppler implementation at run time. If we do it in the build
tree, all is good. Where's the problem? Is the SHLIB_LINK setting
proper? What's the right way to do this?

BTW, libpoppler is GPL licensed, and always reminds me of what
Churchill said about democracy, if that affects anyone's interest in
the code. You're likely to need to tweak the code based on the
particular version of libpoppler you're using. If you use an older
version of libpoppler, it can crash the whole PostgreSQL environment
if you try to use it with a PDF using newer features. :-(

If anyone's still interested, and I can fix the build problem, I'll
throw the source code onto pgfoundry.

-Kevin

"It has been said that democracy is the worst form of government
except all the others that have been tried."


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PGXS problem with pdftotext
Date: 2009-07-02 20:55:49
Message-ID: 19547.1246568149@sss.pgh.pa.us
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> PG_CPPFLAGS =-I/usr/include/poppler -shared -fpic
> SHLIB_LINK = -lpoppler -L/usr/local/lib

It doesn't seem appropriate to put -shared or -fpic into PG_CPPFLAGS.
If you need those, the makefiles should add them automatically.

The other thing that seems peculiar is looking for the include files
in /usr/include and the library in /usr/local/lib. I've never
seen any package install itself like that --- either everything goes
under /usr/local or nothing does. I suspect you might have two
incompatible poppler installations on the machine and you're picking
up the wrong combination of files.

Running ldd or local equivalent on pdftotext.so might help you determine
what's going on as far as finding the library goes.

regards, tom lane
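
A minimal sketch of the ldd check suggested above. The function name unresolved_libs is hypothetical, and the "=> not found" marker is what glibc's ldd prints for a library the dynamic loader cannot locate; other platforms may word it differently.

```shell
# Print the names of shared libraries that ldd could not resolve.
# Reads ldd-style output on stdin.
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage (file name taken from the thread; adjust to your build):
#   ldd libpdftotext.so.0.0 | unresolved_libs
```

An empty result would mean every dependency, libpoppler included, was found on the loader's search path.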


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PGXS problem with pdftotext
Date: 2009-07-02 21:03:25
Message-ID: 4A4CDA4D020000250002831B@gw.wicourts.gov
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> PG_CPPFLAGS =-I/usr/include/poppler -shared -fpic
>> SHLIB_LINK = -lpoppler -L/usr/local/lib
>
> It doesn't seem appropriate to put -shared or -fpic into
> PG_CPPFLAGS. If you need those, the makefiles should add them
> automatically.
>
> The other thing that seems peculiar is looking for the include files
> in /usr/include and the library in /usr/local/lib. I've never
> seen any package install itself like that --- either everything goes
> under /usr/local or nothing does. I suspect you might have two
> incompatible poppler installations on the machine and you're picking
> up the wrong combination of files.
>
> Running ldd or local equivalent on pdftotext.so might help you
> determine what's going on as far as finding the library goes.

Thanks. Let's just say that the poppler build from source has not
ever gone as smoothly as the most eventful PostgreSQL build from
source. We've had to do much ad hoc hacking to get anything usable,
and I'm sure we've made some bad choices in the process. I'll take a
close look at where everything has landed in light of your advice, and
see if I can arrange things more sensibly.

Does it seem likely that fixing these issues will allow PGXS to work?

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PGXS problem with pdftotext
Date: 2009-07-02 21:39:08
Message-ID: 20622.1246570748@sss.pgh.pa.us
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Does it seem likely that fixing these issues will allow PGXS to work?

Couldn't say. It would be useful to compare ldd output for
pdftotext.so built both ways.

regards, tom lane
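
One way to make that comparison, sketched under assumptions: the helper name normalize_ldd and the paths in the usage comment are hypothetical. Load addresses differ from run to run, so the sketch strips them before diffing.

```shell
# Normalize ldd-style output read from stdin: drop leading whitespace and
# the trailing "(0x...)" load address, then sort so two builds can be diffed.
normalize_ldd() {
    sed -e 's/[[:space:]]*(0x[0-9a-fA-F]*)$//' -e 's/^[[:space:]]*//' | sort
}

# Usage (paths are placeholders for the in-tree and PGXS build directories):
#   ldd /path/to/in-tree/libpdftotext.so.0.0 | normalize_ldd > in-tree.txt
#   ldd /path/to/pgxs/libpdftotext.so.0.0    | normalize_ldd > pgxs.txt
#   diff in-tree.txt pgxs.txt
```

Any line that appears in only one file points at a library that the two builds resolve differently.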


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PGXS problem with pdftotext
Date: 2009-07-03 06:15:45
Message-ID: CDFB381C-2321-47AF-A837-F819DE83217E@hi-media.com
Lists: pgsql-hackers

Hi,

On 2 Jul 2009, at 22:20, Kevin Grittner wrote:
> Here's the Makefile contents:

You could compare to this:
http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/backports/uuid-ossp/Makefile?rev=1.1.1.1&content-type=text/x-cvsweb-markup

> SHLIB_LINK = -lpoppler -L/usr/local/lib

SHLIB_LINK += $(OSSP_UUID_LIBS)

Dunno how far it'll get you, but it may help some :)

Regards,
--
dim


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PGXS problem with pdftotext
Date: 2009-07-03 18:23:39
Message-ID: 4A4E065B020000250002835B@gw.wicourts.gov
Lists: pgsql-hackers

I cleaned up the poppler build situation, and all looks good except:

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> PG_CPPFLAGS =-I/usr/include/poppler -shared -fpic
>
> It doesn't seem appropriate to put -shared or -fpic into
> PG_CPPFLAGS. If you need those, the makefiles should add them
> automatically.

Leaving off -shared was OK, but when I left off -fpic, I got this:

/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../x86_64-suse-linux/bin/ld: poppler_compat.o: relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
poppler_compat.o: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [libpdftotext.so.0.0] Error 1

With -fPIC or -fpic in my Makefile, PGXS now seems to work as
intended. Is it worth doing anything to check on why that is needed
or how to get rid of it? Might it have something to do with compiling
both .c and .cc files?

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PGXS problem with pdftotext
Date: 2009-07-03 18:45:01
Message-ID: 22384.1246646701@sss.pgh.pa.us
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Leaving off -shared was OK, but when I left off -fpic, I got this:

> /usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../x86_64-suse-linux/bin/ld:
> poppler_compat.o: relocation R_X86_64_32 against `a local symbol' can
> not be used when making a shared object; recompile with -fPIC
> poppler_compat.o: could not read symbols: Bad value
> collect2: ld returned 1 exit status
> make: *** [libpdftotext.so.0.0] Error 1

Huh. On Linux platforms, the PG makefiles should include -fpic in
CFLAGS (via CFLAGS_SL) automatically; you should not need to repeat it
in CPPFLAGS. For instance, if I go into contrib/adminpack and make, I see

sed 's,MODULE_PATHNAME,$libdir/adminpack,g' adminpack.sql.in >adminpack.sql
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g -fpic -I../../src/interfaces/libpq -I. -I../../src/include -D_GNU_SOURCE -c -o adminpack.o adminpack.c
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g -fpic -shared adminpack.o -L../../src/port -Wl,-rpath,'/home/tgl/testversion/lib' -o adminpack.so

What do you get? What does pg_config report for the various FLAGS
variables?

regards, tom lane
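
A small sketch for collecting those values in one go. The function name show_pg_flags and the PG_CONFIG environment override are assumptions made for illustration; the five options queried are existing pg_config switches.

```shell
# Print the compiler and linker flag settings recorded by pg_config.
# Set PG_CONFIG to pick a specific installation's pg_config binary.
show_pg_flags() {
    pgc="${PG_CONFIG:-pg_config}"
    for opt in cppflags cflags cflags_sl ldflags ldflags_sl; do
        # Upper-case the option name to label the value, e.g. CFLAGS_SL
        name=$(printf '%s' "$opt" | tr 'a-z' 'A-Z')
        printf '%s = %s\n' "$name" "$("$pgc" "--$opt")"
    done
}

# Usage:
#   show_pg_flags
#   PG_CONFIG=/usr/local/pgsql-8.3.7/bin/pg_config show_pg_flags
```

On a Linux build, CFLAGS_SL is the line to look at, since that is where -fpic is expected to come from.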


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PGXS problem with pdftotext
Date: 2009-07-03 18:54:31
Message-ID: 4A4E0D970200002500028365@gw.wicourts.gov
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> What do you get?

sed 's,MODULE_PATHNAME,$libdir/adminpack,g' adminpack.sql.in >adminpack.sql
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g -fpic -I/usr/local/pgsql-8.3.7/include -I. -I/usr/local/pgsql-8.3.7/include/server -I/usr/local/pgsql-8.3.7/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o adminpack.o adminpack.c
ar crs libadminpack.a adminpack.o
ranlib libadminpack.a
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g -fpic -shared -Wl,-soname,libadminpack.so.0 adminpack.o -L/usr/local/pgsql-8.3.7/lib -Wl,-rpath,'/usr/local/pgsql-8.3.7/lib' -o libadminpack.so.0.0

> What does pg_config report for the various FLAGS variables?

CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,'/usr/local/pgsql-8.3.7/lib'
LDFLAGS_SL =

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PGXS problem with pdftotext
Date: 2009-07-03 19:02:21
Message-ID: 4A4E0F6D0200002500028370@gw.wicourts.gov
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> What do you get?

More to the point, here's what I get when I use PGXS with my pdf code.

sed 's,MODULE_PATHNAME,$libdir/pdftotext,g' pdftotext.sql.in >pdftotext.sql
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g -fpic -I/usr/local/include/poppler -I. -I/usr/local/pgsql-8.3.7/include/server -I/usr/local/pgsql-8.3.7/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o pdftotext.o pdftotext.c
g++ -I/usr/local/include/poppler -I. -I/usr/local/pgsql-8.3.7/include/server -I/usr/local/pgsql-8.3.7/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o poppler_compat.o poppler_compat.cc
ar crs libpdftotext.a pdftotext.o poppler_compat.o
ranlib libpdftotext.a
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g -fpic -shared -Wl,-soname,libpdftotext.so.0 pdftotext.o poppler_compat.o -L/usr/local/pgsql-8.3.7/lib -lpoppler -Wl,-rpath,'/usr/local/pgsql-8.3.7/lib' -o libpdftotext.so.0.0
/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../x86_64-suse-linux/bin/ld: poppler_compat.o: relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
poppler_compat.o: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [libpdftotext.so.0.0] Error 1

Since the gcc line has it, it must be the g++ line that's the problem?

-Kevin
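
The observation above can be checked mechanically. This is a sketch only: the function name compile_lines_without_pic is hypothetical, and it assumes each compiler command in the captured make output sits on a single line.

```shell
# Read make output on stdin and print compile commands (those with -c)
# that were run without -fpic or -fPIC.
compile_lines_without_pic() {
    grep -E -- ' -c( |$)' | grep -Ev -- '-f(pic|PIC)( |$)'
}

# Usage:
#   make USE_PGXS=1 2>&1 | compile_lines_without_pic
```

Run against the build log in this message, it would single out the g++ line for poppler_compat.cc, which is the object the linker complains about.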


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PGXS problem with pdftotext
Date: 2009-07-03 19:10:16
Message-ID: 26095.1246648216@sss.pgh.pa.us
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> What does pg_config report for the various FLAGS variables?

> CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
> CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline
> -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
> -fwrapv -g
> CFLAGS_SL = -fpic
> LDFLAGS = -Wl,-rpath,'/usr/local/pgsql-8.3.7/lib'
> LDFLAGS_SL =

Well, that looks about right, so the next question is why the CFLAGS
value isn't getting used in your build. What's the whole output of
make when you try to build your module?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PGXS problem with pdftotext
Date: 2009-07-03 19:12:26
Message-ID: 470.1246648346@sss.pgh.pa.us
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Since the gcc line has it, it must be the g++ line that's the problem?

Hmm, try adding
CXXFLAGS = $(CFLAGS)

Although in general we don't try very hard to support C++ code inside
the backend.

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PGXS problem with pdftotext
Date: 2009-07-03 19:19:41
Message-ID: 4A4E137D0200002500028375@gw.wicourts.gov
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Hmm, try adding
> CXXFLAGS = $(CFLAGS)

Thanks, that worked; I don't need to specify -fpic in my file if I put
the above line in.

> Although in general we don't try very hard to support C++ code
> inside the backend.

I try to avoid it when possible. The C++ code is the thinnest wrapper
we could arrange around the poppler code to allow access from the C
code.

Would it make sense to add the above to the PGXS file somewhere, for
those cases (like this) when someone has to access some existing C++
code base?

-Kevin
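
The fix that worked in this thread was the Makefile line CXXFLAGS = $(CFLAGS). As an alternative sketch that leaves the Makefile untouched, the same flags can be passed to make on the command line, since command-line variables override Makefile assignments. The helper name pgxs_cxxflags and the PG_CONFIG environment override are assumptions for illustration, --cflags and --cflags_sl are existing pg_config options, and this variant has not been tried against the build discussed here.

```shell
# Assemble C++ compiler flags from the values PostgreSQL was built with:
# the general CFLAGS plus CFLAGS_SL (which carries -fpic on Linux).
pgxs_cxxflags() {
    pgc="${PG_CONFIG:-pg_config}"
    printf '%s %s\n' "$("$pgc" --cflags)" "$("$pgc" --cflags_sl)"
}

# Usage:
#   make USE_PGXS=1 CXXFLAGS="$(pgxs_cxxflags)"
```

Some of the C-only warning options in CFLAGS may draw warnings from g++, so trimming the list could be worthwhile if the output gets noisy.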