Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-10 08:03:43
Message-ID: AANLkTikfwBxBpBGnc0heTFjCmj-LCFY7VLAHD+BRzKvo@mail.gmail.com
Lists: pgsql-hackers

Updated JSON datatype patch. It cleans up the major problems that
have been discussed, and it's very close to being commit-worthy (I
think). The major issues as I see them are:

* Contains several utility functions that may be useful in general.
They are all in util.c / util.h
* It's still a contrib module
* No json_agg or json_object functions for constructing arrays / objects

The utility functions, and their potential to collide with themselves
in the future, are the main problem with this patch. Of course, this
problem could be sidestepped simply by namespacifying them (prepending
json_ to all the function names). I would like some thoughts and
opinions about the design and usefulness of the utility code.

An overview, along with my thoughts, of the utility functions:

FN_EXTRA, FN_EXTRA_ALLOC, FN_MCXT macros
* Useful-ometer: ()--------------------o
* Rationale: Using fcinfo->flinfo->fn_extra takes a lot of
boilerplate. These macros help cut down the boilerplate, and the
comment explains what fn_extra is all about.
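
For readers who have not used fn_extra, here is a minimal standalone
sketch of the caching pattern the macros are meant to shorten. It is not
the code from util.h: SketchFlinfo, SketchCallInfo, the SKETCH_* macros
and counted_call are hypothetical stand-ins so the example compiles
without the PostgreSQL headers, and calloc stands in for allocating in
the fn_mcxt memory context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the per-call-site lookup info (NOT PostgreSQL's FmgrInfo). */
typedef struct SketchFlinfo {
    void *fn_extra;             /* opaque slot, NULL until the first call */
} SketchFlinfo;

/* Stand-in for the per-call data passed to every function. */
typedef struct SketchCallInfo {
    SketchFlinfo *flinfo;
} SketchCallInfo;

/* Hypothetical helpers in the spirit of FN_EXTRA / FN_EXTRA_ALLOC. */
#define SKETCH_FN_EXTRA(fcinfo) ((fcinfo)->flinfo->fn_extra)
#define SKETCH_FN_EXTRA_ALLOC(fcinfo, size) \
    ((fcinfo)->flinfo->fn_extra = calloc(1, (size)))

/* State that should survive across calls from the same call site. */
typedef struct CallCounter {
    int calls;
} CallCounter;

/* Allocates the state on the first call and reuses it afterwards. */
static int counted_call(SketchCallInfo *fcinfo)
{
    CallCounter *state = SKETCH_FN_EXTRA(fcinfo);

    if (state == NULL) {
        state = SKETCH_FN_EXTRA_ALLOC(fcinfo, sizeof(CallCounter));
        assert(state != NULL);  /* a real backend would report an error */
    }
    return ++state->calls;
}
```

Calling counted_call twice with the same SketchCallInfo returns 1 and
then 2, because the second call finds the state already stored in the
fn_extra slot. In a real backend function the allocation would go through
MemoryContextAlloc on fcinfo->flinfo->fn_mcxt so that the state lives as
long as the call site does.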

TypeInfo structure and getTypeInfo function
* Useful-ometer: ()---------------------------o
* Rationale: The get_type_io_data "six-fer" function is very
cumbersome to use, since one has to declare all the output variables.
The getTypeInfo function puts the results in a structure. It also performs the
fmgr_info_cxt step, which is a step done after every usage of
get_type_io_data in the PostgreSQL code.

getEnumLabelOids
* Useful-ometer: ()-----------------------------------o
* Rationale: There is currently no streamlined way to return a custom
enum value from a PostgreSQL function written in C. This function
performs a batch lookup of enum OIDs, which can then be cached with
fn_extra. This should be reasonably efficient, and it's quite elegant
to use (see json_op.c for an example).

UTF-8 functions:
    utf8_substring
    utf8_decode_char
        (there's a patch in the works for a utf8_to_unicode function
        which does the same thing as this function)
    utf8_validate
        (variant of pg_verify_mbstr(PG_UTF8, str, length, true)
        that allows '\0' characters)
    server_to_utf8
    utf8_to_server
    text_to_utf8_cstring
    utf8_cstring_to_text
    utf8_cstring_to_text_with_len
* Useful-ometer: ()-------o
* Rationale: The JSON code primarily operates in UTF-8 rather than
the server encoding because it needs to deal with Unicode escapes, and
there isn't an efficient way to encode/decode Unicode codepoints
to/from the server encoding. These functions make it easy to perform
encoding conversions needed for the JSON datatype. However, they're
not very useful when operating solely in the server encoding, hence
the low usefulometric reading.
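
To illustrate why working in UTF-8 makes Unicode escapes easy to handle,
here is a minimal standalone sketch of decoding, encoding, and validating
UTF-8. It is not the code from the patch: the sketch_utf8_* names and
signatures are hypothetical, and the validator accepts embedded '\0'
bytes only to mirror the behavior described for utf8_validate above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (at most len bytes) into *cp.
 * Returns the number of bytes consumed (1-4), or 0 if invalid. */
static size_t sketch_utf8_decode(const unsigned char *s, size_t len,
                                 uint32_t *cp)
{
    uint32_t c, min;
    size_t n, i;

    assert(s != NULL && cp != NULL);
    if (len == 0)
        return 0;
    if (s[0] < 0x80) {                      /* ASCII, including '\0' */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        n = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        n = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        n = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                           /* stray continuation or bad lead */
    }
    if (len < n)
        return 0;                           /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                       /* not a continuation byte */
        c = (c << 6) | (uint32_t) (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return n;
}

/* Encode code point cp as UTF-8 into out.
 * Returns the number of bytes written (1-4), or 0 if cp is not encodable. */
static size_t sketch_utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* lone surrogate */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Returns 1 if the whole buffer is valid UTF-8 (embedded '\0' allowed),
 * otherwise 0. */
static int sketch_utf8_validate(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        uint32_t cp;
        size_t n = sketch_utf8_decode(s + i, len - i, &cp);

        if (n == 0)
            return 0;
        i += n;
    }
    return 1;
}
```

With helpers like these, a \uXXXX escape can be turned into bytes by
parsing the hex digits into a code point and passing it to the encoder,
with no lookup in the server encoding's conversion tables.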

As for the JSON datatype support itself, nobody has come out against
making JSON a core datatype rather than a contrib module, so I will
proceed with making it one. I guess this would involve adding entries
to pg_type.h and pg_proc.h. Where would I put the rest of the code?
I guess json_io.c and json_op.c (the PG_FUNCTION_ARGS functions) would
become json.c in src/backend/utils/adt. Where would json.c and
jsonpath.c (JSON encoding/decoding functions and JSONPath
implementation) go?

Are there any other issues with the JSON code I didn't spot?

Thanks,

Joey Adams

Attachment: json-datatype-wip-02.diff (application/octet-stream, 173.4 KB)
