Re: jsonb and nested hstore

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb and nested hstore
Date: 2014-02-03 15:22:52
Message-ID: CAHyXU0xGMcUkTv7WPpqeiZXSmBvY29C5JovgVM8EpnhZ+LhVqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 1, 2014 at 4:20 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-01-30 14:07:42 -0500, Andrew Dunstan wrote:
>> + <para id="functions-json-table">
>> + <xref linkend="functions-json-creation-table"> shows the functions that are
>> + available for creating <type>json</type> values.
>> + (see <xref linkend="datatype-json">)
>> </para>
>>
>> - <table id="functions-json-table">
>> - <title>JSON Support Functions</title>
>> + <indexterm>
>> + <primary>array_to_json</primary>
>> + </indexterm>
>> + <indexterm>
>> + <primary>row_to_json</primary>
>> + </indexterm>
>> + <indexterm>
>> + <primary>to_json</primary>
>> + </indexterm>
>> + <indexterm>
>> + <primary>json_build_array</primary>
>> + </indexterm>
>> + <indexterm>
>> + <primary>json_build_object</primary>
>> + </indexterm>
>> + <indexterm>
>> + <primary>json_object</primary>
>> + </indexterm>
>
> Hm, why are you collecting the indexterms at the top in the contrast to
> the previous way of collecting them at the point of documentation?
>
>> diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
>> index 1ae9fa0..fd93d9b 100644
>> --- a/src/backend/utils/adt/Makefile
>> +++ b/src/backend/utils/adt/Makefile
>> @@ -32,7 +32,8 @@ OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \
>> tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \
>> tsvector.o tsvector_op.o tsvector_parser.o \
>> txid.o uuid.o windowfuncs.o xml.o rangetypes_spgist.o \
>> - rangetypes_typanalyze.o rangetypes_selfuncs.o
>> + rangetypes_typanalyze.o rangetypes_selfuncs.o \
>> + jsonb.o jsonb_support.o
>
> Odd, most OBJS lines are kept in alphabetical order, but that doesn't
> seem to be the case here.
>
>> +/*
>> + * for jsonb we always want the de-escaped value - that's what's in token
>> + */
>> +
>
> strange newline.
>
>> +static void
>> +jsonb_in_scalar(void *state, char *token, JsonTokenType tokentype)
>> +{
>> + JsonbInState *_state = (JsonbInState *) state;
>> + JsonbValue v;
>> +
>> + v.size = sizeof(JEntry);
>> +
>> + switch (tokentype)
>> + {
>> +
> ...
>
>> + default: /* nothing else should be here in fact */
>> + break;
>
> Shouldn't this at least Assert(false) or something?
>
>> +static void
>> +recvJsonbValue(StringInfo buf, JsonbValue *v, uint32 level, int c)
>> +{
>> + uint32 hentry = c & JENTRY_TYPEMASK;
>> +
>> + if (hentry == JENTRY_ISNULL)
>> + {
>> + v->type = jbvNull;
>> + v->size = sizeof(JEntry);
>> + }
>> + else if (hentry == JENTRY_ISOBJECT || hentry == JENTRY_ISARRAY || hentry == JENTRY_ISCALAR)
>> + {
>> + recvJsonb(buf, v, level + 1, (uint32) c);
>> + }
>> + else if (hentry == JENTRY_ISFALSE || hentry == JENTRY_ISTRUE)
>> + {
>> + v->type = jbvBool;
>> + v->size = sizeof(JEntry);
>> + v->boolean = (hentry == JENTRY_ISFALSE) ? false : true;
>> + }
>> + else if (hentry == JENTRY_ISNUMERIC)
>> + {
>> + v->type = jbvNumeric;
>> + v->numeric = DatumGetNumeric(DirectFunctionCall3(numeric_recv, PointerGetDatum(buf),
>> + Int32GetDatum(0), Int32GetDatum(-1)));
>> +
>> + v->size = sizeof(JEntry) * 2 + VARSIZE_ANY(v->numeric);
>
> What's the *2 here?
>
>> +static void
>> +recvJsonb(StringInfo buf, JsonbValue *v, uint32 level, uint32 header)
>> +{
>> + uint32 hentry;
>> + uint32 i;
>
> This function and recvJsonbValue call each other recursively, afaics
> without any limit, shouldn't they check for the stack depth?
>
>> + hentry = header & JENTRY_TYPEMASK;
>> +
>> + v->size = 3 * sizeof(JEntry);
>
> *3?
>
>> + if (hentry == JENTRY_ISOBJECT)
>> + {
>> + v->type = jbvHash;
>> + v->hash.npairs = header & JB_COUNT_MASK;
>> + if (v->hash.npairs > 0)
>> + {
>> + v->hash.pairs = palloc(sizeof(*v->hash.pairs) * v->hash.npairs);
>> +
>
> Hm, if I understand correctly, we're just allocating JB_COUNT_MASK
> (which is 0x0FFFFFFF) * sizeof(*v->hash.pairs) bytes here, without any
> crosschecks about the actual length of the data? So with a few bytes the
> server can be coaxed to allocate a gigabyte of data?
> Since this immediately calls another input routine, this can be done in
> a nested fashion, quickly OOMing the server.
>
> I think this and several other places really need a bit more input
> sanity checking.
>
>> + for (i = 0; i < v->hash.npairs; i++)
>> + {
>> + recvJsonbValue(buf, &v->hash.pairs[i].key, level, pq_getmsgint(buf, 4));
>> + if (v->hash.pairs[i].key.type != jbvString)
>> + elog(ERROR, "jsonb's key could be only a string");
>
> Shouldn't that be an ereport(ERRCODE_DATATYPE_MISMATCH)? Similar in a
> few other places.
>
>> +char *
>> +JsonbToCString(StringInfo out, char *in, int estimated_len)
>> +{
>> + bool first = true;
>> + JsonbIterator *it;
>> + int type;
>> + JsonbValue v;
>> + int level = 0;
>> +
>> + if (out == NULL)
>> + out = makeStringInfo();
>
> Such a behaviour certainly deserves a documentary comment. Generally
> some more functions could use that.
>
>> + while ((type = JsonbIteratorGet(&it, &v, false)) != 0)
>> + {
>> +reout:
>> + switch (type)
>> + {
> ...
>> + {
>> + Assert(type == WJB_BEGIN_OBJECT || type == WJB_BEGIN_ARRAY);
>> + goto reout;
>
> Hrmpf.
>
>> +Datum
>> +jsonb_typeof(PG_FUNCTION_ARGS)
>> +{
> ...
>> +}
>
> Hm, shouldn't that be in jsonfuncs.c?
>
>> diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
>> index a19b222..f1eacc6 100644
>> --- a/src/backend/utils/adt/jsonfuncs.c
>> +++ b/src/backend/utils/adt/jsonfuncs.c
>> @@ -27,6 +27,7 @@
>> #include "utils/builtins.h"
>> #include "utils/hsearch.h"
>> #include "utils/json.h"
>> +#include "utils/jsonb.h"
>> #include "utils/jsonapi.h"
>> #include "utils/lsyscache.h"
>> #include "utils/memutils.h"
>> @@ -51,6 +52,7 @@ static inline Datum get_path_all(PG_FUNCTION_ARGS, bool as_text);
>> static inline text *get_worker(text *json, char *field, int elem_index,
>> char **tpath, int *ipath, int npath,
>> bool normalize_results);
>> +static inline Datum get_jsonb_path_all(PG_FUNCTION_ARGS, bool as_text);
>
> I don't see the point of using PG_FUNCTION_ARGS if you're manually
> calling it like
> + return get_jsonb_path_all(fcinfo, false);
>
> That just makes it harder if someday PG_FUNCTION_ARGS grows a second
> argument or something.
>
>
>> +Datum
>> +jsonb_object_keys(PG_FUNCTION_ARGS)
>> +{
>> + FuncCallContext *funcctx;
>> + OkeysState *state;
>> + int i;
>> +
>> + if (SRF_IS_FIRSTCALL())
>> + {
>> + MemoryContext oldcontext;
>> + Jsonb *jb = PG_GETARG_JSONB(0);
>> + bool skipNested = false;
>> + JsonbIterator *it;
>> + JsonbValue v;
>> + int r = 0;
>> +
>> + if (JB_ROOT_IS_SCALAR(jb))
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> + errmsg("cannot call jsonb_object_keys on a scalar")));
>> + else if (JB_ROOT_IS_ARRAY(jb))
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> + errmsg("cannot call jsonb_object_keys on an array")));
>> +
>> + funcctx = SRF_FIRSTCALL_INIT();
>> + oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
>
> This will detoast 'jb' into the expression context, since
> PG_GETARG_JSONB() is called before the MemoryContextSwitchTo. But that's
> ok since the percall code only deals with ->result, right?
>
>> - /* make these in a sufficiently long-lived memory context */
>> old_cxt = MemoryContextSwitchTo(rsi->econtext->ecxt_per_query_memory);
>
> wh remove that comment?
>
>> +#define JENTRY_ISCALAR (0x10000000 | 0x40000000)
>
> Isn't there an S missing here?
>
>> --- a/contrib/hstore/hstore_compat.c
>> +++ b/contrib/hstore/hstore_compat.c
>> +/*
>> + * New Old version (new not-nested version of hstore, v2 version)
>> + * V2 and v3 (nested) are upward binary compatible. But
>> + * framework was fully changed. Keep here old definitions (v2)
>> + */
>
> That's an, err, interesting sentence. I think referring to old new
> version and stuff is less than helpful. I realize lots of that is
> baggage from existing code, but yet another version doesn't make it
> easier.
>
> I lost my stomach (or maybe it was the glass of red) somewhere in the
> middle, but I think this needs a lot of work. Especially the io code
> doesn't seem ready to me. I'd consider ripping out the send/recv code
> for 9.4, that seems the biggest can of worms. It will still be usable
> without.

Not having type send/recv functions is somewhat dangerous; it can
cause problems for libraries that run everything through the binary
wire format. I'd give jsonb a pass on that, being a new type, but
would be concerned if hstore had that ability revoked.

offhand note: hstore_send seems pretty simply written and clean; it's
a simple nonrecursive iterator...

merlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2014-02-03 15:24:41 Re: GIN improvements part2: fast scan
Previous Message Tom Lane 2014-02-03 15:20:57 Re: bgworker crashed or not?