Re: Call for GIST/GIN/SP-GIST opclass documentation

Lists: pgsql-docspgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org, pgsql-docs(at)postgreSQL(dot)org
Subject: Call for GIST/GIN/SP-GIST opclass documentation
Date: 2014-04-08 20:41:06
Message-ID: 27299.1396989666@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

I just created sections in the SGML manual chapters about GIST, GIN, and
SP-GIST to hold documentation about the standard opclasses provided for
them:

http://www.postgresql.org/docs/devel/static/gist-builtin-opclasses.html
http://www.postgresql.org/docs/devel/static/gin-builtin-opclasses.html
http://www.postgresql.org/docs/devel/static/spgist-builtin-opclasses.html

We never had any well-defined place to discuss such opclasses before,
but I think it's past time we did. I envision these sections as places to
document, for example, the difference between the two jsonb gin opclasses.
I put this text in about that:

Of the two operator classes for type jsonb, jsonb_ops is the
default. jsonb_hash_ops supports fewer operators but will work with
larger indexed values than jsonb_ops can support.

Is that accurate? Do we need to say more?

For the moment what's there is mostly just tables of the core opclasses
and the operators they support. If anyone can think of specific additions
that should be there, please send in patches.

(BTW, I didn't worry about btree and hash because they don't have such
a wide variety of opclass behaviors.)

regards, tom lane


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-docs(at)postgresql(dot)org
Subject: Re: Call for GIST/GIN/SP-GIST opclass documentation
Date: 2014-04-08 21:03:46
Message-ID: CAM3SWZQGZyqSBzaFZ4gmtnQZpcXLE=ceUL_O29gtMgnP=AvC+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Tue, Apr 8, 2014 at 1:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I just created sections in the SGML manual chapters about GIST, GIN, and
> SP-GIST to hold documentation about the standard opclasses provided for
> them:

I think that that's a good idea. I too was bothered by this omission.

> Of the two operator classes for type jsonb, jsonb_ops is the
> default. jsonb_hash_ops supports fewer operators but will work with
> larger indexed values than jsonb_ops can support.
>
> Is that accurate? Do we need to say more?

Well, I'm not sure that it's worth noting there, but as you probably
already know jsonb_hash_ops will perform a lot better than the default
GIN opclass, and will have much smaller indexes. FWIW I think that the
size limitation is overblown, and performance is in fact the
compelling reason to prefer jsonb_hash_ops, although it's probably
incongruous to explain the issues that way in this section of the
docs. It probably suffices that that is covered in the "JSON Types"
section.

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-docs(at)postgresql(dot)org
Subject: Re: Call for GIST/GIN/SP-GIST opclass documentation
Date: 2014-04-08 21:34:01
Message-ID: 28589.1396992841@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Tue, Apr 8, 2014 at 1:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Of the two operator classes for type jsonb, jsonb_ops is the
>> default. jsonb_hash_ops supports fewer operators but will work with
>> larger indexed values than jsonb_ops can support.
>>
>> Is that accurate? Do we need to say more?

> Well, I'm not sure that it's worth noting there, but as you probably
> already know jsonb_hash_ops will perform a lot better than the default
> GIN opclass, and will have much smaller indexes. FWIW I think that the
> size limitation is overblown, and performance is in fact the
> compelling reason to prefer jsonb_hash_ops, although it's probably
> incongruous to explain the issues that way in this section of the
> docs. It probably suffices that that is covered in the "JSON Types"
> section.

Well, the subtext is whether we should move that discussion to this
new section. I think there is some comparable discussion in the
full-text-indexing chapter, too.

(BTW, wasn't there some discussion of changing our minds about which
one is the default? We already have one bug report complaining about
jsonb_ops' size restriction, so that seems to be evidence in favor
of changing ...)

regards, tom lane


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-docs(at)postgresql(dot)org
Subject: Re: Call for GIST/GIN/SP-GIST opclass documentation
Date: 2014-04-08 21:39:29
Message-ID: CAM3SWZRAxisGCdODhbgeiNL6tayqL18XMzun4W3xesASmKfwDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Tue, Apr 8, 2014 at 2:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> (BTW, wasn't there some discussion of changing our minds about which
> one is the default? We already have one bug report complaining about
> jsonb_ops' size restriction, so that seems to be evidence in favor
> of changing ...)

Yes, there was. I very nearly came down on the side of making
jsonb_hash_ops the default, but given that it doesn't make all
operators indexable, I ultimately decided against supporting that
course of action. I thought that that would be an odd limitation for
the default GIN opclass to have. It was a very close call in my mind,
and if you favor changing the default now, in light of the few
complaints we've heard, I think that's a reasonable decision. That
said, as I noted in the main -bugs thread, the case presented is
fairly atypical.

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-08 21:46:22
Message-ID: 29030.1396993582@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Tue, Apr 8, 2014 at 2:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> (BTW, wasn't there some discussion of changing our minds about which
>> one is the default? We already have one bug report complaining about
>> jsonb_ops' size restriction, so that seems to be evidence in favor
>> of changing ...)

> Yes, there was. I very nearly came down on the side of making
> jsonb_hash_ops the default, but given that it doesn't make all
> operators indexable, I ultimately decided against supporting that
> course of action. I thought that that would be an odd limitation for
> the default GIN opclass to have. It was a very close call in my mind,
> and if you favor changing the default now, in light of the few
> complaints we've heard, I think that's a reasonable decision. That
> said, as I noted in the main -bugs thread, the case presented is
> fairly atypical.

Well, let me see if I understand the situation correctly:

* jsonb_ops supports more operators

* jsonb_hash_ops produces smaller, better-performing indexes

* jsonb_ops falls over on inputs with wide field values, but
jsonb_hash_ops does not

If that's an accurate summary then I would say that we've got
the default backwards. I would much rather tell people "you
can have more operators supported, but here are the tradeoffs"
than have a default that fails under evidently-not-so-improbable
cases.

regards, tom lane


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-08 21:57:44
Message-ID: CAM3SWZQzOxmryVe_bdu9WhqbJTBoiqM7knvKm0oRYxvfsFnz7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Well, let me see if I understand the situation correctly:
>
> * jsonb_ops supports more operators
>
> * jsonb_hash_ops produces smaller, better-performing indexes
>
> * jsonb_ops falls over on inputs with wide field values, but
> jsonb_hash_ops does not

There might be some compelling cases for indexing existence rather
than containment, since the recheck flag isn't set there, but in
general this summary seems sound. I would say that broadly, existence
is a less useful operator than containment, and so jsonb_hash_ops is
broadly more compelling. I didn't propose changing the default due to
concerns about the POLA, but I'm happy to be told that those concerns
were out of proportion to the practical benefits of a different
default.

--
Peter Geoghegan


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-08 22:18:31
Message-ID: 534475B7.6020908@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers


On 04/08/2014 05:57 PM, Peter Geoghegan wrote:
> On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Well, let me see if I understand the situation correctly:
>>
>> * jsonb_ops supports more operators
>>
>> * jsonb_hash_ops produces smaller, better-performing indexes
>>
>> * jsonb_ops falls over on inputs with wide field values, but
>> jsonb_hash_ops does not
> There might be some compelling cases for indexing existence rather
> than containment, since the recheck flag isn't set there, but in
> general this summary seems sound. I would say that broadly, existence
> is a less useful operator than containment, and so jsonb_hash_ops is
> broadly more compelling. I didn't propose changing the default due to
> concerns about the POLA, but I'm happy to be told that those concerns
> were out of proportion to the practical benefits of a different
> default.
>

I tend to agree with Tom that POLA will be more violated by the default
ops class not being able to index some values.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 02:48:32
Message-ID: 2742.1397011712@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 04/08/2014 05:57 PM, Peter Geoghegan wrote:
>> ... I didn't propose changing the default due to
>> concerns about the POLA, but I'm happy to be told that those concerns
>> were out of proportion to the practical benefits of a different
>> default.

> I tend to agree with Tom that POLA will be more violated by the default
> ops class not being able to index some values.

We should wait a bit longer to see if anyone objects, but assuming that
this represents the consensus opinion ...

ISTM that the name "jsonb_ops" should have pride of place as the default
jsonb opclass. Therefore, if we make this change, jsonb_hash_ops needs to
be renamed to jsonb_ops, and we need a new name for what is now jsonb_ops.
I haven't paid attention to the technical details of the differences so
I have no idea what to suggest for the new name. Thoughts?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, pgsql-docs <pgsql-docs(at)postgresql(dot)org>
Subject: Re: Call for GIST/GIN/SP-GIST opclass documentation
Date: 2014-04-09 04:23:43
Message-ID: CA+TgmoYSELVHXdFweJgTfHGK65iFSkz+3bN63_y67BZEa-Fwwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Tue, Apr 8, 2014 at 4:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I just created sections in the SGML manual chapters about GIST, GIN, and
> SP-GIST to hold documentation about the standard opclasses provided for
> them:
>
> http://www.postgresql.org/docs/devel/static/gist-builtin-opclasses.html
> http://www.postgresql.org/docs/devel/static/gin-builtin-opclasses.html
> http://www.postgresql.org/docs/devel/static/spgist-builtin-opclasses.html
>
> We never had any well-defined place to discuss such opclasses before,
> but I think it's past time we did.

+1. Great idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 06:37:24
Message-ID: 5344EAA4.1050605@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On 04/09/2014 01:18 AM, Andrew Dunstan wrote:
>
> On 04/08/2014 05:57 PM, Peter Geoghegan wrote:
>> On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Well, let me see if I understand the situation correctly:
>>>
>>> * jsonb_ops supports more operators
>>>
>>> * jsonb_hash_ops produces smaller, better-performing indexes
>>>
>>> * jsonb_ops falls over on inputs with wide field values, but
>>> jsonb_hash_ops does not
>> There might be some compelling cases for indexing existence rather
>> than containment, since the recheck flag isn't set there, but in
>> general this summary seems sound. I would say that broadly, existence
>> is a less useful operator than containment, and so jsonb_hash_ops is
>> broadly more compelling. I didn't propose changing the default due to
>> concerns about the POLA, but I'm happy to be told that those concerns
>> were out of proportion to the practical benefits of a different
>> default.
>
> I tend to agree with Tom that POLA will be more violated by the default
> ops class not being able to index some values.

Yeah.

<rant>

Both of the operator classes are actually much less flexible than I'd
like. Firstly, they index everything. In many cases, that's not what you
want, so you end up with much larger indexes than necessary. Secondly,
jsonb_ops indexes all values separately from the keys. That makes the
index pretty much useless for a query on, say, WHERE json @>
'{"needs_processing":true}', if all the rows also contain a key-value
pair "active":true. Thirdly, inequality operators are not supported; you
can't search for rows with (the json-syntax equivalent of) "price <
12.3". Fourthly, sometimes you would want to include the "path" to an
entry in the key, sometimes not.

If I understood correctly the way jsonb_hash_ops works, the limitation
compared to jsonb_ops is that it cannot be used for foo ? 'bar' type
queries. And the reason for that limitation is that it hashes the whole
path to the key; the naked values are not indexes separately. But why
not? jsonb_ops does - why is that decision related to whether you hash
or not? Or it could index both. Sure, it would be wasteful when you
don't need to support foo ? 'bar', but the point is that it should be up
to the DBA to decide, based on his needs.

As the code stands, you don't have a choice on any of those things. The
decisions have been made by us, PostgreSQL developers. The only choice
you have is between jsonb_ops and jsonb_hash_ops, with a strange
combination of tradeoffs in both. Sure, they're still useful, if not
optimal, for a wide-range of applications. For more complicated cases,
you will have to resort to expression indexes. It bugs me greatly that
the underlying indexam could do all those things, we're just not
exposing the capability.

ISTM we need a way to parameterize opclasses, so that when you create
the index, you specify the above things.

</rant>

The ship has cleatly sailed to add parameterized opclasses to 9.4, but
let's keep it in mind when we decide on the defaults.

In the absence of parameterizable opclasses, it would be much more
flexible to have opclasses that index, keys, values, key-value pairs and
paths separately, instead of the current json_ops and json_hash_ops
opclasses which index all of those in the same index. That way, if you
only e.g. ever query on the existence of a key, you'd only need to index
the keys.

I don't understand how we ended up with the current dichotomy of
json_ops and json_hash_ops...

- Heikki


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 07:40:45
Message-ID: CAM3SWZS-12EHfqvDzTbtkEsQHfVLJxx8XP1=jrCXK2wh9uKW-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Tue, Apr 8, 2014 at 11:37 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> As the code stands, you don't have a choice on any of those things. The
> decisions have been made by us, PostgreSQL developers. The only choice you
> have is between jsonb_ops and jsonb_hash_ops, with a strange combination of
> tradeoffs in both. Sure, they're still useful, if not optimal, for a
> wide-range of applications. For more complicated cases, you will have to
> resort to expression indexes. It bugs me greatly that the underlying indexam
> could do all those things, we're just not exposing the capability.

Why would you ever not have to use expression indexes? Idiomatic usage
of jsonb involves expression indexes because it's desirable to index
only a expression. People will want to do things like only index the
nested "tags" array far more frequently then they'll only want to
index keys (that is, Object pair keys) in the entire document. I don't
get why you'd say that they'd "resort" to expression indexes, like
they're a kludge. Have you ever tried out one of the new document
databases? I suggest you do. Expression indexes on jsonb map pretty
closely onto how you're frequently expected to index data in those
systems. That's something that they make heavy use of. Why would you
ever not really have to consider ahead of time what is important
enough to be indexed, and what is not?

> ISTM we need a way to parameterize opclasses, so that when you create the
> index, you specify the above things.

That would be nice.

> In the absence of parameterizable opclasses, it would be much more flexible
> to have opclasses that index, keys, values, key-value pairs and paths
> separately, instead of the current json_ops and json_hash_ops opclasses
> which index all of those in the same index. That way, if you only e.g. ever
> query on the existence of a key, you'd only need to index the keys.

I think only ever needing to index the keys is not a common use-case.
It pretty much makes exactly as much sense to do so as it would with
hstore, and yet hstore doesn't support that after all these years.

> I don't understand how we ended up with the current dichotomy of json_ops
> and json_hash_ops...

It makes sense if you consider jsonb_ops best suited to simpler
hstore-style indexing, while jsonb_hash_ops is best suited to testing
containment of JSON documents, potentially with lots of nesting. These
documents are typically homogeneous in structure. Idiomatic usage of
systems like MongoDB involves "collections" of fairly homogeneous
documents. If there is a lot of variability in their structure within
a collection, the collection more or less becomes impossible to
usefully query. They aim to be flexible, but still implicitly require
you to insert data with a half-way sensible/consistent structure. This
makes separately indexing the keys less than compelling as a default,
because there is so much duplication of keys in practice.

--
Peter Geoghegan


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 08:21:26
Message-ID: 53450306.8040104@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On 04/09/2014 10:40 AM, Peter Geoghegan wrote:
> On Tue, Apr 8, 2014 at 11:37 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> As the code stands, you don't have a choice on any of those things. The
>> decisions have been made by us, PostgreSQL developers. The only choice you
>> have is between jsonb_ops and jsonb_hash_ops, with a strange combination of
>> tradeoffs in both. Sure, they're still useful, if not optimal, for a
>> wide-range of applications. For more complicated cases, you will have to
>> resort to expression indexes. It bugs me greatly that the underlying indexam
>> could do all those things, we're just not exposing the capability.
>
> Why would you ever not have to use expression indexes? Idiomatic usage
> of jsonb involves expression indexes because it's desirable to index
> only a expression. People will want to do things like only index the
> nested "tags" array far more frequently then they'll only want to
> index keys (that is, Object pair keys) in the entire document. I don't
> get why you'd say that they'd "resort" to expression indexes, like
> they're a kludge.

Expression indexes are definitely nice, but you have to be careful to
formulate the query in exactly the same way to match the index.

> Have you ever tried out one of the new document
> databases? I suggest you do. Expression indexes on jsonb map pretty
> closely onto how you're frequently expected to index data in those
> systems. That's something that they make heavy use of. Why would you
> ever not really have to consider ahead of time what is important
> enough to be indexed, and what is not?

I didn't say that. On the contrary, I think the shotgun approach
jsonb_ops and jsonb_hash_ops take is too broad. It should be possible to
specify what to index in a more detailed fashion.

- Heikki


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 08:40:39
Message-ID: CAM3SWZQbjqippSdQU2RHFTOe_SyoMcH_MOCcUeYPA0NWPjYqLQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Wed, Apr 9, 2014 at 1:21 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> I didn't say that. On the contrary, I think the shotgun approach jsonb_ops
> and jsonb_hash_ops take is too broad. It should be possible to specify what
> to index in a more detailed fashion.

It is - use an expression index. That's by far the most important way
to specify what to index in a more detailed fashion. There are others,
but that's the major one. Beyond that, yes, it's necessary to
carefully write your query predicate a certain way. However, a similar
situation exists in MongoDB, where there is a distinction between
"Indexes on embedded fields" (which must be accessed using special
"dot notation") and "indexes on subdocuments" (which cannot be
accessed using "dot notation"). It's late here, but I'm pretty sure
that's a feature and not a limitation.

--
Peter Geoghegan


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 14:57:50
Message-ID: CAPpHfdtFpdYQXFJP+EnZMjfdz-1ARaaAJgHZX+-pdHyyjuMS-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Wed, Apr 9, 2014 at 10:37 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> The ship has cleatly sailed to add parameterized opclasses to 9.4, but
> let's keep it in mind when we decide on the defaults.
>
> In the absence of parameterizable opclasses, it would be much more
> flexible to have opclasses that index, keys, values, key-value pairs and
> paths separately, instead of the current json_ops and json_hash_ops
> opclasses which index all of those in the same index. That way, if you only
> e.g. ever query on the existence of a key, you'd only need to index the
> keys.
>
> I don't understand how we ended up with the current dichotomy of json_ops
> and json_hash_ops...

+1 for parameterizable opclasses

------
With best regards,
Alexander Korotkov.


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 15:12:31
Message-ID: CAPpHfduitYNNDA8jh6a2qYyxuxROyBeobHsBuqZf0LaMKqMHJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Wed, Apr 9, 2014 at 12:40 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:

> On Wed, Apr 9, 2014 at 1:21 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
> > I didn't say that. On the contrary, I think the shotgun approach
> jsonb_ops
> > and jsonb_hash_ops take is too broad. It should be possible to specify
> what
> > to index in a more detailed fashion.
>
> It is - use an expression index. That's by far the most important way
> to specify what to index in a more detailed fashion. There are others,
> but that's the major one. Beyond that, yes, it's necessary to
> carefully write your query predicate a certain way. However, a similar
> situation exists in MongoDB, where there is a distinction between
> "Indexes on embedded fields" (which must be accessed using special
> "dot notation") and "indexes on subdocuments" (which cannot be
> accessed using "dot notation"). It's late here, but I'm pretty sure
> that's a feature and not a limitation.
>

I believe that serious limitation we now have is that we actually specify
kind of index to be used in the SQL query.
For example you need to find objects with active = true. You can write:

js @> {"active": true}

then GIN index on js can be used. Also you can write:

js->'active' = true

then btree expression index on (js->'active') can be used. For sure, one
can do

js @> {"active": true} AND js->'active' = true

This query can use any of indexes, but it is:
1) Cluge
2) Excess recheck
3) If both indexes present, excess "bitmap and".

Having to choose index in SQL-query we make our SQL more imperative and
less declarative. Similar things can happen without json/hstore (user have
to rewrite SQL in order to use expression index), but now it could become
very common. My opinion is that we have to do something in planner to make
it understand at least this two kinds of queries to be equivalent.

------
With best regards,
Alexander Korotkov.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 15:16:13
Message-ID: CA+TgmoZucoe-w5P_16A-gdfRBFCCu7PD17YO-KqrYPBhb5TxAA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> Both of the operator classes are actually much less flexible than I'd like.
> Firstly, they index everything. In many cases, that's not what you want, so
> you end up with much larger indexes than necessary. Secondly, jsonb_ops
> indexes all values separately from the keys. That makes the index pretty
> much useless for a query on, say, WHERE json @> '{"needs_processing":true}',
> if all the rows also contain a key-value pair "active":true. Thirdly,
> inequality operators are not supported; you can't search for rows with (the
> json-syntax equivalent of) "price < 12.3". Fourthly, sometimes you would
> want to include the "path" to an entry in the key, sometimes not.

Maybe we should make *neither* of these the default opclass, and give
*neither* the name json_ops.

> ISTM we need a way to parameterize opclasses, so that when you create the
> index, you specify the above things.

Yeah, that would be great.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 15:24:16
Message-ID: 30137.1397057056@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> Both of the operator classes are actually much less flexible than I'd like.

> Maybe we should make *neither* of these the default opclass, and give
> *neither* the name json_ops.

There's definitely something to be said for that. Default opclasses are
sensible when there's basically only one behavior that's interesting for
most people. We can already see that that's not going to be the case
for jsonb indexes, at least not with the currently available alternatives.

Not having a default would force users to make decisions explicitly.
Is that what we want?

One other point here is that non-default opclasses can't be used in
UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to
specify an opclass name in those syntaxes. UNIQUE/PRIMARY KEY don't
matter here since these aren't btree opclasses, but is there a
use-case for EXCLUDE with any of the supported jsonb operators?

regards, tom lane


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 15:56:33
Message-ID: 20140409155633.GW5822@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Tom Lane wrote:

> > On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas
> > <hlinnakangas(at)vmware(dot)com> wrote:
> >> Both of the operator classes are actually much less flexible than I'd like.
>
> > Maybe we should make *neither* of these the default opclass, and give
> > *neither* the name json_ops.

+1. I was thinking the same thing after reading Heikki's rant.

> One other point here is that non-default opclasses can't be used in
> UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to
> specify an opclass name in those syntaxes. UNIQUE/PRIMARY KEY don't
> matter here since these aren't btree opclasses, but is there a
> use-case for EXCLUDE with any of the supported jsonb operators?

That sounds like an oversight that could better be fixed in EXCLUDE, no?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, pgsql-docs <pgsql-docs(at)postgresql(dot)org>
Subject: Re: Call for GIST/GIN/SP-GIST opclass documentation
Date: 2014-04-09 16:03:36
Message-ID: 20140409160336.GX5822@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Robert Haas wrote:
> On Tue, Apr 8, 2014 at 4:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I just created sections in the SGML manual chapters about GIST, GIN, and
> > SP-GIST to hold documentation about the standard opclasses provided for
> > them:
> >
> > http://www.postgresql.org/docs/devel/static/gist-builtin-opclasses.html
> > http://www.postgresql.org/docs/devel/static/gin-builtin-opclasses.html
> > http://www.postgresql.org/docs/devel/static/spgist-builtin-opclasses.html
> >
> > We never had any well-defined place to discuss such opclasses before,
> > but I think it's past time we did.
>
> +1. Great idea.

Agreed.

I find that in my browser it's a bit difficult to make out the different
operators in the long list; ISTM a single space between operators isn't
enough. I wonder how can we do better; sticking commas (or any other
single character, really) between them is not likely to improve matters;
and using one column per operator is not going to work very well. Maybe
if it were possible to split a single cell in several sub-cells; or
perhaps there's a way to specify two whitespace units, or a long space,
or something like that.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 16:04:35
Message-ID: 30993.1397059475@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> Tom Lane wrote:
>> One other point here is that non-default opclasses can't be used in
>> UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to
>> specify an opclass name in those syntaxes. UNIQUE/PRIMARY KEY don't
>> matter here since these aren't btree opclasses, but is there a
>> use-case for EXCLUDE with any of the supported jsonb operators?

> That sounds like an oversight that could better be fixed in EXCLUDE, no?

Well, there hasn't been a use-case up to now. I'm not sure there's
one yet.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, pgsql-docs <pgsql-docs(at)postgresql(dot)org>
Subject: Re: Call for GIST/GIN/SP-GIST opclass documentation
Date: 2014-04-09 16:18:51
Message-ID: 31266.1397060331@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> Robert Haas wrote:
>> On Tue, Apr 8, 2014 at 4:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> http://www.postgresql.org/docs/devel/static/gist-builtin-opclasses.html
>>> http://www.postgresql.org/docs/devel/static/gin-builtin-opclasses.html
>>> http://www.postgresql.org/docs/devel/static/spgist-builtin-opclasses.html

> I find that in my browser it's a bit difficult to make out the different
> operators in the long list; ISTM a single space between operators isn't
> enough.

Yeah, I noticed that too, but wasn't sure what to do about it. One line
per operator isn't better, but I don't know how to get a bit more
horizontal space into the lists.

regards, tom lane


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 16:51:43
Message-ID: CAM3SWZQ2Pd9d7Ae3BbjyqZ5=FD+3FMaNcer9bG=xHAufbsqdPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Wed, Apr 9, 2014 at 8:24 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Maybe we should make *neither* of these the default opclass, and give
>> *neither* the name json_ops.
>
> There's definitely something to be said for that. Default opclasses are
> sensible when there's basically only one behavior that's interesting for
> most people. We can already see that that's not going to be the case
> for jsonb indexes, at least not with the currently available alternatives.

I've heard worse ideas.

--
Peter Geoghegan


From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 18:22:54
Message-ID: CAM-w4HNawF2sPDSUrZsng7zgGnste-Vy1+42AufyVq9UkVeCZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Wed, Apr 9, 2014 at 11:24 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Maybe we should make *neither* of these the default opclass, and give
>> *neither* the name json_ops.
>
> There's definitely something to be said for that. Default opclasses are
> sensible when there's basically only one behavior that's interesting for
> most people. We can already see that that's not going to be the case
> for jsonb indexes, at least not with the currently available alternatives.
>
> Not having a default would force users to make decisions explicitly.
> Is that what we want?

I don't like the idea of having no default opclass. I think there's a
huge usability gain in being able to "just" create an index on a
column and have it do something reasonable for most use cases.

I can get behind the idea of having separate index opclasses for paths
and path-value pairs but I suspect the default should just be to index
both in the same index. If we can have one default index opclass that
supports containment and existence and then other opclasses that are
smaller but only support a subset of the operators that would seem
like the best compromise.

I'm a bit confused by Heikki's list though. I would expect path and
path-value pair to be the only useful ones. I'm not clear what an
index on keys or key-value would be -- it would index just the
top-level keys and values without recursing?

--
greg


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-22 22:32:30
Message-ID: 20140422223230.GL10046@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Wed, Apr 9, 2014 at 02:22:54PM -0400, Greg Stark wrote:
> On Wed, Apr 9, 2014 at 11:24 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Maybe we should make *neither* of these the default opclass, and give
> >> *neither* the name json_ops.
> >
> > There's definitely something to be said for that. Default opclasses are
> > sensible when there's basically only one behavior that's interesting for
> > most people. We can already see that that's not going to be the case
> > for jsonb indexes, at least not with the currently available alternatives.
> >
> > Not having a default would force users to make decisions explicitly.
> > Is that what we want?
>
> I don't like the idea of having no default opclass. I think there's a
> huge usability gain in being able to "just" create an index on a
> column and have it do something reasonable for most use cases.
>
> I can get behind the idea of having separate index opclasses for paths
> and path-value pairs but I suspect the default should just be to index
> both in the same index. If we can have one default index opclass that
> supports containment and existence and then other opclasses that are
> smaller but only support a subset of the operators that would seem
> like the best compromise.
>
> I'm a bit confused by Heikki's list though. I would expect path and
> path-value pair to be the only useful ones. I'm not clear what an
> index on keys or key-value would be -- it would index just the
> top-level keys and values without recursing?

Where are we on the default JSONB opclass change?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-22 22:40:51
Message-ID: CAM3SWZRbVyijRX0vwhfap1S1o0OM1eGczJsQuL4CCrAs1++z7A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Tue, Apr 22, 2014 at 3:32 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Where are we on the default JSONB opclass change?

FWIW, I still don't have any strong opinion here. I defer to others on
this question.

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-23 00:50:20
Message-ID: 16527.1398214220@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Where are we on the default JSONB opclass change?

Not sure. I'm for changing it, I think, but it wasn't at all clear
that we had consensus on that. We did not have a proposed new name
for the opclass either ...

regards, tom lane


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-23 08:20:42
Message-ID: 535777DA.3000203@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On 23/04/14 00:40, Peter Geoghegan wrote:
> On Tue, Apr 22, 2014 at 3:32 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> Where are we on the default JSONB opclass change?
>
> FWIW, I still don't have any strong opinion here. I defer to others on
> this question.
>

I vote for changing it, even though neither option is ideal I think that
given the nature of datatype the current default will break inserts for
common usage pattern and that's much worse than not being able to use
the index for some operators.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-23 10:56:30
Message-ID: 20140423105630.GQ10046@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Wed, Apr 23, 2014 at 10:20:42AM +0200, Petr Jelinek wrote:
> On 23/04/14 00:40, Peter Geoghegan wrote:
> >On Tue, Apr 22, 2014 at 3:32 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >>Where are we on the default JSONB opclass change?
> >
> >FWIW, I still don't have any strong opinion here. I defer to others on
> >this question.
> >
>
> I vote for changing it, even though neither option is ideal I think
> that given the nature of datatype the current default will break
> inserts for common usage pattern and that's much worse than not
> being able to use the index for some operators.

I agree. We should choose the most general option as the default.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-23 12:17:49
Message-ID: CAB7nPqRGyBfCK9GKr-cqna02Mr5JnqaodQD=q54xttcZn82zRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Wed, Apr 23, 2014 at 7:56 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> On Wed, Apr 23, 2014 at 10:20:42AM +0200, Petr Jelinek wrote:
> > I vote for changing it, even though neither option is ideal I think
> > that given the nature of datatype the current default will break
> > inserts for common usage pattern and that's much worse than not
> > being able to use the index for some operators.
>
> I agree. We should choose the most general option as the default.
>
+1. Less operators are supported by the now-named jsonb_hash_ops but at
least users won't be surprised by failures caused by too long index records.
--
Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-23 13:50:50
Message-ID: 21608.1398261050@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> On Wed, Apr 23, 2014 at 10:20:42AM +0200, Petr Jelinek wrote:
>> On 23/04/14 00:40, Peter Geoghegan wrote:
>>> On Tue, Apr 22, 2014 at 3:32 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>>> Where are we on the default JSONB opclass change?

>>> FWIW, I still don't have any strong opinion here. I defer to others on
>>> this question.

>> I vote for changing it, even though neither option is ideal I think
>> that given the nature of datatype the current default will break
>> inserts for common usage pattern and that's much worse than not
>> being able to use the index for some operators.

> I agree. We should choose the most general option as the default.

That seems to be the consensus, but now we need a name for the
soon-to-be-not-default opclass. What's a good short adjective for it?

regards, tom lane


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-23 14:02:51
Message-ID: 20140423140251.GL25695@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > On Wed, Apr 23, 2014 at 10:20:42AM +0200, Petr Jelinek wrote:
> >> On 23/04/14 00:40, Peter Geoghegan wrote:
> >>> On Tue, Apr 22, 2014 at 3:32 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >>>> Where are we on the default JSONB opclass change?
>
> >>> FWIW, I still don't have any strong opinion here. I defer to others on
> >>> this question.
>
> >> I vote for changing it, even though neither option is ideal I think
> >> that given the nature of datatype the current default will break
> >> inserts for common usage pattern and that's much worse than not
> >> being able to use the index for some operators.
>
> > I agree. We should choose the most general option as the default.
>
> That seems to be the consensus, but now we need a name for the
> soon-to-be-not-default opclass. What's a good short adjective for it?

"comprehensive"? Not particularly short ...

According to Merriam Webster:
Synonyms
all-embracing, all-in [chiefly British], all-inclusive,
broad-gauge (or broad-gauged), compendious, complete,
encyclopedic, cover-all, cyclopedic, embracive, exhaustive,
full, global, inclusive, in-depth, omnibus, panoramic, thorough,
universal

Related Words
broad, catholic, encyclical, general, inclusionary, overall;
cosmic (also cosmical), extensive, far, far-reaching, grand,
large, panoptic, sweeping, vast, wide, wide-ranging; blanket,
indiscriminate, unrestricted

jsonb_omnibus_ops ?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-23 14:17:48
Message-ID: 22285.1398262668@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> Tom Lane wrote:
>> That seems to be the consensus, but now we need a name for the
>> soon-to-be-not-default opclass. What's a good short adjective for it?

> "comprehensive"? Not particularly short ...

> According to Merriam Webster:
> Synonyms
> all-embracing, all-in [chiefly British], all-inclusive,
> broad-gauge (or broad-gauged), compendious, complete,
> encyclopedic, cover-all, cyclopedic, embracive, exhaustive,
> full, global, inclusive, in-depth, omnibus, panoramic, thorough,
> universal

> Related Words
> broad, catholic, encyclical, general, inclusionary, overall;
> cosmic (also cosmical), extensive, far, far-reaching, grand,
> large, panoptic, sweeping, vast, wide, wide-ranging; blanket,
> indiscriminate, unrestricted

> jsonb_omnibus_ops ?

hm ... jsonb_full_ops seems nicely short, but on the other hand it just
begs the question "full what?". I'm a bit worried about future-proof-ness
too; what if somebody later comes up with a new opclass that indexes more
operators? We'd end up calling it jsonb_fuller_ops, ick.

I was kind of hoping for a technical adjective, like "hash" is for the
soon-to-be-default opclass. What is it about this opclass that
distinguishes it from other indexing approaches that someone might try?

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-06 20:10:49
Message-ID: 20140506201048.GI30817@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Tue, Apr 22, 2014 at 08:50:20PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Where are we on the default JSONB opclass change?
>
> Not sure. I'm for changing it, I think, but it wasn't at all clear
> that we had consensus on that. We did not have a proposed new name
> for the opclass either ...

Where are we on this question?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-06 20:18:50
Message-ID: 16769.1399407530@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> On Tue, Apr 22, 2014 at 08:50:20PM -0400, Tom Lane wrote:
>> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>>> Where are we on the default JSONB opclass change?

>> Not sure. I'm for changing it, I think, but it wasn't at all clear
>> that we had consensus on that. We did not have a proposed new name
>> for the opclass either ...

> Where are we on this question?

Stuck on the naming question. I'd be willing to do the patch legwork
if we had a consensus (or even a proposal) for what to rename the
current jsonb_ops to.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-06 21:20:20
Message-ID: 20140506212020.GK30817@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Tue, May 6, 2014 at 04:18:50PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > On Tue, Apr 22, 2014 at 08:50:20PM -0400, Tom Lane wrote:
> >> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> >>> Where are we on the default JSONB opclass change?
>
> >> Not sure. I'm for changing it, I think, but it wasn't at all clear
> >> that we had consensus on that. We did not have a proposed new name
> >> for the opclass either ...
>
> > Where are we on this question?
>
> Stuck on the naming question. I'd be willing to do the patch legwork
> if we had a consensus (or even a proposal) for what to rename the
> current jsonb_ops to.

Well, then, we only have a few days to come up with a name.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-06 21:31:23
Message-ID: 57E8AA44-F816-45F2-BB61-5A854FFB0A97@justatheory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On May 6, 2014, at 2:20 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:

>> Stuck on the naming question. I'd be willing to do the patch legwork
>> if we had a consensus (or even a proposal) for what to rename the
>> current jsonb_ops to.
>
> Well, then, we only have a few days to come up with a name.

What are the options?

D


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-06 22:20:53
Message-ID: 28554.1399414853@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

"David E. Wheeler" <david(at)justatheory(dot)com> writes:
> On May 6, 2014, at 2:20 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> Well, then, we only have a few days to come up with a name.

> What are the options?

We have no proposals as yet.

I've been looking at the source code to try to understand the difference
between the two opclasses (and BTW I concur with the opinions expressed
recently about the poor state of the internal documentation for jsonb).
If I've got it straight:

jsonb_ops indexes keys and values separately, so for instance "{xyz: 2}"
would give rise to GIN entries that are effectively the strings "Kxyz"
and "V2". If you're looking for tuples containing "{xyz: 2}" then you
would be looking for the AND of those independent index entries, which
fortunately GIN is pretty good at computing. But you could also look
for just keys or just values.

jsonb_hash_ops creates an index entry only for values, but what it
stores is a hash of both the value and the key it's stored under.
So in this example you'd get a hash combining "xyz" and "2". This
means the only type of query you can perform is like "find JSON tuples
containing {xyz: 2}".

Because jsonb_ops stores the *whole* value, you can do lossless index
searches (no recheck needed on the heap tuple), but you also run the
risk of long strings failing to fit into an index entry. Since jsonb_ops
reduces everything to a hash, there's no possibility of index failure,
but all queries are lossy and require recheck.

TBH, at this point I'm sort of agreeing with the thought expressed
upthread that maybe neither of these should be the default as-is.
They seem like rather arbitrary combinations of choices. In particular
I wonder why there's not an option to store keys and values separately,
but as hashes not as the original strings, so that indexability of
everything could be guaranteed. Or a variant of that might be to hash
only strings that are too large to fit in an index entry, and force
recheck only when searching for a string that needed hashing.

I wonder whether the most effective use of time at this point
wouldn't be to fix jsonb_ops to do that, rather than arguing about
what to rename it to. If it didn't have the failure-for-long-strings
problem I doubt anybody would be unhappy about making it the default.

regards, tom lane


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-06 22:29:40
Message-ID: CAM3SWZS4Tn-xZOBqM_ds-nhB2wLSpKG86rcJnMSexqDJqWgzNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Tue, May 6, 2014 at 3:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wonder whether the most effective use of time at this point
> wouldn't be to fix jsonb_ops to do that, rather than arguing about
> what to rename it to. If it didn't have the failure-for-long-strings
> problem I doubt anybody would be unhappy about making it the default.

I would expect the selectivity of keys on their own to be very low
with idiomatic usage of jsonb. Typically, every row in a table will
have almost the same keys. The current default opclass makes more
sense for when that isn't the case.

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-06 22:39:37
Message-ID: 29011.1399415977@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Tue, May 6, 2014 at 3:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I wonder whether the most effective use of time at this point
>> wouldn't be to fix jsonb_ops to do that, rather than arguing about
>> what to rename it to. If it didn't have the failure-for-long-strings
>> problem I doubt anybody would be unhappy about making it the default.

> I would expect the selectivity of keys on their own to be very low
> with idiomatic usage of jsonb. Typically, every row in a table will
> have almost the same keys. The current default opclass makes more
> sense for when that isn't the case.

Meh. I would not think that that represents effective use of JSON:
if the rows are all the same, why aren't you exposing that structure
as regular SQL columns? IMHO, the value of JSON fields within a SQL
table is to deal with data that is not so well structured.

In any case, it was certainly the complaint that insertions might
fail altogether that made me (and I assume others) want to not have
jsonb_ops as the default opclass. Is there a good reason not to
fix that limitation while we still can?

regards, tom lane


From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-06 22:54:19
Message-ID: 5EE1519A-7DF7-4EB0-B7EA-5492F5F53C63@justatheory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On May 6, 2014, at 3:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Meh. I would not think that that represents effective use of JSON:
> if the rows are all the same, why aren't you exposing that structure
> as regular SQL columns? IMHO, the value of JSON fields within a SQL
> table is to deal with data that is not so well structured.

The use of JSON will not be ideal -- not in this sense. For example, at $work, we’re using it in place of an EAV model. Hence most rows have the same keys (or a subset of known keys). Or think of your favorite JSON API: every call to http://api.pgxn.org/user/$username.json is going to have a very similar structure.

> In any case, it was certainly the complaint that insertions might
> fail altogether that made me (and I assume others) want to not have
> jsonb_ops as the default opclass. Is there a good reason not to
> fix that limitation while we still can?

Fixing++

David


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-06 23:01:16
Message-ID: CAM3SWZSA9jDae11p6wMxap9MiGD_C7X5JSjqiFSr-5NcKJb4Sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Tue, May 6, 2014 at 3:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Meh. I would not think that that represents effective use of JSON:
> if the rows are all the same, why aren't you exposing that structure
> as regular SQL columns? IMHO, the value of JSON fields within a SQL
> table is to deal with data that is not so well structured.

I used to think that. That actually isn't what people want from a JSON
type, though. People want a flexible data model, but they still
realize that if within a table/"collection" everything is totally
heterogeneous, it becomes impossible to effectively query. They don't
want to run migrations. Or, maybe they are consuming JSON from a
third-party API, and have no control over the schema, even though it
is really is a schema (already represented as JSON, making jsonb a
compelling representation) -- that's a very common use case. It's much
more compelling to store semi-structured data as JSON. Totally
unstructured data just isn't that interesting.

Don't take my word for it, though. See
http://docs.mongodb.org/manual/data-modeling, for example. There is an
implicit assumption throughout that most documents within a MongoDB
collection have the same keys. The choice to not separately index keys
in the GIN hash opclass is far from arbitrary, even if you don't agree
with it.

> In any case, it was certainly the complaint that insertions might
> fail altogether that made me (and I assume others) want to not have
> jsonb_ops as the default opclass. Is there a good reason not to
> fix that limitation while we still can?

I have no objection to either changing the default, or having no default.

--
Peter Geoghegan


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-08 13:47:01
Message-ID: 20140508134701.GO30817@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Tue, May 6, 2014 at 06:20:53PM -0400, Tom Lane wrote:
> "David E. Wheeler" <david(at)justatheory(dot)com> writes:
> > On May 6, 2014, at 2:20 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >> Well, then, we only have a few days to come up with a name.
>
> > What are the options?
>
> We have no proposals as yet.
>
> I've been looking at the source code to try to understand the difference
> between the two opclasses (and BTW I concur with the opinions expressed
> recently about the poor state of the internal documentation for jsonb).
> If I've got it straight:
>
> jsonb_ops indexes keys and values separately, so for instance "{xyz: 2}"
> would give rise to GIN entries that are effectively the strings "Kxyz"
> and "V2". If you're looking for tuples containing "{xyz: 2}" then you
> would be looking for the AND of those independent index entries, which
> fortunately GIN is pretty good at computing. But you could also look
> for just keys or just values.
>
> jsonb_hash_ops creates an index entry only for values, but what it
> stores is a hash of both the value and the key it's stored under.
> So in this example you'd get a hash combining "xyz" and "2". This
> means the only type of query you can perform is like "find JSON tuples
> containing {xyz: 2}".

Good summary, thanks. This is the information I was hoping we had in
our docs. How does hstore deal with these issues?

> Because jsonb_ops stores the *whole* value, you can do lossless index
> searches (no recheck needed on the heap tuple), but you also run the
> risk of long strings failing to fit into an index entry. Since jsonb_ops
> reduces everything to a hash, there's no possibility of index failure,
> but all queries are lossy and require recheck.
>
> TBH, at this point I'm sort of agreeing with the thought expressed
> upthread that maybe neither of these should be the default as-is.
> They seem like rather arbitrary combinations of choices. In particular
> I wonder why there's not an option to store keys and values separately,
> but as hashes not as the original strings, so that indexability of
> everything could be guaranteed. Or a variant of that might be to hash
> only strings that are too large to fit in an index entry, and force
> recheck only when searching for a string that needed hashing.
>
> I wonder whether the most effective use of time at this point
> wouldn't be to fix jsonb_ops to do that, rather than arguing about
> what to rename it to. If it didn't have the failure-for-long-strings
> problem I doubt anybody would be unhappy about making it the default.

Can we hash just the values, not the keys, in jsonb_ops, and hash the
combo in jsonb_hash_ops. That would give us key-only lookups without a
recheck.

How do we index long strings now? Is it he combination of GIN and long
strings that is the problem?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-08 14:16:54
Message-ID: 5819.1399558614@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> On Tue, May 6, 2014 at 06:20:53PM -0400, Tom Lane wrote:
>> I wonder why there's not an option to store keys and values separately,
>> but as hashes not as the original strings, so that indexability of
>> everything could be guaranteed. Or a variant of that might be to hash
>> only strings that are too large to fit in an index entry, and force
>> recheck only when searching for a string that needed hashing.
>>
>> I wonder whether the most effective use of time at this point
>> wouldn't be to fix jsonb_ops to do that, rather than arguing about
>> what to rename it to. If it didn't have the failure-for-long-strings
>> problem I doubt anybody would be unhappy about making it the default.

> Can we hash just the values, not the keys, in jsonb_ops, and hash the
> combo in jsonb_hash_ops. That would give us key-only lookups without a
> recheck.

No, because there's nothing in JSON limiting the length of keys, any more
than values.

I think the idea of hashing only keys/values that are "too long" is a
reasonable compromise. I've not finished coding it (because I keep
getting distracted by other problems in the code :-() but it does not
look to be very difficult. I'm envisioning the cutoff as being something
like 128 bytes; in practice that would mean that few if any keys get
hashed, I think.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-08 14:34:04
Message-ID: 20140508143404.GP30817@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Thu, May 8, 2014 at 10:16:54AM -0400, Tom Lane wrote:
> > Can we hash just the values, not the keys, in jsonb_ops, and hash the
> > combo in jsonb_hash_ops. That would give us key-only lookups without a
> > recheck.
>
> No, because there's nothing in JSON limiting the length of keys, any more
> than values.
>
> I think the idea of hashing only keys/values that are "too long" is a
> reasonable compromise. I've not finished coding it (because I keep
> getting distracted by other problems in the code :-() but it does not
> look to be very difficult. I'm envisioning the cutoff as being something
> like 128 bytes; in practice that would mean that few if any keys get
> hashed, I think.

Yes, that would be nice. Ideally we would not be doing this so close to
beta, but it is what it is, and if we need to break binary compatibility
after beta1, at least we have pg_upgrade.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-08 14:37:05
Message-ID: 20140508143705.GA1703@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On 2014-05-08 10:34:04 -0400, Bruce Momjian wrote:
> On Thu, May 8, 2014 at 10:16:54AM -0400, Tom Lane wrote:
> > > Can we hash just the values, not the keys, in jsonb_ops, and hash the
> > > combo in jsonb_hash_ops. That would give us key-only lookups without a
> > > recheck.
> >
> > No, because there's nothing in JSON limiting the length of keys, any more
> > than values.
> >
> > I think the idea of hashing only keys/values that are "too long" is a
> > reasonable compromise. I've not finished coding it (because I keep
> > getting distracted by other problems in the code :-() but it does not
> > look to be very difficult. I'm envisioning the cutoff as being something
> > like 128 bytes; in practice that would mean that few if any keys get
> > hashed, I think.
>
> Yes, that would be nice. Ideally we would not be doing this so close to
> beta, but it is what it is, and if we need to break binary compatibility
> after beta1, at least we have pg_upgrade.

If we break binary compatibility pg_upgrade won't be able to help. Since
the data files wont be, err, binary compatibile.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-08 14:40:12
Message-ID: 20140508144012.GQ30817@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Thu, May 8, 2014 at 04:37:05PM +0200, Andres Freund wrote:
> On 2014-05-08 10:34:04 -0400, Bruce Momjian wrote:
> > On Thu, May 8, 2014 at 10:16:54AM -0400, Tom Lane wrote:
> > > > Can we hash just the values, not the keys, in jsonb_ops, and hash the
> > > > combo in jsonb_hash_ops. That would give us key-only lookups without a
> > > > recheck.
> > >
> > > No, because there's nothing in JSON limiting the length of keys, any more
> > > than values.
> > >
> > > I think the idea of hashing only keys/values that are "too long" is a
> > > reasonable compromise. I've not finished coding it (because I keep
> > > getting distracted by other problems in the code :-() but it does not
> > > look to be very difficult. I'm envisioning the cutoff as being something
> > > like 128 bytes; in practice that would mean that few if any keys get
> > > hashed, I think.
> >
> > Yes, that would be nice. Ideally we would not be doing this so close to
> > beta, but it is what it is, and if we need to break binary compatibility
> > after beta1, at least we have pg_upgrade.
>
> If we break binary compatibility pg_upgrade won't be able to help. Since
> the data files wont be, err, binary compatibile.

Oops, yeah. pg_upgrade only helps with system table changes. We would
have to require users to dump/reload any changed tables or recreate any
indexs.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-08 22:39:11
Message-ID: 1888.1399588751@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

I wrote:
> I think the idea of hashing only keys/values that are "too long" is a
> reasonable compromise. I've not finished coding it (because I keep
> getting distracted by other problems in the code :-() but it does not
> look to be very difficult. I'm envisioning the cutoff as being something
> like 128 bytes; in practice that would mean that few if any keys get
> hashed, I think.

Attached is a draft patch for this. In addition to the hash logic per se,
I made these changes:

* Replaced the K/V prefix bytes with a code that distinguishes the types
of JSON values. While this is not of any huge significance for the
current index search operators, it's basically free to store the info,
so I think we should do it for possible future use.

* Fixed the problem with "exists" returning rows it shouldn't. I
concluded that the best fix is just to force recheck for exists, which
allows considerable simplification in the consistent functions.

* Tried to improve the comments in jsonb_gin.c.

Barring objections I'll commit this tomorrow, and also try to improve the
user-facing documentation about the jsonb opclasses.

regards, tom lane

Attachment Content-Type Size
jsonb-gin-fixes-1.patch text/x-diff 29.2 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-09 03:34:05
Message-ID: 20140509033405.GA23254@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Thu, May 8, 2014 at 06:39:11PM -0400, Tom Lane wrote:
> I wrote:
> > I think the idea of hashing only keys/values that are "too long" is a
> > reasonable compromise. I've not finished coding it (because I keep
> > getting distracted by other problems in the code :-() but it does not
> > look to be very difficult. I'm envisioning the cutoff as being something
> > like 128 bytes; in practice that would mean that few if any keys get
> > hashed, I think.
>
> Attached is a draft patch for this. In addition to the hash logic per se,
> I made these changes:
>
> * Replaced the K/V prefix bytes with a code that distinguishes the types
> of JSON values. While this is not of any huge significance for the
> current index search operators, it's basically free to store the info,
> so I think we should do it for possible future use.
>
> * Fixed the problem with "exists" returning rows it shouldn't. I
> concluded that the best fix is just to force recheck for exists, which
> allows considerable simplification in the consistent functions.
>
> * Tried to improve the comments in jsonb_gin.c.
>
> Barring objections I'll commit this tomorrow, and also try to improve the
> user-facing documentation about the jsonb opclasses.

Looks good. I was thinking the jsonb_ops name could remain unchanged
and the jsonb_hash_ops could be called jsonb_combo_ops as it combines
the key and value into a single index entry.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-09 04:09:51
Message-ID: 536C550F.50108@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On 09/05/14 15:34, Bruce Momjian wrote:
> On Thu, May 8, 2014 at 06:39:11PM -0400, Tom Lane wrote:
>> I wrote:
>>> I think the idea of hashing only keys/values that are "too long" is a
>>> reasonable compromise. I've not finished coding it (because I keep
>>> getting distracted by other problems in the code :-() but it does not
>>> look to be very difficult. I'm envisioning the cutoff as being something
>>> like 128 bytes; in practice that would mean that few if any keys get
>>> hashed, I think.
>> Attached is a draft patch for this. In addition to the hash logic per se,
>> I made these changes:
>>
>> * Replaced the K/V prefix bytes with a code that distinguishes the types
>> of JSON values. While this is not of any huge significance for the
>> current index search operators, it's basically free to store the info,
>> so I think we should do it for possible future use.
>>
>> * Fixed the problem with "exists" returning rows it shouldn't. I
>> concluded that the best fix is just to force recheck for exists, which
>> allows considerable simplification in the consistent functions.
>>
>> * Tried to improve the comments in jsonb_gin.c.
>>
>> Barring objections I'll commit this tomorrow, and also try to improve the
>> user-facing documentation about the jsonb opclasses.
> Looks good. I was thinking the jsonb_ops name could remain unchanged
> and the jsonb_hash_ops could be called jsonb_combo_ops as it combines
> the key and value into a single index entry.
>
If you have 'jsonb_combo_ops' - then surely 'jsonb_op' should be called
'jsonb_xxx_ops', where the 'xxx' distinguishes that from
'jsonb_combo_ops'? I guess, if any appropriate wording of 'xxx' was too
cumbersome, then it would be worse.

Cheers,
Gavin


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-09 05:20:08
Message-ID: CAM3SWZTK+dpASbSqY3tn9eOKgJG48w4Vcty13GzBi6w=ss_VyA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Thu, May 8, 2014 at 3:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Barring objections I'll commit this tomorrow

Looks good to me.

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-09 11:04:17
Message-ID: 18360.1399633457@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz> writes:
> On 09/05/14 15:34, Bruce Momjian wrote:
>> Looks good. I was thinking the jsonb_ops name could remain unchanged
>> and the jsonb_hash_ops could be called jsonb_combo_ops as it combines
>> the key and value into a single index entry.

> If you have 'jsonb_combo_ops' - then surely 'jsonb_op' should be called
> 'jsonb_xxx_ops', where the 'xxx' distinguishes that from
> 'jsonb_combo_ops'? I guess, if any appropriate wording of 'xxx' was too
> cumbersome, then it would be worse.

Yeah, I'm disinclined to change the opclass names now. It's not apparent
to me that "combo" is a better choice than "hash" for the second opclass.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-09 13:53:36
Message-ID: 20140509135336.GC23254@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Fri, May 9, 2014 at 07:04:17AM -0400, Tom Lane wrote:
> Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz> writes:
> > On 09/05/14 15:34, Bruce Momjian wrote:
> >> Looks good. I was thinking the jsonb_ops name could remain unchanged
> >> and the jsonb_hash_ops could be called jsonb_combo_ops as it combines
> >> the key and value into a single index entry.
>
> > If you have 'jsonb_combo_ops' - then surely 'jsonb_op' should be called
> > 'jsonb_xxx_ops', where the 'xxx' distinguishes that from
> > 'jsonb_combo_ops'? I guess, if any appropriate wording of 'xxx' was too
> > cumbersome, then it would be worse.
>
> Yeah, I'm disinclined to change the opclass names now. It's not apparent
> to me that "combo" is a better choice than "hash" for the second opclass.

Well, if we are optionally hashing json_ops for long strings, what does
jsonb_hash_ops do uniquely with hashing? Does it always hash, while
json_ops optionally hashes? Is that the distinguishing characteristic?
It seemed the _content_ of the indexed value was more important, rather
than the storage method.

Should jsonb_hash_ops do only optional hashing too for long strings?

Also, with json_ops, when you index '{"exterior" : "white", "interior":
"blue"}', if you query for {"exterior" : "blue"}, does it match and have
to be rechecked in the heap because the index doesn't know which values
go with which keys?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Greg Stark <stark(at)mit(dot)edu>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-09 14:26:33
Message-ID: CAM-w4HOQ+ynm70Cgr4BwMqMc0jfL_8EZAwG2UD6r=EehjH3OOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Fri, May 9, 2014 at 2:53 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Well, if we are optionally hashing json_ops for long strings, what does
> jsonb_hash_ops do uniquely with hashing? Does it always hash, while
> json_ops optionally hashes? Is that the distinguishing characteristic?
> It seemed the _content_ of the indexed value was more important, rather
> than the storage method.
>
> Should jsonb_hash_ops do only optional hashing too for long strings?

Well the question seems to me to be that if we're always doing recheck
then what advantage is there to not hashing everything?

--
greg


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-09 14:26:48
Message-ID: 20140509142648.GD23254@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Fri, May 9, 2014 at 09:53:36AM -0400, Bruce Momjian wrote:
> On Fri, May 9, 2014 at 07:04:17AM -0400, Tom Lane wrote:
> > Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz> writes:
> > > On 09/05/14 15:34, Bruce Momjian wrote:
> > >> Looks good. I was thinking the jsonb_ops name could remain unchanged
> > >> and the jsonb_hash_ops could be called jsonb_combo_ops as it combines
> > >> the key and value into a single index entry.
> >
> > > If you have 'jsonb_combo_ops' - then surely 'jsonb_op' should be called
> > > 'jsonb_xxx_ops', where the 'xxx' distinguishes that from
> > > 'jsonb_combo_ops'? I guess, if any appropriate wording of 'xxx' was too
> > > cumbersome, then it would be worse.
> >
> > Yeah, I'm disinclined to change the opclass names now. It's not apparent
> > to me that "combo" is a better choice than "hash" for the second opclass.
>
> Well, if we are optionally hashing json_ops for long strings, what does
> jsonb_hash_ops do uniquely with hashing? Does it always hash, while
> json_ops optionally hashes? Is that the distinguishing characteristic?
> It seemed the _content_ of the indexed value was more important, rather
> than the storage method.

Also, are people going to think that jsonb_hash_ops creates a hash
index, which is not crash safe, even though it is a GIN index? Do we
have this "hash" confusion anywhere else?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-09 14:32:01
Message-ID: 20140509143201.GF30231@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On 2014-05-09 10:26:48 -0400, Bruce Momjian wrote:
> On Fri, May 9, 2014 at 09:53:36AM -0400, Bruce Momjian wrote:
> > On Fri, May 9, 2014 at 07:04:17AM -0400, Tom Lane wrote:
> > > Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz> writes:
> > > > On 09/05/14 15:34, Bruce Momjian wrote:
> > > >> Looks good. I was thinking the jsonb_ops name could remain unchanged
> > > >> and the jsonb_hash_ops could be called jsonb_combo_ops as it combines
> > > >> the key and value into a single index entry.
> > >
> > > > If you have 'jsonb_combo_ops' - then surely 'jsonb_op' should be called
> > > > 'jsonb_xxx_ops', where the 'xxx' distinguishes that from
> > > > 'jsonb_combo_ops'? I guess, if any appropriate wording of 'xxx' was too
> > > > cumbersome, then it would be worse.
> > >
> > > Yeah, I'm disinclined to change the opclass names now. It's not apparent
> > > to me that "combo" is a better choice than "hash" for the second opclass.
> >
> > Well, if we are optionally hashing json_ops for long strings, what does
> > jsonb_hash_ops do uniquely with hashing? Does it always hash, while
> > json_ops optionally hashes? Is that the distinguishing characteristic?
> > It seemed the _content_ of the indexed value was more important, rather
> > than the storage method.
>
> Also, are people going to think that jsonb_hash_ops creates a hash
> index, which is not crash safe, even though it is a GIN index? Do we
> have this "hash" confusion anywhere else?

The operator class has to be specified after the USING GIN in CREATE
INDEX so I think that rest is neglegible.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-09 20:44:32
Message-ID: 28961.1399668272@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> writes:
> Well the question seems to me to be that if we're always doing recheck
> then what advantage is there to not hashing everything?

Right now, there's not much. But it seems likely to me that there will be
more JSON operators in future, and some of them might be able to make use
of the additional specificity of unhashed entries. For example, it's only
a very arbitrary definitional choice for the exists operator (ie, not
looking into sub-objects) that makes jsonb_ops lossy for it. We might
eventually build a recursive-exists-check operator for which the index
could be lossless, at least up to the string length where we start to
hash.

regards, tom lane


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-10 20:42:34
Message-ID: 536E8F3A.40706@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On 05/09/2014 11:44 PM, Tom Lane wrote:
> Greg Stark <stark(at)mit(dot)edu> writes:
>> Well the question seems to me to be that if we're always doing recheck
>> then what advantage is there to not hashing everything?
>
> Right now, there's not much. But it seems likely to me that there will be
> more JSON operators in future, and some of them might be able to make use
> of the additional specificity of unhashed entries. For example, it's only
> a very arbitrary definitional choice for the exists operator (ie, not
> looking into sub-objects) that makes jsonb_ops lossy for it. We might
> eventually build a recursive-exists-check operator for which the index
> could be lossless, at least up to the string length where we start to
> hash.

Back to the naming:

The main difference between the two opclasses from a user's standpoint
is not whether they hash or not. The big difference is that one indexes
complete paths from the root, and the other indexes just the "leaf"
level. For example, if you have an object like '{"foo": {"bar": 123 }
}', one will index "foo", "foo->bar", and "foo->bar->123" while the
other will index "foo", "bar" and "123".

Whether the opclasses use hashing to shorten the key is an orthogonal
property, and IMHO not as important. To reflect that, I suggest that we
name the opclasses:

json_path_ops
json_value_ops

or something along those lines.

- Heikki


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-10 21:00:54
Message-ID: 536E9386.6050701@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers


On 05/10/2014 04:42 PM, Heikki Linnakangas wrote:
>
>
> The main difference between the two opclasses from a user's standpoint
> is not whether they hash or not. The big difference is that one
> indexes complete paths from the root, and the other indexes just the
> "leaf" level. For example, if you have an object like '{"foo": {"bar":
> 123 } }', one will index "foo", "foo->bar", and "foo->bar->123" while
> the other will index "foo", "bar" and "123".
>
> Whether the opclasses use hashing to shorten the key is an orthogonal
> property, and IMHO not as important. To reflect that, I suggest that
> we name the opclasses:
>
> json_path_ops
> json_value_ops
>
> or something along those lines.
>
>

That looks like the first suggestion I've actually liked and that users
will be able to understand.

cheers

andrew


From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-10 21:03:24
Message-ID: CAF4Au4wVcdsGLQOnzib3NoiJPF7seC9ydR4e6XVV_9D0WUgNWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

+1
but bit confused with json instead of jsonb

On Sun, May 11, 2014 at 1:00 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> On 05/10/2014 04:42 PM, Heikki Linnakangas wrote:
>>
>>
>>
>> The main difference between the two opclasses from a user's standpoint is
>> not whether they hash or not. The big difference is that one indexes
>> complete paths from the root, and the other indexes just the "leaf" level.
>> For example, if you have an object like '{"foo": {"bar": 123 } }', one will
>> index "foo", "foo->bar", and "foo->bar->123" while the other will index
>> "foo", "bar" and "123".
>>
>> Whether the opclasses use hashing to shorten the key is an orthogonal
>> property, and IMHO not as important. To reflect that, I suggest that we name
>> the opclasses:
>>
>> json_path_ops
>> json_value_ops
>>
>> or something along those lines.
>>
>>
>
>
> That looks like the first suggestion I've actually liked and that users will
> be able to understand.
>
> cheers
>
> andrew
>
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-10 21:52:50
Message-ID: CAM3SWZTerM-f8nhmuatR-YhPiMwrqR01=TwtCNM1E9PERkvAAA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Sat, May 10, 2014 at 1:42 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> For example, if you have an object like '{"foo": {"bar": 123 } }', one will
> index "foo", "foo->bar", and "foo->bar->123" while the other will index
> "foo", "bar" and "123".

That isn't quite right, if we're talking about the user's perspective,
and concerns about what is made indexable. You cannot actually query
"foo->bar" there, because containment semantics don't support it.
Basically, you'd have to write that like this:

select * from docs where jdoc @> '{"foo":{"bar":*}'::jsonb;

I've added a wildcard to the rhs jsonb here, which of course won't
work, the proximate cause being that that simply isn't valid jsonb.
It's also something inherently impossible to support with the current
jsonb_hash_op's indexing strategy. That only hashes elements and
values, mixing in keys from all outer nesting levels (so it's possible
for there to be 0 *nentries). In the strict physical sense, it only
indexes "123" (with all keys mixed in to the hash) from your example,
because that's the only element or value that appears.

Your description of only indexing the "leaf level" is not ideal,
because it makes me think of B-Trees. Unlike B-Trees, jsonb can and
frequently will have jsonb_hash_ops-indexable primitive
values/elements at all nesting levels (that is, both "inner"
containers and "leaf" containers).

> Whether the opclasses use hashing to shorten the key is an orthogonal
> property, and IMHO not as important. To reflect that, I suggest that we name
> the opclasses:
>
> json_path_ops
> json_value_ops
>
> or something along those lines.

I would like to once again emphasize the user-level distinction
between the two: one (the text storage opclass) is better for
hstore-style indexing, where jsonb data is heterogeneous in structure,
most probably a hodge-podge of key/value pairs. The other (the hash
storage opclass) is better for document-database style use cases,
where keys alone have low selectivity, and there is an
unenforced/implicit schema that is mostly adhered to by client
applications. That's why I don't think it's much of a problem that the
example query above won't work (you could do something with
expressions, or expression indexes to make something work, but it
won't work with the text-storage opclass because there is no
appropriate operator). In general, for those document-database style
use cases, indexing keys alone isn't useful. Wherever that turns out
to be untrue, ad-hoc expression indexes of the text storage opclass
are probably the best solution.

Anyway, I agree with your general assessment; hashing is nothing more
than an implementation detail. I'm not sure which of your proposed
names is intended for which opclass, though. Do you mean
"jsonb_path_ops" and "jsonb_key_ops"? That makes perfect sense to me,
because indexing keys alone is the main user-level distinction.

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-10 21:55:44
Message-ID: 20591.1399758944@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 05/10/2014 04:42 PM, Heikki Linnakangas wrote:
>> Whether the opclasses use hashing to shorten the key is an orthogonal
>> property, and IMHO not as important. To reflect that, I suggest that
>> we name the opclasses:
>>
>> json_path_ops
>> json_value_ops
>>
>> or something along those lines.

> That looks like the first suggestion I've actually liked and that users
> will be able to understand.

I'd prefer to stick with just "jsonb_ops" for the default opclass.
Also, "jsonb_value_ops" is not le mot juste anyway, because that opclass
doesn't just index values it also indexes keys (if I've got the JSON
terminology straight). However, I could accept renaming jsonb_hash_ops
to jsonb_path_ops, as long as we do it PDQ.

regards, tom lane


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-11 00:25:49
Message-ID: CAM3SWZQpg8cX+MccHd38KCQMgD9Leh-4BNBFnV-8gh5+Rnb1Hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Sat, May 10, 2014 at 2:52 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> I've added a wildcard to the rhs jsonb here, which of course won't
> work, the proximate cause being that that simply isn't valid jsonb.
> It's also something inherently impossible to support with the current
> jsonb_hash_op's indexing strategy. That only hashes elements and
> values, mixing in keys from all outer nesting levels (so it's possible
> for there to be 0 *nentries).

It occurs to me that this could be a particular problem for
jsonb_hash_ops. Consider this jsonb:

'{"a":{}}'::jsonb

In one sense, the outermost level's "a" key does have a value: an
empty object. So we may test containment in an indexable fashion like
this:

select * from foo where j @> '{"a":{}}'::jsonb

But in another sense, the sense that is relevant to jsonb_hash_ops, it
does not. There would be *no* GIN keys passed back from
gin_extract_jsonb_hash() if it were tasked with extracting keys from
this rhs jsonb.

Now, I'm not all that worried about this, because this is surely an
odd-ball use case, particularly for jsonb_hash_ops where no keys are
separately indexed (separately from *primitive* elements/values).
However, it is worth noting in the documentation in my view. I attach
a doc patch that adds this.

--
Peter Geoghegan

Attachment Content-Type Size
jsonb_structure_containment.patch text/x-patch 680 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-11 00:54:33
Message-ID: 4986.1399769673@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> Now, I'm not all that worried about this, because this is surely an
> odd-ball use case, particularly for jsonb_hash_ops where no keys are
> separately indexed (separately from *primitive* elements/values).
> However, it is worth noting in the documentation in my view. I attach
> a doc patch that adds this.

Agreed, we'd better mention that somewhere.

I'm not sure whether we have consensus to rename jsonb_hash_ops to
jsonb_path_ops, but since time is so short I went ahead and made a draft
patch to do so (attached). Probably the most interesting part of this is
the new text in json.sgml explaining the difference between the two
opclasses. I also added a paragraph about the empty-query hazard that
Peter mentions. Do people think this discussion is correct and useful?

regards, tom lane

Attachment Content-Type Size
rename-jsonb_hash_ops.patch text/x-diff 21.4 KB

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-11 01:31:35
Message-ID: CAM3SWZQMTFQO3o2Y8i42E-ABOHaJt-0_VtMxzFanp+5nAqD-pQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

On Sat, May 10, 2014 at 5:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm not sure whether we have consensus to rename jsonb_hash_ops to
> jsonb_path_ops, but since time is so short I went ahead and made a draft
> patch to do so (attached). Probably the most interesting part of this is
> the new text in json.sgml explaining the difference between the two
> opclasses. I also added a paragraph about the empty-query hazard that
> Peter mentions. Do people think this discussion is correct and useful?

I for one am fine with the name change you propose.

> + especially if
> + there are a very large number of rows containing any single one of the
> + three keys

I suggest that you phrase this as "three index items".

> + A disadvantage of the <literal>jsonb_path_ops</literal> approach is
> + that it produces no index entries for JSON structures not containing
> + any values, such as <literal>{"a": {}}</literal>. If a search for

I suggest "any values or elements".

Even though I previously called hashing an implementation detail, we
are bound to have to mention it in passing when discussing the
limitations of jsonb_hash_ops/jsonb_path_ops. I think that you should
proceed with committing the entire patch, including the doc changes
that discuss implementation details around the two GIN opclasses.

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-05-11 02:09:38
Message-ID: 6662.1399774178@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Sat, May 10, 2014 at 5:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> + especially if
>> + there are a very large number of rows containing any single one of the
>> + three keys

> I suggest that you phrase this as "three index items".

Good idea --- "key" is overloaded in this discussion. I'd meant to use
"item" uniformly for the index entries, but missed some spots.

>> + A disadvantage of the <literal>jsonb_path_ops</literal> approach is
>> + that it produces no index entries for JSON structures not containing
>> + any values, such as <literal>{"a": {}}</literal>. If a search for

> I suggest "any values or elements".

Meh --- the previous para is also using "value" to include array elements,
and I don't see anything in RFC 7159 suggesting that that's not preferred
terminology. But I added a footnote to clarify:

The technical difference between a <literal>jsonb_ops</literal>
and a <literal>jsonb_path_ops</literal> GIN index is that the former
creates independent index items for each key and value in the data,
while the latter creates index items only for each value in the
data.<footnote><para>For this purpose, the term <quote>value</>
includes array elements, though JSON terminology sometimes considers
array elements distinct from values within objects.</para></footnote>

> Even though I previously called hashing an implementation detail, we
> are bound to have to mention it in passing when discussing the
> limitations of jsonb_hash_ops/jsonb_path_ops. I think that you should
> proceed with committing the entire patch, including the doc changes
> that discuss implementation details around the two GIN opclasses.

I'll hold off committing till the morning to see if there are objections.

regards, tom lane