Re: GSoC on WAL-logging hash indexes

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tan Tran <tankimtran(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GSoC on WAL-logging hash indexes
Date: 2014-04-30 19:46:40
Message-ID: CAMkU=1yhg806uMaRsJeaHgcUJa45+n=DMEt6BGwq9d_bRsDi8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-advocacy pgsql-hackers pgsql-students

On Wed, Apr 30, 2014 at 11:19 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:

> On Wed, Apr 30, 2014 at 11:03 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > If we don't put in the work to make them useful, then they won't ever
> become
> > useful.
> >
> > If we do put in the effort (and it would be considerable) then I think
> they
> > will be. But you may be correct that the effort required would perhaps
> be
> > better used in making btree even more better. I don't think we can
> conclude
> > that definitively without putting in the work to do the experiment.
>
> My argument doesn't hinge on there being more important work to do.
> Rather, I simply don't think that there is never going to be a
> compelling reason to use hash indexes in production.

I have an indexed text column with an average length of 50, a stddev length
of 15, and a pronounced right skew. Currently, the longest value in it is
811. But inevitably someone will need to insert something longer
than 2712. When that day comes, I will drop the btree index and add a hash
index (unless we remove that limitation from btree indexes in the mean
time). It lets me sleep at night knowing that I have that option today,
even if it would complicate crash recovery.

Apart from the
> obvious inflexibility, consider what it takes to make index creation
> fast - insertion-style building of indexes is much slower. Consider
> multi-key indexes.
>

I'm pretty sure hash indexes already implement a bulk creation fast path.
In any case, I've never noticed them being slow, and I've tested some
pretty big ones.

> Now, I'm not telling anyone what to work on, and if someone wants to
> make hash indexes WAL-logged to plug that hole, don't let me stop you.
> It probably makes sense as a project to learn more about Postgres
> internals. However, it would be unfair to not speak up given my
> misgivings around the practical utility of hash indexes.
>

Sure, and we all have our own opinions on that. Should we summarize them
somewhere easier to follow than a long email thread but more detailed than
a TODO entry? Whatever happened with the GSOC people? That should be well
under way by now, is anyone working on it? Are the discussions of their
efforts on-list, or is it between them and their mentors?

Cheers,

Jeff

In response to

Browse pgsql-advocacy by date

  From Date Subject
Next Message Alvaro Herrera 2014-04-30 19:57:50 Re: GSoC on WAL-logging hash indexes
Previous Message Stephen Frost 2014-04-30 19:29:34 Re: GSoC on WAL-logging hash indexes

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2014-04-30 19:57:50 Re: GSoC on WAL-logging hash indexes
Previous Message Stephen Frost 2014-04-30 19:29:34 Re: GSoC on WAL-logging hash indexes

Browse pgsql-students by date

  From Date Subject
Next Message Alvaro Herrera 2014-04-30 19:57:50 Re: GSoC on WAL-logging hash indexes
Previous Message Stephen Frost 2014-04-30 19:29:34 Re: GSoC on WAL-logging hash indexes