Re: GSoC Query

Lists: pgsql-hackers
From: gaurav gupta <gauravkumar(dot)gupta(at)students(dot)iiit(dot)ac(dot)in>
To: xzilla(at)users(dot)sourceforge(dot)net
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: GSoC Query
Date: 2010-03-29 02:01:45
Message-ID: dbf361031003281901r1e22466am30a74396b569d524@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sir/Ma'am,

I am a Mtech student and want to participate in GSoC. I have a project
idea and want to discuss its feasibility, usability and chance of selection
with you.

My idea is to add a functionality of Auto tuning and Auto Indexing/
Reindexing in DB languages.

Though I am not working on this I have some idea about implementation.
Idea is that on the no. of rows deleted, Inserted in the table we can make
our system capable to reindex the table that will save the time of user.
Similarly using the no. of select hits on a table we can check that if
maximum no. of times it is on a non-index field we can index on that field
to make select faster.

I am looking forward to hear from you.

--
Thanks & Regards,
Gaurav Kumar Gupta
+91-9032844745


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: gaurav gupta <gauravkumar(dot)gupta(at)students(dot)iiit(dot)ac(dot)in>
Cc: xzilla(at)users(dot)sourceforge(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC Query
Date: 2010-03-29 02:52:50
Message-ID: 603c8f071003281952x400ca11fg11cba64ce848a213@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Mar 28, 2010 at 10:01 PM, gaurav gupta
<gauravkumar(dot)gupta(at)students(dot)iiit(dot)ac(dot)in> wrote:
> My idea is to add a functionality of Auto tuning and Auto Indexing/
> Reindexing in DB languages.
>
> Though I am not working on this I have some idea about implementation.
> Idea is that on the no. of rows deleted, Inserted in the table we can make
> our system capable to reindex the table that will save the time of user.

Reindexing is not routine maintenance for PostgreSQL, so this seems
fairly pointless.

> Similarly using the no. of select hits on a table we can check that if
> maximum no. of times it is on a non-index field we can index on that field
> to make select faster.

Well, a SELECT statement "hits" a whole row, not a single column; but
even if you could somehow figure out a way to tally up per-column
statistics (and it's certainly not obvious to me how to do such a
thing) it doesn't follow that a column which is frequently accessed is
a good candidate for indexing.

I don't think this is a good project for a first-time hacker, or
something that can realistically be completed in one summer. It
sounds more like a PhD project to me. I wrote to another student who
is considering submitting a GSOC proposal with some ideas I thought
might be suitable. You might want to review that email:

http://archives.postgresql.org/pgsql-hackers/2010-03/msg01034.php

...Robert


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: gaurav gupta <gauravkumar(dot)gupta(at)students(dot)iiit(dot)ac(dot)in>
Cc: xzilla(at)users(dot)sourceforge(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC Query
Date: 2010-03-29 07:00:21
Message-ID: 4BB05005.3050007@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

gaurav gupta wrote:
> My idea is to add a functionality of Auto tuning and Auto Indexing/
> Reindexing in DB languages.

Ah, the classic request to start with the tuning and index wizards.
Really fun to work on, always lots of interest in it. Here's the
thing: projects in this area attract endless interest. If it were
possible to write something useful in a couple of months, we'd have a
hundred such programs fighting for attention. So the fact that we
actually have zero of them should tell you something about the actual
difficultly level of the work. You could spend the whole summer just
reading research papers on this topic and maybe catch up to the late
90's by the end.

Here's the usual advice I give to students looking to make a useful
contribution to any mature development project: the more boring the
work sounds, the more likely it is you'll actually do something people
can use. It's easy to find people who want to work on fun projects--so
easy that they've all been done already. What's left is either much
harder than it looks, or kind of dull to do. The idea behind
intentionally picking a boring one is that you're more likely to get one
that's unfinished for that reason, rather than because it's actually a
year or two of work to complete. Or, in the case you're asking about, a
decade or three if you were to start from scratch and were really
smart. If you started working on this now rather than stopping to
follow the research already done you might catch up to
http://portal.acm.org/citation.cfm?id=810505 in a couple of months.

> Similarly using the no. of select hits on a table we can check that if
> maximum no. of times it is on a non-index field we can index on that
> field to make select faster.

It's impractical to figure out where indexes should go at without
simulating what the optimizer would then do with them against a sample
set of queries. You can't do anything useful just with basic statistics
about the tables.

I would recommend
http://msdn.microsoft.com/en-us/library/aa226167(SQL.70).aspx as a good,
practical introduction to the topic of what it takes to figure out where
indexes go at, from someone who came up with a reasonable solution to
that problem. You can find a list of the underlying research they cite
(and an idea what has been done since then) at
http://portal.acm.org/citation.cfm?id=673646

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Gokulakannan Somasundaram <gokul007(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: gaurav gupta <gauravkumar(dot)gupta(at)students(dot)iiit(dot)ac(dot)in>, xzilla(at)users(dot)sourceforge(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC Query
Date: 2010-03-29 10:42:08
Message-ID: 9362e74e1003290342i645d27a7sa7fd21b417432b56@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
>
>
> Similarly using the no. of select hits on a table we can check that if
>> maximum no. of times it is on a non-index field we can index on that field
>> to make select faster.
>>
>
> It's impractical to figure out where indexes should go at without
> simulating what the optimizer would then do with them against a sample set
> of queries. You can't do anything useful just with basic statistics about
> the tables.
>
> I would recommend
> http://msdn.microsoft.com/en-us/library/aa226167(SQL.70).aspx<http://msdn.microsoft.com/en-us/library/aa226167%28SQL.70%29.aspx>as a good, practical introduction to the topic of what it takes to figure
> out where indexes go at, from someone who came up with a reasonable solution
> to that problem. You can find a list of the underlying research they cite
> (and an idea what has been done since then) at
> http://portal.acm.org/citation.cfm?id=673646
>
>
Even if you have devised a way to find the appropriate set of indexes, just
have a index adviser, which would advise a set of indexes for a set of
queries and let the DBA and the application user take the final call, after
looking at them..

Gokul.