Re: Fast insertion indexes: why no developments

From: Leonardo Francalanci <m_lists(at)yahoo(dot)it>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fast insertion indexes: why no developments
Date: 2013-11-13 09:07:56
Message-ID: 1384333676597-5778092.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs wrote
> On 5 November 2013 14:28, Leonardo Francalanci &lt;

> m_lists@

> &gt; wrote:
>
>> Either my sql is not correct (likely), or my understanding of the minmax
>> index is
>> not correct (even more likely), or the minmax index is not usable in a
>> random inputs
>> scenario.
>
> Please show the real world SQL you intend to run, so we can comment on
> it. Inventing a use case that breaks effectiveness of any optimisation
> is always easy, but the question is whether the use case is likely or
> even desirable.

The use case is pretty simple.
Think it as the NSA, as it would be much easier.
Collect all calls made/received by any user on a mobile network.
(in fact, it's something more than calls, so in fact is 2-10 rows per call).
Keep the data for 20 days.
That's the insert part.

Query:
search calls made/received by the user using IMSI (caller id) or IMEI
(phone id). Date range is usually days (past 4 days, from 10 days ago
to 5 days ago...)

The result is just a very small percentage of the rows present in the
table: a single user doesn't call that much!
Searches are made by a human, so no that many request per second.

It's not a "write mostly" scenario, it's a 99% write 1% read scenario.

Problem? having 4 btree indexes on random values (imsi+imei * 2,
since we have calling and caller) kills the performance in insertion
after a while.
Solution so far? partition every 15 minutes, create the indexes in bulk.

Simon Riggs wrote
> If we have a query to show the most recent calls by a particular caller
>
> SELECT *
> FROM cdr
> WHERE callerid = X
> ORDER BY call_timestamp DESC
> LIMIT 100
>
> then this could potentially be optimised using a minmax index, by
> traversing the data ranges in call_timestamp order. That is not part
> of the code in this initial release, since the main use case is for
> WHERE call_timestamp >= X, or WHERE primarykey = Y

I don't understand how a index on call_timestamp would help
in the query above.

Simon Riggs wrote
> I don't believe there is a credible business case for running that
> same query but without the ORDER BY and LIMIT, since it could
> potentially return gazillions of rows

Gazillion of rows??? We're talking about calls made/received by
one user here. How many calls do you make in 10 days???

Simon Riggs wrote
> so it isn't surprising at all
> that it would access a large % of the table.

In fact, the query I use return a fraction of the table, and only
a very small amount of users get searched.

Simon, you keep on talking about these minmax indexes, and
I still don't see any reference to some performance tests.

And, again, I think that random values insertion is the worst
use case for minmax indexes.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5778092.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Colin 't Hart 2013-11-13 09:28:07 Re: TABLE not synonymous with SELECT * FROM?
Previous Message Nicolas Barbier 2013-11-13 08:57:44 Re: Fast insertion indexes: why no developments