[GSoC] kmedoids status report

From: Maxence Ahlouche <maxence(dot)ahlouche(at)gmail(dot)com>
To: Hai Qian <hqian(at)gopivotal(dot)com>, Caleb Welton <cwelton(at)gopivotal(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Andreas Scherbaum <ascherbaum(at)gopivotal(dot)com>, Sujit Philip <sphilip(at)gopivotal(dot)com>, Marc Pantel <Marc(dot)Pantel(at)enseeiht(dot)fr>, "devel(at)madlib(dot)net" <devel(at)madlib(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: [GSoC] kmedoids status report
Date: 2014-08-07 10:42:23
Message-ID: CAJeaomUmGypOcfhkkgwYRZmCLQQ-6uA0=1yeHTr567EaW3CBEw@mail.gmail.com
Lists: pgsql-hackers

Hi!

Here is a report of what was discussed yesterday on IRC.

The kmedoids module now seems to work correctly on basic datasets. I've
also implemented its variants with different seeding methods: random
initial medoids, and initial medoids distributed among the points (similar
to kmeans++ [0]).
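
To illustrate the two seeding strategies mentioned above, here is a minimal
Python sketch. It is not the MADlib code: the function names (seed_random,
seed_spread), the Euclidean distance, and the in-memory list of points are
all assumptions made for the example. The second function follows the
kmeans++ idea of drawing each new medoid with probability proportional to
its squared distance to the nearest medoid already chosen.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def seed_random(points, k, rng):
    """Hypothetical helper: pick k initial medoids uniformly at random
    (k distinct positions in the list)."""
    return rng.sample(points, k)


def seed_spread(points, k, rng, dist=euclidean):
    """Hypothetical helper: kmeans++-style seeding restricted to data points.

    The first medoid is uniform; each following one is drawn with
    probability proportional to the squared distance to the nearest
    medoid chosen so far, so already-chosen points have weight zero.
    """
    medoids = [rng.choice(points)]
    while len(medoids) < k:
        weights = [min(dist(p, m) for m in medoids) ** 2 for p in points]
        if sum(weights) == 0:
            raise ValueError("fewer than k distinct points")
        medoids.append(rng.choices(points, weights=weights, k=1)[0])
    return medoids
```

Since medoids must be actual data points, both helpers only ever return
elements of the input list, which is the main difference from seeding
kmeans centroids.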

The next steps are:

- Making better tests (1-2d)
- Writing the documentation (1d)
- Adapting my code to GP and HAWQ -- btw, are default parameters now
  available in GP and HAWQ? (1-2d)
- Refactoring kmedoids and kmeans, as there is code duplication between
  those two.
  For this step, I don't know if I'll have time to create a clustering
  module, and make kmeans and kmedoids submodules of it. If yes, then it's
  perfect; otherwise, I'll just rename the common functions in kmeans, and
  have kmedoids call them from there.

Hai also helped me set up (once more) the VM where Greenplum and HAWQ are
installed, so that I can test my code on these DBMSs.

As a reminder, I'm supposed to stop coding next Monday; the last week is
then dedicated to documentation, tests, refactoring and polishing.

Regards,

Maxence

[0] https://en.wikipedia.org/wiki/K-means%2B%2B
