Fwd: [DBRG] Special Talk on Friday: HaLoop

From: Selena Deckelmann <selenamarie(at)gmail(dot)com>
To: Postgresql PDX_Users <pdxpug(at)postgresql(dot)org>
Subject: Fwd: [DBRG] Special Talk on Friday: HaLoop
Date: 2010-07-21 00:19:52
Message-ID: AANLkTin8Qb0y7CFT6QBMrV-r_6eDSbXsawZbgmTWoPrO@mail.gmail.com
Lists: pdxpug

Seems cool if you have a chance to see it!

---------- Forwarded message ----------
From: David Maier <maier(at)cs(dot)pdx(dot)edu>
Date: Tue, Jul 20, 2010 at 5:18 PM
Subject: [DBRG] Special Talk on Friday: HaLoop
To: dbreading(at)cs(dot)pdx(dot)edu, datalab(at)cs(dot)pdx(dot)edu, Grossniklaus Michael
<grossniklaus(at)inf(dot)ethz(dot)ch>

Bill Howe will be in town Friday, and has offered to give us a preview
of his VLDB paper.

It will be at reading-group time (10 a.m.); assume the small conference
room, but I am looking into using FAB 150.
================================================

HaLoop: Efficient Recursive Query Processing on Large Scale Clusters
Bill Howe
University of Washington

The growing demand for large-scale data mining and data analysis
applications has motivated new parallel data processing platforms that
ignore conventional database features such as declarative languages,
schemas, indexing, optimization, and transactions in favor of
flexibility, ease of use, and scalability.  MapReduce, as implemented
in Hadoop, is a popular example. Newer systems have begun to include
database-like features, suggesting a larger design space encompassing
both types of systems.  In the first part of the talk, I will
characterize the design space of parallel processing platforms and
position existing systems within it, arguing that all such systems are
essentially "query processing" engines.

In the second part of the talk, I will point out that all such systems
lack support for the iterative processing capabilities needed in data
mining, web ranking, graph analysis, scientific data processing, and
model fitting.  I will describe our work on HaLoop, a modified
version of the Hadoop MapReduce framework designed to serve these
applications.  HaLoop not only extends MapReduce with programming
support for iterative applications but also dramatically improves
their efficiency by making the task scheduler loop-aware and by adding
various caching mechanisms.  I will present our results evaluating
HaLoop on real queries and real datasets, showing that HaLoop reduces
query runtime relative to Hadoop by a factor of about 2 on average.
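
As a rough illustration of why caching loop-invariant data can help
iterative jobs, here is a toy single-process Python sketch.  It is not
HaLoop or Hadoop code: the function names, the `stats` counters, and the
sample graph are all hypothetical, and the "index build" merely stands in
for the cost of re-reading and re-shuffling unchanged input on every
iteration.

```python
from collections import defaultdict


def build_index(edges, stats):
    """Group the loop-invariant edge list by source node (the costly 'load' step)."""
    stats["index_builds"] += 1
    index = defaultdict(list)
    for src, dst in edges:
        index[src].append(dst)
    return index


def reachable(edges, start, cache_invariant):
    """Repeat a join step until a fixpoint; returns (reachable set, stats)."""
    stats = {"index_builds": 0, "iterations": 0}
    # With caching, the invariant data is prepared once, before the loop.
    cached = build_index(edges, stats) if cache_invariant else None
    seen = {start}
    frontier = {start}
    while frontier:  # stop when an iteration discovers nothing new
        stats["iterations"] += 1
        # Without caching, the invariant data is rebuilt on every iteration.
        index = cached if cache_invariant else build_index(edges, stats)
        next_nodes = {dst for node in frontier for dst in index.get(node, [])}
        frontier = next_nodes - seen
        seen |= frontier
    return seen, stats


# Hypothetical sample graph: a chain a -> b -> c -> d plus an unrelated edge.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
naive_result, naive_stats = reachable(edges, "a", cache_invariant=False)
cached_result, cached_stats = reachable(edges, "a", cache_invariant=True)
print(naive_stats, cached_stats)
```

Both runs compute the same answer; the only difference is how often the
unchanging edge data is prepared, which is the kind of repeated work the
abstract attributes to iterative jobs on plain MapReduce.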

Bio:
Bill is a Senior Scientist at the eScience Institute at the University
of Washington and an Affiliate Assistant Professor in the Computer
Science and Engineering Department, also at UW.  His research focuses
on data-intensive scalable computing for science, supported by awards
from NSF, Microsoft Research, and PNNL.  Bill holds a PhD in Computer
Science from Portland State University and a Bachelor's degree in
Industrial and Systems Engineering from Georgia Tech.

_______________________________________________
dbreading mailing list
dbreading(at)cecs(dot)pdx(dot)edu
https://mailhost.cecs.pdx.edu/cgi-bin/mailman/listinfo/dbreading

--
http://chesnok.com/daily - me
