Sat 26 Jul 2008 | Pluggable Hadoop
Tom White looks at how people are extending Hadoop, including my
little plan for a consistent lifecycle for Hadoop services.
Although it will make subclassing easier, my real goal there is
to make it possible to start, stop and ping these services. The
only reason I've had to subclass the existing services today is
that they have no easy way to be started and stopped, and no
liveness checks at all. With a unified base class and lifecycle,
most of my subclassing hacks become unnecessary. No, where I'm
looking at doing interesting stuff is in Configuring Hadoop; being
able to manage the services is a precursor to that.
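
To make the start/stop/ping idea concrete, here is a minimal sketch of what a unified base class could look like. Every name in it (ServiceState, LifecycleService, innerStart and friends, DemoService) is made up for illustration; it is not the actual Hadoop code or the proposed patch, just one way such a lifecycle might be shaped.

```java
import java.io.IOException;

// Hypothetical lifecycle states; illustrative names, not Hadoop's API.
enum ServiceState { CREATED, STARTED, STOPPED }

// Hypothetical base class: every service gets the same start/stop/ping contract.
abstract class LifecycleService {
    private ServiceState state = ServiceState.CREATED;

    // Start the service; only legal from the CREATED state.
    public final synchronized void start() throws IOException {
        if (state != ServiceState.CREATED) {
            throw new IllegalStateException("cannot start from " + state);
        }
        innerStart();
        state = ServiceState.STARTED;
    }

    // Stop the service; safe to call more than once.
    public final synchronized void stop() {
        if (state == ServiceState.STOPPED) {
            return;
        }
        try {
            innerStop();
        } finally {
            state = ServiceState.STOPPED;
        }
    }

    // Liveness check: throws if the service is not started or not healthy.
    public final synchronized void ping() throws IOException {
        if (state != ServiceState.STARTED) {
            throw new IOException("service is " + state);
        }
        innerPing();
    }

    public synchronized ServiceState getState() {
        return state;
    }

    // Subclasses fill in only the service-specific parts.
    protected abstract void innerStart() throws IOException;
    protected abstract void innerStop();
    protected void innerPing() throws IOException { }
}

// Toy service used to exercise the lifecycle.
class DemoService extends LifecycleService {
    boolean running;

    @Override protected void innerStart() { running = true; }
    @Override protected void innerStop() { running = false; }
    @Override protected void innerPing() throws IOException {
        if (!running) throw new IOException("not running");
    }
}
```

With something like this, a management tool only needs to know about LifecycleService; it can start, ping and stop any node type without a service-specific subclass.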
Looking at the other areas of work, I think scheduling will get
the most interest from different people. Why? Because it's where
people like Platform Computing deliver value. The value is not in
the APIs for grid computing; it's in distributing work to chosen
machines. The current Job Scheduler works, but it is very simple.
Every task worker node has a number of 'slots', and work is
assigned to workers with spare slots. The scheduler is location
aware, looking for the open slot closest to the data, but there is
no real examination of how much work a node is really doing, what
the expected workload of the new job is (based on past experience),
or anything resembling balanced scheduling between users. Over
time, that's where the fun is going to be. Watch that space.
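
The slot-based, location-aware assignment described above can be sketched in a few lines. This is a simplified illustration, not the real Job Scheduler code: the Worker and SlotScheduler classes, and the three-level distance measure (same host, same rack, elsewhere), are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical worker model: a host, its rack, and a count of free task slots.
class Worker {
    final String host;
    final String rack;
    int freeSlots;

    Worker(String host, String rack, int freeSlots) {
        this.host = host;
        this.rack = rack;
        this.freeSlots = freeSlots;
    }
}

// Hypothetical scheduler: picks the free slot closest to the data, nothing more.
class SlotScheduler {
    private final List<Worker> workers = new ArrayList<>();

    void addWorker(Worker w) {
        workers.add(w);
    }

    // Assumed distance measure: 0 = same host, 1 = same rack, 2 = anywhere else.
    static int distance(Worker w, String dataHost, String dataRack) {
        if (w.host.equals(dataHost)) return 0;
        if (w.rack.equals(dataRack)) return 1;
        return 2;
    }

    // Assigns one task to the nearest worker with a spare slot and uses up
    // that slot. Returns null when every slot is taken.
    Worker assign(String dataHost, String dataRack) {
        Worker best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Worker w : workers) {
            if (w.freeSlots <= 0) continue;   // no spare slot on this node
            int d = distance(w, dataHost, dataRack);
            if (d < bestDistance) {
                best = w;
                bestDistance = d;
            }
        }
        if (best != null) best.freeSlots--;
        return best;
    }
}
```

Note what assign() never looks at: actual CPU or I/O load on the node, the expected cost of the task, or which user submitted it. Those are exactly the gaps a smarter scheduler would fill.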