Steve: Developing on the Edge - Pluggable Hadoop
Thoughts on development, Web-services, technology and mountains.
Saturday, 26 July 2008
Pluggable Hadoop

Tom White looks at the ways people are planning to extend Hadoop, including my little plan for a consistent lifecycle for Hadoop services.

Although it will make subclassing easier, my real goal is to make it possible to start, stop and ping these services. The only reason I've had to subclass the existing services today is that they offer no easy way to start or stop them, and no liveness checks at all. With a unified base class and lifecycle, most of my subclassing hacks become unnecessary. No, where I'm looking at doing interesting stuff is in configuring Hadoop; being able to manage the services is a precursor to that.
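Here is a minimal sketch of the kind of unified lifecycle described above, written in Python for brevity (Hadoop itself is Java). Every name in it (`Service`, `State`, the `inner_*` hooks, `FakeDataNode`) is hypothetical and is not Hadoop's API. It only shows the shape of the idea: one base class owns start, stop and ping, and subclasses override small hooks instead of re-implementing startup, shutdown and health checks.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    LIVE = "live"
    FAILED = "failed"
    TERMINATED = "terminated"


class Service:
    """Hypothetical base class: one lifecycle shared by every service."""

    def __init__(self, name):
        self.name = name
        self.state = State.CREATED

    def start(self):
        if self.state is not State.CREATED:
            raise RuntimeError(f"{self.name}: cannot start from {self.state.value}")
        try:
            self.inner_start()
            self.state = State.LIVE
        except Exception:
            self.state = State.FAILED
            raise

    def ping(self):
        """Liveness check: True only if live and the subclass check passes; never raises."""
        if self.state is not State.LIVE:
            return False
        try:
            return bool(self.inner_ping())
        except Exception:
            self.state = State.FAILED
            return False

    def stop(self):
        """Idempotent: stopping an already-terminated service is a no-op."""
        if self.state is State.TERMINATED:
            return
        try:
            self.inner_stop()
        finally:
            self.state = State.TERMINATED

    # Subclasses override these hooks rather than the lifecycle methods.
    def inner_start(self):
        pass

    def inner_ping(self):
        return True

    def inner_stop(self):
        pass


class FakeDataNode(Service):
    """Hypothetical example service whose health depends on one flag."""

    def __init__(self):
        super().__init__("datanode")
        self.disk_ok = True

    def inner_ping(self):
        return self.disk_ok
```

Because `stop()` is idempotent and `ping()` never throws, a management tool could poll or shut down any service the same way without knowing which subclass it is dealing with.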

Looking at the other areas of work, I think scheduling will get the most interest from different people. Why? Because it's where companies like Platform Computing deliver value. That value isn't in the APIs for grid computing; it's in distributing work to chosen machines. The current job scheduler works, but it is very simple. Every worker node has a number of 'slots', and work is assigned to workers with spare slots. The scheduler is location-aware, looking for the open slot closest to the data, but there is no real examination of how much work a node is actually doing, what the expected workload of a new job is (based on past experience), or anything resembling balanced scheduling between users. Over time, that's where the fun is going to be. Watch that space.
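The slot model described above can be sketched roughly as follows. This is a simplified illustration, not Hadoop's scheduler code: the `Node` class, the `distance` and `assign` functions, and the three-level notion of "closest" (same node, same rack, anywhere) are assumptions made for the example. Tasks are placed greedily in submission order, one task per slot.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A worker with a fixed number of task slots (hypothetical model)."""
    name: str
    rack: str
    slots: int
    running: int = 0

    def free_slots(self):
        return self.slots - self.running


def distance(node, replica_nodes, rack_of):
    """0 = input data on this node, 1 = on the same rack, 2 = off-rack."""
    if node.name in replica_nodes:
        return 0
    if any(rack_of.get(r) == node.rack for r in replica_nodes):
        return 1
    return 2


def assign(task_replicas, nodes):
    """Put each task in the closest node that still has a free slot.

    task_replicas maps task id -> names of nodes holding that task's input.
    Returns task id -> node name, or None when no slot is free anywhere.
    Only slot counts and locality are considered, never the node's real load.
    """
    rack_of = {n.name: n.rack for n in nodes}
    placement = {}
    for task, replicas in task_replicas.items():
        candidates = [n for n in nodes if n.free_slots() > 0]
        if not candidates:
            placement[task] = None
            continue
        best = min(candidates, key=lambda n: distance(n, replicas, rack_of))
        best.running += 1
        placement[task] = best.name
    return placement


nodes = [Node("a", "r1", 1), Node("b", "r1", 2), Node("c", "r2", 1)]
tasks = {"t1": ["a"], "t2": ["a"], "t3": ["c"], "t4": ["c"], "t5": ["a"]}
print(assign(tasks, nodes))
# {'t1': 'a', 't2': 'b', 't3': 'c', 't4': 'b', 't5': None}
```

Note what the sketch never looks at: `running` counts occupied slots, not CPU, memory or I/O, and nothing weighs one user's jobs against another's. Those are exactly the gaps the paragraph points at.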
