I'm excited to be talking about Application Architectures for
the Cloud at ApacheCon EU at Easter.
This is not going to be a talk on Hadoop and MapReduce.
It's going to be a sketch of the architecture for big
applications that you want to deploy in a datacentre, probably one
hosted by a third party. In this world, machines come and go on
demand and your app needs to be designed not just to cope with that,
but to take advantage of it. It also needs to take advantage of the
fact that the distributed filestore and MR engine close the loop in
terms of feedback: start by assuming there is a big DFS there
instead of a database.
Much of the Java ecosystem needs to evolve in this world. All
the logging tools now need to think about pushing facts out to the
DFS, so that post-mortem analysis tools can run over them, looking
for recurrent errors across 500+ nodes. Same for the xUnit test
runners: no more one-XML-file-per-test-case; now you have 500
servers running the same test suite, and the most interesting
problems are the tests that fail on 15% of the supposedly homogeneous
servers. That's frequent enough to matter, but not the 100% failure
rate that is easy to debug.
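
To make the "fails on 15% of the servers" case concrete, here is a rough sketch of the kind of roll-up I have in mind. It assumes every node reports its results as (node, test, passed) records; that record shape, the function name partial_failures and the sample node and test names are all made up for illustration, not taken from any existing test runner. In practice this would be a job over the DFS rather than a loop in memory.

```python
from collections import defaultdict


def partial_failures(results, low=0.0, high=1.0):
    """Return {test: failure_rate} for tests failing on some nodes, not all.

    `results` is an iterable of (node, test, passed) tuples. This record
    shape is an assumption for the sketch, not a real runner's format.
    """
    runs = defaultdict(int)
    fails = defaultdict(int)
    for _node, test, passed in results:
        runs[test] += 1
        if not passed:
            fails[test] += 1

    rates = {}
    for test, total in runs.items():
        rate = fails[test] / total
        # Keep only failure rates strictly between the two bounds:
        # drop tests that always pass and tests that always fail.
        if low < rate < high:
            rates[test] = rate
    return rates


# Hypothetical sample: 20 nodes running the same three tests.
sample = []
for i in range(20):
    node = f"node-{i:03d}"
    sample.append((node, "test_io", i >= 3))   # fails on 3 of 20 nodes
    sample.append((node, "test_cfg", False))   # fails everywhere
    sample.append((node, "test_ok", True))     # passes everywhere

flaky = partial_failures(sample)
print(flaky)
```

With this sample only test_io is reported: test_cfg fails on every node, which is the easy case, and test_ok never fails at all.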
There are lots of other bits in this story, and a new addition to
my slideware will be Project Voldemort,
assuming the J.K.Rowling lawyers haven't had it renamed by then.
More details on High-Scalability