Steve: Developing on the Edge
Thoughts on development, Web-services, technology and mountains.
Friday, 2 October 2009
A Cloud Tools Manifesto
Dedicated to the belief that tooling
that works reliably can only be achieved through good design and
adequate testing.
There's been lots of discussion about what people who deploy
their applications in "the cloud" want. It gets fairly political,
as the Open Cloud Manifesto shows. That manifesto contains
requirements about portability that are anathema to people hosting
applications directly in their own infrastructure (Google,
Microsoft, Salesforce.com), and I worry about it too. I also worry
about the HTML source of the manifesto. Have you seen it? Scary.
Someone has gone to a lot of effort to make a web page look like a
document.
Having read the manifesto, I cannot help but smirk at the bit
about re-using existing standards and the judicious creation of new
ones. And this is from the same company that gave the world
WS-ResourceFramework? Either they have learned the error of their
ways, or WS-RF is one of the existing standards they have in mind.
Anyway, I'm not going to comment on it in detail, except to say
that I'm working at a different level, namely the problem of
getting client code to work with different clouds that have
different machine management APIs. Listing what you want to do with
the machines is something to worry about elsewhere; my concern is
what can be done to work with the infrastructures, and what tool
authors need from the service providers.
Here, then, is my Cloud Tools Manifesto. I have no signatories
for it yet, having only recently written it. I did post a draft to
the Typica list, where there's been recent talk of a mock EC2 stack.
One of the responses was an invitation to get involved in the Open
Cloud Computing Interface (OCCI) group, which is part of the Open
Grid Forum. I was most amused, for reasons some people (Savas)
should recognise. Time spent with Savas and Jim was the best part of
the GGF process.
Cloud Tools Manifesto
API requirements
- Provide enough information about library/protocol calls that
anyone can implement tools to work with the infrastructure; any
vendor-specific tooling is the start, not the finish.
- Licensing of the header files, or of any other part of the
specification, should not prevent open source (under any license)
or closed source implementations.
- Do not require that the tooling is implemented in a specific
language, using a specific SOAP library, on a specific OS. There
may be restrictions on what the infrastructure can run, but that
does not need to affect the tools used to get that code
working.
- Where possible, use underlying protocols/specifications that
are, by virtue of their stability in the field or rigorous
specification and test suite, highly interoperable.
- When XML is used, generate well-formed XML 1.0 in responses and
error messages.
- Parse incoming XML with a proper XML parser; don't be brittle
in the face of different XML encodings.
- If you add a new authentication method to HTTP, provide the
relevant patches and tests for popular libraries such as Apache
HttpClient.
- Have a structured form for error messages. SOAP faults, ugly as
they are, are something you can chuck back in HTTP responses.
- Provide stable constants for some failure modes (no auth, no
credit, not enough machines) and document them online, ideally with
a URL rule that lets us take an error name such as "e_no_auth" and
map it to documentation such as
"http://cloud.example.org/messages/en/e_no_auth". Better yet, make
the URL the fault constant, as it includes explicit namespace
information.
- List the constants in machine-parseable XML as well as HTML,
so that XSL transforms can generate language-specific error
constants.
- If you adopt someone else's cloud machine management API,
retain their error response structure and the error codes. You may
need to add new faults, in which case they should be your own URLs.
If you do not provide faults that parse the same way as the
original API, you have not implemented the API.
- Include API/build version data in the error. This is very
useful for fielding bug reports against rapidly evolving
implementations.
- Don't assume the caller's clock is accurate; that assumption
makes testing under VMware really tricky.
- If the service is somehow sensitive to clock times, provide a
documented means for a caller to easily determine your endpoint's
view of system time; this can be used to calculate the offset the
client needs to apply to its own clock.
- Provide some means of contact with people who can help debug
interoperability problems. Good: Forums, email, issue trackers. Not
acceptable: requiring the developers to travel round the world for
meetings and conferences.
- Where possible, engage us in discussions about API futures.
NDAs and conflicts of interest complicate this, but it would still
be useful.
- Listen to our feedback. If something is hard for us to test, it
probably doesn't work right in our code.
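To make the fault requirements above concrete, here is a minimal Python sketch of the client side: mapping a short constant such as e_no_auth to its documentation URL, and parsing a structured XML fault with a real XML parser rather than string matching. The `<Fault>` layout with `Code`, `Message` and `Version` children is a hypothetical format invented for illustration, as is the cloud.example.org URL rule; no provider's actual schema is implied.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical URL rule, following the e_no_auth example in the text.
FAULT_BASE = "http://cloud.example.org/messages/en/"


def fault_url(name: str) -> str:
    """Map a short fault constant such as 'e_no_auth' to its documentation URL."""
    return FAULT_BASE + name


@dataclass
class CloudFault:
    code: str     # fault URL, doubling as a namespaced constant
    message: str  # human-readable text, possibly non-ASCII
    version: str  # API/build version reported by the server

    @property
    def short_name(self) -> str:
        # The last path segment of the fault URL is the short constant name.
        return self.code.rstrip("/").rsplit("/", 1)[-1]


def parse_fault(body: bytes) -> CloudFault:
    """Parse a (hypothetical) <Fault> document with a proper XML parser.

    Passing raw bytes lets the parser honour whatever encoding the XML
    declaration names, instead of the client guessing it.
    """
    root = ET.fromstring(body)
    if root.tag != "Fault":
        raise ValueError(f"not a fault document: <{root.tag}>")
    return CloudFault(
        code=(root.findtext("Code") or "").strip(),
        message=(root.findtext("Message") or "").strip(),
        # A missing or empty Version element is reported as "unknown".
        version=(root.findtext("Version") or "unknown").strip(),
    )
```

A client can branch on `parse_fault(body).short_name` while still logging the full fault URL and server version in a bug report. Using the URL itself as the constant means two providers' e_no_auth faults cannot be confused.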
Testing
Mock Endpoint
Provide a mock endpoint that:
- Has the same API and error responses as the production
endpoint
- Simulates the allocation/release of VMs and other assets, and
validates all requests.
- Can be set up by a caller to fail for the next request from a
specific account, with a specific failure.
- Is free for everyone with an account to use.
- Can be used by test accounts whose authentication details
don't need to be kept secret. This would let us embed the tests in
open source releases, run them on Hudson, etc.
- If the mock endpoint can be redistributed as a program, a
library or a VM image, provide a means of downloading or hosting it
for independent testing.
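The mock endpoint requirements can be sketched as a tiny in-memory Python class. Everything here is hypothetical: the `MockCloud` and `MockFault` names, the method names, and the fault constants (in the e_no_auth style above) are illustrations, not any provider's API. The point is the shape: validate every request, track allocations, and let a test inject a one-shot failure for a specific account.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Set


class MockFault(Exception):
    """Raised with a fault constant, the way the production endpoint would fail."""

    def __init__(self, code: str, detail: str = ""):
        super().__init__(f"{code}: {detail}" if detail else code)
        self.code = code


@dataclass
class MockCloud:
    """Toy in-memory endpoint; all names and fault codes are hypothetical."""

    capacity: int = 4
    accounts: Set[str] = field(default_factory=set)
    machines: Dict[str, str] = field(default_factory=dict)  # VM id -> owning account
    _pending: Dict[str, str] = field(default_factory=dict, init=False, repr=False)
    _ids: itertools.count = field(
        default_factory=lambda: itertools.count(1), init=False, repr=False
    )

    def fail_next(self, account: str, code: str) -> None:
        """Make the next request from `account` fail with fault `code`."""
        self._pending[account] = code

    def _check(self, account: str) -> None:
        # An injected failure is consumed first, then normal validation runs.
        code = self._pending.pop(account, None)
        if code is not None:
            raise MockFault(code, "injected failure")
        if account not in self.accounts:
            raise MockFault("e_no_auth", f"unknown account {account!r}")

    def allocate(self, account: str, count: int = 1) -> List[str]:
        self._check(account)
        if count < 1:
            raise MockFault("e_bad_request", f"count must be >= 1, got {count}")
        if len(self.machines) + count > self.capacity:
            raise MockFault("e_no_machines", "not enough machines")
        ids = [f"vm-{next(self._ids)}" for _ in range(count)]
        for vm in ids:
            self.machines[vm] = account
        return ids

    def release(self, account: str, vm: str) -> None:
        self._check(account)
        if self.machines.get(vm) != account:
            raise MockFault("e_no_such_machine", vm)
        del self.machines[vm]
```

Because injected failures are one-shot and per-account, a test can provoke exactly one fault, such as a hypothetical e_no_credit, and then check that the client both reports it usefully and recovers on the next call.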
Note that while we can create our own mock endpoints (and often
do), those mock endpoints will contain our assumptions about the
API and our beliefs about what the failure messages will be. A mock
endpoint provided by the production team would fail in the ways the
production team expect things to go wrong, and be more
rigorous.
Production Endpoint
On the production endpoint
- Provide discounted/free machines to the test tool teams. These
can be massively underpowered VMs, as we are normally simulating
complex systems, not doing real work. Real work we can pay for; it's
the unit tests that run up our bills, creating and destroying
machines all the time.
- Offer access to forthcoming features/API versions, NDAs
permitting
Nightly builds
If the infrastructure team has an automated build process with a
staging cluster, consider:
- Offering the tooling developers access to this endpoint, so
that they can report problems sooner rather than later.
- Running a local copy of the tooling against the development
branch's endpoint, as part of the CI process.
- Adding the open source tools' build and test runs to the CI
server's build and test process. This helps find interop problems
with the trunk versions of everyone's code.
Our Obligations
In exchange we agree to:
- Read your documentation and look at your examples before
getting into trouble.
- Write code that usually appears to work.
- When XML is needed, generate well-formed XML.
- Parse XML formatted responses in a proper XML parser.
- Document our client for others to use.
- Provide our client identification/version info in an HTTP
header.
- Write functional tests.
- Test our code against your endpoints.
- Test our code against your endpoints with a proxy in the
way.
- Write code that fails in some vaguely useful way when things go
wrong.
- Write code that provides diagnostics information when things go
wrong, so as to help in blame assignment when something does not
work. For example, list the endpoint, proxy settings, client code,
dump the error response.
- Have an option to log interactions with the server in more
detail.
- Write client applications that can be switched to different
endpoints, such as the test clusters, or third-party
implementations
- Where the far end requires the caller's clock to be roughly in
sync with its own, get the system time from the far end and use the
calculated offset to drive the timestamps.
- Not cache DNS values indefinitely, and assume that hostnames
move around.
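The clock obligation (and the matching API requirement that providers expose their view of system time) comes down to a small calculation. This Python sketch assumes the endpoint's time arrives as an RFC 1123 date string, such as an HTTP `Date` response header; whether a particular provider's `Date` header is the clock it checks signatures against is an assumption to confirm. It takes the server time to correspond to the midpoint of the round trip, so the result is only good to about a second.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def clock_offset(server_date: str, sent_at: datetime, received_at: datetime) -> timedelta:
    """Estimate how far the server's clock is ahead of ours (negative if behind).

    `server_date` is an RFC 1123 string such as "Fri, 02 Oct 2009 15:26:30 GMT";
    `sent_at` and `received_at` are timezone-aware local timestamps taken just
    before sending the request and just after reading the response. The server
    is assumed to have stamped its reply halfway through the round trip.
    """
    server_time = parsedate_to_datetime(server_date)
    if server_time.tzinfo is None:  # "-0000" dates parse as naive; treat as UTC
        server_time = server_time.replace(tzinfo=timezone.utc)
    midpoint = sent_at + (received_at - sent_at) / 2
    return server_time - midpoint


def server_now(offset: timedelta, now: Optional[datetime] = None) -> datetime:
    """Our clock shifted into the server's view of time, for request timestamps."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now + offset
```

A client would compute the offset once, for example from its first response, and then use `server_now(offset)` wherever it needs a timestamp the server will accept, even when running in a VM whose clock has drifted.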
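Several of the client-side obligations (identifying the client in an HTTP header, switchable endpoints, proxy support, optional verbose logging, and diagnostics that help with blame assignment) can be sketched with Python's standard urllib. The client name and version, the `ClientConfig` fields and the staging URL used in the usage note are hypothetical; only the urllib calls are standard library.

```python
import urllib.request
from dataclasses import dataclass
from typing import Optional

CLIENT_NAME = "example-cloud-tool"  # hypothetical client name
CLIENT_VERSION = "0.1"              # hypothetical version string


@dataclass
class ClientConfig:
    endpoint: str                # production, a staging cluster, or a third-party implementation
    proxy: Optional[str] = None  # e.g. "http://proxy.example.org:3128"
    verbose: bool = False        # dump HTTP traffic when True


def make_opener(cfg: ClientConfig) -> urllib.request.OpenerDirector:
    """Build an opener honouring the configured proxy and verbosity."""
    handlers = []
    if cfg.proxy:
        handlers.append(urllib.request.ProxyHandler({"http": cfg.proxy, "https": cfg.proxy}))
    if cfg.verbose:
        # debuglevel=1 makes http.client print request and response headers.
        handlers.append(urllib.request.HTTPHandler(debuglevel=1))
        handlers.append(urllib.request.HTTPSHandler(debuglevel=1))
    return urllib.request.build_opener(*handlers)


def build_request(cfg: ClientConfig, path: str, body: Optional[bytes] = None) -> urllib.request.Request:
    """Create a request against the configured endpoint, identifying this client."""
    url = cfg.endpoint.rstrip("/") + "/" + path.lstrip("/")
    req = urllib.request.Request(url, data=body)
    req.add_header("User-Agent", f"{CLIENT_NAME}/{CLIENT_VERSION}")
    return req


def diagnostics(cfg: ClientConfig, status: int, response_body: bytes) -> str:
    """Summarise what a bug report needs: where we called, how, and what came back."""
    return "\n".join([
        f"endpoint: {cfg.endpoint}",
        f"proxy: {cfg.proxy or '(none)'}",
        f"client: {CLIENT_NAME}/{CLIENT_VERSION}",
        f"HTTP status: {status}",
        "response:",
        # Decoded leniently: a diagnostic dump should never itself fail.
        response_body.decode("utf-8", errors="replace"),
    ])
```

Pointing `endpoint` at, say, a hypothetical https://staging.example.org/api/ needs no code change, and printing `diagnostics(...)` when a call fails hands the provider the endpoint, proxy and client version along with the raw error response.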
There, that isn't asking for too much, is it?
Posted by steve at 15:26