NetKernel News Volume 1 Issue 31
Posted: 17-June-2010 10:02
[First sent to mailing list subscribers on June 6th 2010.  To subscribe for early access join NetKernel portal https://cs.1060research.com/csp/]

What's new this week?  Steady state on repo updates.  HTTP Bridge, scope
and caching.

Repo Status
-----------

No updates this week.

HTTP Bridge, Spatial Scope and Caching
--------------------------------------

There have recently been independent questions and discussion about the
design of the HTTP Bridge and its use in the Back and Front-end fulcrums.
Notably there are some subtleties to do with caching when working with
HTTP query parameters.  I wanted to wade in and talk about this since
this area has broader implications and lets me share some ideas on the
longer term roadmap.

The general architectural arrangement of the HTTP fulcrums is shown in
the first diagram here...

http://docs.netkernel.org/book/view/book:tpt:http:book/doc:tpt:http:overview

The HTTP transport server endpoint is instantiated in a space (we call a
space with a transport a fulcrum since it is the pivot point beneath
which other spaces are related and is the origin point for external root
requests).

The HTTP transport receives external HTTP requests - these include state
from the socket level, the headers, the body, the request URL, method,
cookies etc.

If you just instantiated the HTTP transport on its own in your own test
space it would be quite happy.  It takes the incoming HTTP events and
issues an NK request with the same identifier as the external URL into
its host space.  It provides access to the underlying raw HTTPRequest
and HTTPResponse servlet objects by wrapping them in a combined
HTTPRequestResponseRepresentation and providing this as the primary
argument on the NK request.

If you want to get down and dirty you can work directly with the raw
servlet level APIs in the service that resolves the NK request
(which has the same URL as the HTTP request).

Here's a very simple example...

<rootspace>
<endpoint>
   <prototype>HTTPServer</prototype>
</endpoint>
<accessor>
   <grammar>http://www.myhost.com/path</grammar>
   <class>myAccessor</class>
</accessor>
</rootspace>

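Where myAccessor would extend StandardAccessorImpl and override
onSource()...

public void onSource(INKFRequestContext context) throws Exception
{
    // The primary argument wraps the raw servlet request and response
    HTTPRequestResponseRepresentation httpreqrep =
        context.sourcePrimary(HTTPRequestResponseRepresentation.class);

    // ...Do stuff with HTTPRequest and HTTPResponse servlet objects...
}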

This is kind of like NK's equivalent of a raw servlet.

So pretty simple, and all the low level raw control you could wish for.

But stepping down from the Web's resource oriented world to this
servlet-like object oriented world to handle HTTP requests is not very
powerful.  Especially since NK offers an ROC software environment that
allows the two worlds of the web and NK to be gracefully impedance matched.

So you'll see from the diagram we don't set up the HTTP fulcrums with
this raw pattern. Instead we provide a complementary partner endpoint
called the HTTPBridge endpoint.  The HTTPBridge is there to receive all
incoming requests from the HTTP transport and to present the dispersed,
non-normalized state of the HTTP request (URL, body etc.) as a
uniform ROC addressable resource model.

To do this the HTTP Bridge dynamically creates a new space containing
endpoints that handle requests for httpRequest:* and httpResponse:*.
These resource sets are internally backed by the underlying raw servlet
objects but are now accessible uniformly in the ROC domain.

The HTTP Bridge constructs a new request with the same path as the
external HTTP URL but with the scheme and host parts removed and
replaced with the generic res:/ scheme.   We do this to avoid the
potential security risk of external recursion should you be importing
the http:/ client scheme in your application spaces.   This rewriting of
the URL to the request is actually a configurable parameter (see last
week's newsletter) and if you wished you could change this to include
the host etc. as needed.
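
To make the default rewrite concrete, here is a minimal sketch in plain
Java. It is not the HTTPBridge's code: the class and method names are
made up for illustration, and it only reproduces the mapping described
in this issue (scheme and host dropped, path kept, res: prefixed, query
not relayed).

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of NetKernel.
final class BridgeRewriteSketch {

    // Maps an external HTTP URL to the internal res:/ identifier form.
    // Assumption: the query string is not relayed, matching the
    // res:/path examples used in this issue.
    static String toInternalIdentifier(String externalUrl) {
        URI uri = URI.create(externalUrl);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // a bare host maps to the root path
        }
        return "res:" + path;
    }

    public static void main(String[] args) {
        System.out.println(toInternalIdentifier("http://www.myhost.com/path"));
        System.out.println(toInternalIdentifier("http://x.com/path?x=y"));
    }
}
```

Both calls in main() yield res:/path, so the application layer never
sees the external scheme or host.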

OK so now your application can focus on providing endpoints that resolve
the correct res:/x/y/z paths etc.  If required your application can
obtain representations from the httpRequest: space. For example, to
obtain the body of a POST you can do this in your code...

body=context.source("httpRequest:/body")

to get the Authorization header...

authHeader=context.source("httpRequest:/header/Authorization")

and to write into the http response during your processing you can issue
SINK requests to the httpResponse: set.

context.sink("httpResponse:/redirect","http://www.netkernel.org")

You'll have seen this when looking at any of the introductory tutorials etc.

Now the key point of this particular discussion is that the HTTPBridge
dynamically creates and inserts a normalized httpRequest: httpResponse:
space into the request scope for *each and every* HTTP transport
request.  As soon as the HTTPBridge receives a response for its
sub-request (the rewritten one) then the inserted transient state space
"pops" and goes out of existence.  It exists only for the duration of
the inner request issued from the HTTPBridge.

Why is this significant?  It's significant since when NetKernel caches
resources it uses as their primary key a combination of their identity
and their resolution and evaluation scopes.   NK recognizes that to
serve something from cache it has to be able to determine the validity
of both what you are asking for and where you are asking for it!

It turns out there is something very powerful that becomes possible
here. NK can often determine that two independent requests asking for
the same resource identifier from two different locations (say one
coming in through HTTPServer and one coming in from JMS server) would
both ultimately resolve to the same representation and hence can serve a
single cached representation to both. (If you know the referential
integrity constraints necessary to implement memoization in a function,
then you'll understand that NK is dynamically computing extrinsic
referential integrity across the whole system and so can eliminate
unnecessary cache entries and consequent management/GC overhead).
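
As a rough mental model only, the idea can be sketched in plain Java.
This is a toy, not NetKernel's cache: the class, the space names and the
"request scope contains the significant scope" test are all assumptions
made for illustration. Each entry remembers the identifier plus the
spaces that mattered when it was computed, and any request whose scope
contains those spaces may share the entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model for illustration only; NetKernel's real cache differs.
final class ScopedCacheSketch {

    private static final class Entry {
        final String identifier;
        final List<String> significantScope;
        final String representation;

        Entry(String identifier, List<String> significantScope, String representation) {
            this.identifier = identifier;
            this.significantScope = new ArrayList<>(significantScope);
            this.representation = representation;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Store a representation together with the spaces it depended on.
    void put(String identifier, List<String> significantScope, String representation) {
        entries.add(new Entry(identifier, significantScope, representation));
    }

    // Hit only if the identifier matches AND every significant space
    // is present in the scope of the incoming request.
    String get(String identifier, List<String> requestScope) {
        for (Entry e : entries) {
            if (e.identifier.equals(identifier)
                    && requestScope.containsAll(e.significantScope)) {
                return e.representation;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ScopedCacheSketch cache = new ScopedCacheSketch();
        cache.put("res:/path", Arrays.asList("AppSpace", "LibrarySpace"), "rep");

        // Two transports, one shared entry: both scopes contain the
        // spaces that were significant for res:/path.
        System.out.println(cache.get("res:/path",
                Arrays.asList("HTTPFulcrum", "AppSpace", "LibrarySpace")));
        System.out.println(cache.get("res:/path",
                Arrays.asList("JMSFulcrum", "AppSpace", "LibrarySpace")));
        // A request from an unrelated scope misses.
        System.out.println(cache.get("res:/path", Arrays.asList("OtherSpace")));
    }
}
```

The point of the toy is that the transport space is not part of the
key unless the resource actually depended on it, which is why one entry
can serve both transports.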

The example of two independent transports might seem a little contrived
but just consider that any time you make an import of an address space
in more than one location (this happens a lot in applications with
service libraries etc.) then NK will be able to see that they are
equivalent and any internal or even composite resources that are used in
the application will only incur a single cache entry.  This is a very
powerful and efficient optimization.

Turn on the visualizer and look at any representation that is marked as
"(from cache)". You'll see that the yellow marked scope is the spatial
scope that was significant (referentially significant) in order to
decide if this could be served from cache.  Incidentally we call this
whole process the "High water mark" algorithm since you start to observe
a "tide line" of significant scope in application resource sets.

OK. Now for the subtle implications of this for HTTP applications.  If
you request something from the httpRequest: space then anything you
create in your layer of the app will depend on that resource *and* the
continuing validity of the spatial context.   But we just said that the
HTTPBridge's injected space is transient.  Your resource will have a
very short cache lifetime if it depends on having touched the
httpRequest: space. Catch-22!

Actually for non-cacheable HTTP methods like PUT, POST, DELETE, PATCH
etc. this has no consequences.  In fact NK will still be caching
internally generated stuff within the lifetime of handling the HTTP
request so it still optimizes.  If the HTTP method is GET with no query
parameters then you're going to get full caching with no consequences
either.

But what if you have query parameters?  They're surely just opaque parts
of an identifier and ought to be fine as the primary key of a cache
entry. But if you request httpRequest:/param/x then you've touched the
HTTPBridge space and so your representation will expire as soon as the
space goes away. Doh! How dumb is that!
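
The expiry effect can be pictured with another toy in plain Java. Again
this is only a sketch under stated assumptions, not how NetKernel
tracks dependencies: the class, the space names and the popSpace()
method are hypothetical. Each entry records the spaces it touched, and
popping a space discards every entry that touched it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model for illustration only; names and behaviour are assumptions.
final class TransientScopeSketch {

    private final Map<String, String> representations = new HashMap<>();
    private final Map<String, Set<String>> spacesTouched = new HashMap<>();

    // Cache a representation and remember which spaces it depended on.
    void put(String identifier, String representation, Set<String> touched) {
        representations.put(identifier, representation);
        spacesTouched.put(identifier, new HashSet<>(touched));
    }

    String get(String identifier) {
        return representations.get(identifier);
    }

    // When a transient space goes away, everything that touched it expires.
    void popSpace(String space) {
        spacesTouched.entrySet().removeIf(e -> {
            if (e.getValue().contains(space)) {
                representations.remove(e.getKey());
                return true;
            }
            return false;
        });
    }

    public static void main(String[] args) {
        TransientScopeSketch cache = new TransientScopeSketch();
        // This one sourced httpRequest:/param/x, so it touched the bridge space.
        cache.put("res:/report", "<report/>",
                new HashSet<>(Arrays.asList("AppSpace", "BridgeSpace#1")));
        // This one never looked at the httpRequest: space.
        cache.put("res:/stylesheet", "<css/>",
                new HashSet<>(Arrays.asList("AppSpace")));

        cache.popSpace("BridgeSpace#1"); // the HTTP request has completed

        System.out.println(cache.get("res:/report"));     // expired
        System.out.println(cache.get("res:/stylesheet")); // still cached
    }
}
```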

So this is where some of the recent discussion has been.  Hopefully you
can see that by including the spatial context into the cache algorithm
we get an overall system optimization that is very significant.  For the
functional programmers it's generalized systemic memoization.

OK.  So why doesn't the HTTPBridge just relay through the query
parameters on the rewritten res:/ request?  So for example...

http://x.com/path?x=y

Might internally become...

res:/path?x=y

This is perfectly valid.  NK wouldn't care.  All you'd need to do is
ensure your grammars match and you're all set.  Caching will be fine.

There are several HTTPBridge design considerations that I need to share.

Firstly we wanted the HTTPBridge in NK4 to be completely free of
configuration. In NK3 we'd seen how it could be quite messy to have a
modal pre-processing Bridge that relays different values from the HTTP
state according to sets of different "zones".  Relaying parameters
caused brittleness and required all layers of the application to care
about the stuff being relayed down. With NK4's ever-present httpRequest:
space only the appropriate tier of the application needs to deal with it.

We also wanted applications to just be able to publish themselves to the
front-end fulcrum space without deployment needing any manual
configuration to the front-end.

Also we wanted the HTTPBridge to seamlessly allow higher order
application protocols to be layered on top - again with no
configuration.  Hence the NK4 soap server is just an overlay that can
sit below the HTTPBridge and which makes requests for the SOAP action,
message etc from the httpRequest: space and then issues a normalized
soap message request into the soap application space.

Finally, specifically to params, there are two reasons we don't just
relay the params on as URI params on the inner request.  Firstly this
then means that every handling endpoint has to internally parse and
process the request parameters (sort of going down the raw servlet path
again).  Secondly there is no requirement in HTTP to specify query
parameter order.

So imagine this...

res:/path?a=1&b=2

and

res:/path?b=2&a=1

Both are equally valid identifiers but to the cache these are two
different identifiers - there is no way it could ever work out that they
are the same (without introducing application specific code into the
cache, which is general purpose and will make us grumpy if you start
saying such things).

So what's the solution?  It surely is a use case that is not uncommon?

For sure, what you do is introduce another layer after the HTTPBridge.
An "inversion endpoint" that receives all requests (or at least matches
all the requests you know will have parameters on them) and issues a
normalized request to the application layer below...

What does this endpoint do?  In a Groovy script...

reqid=context.getThisRequest().getIdentifier()
params=context.source("httpRequest:/params", IHDSNode.class)
req=context.createRequest("normalized:"+reqid);
params.getNodes("/*").each{ node ->
   req.addArgument(node.getName(), node.getValue());
}
resp=context.issueRequestForResponse(req)
context.createResponseFrom(resp)

In detail: we get the identifier that was requested, source the
httpRequest:/params, create a new request like "normalized:res:/path" and
iteratively take each of the query parameters and make them arguments on
the sub-request.  We issue the sub-request and return its response as
our response.

Consider our two previous external URL examples going through the layers...

HTTPServer           http://x.com/path?a=1&b=2
HTTPBridge           res:/path
Inversion Endpoint   normalized:res:/path+a@1+b@2

So we have moved the non-normalized query parameters to normalized
active URI arguments.  One final tip, NKF always sorts arguments to
ensure that argument order is normalized.  So this also works...

HTTPServer           http://x.com/path?b=2&a=1
HTTPBridge           res:/path
Inversion Endpoint   normalized:res:/path+a@1+b@2
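
If you want to convince yourself that sorting is all it takes, here is
a small stand-alone Java sketch. It does not use NKF and is not how NKF
builds identifiers; the class and method are hypothetical and simply
reproduce the string form shown above from a raw query string, using a
sorted map so that argument order no longer matters. It makes no
attempt at URL decoding, and a repeated parameter name keeps only its
last value.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not NKF's implementation.
final class InversionSketch {

    // Builds "normalized:<id>+name@value..." with names in sorted order.
    static String normalize(String internalId, String rawQuery) {
        Map<String, String> args = new TreeMap<>(); // TreeMap sorts by name
        if (rawQuery != null && !rawQuery.isEmpty()) {
            for (String pair : rawQuery.split("&")) {
                if (pair.isEmpty()) {
                    continue; // tolerate stray '&'
                }
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                args.put(name, value);
            }
        }
        StringBuilder id = new StringBuilder("normalized:").append(internalId);
        for (Map.Entry<String, String> e : args.entrySet()) {
            id.append('+').append(e.getKey()).append('@').append(e.getValue());
        }
        return id.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("res:/path", "a=1&b=2"));
        System.out.println(normalize("res:/path", "b=2&a=1"));
    }
}
```

Both calls print normalized:res:/path+a@1+b@2, so a cache keyed on the
identifier sees one resource rather than two.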

So finally, the endpoint that provides normalized:res:/path+a@1+b@2
never has to touch the httpRequest: space. Its resource identifier is
100% opaque and all of its state is computed using layers below the
inversion point.  Therefore that is the high water mark and it will
always come from cache.

Cacheable, normalized and minimized cache footprint for the sake of 8
lines of code and a mapper layer in your application.

Hopefully you can see that sometimes things might on face value look
"dumb" but in fact we're always trying to strike an optimal engineering
balance.

--

OK another long story.  Why does it have longer-term implications?  Well
you'll now be seeing that scope and context are integral to an ROC
system.  They don't occur in the Web since it's mono-scoped.  You'll also
see that being able to dynamically construct spaces has some very
important consequences and can provide elegant architectures.  So much
so that NK4 is using this already internally in several endpoints.

Our long term vision is that we will be able to provide a standard
module "space runtime".  That is an endpoint that receives a declarative
space "recipe" and constructs and animates this space on the fly.
Imagine what you'll be able to do when you can have processes create
spaces based on application state.

At present the pieces to do this are actually all in place.  As I said,
the HTTPBridge does this, and the dynamic import and the virtual host
endpoints actually internally generate something like module.xml and
"execute it".  However you can also see that these are shark infested
waters to hand over in raw form to end-users - it's going to need
refinement and safety nets and error handling and debugging and toolsets
similar to the trace tools etc. (scopes and meters cf. last week).

But hopefully it gives you a hint of what's possible even today and what
we're intent on delivering in the future.  It also indicates why
learning to understand the bearing of scope on caching in the "safe"
playground of the HTTPBridge will pay off big time with these future
systems with extensive use of transient dynamic spaces.
© 2003-2006, 1060 Research Limited. 1060 registered trademark, NetKernel trademark of 1060 Research Limited.