How about last modification time in resource's metadata?
Joined: 31-August-2007 Posts: 1 Location: Korea | Posted: 1-September-2007 07:01
Hi, I'm really impressed and excited by what has been built into NetKernel so far. I have actually been building a very similar system in my spare time for the past 3 years. A week ago I discovered NetKernel and the docs for ROC (through the "Powered by Jetty" page). At first I was shocked at how identical it was to my idea, but I soon became very excited to see how that idea could be extended, generalized and realized. I strongly believe ROC will change the landscape of the entire software industry in the near future.
I read the detailed documentation and played with some of my XSLTs and Unix commands. I was going to stop building my own underlying system, port my applications to NK and concentrate on them. But yesterday I found one thing that really frustrates me.
As an exercise, I created a time-consuming image transformation job (JPEG decoding, resizing, encoding) on a URI and accessed it with Firefox. As expected, it wasn't very fast the first time, but it never seemed to get faster. I tried modifying the default expiry time and also checked that NetKernel's cache was being hit, but reloading still had high latency (~3s). There seemed to be no "Last-Modified" in the HTTP header, and the HTTP transport didn't support If-Modified-Since requests. Every time I reloaded the resource, an HTTP 200 OK came back with the full content instead of a simple 304 Not Modified header.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.25
I believe HTTP's Last-Modified and If-Modified-Since are key elements of web performance. The same must hold for every ROC system, since these two give the ability to poll the freshness of resources with an absolute metric, in contrast to the expiry time, which is a relative metric that depends on a particular cache. Cache validity should be decided in two ways: by expiry time, and by the "recent modification time", i.e. the most recent modification time among the resource and its dependencies.
I've looked into some of the code for ICachelet and IURMeta and found that the entire cache model was designed around the expiry system only. Many resources can provide their timestamp information, e.g. file:, ffcpl:, http: (via Last-Modified:). Since the dependency tree is already there, we only need a channel for timestamps to flow through. I would really like to implement such a channel and the timestamp providers myself, but it would take me quite a long time to understand the structure of the NetKernel internals. I hope 1060 or someone familiar with the internals can implement this crucial feature and include it in the next release.
Let's work out how to achieve this
Joined: 7-February-2005 Posts: 279 Location: between ROC and a hard place | Posted: 1-September-2007 18:26
Hi netj. It is really encouraging to hear that you have been working on very similar concepts and that you share our belief that ROC is coming of age. I think that now that REST is becoming more fashionable and widely understood, it will be much easier for more people to make the transition to understanding ROC.
I understand the concerns you raise regarding the caching so let me give you some background that I hope will help you make a decision to persevere with NetKernel.
NetKernel is abstracted away from HTTP and the specifics of REST over HTTP. This enables it to support a wider range of transport options, and as such NetKernel's internal middleware is quite neutral. The difficulty with this approach is always that you may end up with a simple minimal subset of functionality that can't richly support anything. However, I don't think that is the case with NetKernel's approach. Certainly some of the specific technologies, such as the HTTP Bridge, may need more refinement, but at this level everything is pluggable. Development of such technologies has always been driven by user requirements or contributions.
Internal caching in NetKernel assumes that the cache is local to the processing. In fact, this local caching allows us to cache intermediate processing steps and gain higher cache hit rates, which lead to some of NetKernel's great performance characteristics. When caching is local, validity can be driven by a single method:
boolean isExpired()
which can be implemented in any way: for example, by checking file timestamps, by expiring at a fixed time, or by more sophisticated patterns such as the golden thread (http://docs.1060.org/docs/3.0.0/book/architectguide/doc_tutorial_patterns_golden_thread.html). The pessimistic expiry time value that you see in the internal metadata is simply an internal optimisation to avoid needing to call isExpired(). It has similar characteristics to the HTTP Expires header, and hence the HTTP Bridge uses this value to set the Expires header.
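To make the isExpired() point concrete, here is a minimal Java sketch of the three validity strategies just mentioned. The `Validity` interface and all class names are hypothetical, not NetKernel's actual API, and modelling the golden thread as a shared generation counter is an assumption about how such a pattern could work, not how NetKernel implements it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical stand-in for the single validity method described above
// (not the real NetKernel interface).
interface Validity {
    boolean isExpired();
}

// Strategy 1: expired once the source file's modification time changes.
final class FileTimestampValidity implements Validity {
    private final Path file;
    private final long recordedMillis;

    FileTimestampValidity(Path file) {
        this.file = file;
        try {
            this.recordedMillis = Files.getLastModifiedTime(file).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean isExpired() {
        try {
            return Files.getLastModifiedTime(file).toMillis() != recordedMillis;
        } catch (IOException e) {
            return true; // file gone or unreadable: treat the cached result as stale
        }
    }
}

// Strategy 2: expired at a fixed point in time (clock injected for testing).
final class FixedTimeValidity implements Validity {
    private final long expiresAtMillis;
    private final LongSupplier clock;

    FixedTimeValidity(long expiresAtMillis, LongSupplier clock) {
        this.expiresAtMillis = expiresAtMillis;
        this.clock = clock;
    }

    @Override
    public boolean isExpired() {
        return clock.getAsLong() >= expiresAtMillis;
    }
}

// Strategy 3: a hypothetical "golden thread". Every result attached to the
// thread stays valid until the thread is cut, which expires all of them at once.
final class GoldenThread {
    private final AtomicLong generation = new AtomicLong();

    Validity attach() {
        final long seen = generation.get();
        return () -> generation.get() != seen;
    }

    void cut() {
        generation.incrementAndGet();
    }
}
```

A local cache would keep one such object alongside each cached representation and call isExpired() on every hit; none of the strategies needs any notion of an absolute modification time to be exposed outside the cache.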
A possibly useful pointer: you can attach arbitrary HTTP headers to the response using the HTTPResponseCode accessor: http://docs.1060.org/docs/3.0.0/book/developerreference/doc_ura_HTTPResponseCode.html
One other development that you may not be aware of is that there is a new, fully re-architected NetKernel 4.0 platform, for which we plan to start an alpha programme later this year. Amongst many, many other things that we are very excited to start talking about soon, it includes an extensible application metadata model which would make the channels you talk about much easier to implement.
So we should talk about how to support the 304 Not Modified, Last-Modified and If-Modified-Since headers. Improving the HTTP support within NetKernel is something that we see as very important. We could work together towards this or I could offer you our collective experience and knowledge of NetKernel.
Cheers, Tony
Work together
Joined: 10-July-2007 Posts: 6 Location: Prague | Posted: 6-September-2007 15:48
It's the same here... I am quite excited about NetKernel and I thought I'd quickly migrate to it. Nevertheless, some things that I expected to solve in no time are taking me a lot of time (because things are missing or not very elegant). I've been working on very similar ideas and came to many similar solutions and conclusions. I am ready to cooperate and help. What is your development process? Is there access to an NK development repository (e.g. SVN)?
Pete
Re: Work Together
Joined: 7-February-2005 Posts: 446 Location: UK | Posted: 7-September-2007 09:13
Hi netj/Peter,
Your comments and excitement are really appreciated. The fact is we started working on the ideas behind NetKernel over 8 years ago, initially at HP and then as a startup. If you think about it, we started working on understanding REST and implementing a logical URI addressing model for software 2 years before Fielding's REST thesis was published. For the first few years of 1060 almost nobody understood what we were talking about. REST was brand new at that point, and extrapolating REST into the internals of software architecture was a step too far for many. Fortunately a few did understand and have built some really cool commercial systems on NK, which has meant that we have been able to organically grow a successful software business.
Our philosophy has always been about community participation and sharing the code. In fact, resource-oriented thinking almost makes it inevitable that code has to be shared, since code is a resource that can be processed just like anything else. NetKernel comes with the complete source code: each module contains its code, and to build it all you need to do is link against the kernel and possibly the module imports that are referenced in module.xml.
However, I have to put my hand up. For a very long time we have intended to move the CVS repository to a public server. The truth is that some of it goes back a long, long time and needs to be cleaned up: there is confidential customer code and there are in-house applications, and one of the limitations of CVS is its single monolithic model. In the last few years we have been victims of our own success, and the time to attend to this sort of housekeeping gets swallowed up! But we are currently working on a 3.3 refresh of NetKernel, and your enthusiasm is the spur to get us to do something about this. So I have created a new forum for NetKernel developer discussion. Notice that I've also moved this topic to that forum.
We have also started a public alpha build program; there is a topic providing links to the latest build and changelog here. Last but not least, in the next week or so we will move the NK3.3 codebase onto a public SVN server. Needless to say, we'd love to hear which parts of NK can be improved.
As for netj's very good points, I will post another reply in a minute about how we can realise HTTP If-Modified-Since. It's great to know that what we've been working on for so long is exciting people. As Tony hints above, whilst we are preparing a refresh of 3.3, some incredibly exciting things are happening with the 4.0 development: we have been working on a completely new and general NK abstraction for 3 years now! It has some very significant breakthroughs which we can't wait to share. Unfortunately, for technical reasons we can't reveal them just yet, but we plan to have a preview release before the end of the year.
Finally, welcome to the NK community!
Peter
If-Modified-Since
Joined: 7-February-2005 Posts: 446 Location: UK | Posted: 7-September-2007 10:14
Hi netj, you make a very good point. In HTTP, the transfer cost of sending a resource representation in response to a GET request can be eliminated by recording when the requested resource representation was created and not serving it if the requestor already has a valid representation. In the simplest web server, where a directory is mapped to a logical URL path, the server can observe the timestamp of the file and respond appropriately to the If-Modified-Since header received from the client.
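For the file-mapped case, the server-side check is small. Below is a minimal Java sketch, assuming the server already knows the file's modification time in milliseconds; the class and method names are illustrative only and not taken from Jetty or NetKernel.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Conditional GET for a file-backed resource, decided purely from the
// file's modification time (a sketch, not any particular server's code).
final class StaticConditionalGet {
    // HTTP-date in IMF-fixdate form, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    private StaticConditionalGet() {}

    // Value for the Last-Modified response header.
    static String lastModifiedHeader(long fileMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(fileMillis));
    }

    // 304 if the client's copy is at least as new as the file, else 200.
    static int statusFor(long fileMillis, String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return 200;
        }
        try {
            long sinceSeconds = ZonedDateTime
                    .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toEpochSecond();
            // HTTP dates have one-second resolution, so drop the milliseconds.
            long fileSeconds = Math.floorDiv(fileMillis, 1000L);
            return fileSeconds <= sinceSeconds ? 304 : 200;
        } catch (DateTimeParseException e) {
            return 200; // an unparseable date is ignored: send the full response
        }
    }
}
```

The server would send `lastModifiedHeader(...)` with every 200 response, so that the browser has a date to echo back in If-Modified-Since on the next reload.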
It would be relatively simple to add a timestamp record to the metadata associated with the internal representation within NetKernel. However, NetKernel has been implemented as a generalized computing abstraction. We have also been very vigilant to maintain a minimal abstraction at the core of the system and to implement higher-order requirements in the application layer. For example, the NetKernel Foundation API is implemented in the layer1 module and has no footprint in the kernel.
When it comes to metadata, for optimal efficiency we aim for the minimum possible to implement the NK abstraction. So why is there no timestamp field in the IURMeta interface? Because *within* the logically linked "web" of software inside NK there is no transfer cost for a resource representation. Whenever a request is made, if the resource is cached it is returned immediately after an O(log n) lookup. So internally within NK there is never a need for an If-Modified-Since construct as there is with HTTP: the lookup is always cheaper than anything additional metadata processing could possibly save. Ultimately this means NK is very, very efficient.
Now, the question of supporting If-Modified-Since is better thought of as an application-level problem. To NK, HTTP is an application-level concern (it is, after all, an application protocol). So to think about this, I need to explain the design philosophy of the HTTP transport.
A transport in NK is really an event handler - an external event triggers some code to execute in the transport. The transport must then interpret that event and construct a logical resource request and issue this into the NK address space. This design means that NK is completely general and the resource oriented approach to software development can be combined with any appropriate external system - whether it is an application protocol, a message bus architecture or a GUI.
Now, the HTTP transport is very important because HTTP is probably the most important network application protocol. It is implemented in two parts. The HTTPTransport is based on Jetty and is declaratively configurable to set up one or more cleanly isolated HTTP servers. When an HTTP request arrives at the server, it is processed by the Jetty HTTP Handler chain. This chain can be configured by the user, so for example you can put in access control handlers. Finally, the NetKernel HTTPTransport handler receives the request.
The HTTPTransport handler constructs an HTTPRequestResponse representation containing references to the underlying Jetty Request/Response objects. These give you complete control of every aspect of the HTTP request: URL, headers, cookies, etc. The Response object provides complete control to write back the appropriate HTTP response and return a representation.
The transport takes the URL of the request and simply reissues it into the NetKernel address space, with the HTTPRequestResponse aspect associated with the request, so the HTTP I/O actually gets passed into the calling application as a resource.
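To make the division of labour concrete, here is a deliberately tiny Java model of a transport as an event handler. Every type here (`HttpRequestResponse`, `AddressSpace`, `HttpTransport`) is a hypothetical stand-in: the real classes wrap Jetty objects and resolve requests through module address spaces, not an exact-match map, and the 404 for an unresolved URI is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical stand-in for the HTTPRequestResponse aspect. The real one holds
// references to Jetty's Request/Response; here it is just URL, headers, output.
final class HttpRequestResponse {
    final String url;
    final Map<String, String> headers;
    int status = 200;
    final StringBuilder body = new StringBuilder();

    HttpRequestResponse(String url, Map<String, String> headers) {
        this.url = url;
        this.headers = headers;
    }
}

// Hypothetical, greatly simplified address space: exact URI -> endpoint.
// Each endpoint receives the logical URI plus the HTTP aspect as a resource.
final class AddressSpace {
    private final Map<String, BiFunction<String, HttpRequestResponse, String>> endpoints =
            new HashMap<>();

    void bind(String uri, BiFunction<String, HttpRequestResponse, String> endpoint) {
        endpoints.put(uri, endpoint);
    }

    // Returns the representation, or null if nothing resolves the URI.
    String issue(String uri, HttpRequestResponse aspect) {
        BiFunction<String, HttpRequestResponse, String> endpoint = endpoints.get(uri);
        return endpoint == null ? null : endpoint.apply(uri, aspect);
    }
}

// The transport is just an event handler: it reissues the request URL into the
// address space with the aspect attached, then writes back what comes out.
final class HttpTransport {
    private final AddressSpace space;

    HttpTransport(AddressSpace space) {
        this.space = space;
    }

    void onHttpEvent(HttpRequestResponse rr) {
        String representation = space.issue(rr.url, rr);
        if (representation == null) {
            rr.status = 404;
        } else {
            rr.body.append(representation);
        }
    }
}
```

The point of the shape is that the transport contains no application logic: swapping HTTP for a message bus or a GUI would only replace `onHttpEvent` with a different event handler issuing into the same address space.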
Now, HTTP is very successful, but it is not necessarily the cleanest model that could have been devised. Information is dispersed between the URL, the headers, the body resource, the cookies, etc. Inside NK we like to be tidy: we like to express everything as a URI-addressable resource, typically using the active URI scheme. Having bits and pieces locked inside the Jetty HTTPRequest/Response objects is not resource oriented, so we provide another technology called the HTTPBridge.
By default our HTTP fulcrums always send HTTP requests to the HTTPBridge for processing.
The job of the HTTPBridge is to marshal the dispersed information in the HTTP request and turn it into a uniform URI-addressable request for the NK address space to process. The Bridge takes a declarative configuration that allows any number of different processing models to be applied in different 'zones' (i.e. URL paths).
The bridge is designed to be an 80%-case tool: it provides a lot of convenience and suits most applications most of the time. However, if you are a power HTTP user and want to get down to the low levels, you can simply remove the bridge and your application will receive the HTTPRequestResponse, at which point you have the same level of control as with any other API-based approach.
Now that's the background. So what about this metadata for If-Modified-Since? Well, we can easily implement this, either in the HTTPBridge or as a separate tool. The implementation would be as follows. The URL of a GET request would be used as the primary key in a map. When the request is issued into the NK address space, a resource representation will be received in response. The hashcode of the response representation would be placed in the map along with a timestamp. The next time a request comes in (a GET with an If-Modified-Since header), the bridge would still issue the request into NK (remember, if the resource is cached inside NK then the lookup is O(log n), which is cheaper than processing additional internal metadata). The bridge will then receive the representation (as a Java reference: no data has to move inside NK, it's not like HTTP). The bridge can then look up the URL in the local state map and determine whether the previously sent representation and the current valid representation are the same. If they are, it can respond with a 304 and eliminate the HTTP transfer cost. The nice thing about this is that the representation coming from the NK address space can be fully dynamically generated and the system will take care of cacheability etc. All the bridge needs to care about is managing the very small amount of state mapping URL to hashcode for each resource in the web application/service.
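The bookkeeping described above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name is hypothetical, the bridge is assumed to have already parsed the If-Modified-Since header into milliseconds, and the representation's `hashCode()` is taken to identify it, exactly as proposed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Bridge-side state as proposed above: URL -> (hashcode of the last representation
// sent, time that representation was first seen). Names are hypothetical.
final class LastModifiedTracker {
    private static final class Seen {
        final int hash;
        final long sinceMillis;

        Seen(int hash, long sinceMillis) {
            this.hash = hash;
            this.sinceMillis = sinceMillis;
        }
    }

    private final ConcurrentHashMap<String, Seen> byUrl = new ConcurrentHashMap<>();
    private final LongSupplier clock;

    LastModifiedTracker(LongSupplier clock) {
        this.clock = clock;
    }

    // Last-Modified time for the representation NK just returned for `url`:
    // unchanged if it has the same hashcode as last time, otherwise "now".
    long lastModified(String url, Object representation) {
        int hash = representation.hashCode();
        return byUrl.compute(url, (key, prev) ->
                prev != null && prev.hash == hash ? prev : new Seen(hash, clock.getAsLong()))
                .sinceMillis;
    }

    // 304 if the client's If-Modified-Since (parsed to millis, or null) is not
    // older than the recorded time, compared at HTTP's one-second resolution.
    int status(String url, Object representation, Long ifModifiedSinceMillis) {
        long lastModified = lastModified(url, representation);
        if (ifModifiedSinceMillis != null
                && lastModified / 1000 <= ifModifiedSinceMillis / 1000) {
            return 304;
        }
        return 200;
    }
}
```

The map holds one small entry per URL, and NK is still asked for the representation on every request, so the bridge never has to know why a representation changed, only that its hashcode did.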
We can implement this very easily in the HTTP bridge, with no need for any additional metadata in the system. I'd be very happy to hear your ideas about the most convenient way to implement this: either in the existing HTTPBridge or as a separate "mapper" pattern accessor?
As Tony mentioned above, in NK4.0 the gloves are off. There is a completely user-extensible metadata model (i.e. we make metadata a part of the application-level domain), and so all sorts of higher-order application patterns are possible. However, again the philosophy is to keep the kernel as minimal as possible and concerned only with implementing the general abstraction. We maintain that this philosophy yields the highest-performance, most efficient system.
Sorry for such a long story! Does this make sense?
Cheers,
Peter
conditional get
Joined: 14-March-2005 Posts: 73 Location: Amsterdam, The Netherlands | Posted: 7-September-2007 10:17
Hi Tony/netj,
After reading the RESTful Web Services book I have also been thinking about implementing conditional GET in NK. Unfortunately I haven't found the time to implement it, but maybe I can sketch my idea here. The first step would be a wrapper, along the lines of the gatekeeper and the sessionmapper:
    <rule>
      <match>(.*)</match>
      <to>active:conditional-get+uri@$e1</to>
    </rule>
The conditional-get wrapper would do the following:
(1) request the uri using the new get accessor (see below);
(2) compare the Last-Modified and ETag headers of the result with those from the request;
(3) return 304 Not Modified when they are the same;
(4) return 200 (if I remember correctly) and the result when they are (or at least one is) different.
The get accessor would do the following:
(1) request the uri;
(2) set the Last-Modified and ETag headers.
For this to work properly, the get accessor's result should be cacheable, but that will depend on the uri. If the uri is cacheable, the get result will be cacheable and the conditional-get will work. If the uri isn't cacheable, the get result will not be cached and will be regenerated each time the conditional-get wrapper makes its request, i.e. a new Last-Modified and ETag will be attached to the result and the conditional-get wrapper will return the new result. Notice that the get request also makes sense without the conditional-get wrapper, so you may push all your HTTP endpoints through it to attach these HTTP headers. (See also my BXML script to retrieve the last modified date of a static resource in this topic: http://www.1060.org/forum/topic/159)
Does this make sense? I'm currently in the last phase of a project, so I don't have the time to actually work on this now. But in October I should have some time to really try to implement this or an alternative scheme we may come up with.
Menzo
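Menzo's two-step scheme (a get accessor that stamps validators, and a conditional-get wrapper that compares them) can be sketched in Java as follows. All names are hypothetical, not existing NK accessors; the in-memory map stands in for NK's cache, and a generation counter is assumed as the ETag source so that a regenerated result always gets new validators, as the scheme requires.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// A representation stamped with validators by the hypothetical get accessor.
final class Stamped {
    final String body;
    final String etag;
    final long lastModifiedMillis;

    Stamped(String body, String etag, long lastModifiedMillis) {
        this.body = body;
        this.etag = etag;
        this.lastModifiedMillis = lastModifiedMillis;
    }
}

// The "get" accessor: request the uri, then attach Last-Modified and ETag.
// The map stands in for NK's cache and only ever holds cacheable uris.
final class GetAccessor {
    private final Function<String, String> source;
    private final Predicate<String> cacheable;
    private final LongSupplier clock;
    private final Map<String, Stamped> cache = new HashMap<>();
    private final AtomicLong generation = new AtomicLong();

    GetAccessor(Function<String, String> source, Predicate<String> cacheable, LongSupplier clock) {
        this.source = source;
        this.cacheable = cacheable;
        this.clock = clock;
    }

    Stamped get(String uri) {
        Stamped cached = cache.get(uri);
        if (cached != null) {
            return cached;
        }
        // Every regeneration gets fresh validators.
        String etag = "\"" + Long.toHexString(generation.incrementAndGet()) + "\"";
        Stamped fresh = new Stamped(source.apply(uri), etag, clock.getAsLong());
        if (cacheable.test(uri)) {
            cache.put(uri, fresh);
        }
        return fresh;
    }
}

// The conditional-get wrapper: 304 only if both validators match, else 200.
final class ConditionalGet {
    private final GetAccessor get;

    ConditionalGet(GetAccessor get) {
        this.get = get;
    }

    int status(String uri, String ifNoneMatch, Long ifModifiedSinceMillis) {
        Stamped current = get.get(uri);
        boolean sameTag = current.etag.equals(ifNoneMatch);
        boolean sameTime = ifModifiedSinceMillis != null
                && ifModifiedSinceMillis / 1000 == current.lastModifiedMillis / 1000;
        return sameTag && sameTime ? 304 : 200;
    }
}
```

As Menzo notes, the wrapper only pays off when the underlying uri is cacheable: for an uncacheable uri the accessor regenerates the result on every call, so the validators never match and the client always gets a 200.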
Joined: 14-March-2005 Posts: 73 Location: Amsterdam, The Netherlands | Posted: 7-September-2007 10:32
Hi Peter,
One moment a topic is quiet and the next moment two people around the globe are sketching the same idea around the same time, although on different application levels ;-)
Greetings from Amsterdam,
Menzo
Joined: 7-February-2005 Posts: 446 Location: UK | Posted: 7-September-2007 10:34
Hi Menzo,
Great minds think alike ;-)
Greetings from Chipping Sodbury!
P.
Serverside ETag / 304 Support
Joined: 7-February-2005 Posts: 446 Location: UK | Posted: 21-September-2007 19:38
We have just posted a new alpha build (26) of NetKernel 3.3. It incorporates an updated HTTPBridge which implements ETag, If-Modified-Since and If-None-Match. It uses the strategy I outlined above. There is no need to add time-based interpretation: the cached representation within NK is uniquely identified by its hashcode, and we simply send a hex string version of this as the ETag. The browser sends this back when re-requesting the same URI, and before sending the response we do a quick O(log n) lookup to see whether the resource we are about to send is the same as the one on the client side. If it is, we send a 304. The NK abstraction means that there is no cost to the implementation: the inbound process is exactly the same. If we hit the cache, we will receive the same resource as we sent; if we don't, the ETag will not match the representation's hashcode and we will send a 200 response. The complete implementation is 12 lines of code and is completely transparent to the application developer; it just happens for free.
As was first pointed out by netj, this is a very good optimization for HTTP, and I think you'll find the impact on the NK backend tools is dramatic. I think much of this performance gain is not actually network transfer savings; rather, the browser rendering engine is able to reduce or entirely eliminate reflow of the page structure when it has already been internally generated.
Please give it a try and let me know your thoughts. The next question is: should we implement a corresponding resource cache for the NK client-side HTTP accessors? Menzo was suggesting something along these lines. Obviously a client-side caching proxy server such as Squid could do the job, but do we want the HTTP tools to have a full-blown browser-style representation cache? Again, I suspect this is pretty trivial to implement, but it will require a little thought about the production implications of cache management.
Please let me know how you get on,
Peter
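For readers who want to see the shape of the check described in this post, here is a minimal Java sketch. It is not the actual HTTPBridge code: the class and method names are illustrative, and quoting the hex string (HTTP requires entity-tags to be quoted) and honouring weak tags, lists and `*` in If-None-Match are assumptions about what a complete implementation would need.

```java
// ETag handling along the lines described above: the ETag is the hex string of
// the representation's hashcode, quoted as HTTP requires, and a matching
// If-None-Match yields 304. Class and method names are illustrative.
final class ETagSupport {
    private ETagSupport() {}

    static String etagFor(Object representation) {
        return "\"" + Integer.toHexString(representation.hashCode()) + "\"";
    }

    // If-None-Match may list several tags, possibly weak (W/"...") or "*".
    // It uses weak comparison, so W/"x" matches "x". Splitting on commas is
    // safe here only because our own tags are plain hex strings.
    static int status(Object representation, String ifNoneMatch) {
        if (ifNoneMatch == null) {
            return 200;
        }
        String etag = etagFor(representation);
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.startsWith("W/")) {
                tag = tag.substring(2);
            }
            if (tag.equals("*") || tag.equals(etag)) {
                return 304;
            }
        }
        return 200;
    }
}
```

One caveat worth keeping in mind with this design: `hashCode()` is only 32 bits, so two different representations can in principle share a hashcode, in which case a stale 304 would be sent.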