Topic - Best design strategy for HTTP per-client caching of representations?
from forum General Support
Best design strategy for HTTP per-client caching of representations?
Joined: 6-March-2007
Posts: 13
Posted: 7-January-2008 18:43
I'm trying to work out the best way to achieve efficient representation caching for HTTP web applications.

I may be misunderstanding something fundamental, and if so please correct me, but it seems as if in 'personalised' web applications the advantages of NetKernel caching will be mostly lost.

The scenario I'm considering is multiple HTTP clients accessing a personalised HTTP web application. Each client maintains a session. The HTTPBridge encodes the session, along with other parameters, into the cookie and param arguments for the target resources.

The documentation says:
If arguments used to compute a resource are passed by value, then the result will not be cached. For example, when a computed value is placed into a variable in DPML and passed to another DPML script the result of the DPML script cannot be cached. Equally if a query is passed in from the HTTP transport and gets converted into an XML parameter by the HTTPBridge then this result will be transient.


I was looking for a way to cache each individual client's representations, so that clients A, B and C could each have cached representations they can call upon repeatedly without recalculation (handy for serving common fragments of portal pages).

If I add the client ID to the URI as a value, the result won't be cached. The XML parameters supplied by the HTTPBridge are transient. The suggestion is to use the data: scheme, but I've been unable to find any examples of how best to use it.

Right now it seems something like this would do it:

active:module+module@urn/my/module/accessor+resource@ffcpl:/my/unique/resource.xml+data@data:text/plain,theClientID


(Note: If the plus signs are missing in the URI above, as they were in the Preview, it's a problem with the post editor removing them for some reason, and it won't accept the encoded form either!)
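As a rough illustration of why this works, here is a Python sketch that assumes the cache is keyed purely on the request URI string (a simplification of what NetKernel does) and that percent-encoding is an acceptable way to keep `+` and `@` out of the data: payload. `data_uri`, `build_accessor_uri` and `UriCache` are hypothetical helpers, not NetKernel APIs.

```python
from urllib.parse import quote


def data_uri(value: str) -> str:
    """Wrap a literal value as an RFC 2397 text/plain data: URI.

    Percent-encoding is an assumption here; it keeps '+' and '@',
    which delimit active: URI arguments, out of the payload.
    """
    return "data:text/plain," + quote(value, safe="")


def build_accessor_uri(client_id: str) -> str:
    """Hypothetical helper: the accessor URI from the post, per client."""
    return ("active:module+module@urn/my/module/accessor"
            "+resource@ffcpl:/my/unique/resource.xml"
            "+data@" + data_uri(client_id))


class UriCache:
    """Toy cache keyed on the full request URI (a simplification)."""

    def __init__(self, compute):
        self._compute = compute
        self._store = {}
        self.misses = 0  # how many times the expensive computation ran

    def get(self, uri: str) -> str:
        if uri not in self._store:
            self.misses += 1
            self._store[uri] = self._compute(uri)
        return self._store[uri]


cache = UriCache(lambda uri: f"<portlet for='{uri}'/>")
for client in ["A", "B", "A", "C", "B"]:
    cache.get(build_accessor_uri(client))
print(cache.misses)  # 3: one computation per distinct client
```

Each client gets its own cache entry, and a repeat request from the same client is served without recalculation.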
Caching Strategies
Joined: 7-February-2005
Posts: 397
Location: UK
Posted: 9-January-2008 10:16
Hi TJ,

You are correct that pages that are generated for a specific user must be either uncached or identified with a URI unique to that user/session.

However this doesn't mean you can't get a lot of value from caching common resources.  Generally these resources are provided from lower layers of the application - they then can be cached at this layer boundary.  Your user specific layer can then compose the final user-specific resource from these.

I don't know if you've looked at XRL.  It considers the rendered web-page as really a 'plan view' of the true XHTML tree structure.  XRL allows each node of the tree to be a URI identified resource and the composite tree is recursively generated in one pass.  In this approach many of the branches of the XHTML tree can be treated as common resources from an appropriate layer and will be cached.  Only the final top-level composition is user-specific.

In addition you can also use this composite approach to create a common page structure and then use a decorator-like pattern to overlay the user-specific data. For example, you can use the sed accessor to decorate the serialized XHTML tree, substituting a declarative set of user-specific data into the structure just before it goes over the wire (rather like PHP, JSP or ASP tags).
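A Python sketch of the layered idea: the user-independent structure is composed once from URI-identified fragments and cached, and only a final decorator step overlays user-specific data. The fragment URIs and helper names are hypothetical, `functools.lru_cache` stands in for the NetKernel cache and `string.Template` stands in for the sed accessor; neither reflects how NetKernel implements them.

```python
from functools import lru_cache
from string import Template

# Hypothetical common fragments, each identified by its own URI.
FRAGMENTS = {
    "ffcpl:/fragments/header.xml": "<div class='header'>Welcome, ${user}</div>",
    "ffcpl:/fragments/news.xml": "<div class='news'>Site news</div>",
}

render_count = 0  # counts how often the shared structure is actually built


@lru_cache(maxsize=None)
def common_page(*fragment_uris: str) -> str:
    """Compose the user-independent page once per set of fragment URIs."""
    global render_count
    render_count += 1
    body = "".join(FRAGMENTS[uri] for uri in fragment_uris)
    return "<html><body>" + body + "</body></html>"


def personalise(page: str, user_data: dict) -> str:
    """Decorator step: overlay user data just before serialization."""
    return Template(page).safe_substitute(user_data)


layout = ("ffcpl:/fragments/header.xml", "ffcpl:/fragments/news.xml")
page_a = personalise(common_page(*layout), {"user": "A"})
page_b = personalise(common_page(*layout), {"user": "B"})
print(render_count)  # 1: both users share the cached structure
```

Only the cheap substitution runs per user; the composition of the tree is shared.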

I hope these suggestions give you some ideas for your application.  Let me know if you want to go into the details.

If you want a detailed examination of this layered (and channeled) approach to composite web development, please take a look at this architect's guide, which goes into a detailed deconstruction of this forum application.

Cheers

Peter

PS One last thing - you are using a very handy pattern in your example above: converting a pass-by-value resource into a data: URI can be a very good way to decouple a user-specific layer from a lower, cacheable resource layer.
Joined: 6-March-2007
Posts: 13
Posted: 9-January-2008 11:47
Thanks once again, Peter.

Apologies for my string of questions, but I need to be sure I fully understand the implications of design decisions and methodologies for performance. NetKernel is the first technology in a long time that has really excited me as a breath of fresh air - I think the last one was Java itself :)

The scenario I'm designing for is where the fragments making up the user-requested (aggregate) document are themselves personalised to the user and are frequently requested.

An off-the-top-of-my-head example:

A 'social networking' site portal page after the subscriber has authenticated. The user can choose which components appear on their page (think Portlets) and where they are positioned (think iGoogle / Google Web Toolkit).

Now, with a fully capable browser with JavaScript enabled, updating the portlets is delegated to the client JavaScript libraries, which in turn request individual portlet content from the server (as XML or JSON).

Where JavaScript is disabled or unavailable, the server renders each entire portal view update and returns it as XHTML.

Within the portal view let's assume there are ten portlets.

Five of them return generic site content that is invariant across users. To achieve optimal NetKernel caching for these, the requests are keyed on the URI, with a golden thread that invalidates the cached representations when the underlying data changes.

The other five portlets return information that is user-specific. As an example I can think of an account status portlet that reports statistics such as number of new messages in Inbox, storage space used, number of pending link requests, etc. This is frequently re-requested by the client.

For the status portlet the accessor that provides the resource has to make potentially expensive calls to a variety of other services on remote hosts as well as local sub-requests, so the aim is to ensure the representation is cached.

To achieve optimum performance the application has to ensure the URI is cacheable.

The web-application can either use a golden thread or a periodic cron job that batch-updates the statistics for all connected users (cron jobs are better for performance as they can be synchronised with jobs on remote servers so updates don't impose ad-hoc load).
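The two invalidation styles can be compared with a toy cache in Python. `ThreadedCache` and its `put`/`cut` methods are hypothetical and only mimic the golden-thread idea (every entry attached to a named thread is evicted when that thread is cut); the real NetKernel golden-thread accessors have their own interface. Cutting a per-user thread models event-driven invalidation, and cutting a shared thread models a cron-style batch refresh.

```python
from collections import defaultdict


class ThreadedCache:
    """Toy cache whose entries hang off named 'threads'.

    Cutting a thread evicts every entry attached to it, in the spirit
    of NetKernel's golden threads (this is not the NetKernel API).
    """

    def __init__(self):
        self._store = {}
        self._threads = defaultdict(set)  # thread name -> cached keys

    def put(self, key, value, threads=()):
        self._store[key] = value
        for name in threads:
            self._threads[name].add(key)

    def get(self, key):
        return self._store.get(key)

    def cut(self, name):
        """Invalidate everything attached to thread `name`."""
        for key in self._threads.pop(name, set()):
            self._store.pop(key, None)


cache = ThreadedCache()
# Hypothetical keys for the per-user status portlet representations.
cache.put("status:A", "3 new messages", threads=["user:A", "stats"])
cache.put("status:B", "0 new messages", threads=["user:B", "stats"])
cache.cut("user:A")   # event-driven: only A's status is invalidated
print(cache.get("status:A"), cache.get("status:B"))  # None 0 new messages
cache.cut("stats")    # cron-style batch: every user's status goes
print(cache.get("status:B"))  # None
```

The per-user thread keeps invalidation narrow; the shared thread trades precision for a single, schedulable invalidation point.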

The earlier example URI I gave using data: is a very simple one. In practice the session has to be taken into consideration, and possibly some QueryString parameters.

I've looked at session: handling, especially related to HTTP. The documentation suggests that the session ID is passed-by-value on the URI:

active:dpml+operand@ffcpl:/mypath/myprocess.idoc+session@session1234567890123456


and also says

The sessionmapper requires that your HTTPBridge be configured to pass cookies


As I understand it both of these will prevent the representation being cached, so I'm looking for the best way to achieve caching of the representation whilst still being able to pass sufficient context.

For the session ID I'm thinking the session mapper functionality could be amended to use a data: URI:

active:dpml+operand@ffcpl:/mypath/myprocess.idoc+data@data:text/plain,theClientID+session@data:text/plain,SessionID


If I'm lucky, the fact that the Session Mapper is receiving the cookie from HTTPBridge and sinking it means that the URI I've proposed immediately above would be sufficient for the endpoint accessor to access keys in the cookie it needs to create those personal representations, and the representation to still benefit from caching.

The representation will be cached whether it is a sub-request of the aggregate portal view accessor or a direct (mapped) request from the client Javascript libraries.

If I'm unlucky there's a lot of work to do reworking the parameter handling!
Joined: 7-February-2005
Posts: 397
Location: UK
Posted: 9-January-2008 12:19
I think what you are proposing is quite achievable. One thing that will definitely save you work is that the session is conceptually a user-specific resource address space. The doc is missing a colon but is otherwise correct:

"Finally the session mapper reissues the original request with the session URI attached.

active:dpml+operand@ffcpl:/mypath/myprocess.idoc+session@session:1234567890123456"

The sessionmapper passes its session address space by reference: it provides the session space URI session:xxxxxx, and you can SINK to and SOURCE from this address space.

Cheers,

Peter

PS Thanks for the kind words - it's great to hear this. We were pathologically obsessed with designing NK to be clean and simple - our philosophy is that solutions arise from putting simple things together, just like on Unix: emergent complexity.
Joined: 6-March-2007
Posts: 13
Posted: 9-January-2008 15:07
Aha! So I take your implication to be:

"Once a cookie is being handled by the Session Mapper an HTTP request with cookie attached coming via the HTTPBridge is cacheable."

So the missing colon in the documentation was probably the worst typo that could have happened, since it makes the example suggest a pass-by-value parameter named "session" with a value of "session1234567890123456", whereas it should be a pass-by-reference parameter named "session" with a URI of session:{GUID}, where the GUID is "1234567890123456".

Since resources passed by reference (identified by a URI) are cacheable, it follows that a request URI that includes a session URI is also cacheable.
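The difference the colon makes can be checked against the RFC 3986 scheme rule. This Python sketch is not how NetKernel parses active: arguments; it only shows that without the colon the argument has no scheme and can only be read as a literal value, while with the colon it is a URI in the session: address space.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ":"
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")


def uri_scheme(argument: str):
    """Return the URI scheme of an argument, or None if it has none."""
    match = _SCHEME.match(argument)
    return match.group(1).lower() if match else None


for arg in ["session1234567890123456",      # doc typo: a bare value
            "session:1234567890123456",     # a reference into session space
            "data:text/plain,theClientID"]:
    print(arg, "->", uri_scheme(arg))
```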

I'll put my last question on this topic in a separate post to prevent confusion.
Altering output of HTTPBridge?
Joined: 6-March-2007
Posts: 13
Posted: 9-January-2008 15:56
The final questions relate to handling of HTTP parameters and combining the session and parameter passing.

1) If the application needs to pass certain HTTP GET QueryString parameters to the accessor and still allow caching, is there an easy way to do this (yes/no) ?

2) I saw examples that show:
<HTTPBridgeConfig>
  <defaultExceptionURI />
  <!-- TestZone -->
  <zone>
    <match>.*/test/.*</match>
    <processQueries />
    <passMethod />
    <passCookies />
    <passHeaders>User-Agent</passHeaders>
    <exceptionURI>ffcpl:/test/exception.idoc</exceptionURI>
  </zone>
</HTTPBridgeConfig>

I'm wondering whether the <passHeaders>User-Agent</passHeaders> element is a clue that, with a URL of the form "http://host/path?OnBehalfOf=TJ", we could also do something like <processQueries>OnBehalfOf</processQueries>. If so, how would one specify more than one key name?

3) I know the HTTPBridge puts the various GET and POST parameters in a Name-Value Pair (NVP) resource which prevents caching of the request URI.

Is there a configuration setting that tells the HTTPBridge to put those parameters in the URI as data: URIs when in normal mode? The SOAP documentation appears to show a passByURI option for just that purpose.

4) Alternatively, is the way to do this to redirect the HTTPBridge/SessionMapper output to an accessor that extracts the NVPs and puts the required pairs as data: URIs in sub-request URIs?

5) and finally...

The documentation shows isolated examples of the module HTTPBridge configuration:
<module>
  <rewrite>
    <!-- Route HTTP Transport issued root requests to the HTTP Bridge -->
    <rule>
      <match>(http.*?:.*)</match>
      <to>active:http-bridge+url@$e1+config@ffcpl:/etc/HTTPBridgeConfig.xml</to>
    </rule>
  </rewrite>
  <mapping>
    <!-- Rewrite HTTP Bridge sub-requests in the jetty: scheme to the ffcpl: scheme -->
    <rewrite>
      <match>jetty://.*?/(.*)</match>
      <to>ffcpl:/$1</to>
    </rewrite>
  </mapping>
</module>


and a required module rewrite rule to enable Sessions:
To use the automatic session facility a module maps all requests to the session mapper using a simple URI rewrite rule and provides the mapper with configuration information defining the various zones. Note: in the rewrite rule the $e1 is important as it indicates that the URI to be mapped to a session is escaped.
<rule>
  <match>(.*)</match>
  <to>active:sessionmapper+uri@$e1</to>
</rule>

With this rewrite mapping rule in place, the session mapper will receive all requests and use its configuration information to match the requests against defined session zones.


As things stand, the two examples appear to be mutually exclusive.

Is there an example of the module rules with the HTTPBridge and session rules combined, so that the root HTTP request is passed to the HTTPBridge, which then issues a jetty: request that is rewritten to the session mapper (active:sessionmapper)?
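For question 4, one possible shape for such an accessor, sketched in Python: it picks a whitelist of query parameters and appends each one as a data: argument to a sub-request URI. The base URI `ffcpl:/portal.idoc`, the `view` parameter and the helper names are hypothetical, and percent-encoding the values is an assumption about what the active: URI syntax needs.

```python
from urllib.parse import parse_qsl, quote


def data_uri(value: str) -> str:
    # Percent-encoding keeps '+' and '@' out of the payload (an assumption).
    return "data:text/plain," + quote(value, safe="")


def subrequest_uri(base: str, query: str, keys: list) -> str:
    """Append whitelisted query parameters to `base` as data: arguments.

    Keys are emitted in sorted order so the same inputs always yield the
    same URI, and therefore the same cache key.
    """
    params = dict(parse_qsl(query))  # last value wins for repeated keys
    args = [f"+{key}@{data_uri(params[key])}"
            for key in sorted(keys) if key in params]
    return base + "".join(args)


print(subrequest_uri("active:dpml+operand@ffcpl:/portal.idoc",
                     "OnBehalfOf=TJ&debug=1&view=full",
                     ["view", "OnBehalfOf"]))
```

Sorting the keys matters: two requests that carry the same parameters in a different order still produce the same URI, and so share a cache entry. Parameters outside the whitelist (`debug` here) never reach the URI, so they cannot fragment the cache.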
Joined: 6-March-2007
Posts: 13
Posted: 9-January-2008 16:06
I just noticed I misread the HTTPBridge configuration documentation. It does in fact say that passByURI is only available in normal mode since:
SOAPMode      Specifies that all requests in this zone are processed using SOAP HTTP Bindings. See below  for details. All other configuration elements are ignored in this mode

So the only confirmation needed is whether passByURI can be applied to processQueries as well as passMethod, passHeaders and passRemoteHost (I'm guessing not)?
Answering my own questions!
Joined: 6-March-2007
Posts: 13
Posted: 9-January-2008 17:01
Doh - I think I need to learn to read! I can answer my own question 2b "how would one specify more than one key name" from the documentation:

passHeaders       Create an argument for each HTTP header listed in a space separated list of headers. The listed header is used as the argument name. (see passByURI for mode).
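Reading such a space-separated list amounts to splitting on whitespace. A Python sketch with the standard `xml.etree.ElementTree` module; it is not the HTTPBridge's own code, and `Accept-Language` is just an illustrative second header.

```python
import xml.etree.ElementTree as ET

ZONE = """
<zone>
  <match>.*/test/.*</match>
  <passHeaders>User-Agent Accept-Language</passHeaders>
</zone>
"""


def passed_headers(zone_xml: str) -> list:
    """Return the header names listed in a zone's <passHeaders> element."""
    element = ET.fromstring(zone_xml.strip()).find("passHeaders")
    if element is None or not element.text:
        return []
    return element.text.split()  # split on any run of whitespace


print(passed_headers(ZONE))  # ['User-Agent', 'Accept-Language']
```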
The configuration for HTTPBridge with sessions and authentication?
Joined: 6-March-2007
Posts: 13
Posted: 9-January-2008 22:10
I wonder if this is close to what would be required to answer question 5, the combined configuration for using HTTPBridge and sessions (and authentication of the root) ?

/etc/HTTPBridgeConfig.xml
<HTTPBridgeConfig>
<zone>
  <match>.*</match>
  <processQueries/>
  <passCookies/>
  <passRequestURL/>
  <passMethod/>
  <passByURI/>
  <exceptionURI>ffcpl:/error</exceptionURI>
</zone>
</HTTPBridgeConfig>

/module.xml
<module>
<export>
  <uri>
   <match>ffcpl:/.*</match>
   <match>ffcpl:/etc/HTTPBridgeConfig.xml</match>
  </uri>
</export>
<rewrite>
  <rule>
   <match>(active:.*)</match>
   <to>active:gk+uri@$1</to>
  </rule>
  <rule>
   <match>(active:gk.*)</match>
   <to>active:sessionmapper+uri@$e1</to>
  </rule>
</rewrite>
<mapping>
  <this>
   <match>ffcpl:/etc/(SessionPolicy|GateKeeperPolicy|HTTPBridgeConfig)\.xml</match>
  </this>
  <ura>
   <match>ffcpl:/links.xml</match>
   <class>my.org.component.links</class>
  </ura>
</mapping>
</module>
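The difference between `$1` and `$e1` in the rewrite rules can be mimicked with a small regex-based rewriter. This Python sketch assumes that a `<match>` pattern must match the whole URI, and it uses percent-encoding as a stand-in for whatever escaping NetKernel applies with `$e1`; `apply_rule` is a hypothetical helper.

```python
import re
from urllib.parse import quote


def apply_rule(uri: str, match: str, to: str):
    """Apply one rewrite rule; return None if `match` does not match.

    In `to`, $N inserts capture group N verbatim and $eN inserts it
    escaped. Percent-encoding stands in for NetKernel's escaping here.
    """
    m = re.fullmatch(match, uri)
    if m is None:
        return None

    def expand(token):
        group = m.group(int(token.group(2)))
        return quote(group, safe="") if token.group(1) else group

    return re.sub(r"\$(e?)(\d+)", expand, to)


uri = "jetty://localhost:8080/portal?x=1"
print(apply_rule(uri, r"(.*)", "active:sessionmapper+uri@$e1"))
print(apply_rule(uri, r"jetty://.*?/(.*)", "ffcpl:/$1"))
```

The escaped form is what lets a complete URI, with its own `:`, `/` and `?`, be carried as the `uri` argument of `active:sessionmapper` without being confused with the outer URI's syntax.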

/etc/SessionPolicy.xml
<SessionPolicy>
<zone>
  <match>.*</match>
  <path>/</path>
  <token>cookie</token>
  <type>
   <simple/>
  </type>
</zone>
<meta>
  <id>web-app-name</id>
  <max-sessions>2000</max-sessions>
  <max-duration>500000</max-duration>
  <min-duration>50000</min-duration>
</meta>
</SessionPolicy>

/etc/GateKeeperPolicy.xml
<GateKeeperPolicy>
<zone>
  <match>.*</match>
  <isValidURI>active:dpml+operand@ffcpl:/check_session</isValidURI>
  <loginURI>active:mapper+operand@ffcpl:/do_login+operator@ffcpl:/links.xml</loginURI>
</zone>
</GateKeeperPolicy>

And the /links.xml file, including:
<links basepath="ffcpl:/">
<link>
  <name>portal</name>
  <ext>/</ext>
  <int>ffcpl:/portal</int>
  <args>links,param,session</args>
</link>
</links>
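Hand-written configuration files like these are easy to break with a mismatched or unclosed tag (for example a `<class>` element closed with `</ura>`). A quick well-formedness check before deploying can be sketched with Python's standard `xml.etree.ElementTree`; it catches syntax errors only, not whether NetKernel accepts the elements.

```python
import xml.etree.ElementTree as ET


def check_well_formed(name: str, text: str) -> str:
    """Return '<name>: ok', or the parser's complaint, for one document."""
    try:
        ET.fromstring(text)
        return f"{name}: ok"
    except ET.ParseError as err:
        return f"{name}: {err}"


# A <class> element wrongly closed with </ura>, and the corrected form.
broken = ("<ura><match>ffcpl:/links.xml</match>"
          "<class>my.org.component.links</ura></ura>")
fixed = ("<ura><match>ffcpl:/links.xml</match>"
         "<class>my.org.component.links</class></ura>")
print(check_well_formed("broken", broken))
print(check_well_formed("fixed", fixed))  # fixed: ok
```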
© 2003-2006, 1060 Research Limited. 1060 registered trademark, NetKernel trademark of 1060 Research Limited.