Topic - NetKernel News Volume 1 Issue 24
Topic - NetKernel News Volume 1 Issue 24 Topic - NetKernel News Volume 1 Issue 24
from forum News
 forum index   my profile   search 
 new topic  post reply 
moderators: pjr tab
NetKernel News Volume 1 Issue 24
Joined: 7-February-2005
Posts: 591
Location: UK
Posted: 23-April-2010 15:26
[Sent to members of the NetKernel Newsletter April 16th 2010 - to subscribe join the NetKernel Portal https://cs.1060research.com/csp/]

What's new this week?  NetKernel Standard/Enterprise Edition 4.1.0.  Si
Valley ROC talk. NetKernel Protocol status check. Do it yourself twitter
infrastructure.

NetKernel 4.1.0
===============

We're pleased to announce that two new distributions of NetKernel are
available...

NetKernel Standard Edition 4.1.0 is available here:
http://download.netkernel.org/nkse/

NetKernel Enterprise Edition 4.1.0 preview 4 is available here:
https://cs.1060research.com/csp/download/  (registration required)

Repository Status
-----------------

We will maintain the NKSE 4.0.x repository with significant bug fixes
and/or critical security updates for the next six months.

We recommend that for development and to stay abreast of the latest
updates and new features that you transition to use a 4.1.x distribution.

Common Updates
--------------

Both NKSE and NKEE 4.1.0 incorporate all of the last 6 months of package
updates and reset the baseline with their own 4.1.0 repositories.

The distributions are backwards compatible and your applications can be
deployed straight to a 4.1.0 distro.

H2 Migration
------------

One potential area to take note for application compatibility is the H2
database.

The H2 embedded database is used in the NK system in a few places.
Notably it's the default persistence behind pds: and is also the engine
used for the apposite client.  Over the last year H2 has been
transitioning to the 1.2.x series which is based on their new "Page
Store" filestructure offering better integrity and performance.  Recent
releases have even removed support for their older 1.1.x format entirely.

Since this component plays an important role in the system, we want to
ensure we can track ongoing fixes and enhancements. It has therefore
made sense for the NK 4.1.0 builds to update to 1.2.x and its Page Store
format. Therefore you'll find that the h2.db module ships the very
latest 1.2.133 build of H2.

  From the regular NK system point of view this is not news and you
won't see any difference.  However, if you've used H2 as your DB engine
for your applications developed on NKSE 4.0.x you'll probably want to
convert them to the new page store format so that you can seamlessly
keep working with the NKSE 4.1.0 db packages. Here's what you need to do...

1. Download a copy of H2 1.2.128 (this is their last build that
supported the old format)...

http://code.google.com/p/h2database/downloads/list

2. Temporarily place the 1.2.128 jar in the urn.org.netkernel.h2.db
module's lib/ directory and rename 1.2.133 from jar to some other
extension (so its not in the classpath).
3. Change your application's JDBC URL connection and add the following
at the end ";PAGE_STORE=true"  (note semicolon).
4. Run your app as normal.  The first time that H2 tries to connect to
your DB it will automatically convert it to the new format. (It takes a
backup of the older format files in a zip if needed).  You can tell the
new format as it has the file extension ".h2.db".
5. You're now done.  You can revert out the 1.2.128 version from the
db.h2 module and use the latest version we ship.

If you've got business critical data and need any help with this process
just let us know and we'll help out.

NetKernel Enterprise Edition 4.1.0 Preview 4
============================================

We are still in preview mode with Enterprise Edition, but rapidly
closing in on the completed feature set. This preview 4 release includes
the common NKSE 4.1.0 infrastructure.  It also includes the latest
builds of the enhanced NKEE tools and libraries.  This release features
a new nkee-architecture module...

nkee-architecture:  Is a library of Enterprise architectural components
that provide control and ease-of-implementation of a number of advanced
architectural patterns.  Included in this first release is the Profile
Overlay, which can be used to transparently wrap a space and provides
runtime statistical profiling of all requests into the space.  Also
included is a Virtual Host Endpoint, which allows hostname-based
routing into isolated application address spaces enabling full virtual
hosting on NKEE.  The endpoint is general purpose and could be used to
provide arbitrary grammar based routing into isolated spaces, not just
hostname routing.

A number of other NKEE architectural components are in the works and
will be released via nkee-architecture package updates in the coming weeks.


Si Valley / Bay Area NetKernel Talk/Meet-up
===========================================

I'm going to be in Silicon Valley / SFO Bay Area from the 16th May
(volcanoes permitting).  I'd like to use the opportunity to give a
NetKernel/ROC talk.  Anyone have any ideas for locations or, even
better, could help host ;-)    Thinking this would be the evening of
either Monday 17th or Tuesday 18th May.  If you know anyone in the area,
please put the word out.

Would also be happy to go out for beers!

NKP Status Report
=================

The NetKernel protocol (NKP) development is moving on nicely. The
client-side now presents the logical endpoints from the server side.
That is, the set of endpoints present in the host fulcrum space of the
server side (downstream NetKernel) are logically mapped over the wire
into the address space on the client-side (upstream NetKernel).

So a request on the client side can be resolved locally (using the
remote side's grammars), before being shipped over the wire if it is
resolvable - this is very efficient since it means that network cost is
only incurred when you know that the resource will be reified (computed)
by the server.  So both network and computation costs are deferred to
just in time.

This starts to show a little of how NKP offers extra dimensions beyond
HTTP/REST.  In HTTP, client-side resolution only goes as far as the DNS
stage and the URL->resource resolution is done server side.  The reason
being that in HTTP/REST, resolution is not a part of the abstraction and
so there is no resource that offers resolution metadata.  In NetKernel
(locally) and NKP (distributed), the resolution phase of the resource
oriented system is itself resource oriented, and treats metadata as a
resource too!

You might worry that NKP's client-side resolution implies that the
client-side requires pre-knowledge of all possible sets of resources,
but that's not the case.  NKP means you can take appropriate engineering
decisions about pre- or post- resolution/reification. So if you wish you
can fallback to the identical pattern as HTTP, where the client-side
would fulfil the same role as DNS and only decide which cluster node to
direct a request to. These choices of system balance become an
application-level architectural variable you have control of.

As far as work remaining on NKP, there are still some low-level metadata
consistency hooks to introduce, but to first-order the current state of
development implements a practical 90% coverage and could already
be used to implement many distributed architectures.

DIY Twitter
===========

The NKP work got me thinking.  What would be a good way to demonstrate
the technology?

As you might have noticed by the increased content in this newsletter,
I've recently forced myself, against my nature/cultural inclinations
(reserved Brit), to be more vocal.  To that end, I've been trying to
increase my twitter use (http://twitter.com/pjr1060).

Twitter is one of those things who's success is due to its innate
simplicity. Its resource model is easy for end-users to understand and
use.  The hidden complexity is in the engineering needed to ensure that
the backend scales to deliver the immediacy that each user demands.

As an exercise in ROC modelling and NKP architectural partitioning,
I thought I'd talk through what an NK implementation of the twitter
backend might look like [pjr comment: What follows is fairly long and
has been deliberately left as a stream of consciousness  - if the
details don't interest you, the thought processes still might be useful
as insight into the way I go about an ROC design]...

Firstly lets consider the resource model.  It consists of 3 core sets of
first-order resources:

(A) mytweets/[userid]  - the merged stream of [userid]'s tweets and all
of the tweets of [userid]'s friends.
(B) tweets/[userid] - the stream of tweets by [userid].
(C) friends/[userid] - the list of friends of [userid].

We can see, at least for the sake of our first consideration of the
model, that (B) and (C) are atomic resources, ie they are not composite
resources.  We also see that (A) is a composite resource which consists
of the transformation of (B) and (C) (the sorted superset of
(B)'s belonging to the (C)'s).

There are other first order resources, for example

(S) search/[identifier] - the set of all tweets matching [identifier].
(T) time/[range] - the set of all tweets within a given time [range].

For now it'll be apparent that the general architectural principles
concerned with the (A-C) will be applicable to these other sets.

OK, that's a rough outline of the read-side channels of the resource
model.  On the write side we have status updates and adding/removing
friends but lets not worry about defining these for now.

Lets step away from the resource model and think in terms of the
system engineering, computational cost, the quality-of-service
requirements etc etc.

What's our most expensive cost?  Probably the the persistence mechanism,
the database.  This is for a number of reasons. Firstly atomic
persistence demands consistency, which means its either hard, or
expensive, or both to distribute the stored data, so access to the
database is a natural bottleneck.  Secondly, high-end database vendors
have non-linear pricing models, they know that once you lay down data in
their DB engines, the bigger you get the more they make.  So for
performance and economic reasons our imperative for this architecture is
to find a function min([DB requests]).

What do we know that allows us to look for such a minimum use of the DB?
Well fortunately we know that all social networks have the property
that Read >> Write.  Think about a dinner party, if everyone spoke at
the same time (so that write>=read) there'd be chaos.  We also know that
your sent email folder is way way smaller than your received folders
(unless you're a spam bot).

These empirical observations suggest that we're onto a good thing. If
our server had infinite local memory we could envisage an ideal
solution that would require only one read of the db for every write.  Of
course we don't have that, so we need to work out a good dynamic
equilibrium by using the architectural variables that ROC gives us.

OK, lets get concrete and consider how we compute A(x),  where
"A(x)" is shorthand notation for the full resource: myteets/[x].

Lets create an endpoint with a grammar something like...

<grammar>res:/mytweets/<group id="userid"><regex
type="anything"/></group></grammar>

Which we use to bind requests to an endpoint, lets call the physical
endpoint code: EP-A

EP-A has to implement a merged view of all tweets of the user and
their friends.  Before we can implement EP-A we have to implement an
endpoint for the friends of userid C(x), and to cut a long story short,
we're going to need an endpoint that can give us any given user's tweets
B(x), let's quickly consider B(x)...

We'd create a single endpoint EP-B with a grammar something like...

<grammar>res:/tweets/<group id="userid"><regex
type="anything"/></group></grammar>

EP-B in pseudo code, and assuming our persistence is an RDBMS, does
something like this...

userid=request.getArgumentValue("userid") //Get the userid argument
sql="SELECT * FROM tweets WHERE userid='$userid' ORDER DESC LIMIT 10"
rep=doDBQuery(sql)
//associate a virtual dependency with the resource...
attachGoldenThread("twitterGT:"+userid);
return rep

Don't get hung up on the code - it really doesn't matter and for sure
we'd change this as we go on.  This is just an illustration, the
important thing is that EP-B reifies a resource representation for
B(userid), the set of tweets of [userid].  It is cacheable and remains
valid for as long as twitterGT:userid is valid.  So, in an ideal
perfect server, if [userid] never made another tweet we'd never have to
hit the DB again (read=write=1).  We'll come back to the golden thread
stuff.

OK lets assume we can do something very similar for C(userid), the
friends of [userid].  Lets also assume that C(userid) includes [userid],
ie you are your own friend.  So returning to EP-A we can now pseudo code
up the endpoint...

userid=request.getArgumentValue("userid")
friends=SOURCE C(userid)
rep=null
for friend in friends
(
   rep+=SOURCE B(friend.id) //Source tweets of friend
)
sort(rep)   //Sort in time descending order
filter(rep, N)   //Include only first N tweets.
return rep

So rep is the representation of A(userid).  Notice that it is cacheable
since it is identified uniquely, and it depends upon each of the
member resources in the set C(userid) because we have sourced each B(x).
[pjr comment: Bear in mind that almost all B(x) will be cache hits -
but see below for discussion of why this is a somewhat naive first model
and would really need a differential resource approach].

The result is that this A(userid) is now computed and will be repeatedly
served from cache for as long as C(userid) is unchanged and, because
B(x) has golden thread twitterGT:x, as long as all the twitterGT:X are
valid.  (We're still on target for our ideal limit of DB read=write).

Since twitter clients use polling its clear that this is going to be
very efficient, since in steady-state we don't have to compute anything,
just send the resource. Better yet, if the client is well implemented
and it sends an ETAG of their local cache, we can just send a 304 not
modified.

OK lets consider a change of state in the system.  Lets still imagine
that we have a perfect infinite server. Our user issues a new tweet by
issuing a state transfer request to this resource ...

(D) SINK status/[userid]

This is a write channel.  Its independent of any of the other channels
we've talked about so far.  So lets assume we have a grammar...

<grammar>res:/status/<group id="userid"><regex
type="anything"/></group></grammar>

...bound to some endpoint implementation: EP-D. In pseudo-code that
might look like...

userid=request.getArgumentValue("userid")
primary=context.sourcePrimary(String.class)   //Source the incoming
transferred state as a String.
sql=INSERT INTO tweets VALUES ('$userid', '$primary');
updateDB(sql);
cutGoldenThread("twitterGT:"+userid);
return null;

Again, don't worry about the code, its just detail.  The key thing is
that state is received, our resource persistence mechanism is updated.

Before we return, notice that we cut the golden thread twitterGT:userid.
With this step we atomically expire all dependent resources. So
A(userid) and B(userid) are both instantaneously out-of-date.  Anyone
requesting A(userid) will force the recomputation.  More importantly
anyone who is a friend "f" of [userid] will also see their A(f) expire
since it depends on B(userid) which depends on the golden thread we just
cut.

If you're following you'll see that we're still tracking our read=write
objective. Since the first re-request of A(userid) (or B(userid)) will
create a new cacheable representation depending on the golden thread.
So the update cost is only one database read which covers everyone in
the system who is a friend of [userid].  Even a super friend (SF) like
Stephen Fry (SF!) would need no more reads.

OK you should be able to see the general principles - you can probably
also see that to implement a simple twitter system in a single NK server
is going to be of the order of 100 lines of code.

But we don't have ideal servers.  Not least our engineering limit is
that the HTTP client/server protocol is going to exceed the NIC capacity
of a single server.   Or we're going to have so much state at any given
time that it exceeds one server's local memory capacity.  Or the next
dominant compute cost, A(x)'s merging and sorting the friend feeds, will
start to exceed the CPU capacity.

So how would we scale this architecture across a cluster?  We break the
potentially infinite sets A(x), B(x), C(x) down into subsets and place
those subsets on their own physical implementation.

So we might have a dozen identical front-end servers all of which
apparently serve A,B,C.  (we'd have a network load balancer in
front to do IP affinity / load balancing across this tier).

Each front end would no longer have an instance of the EP-A, EP-B, EP-C
endpoints, instead we'd have an NKP client connection to each of 3
implementing servers.

A-Server - serves user personal tweet stream A(x)
B-Server - serves B(x) tweets of x.
C-Server - serves C(x) friends of x.

If we wanted to, we could subdivide A-Servers into A("A-C"),
A("D-F")...A("W-Z").  Where each A would handle only userid's in the
first-lettered subset specified, and so on for B's and C's.  We could
have load-balancing front-ends for the A,B,C's too so that the front-end
servers only have to connect to 3 consistent places.

We could keep adding servers with ever finer subset responsibility, in
the limit you would have one server per user id!  Sounds mad, but
imagine instead of twitter we were subdividing data analysis for the
Higgs Boson and each resource representation took serious compute power.

OK you should be seeing the picture. Just one more thing to consider.
The tiered computed resources percolate up to the edge of the cluster.
Each one will be locally cached in its local server.  The NKP protocol
ensures that dependency relations are preserved upwards - so the
front-end servers will depend on the same logical golden thread
consistency.  But, slightly subtly, the golden threads for the A's, B's
and C's are not necessarily tied together because they were implemented
locally at the leaf nodes of the request tree.

We can tie this up by implementing a simple golden thread expiry write
channel on every server in the cluster.  A request to this channel would
simply indicate the identity (g) of the golden thread that is expired,
its implementation would then call cutGoldenThread(g) - all local state
on that cluster node depending on g would be atomically expired.

When a user tweet happens, all we'd need to do is call the
expiry channel with the userid for each node in the cluster responsible
for that [userid] subset (eg A("A-C") etc). We don't care if there is
any state there, or how it may be used in composite resources etc.
Effectively we can apply a very fine scalpel to expire the minimal
subset of dependent state in the system.

We could ensure that our physical update to the DB is transactional, so
that we can do the cluster expiry as lazily as we like, knowing that  as
the distributed state is expired, the DB is locked on our resource. Only
when we've finished notifying the cluster would we commit the DB update,
at which point any requests for our resource issued during the expiry
phase, would get access to the DB to start rebuilding the resource
representations.

Given that this is just a social network and any change propagation
within a minute is probably sufficient, we don't have to be super
transactional and can allow dirty reads.  So we don't need to lock the
database.  We can also use timed dependent expiries for all our
representations so that they are valid within a given last use period
or'd with their dependencies.

I've run out of stamina to explain the detail of how we might implement
the search resource sets, but basically you'd have a resource that
contained a list of all tweets matching a given search.  It would be
very very expensive to query the DB for searches in real time so you'd
probably maintain a secondary index system too.  However even accessing
this in realtime would get costly, so instead, each time a tweet status
update came in you'd request another resource containing the map of all
current search terms (this will be relatively quite small).  You'd split
the tweet down into its constituent words, if they were in the search
map you'd add this tweet to the search item in the DB and expire the
search's golden thread, next time the search is requested it will
regenerate the search resource since it's no longer cached.

There are lots of necessarily missing details in this discussion.  On
reflection, splitting the sets as described, illustrates the point
about clustered Golden threads, but it is not necessarily the best
partitioning.  It might make more sense to have all the partial subset
A's, B's and C's endpoints implemented on each clustered server S and
responsible for all resources of the identified user subsets (that way
you probably minimize the use of the distributed GT pattern - since the
bulk of the collection of resources for a given user will be on the same
server instance).

You'd also quickly find the limits of the naive A(x) implementation,
imagine the cost (even with cache hits) for a superfriend with a million
followers. You'd have to bite the bullet and recognise that our single
read objective for the DB is too limiting.  Instead you'd have to
introduce a differential resource model DeltaC(x) the set of friends of
x who have recent tweets.  Which would require that the write channel
would update a secondary table holding this delta list. A(x) would
source the delta list and only merge that. Although even here with
second-order differential resources you'd still quickly find local
equilibriums and so cache the majority of the resource sets for the
majority of the time.

The implementation details aside, it should be clear that playing and
tuning the system with some real data would quickly allow us to tweak
the architecture to get a good balance and introduce any necessary 2nd
order resources, constraints and caching parameters.

Hopefully this discussion starts to show that with ROC you can
play with architectural and engineering variables to find a pragmatic
dynamic equilibria.  In this case, steady state interludes are
statistically quite frequent, (a property of read>>write system) so the
solution will often discover regions of the resource space that approach
the ideal minimum of read=write.

One final thought. Going back to last week's post about demonstrable
value of ROC being greatest when requiring system changes...

Now that twitter is going commercial, and will soon start inserting
advertisements, their resource model will include demographic profile
resource sets of us poor proletariat.  To implement this change in the
system above, all we'd need to do is change (C) to a dynamic composite
resource set, the union of friends and instantaneous advertizers
targeting you.  In effect recode EP-C. Nothing else needs to change,
especially not the resource identifiers and architectural relations.
More sophisticated targeting might do content analysis on the A(x) but
again this remains internal to EP-A.

Linked Data Reprise
-------------------

I wrote the twitter discussion earlier in the week.  I just reviewed it
before sending and realize I've done it again with linked-resources (see
Newsletter Vol 1 Issue 22).  In the discussion I've just assumed that
linked data (composite linked-resources) are a given.

My implicit assumption is that each of the A's,B's and C's would mostly
consist of resource references - so for example the  B's would actually
be a list of resource references to individual tweets (which would have
a unique identifier in the set tweet/x (and need an endpoint to reify
them etc)).  To reify the final output resource, the linked resources
would be evaluated and included into the composite whole.  In this case,
since we'd probably want XML outputs, I'd probably do that with XRL.

----

Something of a monster newsletter this week.  By the time you've read to
this point you'll either be asleep or it'll be next Friday and there'll
be another one to fall asleep over.

Have a great weekend!

The 1060 Team
 new topic  post reply  To find out about new replies to this post as they occur
please subscribe to one of these feeds:
AtomRSS moderate 
© 2003-2006, 1060 Research Limited. 1060 registered trademark, NetKernel trademark of 1060 Research Limited.