Using Clusters and Entities

Suppose we create a simpleton that searches a datapool for entities fitting a certain predicate and then attaches a new, dynamically generated attribute to each match, signifying that it belongs to the group of entities satisfying that predicate. Given such a simpleton, dynamic cluster creation becomes trivial, and many other processes become straightforward.
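A minimal sketch of such a tagging simpleton follows. The Entity class, the tag_matching function, and the "cluster:large" attribute name are all hypothetical, chosen only for illustration; the point is that cluster membership can be recorded as just one more attribute.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class Entity:
    """Hypothetical entity: a name plus a free-form attribute dictionary."""
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)


Predicate = Callable[[Entity], bool]


def tag_matching(pool: Iterable[Entity], predicate: Predicate, tag: str) -> List[Entity]:
    """Attach `tag` to every entity in the pool that satisfies `predicate`.

    Returns the tagged entities so the caller can treat them as a cluster.
    """
    matched = []
    for entity in pool:
        if predicate(entity):
            entity.attributes[tag] = True  # membership is just another attribute
            matched.append(entity)
    return matched


pool = [
    Entity("a", {"size": 10}),
    Entity("b", {"size": 500}),
    Entity("c", {"size": 900}),
]
large = tag_matching(pool, lambda e: e.attributes.get("size", 0) > 100, "cluster:large")
```

After the call, the entities named "b" and "c" carry the new attribute and "a" does not, so the cluster can be recovered later by searching for that attribute.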

All the different things we're trying to do with pools of entities (which, by the way, might be expanded in future to include even simpletons) come down to a search for entities with various attributes.

Consequently, our next piece of code should be a pool search engine. Of course, in keeping with the bootstrapping philosophy, such a "search engine" needn't be anything more complicated than a linear list scanner at first. However, it needs to have a well-defined interface so that we can replace it with something more sophisticated later. Once we have a pool search engine, no matter how primitive and inefficient, a lot of apparently unrelated problems become trivial to solve.
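One way to keep the scanner replaceable is to separate the interface from the first implementation. The sketch below assumes an entity can be modelled as a plain attribute dictionary; the names PoolSearchEngine, LinearScanEngine, and search are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]  # assumption: an entity is modelled as an attribute dict
Predicate = Callable[[Entity], bool]


class PoolSearchEngine(ABC):
    """The stable interface; callers depend only on this."""

    @abstractmethod
    def search(self, predicate: Predicate) -> List[Entity]:
        """Return every entity in the pool that satisfies the predicate."""


class LinearScanEngine(PoolSearchEngine):
    """The primitive first version: scan the whole list on every search."""

    def __init__(self, pool: List[Entity]) -> None:
        self._pool = pool

    def search(self, predicate: Predicate) -> List[Entity]:
        return [entity for entity in self._pool if predicate(entity)]


engine: PoolSearchEngine = LinearScanEngine([
    {"name": "home", "type": "page"},
    {"name": "logo", "type": "image"},
    {"name": "about", "type": "page"},
])
pages = engine.search(lambda e: e.get("type") == "page")
```

A later indexed engine would subclass PoolSearchEngine in the same way, and code that only calls search() would not need to change.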

User Clusters

Clusters can be used for categorization. The user will need to have frequently accessed categories readily available and easy to modify (by adding or deleting entities within those categories). Those categories are naturally implemented as clusters. Consequently, cluster objects should support addEntity() and removeEntity(), and should have attributes and satisfy predicates, just as entities do.
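A rough sketch of such a cluster object is given below. The method names addEntity() and removeEntity() come from the text above; everything else (the Entity class, satisfies(), members(), and the choice to make Cluster a subclass of Entity) is an assumption made for illustration.

```python
class Entity:
    """Hypothetical entity with named attributes."""

    def __init__(self, name, **attributes):
        self.name = name
        self.attributes = dict(attributes)

    def satisfies(self, predicate):
        """True if the predicate (any callable taking this object) holds."""
        return bool(predicate(self))


class Cluster(Entity):
    """Assumption: a cluster is itself entity-like, so it inherits
    attributes and predicate testing, and adds membership management."""

    def __init__(self, name, **attributes):
        super().__init__(name, **attributes)
        self._members = []

    def addEntity(self, entity):
        if entity not in self._members:  # adding twice has no effect
            self._members.append(entity)

    def removeEntity(self, entity):
        if entity in self._members:  # removing a non-member has no effect
            self._members.remove(entity)

    def members(self):
        return list(self._members)  # copy, so callers cannot mutate the cluster


favorites = Cluster("favorites", owner="user")
page = Entity("home page", kind="page")
image = Entity("logo", kind="image")
favorites.addEntity(page)
favorites.addEntity(image)
favorites.addEntity(page)
favorites.removeEntity(image)
```

Because Cluster inherits from Entity in this sketch, a cluster can itself be added to another cluster, which is one possible way to put categories inside other categories.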

The user can create clusters. Every time the user does a search, a cluster will be automatically produced to hold the results of that search. The user can then choose to explicitly name and save that cluster in a special "category space". This category space would be separate from the normal display and would let the user search for entities based on what category (that is, cluster) they are likely to be in, just as we search for files in directories now. The user can also rename categories, split categories, merge categories, put categories in other categories, move categories around, and navigate in category space.

After some time the user would have built up lots and lots of ways of "slicing the data salami", just as we can now have lots and lots of directories. The difference, of course, is that clusters (categories) are not mutually exclusive: two clusters can share entities without either one being contained in the other.
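The overlap property, along with merging and splitting, can be illustrated with ordinary sets. This is only a sketch: the category names and entity names are made up, and modelling a cluster as a set of names is an assumption.

```python
# Hypothetical categories, each modelled as a set of entity names.
work = {"report.txt", "budget.xls", "team-photo.jpg"}
travel = {"team-photo.jpg", "itinerary.pdf"}

shared = work & travel  # entities that sit in both categories
merged = work | travel  # merging two categories

# Splitting a category: partition one cluster by a predicate.
images = {name for name in merged if name.endswith(".jpg")}
documents = merged - images

# The clusters overlap, yet neither one contains the other.
overlapping = bool(shared) and not (work <= travel or travel <= work)
```

A directory tree cannot express this situation without copying or linking the shared file, whereas here the shared entity simply appears in both sets.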

A Trash Cluster

Purging of entities need not be permanent (and probably should not be, since the user will take some time to trust the system's judgment). Instead, the purgers simply attach another attribute to the entity in question. The set of all such entities then becomes the "trash bin" for the system. It is a cluster like any other, with two exceptions: (a) when the display starts up, none of the entities with that attribute (that is, in that cluster) are displayed, and (b) in times of memory shortage, the "dirtiest" and oldest trash is thrown out first. (Note that the trash is thrown out of memory, not necessarily off the disk, although the same policy applies to the disk too if it also fills.) The user is free to rummage through the trash, see it displayed as a cluster on the screen, search it, and do all the other things that can be done with any other cluster.
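The trash behaviour described above might be sketched as follows. Several things here are assumptions: the attribute name "trashed", a numeric dirtiness score, a numeric timestamp, and the particular tie-breaking rule (dirtiest first, then oldest). The text does not say how "dirtiest" is measured or how the two criteria combine.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

TRASH = "trashed"  # hypothetical attribute name marking trash membership


@dataclass
class Entity:
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)


def purge(entity: Entity, dirtiness: float, timestamp: int) -> None:
    """Soft delete: mark the entity as trash instead of destroying it."""
    entity.attributes[TRASH] = {"dirtiness": dirtiness, "since": timestamp}


def restore(entity: Entity) -> None:
    """Undo a purge by removing the trash attribute."""
    entity.attributes.pop(TRASH, None)


def visible(pool: List[Entity]) -> List[Entity]:
    """What the display shows at startup: everything not in the trash."""
    return [e for e in pool if TRASH not in e.attributes]


def trash_cluster(pool: List[Entity]) -> List[Entity]:
    """The trash bin is just the cluster of entities carrying the attribute."""
    return [e for e in pool if TRASH in e.attributes]


def eviction_order(pool: List[Entity]) -> List[Entity]:
    """Assumed policy: dirtiest first; among equally dirty, oldest first."""
    return sorted(
        trash_cluster(pool),
        key=lambda e: (-e.attributes[TRASH]["dirtiness"], e.attributes[TRASH]["since"]),
    )


pool = [Entity("a"), Entity("b"), Entity("c"), Entity("d")]
purge(pool[1], dirtiness=0.2, timestamp=5)
purge(pool[2], dirtiness=0.9, timestamp=7)
purge(pool[3], dirtiness=0.9, timestamp=3)
order = [e.name for e in eviction_order(pool)]
```

Since the trash is an ordinary list of entities, the same search and display code that works on any other cluster works on it unchanged.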

Interface Clusters

Cluster (x, y, z) position information is quite important to the interface builders. Of course they could derive it from the entity data alone, but it's more convenient for them to get it in one set of data. That data is already sitting in the pool anyway; it's just a question of identifying a collection of objects in the pool according to yet another set of criteria.

The interface needs a way to get all entities and clusters, together with their current markup. It is better if all those entities and clusters have already been seen by the mappers, so mapper marking is a natural way to specify the set the interface gets. The interface asks for the cluster of all entities marked by the mappers. That provides all the mapped entities at once without imposing a getAllEntities() method on an info pool, which would expose the seamy side of cluster management to the interface.

The interface also needs a way to get all entities and clusters that have changed since it last asked for them (so that it can update the display on demand from the user, or autonomously on some adjustable schedule, or both). Those entities therefore also need to be in a cluster, which means that whenever an entity is updated it should be added to the "touched" cluster. Or, more simply, any entity bound to a simpleton could be immediately added to the "touched" cluster, whether it has actually changed or not.
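The "touched" cluster could be kept as a set that is filled on every update and drained whenever the interface asks for changes. The sketch below is one possible shape; the InfoPool class and its update() and take_touched() methods are hypothetical, and emptying the cluster on each request is an assumption about how "since it last asked" would be tracked.

```python
class InfoPool:
    """Hypothetical info pool that records which entities have been touched."""

    def __init__(self):
        self._entities = {}    # entity name -> attribute dict
        self._touched = set()  # names changed since the last request

    def update(self, name, **attributes):
        """Create or modify an entity and add it to the touched cluster."""
        self._entities.setdefault(name, {}).update(attributes)
        self._touched.add(name)

    def take_touched(self):
        """Return copies of all touched entities, then empty the touched cluster."""
        changed = {name: dict(self._entities[name]) for name in sorted(self._touched)}
        self._touched.clear()
        return changed


pool = InfoPool()
pool.update("home", x=0, y=0, z=0)
pool.update("about", x=1, y=0, z=0)
first = pool.take_touched()   # both entities are new, so both are reported
pool.update("home", x=5)
second = pool.take_touched()  # only the entity changed since the last call
third = pool.take_touched()   # nothing has changed in between
```

With a single interface this is enough; several independent interfaces would each need their own touched set, which this sketch does not attempt.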

Also, the interface will need to get those entities and clusters remotely if it is written to run independently of the rest of the system.

Cluster Types

The interface needs the "cluster" of all entities to start the display; it then needs ongoing "clusters" of entities that have changed since it last requested the overall cluster.
The interface needs clusters to record the results of user-initiated searches for later caching, reuse, and analysis.
The user model produces clusters of hotlisted entities and other collections of entities ordered by user interest; it can also produce clusters of highly predictive (or weakly predictive) attributes of entities.
The engine forms clusters around lighthouse entities.
The engine uses clusters to group incoming entities (mainly new pages) around the hot entities.
The engine should continually analyze the hot entities together with their attribute values to try to predict which attributes correlate strongly with an entity being hot. Entities with those attributes are then put into yet another cluster: a sort of self-made cache of likely hot entities and clusters.
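For the last item, one very simple stand-in for "correlates with being hot" is lift: the share of hot entities among those carrying an attribute value, divided by the share of hot entities overall. This is a swapped-in illustration, not a method the notes prescribe. The function names, the dictionary layout of an entity, and the threshold are all assumptions, and attribute values are assumed to be hashable.

```python
from collections import Counter


def hot_lift(entities, hot_names):
    """Map each (attribute, value) pair to P(hot | pair) / P(hot)."""
    total = len(entities)
    hot_total = sum(1 for e in entities if e["name"] in hot_names)
    if total == 0 or hot_total == 0:
        return {}  # lift is undefined without any hot entities
    base_rate = hot_total / total
    seen, seen_hot = Counter(), Counter()
    for e in entities:
        for pair in e["attributes"].items():
            seen[pair] += 1
            if e["name"] in hot_names:
                seen_hot[pair] += 1
    return {pair: (seen_hot[pair] / seen[pair]) / base_rate for pair in seen}


def likely_hot(entities, hot_names, threshold):
    """Names of not-yet-hot entities with at least one high-lift attribute value."""
    lift = hot_lift(entities, hot_names)
    return [
        e["name"]
        for e in entities
        if e["name"] not in hot_names
        and any(lift.get(pair, 0.0) >= threshold for pair in e["attributes"].items())
    ]


entities = [
    {"name": "p1", "attributes": {"topic": "ai", "lang": "en"}},
    {"name": "p2", "attributes": {"topic": "ai", "lang": "en"}},
    {"name": "p3", "attributes": {"topic": "cooking", "lang": "en"}},
    {"name": "p4", "attributes": {"topic": "cooking", "lang": "en"}},
    {"name": "p5", "attributes": {"topic": "ai", "lang": "en"}},
]
hot = {"p1", "p2"}
lift = hot_lift(entities, hot)
candidates = likely_hot(entities, hot, threshold=1.5)
```

Here the value ("topic", "ai") appears on both hot entities and on one entity that is not hot, so that third entity is the one placed in the predicted cluster, while ("lang", "en") appears everywhere and carries no signal.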