Simpletons read data from a data pool, operate on it, and write the result of the operation to a data pool. A simpleton may also read data from multiple pools. Every simpleton must be uniquely identified, and all data produced by a simpleton must have the simpleton's identifier associated with it. Every simpleton must keep polling an area of shared memory to find out what its input and output pools are.
Simpletons never attempt to find or talk to another simpleton. The only objects they know about are the simpleton registry and their input and output data pools. They may make assumptions (as limited as possible) about the kind of data in their input pool(s). When a simpleton makes such assumptions, the simpletons upstream of it must promise to provide the kind of data it expects.
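public interface Simpleton
{
    //interface methods are implicitly public; Java does not allow private abstract methods
    //implementations should treat these as internal operations
    Object readData(DataPoolID inputPool);
    void writeData(DataPoolID outputPool);
    void markData(PageID page);
}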
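The read, operate, write cycle described above can be sketched as follows. This is only an illustration under stated assumptions: `ListPool`, `TaggedItem`, and `UppercaseSimpleton` are hypothetical names, plain strings stand in for `SimpletonID`, and the pools are handed to the simpleton directly rather than looked up in shared memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a data pool: an in-memory list with synchronized access.
class ListPool {
    private final List<Object> items = new ArrayList<>();

    synchronized void put(Object item) { items.add(item); }

    // Returns the item at the given position, or null if there is none yet.
    synchronized Object get(int index) {
        return (index >= 0 && index < items.size()) ? items.get(index) : null;
    }
}

// Hypothetical wrapper that associates a datum with the simpleton that produced it.
class TaggedItem {
    final String producerId;
    final Object payload;

    TaggedItem(String producerId, Object payload) {
        this.producerId = producerId;
        this.payload = payload;
    }
}

// A simpleton that upper-cases the text form of whatever it reads.
class UppercaseSimpleton {
    private final String id;   // stand-in for SimpletonID
    private int cursor = 0;    // position of the next unread item in the input pool

    UppercaseSimpleton(String id) { this.id = id; }

    // Processes every item not yet seen and returns how many were processed.
    // The simpleton only touches its pools, never another simpleton.
    int step(ListPool input, ListPool output) {
        int processed = 0;
        Object item;
        while ((item = input.get(cursor)) != null) {
            cursor++;
            // Unwrap data produced by an upstream simpleton, if it is tagged.
            Object payload = (item instanceof TaggedItem) ? ((TaggedItem) item).payload : item;
            String result = String.valueOf(payload).toUpperCase(Locale.ROOT);
            // Everything written carries this simpleton's identifier.
            output.put(new TaggedItem(id, result));
            processed++;
        }
        return processed;
    }
}
```

Calling `step` repeatedly picks up only the items added since the previous call, which is one simple way for a simpleton to keep working as its input pool grows.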
Data Pools
Data pools store data. They must provide a means for adding data to the pool, removing data from it, and reading data from it. Access must be synchronized so that the data is not corrupted when multiple simpletons attempt to read or write at the same time. On shutdown, data pools save their state to disk, and on startup they must be able to recover it correctly. There should also be multiple options for reading: one should browse the pool at random, and another should return pages in the order in which they were added to the pool, so that a simpleton can go through the whole pool (which random browsing cannot guarantee).
Data pools have no knowledge of the kind of data they store (for example, which class an object belongs to) and are prohibited from doing anything other than dealing with the objects generically.
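public abstract class DataPool
{
    //return the item that follows currentItem in order of insertion
    public abstract Object getNextItem(Object currentItem);
    //return an item chosen at random
    public abstract Object getRandomItem();
    //add an item to the pool
    public abstract Object putItem(Object item);
    //persistence hooks for shutdown and startup
    //(protected, because Java does not allow private abstract methods)
    protected abstract void saveState();
    protected abstract void recoverState();
}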
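A minimal in-memory sketch of the two reading modes might look like the following. `InMemoryDataPool` is a hypothetical name. The sketch assumes that `putItem` returns the stored item, that passing `null` to `getNextItem` means "start from the oldest item", and that items are distinct. Saving to and recovering from disk is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical in-memory pool; every method is synchronized so that
// concurrent readers and writers cannot corrupt the list.
class InMemoryDataPool {
    // Items in insertion order, which here stands in for "time added".
    private final List<Object> items = new ArrayList<>();
    private final Random random = new Random();

    // Assumption: the stored item is returned to the caller.
    public synchronized Object putItem(Object item) {
        items.add(item);
        return item;
    }

    // Returns the item added right after currentItem, the oldest item when
    // currentItem is null, or null when there is nothing further to read.
    public synchronized Object getNextItem(Object currentItem) {
        int next;
        if (currentItem == null) {
            next = 0;
        } else {
            int index = items.indexOf(currentItem); // relies on items being distinct
            if (index < 0) {
                return null; // currentItem is not in this pool
            }
            next = index + 1;
        }
        return next < items.size() ? items.get(next) : null;
    }

    // Returns an arbitrary item, or null if the pool is empty.
    public synchronized Object getRandomItem() {
        if (items.isEmpty()) {
            return null;
        }
        return items.get(random.nextInt(items.size()));
    }
}
```

The pool treats its contents purely as `Object`, so it never needs to know what class an item belongs to. A simpleton can walk the whole pool by feeding each returned item back into `getNextItem` until it receives `null`.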
Simpleton Information In Memory
There should be a shared memory space that all simpletons access. It should store, for each simpleton, its input pool(s) and output pool. This could be a class that manages this data and provides synchronized methods to access it. One possibility:
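public abstract class SimpletonRegistry
{
    //should there be multiple pools?
    public abstract DataPoolID[] getUpstreamPools(SimpletonID simpleton);
    //takes the simpleton's ID, since the output pool is stored per simpleton
    public abstract DataPoolID getDownstreamPool(SimpletonID simpleton);
    public abstract void registerSimpleton(SimpletonID simpleton);
    //how does it know where to put it?
}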
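One way such a registry could be sketched is shown below. `InMemorySimpletonRegistry` is a hypothetical name, and plain strings stand in for `SimpletonID` and `DataPoolID`. The sketch sidesteps the open question of where a new simpleton goes by assuming that whoever registers it supplies its pools, and it assumes both lookups take the simpleton's identifier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry held in shared memory; methods are synchronized so
// simpletons can poll it while the wiring is being changed.
class InMemorySimpletonRegistry {
    private final Map<String, List<String>> upstreamPools = new HashMap<>();
    private final Map<String, String> downstreamPool = new HashMap<>();

    // Assumption: the caller decides the wiring. Registering the same
    // simpleton again replaces its previous pools.
    public synchronized void registerSimpleton(String simpletonId,
                                               List<String> inputPoolIds,
                                               String outputPoolId) {
        upstreamPools.put(simpletonId, new ArrayList<>(inputPoolIds));
        downstreamPool.put(simpletonId, outputPoolId);
    }

    // Returns a copy, so callers cannot modify the registry's own list.
    // An unregistered simpleton has no input pools.
    public synchronized List<String> getUpstreamPools(String simpletonId) {
        List<String> pools = upstreamPools.get(simpletonId);
        return pools == null ? new ArrayList<>() : new ArrayList<>(pools);
    }

    // Returns null for an unregistered simpleton.
    public synchronized String getDownstreamPool(String simpletonId) {
        return downstreamPool.get(simpletonId);
    }
}
```

Because a simpleton re-reads its entry on every poll, re-registering it with different pools is enough to rewire it without the simpleton knowing anything about its neighbours.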
Pages
Pages store information about a page: the page content and all attributes that any simpleton might care to add. It should be possible to create or remove attributes dynamically and to access or modify the values of the attributes.
Pages make no semantic inference about their attributes. To ensure this, pages can just store a set of attribute objects of the Attribute class.
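import java.util.Hashtable;

public class Page
{
    //a hash table of attributes indexed by attribute ID
    private final Hashtable<AttributeID, Attribute> attributeTable = new Hashtable<>();

    public Attribute getAttribute(AttributeID id) { return attributeTable.get(id); }
    public void addAttribute(Attribute attribute) { attributeTable.put(attribute.getID(), attribute); }
    public void removeAttribute(AttributeID id) { attributeTable.remove(id); }
}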
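The dynamic creation, modification, and removal of attributes could be exercised along these lines. `SimpleAttribute` and `SimplePage` are hypothetical, stripped-down stand-ins, and strings replace `AttributeID`. The sketch uses a `HashMap` guarded by synchronized methods, which is a substitution made here for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal attribute: an (ID, value) pair and nothing else.
class SimpleAttribute {
    private final String id;
    private Object value;

    SimpleAttribute(String id, Object value) {
        this.id = id;
        this.value = value;
    }

    String getID() { return id; }
    Object getValue() { return value; }
    void setValue(Object value) { this.value = value; }
}

// Hypothetical page that stores attributes without interpreting them.
class SimplePage {
    // Attributes indexed by attribute ID.
    private final Map<String, SimpleAttribute> attributeTable = new HashMap<>();

    // Adding an attribute with an existing ID replaces the old one.
    public synchronized void addAttribute(SimpleAttribute attribute) {
        attributeTable.put(attribute.getID(), attribute);
    }

    // Returns null when the page has no attribute with this ID.
    public synchronized SimpleAttribute getAttribute(String id) {
        return attributeTable.get(id);
    }

    public synchronized void removeAttribute(String id) {
        attributeTable.remove(id);
    }
}
```

The page never looks inside a value, so a simpleton can attach a word count, a language tag, or anything else without the page class changing.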
Attributes
Attributes are pairs of the form (AttributeID, AttributeValue). It should be possible to access and modify attribute values. In addition, attributes store information about their weight (which may be used by other simpletons in their computations) and relations with other attributes (also used by simpletons).
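public class Attribute
{
    AttributeID id;
    Object value; //the AttributeValue half of the pair
    AttributeWeight weight;
    AttributeRelationTable relationsWithOtherAttributes;
    SimpletonID creatorSimpleton;
    Time creationTime;
    SimpletonID lastUpdateSimpleton; //an ID rather than a Simpleton reference, like creatorSimpleton
    Time lastUpdateTime;

    public void setValue(Object value) { this.value = value; }
    public Object getValue() { return value; }
    public AttributeID getID() { return id; }
}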
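The creator and last-update bookkeeping could work roughly as sketched below. `TrackedAttribute` is a hypothetical name, strings stand in for `AttributeID` and `SimpletonID`, and `java.time.Instant` stands in for `Time`. Passing the updating simpleton's identifier to `setValue` is an assumption of this sketch, made so the attribute can record who changed it. Weight and relations are omitted.

```java
import java.time.Instant;

// Hypothetical attribute that records who created it and who last changed it.
class TrackedAttribute {
    private final String id;
    private Object value;
    private final String creatorSimpleton;
    private final Instant creationTime;
    private String lastUpdateSimpleton;
    private Instant lastUpdateTime;

    TrackedAttribute(String id, Object value, String creatorSimpleton) {
        this.id = id;
        this.value = value;
        this.creatorSimpleton = creatorSimpleton;
        this.creationTime = Instant.now();
        // Until someone else updates it, the creator counts as the last updater.
        this.lastUpdateSimpleton = creatorSimpleton;
        this.lastUpdateTime = this.creationTime;
    }

    // Assumption: the caller passes its own simpleton ID along with the new value.
    synchronized void setValue(Object newValue, String updaterSimpleton) {
        this.value = newValue;
        this.lastUpdateSimpleton = updaterSimpleton;
        this.lastUpdateTime = Instant.now();
    }

    synchronized Object getValue() { return value; }
    String getID() { return id; }
    String getCreatorSimpleton() { return creatorSimpleton; }
    Instant getCreationTime() { return creationTime; }
    synchronized String getLastUpdateSimpleton() { return lastUpdateSimpleton; }
    synchronized Instant getLastUpdateTime() { return lastUpdateTime; }
}
```

Only identifiers are stored, never references to simpleton objects, so an attribute cannot be used by one simpleton to reach another.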
Clusters
Clusters store information about which pages are related. This information is used to display related pages (that is, pages in the same cluster) together. Pages can be added to clusters, removed from clusters, and so on. Every page, when added to a cluster, also gets an attribute that stores the ClusterID of the cluster to which this page belongs. A page may belong to multiple clusters, or to no clusters.
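import java.util.Vector;

public class Cluster
{
    ClusterID id;
    ClusterWeight weight;
    Time timeOfCreation;
    //a list of PageIDs
    Vector<PageID> pagesInCluster = new Vector<>();

    public void addPage(PageID page) { pagesInCluster.add(page); }
    public void removePage(PageID page) { pagesInCluster.remove(page); }
    public Vector<PageID> getPagesInCluster() { return pagesInCluster; }
}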
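The two-way bookkeeping described above, where the cluster lists its pages and each page records its clusters, might be sketched like this. `ClusteredPage` and `SketchCluster` are hypothetical names, strings stand in for `PageID` and `ClusterID`, and a plain set on the page stands in for the cluster attribute. Unlike the outline above, `addPage` takes the page object rather than its ID so that it can update the page. Thread safety is left out for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical page that only tracks its cluster membership.
class ClusteredPage {
    final String id;
    // Stand-in for the page's cluster attribute: the IDs of every cluster
    // the page belongs to. Empty when the page is in no cluster.
    final Set<String> clusterIds = new LinkedHashSet<>();

    ClusteredPage(String id) { this.id = id; }
}

// Hypothetical cluster that keeps page IDs in the order they were added.
class SketchCluster {
    final String id;
    private final Set<String> pagesInCluster = new LinkedHashSet<>();

    SketchCluster(String id) { this.id = id; }

    // Records the page in the cluster and the cluster on the page.
    void addPage(ClusteredPage page) {
        pagesInCluster.add(page.id);
        page.clusterIds.add(id);
    }

    // Undoes both halves of addPage; removing an absent page is a no-op.
    void removePage(ClusteredPage page) {
        pagesInCluster.remove(page.id);
        page.clusterIds.remove(id);
    }

    // Returns a copy so callers cannot change membership behind the cluster's back.
    List<String> getPagesInCluster() {
        return new ArrayList<>(pagesInCluster);
    }
}
```

Keeping both sides in step inside `addPage` and `removePage` means a display component can start from either a cluster or a page and find the related pages.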