Information Helpers

Given a potentially infinite stream of items where each item is chosen from a space with a potentially infinite number of dimensions of variation, the helper's task is to use information provided implicitly by the user's previous choices to classify each item in the stream into one of three groups:

items the user has definitely liked and wishes to see more of,
items the user has definitely disliked and wishes to see no more of,
items the helper hasn't yet presented to the user.

Call a decision procedure that distinguishes between the first two classes a classifier. This classifier need not be correct on all items; probabilistically good behavior is acceptable.

In versions of the problem where the information helper is trying to guess rules of explanation for some phenomena, or versions where the helper is trying to correct user errors, the stream of items is produced by the user's own sequence of actions.

The information helper problem arises in its most virulent form whenever:

the user cannot articulate the item classifier
the user can articulate the item classifier, but not in terms of the dimensions of variation known to the system
the user can articulate the item classifier in terms of the dimensions of variation known to the system, but either can't be bothered to do so or chooses to keep it secret
the user can easily articulate the item classifier in terms of the dimensions of variation known to the system, but that classifier is constantly changing.

Most classification programs focus on problems where the item classifier is known, fairly easily articulated, and static. For such problems, "articulating the classifier" is equivalent to constructing a program to solve the problem. Various kinds of adaptive programs (neural networks, genetic algorithms, simulated annealing, production systems) address one or more versions of the more complex problem.

The information helper's task is to construct a (potentially dynamic) classifier that sufficiently separates the first two classes. Items not fitting into either category fall into a grey area where the system has not yet had the benefit of the user's judgment and it must attempt classification based solely on the similarity of those as yet unclassified items with items in either one of the two sets already implicitly defined by the user's previous actions. To determine this similarity, the helper must have a measure of semantic distance between items; that is it must construct a (potentially dynamic) metric on the space of items. (Note: Doug Hofstadter's programs like Copycat and Tabletop focus on problems where this similarity is completed by analogy.)

For purposes of this paper, a successful helper is one whose cumulative positive hit rate asymptotically approaches, say, fifty percent or better, and whose cumulative negative hit rate asymptotically approaches, say, ninety percent or better. Half the time it correctly classifies new items the user wishes to see and most of the time it correctly classifies items the user doesn't wish to see. These two parameters should of course be under user control, but a (50, 90) split seems like a minimal measure of success since it should save users at least half of the time they would otherwise have to spend on the classification task.

The helper's task is made hard by several constraints:

Limits on computational resources, or time, or knowledge of important dimensions of variation (or all three) usually make a complete cross-product analysis of all items and all dimensions of variation impossible.
The potential variability of the dimensions of greatest importance makes a complete cross-product analysis at any one time, even if one were available, not completely usable for future classification requests.
The unwillingness or inability of the user to explicitly pre-identify important dimensions of variation or to classify by hand any of the items in the stream except in a post-hoc implicit fashion based on their actions.
The extremely small sample of the space of possible items seen at any one time makes all inferences suspect. Over the entire life of the system it is only likely to see a tiny fraction of all possible items. From such meager information, coupled with deductions made from the user's reactions to previous decisions, it must construct a (potentially dynamic) metric on the space of items and use it classify future items. There may not be enough information for it to make any useful inductions.
Variation among users makes information sharing about previous classifications not necessarily useful. For example, in news classification, one user may like some news articles that another user doesn't much care for; in a guessing game, one user may generate explanatory rules that are very different from the kinds of rules another user may generate.

Although these constraints make the helper's task difficult, they don't necessarily make it impossible because of several other constraints:

Although the space of items is very large (or infinite), the helper will only ever see an extremely small fraction of that space, thus computation over an infinite space is unnecessary.
Although users could be interested in an arbitrarily large and continuously variable set of dimensions of variation, often they stick to only a few and return to them regularly. Most of us only do a few different things and we usually do them in tried and true ways. This assumption is of proven value in computer science and is the basis of strategies like caching.
There is usually a fair amount of regularity in the class of items accepted, or a fair amount of regularity of difference between those accepted and those rejected, even though the user may not be consciously aware of such regularity. Just because we don't see a relatively simple pattern doesn't mean it isn't there to be seen.
The helper need not perform at the level of an expert human helper to be helpful. The economics of modern computing make worthwhile any shift of the burden of classification more onto the computer's shoulders and away from the user's shoulders as long as it is accomplished with only a modest investment of computer resources.
In situations where information sharing among users of copies of the helper is possible, such sharing might often be worthwhile. While all users are different, there are still several commonalities between them. Sharing the fruits of those commonalities could free up resources that each helper copy could shift to detecting the idiosyncrasies of its particular user.