The Future

Science is a match that man has just got alight. He thought he was in a room---in moments of devotion, a temple---and that his light would be reflected from and display walls inscribed with wonderful secrets and pillars carved with philosophical systems wrought into harmony. It is a curious sensation, now that the preliminary sputter is over and the flame burns up clear, to see his hands lit and just a glimpse of himself and the patch he stands on visible, and around him, in place of all that human comfort and beauty he anticipated---darkness still.
H. G. Wells, The Rediscovery of the Unique

The web is likely to keep growing in size, diversity, and volatility until no amount of snapshot web analysis will be sufficient. The average user will be further hampered by a severe bandwidth crunch compared to commercial sites able to afford high-bandwidth lines. Further, as the web grows ever more commercial, as its population continues to explode, and as distributed computing matures, ever more users and webagents will be roaming the net, thereby adding to the bandwidth crunch. Eventually, users are likely to be asked to pay not just for phone line rental, but also for server time (perhaps measured in megabytes accessed times the cycles needed to process and deliver those megabytes).

It's impossible for the average user to keep up with these costs, so more and more of the burden of weblinkage must inevitably fall on the commercial search engines---they have the bandwidth and the horsepower to do continuous and extensive searching and mapping. Unfortunately, there is at present no motivation for commercial searchers to give discovered linkage information directly to users. That may change, however, after the web grows for another few years and the current search strategy collapses entirely. When users pose queries and routinely get a million responses they are unlikely to find search engines useful. It's also likely that even the search engines will give up on indexing the whole web. Already most of them have moved to sampling only.

Presumably, some servers will remain free to all users, being supported by advertising dollars, but at peak times such free sites will always have many more requests than they can service, particularly if they are popular. (The web is self-limiting in that regard; the more popular a page is the less likely it is for anyone to ever actually see it.) Finally, as the web grows, the webmaps will grow with it, making the creation of even partial neighborhood maps impractical for ordinary users. In sum, eventually all significant webmapping will likely have to be done by commercial sites that do it for profit.

The next question is how money will flow from the users of a mapping service to the mappers. If a mapper's revenue comes solely from advertising (as is true for the search engine companies today), there is no profit in allowing automated searchers to hit its site. For one thing, the download volume goes way up, and for another, automated searchers don't read ads. Eventually, the search engine companies will probably disallow automated searchers. Since the web will eventually grow so large that no user-created search engine could do an adequate job, this means that mapping companies will likely charge users for their services and charge major sites advertising dollars to find some connection---however tenuous---between their sites and popular sites.

Presumably, users of a particular mapping service will be able to download default neighborhood maps for common search categories (say: car dealers, food, books, entertainment) and prune that linkage map for places they find most interesting. Once they have a core set of neighborhoods they can instruct their webagent to search the web for sites like those currently in their neighborhood map. Users can thus incrementally build up a linkage map of everything on the web that they might be interested in.

Other Applications

Personalizing search can be extended to any data. If a relational database of car parts, say, keeps track of each user and the kinds of queries that user poses, it could tailor its responses to each user. Such a query engine would have a much better chance of satisfying its users, no matter how garbled their present request.
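As a rough illustration of the idea above, the following sketch keeps a per-user tally of past query terms and uses it to break ties among matching records. The class name `PersonalizedSearch`, the word-overlap scoring, and the sample car-parts records are all assumptions made for this example; a real relational database would do this quite differently.

```python
from collections import Counter, defaultdict


class PersonalizedSearch:
    """Toy query engine that remembers what each user has asked for."""

    def __init__(self, records):
        self.records = records                 # plain-text records (hypothetical data)
        self.history = defaultdict(Counter)    # user -> counts of past query terms

    def query(self, user, text):
        terms = text.lower().split()
        profile = self.history[user]

        def score(record):
            words = record.lower().split()
            # How many query terms appear in the record.
            direct = sum(1 for t in terms if t in words)
            # How strongly the record overlaps with the user's past queries.
            learned = sum(profile[w] for w in set(words))
            return (direct, learned)

        # Keep records matching at least one query term, best first.
        matches = [r for r in self.records if score(r)[0] > 0]
        matches.sort(key=score, reverse=True)

        # Remember this query for next time.
        profile.update(terms)
        return matches


parts = PersonalizedSearch([
    "brake pad Ford Focus",
    "brake pad Honda Civic",
    "oil filter Honda Civic",
])
parts.query("ann", "honda civic filter")   # ann reveals an interest in Hondas
print(parts.query("ann", "brake pad"))     # Honda record is ranked first for ann
print(parts.query("bob", "brake pad"))     # bob has no history; original order kept
```

The same vague request ("brake pad") thus yields different orderings for different users, which is the tailoring the paragraph describes.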

Laying out multi-dimensional information in a two-dimensional map also works for many information management tasks: reading mail, reading news, deciding what movies to watch or what music to listen to. If such a system handled our mail, for instance, then when a new message came in we would have a rough idea at a glance of what it was about. Such a map would be even more useful for news.

Having variant definitions of locality depending on context is of more general consequence as well. An electronic Yellow Pages, for instance, should list all businesses by each street, neighborhood, and mall; by the time needed to get to them from our current location; by whether they're in a safe neighborhood; by their relation to various landmarks; by whether they're currently having a sale; by whether they accept checks, cash, or credit; by their hours of operation; by their nearness to restaurants, gas stations, public restrooms, and malls; by their costliness, reliability, revenue, experience, and returns policy. Any of these dimensions of variation could be important to us at one time or another.
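One way to picture context-dependent locality is to treat each definition of "near" as an interchangeable ranking function over the same listings. The sketch below is only illustrative: the `Business` fields, the two locality functions, and the sample listings are hypothetical, and a real directory would have many more dimensions.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Business:
    name: str
    x: float      # position on a made-up street grid
    y: float
    price: int    # 1 = cheap ... 3 = expensive


def by_distance(here):
    """Locality as straight-line distance from the user's position."""
    return lambda b: hypot(b.x - here[0], b.y - here[1])


def by_price(b):
    """Locality as costliness: the cheapest businesses are 'nearest'."""
    return b.price


def neighbors(businesses, locality, limit=2):
    """Return the names of the `limit` nearest businesses under a locality."""
    return [b.name for b in sorted(businesses, key=locality)[:limit]]


shops = [
    Business("A", 0, 1, 3),
    Business("B", 5, 5, 1),
    Business("C", 1, 1, 2),
]
print(neighbors(shops, by_distance((0, 0))))  # ['A', 'C']
print(neighbors(shops, by_price))             # ['B', 'C']
```

Swapping the function passed as `locality` changes which businesses count as neighbors without touching the listings themselves.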

Businesses in the Yellow Pages could also be mapped based on what past customers have to say about them. Similarly, web locality may also be defined in terms of who tends to visit which site and in which order. If we already know something about the users of a neighborhood map then their peregrinations within the map can suggest what things should be related more strongly (or less so). For example, in a music or movie or paintings database, finding out that one user likes a certain set of artists could be used to modify the database's linkages between artists for some other user. It could also be used to give music recommendations to other users who happen to like some subset of those related artists. Here, the definition of locality is based purely on who likes whom.
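A minimal sketch of locality based purely on "who likes whom" follows, assuming the simplest possible scheme: the link between two artists is the number of users who like both. The function names, the co-occurrence count, and the sample data are assumptions for illustration, not a description of any particular system.

```python
from collections import Counter
from itertools import combinations


def build_links(likes):
    """Count, for each pair of artists, how many users like both."""
    links = Counter()
    for artists in likes.values():
        for a, b in combinations(sorted(set(artists)), 2):
            links[(a, b)] += 1
    return links


def recommend(links, liked, top=3):
    """Rank artists not yet liked by their total link strength to liked ones."""
    liked = set(liked)
    scores = Counter()
    for (a, b), weight in links.items():
        if a in liked and b not in liked:
            scores[b] += weight
        elif b in liked and a not in liked:
            scores[a] += weight
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top]]


likes = {
    "u1": ["Bach", "Mozart", "Haydn"],
    "u2": ["Bach", "Mozart"],
    "u3": ["Mozart", "Coltrane"],
}
links = build_links(likes)
print(recommend(links, ["Bach"]))  # ['Mozart', 'Haydn']
```

Each new user's preferences adjust the link weights, so one user's tastes change what is suggested to another.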

A datamanager should map all pages, whether on the desktop or on the web, into a colorful, three-dimensional, and personalized space where page clustering happens semi-autonomously. Such a map could eventually become an interface to all the world's data---including your computer's local pages, your mail and news, and any odds and ends the system picks up that it judges may be relevant to you. The entire world then becomes a universe of pages, all focused on you. Your system roams that world looking for things that you might find interesting and either points you to them or copies them for you if they are considered too important and too transient to miss. Besides webpages, mail, news, and local pages, those pages could be anything: ftp pages, gopher pages, logs of who was logged on to a particular machine in China at 3am on Friday, and logs of processes that mysteriously appear on your machine at 3am from an unknown machine in China. Finally, you might want to make your view of reality---for that is effectively what this is---sharable (or perhaps even salable).

Conclusion

In October, 1991, before the web existed, a paper on the future of electronic publishing contained the claim that:

"As computer power becomes more widespread each user's computer may run hundreds of ferret programs continuously, all separately exploring the world's data for useful information. When a ferret returns it may have to face dozens of filters who try to prevent them from adding the data found to the user's personal information base. Data that enough filters judge to be important or relevant is passed to the mapmaker to be linked into the user's personal map of what's important, where it is, and how it relates to other information in the personal map."

This quote made the cover of ACM Computer Reviews the following year, but no one knew when such a prediction might come to pass. As 1998 fades, however, machines are a hundred times faster and a thousand times more capacious than they were a decade ago, and 60 million people are on the net. The world that existed in 1991---the small world of mainly academic and military users of computer networks---is gone forever. What's still missing, though, is any sense of place on the web, any widespread use of adaptive software to help manage the information overload problem, and any real use of information visualization to aid common data management tasks. The system outlined here is a small step along that road.