
Introduction

By monitoring a user's activity to infer the task context, personal information agents can assist users in intelligent ways, such as performing autonomous web searches or suggesting knowledge resources that were useful in similar past contexts. We are investigating how such agents can capture contextual information as users seek information, and how they can use it to suggest resources consulted in similar contexts in the past.

To provide real-time context-based retrieval, we have developed a new algorithm, WordSieve, which generates vector representations of documents as the user accesses them, without requiring comprehensive statistics about word distributions in those documents. Instead, working with a relatively small memory (in current tests, at most 650 unique words), it identifies task-specific keywords from document access sequences. WordSieve exploits information the user provides implicitly by accessing similar documents together. It does this by building access profiles that identify terms which occur frequently in sequences of document accesses and which are expected to be useful for distinguishing sets of documents related to the same task. In this way, WordSieve uses extra knowledge about the document access context to generate indices that reflect the task context.

In our experiments, WordSieve outperformed term frequency/inverse document frequency (TFIDF) [9], a popular indexing algorithm, at matching documents to hand-coded vector representations of the task contexts in which they were originally consulted. Each task context representation is a term vector describing a specific search task given to the user. This paper presents WordSieve's architecture and performance.
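
To make the comparison baseline concrete, the following is a minimal sketch of TFIDF indexing and of matching documents against a hand-coded task context vector. It is not the experimental setup used in this paper: the toy documents, the context vector, the function names `tfidf_vectors` and `cosine`, the particular TFIDF weighting, and the use of cosine similarity as the matching measure are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses one common TFIDF variant (an assumption, not necessarily the
    one used in the paper): relative term frequency times log(N / df).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: three tiny "documents" and one hand-coded
# task context vector. None of this comes from the paper.
docs = [
    "genetic algorithm crossover mutation fitness".split(),
    "butterfly migration habitat wings".split(),
    "genetic mutation butterfly wings".split(),
]
context = {"genetic": 1.0, "algorithm": 1.0}

vectors = tfidf_vectors(docs)
scores = [cosine(vec, context) for vec in vectors]
# Index of the document whose TFIDF vector best matches the context.
best = max(range(len(docs)), key=scores.__getitem__)
```

Note that the document frequencies in `tfidf_vectors` are computed over the whole collection before any document can be indexed. This is the kind of comprehensive word-distribution statistic that WordSieve is designed to do without.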


Travis Bauer
2002-01-25