Next: Level 3: Words Absent Up: The WordSieve Architecture Previous: Level 1: MostFrequentWords

Level 2: Words Occurring in Document Sequences

The nodes in level 2 identify words that tend to occur together in sequences of document accesses. In our experiments, this level contained 500 nodes. Because this level does not have direct access to the stream of text, it is sensitized only to words corresponding to nodes in level 1.

Each node in this level is associated with a word and two real values, excitement and priming. The excitement of a node increases as long as its word continues to have high excitement values in level 1. Priming determines how fast the node's excitement can change. Both excitement and priming are restricted to values between 0.0 and 1.0. At each pass, all the words in the word sieve are presented to this level sequentially. For each word presented, every node's excitement decays by 0.5% and every node's priming decays by 0.1%. Then, if a node is already sensitized to the given word, its priming is increased by 1.75 times the decay amount.

That node's excitement is then increased as a function of its priming. If the given word has not yet been trapped by this level, it probabilistically replaces an existing node, in a manner similar to that described above.
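The per-word update described above can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that the text does not settle: the decay percentages are read as absolute steps on the 0.0 to 1.0 scale (a proportional reading is also possible), the excitement increase is taken to be a hypothetical `gain` times the node's priming, and the probabilistic replacement is approximated by overwriting the least excited node with a hypothetical probability `replace_prob`. The names `Node`, `present_word`, `gain`, and `replace_prob` are invented for this illustration.

```python
import random
from dataclasses import dataclass

# Rates taken from the text; treating them as absolute steps is an assumption.
EXCITEMENT_DECAY = 0.005               # "decays by 0.5%"
PRIMING_DECAY = 0.001                  # "decays by 0.1%"
PRIMING_BOOST = 1.75 * PRIMING_DECAY   # "1.75 times the decay amount"
MAX_NODES = 500                        # level size used in the experiments


@dataclass
class Node:
    word: str
    excitement: float = 0.0
    priming: float = 0.0


def clamp(value):
    """Keep a value inside the allowed range 0.0 to 1.0."""
    return min(1.0, max(0.0, value))


def present_word(nodes, word, gain=0.1, replace_prob=0.1,
                 max_nodes=MAX_NODES, rng=random):
    """Present one word to the level and update every node in place."""
    matched = None
    for node in nodes:
        # Every node decays on every presented word.
        node.excitement = clamp(node.excitement - EXCITEMENT_DECAY)
        node.priming = clamp(node.priming - PRIMING_DECAY)
        if node.word == word:
            matched = node

    if matched is not None:
        # A node already sensitized to the word gains priming, then excitement.
        matched.priming = clamp(matched.priming + PRIMING_BOOST)
        # Assumed form of "a function of its priming": linear with slope `gain`.
        matched.excitement = clamp(matched.excitement + gain * matched.priming)
    elif len(nodes) < max_nodes:
        # Free capacity: trap the new word in a fresh node.
        nodes.append(Node(word))
    elif rng.random() < replace_prob:
        # Assumed replacement policy: overwrite the least excited node.
        victim = min(nodes, key=lambda n: n.excitement)
        nodes[nodes.index(victim)] = Node(word)
    return nodes
```

Under these assumptions, a node whose word keeps arriving gains a net 0.00075 of priming per presentation, so its excitement can climb faster the longer the word persists, while a node whose word stops arriving loses excitement by only 0.005 per presented word.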

While level 1 is sensitized to the terms that are currently frequent in the text stream, level 2 keeps a record of the words that have tended to occur frequently at different times. Level 2 remembers words even after they stop occurring in the text stream, and forgets them only slowly. Nodes for non-discriminators will get high values at this level, as will nodes for terms that partition the set.


Travis Bauer
2002-01-25