Package Contract
The fingerprint package provides a framework for creating ferret and filter fingerprints based on features of representative pages in the space of pages.
The FingerprintID class provides a unique identifier for each ferret or filter fingerprint function created. It provides a method for comparing two fingerprint IDs to check whether they are the same. The FingerprintFunction class holds an ID and declares methods that use the deciding function to check whether a page is a candidate page for the space of pages. These methods are implemented in subclasses of the FingerprintFunction class.
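The relationship between the two base classes could be sketched as follows. This is only an illustration under assumptions: the long value inside FingerprintID, the getID() accessor, and the use of a String to stand in for a page are hypothetical and not taken from the package.

```java
import java.util.Objects;

// Hypothetical sketch: the real FingerprintID may wrap a different kind of value.
final class FingerprintID {
    private final long value;

    FingerprintID(long value) {
        this.value = value;
    }

    // Two IDs are the same when they wrap the same value.
    @Override
    public boolean equals(Object other) {
        return other instanceof FingerprintID
                && ((FingerprintID) other).value == value;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(value);
    }
}

// Hypothetical base class: holds the ID and leaves the decision to subclasses.
abstract class FingerprintFunction {
    private final FingerprintID id;

    protected FingerprintFunction(FingerprintID id) {
        this.id = Objects.requireNonNull(id);
    }

    FingerprintID getID() {
        return id;
    }

    // A page is represented as a plain String purely for illustration.
    abstract boolean decidingFunction(String page);
}
```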
The FerretFingerprintFunction has a decidingFunction() method which, given a page, creates a Feature instance of the same type as the Feature with which this fingerprint is associated. The new Feature instance has its feature value computed for the new page. This newly created Feature is then compared with the stored Feature this fingerprint is associated with, and the similarity between the two is checked against the fingerprint's threshold. If the similarity exceeds the threshold, the page is acceptable.
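A minimal sketch of this single-Feature check is shown below. The Feature interface, its computeFor() and similarity() methods, the toy LengthFeature, and the FerretSketch class are all assumptions made for illustration; the package's actual Feature API may look quite different.

```java
// Hypothetical Feature abstraction, not the package's real interface.
interface Feature {
    // Builds a Feature of the same type with its value computed for the page.
    Feature computeFor(String page);

    // Similarity with another Feature of the same type, assumed to lie in [0, 1].
    double similarity(Feature other);
}

// Toy Feature for illustration only: its value is the page length.
final class LengthFeature implements Feature {
    private final int length;

    LengthFeature(int length) {
        this.length = length;
    }

    @Override
    public Feature computeFor(String page) {
        return new LengthFeature(page.length());
    }

    @Override
    public double similarity(Feature other) {
        int otherLength = ((LengthFeature) other).length;
        int longer = Math.max(length, otherLength);
        // Ratio of the shorter length to the longer one; identical lengths give 1.0.
        return longer == 0 ? 1.0 : (double) Math.min(length, otherLength) / longer;
    }
}

// Stand-in for FerretFingerprintFunction, reduced to the deciding step.
final class FerretSketch {
    private final Feature stored;
    private final double threshold;

    FerretSketch(Feature stored, double threshold) {
        this.stored = stored;
        this.threshold = threshold;
    }

    boolean decidingFunction(String page) {
        // Create a Feature of the stored type, valued for the new page.
        Feature candidate = stored.computeFor(page);
        // The page is acceptable only if the similarity exceeds the threshold.
        return candidate.similarity(stored) > threshold;
    }
}
```

With a stored LengthFeature of 10 and a threshold of 0.75, a nine-character page has similarity 0.9 and is accepted, while a three-character page has similarity 0.3 and is rejected.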
The FilterFingerprintFunction works similarly, except that it is associated with a larger number of Features. Every type of Feature is created for each web page, and each one is compared against the corresponding representative Feature stored with the fingerprint. If a majority of them have similarity measures greater than the threshold, the page is acceptable.
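The majority rule might be sketched as below. To keep the example self-contained, each stored Feature is modelled as a hypothetical function that returns the similarity between a page and that Feature. The class name and the reading of "majority" as a strict majority (more than half) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Stand-in for FilterFingerprintFunction, reduced to the majority vote.
final class FilterMajoritySketch {
    // One entry per stored Feature: maps a page to its similarity with that Feature.
    private final List<ToDoubleFunction<String>> featureSimilarities;
    private final double threshold;

    FilterMajoritySketch(List<ToDoubleFunction<String>> featureSimilarities,
                         double threshold) {
        this.featureSimilarities = new ArrayList<>(featureSimilarities);
        this.threshold = threshold;
    }

    boolean decidingFunction(String page) {
        int passing = 0;
        for (ToDoubleFunction<String> similarity : featureSimilarities) {
            if (similarity.applyAsDouble(page) > threshold) {
                passing++;
            }
        }
        // Assumed rule: strictly more than half of the Features must pass.
        return 2 * passing > featureSimilarities.size();
    }
}
```

Under this reading, a tie (for example one of two Features passing) does not count as a majority, so the page is rejected.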
The FilterFingerprintFunction also has a method for determining whether it is a suitable filter to handle a particular page. This check avoids running every filter on every page, which would waste time. A filter is suitable if the DocumentNode that was used as the seed site for the new page is sufficiently similar to the set of representative pages on which this FilterFingerprintFunction is based.
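One possible shape for the suitability check is sketched below. The text does not say how similarity to a set of pages is measured, so this example assumes the mean pairwise similarity between the seed and the representatives, compared against a separate suitability threshold. The class name, the isSuitableFor() method, the String stand-in for DocumentNode, and the similarity function are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Stand-in for the suitability test of a FilterFingerprintFunction.
final class FilterSuitabilitySketch {
    private final List<String> representativePages;
    private final ToDoubleBiFunction<String, String> pageSimilarity;
    private final double suitabilityThreshold;

    FilterSuitabilitySketch(List<String> representativePages,
                            ToDoubleBiFunction<String, String> pageSimilarity,
                            double suitabilityThreshold) {
        this.representativePages = new ArrayList<>(representativePages);
        this.pageSimilarity = pageSimilarity;
        this.suitabilityThreshold = suitabilityThreshold;
    }

    // seedPage stands in for the DocumentNode used as the seed site.
    boolean isSuitableFor(String seedPage) {
        if (representativePages.isEmpty()) {
            return false; // nothing to compare against
        }
        double total = 0.0;
        for (String representative : representativePages) {
            total += pageSimilarity.applyAsDouble(seedPage, representative);
        }
        // Assumed aggregate: mean similarity must exceed the threshold.
        double mean = total / representativePages.size();
        return mean > suitabilityThreshold;
    }
}
```

A caller could run isSuitableFor() on each filter first and invoke the more expensive deciding function only on the filters that pass.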
Package-Level CRC
Collaborators:
Ferrets, filters, and their advisors use classes in the fingerprint package.
The mapmaker and webpage database also use some Fingerprint information.
Responsibilities:
Provide a framework for different kinds of fingerprint functions.