parser package


Package Contract

The Parser package provides the feature, ferrets, filters, mapmaker, and webpageDB packages with classes that fill in the variables of a PageAttribute object or compare the values of those variables between two PageAttribute objects.

Package-Level CRC

Collaborators:
feature, ferrets, filters, mapmaker, webpageDB

Responsibilities:
The Parser package gathers as much information about a web page as possible. This information is used by various classes in the feature and mapmaker packages. If any variable in the PageAttribute class still has no value after ferreting and filtering, the mapmaker package fills it in by using a class of the parser package.

Class-Level CRC

The Parser package contains the following classes:
* AddFilterAnnotations
* AddPageURL
* AddLanguageWritten
* AddSeedSite
* AddNumApplets
* AddURLsOnPage
* AddNumFrames
* AddWordVector
* AddNumImages
* AddWordsOnTitle
* AddPageContent
* AddPageSize
* PageAttribute

Class PageAttribute

* Responsibilities:
PageAttribute is the data holder shared by the ferrets, filters, mapmaker, and webpageDB packages. The data flow is as follows:
Each new PageAttribute object is created and used by the Ferrets; the object is then passed to the Filter, which fills in the filterAnnotations Vector;
finally, the object is passed to the Mapmaker, which instantiates an Analyzer object to fill in appropriate values for the PageAttribute variables.
* Variables and Methods:
private URL pageURL;
private PageID seedSite;
private int size;
private Vector urlsOnPage;
private String title;
private int numImages;
private int numApplets;
private int numFrames;
private String languageWrittenIn;
private Vector filterAnnotations;
private Vector wordList;
private Vector metaWords;
private Vector wordsInTitle;
private DataInputStream htmlPage;
public PageAttribute()
the constructor sets default values into private variables of the new PageAttribute

public void setPageURL(URL pageUrl)
store the given URL object as the URL of the page
this.pageURL = pageUrl;

public void setURLsOnPage(Vector urlsOnPage)
insert URLs on one specific page into Vector urlsOnPage
this.urlsOnPage = urlsOnPage;

public void setHtml(InputStream inputStream)
set the HTML content

public void setSize(int size)
set the size of the page
this.size = size;

public void setNumberOfImages(int numImages)
set the value of the number of images in a specific page
this.numImages = numImages;

public void setWordsInTitle(Vector wordsInTitle)
set the vector of words in the title of a specific page
this.wordsInTitle = wordsInTitle;

public void setNumberOfApplets(int numApplets)
set the value of the number of applets in a specific page
this.numApplets = numApplets;

public void setNumberOfFrames(int numFrames)
set the value of the number of frames in a specific page
this.numFrames = numFrames;

public void setLanguage(String language)
set the language the page is written in
this.languageWrittenIn = language;

public void setAnnotations(Vector annotations)
set the vector of filter annotations
this.filterAnnotations = annotations;

public DataInputStream getHTML()
get the contents of a HTML page

public URL getPageURL()
get the URL of a certain page
return this.pageURL;

public Vector getURLsOnPage()
get the urlsOnPage Vector, which contains all the links of a page

public int getSize()
get the size of a certain page

public int numberOfImages()
get the number of images on a certain page

public Vector getWordsInTitle()
get a vector of the words in the title of a certain page

public int numberOfApplets()
get the number of applets on a certain page

public int numberOfFrames()
get the number of frames on a certain page

public String getLanguage()
get the language a certain page is written in

public Vector getAnnotations()
get the filterAnnotations vector

public PageID getSeedSite()
get the PageID of the seedSite
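
The data flow described for PageAttribute can be illustrated with a reduced version of the class. The following is only a sketch under stated assumptions: it keeps a subset of the fields, uses generic Vectors, and introduces an UNSET sentinel plus the helpers hasSize() and hasNumberOfImages(), none of which appear in the original design. They are one possible way for the Mapmaker to detect values that ferreting and filtering left empty.

```java
import java.net.URL;
import java.util.Vector;

/** Sketch of the PageAttribute data holder; only a subset of the fields is shown. */
class PageAttribute {
    // Assumed sentinel meaning "no value yet"; not part of the original design.
    private static final int UNSET = -1;

    private URL pageURL;                                   // null until a ferret sets it
    private int size = UNSET;
    private int numImages = UNSET;
    private Vector<String> filterAnnotations = new Vector<>();

    public void setPageURL(URL pageUrl) { this.pageURL = pageUrl; }
    public void setSize(int size) { this.size = size; }
    public void setNumberOfImages(int numImages) { this.numImages = numImages; }
    public void setAnnotations(Vector<String> annotations) { this.filterAnnotations = annotations; }

    public URL getPageURL() { return this.pageURL; }
    public int getSize() { return this.size; }
    public int numberOfImages() { return this.numImages; }
    public Vector<String> getAnnotations() { return this.filterAnnotations; }

    // Hypothetical helpers: report which values a later stage still has to fill in.
    public boolean hasSize() { return this.size != UNSET; }
    public boolean hasNumberOfImages() { return this.numImages != UNSET; }
}
```

With such helpers, a later stage could check hasSize() and call the relevant parser class only when the value is still missing.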

Class AddPageURL

* Responsibilities:
When a new PageAttribute object is created, AddPageURL is called either to set the page URL of the PageAttribute or to check whether the page URLs from two PageAttribute objects are the same.
* Variables and Methods:

public addPageURL(PageAttribute pageAttribute, URL pageurl)
sets the given URL into the given PageAttribute object.

public boolean compare(PageAttribute pageAttribute, URL pageUrl)
compares the given URL (pageUrl) to the URL of the given PageAttribute.
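
As a rough illustration of the two AddPageURL operations, the sketch below uses static methods and a minimal stand-in for PageAttribute. The class name AddPageURLSketch, the helper parse(), and the choice to compare the textual form of the URLs are assumptions made for this example; the original design does not say how the comparison is performed.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class AddPageURLSketch {
    /** Minimal stand-in for PageAttribute: only the field this sketch needs. */
    static class PageAttribute {
        private URL pageURL;
        public void setPageURL(URL pageUrl) { this.pageURL = pageUrl; }
        public URL getPageURL() { return this.pageURL; }
    }

    /** Sets the given URL into the given PageAttribute. */
    static void addPageURL(PageAttribute pageAttribute, URL pageUrl) {
        pageAttribute.setPageURL(pageUrl);
    }

    /** Returns true when the stored URL and the given URL have the same textual form. */
    static boolean compare(PageAttribute pageAttribute, URL pageUrl) {
        URL stored = pageAttribute.getPageURL();
        if (stored == null || pageUrl == null) {
            return false;
        }
        // Comparing strings avoids the host-name lookup that URL.equals may perform.
        return stored.toExternalForm().equals(pageUrl.toExternalForm());
    }

    /** Hypothetical convenience helper that turns a string into a URL. */
    static URL parse(String address) {
        try {
            return URI.create(address).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + address, e);
        }
    }
}
```

Comparing the external form treats two URLs as different when they differ only in letter case or a default port; a stricter or looser rule may suit the real system better.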

Class AddPageSize

* Responsibilities:
When a new PageAttribute object is created, AddPageSize is called either to set the size of the PageAttribute or to compare the sizes of two PageAttribute objects.
* Variables and Methods:

public addPageSize(PageAttribute pageAttribute)
counts bytes of the actual html page of the given PageAttribute.

public double compare(PageAttribute pageAttribute1, PageAttribute pageAttribute2)
compares the size of pageAttribute1 to that of pageAttribute2.
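
A minimal sketch of the byte counting and the size comparison might look as follows. It assumes a stand-in PageAttribute that holds the page as a DataInputStream, and it assumes that compare returns a similarity between 0.0 and 1.0 (the ratio of the smaller size to the larger one). The original only states that compare returns a double, so that interpretation is a guess.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class AddPageSizeSketch {
    /** Minimal stand-in for PageAttribute: the HTML stream and the size. */
    static class PageAttribute {
        private DataInputStream htmlPage;
        private int size;
        public void setHtml(InputStream inputStream) { this.htmlPage = new DataInputStream(inputStream); }
        public DataInputStream getHTML() { return this.htmlPage; }
        public void setSize(int size) { this.size = size; }
        public int getSize() { return this.size; }
    }

    /** Counts the bytes of the HTML page and stores the total as the size. */
    static void addPageSize(PageAttribute pageAttribute) {
        int total = 0;
        byte[] buffer = new byte[4096];
        try (InputStream in = pageAttribute.getHTML()) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        pageAttribute.setSize(total);
    }

    /** Assumed semantics: 1.0 for equal sizes, approaching 0.0 as the sizes diverge. */
    static double compare(PageAttribute pageAttribute1, PageAttribute pageAttribute2) {
        int a = pageAttribute1.getSize();
        int b = pageAttribute2.getSize();
        if (a == 0 && b == 0) {
            return 1.0;
        }
        return (double) Math.min(a, b) / Math.max(a, b);
    }
}
```

Note that counting consumes and closes the stream, so a real implementation would need to buffer the page if other classes read it afterwards.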

Class AddSeedSite

* Responsibilities:
- Insert the seedSite value into one specific PageAttribute
- Compare the seedSite of two PageAttribute objects
* Variables and Methods:

public addSeedSite(PageAttribute pageAttribute, URL seedsite)
sets the given seedsite into the given PageAttribute.

public double compare(PageAttribute pageAttribute1, PageAttribute pageAttribute2)
compares the seedSite of pageAttribute1 to that of pageAttribute2.

Class AddNumImages

* Responsibilities:
- Compute the number of Images in one specific webpage.
- Compare the NumImages of two PageAttribute objects.
* Variables and Methods:

public PageAttribute addNumImages(PageAttribute pageAttribute)
counts the number of images in the webpage of the given PageAttribute

public double compare(PageAttribute pageAttribute1, PageAttribute pageAttribute2)
compares the number of images of the two PageAttributes.
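
Counting images, applets, and frames all come down to counting opening tags of one kind. The sketch below shows one possible approach with a regular expression. The class TagCounterSketch and its methods are hypothetical, the page is assumed to be available as a String, and the ratio-based compare is an assumed reading of the double return value. A regular expression is not a full HTML parser, so tags inside comments or scripts would be counted too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagCounterSketch {
    /** Counts opening tags with the given name, e.g. "img", "applet", or "frame". */
    static int countTag(String html, String tagName) {
        // "<name" followed by a word boundary, so "<img" does not match "<imgx".
        Pattern opening = Pattern.compile("<" + Pattern.quote(tagName) + "\\b",
                                          Pattern.CASE_INSENSITIVE);
        Matcher matcher = opening.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    /** Assumed semantics: 1.0 for equal counts, smaller values as the counts diverge. */
    static double compare(int count1, int count2) {
        if (count1 == 0 && count2 == 0) {
            return 1.0;
        }
        return (double) Math.min(count1, count2) / Math.max(count1, count2);
    }
}
```

Closing tags such as `</applet>` are not counted because the pattern requires the tag name to follow the `<` directly.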

Class AddNumApplets

* Responsibilities:
- Compute the number of Applets in one specific webpage.
- Compare the NumApplets of two PageAttribute objects.
* Variables and Methods:

public addNumApplets(PageAttribute pageAttribute)
counts the number of applets in the webpage of the given PageAttribute.

public double compare(PageAttribute pageAttribute1, PageAttribute pageAttribute2)
compares the number of applets of the two PageAttributes.

Class AddNumFrames

* Responsibilities:
- Compute the number of Frames in one specific webpage.
- Compare the NumFrames of two PageAttribute objects.
* Variables and Methods:

public addNumFrames(PageAttribute pageAttribute)
counts the number of frames in the webpage of the given PageAttribute.

public double compare(PageAttribute pageAttribute1, PageAttribute pageAttribute2)
compares the number of frames of the two PageAttributes.

Class AddURLsOnPage

* Responsibilities:
- Insert URLs of one specific webpage into one specific PageAttribute object.
- Compare URLs in two PageAttributes.
* Variables and Methods:

public addURLsOnPage(PageAttribute pageAttribute, URL url)
inserts the URLs of the given URL into the given PageAttribute.

public double compare(PageAttribute pageAttribute1, PageAttribute pageAttribute2)
compares the URLs in the two PageAttributes.
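
One way to collect the links of a page and to compare two link lists is sketched below. Everything here is an assumption for illustration: the class AddURLsOnPageSketch, the method extractLinks, the use of java.net.URI instead of URL, the regular expression for quoted href attributes, and the Jaccard overlap (shared links divided by all distinct links) as the meaning of the double that compare returns.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddURLsOnPageSketch {
    // Matches href="..." or href='...'; unquoted attribute values are not handled.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Extracts all quoted href targets and resolves them against the page address. */
    static Vector<URI> extractLinks(String html, URI base) {
        Vector<URI> links = new Vector<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            try {
                links.add(base.resolve(matcher.group(1).trim()));
            } catch (IllegalArgumentException e) {
                // Skip targets that are not syntactically valid URIs.
            }
        }
        return links;
    }

    /** Assumed semantics: Jaccard overlap of the two link sets, between 0.0 and 1.0. */
    static double compare(Vector<URI> urls1, Vector<URI> urls2) {
        Set<URI> union = new HashSet<>(urls1);
        union.addAll(urls2);
        if (union.isEmpty()) {
            return 1.0;
        }
        Set<URI> common = new HashSet<>(urls1);
        common.retainAll(urls2);
        return (double) common.size() / union.size();
    }
}
```

Resolving against the base address turns relative links such as `a.html` into absolute ones, which makes link lists from different pages comparable.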

Class AddLanguageWritten

* Responsibilities:
- Figure out which language one specific page is written in.
- Compare the language of two PageAttribute objects.
* NOTE: This class has not been implemented yet. We need to come up with a method to create the class.
* Variables and Methods:
public addLanguageWritten(PageAttribute pageAttribute)
public boolean compare(PageAttribute pageAttribute1, PageAttribute pageAttribute2)

Class AddPageContent

* Responsibilities:
- Set the content of one specific HTML page (type BufferedInputStream).
- Compare the contents of two PageAttribute objects.
* Variables and Methods:
public addPageContent(PageAttribute pageAttribute)
public boolean compare(PageAttribute pageAttribute1, PageAttribute pageAttribute2)

Class AddFilterAnnotations

* Responsibilities:
- Set the vector of Annotations to one specific PageAttribute.
- Compare the Annotations of two PageAttribute objects.
* Variables and Methods:
public addFilterAnnotations(PageAttribute pageAttribute, Vector filterAnnotations)
public double compare(PageAttribute pageAttribute1, PageAttribute pageAttribute2)

Class AddWordsOnTitle

* Responsibilities:
- Set the vector of words in the title of a specific HTML page.
- Compare the wordsInTitle of two PageAttribute objects.
* Variables and Methods:
public addWordsOnTitle(PageAttribute pageAttribute, Vector wordList)
public double compare(PageAttribute pageAttribute1, PageAttribute pageAttribute2)
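
The design passes a ready-made word list to addWordsOnTitle and does not say how that list is produced. As a hedged illustration only, the hypothetical helper below extracts the text of the title element from an HTML string and splits it into lower-case words; the class name, the method name, and the tokenization rule are all assumptions.

```java
import java.util.Locale;
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleWordsSketch {
    // Captures the text between <title> and </title>, across line breaks if needed.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>",
                            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the lower-cased words of the page title, or an empty vector if there is none. */
    static Vector<String> wordsInTitle(String html) {
        Vector<String> words = new Vector<>();
        Matcher matcher = TITLE.matcher(html);
        if (!matcher.find()) {
            return words;
        }
        // Split on anything that is neither a letter nor a digit.
        for (String token : matcher.group(1).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }
}
```

The resulting vector could then be handed to addWordsOnTitle as its wordList argument.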

Class AddWordVector

* Responsibilities:
- Set the vector of salient words uniquely representing one specific PageAttribute.
- Compare the WordVectors of two PageAttribute objects.
* Variables and Methods:
public PageAttribute addWordVector(PageAttribute pageAttribute, Vector wordVector)

public double compare(PageAttribute pageAttribute1, PageAttribute pageAttribute2)
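
The original does not specify how two word vectors are compared. One common choice, shown here purely as an assumption, is cosine similarity over word counts: 1.0 when both pages use the same words in the same proportions, 0.0 when they share no words. The class AddWordVectorSketch and its helper methods are hypothetical, and the sketch works directly on the word vectors rather than on PageAttribute objects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

class AddWordVectorSketch {
    /** Counts how often each word occurs in the vector. */
    static Map<String, Integer> counts(Vector<String> words) {
        Map<String, Integer> result = new HashMap<>();
        for (String word : words) {
            result.merge(word, 1, Integer::sum);
        }
        return result;
    }

    /** Euclidean length of a word-count vector. */
    static double norm(Map<String, Integer> counts) {
        double sumOfSquares = 0.0;
        for (int value : counts.values()) {
            sumOfSquares += (double) value * value;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Assumed semantics: cosine similarity of the two word-count vectors. */
    static double compare(Vector<String> wordVector1, Vector<String> wordVector2) {
        Map<String, Integer> counts1 = counts(wordVector1);
        Map<String, Integer> counts2 = counts(wordVector2);
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : counts1.entrySet()) {
            Integer other = counts2.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double norms = norm(counts1) * norm(counts2);
        return norms == 0.0 ? 0.0 : dot / norms;
    }
}
```

For example, the vectors [java, parser, web] and [java, web, crawler] share two of three words each, which gives a similarity of 2/3 under this measure.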