Feature engineering for a symbolic approach to text classification.

Sam Scott
Most text classification research to date has used the standard "bag of words" model of text representation, inherited from the word-based indexing techniques of information retrieval research. There have been a number of past attempts to find better representations, but very few have produced positive results. Most of this previous work, however, has concentrated on retrieval rather than classification tasks, and none has involved symbolic learning algorithms. This thesis investigates a number of...