net.ontopia.topicmaps.classify (Ontopia)

Interface Summary
Interface	Description
ClassifiableContentIF	INTERNAL: Interface that holds the identifier and the actual content of a classifiable resource.
ClassifyPluginIF	INTERNAL: Interface implemented by code that is able to locate classifiable content for topics.
DelimiterTrimmerIF	INTERNAL:
DocumentAnalyzerIF	INTERNAL:
FormatModuleIF	INTERNAL: Interface that encapsulates the support for a given document format.
HttpServletRequestAwareIF	INTERNAL: Interface implemented by ClassifyPluginIFs that want access to the current HTTP request in a servlet environment.
TermAnalyzerIF	INTERNAL:
TermNormalizerIF	INTERNAL:
TermStemmerIF	INTERNAL: A stemmer produces the stem of a word from a form of the word.
TextHandlerIF	INTERNAL: Callback interface used by format modules to tell the classification framework about the structure of classifiable content.
TokenizerIF	INTERNAL:
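The interface table above describes TermStemmerIF only as producing the stem of a word from a form of that word. To make the role concrete, here is a minimal, self-contained sketch of a naive suffix-stripping stemmer. It is an illustration under stated assumptions: the class NaiveStemmer, its suffix list, and the minimum-stem-length rule are hypothetical. It does not use TermStemmerIF's actual method signature, and it is far cruder than a real algorithm such as Snowball (the package lists a SnowballStemmer class).

```java
import java.util.List;

/**
 * Hypothetical illustration of what a stemmer does: map a word form to a stem.
 * This is NOT Ontopia's TermStemmerIF or SnowballStemmer; it only strips a few
 * English suffixes.
 */
public class NaiveStemmer {

  // Illustrative suffixes, tried in this order; the first one that applies is stripped.
  private static final List<String> SUFFIXES = List.of("ing", "ed", "es", "s");

  // Hypothetical rule: never shorten a word below this many characters.
  private static final int MIN_STEM_LENGTH = 3;

  /** Returns the word with the first applicable suffix removed, or the word unchanged. */
  public static String stem(String word) {
    for (String suffix : SUFFIXES) {
      if (word.endsWith(suffix) && word.length() - suffix.length() >= MIN_STEM_LENGTH) {
        return word.substring(0, word.length() - suffix.length());
      }
    }
    return word;
  }

  public static void main(String[] args) {
    for (String w : List.of("indexing", "indexed", "indexes", "topics", "is")) {
      System.out.println(w + " -> " + stem(w));
    }
  }
}
```

With these rules "indexing", "indexed" and "indexes" all reduce to "index", which is the point of stemming: different forms of a word are counted as one term. The crudeness also shows why a proper stemmer is preferable; this one would turn "news" into "new".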

Class Summary
Class	Description
AbstractDocumentAnalyzer	INTERNAL:
BlackList	INTERNAL:
CharacterAnalyzer	INTERNAL:
Chew	PUBLIC: Command-line tool for extracting keywords from a document.
ClassifiableContent	INTERNAL:
ClassifyUtils	INTERNAL:
CompoundAnalyzer	INTERNAL:
ConferencePlugin	INTERNAL:
DefaultPlugin	INTERNAL:
DefaultTokenizer	INTERNAL:
DistanceAnalyzer	INTERNAL:
Document	INTERNAL:
DocumentClassifier	INTERNAL:
DocumentTokenizer	INTERNAL:
DowncaseNormalizer	INTERNAL:
FormatModule	INTERNAL:
FrequencyAnalyzer	INTERNAL: A frequency table giving the frequency with which a particular word is used in a particular language.
HTMLFormatModule	INTERNAL:
JunkNormalizer	INTERNAL:
Language	INTERNAL: Object representing a particular language.
OOXMLPowerpointFormatModule	INTERNAL: A format module for the OOXML PresentationML format.
OOXMLWordFormatModule	INTERNAL: A format module for the OOXML WordProcessingML format.
PDFFormatModule	INTERNAL:
PlainTextFormatModule	INTERNAL:
PowerPointFormatModule	INTERNAL:
RegexpTermAnalyzer	INTERNAL: A term analyzer which recognizes certain kinds of terms using regexps and adjusts their scores accordingly.
Region	INTERNAL:
RegionBooster	INTERNAL:
RelativeScore	INTERNAL:
SimpleClassifier	PUBLIC: A simple top-level API for classifying content.
SnowballStemmer	INTERNAL:
SpecialCharNormalizer	INTERNAL:
StopList	INTERNAL: A set of words considered "stop words" in a particular language.
Term	PUBLIC: Represents a concept which occurs in the classified content.
TermDatabase	PUBLIC: A collection of terms representing the result of classifying a piece of content.
TextBlock	INTERNAL:
Token	INTERNAL:
TokenVisitor	INTERNAL:
TologRulePlugin	INTERNAL:
TopicContentPlugin	INTERNAL: Classifier plugin which produces content from the topic itself.
TopicContentPlugin.TopicAsContent
TopicMapAnalyzer	INTERNAL:
TopicMapAnalyzer.AssociationType
TopicMapClassification	INTERNAL:
Variant	PUBLIC: Represents a form of a term as it occurred in classified content.
WebChew	INTERNAL:
WordFormatModule	INTERNAL: A format module for the old binary Word format.
XMLFormatModule	INTERNAL:
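RegexpTermAnalyzer is described as recognizing certain kinds of terms with regular expressions and adjusting their scores accordingly. The sketch below shows that general idea in isolation, assuming a simple multiplicative rule: a term that fully matches a pattern has its score multiplied by a factor. The class RegexpBoostSketch, the Rule record, both patterns, and both factors are hypothetical choices for illustration, not Ontopia's implementation or configuration. The record requires Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of regexp-based score adjustment; not Ontopia code.
 */
public class RegexpBoostSketch {

  /** A rule: if a term fully matches the pattern, multiply its score by the factor. */
  record Rule(Pattern pattern, double factor) {}

  // Illustrative rules only: boost all-caps acronyms, demote bare numbers.
  static final List<Rule> RULES = List.of(
      new Rule(Pattern.compile("[A-Z]{2,}"), 2.0),
      new Rule(Pattern.compile("\\d+"), 0.1));

  /** Returns a new map in which every matching rule has been applied to each term's score. */
  static Map<String, Double> adjust(Map<String, Double> scores) {
    Map<String, Double> result = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : scores.entrySet()) {
      double score = e.getValue();
      for (Rule rule : RULES) {
        // matches() requires the whole term to match, not just a substring
        if (rule.pattern().matcher(e.getKey()).matches()) {
          score *= rule.factor();
        }
      }
      result.put(e.getKey(), score);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Double> scores = new LinkedHashMap<>();
    scores.put("XML", 1.0);
    scores.put("2008", 1.0);
    scores.put("topic", 1.0);
    System.out.println(adjust(scores)); // prints {XML=2.0, 2008=0.1, topic=1.0}
  }
}
```

Applying rules multiplicatively lets several patterns combine on one term; an additive bonus would be an equally plausible design.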

Package net.ontopia.topicmaps.classify Description

To classify content, use the SimpleClassifier class. Note that most of the APIs in this package are INTERNAL, and so may change at any time.

If you need more flexibility, you can use the INTERNAL APIs directly. The example below loads a topic map, classifies a document in the context of that topic map, and outputs a ranked list of the terms found in the document.

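    import net.ontopia.topicmaps.classify.ClassifiableContentIF;
    import net.ontopia.topicmaps.classify.ClassifyUtils;
    import net.ontopia.topicmaps.classify.TermDatabase;
    import net.ontopia.topicmaps.classify.TopicMapClassification;
    import net.ontopia.topicmaps.core.TopicMapIF;
    import net.ontopia.topicmaps.utils.ImportExportUtils;

    public class ClassifyExample {
      // usage: ClassifyExample <topic map file> <document to classify>
      public static void main(String[] args) throws Exception {
        // load the topic map
        TopicMapIF topicmap = ImportExportUtils.getReader(args[0]).read();

        // create classifier
        TopicMapClassification tcl = new TopicMapClassification(topicmap);

        // read document
        ClassifiableContentIF cc = ClassifyUtils.getClassifiableContent(args[1]);

        // classify document
        tcl.classify(cc);

        // dump the ranked terms
        TermDatabase tdb = tcl.getTermDatabase();
        tdb.dump(50);
      }
    }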
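The example above delegates all of the work to TopicMapClassification. To give a feel for what "a ranked list of terms" means, the following toy ranker does the simplest possible version of the job in plain Java: split the text into tokens, downcase them, drop stop words, and rank the remaining terms by frequency. It is a hypothetical illustration only: the ToyTermRanker class and its tiny stop list are invented here, and Ontopia's actual scoring (which, judging by the class list, also involves format modules, stemming, region boosting and the topic map itself) is not reproduced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy keyword ranker; not Ontopia's algorithm. */
public class ToyTermRanker {

  // Hypothetical, deliberately tiny stop list.
  static final Set<String> STOP_WORDS = Set.of("the", "a", "of", "and", "is", "to", "in");

  /** Returns the n most frequent non-stop-word terms, most frequent first; ties sort alphabetically. */
  static List<String> topTerms(String text, int n) {
    Map<String, Integer> counts = new HashMap<>();
    // Tokenize on runs of characters that are neither letters nor digits, then downcase.
    for (String token : text.split("[^\\p{L}\\p{N}]+")) {
      String term = token.toLowerCase(Locale.ROOT);
      if (term.isEmpty() || STOP_WORDS.contains(term)) {
        continue;
      }
      counts.merge(term, 1, Integer::sum);
    }
    List<String> terms = new ArrayList<>(counts.keySet());
    terms.sort((x, y) -> {
      int byCount = Integer.compare(counts.get(y), counts.get(x));
      return byCount != 0 ? byCount : x.compareTo(y);
    });
    return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
  }

  public static void main(String[] args) {
    String text = "A topic map is a map of topics. The topic map links each topic to documents.";
    System.out.println(topTerms(text, 3)); // prints [map, topic, documents]
  }
}
```

Note that without stemming, "topic" and "topics" are counted as separate terms, which is one reason a real classifier chains a stemmer into this kind of pipeline.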