Interface | Description |
---|---|
ClassifiableContentIF | INTERNAL: Interface that holds the identifier and the actual content of a classifiable resource. |
ClassifyPluginIF | INTERNAL: Interface implemented by code that is able to locate classifiable content for topics. |
DelimiterTrimmerIF | INTERNAL: |
DocumentAnalyzerIF | INTERNAL: |
FormatModuleIF | INTERNAL: Interface that encapsulates the support for a given document format. |
HttpServletRequestAwareIF | INTERNAL: Interface implemented by ClassifyPluginIFs that want access to the current HTTP request in a servlet environment. |
TermAnalyzerIF | INTERNAL: |
TermNormalizerIF | INTERNAL: |
TermStemmerIF | INTERNAL: A stemmer produces the stem of a word from a form of the word. |
TextHandlerIF | INTERNAL: Callback interface used by format modules to tell the classification framework about the structure of classifiable content. |
TokenizerIF | INTERNAL: |

Class | Description |
---|---|
AbstractDocumentAnalyzer | INTERNAL: |
BlackList | INTERNAL: |
CharacterAnalyzer | INTERNAL: |
Chew | PUBLIC: Command-line tool for extracting keywords from a document. |
ClassifiableContent | INTERNAL: |
ClassifyUtils | INTERNAL: |
CompoundAnalyzer | INTERNAL: |
ConferencePlugin | INTERNAL: |
DefaultPlugin | INTERNAL: |
DefaultTokenizer | INTERNAL: |
DistanceAnalyzer | INTERNAL: |
Document | INTERNAL: |
DocumentClassifier | INTERNAL: |
DocumentTokenizer | INTERNAL: |
DowncaseNormalizer | INTERNAL: |
FormatModule | INTERNAL: |
FrequencyAnalyzer | INTERNAL: A frequency table giving the frequency with which a particular word is used in a particular language. |
HTMLFormatModule | INTERNAL: |
JunkNormalizer | INTERNAL: |
Language | INTERNAL: Object representing a particular language. |
OOXMLPowerpointFormatModule | INTERNAL: A format module for the OOXML PresentationML format. |
OOXMLWordFormatModule | INTERNAL: A format module for the OOXML WordProcessingML format. |
PDFFormatModule | INTERNAL: |
PlainTextFormatModule | INTERNAL: |
PowerPointFormatModule | INTERNAL: |
RegexpTermAnalyzer | INTERNAL: A term analyzer which recognizes certain kinds of terms using regexps and adjusts their scores accordingly. |
Region | INTERNAL: |
RegionBooster | INTERNAL: |
RelativeScore | INTERNAL: |
SimpleClassifier | PUBLIC: A simple top-level API for classifying content. |
SnowballStemmer | INTERNAL: |
SpecialCharNormalizer | INTERNAL: |
StopList | INTERNAL: A set of words considered "stop words" in a particular language. |
Term | PUBLIC: Represents a concept which occurs in the classified content. |
TermDatabase | PUBLIC: A collection of terms representing the result of classifying a piece of content. |
TextBlock | INTERNAL: |
Token | INTERNAL: |
TokenVisitor | INTERNAL: |
TologRulePlugin | INTERNAL: |
TopicContentPlugin | INTERNAL: Classifier plugin which produces content from the topic itself. |
TopicContentPlugin.TopicAsContent | |
TopicMapAnalyzer | INTERNAL: |
TopicMapAnalyzer.AssociationType | |
TopicMapClassification | INTERNAL: |
Variant | PUBLIC: Represents a form of a term as it occurred in classified content. |
WebChew | INTERNAL: |
WordFormatModule | INTERNAL: A format module for the old binary Word format. |
XMLFormatModule | INTERNAL: |

To classify content, use the SimpleClassifier class. Note that most of the APIs in this package are marked INTERNAL and may therefore change at any time.

If you need more flexibility, you can use the INTERNAL APIs directly. The example code below shows how to output a ranked list of the terms found in a particular document.

    // load the topic map
    TopicMapIF topicmap = ImportExportUtils.getReader(args[0]).read();

    // create classifier
    TopicMapClassification tcl = new TopicMapClassification(topicmap);

    // read document
    ClassifiableContentIF cc = ClassifyUtils.getClassifiableContent(args[1]);

    // classify document
    tcl.classify(cc);

    // dump the ranked terms
    TermDatabase tdb = tcl.getTermDatabase();
    tdb.dump(50);
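To make the idea of a "ranked list of terms" concrete, here is a minimal, self-contained sketch in plain Java. It does not use or reproduce the classifier in this package: the class `TermRankingSketch`, its `rankTerms` method, and the small stop-word set are hypothetical names chosen for illustration, and raw term frequency is assumed as the score only because it is the simplest possible ranking. It loosely mirrors the roles of the tokenizer, downcase normalizer, and stop list named in the tables above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not part of the classify package. */
public class TermRankingSketch {

  /** Returns the distinct terms of the text with their counts, most frequent first. */
  public static List<Map.Entry<String, Integer>> rankTerms(String text, Set<String> stopWords) {
    // LinkedHashMap remembers the order in which terms were first seen.
    Map<String, Integer> counts = new LinkedHashMap<>();

    // Tokenize: split on any run of characters that are neither letters nor digits.
    for (String token : text.split("[^\\p{L}\\p{Nd}]+")) {
      // Normalize: lower-case the token so "Map" and "map" count as one term.
      String term = token.toLowerCase(Locale.ROOT);
      // Skip the empty leading token (if any) and all stop words.
      if (term.isEmpty() || stopWords.contains(term)) {
        continue;
      }
      counts.merge(term, 1, Integer::sum);
    }

    // Rank: sort by descending count. List.sort is stable, so terms with
    // equal counts stay in first-seen order.
    List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
    ranked.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
    return ranked;
  }

  public static void main(String[] args) {
    // Assumed stop-word set, for the example only.
    Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of", "is", "and"));
    String text = "The topic map is a map of topics, and the map links topics.";
    for (Map.Entry<String, Integer> entry : rankTerms(text, stopWords)) {
      System.out.println(entry.getKey() + "\t" + entry.getValue());
    }
  }
}
```

A real classifier would typically go further than this sketch, for example by stemming terms or weighting them against a language-wide frequency table, which is what interface and class names such as TermStemmerIF and FrequencyAnalyzer suggest.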