java.lang.Object
org.apache.lucene.analysis.cn.smart.WordSegmenter

class WordSegmenter extends Object
Segment a sentence of Chinese text into words.
  • Constructor Details

    • WordSegmenter

      WordSegmenter()
  • Method Details

    • segmentSentence

      public List<SegToken> segmentSentence(String sentence, int startOffset)
      Segment a sentence into words with HHMMSegmenter.
      Parameters:
      sentence - input sentence
      startOffset - start offset of the sentence within the original text
      Returns:
      List of SegToken
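
The contract of segmentSentence (a sentence and its start offset in, a list of words with offsets out) can be illustrated with a minimal, self-contained sketch. This is not the HHMMSegmenter algorithm: it swaps in a greedy longest-match lookup against a tiny dictionary. The Word class, the dictionary and the maxWordLength parameter are all hypothetical names introduced for illustration and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ToySegmenterSketch {

    /** Hypothetical stand-in for SegToken: a word plus its [start, end) offsets. */
    static final class Word {
        final String text;
        final int startOffset; // inclusive, relative to the original text
        final int endOffset;   // exclusive, relative to the original text

        Word(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Greedy longest-match segmentation (an assumption for illustration,
     * NOT the hidden Markov model approach that HHMMSegmenter uses).
     */
    static List<Word> segmentSentence(String sentence, int startOffset,
                                      Set<String> dictionary, int maxWordLength) {
        List<Word> words = new ArrayList<>();
        int pos = 0;
        while (pos < sentence.length()) {
            int end = Math.min(sentence.length(), pos + maxWordLength);
            // Shrink the candidate until it is a dictionary word or a single character.
            while (end > pos + 1 && !dictionary.contains(sentence.substring(pos, end))) {
                end--;
            }
            // Offsets are shifted by startOffset so they refer to the original text.
            words.add(new Word(sentence.substring(pos, end), startOffset + pos, startOffset + end));
            pos = end;
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("我们", "喜欢", "中文");
        for (Word w : segmentSentence("我们喜欢中文", 10, dictionary, 2)) {
            System.out.println(w.text + " [" + w.startOffset + ", " + w.endOffset + ")");
        }
    }
}
```

With a start offset of 10, the three words in this toy run get the offsets [10, 12), [12, 14) and [14, 16), which is the kind of shift the startOffset parameter describes.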
    • convertSegToken

      public SegToken convertSegToken(SegToken st, String sentence, int sentenceStartOffset)
      Process a SegToken so that it is ready for indexing.

      This method calculates offsets and normalizes the token with SegTokenFilter.

      Parameters:
      st - input SegToken
      sentence - the sentence that st was segmented from
      sentenceStartOffset - start offset of the sentence within the original text
      Returns:
      the processed SegToken, ready for indexing
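
The two steps described for convertSegToken (normalizing the token and calculating its offsets) can be sketched as follows. This is a simplified illustration, not the Lucene implementation: the Token class is a hypothetical stand-in for SegToken, and normalization is assumed here to mean folding fullwidth Latin characters to halfwidth and lowercasing them. The actual behavior of SegTokenFilter may differ.

```java
public class OffsetConversionSketch {

    /** Hypothetical stand-in for SegToken, holding only what this sketch needs. */
    static final class Token {
        char[] charArray;
        int startOffset; // inclusive
        int endOffset;   // exclusive

        Token(String text, int startOffset, int endOffset) {
            this.charArray = text.toCharArray();
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Assumed normalization: map fullwidth ASCII variants (U+FF01..U+FF5E)
     * to their halfwidth counterparts, then lowercase.
     */
    static char normalize(char c) {
        if (c >= 0xFF01 && c <= 0xFF5E) {
            c = (char) (c - 0xFEE0);
        }
        return Character.toLowerCase(c);
    }

    /** Normalizes the token text, then shifts sentence-relative offsets to text-relative ones. */
    static Token convert(Token token, int sentenceStartOffset) {
        for (int i = 0; i < token.charArray.length; i++) {
            token.charArray[i] = normalize(token.charArray[i]);
        }
        token.startOffset += sentenceStartOffset;
        token.endOffset += sentenceStartOffset;
        return token;
    }

    public static void main(String[] args) {
        // Fullwidth "ABC" at positions 4..7 of a sentence that starts at offset 100.
        Token token = convert(new Token("\uFF21\uFF22\uFF23", 4, 7), 100);
        System.out.println(new String(token.charArray)
            + " [" + token.startOffset + ", " + token.endOffset + ")");
    }
}
```

The token is mutated in place and returned, so offsets that were relative to the sentence (4 and 7) become offsets into the whole text (104 and 107).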