java.lang.Object
org.apache.lucene.analysis.lv.LatvianStemmer

public class LatvianStemmer extends Object
Light stemmer for Latvian.

This is a light version of the algorithm in Karlis Kreslin's PhD thesis "A stemming algorithm for Latvian", with the following modifications:

  • Only noun and adjective morphology is stemmed explicitly.
  • Stricter length and vowel checks are applied to the resulting stems (suffix stripping for verbs and other parts of speech is removed).
  • Only the primary inflectional suffixes are removed: case and number for nouns; case, number, gender, and definiteness for adjectives.
  • Palatalization is only handled when a declension II, V, or VI noun suffix is removed.
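
The modifications above can be illustrated with a minimal sketch of light suffix stripping. The suffix list, the minimum stem length of 3, and the class and method names below are assumptions chosen for illustration; they are not taken from this class and do not reproduce its affix table. The sketch only shows the general idea: a suffix is removed only if the remaining stem is long enough and still contains at least one vowel.

```java
public class LightSuffixSketch {
    // Hypothetical suffix list, longest first. Not the library's actual table.
    static final String[] SUFFIXES = {"iem", "am", "as", "s", "a"};
    // Assumed minimum length of the stem left after stripping.
    static final int MIN_STEM_LEN = 3;

    // Assumed vowel set: a, e, i, o, u plus the long vowels
    // U+0101, U+0113, U+012B, U+016B (a, e, i, u with macron).
    static boolean isVowel(char c) {
        return "aeiou\u0101\u0113\u012B\u016B".indexOf(c) >= 0;
    }

    // Counts vowels in the first len characters of s.
    static int countVowels(char[] s, int len) {
        int n = 0;
        for (int i = 0; i < len; i++) {
            if (isVowel(s[i])) {
                n++;
            }
        }
        return n;
    }

    // True if the first len characters of s end with the given suffix.
    static boolean endsWith(char[] s, int len, String suffix) {
        int start = len - suffix.length();
        if (start < 0) {
            return false;
        }
        for (int i = 0; i < suffix.length(); i++) {
            if (s[start + i] != suffix.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Returns the new length after removing at most one suffix.
    // The buffer itself is not modified; callers read only s[0..newLen).
    static int stripSuffix(char[] s, int len) {
        for (String suffix : SUFFIXES) {
            int stemLen = len - suffix.length();
            if (stemLen >= MIN_STEM_LEN
                    && endsWith(s, len, suffix)
                    && countVowels(s, stemLen) >= 1) {
                return stemLen;
            }
        }
        return len; // no suffix accepted, word is left unchanged
    }

    public static void main(String[] args) {
        char[] buf = "galdiem".toCharArray();
        int newLen = stripSuffix(buf, buf.length);
        System.out.println(new String(buf, 0, newLen)); // prints "gald"
    }
}
```

Trying longer suffixes first keeps "iem" from being shadowed by a shorter candidate; the real affix table may order and constrain its entries differently.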
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    (package private) static class
    LatvianStemmer.Affix
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    (package private) static final LatvianStemmer.Affix[]
     
  • Constructor Summary

    Constructors
    Constructor
    Description
    LatvianStemmer()
  • Method Summary

    Modifier and Type
    Method
    Description
    private int
    numVowels(char[] s, int len)
    Count the vowels in the string. At least one vowel must remain in the stem for it to be accepted.
    int
    stem(char[] s, int len)
    Stem a Latvian word.
    private int
    unpalatalize(char[] s, int len)
    Most cases are handled except for the ambiguous ones: s -> š, t -> š, d -> ž, z -> ž.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

    • LatvianStemmer

      public LatvianStemmer()
  • Method Details

    • stem

      public int stem(char[] s, int len)
      Stem a Latvian word. Returns the new adjusted length.
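
      A minimal sketch of how a caller might consume the returned length. The InPlaceStemmer interface, the applyStemmer helper, and the TOY stemmer are hypothetical names introduced here so that the example runs without Lucene on the classpath; only the (char[] s, int len) -> int shape comes from the signature above.

```java
public class StemCallSketch {
    /** Hypothetical stand-in for the stem(char[] s, int len) contract. */
    interface InPlaceStemmer {
        int stem(char[] s, int len);
    }

    /** Copies the word into a buffer, stems it, and keeps only the first newLen characters. */
    static String applyStemmer(InPlaceStemmer stemmer, String word) {
        char[] buf = word.toCharArray();
        int newLen = stemmer.stem(buf, buf.length);
        // Characters at index newLen and beyond are no longer part of the word.
        return new String(buf, 0, newLen);
    }

    /** Toy stemmer for demonstration only: drops one trailing 's' from words longer than 3 characters. */
    static final InPlaceStemmer TOY =
            (s, len) -> (len > 3 && s[len - 1] == 's') ? len - 1 : len;

    public static void main(String[] args) {
        System.out.println(applyStemmer(TOY, "vilks")); // prints "vilk"
    }
}
```

      With Lucene available, the same helper should accept the real method as a method reference, for example applyStemmer(new LatvianStemmer()::stem, word).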
    • unpalatalize

      private int unpalatalize(char[] s, int len)
      Most cases are handled except for the ambiguous ones:
      • s -> š
      • t -> š
      • d -> ž
      • z -> ž
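
      The four pairs above are ambiguous because š can derive from either s or t, and ž from either d or z, so the original consonant cannot be recovered from the palatalized form alone. A minimal sketch of the distinction, assuming three unambiguous single-character reversals (č to c, ļ to l, ņ to n) chosen for illustration; the class and method names are hypothetical and the rule set of this class may differ:

```java
public class UnpalatalizeSketch {
    /**
     * Reverses palatalization of the last character when the mapping is
     * unambiguous. Returns the (unchanged) length, mirroring the
     * (char[] s, int len) -> int shape used by this class.
     */
    static int unpalatalizeLast(char[] s, int len) {
        if (len == 0) {
            return len;
        }
        switch (s[len - 1]) {
            case '\u010D': // c with caron -> c
                s[len - 1] = 'c';
                break;
            case '\u013C': // l with cedilla -> l
                s[len - 1] = 'l';
                break;
            case '\u0146': // n with cedilla -> n
                s[len - 1] = 'n';
                break;
            default:
                // Includes s with caron (U+0161) and z with caron (U+017E):
                // each has two possible sources, so they are left as they are.
                break;
        }
        return len;
    }

    public static void main(String[] args) {
        char[] buf = {'l', '\u0101', '\u010D'}; // stem ending in c with caron
        int len = unpalatalizeLast(buf, buf.length);
        System.out.println(buf[len - 1]); // prints "c"
    }
}
```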
    • numVowels

      private int numVowels(char[] s, int len)
      Count the vowels in the string. At least one vowel must remain in the stem for it to be accepted.