Class SpellChecker

java.lang.Object
org.apache.lucene.search.spell.SpellChecker
All Implemented Interfaces:
Closeable, AutoCloseable

public class SpellChecker extends Object implements Closeable
Main spell checker class, initially inspired by David Spencer's code.

Example Usage:

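  SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
  // To index a field of a user index (analyzer is assumed to be an Analyzer you supply;
  // each call gets its own IndexWriterConfig):
  spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field), new IndexWriterConfig(analyzer), true);
  // To index a file containing words, one per line:
  spellchecker.indexDictionary(new PlainTextDictionary(Paths.get("myfile.txt")), new IndexWriterConfig(analyzer), true);
  String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);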
 
  • Field Details

    • DEFAULT_ACCURACY

      public static final float DEFAULT_ACCURACY
      The default minimum score to use, if not specified by calling setAccuracy(float) .
    • F_WORD

      public static final String F_WORD
      Field name for each word in the ngram index.
    • spellIndex

      Directory spellIndex
      The spell index directory.
    • bStart

      private float bStart
      Boost value for start and end grams
    • bEnd

      private float bEnd
    • searcher

      private IndexSearcher searcher
    • searcherLock

      private final Object searcherLock
    • modifyCurrentIndexLock

      private final Object modifyCurrentIndexLock
    • closed

      private volatile boolean closed
    • accuracy

      private float accuracy
    • sd

      private StringDistance sd
    • comparator

      private Comparator<SuggestWord> comparator
  • Constructor Details

    • SpellChecker

      public SpellChecker(Directory spellIndex, StringDistance sd) throws IOException
      Use the given directory as a spell checker index. The directory is created if it doesn't exist yet.
      Parameters:
      spellIndex - the spell index directory
      sd - the StringDistance measurement to use
      Throws:
      IOException - if the SpellChecker cannot open the directory
    • SpellChecker

      public SpellChecker(Directory spellIndex) throws IOException
      Use the given directory as a spell checker index with a LevenshteinDistance as the default StringDistance. The directory is created if it doesn't exist yet.
      Parameters:
      spellIndex - the spell index directory
      Throws:
      IOException - if the SpellChecker cannot open the directory
    • SpellChecker

      public SpellChecker(Directory spellIndex, StringDistance sd, Comparator<SuggestWord> comparator) throws IOException
      Use the given directory as a spell checker index with the given StringDistance measure and the given Comparator for sorting the results.
      Parameters:
      spellIndex - The spelling index
      sd - The distance
      comparator - The comparator
      Throws:
      IOException - if there is a problem opening the index
  • Method Details

    • setSpellIndex

      public void setSpellIndex(Directory spellIndexDir) throws IOException
      Use a different index as the spell checker index, or reopen the existing index if spellIndexDir is the same directory that was passed to the constructor.
      Parameters:
      spellIndexDir - the spell directory to use
      Throws:
      AlreadyClosedException - if the SpellChecker is already closed
      IOException - if the SpellChecker cannot open the directory
    • setComparator

      public void setComparator(Comparator<SuggestWord> comparator)
      Sets the Comparator for the SuggestWordQueue.
      Parameters:
      comparator - the comparator
    • getComparator

      public Comparator<SuggestWord> getComparator()
      Gets the comparator in use for ranking suggestions.
    • setStringDistance

      public void setStringDistance(StringDistance sd)
      Sets the StringDistance implementation for this SpellChecker instance.
      Parameters:
      sd - the StringDistance implementation for this SpellChecker instance
    • getStringDistance

      public StringDistance getStringDistance()
      Returns the StringDistance instance used by this SpellChecker instance.
      Returns:
      the StringDistance instance used by this SpellChecker instance.
    • setAccuracy

      public void setAccuracy(float acc)
      Sets the accuracy (the minimum score a suggestion must reach), a value strictly between 0 and 1. The default is DEFAULT_ACCURACY.
      Parameters:
      acc - The new accuracy
    • getAccuracy

      public float getAccuracy()
      Returns the accuracy (minimum score) used to decide whether a suggestion is included, unless it is overridden in suggestSimilar(String, int, IndexReader, String, SuggestMode, float).
      Returns:
      The current accuracy setting
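      To make the role of the accuracy threshold concrete, here is a minimal, JDK-only sketch. It assumes a similarity score of 1 - editDistance / max(length), which is one common way to normalize Levenshtein distance into the range 0 to 1. The class AccuracyFilterSketch and its methods are hypothetical and not part of Lucene, and the score produced by the configured StringDistance may be computed differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AccuracyFilterSketch {

    // Classic Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed scoring: 1.0 for identical strings, approaching 0.0 as they diverge.
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0f;
        }
        return 1.0f - (float) editDistance(a, b) / maxLen;
    }

    // Keeps only the candidates whose score reaches the accuracy threshold.
    static List<String> filter(String word, List<String> candidates, float accuracy) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (similarity(word, candidate) >= accuracy) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("misspell", "mist", "dispel");
        // Scores against "misspelt": misspell 0.875, mist 0.5, dispel 0.625.
        System.out.println(filter("misspelt", candidates, 0.7f)); // [misspell]
    }
}
```

      Under this assumed scoring, raising the accuracy toward 1 keeps only near-identical words, while lowering it admits more distant candidates.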
    • suggestSimilar

      public String[] suggestSimilar(String word, int numSug) throws IOException
      Suggest similar words.

      The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching word from the hits that Lucene found. As a result, you usually have to request several suggestions (numSug) to get the true best match.

      That is, if numSug == 1, do not count on that suggestion being the best one. Set numSug to at least 5 for a good suggestion.

      Parameters:
      word - the word you want a spell check done on
      numSug - the number of suggested words
      Returns:
      String[] - the suggested words
      Throws:
      IOException - if the underlying index throws an IOException
      AlreadyClosedException - if the SpellChecker is already closed
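      The mismatch between n-gram retrieval and edit-distance ranking described for this method can be illustrated with a toy, JDK-only model. This is a sketch under stated assumptions, not the class's actual algorithm: the first stage here ranks dictionary words by the number of shared character bigrams, the second stage picks the closest word by Levenshtein distance, and the candidate pool is exactly the size requested. TwoStageSuggestSketch, its methods, and the sample words are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TwoStageSuggestSketch {

    // Distinct character bigrams of a string.
    static Set<String> bigrams(String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            out.add(s.substring(i, i + 2));
        }
        return out;
    }

    // Stage 1 score: number of distinct bigrams the two strings share.
    static int sharedBigrams(String a, String b) {
        Set<String> grams = bigrams(a);
        grams.retainAll(bigrams(b));
        return grams.size();
    }

    // Stage 2 score: Levenshtein edit distance (lower is better).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Keeps the `pool` best words by bigram overlap, then returns the one
    // closest to `word` by edit distance (null if the dictionary is empty).
    static String bestMatch(String word, List<String> dictionary, int pool) {
        List<String> byOverlap = new ArrayList<>(dictionary);
        byOverlap.sort(Comparator.comparingInt((String c) -> sharedBigrams(word, c)).reversed());
        List<String> candidates = byOverlap.subList(0, Math.min(pool, byOverlap.size()));
        return candidates.stream()
                .min(Comparator.comparingInt((String c) -> editDistance(word, c)))
                .orElse(null);
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("abcdxyz", "abed");
        // "abcdxyz" shares 3 bigrams with "abcd" but is 3 edits away;
        // "abed" shares only 1 bigram but is 1 edit away.
        System.out.println(bestMatch("abcd", dictionary, 1)); // abcdxyz
        System.out.println(bestMatch("abcd", dictionary, 5)); // abed
    }
}
```

      With a pool of one, the word with the most n-gram overlap wins even though a closer word exists. A larger pool lets the edit-distance step find it.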
    • suggestSimilar

      public String[] suggestSimilar(String word, int numSug, float accuracy) throws IOException
      Suggest similar words.

      The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching word from the hits that Lucene found. As a result, you usually have to request several suggestions (numSug) to get the true best match.

      That is, if numSug == 1, do not count on that suggestion being the best one. Set numSug to at least 5 for a good suggestion.

      Parameters:
      word - the word you want a spell check done on
      numSug - the number of suggested words
      accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
      Returns:
      String[] - the suggested words
      Throws:
      IOException - if the underlying index throws an IOException
      AlreadyClosedException - if the SpellChecker is already closed
    • suggestSimilar

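      public String[] suggestSimilar(String word, int numSug, IndexReader ir, String field, SuggestMode suggestMode) throws IOException
      Suggest similar words (optionally restricted to a field of an index), using the current accuracy setting (see getAccuracy()) as the minimum score.
      Throws:
      IOException - if the underlying index throws an IOException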
    • suggestSimilar

      public String[] suggestSimilar(String word, int numSug, IndexReader ir, String field, SuggestMode suggestMode, float accuracy) throws IOException
      Suggest similar words (optionally restricted to a field of an index).

      The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching word from the hits that Lucene found. As a result, you usually have to request several suggestions (numSug) to get the true best match.

      That is, if numSug == 1, do not count on that suggestion being the best one. Set numSug to at least 5 for a good suggestion.

      Parameters:
      word - the word you want a spell check done on
      numSug - the number of suggested words
      ir - the IndexReader of the user index (can be null; see the field parameter)
      field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field.
      suggestMode - the suggest mode to use (NOTE: if ir == null and/or field == null, this is overridden with SuggestMode.SUGGEST_ALWAYS)
      accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
      Returns:
      String[] - the suggested words, sorted by two criteria: first, the edit distance; second (only in restricted mode), the popularity of the suggested word in the field of the user index
      Throws:
      IOException - if the underlying index throws an IOException
      AlreadyClosedException - if the SpellChecker is already closed
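      The two sort criteria named in the return description can be sketched as a plain comparator. This is only an illustration under assumptions: the Suggestion holder, its fields, and the sample scores and frequencies are hypothetical, a higher score is taken to mean a smaller edit distance, and frequency stands in for popularity in the user index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SuggestionRankingSketch {

    // Hypothetical value holder: a candidate word, its similarity score
    // (higher means closer to the query) and its frequency in the user index.
    static final class Suggestion {
        final String word;
        final float score;
        final int freq;

        Suggestion(String word, float score, int freq) {
            this.word = word;
            this.score = score;
            this.freq = freq;
        }
    }

    // First criterion: higher score first. Second criterion: higher frequency first.
    static final Comparator<Suggestion> BEST_FIRST = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        if (byScore != 0) {
            return byScore;
        }
        return Integer.compare(b.freq, a.freq);
    };

    // Returns the candidate words ordered from best to worst.
    static List<String> rank(List<Suggestion> candidates) {
        List<Suggestion> sorted = new ArrayList<>(candidates);
        sorted.sort(BEST_FIRST);
        List<String> words = new ArrayList<>();
        for (Suggestion s : sorted) {
            words.add(s.word);
        }
        return words;
    }

    public static void main(String[] args) {
        List<Suggestion> candidates = Arrays.asList(
                new Suggestion("house", 0.8f, 10),
                new Suggestion("horse", 0.8f, 50),
                new Suggestion("hose", 0.9f, 1));
        System.out.println(rank(candidates)); // [hose, horse, house]
    }
}
```

      Frequency only breaks ties here: "hose" ranks first despite being rare because its score is higher, while "horse" precedes "house" because the two scores are equal.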
    • add

      private static void add(BooleanQuery.Builder q, String name, String value, float boost)
      Add a clause to a boolean query.
    • add

      private static void add(BooleanQuery.Builder q, String name, String value)
      Add a clause to a boolean query.
    • formGrams

      private static String[] formGrams(String text, int ng)
      Form all ngrams for a given word.
      Parameters:
      text - the word to parse
      ng - the ngram length e.g. 3
      Returns:
      an array of all ngrams in the word; note that duplicates are not removed
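      As an illustration of the documented behaviour (all n-grams, in order, duplicates kept), here is a minimal stand-alone sketch. It is not the private method itself; the class name NGramSketch is hypothetical, and returning an empty array for a word shorter than ng is an assumption of this sketch.

```java
import java.util.Arrays;

final class NGramSketch {

    // Returns every contiguous substring of length ng, in order of appearance.
    // Duplicates are kept; a word shorter than ng yields an empty array.
    static String[] formGrams(String text, int ng) {
        int count = Math.max(0, text.length() - ng + 1);
        String[] grams = new String[count];
        for (int i = 0; i < count; i++) {
            grams[i] = text.substring(i, i + ng);
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(formGrams("hello", 3))); // [hel, ell, llo]
        System.out.println(Arrays.toString(formGrams("aaaa", 2)));  // [aa, aa, aa]
    }
}
```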
    • clearIndex

      public void clearIndex() throws IOException
      Removes all terms from the spell check index.
      Throws:
      IOException - If there is a low-level I/O error.
      AlreadyClosedException - if the SpellChecker is already closed
    • exist

      public boolean exist(String word) throws IOException
      Check whether the word exists in the index.
      Parameters:
      word - word to check
      Returns:
      true if the word exists in the index
      Throws:
      IOException - If there is a low-level I/O error.
      AlreadyClosedException - if the SpellChecker is already closed
    • indexDictionary

      public final void indexDictionary(Dictionary dict, IndexWriterConfig config, boolean fullMerge) throws IOException
      Indexes the data from the given Dictionary.
      Parameters:
      dict - Dictionary to index
      config - IndexWriterConfig to use
      fullMerge - whether or not the spellcheck index should be fully merged
      Throws:
      AlreadyClosedException - if the SpellChecker is already closed
      IOException - If there is a low-level I/O error.
    • getMin

      private static int getMin(int l)
    • getMax

      private static int getMax(int l)
    • createDocument

      private static Document createDocument(String text, int ng1, int ng2)
    • addGram

      private static void addGram(String text, Document doc, int ng1, int ng2)
    • obtainSearcher

      private IndexSearcher obtainSearcher()
    • releaseSearcher

      private void releaseSearcher(IndexSearcher aSearcher) throws IOException
      Throws:
      IOException
    • ensureOpen

      private void ensureOpen()
    • close

      public void close() throws IOException
      Close the IndexSearcher used by this SpellChecker
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException - if the close operation causes an IOException
      AlreadyClosedException - if the SpellChecker is already closed
    • swapSearcher

      private void swapSearcher(Directory dir) throws IOException
      Throws:
      IOException
    • createSearcher

      IndexSearcher createSearcher(Directory dir) throws IOException
      Creates a new read-only IndexSearcher
      Parameters:
      dir - the directory used to open the searcher
      Returns:
      a new read-only IndexSearcher
      Throws:
      IOException - if there is a low-level I/O error
    • isClosed

      boolean isClosed()
      Returns true if and only if the SpellChecker is closed, otherwise false.
      Returns:
      true if and only if the SpellChecker is closed, otherwise false.
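      The closed flag, ensureOpen() and the "already closed" exceptions documented above suggest a guard pattern along the following lines. This is a simplified, JDK-only sketch and not the class's implementation: ClosedGuardSketch is hypothetical, a HashSet stands in for the index, IllegalStateException stands in for AlreadyClosedException, and the locking implied by searcherLock and modifyCurrentIndexLock is omitted.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ClosedGuardSketch implements Closeable {

    private final Set<String> words = new HashSet<>();

    // volatile so that a close() on one thread is visible to readers on others.
    private volatile boolean closed = false;

    ClosedGuardSketch(String... initialWords) {
        words.addAll(Arrays.asList(initialWords));
    }

    // Every public operation calls this first and fails fast once closed.
    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("Spellchecker has been closed");
        }
    }

    public boolean exist(String word) {
        ensureOpen();
        return words.contains(word);
    }

    // Closing twice is reported as an error, matching the documented
    // behaviour of close() above.
    @Override
    public void close() {
        ensureOpen();
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        ClosedGuardSketch checker = new ClosedGuardSketch("lucene");
        System.out.println(checker.exist("lucene")); // true
        checker.close();
        System.out.println(checker.isClosed());      // true
    }
}
```

      Because a second close() throws in this pattern, callers that may close from several places can check isClosed() first.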