Class FSTCompiler<T>

java.lang.Object
org.apache.lucene.util.fst.FSTCompiler<T>

public class FSTCompiler<T> extends Object
Builds a minimal FST, which maps an IntsRef term to an arbitrary output, from pre-sorted terms with outputs. If you use NoOutputs, the FST is an FSA. The FST is written on the fly into a compact, serialized byte array, which can be saved to or loaded from a Directory, or used directly for traversal. The FST is always finite (it contains no cycles).

NOTE: The algorithm is described at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698

The parameterized type T is the output type. See the subclasses of Outputs.

FSTs larger than 2.1GB are possible as of Lucene 4.2. FSTs containing more than 2.1B nodes are also possible; however, they cannot be packed.
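
The construction described above (pre-sorted input, a frontier of not-yet-frozen nodes for the last term, and deduplication of identical suffix nodes) can be illustrated with a small, self-contained sketch. This is not Lucene's implementation: the class MinimalFsaSketch and all of its members are hypothetical names, it handles only the FSA case (no outputs), it takes String terms rather than IntsRef, it keeps nodes as objects instead of serializing them to bytes, and it uses a signature string in a HashMap where a real implementation would use a dedicated node hash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is not Lucene's FSTCompiler.
class MinimalFsaSketch {
    static final class Node {
        boolean isFinal;
        final TreeMap<Character, Node> arcs = new TreeMap<>(); // arcs sorted by label
    }

    // Canonical frozen nodes keyed by signature (stands in for a dedup hash).
    private final Map<String, Node> registry = new HashMap<>();
    // Stable id for every frozen node, used when building signatures.
    private final Map<Node, Integer> ids = new IdentityHashMap<>();
    // frontier.get(i) is the unfrozen node reached after i characters of the last term.
    private final List<Node> frontier = new ArrayList<>();
    private String last = null;

    MinimalFsaSketch() {
        frontier.add(new Node()); // root
    }

    // Adds one term; terms must arrive in strictly increasing order.
    void add(String term) {
        if (last != null && term.compareTo(last) <= 0) {
            throw new IllegalArgumentException("terms must be added in strictly increasing order");
        }
        int prefix = 0;
        int max = last == null ? 0 : Math.min(term.length(), last.length());
        while (prefix < max && term.charAt(prefix) == last.charAt(prefix)) {
            prefix++;
        }
        // Everything below the shared prefix can no longer change, so freeze it.
        freezeTail(prefix);
        for (int i = prefix; i < term.length(); i++) {
            Node next = new Node();
            frontier.get(i).arcs.put(term.charAt(i), next);
            frontier.add(next);
        }
        frontier.get(term.length()).isFinal = true;
        last = term;
    }

    // Freezes the remaining frontier and returns the root of the minimal automaton.
    Node finish() {
        freezeTail(0);
        return intern(frontier.get(0));
    }

    // Number of distinct frozen states (includes the root once finish() has run).
    int stateCount() {
        return ids.size();
    }

    static boolean accepts(Node root, String term) {
        Node node = root;
        for (int i = 0; i < term.length() && node != null; i++) {
            node = node.arcs.get(term.charAt(i));
        }
        return node != null && node.isFinal;
    }

    // Freezes frontier nodes deeper than prefixLength, deepest first, and
    // repoints each parent arc at the canonical (possibly shared) node.
    private void freezeTail(int prefixLength) {
        for (int i = frontier.size() - 1; i > prefixLength; i--) {
            Node frozen = intern(frontier.remove(i));
            frontier.get(i - 1).arcs.put(last.charAt(i - 1), frozen);
        }
    }

    // Returns an existing frozen node with the same finality and arcs, or registers this one.
    private Node intern(Node node) {
        StringBuilder sig = new StringBuilder(node.isFinal ? "F" : "N");
        for (Map.Entry<Character, Node> e : node.arcs.entrySet()) {
            sig.append('|').append((int) e.getKey().charValue()).append('>').append(ids.get(e.getValue()));
        }
        Node existing = registry.putIfAbsent(sig.toString(), node);
        if (existing != null) {
            return existing;
        }
        ids.put(node, ids.size());
        return node;
    }
}
```

In this sketch, adding "bat" and then "cat" and calling finish() leaves four states: the seven-node trie (root included) collapses because the "at" suffix is shared. Adding terms out of order throws, which mirrors the pre-sorted requirement stated above.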

  • Field Details

    • DIRECT_ADDRESSING_MAX_OVERSIZING_FACTOR

      static final float DIRECT_ADDRESSING_MAX_OVERSIZING_FACTOR
    • dedupHash

      private final NodeHash<T> dedupHash
    • fst

      final FST<T> fst
    • NO_OUTPUT

      private final T NO_OUTPUT
    • minSuffixCount1

      private final int minSuffixCount1
    • minSuffixCount2

      private final int minSuffixCount2
    • doShareNonSingletonNodes

      private final boolean doShareNonSingletonNodes
    • shareMaxTailLength

      private final int shareMaxTailLength
    • lastInput

      private final IntsRefBuilder lastInput
    • frontier

      private FSTCompiler.UnCompiledNode<T>[] frontier
    • lastFrozenNode

      long lastFrozenNode
    • numBytesPerArc

      int[] numBytesPerArc
    • numLabelBytesPerArc

      int[] numLabelBytesPerArc
    • fixedLengthArcsBuffer

      final FSTCompiler.FixedLengthArcsBuffer fixedLengthArcsBuffer
    • arcCount

      long arcCount
    • nodeCount

      long nodeCount
    • binarySearchNodeCount

      long binarySearchNodeCount
    • directAddressingNodeCount

      long directAddressingNodeCount
    • allowFixedLengthArcs

      final boolean allowFixedLengthArcs
    • directAddressingMaxOversizingFactor

      final float directAddressingMaxOversizingFactor
    • directAddressingExpansionCredit

      long directAddressingExpansionCredit
    • bytes

      final BytesStore bytes
  • Constructor Details

    • FSTCompiler

      public FSTCompiler(FST.INPUT_TYPE inputType, Outputs<T> outputs)
      Instantiates an FST/FSA builder with default settings and pruning options turned off. For more tuning and tweaking, see FSTCompiler.Builder.
    • FSTCompiler

      private FSTCompiler(FST.INPUT_TYPE inputType, int minSuffixCount1, int minSuffixCount2, boolean doShareSuffix, boolean doShareNonSingletonNodes, int shareMaxTailLength, Outputs<T> outputs, boolean allowFixedLengthArcs, int bytesPageBits, float directAddressingMaxOversizingFactor)
  • Method Details