Package org.apache.lucene.util.hnsw
Class HnswGraphBuilder<T>
java.lang.Object
org.apache.lucene.util.hnsw.HnswGraphBuilder<T>
- Type Parameters:
T
- the type of vector
Builder for HNSW graph. See
HnswGraph
for a gloss on the algorithm and the meaning of the
hyperparameters.-
Field Summary
FieldsModifier and TypeFieldDescriptionprivate final int
private static final long
Default random seed for level generation *private final HnswGraphSearcher<T>
(package private) final OnHeapHnswGraph
static final String
A name for the HNSW component for the info-stream *private InfoStream
private final int
private final double
private final SplittableRandom
static long
Random seed for level generation; public to expose for testing *private final NeighborArray
private final VectorSimilarityFunction
private final VectorEncoding
private final RandomAccessVectorValues
private final RandomAccessVectorValues
-
Constructor Summary
ConstructorsModifierConstructorDescriptionprivate
HnswGraphBuilder
(RandomAccessVectorValues vectors, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, long seed) Reads all the vectors from a VectorValues, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph. -
Method Summary
Modifier and TypeMethodDescriptionprivate void
addDiverseNeighbors
(int level, int node, NeighborQueue candidates) void
addGraphNode
(int node, RandomAccessVectorValues values) void
addGraphNode
(int node, T value) Inserts a doc with vector value to the graphprivate void
addVectors
(RandomAccessVectorValues vectorsToAdd) build
(RandomAccessVectorValues vectorsToAdd) Reads all the vectors from two copies of a random access VectorValues.static HnswGraphBuilder<?>
create
(RandomAccessVectorValues vectors, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, long seed) private boolean
diversityCheck
(int candidate, float score, NeighborArray neighbors) private int
findWorstNonDiverse
(NeighborArray neighbors) Find first non-diverse neighbour among the list of neighbors starting from the most distant neighboursgetGraph()
private static int
getRandomGraphLevel
(double ml, SplittableRandom random) private T
getValue
(int node, RandomAccessVectorValues values) private boolean
isDiverse
(float[] candidate, NeighborArray neighbors, float score) private boolean
isDiverse
(int candidate, NeighborArray neighbors, float score) private boolean
isDiverse
(BytesRef candidate, NeighborArray neighbors, float score) private boolean
isWorstNonDiverse
(int candidateIndex, float[] candidateVector, NeighborArray neighbors) private boolean
isWorstNonDiverse
(int candidateIndex, BytesRef candidateVector, NeighborArray neighbors) private boolean
isWorstNonDiverse
(int candidateIndex, NeighborArray neighbors) private void
popToScratch
(NeighborQueue candidates) private long
printGraphBuildStatus
(int node, long start, long t) private void
selectAndLinkDiverse
(NeighborArray neighbors, NeighborArray candidates, int maxConnOnLevel) void
setInfoStream
(InfoStream infoStream) Set info-stream to output debugging information *
-
Field Details
-
DEFAULT_RAND_SEED
private static final long DEFAULT_RAND_SEEDDefault random seed for level generation *- See Also:
-
HNSW_COMPONENT
A name for the HNSW component for the info-stream *- See Also:
-
randSeed
public static long randSeedRandom seed for level generation; public to expose for testing * -
M
private final int M -
beamWidth
private final int beamWidth -
ml
private final double ml -
scratch
-
similarityFunction
-
vectorEncoding
-
vectors
-
random
-
graphSearcher
-
hnsw
-
infoStream
-
vectorsCopy
-
-
Constructor Details
-
HnswGraphBuilder
private HnswGraphBuilder(RandomAccessVectorValues vectors, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, long seed) throws IOException Reads all the vectors from a VectorValues, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.- Parameters:
vectors
- the vectors whose relations are represented by the graph - must provide a different view over those vectors than the one used to add via addGraphNode.M
- – graph fanout parameter used to calculate the maximum number of connections a node can have – M on upper layers, and M * 2 on the lowest level.beamWidth
- the size of the beam search to use when finding nearest neighbors.seed
- the seed for a random number generator used during graph construction. Provide this to ensure repeatable construction.- Throws:
IOException
-
-
Method Details
-
create
public static HnswGraphBuilder<?> create(RandomAccessVectorValues vectors, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, long seed) throws IOException - Throws:
IOException
-
build
Reads all the vectors from two copies of a random access VectorValues. Providing two copies enables efficient retrieval without extra data copying, while avoiding collision of the returned values.- Parameters:
vectorsToAdd
- the vectors for which to build a nearest neighbors graph. Must be an independent accessor for the vectors- Throws:
IOException
-
addVectors
- Throws:
IOException
-
setInfoStream
Set info-stream to output debugging information * -
getGraph
-
addGraphNode
Inserts a doc with vector value to the graph- Throws:
IOException
-
addGraphNode
- Throws:
IOException
-
getValue
- Throws:
IOException
-
printGraphBuildStatus
private long printGraphBuildStatus(int node, long start, long t) -
addDiverseNeighbors
- Throws:
IOException
-
selectAndLinkDiverse
private void selectAndLinkDiverse(NeighborArray neighbors, NeighborArray candidates, int maxConnOnLevel) throws IOException - Throws:
IOException
-
popToScratch
-
diversityCheck
private boolean diversityCheck(int candidate, float score, NeighborArray neighbors) throws IOException - Parameters:
candidate
- the vector of a new candidate neighbor of a node nscore
- the score of the new candidate and node n, to be compared with scores of the candidate and n's neighborsneighbors
- the neighbors selected so far- Returns:
- whether the candidate is diverse given the existing neighbors
- Throws:
IOException
-
isDiverse
- Throws:
IOException
-
isDiverse
private boolean isDiverse(float[] candidate, NeighborArray neighbors, float score) throws IOException - Throws:
IOException
-
isDiverse
private boolean isDiverse(BytesRef candidate, NeighborArray neighbors, float score) throws IOException - Throws:
IOException
-
findWorstNonDiverse
Find first non-diverse neighbour among the list of neighbors starting from the most distant neighbours- Throws:
IOException
-
isWorstNonDiverse
- Throws:
IOException
-
isWorstNonDiverse
private boolean isWorstNonDiverse(int candidateIndex, float[] candidateVector, NeighborArray neighbors) throws IOException - Throws:
IOException
-
isWorstNonDiverse
private boolean isWorstNonDiverse(int candidateIndex, BytesRef candidateVector, NeighborArray neighbors) throws IOException - Throws:
IOException
-
getRandomGraphLevel
-