Package org.apache.lucene.document
The logical representation of a Document for indexing and searching.
The document package provides the user level logical representation of content to be indexed and searched. The package also provides utilities for working with Documents and IndexableFields.
Document and IndexableField
A Document is a collection of IndexableFields. An IndexableField is a logical representation of a user's content that needs to be indexed or stored. IndexableFields have a number of properties that tell Lucene how to treat the content (such as indexed, tokenized, stored, etc.). See the Field implementation of IndexableField for specifics on these properties.
Note: it is common to refer to Documents having Fields, even though technically they have IndexableFields.
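As a rough illustration of the properties mentioned above, the following sketch inspects the predefined FieldType constants exposed by TextField, StringField and StoredField. It assumes lucene-core is on the classpath; the class name FieldPropertiesSketch and the describe helper are made up for this example and are not part of Lucene.

```java
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexOptions;

// Hypothetical helper class, not part of Lucene.
public class FieldPropertiesSketch {

    /** Summarizes the indexed / tokenized / stored flags of a field type. */
    static String describe(FieldType type) {
        // A field is indexed when its index options are anything other than NONE.
        boolean indexed = type.indexOptions() != IndexOptions.NONE;
        // Tokenization only matters for indexed fields, so report it only then.
        boolean tokenized = indexed && type.tokenized();
        return "indexed=" + indexed
            + " tokenized=" + tokenized
            + " stored=" + type.stored();
    }

    public static void main(String[] args) {
        System.out.println("TextField.TYPE_STORED        " + describe(TextField.TYPE_STORED));
        System.out.println("TextField.TYPE_NOT_STORED    " + describe(TextField.TYPE_NOT_STORED));
        System.out.println("StringField.TYPE_STORED      " + describe(StringField.TYPE_STORED));
        System.out.println("StringField.TYPE_NOT_STORED  " + describe(StringField.TYPE_NOT_STORED));
        System.out.println("StoredField.TYPE             " + describe(StoredField.TYPE));
    }
}
```

Roughly speaking, TextField is the tokenized variant for full text, StringField indexes the whole value as a single token, and StoredField only keeps the value for retrieval.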
Working with Documents
First and foremost, a Document is something created by the user application. It is your job to create Documents based on the content of the files you are working with in your application (Word, txt, PDF, Excel or any other format). How this is done is completely up to you. That being said, there are many tools available in other projects that can make the process of taking a file and converting it into a Lucene Document easier.
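A minimal sketch of that step might look like the following. It assumes the text has already been extracted from the source file by some other tool and that lucene-core is on the classpath; the class name DocumentSketch, the toDocument helper and the field names "path", "title", "body" and "length" are illustrative choices, not Lucene conventions.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

// Hypothetical helper class, not part of Lucene.
public class DocumentSketch {

    /** Builds a Document from text that was already extracted from a file. */
    static Document toDocument(String path, String title, String body) {
        Document doc = new Document();
        // Indexed as a single token and stored: useful as an exact-match key.
        doc.add(new StringField("path", path, Field.Store.YES));
        // Tokenized for full-text search and stored for display in results.
        doc.add(new TextField("title", title, Field.Store.YES));
        // Tokenized for full-text search, but not stored in the index.
        doc.add(new TextField("body", body, Field.Store.NO));
        // Stored only: returned with search hits, but not searchable.
        doc.add(new StoredField("length", body.length()));
        return doc;
    }

    public static void main(String[] args) {
        Document doc = toDocument("/tmp/report.txt", "Quarterly report", "Revenue grew.");
        System.out.println(doc.getFields().size() + " fields, path=" + doc.get("path"));
    }
}
```

The resulting Document would then typically be passed to an IndexWriter; that part lives outside this package and is omitted here.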
DateTools is a utility class that makes dates and times searchable. IntPoint, LongPoint, FloatPoint and DoublePoint enable indexing of numeric values (and also dates) for fast range queries using PointRangeQuery.
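The two approaches to dates can be sketched side by side as below: DateTools turns a timestamp into a sortable string at a chosen resolution, while LongPoint indexes the raw value for range queries. The sketch assumes lucene-core is on the classpath; the class name DateFieldSketch, its helper methods and the field names "day" and "modified" are hypothetical.

```java
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StringField;
import org.apache.lucene.search.Query;

// Hypothetical helper class, not part of Lucene.
public class DateFieldSketch {

    /** Adds a timestamp to a new Document in both string and point form. */
    static Document withTimestamp(long epochMillis) {
        Document doc = new Document();
        // String form truncated to day resolution in GMT, for example "19700101".
        String day = DateTools.timeToString(epochMillis, DateTools.Resolution.DAY);
        doc.add(new StringField("day", day, Field.Store.YES));
        // Point form for fast numeric range queries; points are not stored.
        doc.add(new LongPoint("modified", epochMillis));
        return doc;
    }

    /** Builds a range query over the "modified" point field (bounds inclusive). */
    static Query modifiedBetween(long fromMillis, long toMillis) {
        return LongPoint.newRangeQuery("modified", fromMillis, toMillis);
    }

    public static void main(String[] args) {
        Document doc = withTimestamp(0L);
        System.out.println("day=" + doc.get("day"));
        System.out.println(modifiedBetween(0L, 86_400_000L));
    }
}
```

Because point fields are not stored, a value that must be shown in results would also need a StoredField alongside the LongPoint.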
Class Summary
Class: Description
BigIntegerPoint: An indexed 128-bit BigInteger field.
BinaryDocValuesField: Field that stores a per-document BytesRef value.
BinaryPoint: An indexed binary field for fast range filters.
BinaryRangeDocValues
BinaryRangeDocValuesField
BinaryRangeFieldRangeQuery
DateTools: Provides support for converting dates to strings and vice-versa.
Document: Documents are the unit of indexing and search.
DocumentStoredFieldVisitor: A StoredFieldVisitor that creates a Document from stored fields.
DoubleDocValuesField: Syntactic sugar for encoding doubles as NumericDocValues via Double.doubleToRawLongBits(double).
DoublePoint: An indexed double field for fast range filters.
DoublePointMultiRangeBuilder: Builder for multi range queries for DoublePoints.
DoubleRange: An indexed Double Range field.
DoubleRangeDocValuesField: DocValues field for DoubleRange.
DoubleRangeSlowRangeQuery
FeatureDoubleValuesSource: A DoubleValuesSource instance which can be used to read the values of a feature from a FeatureField for documents.
FeatureDoubleValuesSource.FeatureDoubleValues
FeatureField: Field that can be used to store static scoring factors into documents.
FeatureField.FeatureFunction
FeatureField.FeatureTokenStream
FeatureField.LogFunction
FeatureField.SaturationFunction
FeatureField.SigmoidFunction
FeatureQuery
FeatureSortField: Sorts using the value of a specified feature name from a FeatureField.
Field: Expert: directly create a field for a document.
Field.BinaryTokenStream
Field.StringTokenStream
FieldType: Describes the properties of a field.
FloatDocValuesField: Syntactic sugar for encoding floats as NumericDocValues via Float.floatToRawIntBits(float).
FloatPoint: An indexed float field for fast range filters.
FloatPointMultiRangeBuilder: Builder for multi range queries for FloatPoints.
FloatPointNearestNeighbor: KNN search on top of N dimensional indexed float points.
FloatPointNearestNeighbor.Cell
FloatPointNearestNeighbor.NearestHit
FloatPointNearestNeighbor.NearestVisitor
FloatRange: An indexed Float Range field.
FloatRangeDocValuesField: DocValues field for FloatRange.
FloatRangeSlowRangeQuery
HalfFloatPoint: An indexed half-float field for fast range filters.
InetAddressPoint: An indexed 128-bit InetAddress field.
InetAddressRange: An indexed InetAddress Range Field.
IntPoint: An indexed int field for fast range filters.
IntPointMultiRangeBuilder: Builder for multi range queries for IntPoints.
IntRange: An indexed Integer Range field.
IntRangeDocValuesField: DocValues field for IntRange.
IntRangeSlowRangeQuery
LatLonBoundingBox: An indexed 2-Dimension Bounding Box field for the Geospatial Lat/Lon Coordinate system.
LatLonDocValuesBoxQuery: Distance query for LatLonDocValuesField.
LatLonDocValuesDistanceQuery: Distance query for LatLonDocValuesField.
LatLonDocValuesField: A per-document location field.
LatLonDocValuesPointInPolygonQuery: Polygon query for LatLonDocValuesField.
LatLonPoint: An indexed location field.
LatLonPointDistanceComparator: Compares documents by distance from an origin point.
LatLonPointDistanceFeatureQuery
LatLonPointDistanceQuery: Distance query for LatLonPoint.
LatLonPointInPolygonQuery: Finds all previously indexed points that fall within the specified polygons.
LatLonPointSortField: Sorts by distance from an origin location.
LatLonShape: A geo shape utility class for indexing and searching GIS geometries whose vertices are latitude, longitude values (in decimal degrees).
LatLonShapeBoundingBoxQuery: Finds all previously indexed geo shapes that intersect the specified bounding box.
LatLonShapeLineQuery: Finds all previously indexed geo shapes that intersect the specified arbitrary Line.
LatLonShapePolygonQuery: Finds all previously indexed geo shapes that intersect the specified arbitrary.
LazyDocument: Defers actually loading a field's value until you ask for it.
LongDistanceFeatureQuery
LongPoint: An indexed long field for fast range filters.
LongPointMultiRangeBuilder: Builder for multi range queries for LongPoints.
LongRange: An indexed Long Range field.
LongRangeDocValuesField: DocValues field for LongRange.
LongRangeSlowRangeQuery
NumericDocValuesField: Field that stores a per-document long value for scoring, sorting or value retrieval.
RangeFieldQuery: Query class for searching RangeField types by a defined PointValues.Relation.
ShapeField: A base shape utility class used for both LatLon (spherical) and XY (cartesian) shape fields.
ShapeField.DecodedTriangle: Represents an encoded triangle using ShapeField.decodeTriangle(byte[], DecodedTriangle).
ShapeField.Triangle: Polygons are decomposed into tessellated triangles using Tessellator; these triangles are encoded and inserted as separate indexed POINT fields.
ShapeQuery: Base query class for all spatial geometries: LatLonShape and XYShape.
ShapeQuery.RelationScorerSupplier: Utility class for implementing constant score logic specific to INTERSECT, WITHIN, and DISJOINT.
SortedDocValuesField: Field that stores a per-document BytesRef value, indexed for sorting.
SortedNumericDocValuesField: Field that stores per-document long values for scoring, sorting or value retrieval.
SortedNumericDocValuesRangeQuery
SortedSetDocValuesField: Field that stores a set of per-document BytesRef values, indexed for faceting, grouping, joining.
SortedSetDocValuesRangeQuery
StoredField: A field whose value is stored so that IndexSearcher.doc(int) and IndexReader.document() will return the field and its value.
StringField: A field that is indexed but not tokenized: the entire String value is indexed as a single token.
TextField: A field that is indexed and tokenized, without term vectors.
XYShape: A cartesian shape utility class for indexing and searching geometries whose vertices are unitless x, y values.
XYShapeBoundingBoxQuery: Finds all previously indexed cartesian shapes that intersect the specified bounding box.
XYShapeLineQuery: Finds all previously indexed cartesian shapes that intersect the specified arbitrary XYLine.
XYShapePolygonQuery: Finds all previously indexed cartesian shapes that intersect the specified arbitrary cartesian XYPolygon.
Enum Summary
Enum: Description
DateTools.Resolution: Specifies the time granularity.
Field.Store: Specifies whether and how a field should be stored.
RangeFieldQuery.QueryType: Used by RangeFieldQuery to check how each internal or leaf node relates to the query.
ShapeField.QueryRelation: Query Relation Types