Package com.fasterxml.jackson.core.sym
Class ByteQuadsCanonicalizer
- java.lang.Object
-
- com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer
-
public final class ByteQuadsCanonicalizer extends java.lang.Object
Replacement forBytesToNameCanonicalizer
which aims at more localized memory access due to flattening of name quad data. Performance improvement modest for simple JSON document data binding (maybe 3%), but should help more for larger symbol tables, or for binary formats like Smile.Hash area is divided into 4 sections:
- Primary area (1/2 of total size), direct match from hash (LSB)
- Secondary area (1/4 of total size), match from
hash (LSB) >> 1
- Tertiary area (1/8 of total size), match from
hash (LSB) >> 2
- Spill-over area (remaining 1/8) with linear scan, insertion order
int
s, where 1 - 3 ints contain 1 - 12 UTF-8 encoded bytes of name (null-padded), and last int is offset in_names
that contains actual name Strings.- Since:
- 2.6
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private static class
ByteQuadsCanonicalizer.TableInfo
Immutable value class used for sharing information as efficiently as possible, by only require synchronization of reference manipulation but not access to contents.
-
Field Summary
Fields Modifier and Type Field Description private int
_count
Total number of Strings in the symbol table; only used for child tables.private boolean
_failOnDoS
Flag that indicates whether we should throw an exception if enough hash collisions are detected (true); or just worked around (false).private int[]
_hashArea
Primary hash information area: consists of2 * _hashSize
entries of 16 bytes (4 ints), arranged in a cascading lookup structure (details of which may be tweaked depending on expected rates of collisions).private boolean
_hashShared
Flag that indicates whether underlying data structures for the main hash area are shared or not.private int
_hashSize
Number of slots for primary entries within_hashArea
; which is at most1/8
of actual size of the underlying array (4-int slots, primary covers only half of the area; plus, additional area for longer symbols after hash area).private boolean
_intern
Whether canonical symbol Strings are to be intern()ed before added to the table or not.private int
_longNameOffset
Offset within_hashArea
that follows main slots and contains quads for longer names (13 bytes or longer), and points to the first available int that may be used for appending quads of the next long name.private java.lang.String[]
_names
Array that containsString
instances matching entries in_hashArea
.private ByteQuadsCanonicalizer
_parent
Reference to the root symbol table, for child tables, so that they can merge table information back as necessary.private int
_secondaryStart
Offset within_hashArea
where secondary entries startprivate int
_seed
Seed value we use as the base to make hash codes non-static between different runs, but still stable for lifetime of a single symbol table instance.private int
_spilloverEnd
Pointer to the offset within spill-over area where there is room for more spilled over entries (if any).private java.util.concurrent.atomic.AtomicReference<ByteQuadsCanonicalizer.TableInfo>
_tableInfo
Member that is only used by the root table instance: root passes immutable state info child instances, and children may return new state if they add entries to the table.private int
_tertiaryShift
Constant that determines size of buckets for tertiary entries:1 << _tertiaryShift
is the size, and shift value is also used for translating from primary offset into tertiary bucket (shift right by4 + _tertiaryShift
).private int
_tertiaryStart
Offset within_hashArea
where tertiary entries startprivate static int
DEFAULT_T_SIZE
Initial size of the primary hash area.(package private) static int
MAX_ENTRIES_FOR_REUSE
Let's only share reasonably sized symbol tables.private static int
MAX_T_SIZE
Let's not expand symbol tables past some maximum size; with unique (~= random) names.private static int
MIN_HASH_SIZE
No point in trying to construct tiny tables, just need to resize soon.private static int
MULT
private static int
MULT2
private static int
MULT3
-
Constructor Summary
Constructors Modifier Constructor Description private
ByteQuadsCanonicalizer(int sz, boolean intern, int seed, boolean failOnDoS)
Constructor used for creating per-JsonFactory
"root" symbol tables: ones used for merging and sharing common symbolsprivate
ByteQuadsCanonicalizer(ByteQuadsCanonicalizer parent, boolean intern, int seed, boolean failOnDoS, ByteQuadsCanonicalizer.TableInfo state)
Constructor used when creating a child instance
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private int
_appendLongName(int[] quads, int qlen)
private int
_calcOffset(int hash)
(package private) static int
_calcTertiaryShift(int primarySlots)
private boolean
_checkNeedForRehash()
private int
_findOffsetForAdd(int hash)
Method called to find the location within hash table to add a new symbol in.private java.lang.String
_findSecondary(int origOffset, int q1)
private java.lang.String
_findSecondary(int origOffset, int q1, int q2)
private java.lang.String
_findSecondary(int origOffset, int hash, int[] q, int qlen)
private java.lang.String
_findSecondary(int origOffset, int q1, int q2, int q3)
protected void
_reportTooManyCollisions()
private int
_resizeAndFindOffsetForAdd(int hash)
private int
_spilloverStart()
Helper method that calculates start of the spillover areaprivate boolean
_verifyLongName(int[] q, int qlen, int spillOffset)
private boolean
_verifyLongName2(int[] q, int qlen, int spillOffset)
private void
_verifySharing()
java.lang.String
addName(java.lang.String name, int q1)
java.lang.String
addName(java.lang.String name, int[] q, int qlen)
java.lang.String
addName(java.lang.String name, int q1, int q2)
java.lang.String
addName(java.lang.String name, int q1, int q2, int q3)
int
bucketCount()
Returns number of primary slots table has currentlyint
calcHash(int q1)
int
calcHash(int[] q, int qlen)
int
calcHash(int q1, int q2)
int
calcHash(int q1, int q2, int q3)
static ByteQuadsCanonicalizer
createRoot()
Factory method to call to create a symbol table instance with a randomized seed value.protected static ByteQuadsCanonicalizer
createRoot(int seed)
java.lang.String
findName(int q1)
java.lang.String
findName(int[] q, int qlen)
java.lang.String
findName(int q1, int q2)
java.lang.String
findName(int q1, int q2, int q3)
int
hashSeed()
ByteQuadsCanonicalizer
makeChild(int flags)
Factory method used to create actual symbol table instance to use for parsing.boolean
maybeDirty()
Method called to check to quickly see if a child symbol table may have gotten additional entries.private void
mergeChild(ByteQuadsCanonicalizer.TableInfo childState)
private void
nukeSymbols(boolean fill)
Helper method called to empty all shared symbols, but to leave arrays allocatedint
primaryCount()
Method mostly needed by unit tests; calculates number of entries that are in the primary slot set.private void
rehash()
void
release()
Method called by the using code to indicate it is done with this instance.int
secondaryCount()
Method mostly needed by unit tests; calculates number of entries in secondary bucketsint
size()
int
spilloverCount()
Method mostly needed by unit tests; calculates number of entries in shared spillover areaint
tertiaryCount()
Method mostly needed by unit tests; calculates number of entries in tertiary bucketsjava.lang.String
toString()
int
totalCount()
-
-
-
Field Detail
-
DEFAULT_T_SIZE
private static final int DEFAULT_T_SIZE
Initial size of the primary hash area. Each entry consumes 4 ints (16 bytes), and secondary area is same as primary; so default size will use 2kB of memory (plus 64x4 or 64x8 (256/512 bytes) for references to Strings, and Strings themselves).- See Also:
- Constant Field Values
-
MAX_T_SIZE
private static final int MAX_T_SIZE
Let's not expand symbol tables past some maximum size; with unique (~= random) names. Size is in- See Also:
- Constant Field Values
-
MIN_HASH_SIZE
private static final int MIN_HASH_SIZE
No point in trying to construct tiny tables, just need to resize soon.- See Also:
- Constant Field Values
-
MAX_ENTRIES_FOR_REUSE
static final int MAX_ENTRIES_FOR_REUSE
Let's only share reasonably sized symbol tables. Max size set to 3/4 of 8k; this corresponds to 256k main hash index. This should allow for enough distinct names for almost any case, while preventing ballooning for cases where names are unique (or close thereof).- See Also:
- Constant Field Values
-
_parent
private final ByteQuadsCanonicalizer _parent
Reference to the root symbol table, for child tables, so that they can merge table information back as necessary.
-
_tableInfo
private final java.util.concurrent.atomic.AtomicReference<ByteQuadsCanonicalizer.TableInfo> _tableInfo
Member that is only used by the root table instance: root passes immutable state info child instances, and children may return new state if they add entries to the table. Child tables do NOT use the reference.
-
_seed
private final int _seed
Seed value we use as the base to make hash codes non-static between different runs, but still stable for lifetime of a single symbol table instance. This is done for security reasons, to avoid potential DoS attack via hash collisions.
-
_intern
private boolean _intern
Whether canonical symbol Strings are to be intern()ed before added to the table or not.NOTE: non-final to allow disabling intern()ing in case of excessive collisions.
-
_failOnDoS
private final boolean _failOnDoS
Flag that indicates whether we should throw an exception if enough hash collisions are detected (true); or just worked around (false).- Since:
- 2.4
-
_hashArea
private int[] _hashArea
Primary hash information area: consists of2 * _hashSize
entries of 16 bytes (4 ints), arranged in a cascading lookup structure (details of which may be tweaked depending on expected rates of collisions).
-
_hashSize
private int _hashSize
Number of slots for primary entries within_hashArea
; which is at most1/8
of actual size of the underlying array (4-int slots, primary covers only half of the area; plus, additional area for longer symbols after hash area).
-
_secondaryStart
private int _secondaryStart
Offset within_hashArea
where secondary entries start
-
_tertiaryStart
private int _tertiaryStart
Offset within_hashArea
where tertiary entries start
-
_tertiaryShift
private int _tertiaryShift
Constant that determines size of buckets for tertiary entries:1 << _tertiaryShift
is the size, and shift value is also used for translating from primary offset into tertiary bucket (shift right by4 + _tertiaryShift
).Default value is 2, for buckets of 4 slots; grows bigger with bigger table sizes.
-
_count
private int _count
Total number of Strings in the symbol table; only used for child tables.
-
_names
private java.lang.String[] _names
-
_spilloverEnd
private int _spilloverEnd
Pointer to the offset within spill-over area where there is room for more spilled over entries (if any). Spill over area is within fixed-size portion of_hashArea
.
-
_longNameOffset
private int _longNameOffset
-
_hashShared
private boolean _hashShared
Flag that indicates whether underlying data structures for the main hash area are shared or not. If they are, then they need to be handled in copy-on-write way, i.e. if they need to be modified, a copy needs to be made first; at this point it will not be shared any more, and can be modified.This flag needs to be checked both when adding new main entries, and when adding new collision list queues (i.e. creating a new collision list head entry)
-
MULT
private static final int MULT
- See Also:
- Constant Field Values
-
MULT2
private static final int MULT2
- See Also:
- Constant Field Values
-
MULT3
private static final int MULT3
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
ByteQuadsCanonicalizer
private ByteQuadsCanonicalizer(int sz, boolean intern, int seed, boolean failOnDoS)
Constructor used for creating per-JsonFactory
"root" symbol tables: ones used for merging and sharing common symbols- Parameters:
sz
- Initial primary hash area sizeintern
- Whether Strings contained should beString.intern()
edseed
- Random seed valued used to make it more difficult to cause collisions (used for collision-based DoS attacks).
-
ByteQuadsCanonicalizer
private ByteQuadsCanonicalizer(ByteQuadsCanonicalizer parent, boolean intern, int seed, boolean failOnDoS, ByteQuadsCanonicalizer.TableInfo state)
Constructor used when creating a child instance
-
-
Method Detail
-
createRoot
public static ByteQuadsCanonicalizer createRoot()
Factory method to call to create a symbol table instance with a randomized seed value.
-
createRoot
protected static ByteQuadsCanonicalizer createRoot(int seed)
-
makeChild
public ByteQuadsCanonicalizer makeChild(int flags)
Factory method used to create actual symbol table instance to use for parsing.
-
release
public void release()
Method called by the using code to indicate it is done with this instance. This lets instance merge accumulated changes into parent (if need be), safely and efficiently, and without calling code having to know about parent information.
-
mergeChild
private void mergeChild(ByteQuadsCanonicalizer.TableInfo childState)
-
size
public int size()
-
bucketCount
public int bucketCount()
Returns number of primary slots table has currently
-
maybeDirty
public boolean maybeDirty()
Method called to check to quickly see if a child symbol table may have gotten additional entries. Used for checking to see if a child table should be merged into shared table.
-
hashSeed
public int hashSeed()
-
primaryCount
public int primaryCount()
Method mostly needed by unit tests; calculates number of entries that are in the primary slot set. These are "perfect" entries, accessible with a single lookup
-
secondaryCount
public int secondaryCount()
Method mostly needed by unit tests; calculates number of entries in secondary buckets
-
tertiaryCount
public int tertiaryCount()
Method mostly needed by unit tests; calculates number of entries in tertiary buckets
-
spilloverCount
public int spilloverCount()
Method mostly needed by unit tests; calculates number of entries in shared spillover area
-
totalCount
public int totalCount()
-
toString
public java.lang.String toString()
- Overrides:
toString
in classjava.lang.Object
-
findName
public java.lang.String findName(int q1)
-
findName
public java.lang.String findName(int q1, int q2)
-
findName
public java.lang.String findName(int q1, int q2, int q3)
-
findName
public java.lang.String findName(int[] q, int qlen)
-
_calcOffset
private final int _calcOffset(int hash)
-
_findSecondary
private java.lang.String _findSecondary(int origOffset, int q1)
-
_findSecondary
private java.lang.String _findSecondary(int origOffset, int q1, int q2)
-
_findSecondary
private java.lang.String _findSecondary(int origOffset, int q1, int q2, int q3)
-
_findSecondary
private java.lang.String _findSecondary(int origOffset, int hash, int[] q, int qlen)
-
_verifyLongName
private boolean _verifyLongName(int[] q, int qlen, int spillOffset)
-
_verifyLongName2
private boolean _verifyLongName2(int[] q, int qlen, int spillOffset)
-
addName
public java.lang.String addName(java.lang.String name, int q1)
-
addName
public java.lang.String addName(java.lang.String name, int q1, int q2)
-
addName
public java.lang.String addName(java.lang.String name, int q1, int q2, int q3)
-
addName
public java.lang.String addName(java.lang.String name, int[] q, int qlen)
-
_verifySharing
private void _verifySharing()
-
_findOffsetForAdd
private int _findOffsetForAdd(int hash)
Method called to find the location within hash table to add a new symbol in.
-
_resizeAndFindOffsetForAdd
private int _resizeAndFindOffsetForAdd(int hash)
-
_checkNeedForRehash
private boolean _checkNeedForRehash()
-
_appendLongName
private int _appendLongName(int[] quads, int qlen)
-
calcHash
public int calcHash(int q1)
-
calcHash
public int calcHash(int q1, int q2)
-
calcHash
public int calcHash(int q1, int q2, int q3)
-
calcHash
public int calcHash(int[] q, int qlen)
-
rehash
private void rehash()
-
nukeSymbols
private void nukeSymbols(boolean fill)
Helper method called to empty all shared symbols, but to leave arrays allocated
-
_spilloverStart
private final int _spilloverStart()
Helper method that calculates start of the spillover area
-
_reportTooManyCollisions
protected void _reportTooManyCollisions()
-
_calcTertiaryShift
static int _calcTertiaryShift(int primarySlots)
-
-