Class HistogramDiffIndex<S extends Sequence>

  • Type Parameters:
    S - type of the base sequence.

    final class HistogramDiffIndex<S extends Sequence>
    extends java.lang.Object
    Support HistogramDiff by computing occurrence counts of elements.

    Each element in the range being considered is put into a hash table, tracking the number of times that distinct element appears in the sequence. Once all elements have been inserted from sequence A, each element of sequence B is probed in the hash table and the longest common subsequence with the lowest occurrence count in A is used as the result.

    • Field Detail

      • maxChainLength

        private final int maxChainLength
      • region

        private final Edit region
      • keyShift

        private final int keyShift
        Number of low bits to discard from a key to index table.
      • recs

        private long[] recs
        Describes a unique element in sequence A. The records in this table are actually 3-tuples of:
        • index of next record in this table that has same hash code
        • index of first element in this occurrence chain
        • occurrence count for this element (length of locs list)
        The occurrence count is capped at MAX_CNT, as the field is only a few bits wide. Elements that occur more frequently will have their count capped.
      • recCnt

        private int recCnt
        Number of elements in recs; also is the unique element count.
      • next

        private int[] next
        For ptr, next[ptr - ptrShift] has subsequent index. For the sequence element ptr, the value stored at location next[ptr - ptrShift] is the next occurrence of the exact same element in the sequence. Chains always run from the lowest index to the largest index. Therefore the array will store next[1] = 2, but never next[2] = 1. This allows a chain to terminate with 0, as 0 would never be a valid next element. The array is sized to be region.getLengthA() and element indexes are converted to array indexes by subtracting ptrShift, which is just a cached version of region.beginA.
      • recIdx

        private int[] recIdx
        For element ptr in A, index of the record in recs array. The record at recs[recIdx[ptr - ptrShift]] is the record describing all occurrences of the element appearing in sequence A at position ptr. The record is needed to get the occurrence count of the element, or to locate all other occurrences of that element within sequence A. This index provides constant-time access to the record, and avoids needing to scan the hash chain.
      • ptrShift

        private int ptrShift
        Value to subtract from element indexes to key next array.
      • lcs

        private Edit lcs
      • cnt

        private int cnt
      • hasCommon

        private boolean hasCommon
    • Method Detail

      • findLongestCommonSequence

        Edit findLongestCommonSequence()
      • scanA

        private boolean scanA()
      • tryLongestCommonSequence

        private int tryLongestCommonSequence​(int bPtr)
      • recCreate

        private static long recCreate​(int next,
                                      int ptr,
                                      int cnt)
      • recNext

        private static int recNext​(long rec)
      • recPtr

        private static int recPtr​(long rec)
      • recCnt

        private static int recCnt​(long rec)
      • tableBits

        private static int tableBits​(int sz)