Class SimilarityRenameDetector
-
Field Summary
private int bigFileThreshold: File size threshold (in bytes) for detecting renames.
private static final int BITS_PER_INDEX: Number of bits we need to express an index into src or dst list.
dsts: All destinations to consider looking for a rename.
private static final int INDEX_MASK
private long[] matrix: Matrix of all examined file pairs, and their scores.
out
private ContentSource.Pair reader
private int renameScore: Score a pair must exceed to be considered a rename.
private static final int SCORE_SHIFT
private boolean skipBinaryFiles: Skip content renames for binary files.
srcs: All sources to consider for copies or renames.
private boolean tableOverflow: Set if any SimilarityIndex.TableFullException occurs.
-
Constructor Summary
SimilarityRenameDetector(ContentSource.Pair reader, List<DiffEntry> srcs, List<DiffEntry> dsts)
-
Method Summary
private int buildMatrix
compactDstList(List<DiffEntry> in)
compactSrcList(List<DiffEntry> in)
(package private) void compute
private static int decodeFile(int v)
(package private) static int dstFile(long value)
(package private) static long encode(int score, int srcIdx, int dstIdx)
private static long encodeFile(int idx)
getLeftOverDestinations
getLeftOverSources
getMatches
private SimilarityIndex hash(ObjectLoader objectLoader)
private static boolean isFile
(package private) boolean isTableOverflow()
(package private) static int nameScore
private static int score(long value)
(package private) void setBigFileThreshold(int threshold)
(package private) void setRenameScore(int score)
(package private) void setSkipBinaryFiles(boolean value)
private long size(DiffEntry.Side side, DiffEntry ent)
(package private) static int srcFile(long value)
-
Field Details
-
BITS_PER_INDEX
private static final int BITS_PER_INDEX
Number of bits we need to express an index into src or dst list. This must be 28, giving us a limit of 2^28 (268,435,456) entries in either list, which across both lists is an insane limit of 536,870,912 file names being considered in a single rename pass. The other 8 bits are used to store the score, which stays at or below 127 so the long doesn't go negative.
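The bit budget described above can be checked with a little arithmetic. The following is only a sketch: the class name BitBudget and the helper names are hypothetical, and the score shift of 56 is an assumption derived from two 28-bit index groups, not a value stated on this page.

```java
// Illustrative sketch only; not the JGit implementation.
final class BitBudget {
    // Restated from the field description above.
    static final int BITS_PER_INDEX = 28;
    // A long has 64 bits; two index groups leave 8 bits for the score.
    static final int SCORE_BITS = Long.SIZE - 2 * BITS_PER_INDEX;
    // Assumption: the score sits directly above the two index groups.
    static final int SCORE_SHIFT = 2 * BITS_PER_INDEX;

    // Number of distinct indexes that fit in one 28-bit group.
    static long maxEntriesPerList() {
        return 1L << BITS_PER_INDEX;
    }

    // A value carrying only a score, with both index groups zero.
    static long scoreOnly(int score) {
        return ((long) score) << SCORE_SHIFT;
    }

    public static void main(String[] args) {
        System.out.println(SCORE_BITS);              // 8
        System.out.println(maxEntriesPerList());     // 268435456
        System.out.println(2 * maxEntriesPerList()); // 536870912
        System.out.println(scoreOnly(127) > 0);      // true: sign bit clear
        System.out.println(scoreOnly(128) < 0);      // true: sign bit set
    }
}
```

A score of 128 would occupy the top bit of the long, which is why the score has to stay at or below 127.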
-
INDEX_MASK
private static final int INDEX_MASK
-
SCORE_SHIFT
private static final int SCORE_SHIFT
-
reader
-
srcs
All sources to consider for copies or renames. A source is typically a DiffEntry.ChangeType.DELETE change, but could be another type when trying to perform copy detection concurrently with rename detection.
-
dsts
All destinations to consider looking for a rename. A destination is typically a DiffEntry.ChangeType.ADD, as the name has just come into existence, and we want to discover where its initial content came from.
-
matrix
private long[] matrix
Matrix of all examined file pairs, and their scores. The upper 8 bits of each long store the score, but the score is bounded to the range [0, 128) so that the highest bit is never set, and all entries are therefore positive.
List indexes to an element of srcs and dsts are encoded as the lower two groups of 28 bits, respectively, but the encoding is inverted, so that 0 is expressed as (1 << 28) - 1. This sorts lower list indices later in the matrix, giving precedence to files whose names sort earlier in the tree.
-
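The matrix layout can be illustrated with a small self-contained sketch. The method names mirror those in the method summary, but the bodies are assumptions written for illustration, not the JGit source. In particular, placing the source index in the higher 28-bit group and the destination index in the lower one is an assumed reading of "respectively".

```java
import java.util.Arrays;

// Illustrative sketch only; the bodies are assumptions, not the JGit source.
final class MatrixEncodingSketch {
    static final int BITS_PER_INDEX = 28;
    static final int INDEX_MASK = (1 << BITS_PER_INDEX) - 1;
    static final int SCORE_SHIFT = 2 * BITS_PER_INDEX;

    // Inverted encoding: index 0 becomes (1 << 28) - 1.
    static long encodeFile(int idx) {
        return INDEX_MASK - idx;
    }

    static int decodeFile(int v) {
        return INDEX_MASK - v;
    }

    // Assumed layout: [score: 8 bits][src: 28 bits][dst: 28 bits].
    static long encode(int score, int srcIdx, int dstIdx) {
        return (((long) score) << SCORE_SHIFT)
                | (encodeFile(srcIdx) << BITS_PER_INDEX)
                | encodeFile(dstIdx);
    }

    static int score(long value) {
        return (int) (value >>> SCORE_SHIFT);
    }

    static int srcFile(long value) {
        return decodeFile(((int) (value >>> BITS_PER_INDEX)) & INDEX_MASK);
    }

    static int dstFile(long value) {
        return decodeFile(((int) value) & INDEX_MASK);
    }

    public static void main(String[] args) {
        long[] matrix = {
            encode(90, 0, 3), encode(90, 2, 1), encode(60, 1, 0)
        };
        // Plain numeric sort: lower scores first; within one score,
        // lower list indexes come later because their encoding is larger.
        Arrays.sort(matrix);
        for (long cell : matrix) {
            System.out.println(score(cell) + ": "
                    + srcFile(cell) + " -> " + dstFile(cell));
        }
    }
}
```

Because every entry is positive, an ordinary ascending sort of the long[] leaves the highest-scoring pairs at the end, with ties resolved in favor of the lower list index.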
renameScore
private int renameScore
Score a pair must exceed to be considered a rename.
-
bigFileThreshold
private int bigFileThreshold
File size threshold (in bytes) for detecting renames. Files larger than this size will not be processed for renames.
-
skipBinaryFiles
private boolean skipBinaryFiles
Skip content renames for binary files.
-
tableOverflow
private boolean tableOverflow
Set if any SimilarityIndex.TableFullException occurs.
-
out
-
-
Constructor Details
-
SimilarityRenameDetector
SimilarityRenameDetector(ContentSource.Pair reader, List<DiffEntry> srcs, List<DiffEntry> dsts)
-
-
Method Details
-
setRenameScore
void setRenameScore(int score) -
setBigFileThreshold
void setBigFileThreshold(int threshold) -
setSkipBinaryFiles
void setSkipBinaryFiles(boolean value) -
compute
- Throws:
IOException
CanceledException
-
getMatches
-
getLeftOverSources
-
getLeftOverDestinations
-
isTableOverflow
boolean isTableOverflow() -
compactSrcList
-
compactDstList
-
buildMatrix
- Throws:
IOException
CanceledException
-
nameScore
-
hash
private SimilarityIndex hash(ObjectLoader objectLoader) throws IOException, SimilarityIndex.TableFullException -
size
- Throws:
IOException
-
score
private static int score(long value) -
srcFile
static int srcFile(long value) -
dstFile
static int dstFile(long value) -
encode
static long encode(int score, int srcIdx, int dstIdx) -
encodeFile
private static long encodeFile(int idx) -
decodeFile
private static int decodeFile(int v) -
isFile
-