Metrics
Similarity
To customize the similarity calculation, subclass the Similarity class and override its cmp method. The comparer calls cmp to calculate the similarity between two files.
JCompare.metrics.Similarity
Base class for implementing similarity metrics.
This class provides a structure for implementing different similarity metrics. Subclasses should override the cmp method to provide their own similarity calculation logic.
Source code in JCompare/metrics.py
__init__
cmp
Compares two files for similarity.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file1 | tuple[str, str] | A tuple containing the full path and the relative path of the first file. | required |
file2 | tuple[str, str] | A tuple containing the full path and the relative path of the second file. | required |
Raises:
Type | Description |
---|---|
NotImplementedError | This method must be implemented in a subclass. |
Returns:
Name | Type | Description |
---|---|---|
float | float | The similarity score between the two files. The score is a float between 0 and 1, where 0 means completely dissimilar and 1 means identical. |
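A custom metric that follows the interface described above might look like the sketch below. The Similarity class here is a stand-in that only mirrors the documented shape (a cmp method taking two (full path, relative path) tuples), so the example runs without JCompare installed; in real use you would inherit from JCompare.metrics.Similarity instead. LineJaccardSimilarity is a hypothetical metric chosen for illustration and is not part of JCompare.

```python
class Similarity:
    """Stand-in for JCompare.metrics.Similarity (assumed shape, illustration only)."""

    def cmp(self, file1: tuple[str, str], file2: tuple[str, str]) -> float:
        raise NotImplementedError("Subclasses must implement cmp().")


class LineJaccardSimilarity(Similarity):
    """Hypothetical metric: Jaccard similarity over the sets of lines in each file."""

    @staticmethod
    def _read_lines(path: str) -> set[str]:
        # Undecodable bytes are replaced so binary files do not raise.
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            return set(f.read().splitlines())

    def cmp(self, file1: tuple[str, str], file2: tuple[str, str]) -> float:
        # Each argument is (full path, relative path); only the full path is read.
        full1, _rel1 = file1
        full2, _rel2 = file2
        lines1 = self._read_lines(full1)
        lines2 = self._read_lines(full2)
        union = lines1 | lines2
        if not union:
            return 1.0  # two empty files are treated as identical
        # Shared lines divided by all distinct lines, always within [0, 1].
        return len(lines1 & lines2) / len(union)
```

The returned value stays within the documented range of 0 to 1, with 1 for files that contain the same set of lines.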
Source code in JCompare/metrics.py
JCompare.CompressionSimilarity
Bases: Similarity
A class used to compare the similarity between two files based on their compression ratio.
This class inherits from the Similarity base class and overrides the cmp method to provide a similarity metric based on the compression ratio of the files. The compression ratio is calculated using a specified compression algorithm.
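This page does not state the exact formula CompressionSimilarity uses. As a rough illustration of how compressed sizes can yield a score between 0 and 1, the sketch below uses the normalized compression distance (NCD) with Python's standard lzma module. Both the NCD formula and the function names compressed_size and ncd_similarity are assumptions made for this example, not JCompare's implementation.

```python
import lzma


def compressed_size(data: bytes, preset: int = 6) -> int:
    """Return the size in bytes of data after xz/LZMA compression."""
    return len(lzma.compress(data, preset=preset))


def ncd_similarity(data1: bytes, data2: bytes) -> float:
    """Return 1 - NCD(data1, data2), clamped to the range [0, 1]."""
    if not data1 and not data2:
        return 1.0  # two empty inputs are treated as identical
    c1 = compressed_size(data1)
    c2 = compressed_size(data2)
    # If the inputs share content, compressing them together costs little
    # more than compressing the larger one alone.
    c12 = compressed_size(data1 + data2)
    ncd = (c12 - min(c1, c2)) / max(c1, c2)
    # Container overhead can push NCD slightly outside [0, 1], so clamp.
    return max(0.0, min(1.0, 1.0 - ncd))
```

Under this scheme, inputs that share most of their content score close to 1, while unrelated inputs score close to 0.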
Source code in JCompare/metrics.py
__init__
Initializes a new instance of the CompressionSimilarity class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
algorithm | str | The compression algorithm to use. Supported algorithms are 'lzma', 'lzma2', 'zstd', and 'brotli'. | required |
level | int | The compression level. If not provided, a default level will be used based on the algorithm. | None |
chunk_size | int | The size of the chunks to read from the files. Defaults to 64 kilobytes. | 64 KB |
Raises:
Type | Description |
---|---|
ValueError | If an unsupported algorithm is given. |
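To illustrate what the chunk_size parameter controls, the sketch below reads a file in fixed-size chunks and feeds them to a streaming compressor, so the whole file never has to sit in memory at once. It uses only the standard lzma module; the helper name compressed_size_of_file and the 64 * 1024 default are assumptions for this example and may differ from what JCompare does internally.

```python
import lzma
from typing import Optional


def compressed_size_of_file(
    path: str,
    chunk_size: int = 64 * 1024,   # assumed to match the documented 64 KB
    preset: Optional[int] = None,  # None lets lzma pick its default level
) -> int:
    """Return the xz-compressed size of a file, reading it chunk by chunk."""
    compressor = lzma.LZMACompressor(preset=preset)
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # end of file
            # compress() may return an empty bytes object while it buffers input.
            total += len(compressor.compress(chunk))
    # flush() emits whatever the compressor is still holding.
    total += len(compressor.flush())
    return total
```

A smaller chunk_size lowers peak memory use at the cost of more read calls.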
Source code in JCompare/metrics.py
cmp
Compares two files for similarity based on the compression ratio.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file1 | tuple[str, str] | A tuple containing the full path and the relative path of the first file. | required |
file2 | tuple[str, str] | A tuple containing the full path and the relative path of the second file. | required |
Raises:
Type | Description |
---|---|
ValueError | If an unsupported algorithm is given. |
Returns:
Name | Type | Description |
---|---|---|
float | float | The similarity score between the two files. The score is a float between 0 and 1, where 0 means completely dissimilar and 1 means identical. |