Core API
Compare by Similarity
The similarity calculation is customizable; visit the metrics section for more information.
JCompare.similarity.find_similar_files_pairwise
find_similar_files_pairwise(folder1: Folder, folder2: Folder, threshold: float, same_parent_only: bool, comparer: Similarity, mode: int) -> dict[str, list[tuple[str, float]]]
Finds similar files between two folders in a pairwise manner.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder1 | Folder | The first folder object, which contains the files to be compared. | required |
folder2 | Folder | The second folder object, which contains the files to be compared. | required |
threshold | float | The similarity threshold. Only pairs of files with a similarity score equal to or above this threshold will be included in the result. | required |
same_parent_only | bool | If set to True, only files with the same parent directory will be compared. | required |
comparer | Similarity | The similarity comparer object used to compare the files. | required |
mode | int | The mode of operation. If set to SYNC, the function will use synchronous I/O. If set to ASYNC, the function will use asynchronous I/O. If set to ASYNC_AND_MULTIPROCESS, the function will use both asynchronous I/O and multiprocessing. | required |
Raises:
Type | Description |
---|---|
ValueError | If an invalid mode is given. |
Returns:
Type | Description |
---|---|
dict[str, list[tuple[str, float]]] | A dictionary where each key is the relative path of a file in the first folder and each value is a list of tuples. Each tuple contains the relative path of a similar file in the second folder and the similarity score. |
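To make the threshold semantics and the return shape concrete, here is a minimal sketch that does not use JCompare at all. It assumes files are plain strings held in dictionaries keyed by relative path, and it substitutes `difflib.SequenceMatcher` for the `Similarity` comparer; the name `similar_pairs_sketch` is hypothetical and this is not the library's implementation.

```python
from difflib import SequenceMatcher
from posixpath import dirname


def similar_pairs_sketch(
    files1: dict[str, str],
    files2: dict[str, str],
    threshold: float,
    same_parent_only: bool = False,
) -> dict[str, list[tuple[str, float]]]:
    """Illustrative stand-in: pairwise comparison with a score threshold."""
    result: dict[str, list[tuple[str, float]]] = {}
    for path1, text1 in files1.items():
        for path2, text2 in files2.items():
            # Optionally skip pairs that live in different parent directories.
            if same_parent_only and dirname(path1) != dirname(path2):
                continue
            # Assumption: difflib's ratio stands in for the comparer's score.
            score = SequenceMatcher(None, text1, text2).ratio()
            # Scores equal to or above the threshold are kept.
            if score >= threshold:
                result.setdefault(path1, []).append((path2, score))
    return result


# Made-up in-memory "folders" keyed by relative path.
folder1 = {"docs/a.txt": "hello world", "src/b.py": "print('hi')"}
folder2 = {"docs/a_copy.txt": "hello world", "lib/c.py": "import os"}
matches = similar_pairs_sketch(folder1, folder2, threshold=0.9)
print(matches)  # {'docs/a.txt': [('docs/a_copy.txt', 1.0)]}
```

Files in the first folder with no match at or above the threshold simply do not appear as keys in this sketch; whether JCompare behaves the same way is not stated here.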
JCompare.similarity.find_dissimilar_files_pairwise
find_dissimilar_files_pairwise(folder1: Folder, folder2: Folder, threshold: float, same_parent_only: bool, comparer: Similarity, mode: int) -> dict[str, Union[list[str], bool]]
Finds dissimilar files between two folders in a pairwise manner.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder1 | Folder | The first folder object, which contains the files to be compared. | required |
folder2 | Folder | The second folder object, which contains the files to be compared. | required |
threshold | float | The similarity threshold. Only pairs of files with a similarity score below this threshold will be included in the result. | required |
same_parent_only | bool | If set to True, only files with the same parent directory will be compared. | required |
comparer | Similarity | The similarity comparer object used to compare the files. | required |
mode | int | The mode of operation. If set to SYNC, the function will use synchronous I/O. If set to ASYNC, the function will use asynchronous I/O. If set to ASYNC_AND_MULTIPROCESS, the function will use both asynchronous I/O and multiprocessing. | required |
Raises:
Type | Description |
---|---|
ValueError | If an invalid mode is given. |
Returns:
Type | Description |
---|---|
dict[str, Union[list[str], bool]] | A dictionary with three keys: 'folder1', 'folder2', and 'is_similar'. The value of 'folder1' is a list of the relative paths of the dissimilar files in the first folder. The value of 'folder2' is a list of the relative paths of the dissimilar files in the second folder. The value of 'is_similar' is a boolean indicating whether the two folders are similar. |
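The report described above can be consumed along these lines. This is a sketch over a hand-written dictionary in the documented shape, not output captured from JCompare; the helper name `summarize` and the sample paths are made up.

```python
from typing import Union

Report = dict[str, Union[list[str], bool]]


def summarize(report: Report) -> list[str]:
    """Turn a dissimilarity report into human-readable lines."""
    if report["is_similar"]:
        return ["folders are similar"]
    lines = [f"folder1: {path}" for path in report["folder1"]]
    lines += [f"folder2: {path}" for path in report["folder2"]]
    return lines


# Hand-written sample in the documented shape; the paths are hypothetical.
sample: Report = {
    "folder1": ["notes/draft.txt"],
    "folder2": ["notes/final.txt", "img/logo.png"],
    "is_similar": False,
}
summary = summarize(sample)
for line in summary:
    print(line)
```

Checking the 'is_similar' flag first avoids iterating over the two lists when there is nothing to report.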
Compare by Hash
JCompare.hash.find_identical_files
find_identical_files(folder1: Folder, folder2: Folder, same_parent_only: bool, hash_algorithm: tuple[str]) -> dict[str, list[str]]
Finds identical files between two folders based on their hash values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder1 | Folder | The first folder object, which contains the files to be compared. | required |
folder2 | Folder | The second folder object, which contains the files to be compared. | required |
same_parent_only | bool | If set to True, only files with the same parent folder will be compared. | required |
hash_algorithm | tuple[str] | A tuple of strings specifying the names of the hash algorithms to use. | required |
Returns:
Type | Description |
---|---|
dict[str, list[str]] | A dictionary mapping the relative paths of the identical files in the first folder to lists of the relative paths of the identical files in the second folder. |
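As an illustration of hash-based matching, the sketch below groups in-memory byte strings by their digests using the standard library's `hashlib`. It assumes that when several algorithm names are given, two files count as identical only if every digest agrees; that reading, the name `identical_files_sketch`, and the sample data are assumptions, and `same_parent_only` is left out for brevity.

```python
import hashlib


def digest(data: bytes, hash_algorithm: tuple[str, ...]) -> tuple[str, ...]:
    """Compute one hex digest per requested algorithm name."""
    return tuple(hashlib.new(name, data).hexdigest() for name in hash_algorithm)


def identical_files_sketch(
    files1: dict[str, bytes],
    files2: dict[str, bytes],
    hash_algorithm: tuple[str, ...] = ("sha256",),
) -> dict[str, list[str]]:
    """Illustrative stand-in: map folder1 paths to folder2 paths with equal digests."""
    # Index the second folder by digest so each lookup is a dictionary access.
    by_digest: dict[tuple[str, ...], list[str]] = {}
    for path2, data in files2.items():
        by_digest.setdefault(digest(data, hash_algorithm), []).append(path2)

    result: dict[str, list[str]] = {}
    for path1, data in files1.items():
        matches = by_digest.get(digest(data, hash_algorithm))
        if matches:
            result[path1] = list(matches)
    return result


# Made-up contents keyed by relative path.
folder1 = {"a.bin": b"abc", "b.bin": b"xyz"}
folder2 = {"copy/a.bin": b"abc", "copy/a2.bin": b"abc", "c.bin": b"123"}
identical = identical_files_sketch(folder1, folder2, ("sha256", "blake2b"))
print(identical)  # {'a.bin': ['copy/a.bin', 'copy/a2.bin']}
```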
JCompare.hash.find_different_files
find_different_files(folder1: Folder, folder2: Folder, same_parent_only: bool, hash_algorithm: tuple[str]) -> dict[str, Union[list[str], bool]]
Finds different files between two folders based on their hash values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder1 | Folder | The first folder object, which contains the files to be compared. | required |
folder2 | Folder | The second folder object, which contains the files to be compared. | required |
same_parent_only | bool | If set to True, only files with the same parent folder will be compared. | required |
hash_algorithm | tuple[str] | A tuple of strings specifying the names of the hash algorithms to use. | required |
Returns:
Type | Description |
---|---|
dict[str, Union[list[str], bool]] | A dictionary with keys "folder1", "folder2", and "is_identical". The values for "folder1" and "folder2" are lists of the relative paths of the different files in the respective folders. The value for "is_identical" is a boolean indicating whether the two folders are identical. |
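One plausible way to arrive at a report of this shape is sketched below. It assumes a file is "different" when its digest does not occur anywhere in the other folder, and that the folders are identical when both lists come out empty. Both rules are assumptions made for illustration, as is the name `different_files_sketch`; a single algorithm name is used to keep the example short.

```python
import hashlib
from typing import Union


def different_files_sketch(
    files1: dict[str, bytes],
    files2: dict[str, bytes],
    algorithm: str = "sha256",
) -> dict[str, Union[list[str], bool]]:
    """Illustrative stand-in: list files whose content is missing from the other side."""

    def digest(data: bytes) -> str:
        return hashlib.new(algorithm, data).hexdigest()

    digests1 = {digest(data) for data in files1.values()}
    digests2 = {digest(data) for data in files2.values()}
    only1 = [path for path, data in files1.items() if digest(data) not in digests2]
    only2 = [path for path, data in files2.items() if digest(data) not in digests1]
    return {
        "folder1": only1,
        "folder2": only2,
        # Assumption: identical means neither side has unmatched content.
        "is_identical": not only1 and not only2,
    }


# Made-up contents keyed by relative path.
old = {"a.txt": b"same", "b.txt": b"old"}
new = {"a.txt": b"same", "b.txt": b"new", "c.txt": b"extra"}
report = different_files_sketch(old, new)
print(report)
# {'folder1': ['b.txt'], 'folder2': ['b.txt', 'c.txt'], 'is_identical': False}
```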
Compare by Directory Structure
JCompare.mcs.find_identical_files_by_mcs
find_identical_files_by_mcs(folder1: Folder, folder2: Folder, ignore_directory_names: bool = False, path: None | tuple[tuple[str], tuple[str]] = None) -> list[dict[str, list[str]]]
Finds identical files between two folders based on the maximum common subtree (MCS).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder1 | Folder | The first folder object, which contains the files to be compared. | required |
folder2 | Folder | The second folder object, which contains the files to be compared. | required |
ignore_directory_names | bool | If set to True, directory names will be ignored when comparing the folder structures. Defaults to False. | False |
path | None \| tuple[tuple[str], tuple[str]] | A tuple of two tuples, each containing the path to a subtree in the corresponding folder. If provided, only the specified subtrees will be compared. Defaults to None. | None |
Returns:
Type | Description |
---|---|
list[dict[str, list[str]]] | A list of dictionaries. Each dictionary represents a set of identical files in an MCS (there might be multiple), with the keys being the relative paths of the files in the first folder and the values being lists of the relative paths of the identical files in the second folder. |
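The two optional arguments can be pictured with a small model that is independent of JCompare. The sketch assumes a folder is a nested dictionary (directories map to dictionaries, files map to None) and that each inner tuple of `path` lists directory names from the folder root down to the subtree. It only illustrates subtree selection and name-insensitive structure comparison; it does not implement the MCS search, and the helper names `subtree` and `shape` are hypothetical.

```python
def subtree(tree: dict, components: tuple[str, ...]) -> dict:
    """Descend from the root along the given directory names."""
    node = tree
    for name in components:
        node = node[name]
    return node


def shape(tree: dict, ignore_directory_names: bool = False) -> tuple:
    """Build a canonical, comparable signature of a tree's structure."""
    files = tuple(sorted(name for name, child in tree.items() if child is None))
    dirs = []
    for name, child in tree.items():
        if child is not None:
            # Blank out the directory label when names should not matter.
            label = "" if ignore_directory_names else name
            dirs.append((label, shape(child, ignore_directory_names)))
    return (files, tuple(sorted(dirs)))


# Made-up folder trees: same layout, different directory names.
folder1 = {"src": {"core": {"a.py": None, "b.py": None}}}
folder2 = {"lib": {"engine": {"a.py": None, "b.py": None}}}

same_with_names = shape(folder1) == shape(folder2)
same_without_names = shape(folder1, True) == shape(folder2, True)
print(same_with_names, same_without_names)  # False True

# A `path`-style argument: one tuple of directory names per folder.
path = (("src", "core"), ("lib", "engine"))
print(subtree(folder1, path[0]) == subtree(folder2, path[1]))  # True
```

In this model file names are still compared when directory names are ignored; whether JCompare treats file names the same way is not specified above.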
JCompare.mcs.find_different_files_by_mcs
find_different_files_by_mcs(folder1: Folder, folder2: Folder, ignore_directory_names: bool = False, path: None | tuple[tuple[str], tuple[str]] = None) -> list[dict[str, list[str] | str, bool]]
Finds different files between two folders based on the maximum common subtree (MCS).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder1 | Folder | The first folder object, which contains the files to be compared. | required |
folder2 | Folder | The second folder object, which contains the files to be compared. | required |
ignore_directory_names | bool | If set to True, directory names will be ignored when comparing the folder structures. Defaults to False. | False |
path | None \| tuple[tuple[str], tuple[str]] | A tuple of two tuples, each containing the path to a subtree in the corresponding folder. If provided, only the specified subtrees will be compared. Defaults to None. | None |
Returns:
Type | Description |
---|---|
list[dict[str, list[str] \| str, bool]] | A list of dictionaries. Each dictionary describes the differing files for one MCS (there might be multiple): it holds the relative paths of the differing files in the first and second folders, together with a boolean indicating whether the files are identical. |