Java: How can I search a folder for duplicate files not only by name but also by size and content?

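I want to create a Java application that identifies duplicate files. So far I can only find duplicates by name, but I also need to compare size, file type, and maybe content. This is my code so far, using a HashMap: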


public static void find(Map<String, List<String>> lists, File dir) {
    for (File f : dir.listFiles()) {
        if (f.isDirectory()) {
            find(lists, f);
        } else {
            String hash = f.getName() + f.length();
            List<String> list = lists.get(hash);
            if (list == null) {
                list = new LinkedList<String>();
                lists.put(hash, list);
            }
            list.add(f.getAbsolutePath());
        }
    }
}


慕斯王
Views: 127, Answers: 4
4 Answers

白衣染霜花

I used MessageDigest, checked some files, and found duplicates according to all the criteria I listed in the title and description. Thank you all.

    private static MessageDigest messageDigest;
    static {
        try {
            messageDigest = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("cannot initialize SHA-512 hash function", e);
        }
    }

This is the result after implementing it in the duplicate search code:

    public static void find(Map<String, List<String>> lists, File dir) {
        for (File f : dir.listFiles()) {
            if (f.isDirectory()) {
                find(lists, f);
            } else {
                try {
                    // Read the whole file. Files.readAllBytes reads until the end and
                    // closes the file, unlike a single InputStream.read call, which may
                    // return before the buffer is full.
                    byte[] fileData = Files.readAllBytes(f.toPath());
                    // Create the unique hash ID for the current file
                    String hash = new BigInteger(1, messageDigest.digest(fileData)).toString(16);
                    List<String> list = lists.get(hash);
                    if (list == null) {
                        list = new LinkedList<String>();
                    }
                    // Add the path to the list
                    list.add(f.getAbsolutePath());
                    // Put the updated list into the hash table
                    lists.put(hash, list);
                } catch (IOException e) {
                    throw new RuntimeException("cannot read file " + f.getAbsolutePath(), e);
                }
            }
        }
    }
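The code in this answer loads each file completely into a byte array before hashing it, which can exhaust memory on large files. A minimal sketch of hashing in fixed-size chunks instead is shown here. The class name `StreamingHash`, the method name `sha512Hex`, and the 8 KiB buffer size are assumptions made for illustration and are not part of the original answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class StreamingHash {

    // Hypothetical helper: feeds the file to the digest chunk by chunk,
    // so only one buffer is held in memory at a time.
    static String sha512Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available", e);
        }
        byte[] buffer = new byte[8192]; // assumed buffer size
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n); // only the bytes actually read
            }
        }
        // Pad to 128 hex digits so leading zero bytes are not dropped
        return String.format("%0128x", new BigInteger(1, digest.digest()));
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(sha512Hex(Paths.get(arg)) + "  " + arg);
        }
    }
}
```

The returned string could be used as the map key in `find` in place of the hash computed from the full byte array.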

动漫人物

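If 2 files with the same extension and the same file size are considered equal, it is simply a matter of creating an object that represents this "equality". So, you would write something like:

    public class FileEquality {
        private final String fileExtension;
        private final long fileSize;
        // constructor, toString, equals, hashCode, and getters here.
    }

(and fill in all the missing boilerplate: constructor, toString, equals, hashCode, and getters. See Project Lombok's @Value if you want to simplify this). You can get the file extension from a file name using fileName.lastIndexOf('.') and fileName.substring(lastIndex). With Lombok you would just write:

    @lombok.Value public class FileEquality {
        String fileExtension;
        long fileSize;
    }

Then use FileEquality objects as the keys in your hash map instead of strings.

However, just because you have, say, 'foo.txt' and 'bar.txt' that both happen to be 500 bytes in size does not mean these 2 files are duplicates. So you want the content involved too, but if you extend your FileEquality class to include the content of the file, two things come up:

1. If you are checking the content anyway, why do the size and the file extension matter? If the contents of foo.txt and bar.jpg are exactly identical, they are duplicates, aren't they? Why bother. You can carry the content as a byte[], but note that writing proper hashCode() and equals() implementations (which you need if you want to use this object as a hash map key) becomes a little trickier. Fortunately, Lombok's @Value gets this right, so I suggest you use it.

2. This means the entire file content sits in the JVM's process memory. Unless you are checking very small files, you will run out of memory. You can abstract this away somewhat by storing a hash of the content instead of the content itself. Search for how to compute the SHA-256 hash of a file in Java. Put this hash value in your FileEquality and you avoid the memory problem. It is theoretically possible for 2 files with different contents to hash to exactly the same SHA-256 value, but the odds of that are astronomically small, and more to the point, SHA-256 is designed so that it is not computationally feasible to deliberately craft 2 such files to mess with your application. Therefore, I suggest you just trust the hash :)

Of course, note that hashing an entire file requires reading the entire file, so if you run your duplicate finder on a directory containing 500GB of files, your application will need to read at least 500GB, which will take some time.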
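A minimal sketch of the key object this answer describes, written without Lombok, might look like the following. The `contentHash` field is assumed to be a hex digest computed elsewhere (for example a SHA-256 of the file), and the helper `extensionOf` and the demo class are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class FileEquality {
    private final String fileExtension;
    private final long fileSize;
    private final String contentHash; // assumed: hex digest computed elsewhere

    FileEquality(String fileName, long fileSize, String contentHash) {
        this.fileExtension = extensionOf(fileName);
        this.fileSize = fileSize;
        this.contentHash = contentHash;
    }

    // Returns the extension including the dot, or "" when the name has no dot.
    static String extensionOf(String fileName) {
        int lastIndex = fileName.lastIndexOf('.');
        return lastIndex < 0 ? "" : fileName.substring(lastIndex);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof FileEquality)) return false;
        FileEquality other = (FileEquality) obj;
        return fileSize == other.fileSize
                && fileExtension.equals(other.fileExtension)
                && contentHash.equals(other.contentHash);
    }

    @Override
    public int hashCode() {
        return Objects.hash(fileExtension, fileSize, contentHash);
    }

    @Override
    public String toString() {
        return "FileEquality(" + fileExtension + ", " + fileSize + ", " + contentHash + ")";
    }
}

class FileEqualityDemo {
    public static void main(String[] args) {
        Map<FileEquality, List<String>> groups = new HashMap<>();
        // The hash strings are placeholders, not real digests.
        String[][] files = {
            {"/a/foo.txt", "500", "aaaa"},
            {"/b/bar.txt", "500", "aaaa"},
            {"/c/baz.txt", "500", "bbbb"},
        };
        for (String[] f : files) {
            FileEquality key = new FileEquality(f[0], Long.parseLong(f[1]), f[2]);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(f[0]);
        }
        groups.forEach((key, paths) -> System.out.println(key + " -> " + paths));
    }
}
```

Dropping `fileExtension` from the key would treat foo.txt and bar.jpg with identical content as duplicates, which is the alternative the answer raises in its first point.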

偶然的你

I wrote an application like this a long time ago, and I found some of its source code in case you want to learn from it. This method works by comparing the bytes of the two files.

    public static boolean checkBinaryEquality(File file1, File file2) {
        if (file1.length() != file2.length()) return false;
        try (FileInputStream f1 = new FileInputStream(file1);
             FileInputStream f2 = new FileInputStream(file2)) {
            byte[] bus1 = new byte[1024];
            byte[] bus2 = new byte[1024];
            int n1;
            // readNBytes (Java 9+) fills the buffer unless the end of the file is
            // reached, so both streams stay aligned; it returns 0 at end of file
            while ((n1 = f1.readNBytes(bus1, 0, bus1.length)) > 0) {
                int n2 = f2.readNBytes(bus2, 0, bus2.length);
                if (n1 != n2) return false;
                // compare only the bytes actually read, one by one;
                // any mismatch means the files are not equal
                for (int i = 0; i < n1; i++)
                    if (bus1[i] != bus2[i])
                        return false;
            }
            // passed
            return true;
        } catch (IOException exp) {
            // problems occurred, so let's consider them not equal
            return false;
        }
    }

Combine this method with the name and extension checks and you are good to go.
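One way a byte-by-byte comparison like this could be combined with a cheaper pre-check is sketched here: files are first bucketed by size, and only files in the same bucket are compared by content. This is an illustration under assumptions, not the answerer's application. The class `SizeThenBytes` and both method names are hypothetical, and the bucket key uses size only (an extension could be added to the key in the same way).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SizeThenBytes {

    // Compares two files byte by byte through buffered streams.
    static boolean sameContent(Path a, Path b) throws IOException {
        if (Files.size(a) != Files.size(b)) return false;
        try (InputStream x = new BufferedInputStream(Files.newInputStream(a));
             InputStream y = new BufferedInputStream(Files.newInputStream(b))) {
            int bx;
            while ((bx = x.read()) != -1) {
                if (bx != y.read()) return false;
            }
            return y.read() == -1; // both streams must end together
        }
    }

    // Returns groups of two or more files with identical content.
    static List<List<Path>> duplicateGroups(List<Path> files) throws IOException {
        // Step 1: bucket by size, which needs no file reads
        Map<Long, List<Path>> bySize = new TreeMap<>();
        for (Path p : files) {
            bySize.computeIfAbsent(Files.size(p), k -> new ArrayList<>()).add(p);
        }
        // Step 2: inside each bucket, cluster files by actual content
        List<List<Path>> result = new ArrayList<>();
        for (List<Path> sameSize : bySize.values()) {
            List<List<Path>> clusters = new ArrayList<>();
            for (Path p : sameSize) {
                List<Path> home = null;
                for (List<Path> cluster : clusters) {
                    if (sameContent(cluster.get(0), p)) {
                        home = cluster;
                        break;
                    }
                }
                if (home == null) {
                    home = new ArrayList<>();
                    clusters.add(home);
                }
                home.add(p);
            }
            for (List<Path> cluster : clusters) {
                if (cluster.size() > 1) result.add(cluster);
            }
        }
        return result;
    }
}
```

Files whose size is unique are never opened, so the content comparison only runs where a duplicate is still possible.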

慕码人8056858

Copy-paste example.

Create a class that extends File:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Arrays;

    public class MyFile extends File {
        private static final long serialVersionUID = 1L;

        public MyFile(final String pathname) {
            super(pathname);
        }

        // Two MyFile objects are equal when content, name, and length all match
        @Override
        public boolean equals(final Object obj) {
            if (this == obj) {
                return true;
            }
            // null check added: obj.getClass() would otherwise throw a NullPointerException
            if (obj == null || this.getClass() != obj.getClass()) {
                return false;
            }
            final MyFile other = (MyFile) obj;
            if (!Arrays.equals(this.getContent(), other.getContent())) {
                return false;
            }
            if (this.getName() == null) {
                if (other.getName() != null) {
                    return false;
                }
            } else if (!this.getName().equals(other.getName())) {
                return false;
            }
            if (this.length() != other.length()) {
                return false;
            }
            return true;
        }

        @Override
        public int hashCode() {
            final int prime = 31;
            int result = prime;
            result = (prime * result) + Arrays.hashCode(this.getContent());
            result = (prime * result) + ((this.getName() == null) ? 0 : this.getName().hashCode());
            result = (prime * result) + (int) (this.length() ^ (this.length() >>> 32));
            return result;
        }

        // Reads the whole file into memory (readAllBytes requires Java 9+)
        private byte[] getContent() {
            try (final FileInputStream fis = new FileInputStream(this)) {
                return fis.readAllBytes();
            } catch (final IOException e) {
                e.printStackTrace();
                return new byte[] {};
            }
        }
    }

Read the base directory:

    import java.io.File;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.Map.Entry;
    import java.util.Vector;

    public class FileTest {
        public FileTest() {
            super();
        }

        public static void main(final String[] args) {
            final Map<MyFile, List<MyFile>> duplicates = new HashMap<>();
            FileTest.handleDirectory(duplicates, new File("[path to base directory]"));
            final Iterator<Entry<MyFile, List<MyFile>>> iterator = duplicates.entrySet().iterator();
            while (iterator.hasNext()) {
                final Entry<MyFile, List<MyFile>> next = iterator.next();
                if (next.getValue().size() == 0) {
                    // no duplicates were found for this file
                    iterator.remove();
                } else {
                    System.out.println(next.getKey().getName() + " - " + next.getKey().getAbsolutePath());
                    for (final MyFile file : next.getValue()) {
                        System.out.println("        ->" + file.getName() + " - " + file.getAbsolutePath());
                    }
                }
            }
        }

        // Walks the directory tree; the first file seen becomes the key,
        // and later files equal to it are added to its list
        private static void handleDirectory(final Map<MyFile, List<MyFile>> duplicates, final File directory) {
            final File dir = directory;
            if (dir.isDirectory()) {
                final File[] files = dir.listFiles();
                for (final File file : files) {
                    if (file.isDirectory()) {
                        FileTest.handleDirectory(duplicates, file);
                        continue;
                    }
                    final MyFile myFile = new MyFile(file.getAbsolutePath());
                    if (!duplicates.containsKey(myFile)) {
                        duplicates.put(myFile, new Vector<>());
                    } else {
                        duplicates.get(myFile).add(myFile);
                    }
                }
            }
        }
    }