Reading multiple XML files in Java

I have ~25000 XML files that I need to read in Java. This is my code:


private static void ProcessFile() {
    try {
        File fXmlFile = new File("C:/Users/Emolk/Desktop/000010.xml");
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(fXmlFile);

        doc.getDocumentElement().normalize();

        System.out.println("Root element :" + doc.getDocumentElement().getNodeName());

        NodeList nList = doc.getElementsByTagName("sindex");

        System.out.println("----------------------------");

        for (int temp = 0; temp < nList.getLength(); temp++) {
            Node nNode = nList.item(temp);

            System.out.println("");

            if (nNode.getNodeType() == Node.ELEMENT_NODE) {
                Element eElement = (Element) nNode;

                System.out.println("Name : " + eElement.getElementsByTagName("name").item(0).getTextContent());
                System.out.println("Count : " + eElement.getElementsByTagName("count").item(0).getTextContent());

                Entity CE = new Entity(eElement.getElementsByTagName("name").item(0).getTextContent(), Integer.parseInt(eElement.getElementsByTagName("count").item(0).getTextContent()));
                Entities.add(CE);
                System.out.println("Entity added! ");
            }
        }
        System.out.println(Entities);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

How can I read all 25000 files instead of just one?

I tried concatenating all the XML files into a single file using this tool: https://www.sobolsoft.com/howtouse/combine-xml-files.htm

But that gave me this error:

[Fatal Error] joined.xml:130:2: The markup in the document following the root element must be well-formed.


有只小跳蛙
2 answers

阿波罗的战车

If performance is not a concern, you can do the following:

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class ReadFiles {

        // Assumes Entity is your own class; Entities may be a list or another collection
        private static final List<Entity> Entities = new ArrayList<>();

        public static void main(String[] args) {
            File dir = new File("D:/Work"); // Directory where your files exist
            File[] files = dir.listFiles();
            if (files == null) {
                return; // Not a directory, or it could not be read
            }
            for (File file : files) {
                // You can validate the file name and extension if needed
                if (file.isFile() && file.getName().endsWith(".xml")) {
                    ProcessFile(file, Entities);
                }
            }
            System.out.println(Entities);
        }

        private static void ProcessFile(File fXmlFile, List<Entity> Entities) {
            try {
                DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
                DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
                Document doc = dBuilder.parse(fXmlFile);
                doc.getDocumentElement().normalize();
                System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
                NodeList nList = doc.getElementsByTagName("sindex");
                System.out.println("----------------------------");
                for (int temp = 0; temp < nList.getLength(); temp++) {
                    Node nNode = nList.item(temp);
                    System.out.println("");
                    if (nNode.getNodeType() == Node.ELEMENT_NODE) {
                        Element eElement = (Element) nNode;
                        System.out.println("Name : " + eElement.getElementsByTagName("name").item(0).getTextContent());
                        System.out.println("Count : " + eElement.getElementsByTagName("count").item(0).getTextContent());
                        Entity CE = new Entity(eElement.getElementsByTagName("name").item(0).getTextContent(), Integer.parseInt(eElement.getElementsByTagName("count").item(0).getTextContent()));
                        Entities.add(CE);
                        System.out.println("Entity added! ");
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
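The directory loop in this answer can be tried end to end without the asker's `Entity` class. The following is only a sketch under a few assumptions: the class name `XmlDirectoryReader`, the `demo()` helper, and the sample file contents are made up for illustration, each `sindex` element is assumed to have `name` and `count` children as in the question, and the results are collected as `name=count` strings instead of `Entity` objects. It also reuses a single `DocumentBuilder` for all files, which is fine as long as everything runs on one thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XmlDirectoryReader {

    // Parses every *.xml file in dir (in file-name order) and returns "name=count" strings.
    static List<String> readAll(Path dir) {
        List<String> results = new ArrayList<>();
        try (Stream<Path> paths = Files.list(dir)) {
            List<Path> xmlFiles = paths
                    .filter(p -> Files.isRegularFile(p) && p.getFileName().toString().endsWith(".xml"))
                    .sorted()
                    .collect(Collectors.toList());
            // One builder is enough here because all parsing happens on a single thread.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            for (Path file : xmlFiles) {
                Document doc = builder.parse(file.toFile());
                NodeList nodes = doc.getElementsByTagName("sindex");
                for (int i = 0; i < nodes.getLength(); i++) {
                    Element e = (Element) nodes.item(i);
                    String name = e.getElementsByTagName("name").item(0).getTextContent();
                    String count = e.getElementsByTagName("count").item(0).getTextContent();
                    results.add(name + "=" + Integer.parseInt(count.trim()));
                }
            }
        } catch (Exception e) {
            throw new RuntimeException("Failed to read XML files in " + dir, e);
        }
        return results;
    }

    // Hypothetical helper: writes a file with the given text content.
    private static void write(Path file, String content) throws IOException {
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical demo: creates two sample XML files plus one non-XML file, then reads them.
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("xml-demo");
            write(dir.resolve("000010.xml"),
                    "<root><sindex><name>alpha</name><count>3</count></sindex></root>");
            write(dir.resolve("000011.xml"),
                    "<root><sindex><name>beta</name><count>5</count></sindex></root>");
            write(dir.resolve("notes.txt"), "ignored because it is not an .xml file");
            return readAll(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Sorting the paths makes the processing order predictable, since the order in which a directory listing is returned is not guaranteed.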

小唯快跑啊

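To read multiple files, you need some kind of loop. You can scan a directory for all valid files:

    File folder = new File("path/to/directory");
    File[] files = folder.listFiles();
    for (int i = 0; i < files.length; i++) {
        // you can also filter for .xml if needed
        if (files[i].isFile()) {
            // parse the file
        }
    }

Next, you need to decide how to parse the files: sequentially or in parallel. Parallel parsing is usually much faster, because several threads parse files at the same time.

Single thread

You can reuse the code you have already written and iterate over the files:

    for (File file : files) {
        processFile(file, yourListOfEntities);
    }

Multiple threads

Get an ExecutorService and submit one task per file:

    ExecutorService service = Executors.newFixedThreadPool(5);
    for (File file : files) {
        service.execute(() -> processFile(file, yourListOfEntities));
    }

An important note here: ArrayList, the default List implementation, is not thread-safe. Because the list is used by multiple threads, you should synchronize access to it:

    List<Entity> synchronizedList = Collections.synchronizedList(yourListOfEntities);

Also, DocumentBuilder is not thread-safe and should be created once per thread. If you simply call your existing method, which creates a new builder on every call, you are fine; this note only matters if you decide to optimize that part.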
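The multithreaded variant can be put together as one runnable sketch. Everything specific in it is an assumption for illustration rather than part of the answer: the class name `ParallelXmlReader`, the `demo()` helper, the pool size of 4, the 10-minute timeout, the generated sample files, and the use of `name=count` strings in place of the asker's `Entity` class. It keeps one `DocumentBuilder` per worker thread in a `ThreadLocal`, collects results in a synchronized list, and waits for the pool to finish before reading the list.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ParallelXmlReader {

    // One DocumentBuilder per worker thread, because DocumentBuilder is not thread-safe.
    private static final ThreadLocal<DocumentBuilder> BUILDER = ThreadLocal.withInitial(() -> {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    });

    // Parses one file and appends a "name=count" string for each sindex element.
    static void processFile(File file, List<String> results) {
        try {
            Document doc = BUILDER.get().parse(file);
            NodeList nodes = doc.getElementsByTagName("sindex");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                String name = e.getElementsByTagName("name").item(0).getTextContent();
                String count = e.getElementsByTagName("count").item(0).getTextContent();
                results.add(name + "=" + Integer.parseInt(count.trim()));
            }
        } catch (Exception e) {
            e.printStackTrace(); // A broken file is reported and skipped.
        }
    }

    // Submits one task per file, waits for all of them, and returns the sorted results.
    static List<String> readAll(File[] files, int threads) throws InterruptedException {
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        ExecutorService service = Executors.newFixedThreadPool(threads);
        for (File file : files) {
            service.execute(() -> processFile(file, results));
        }
        service.shutdown(); // Accept no new tasks; already submitted tasks still run.
        if (!service.awaitTermination(10, TimeUnit.MINUTES)) {
            service.shutdownNow();
        }
        List<String> sorted = new ArrayList<>(results);
        Collections.sort(sorted); // Completion order varies between runs, so sort for stable output.
        return sorted;
    }

    // Hypothetical demo: generates 20 small XML files in a temp directory and reads them in parallel.
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("xml-parallel-demo");
            for (int i = 0; i < 20; i++) {
                String xml = String.format(
                        "<root><sindex><name>n%02d</name><count>%d</count></sindex></root>", i, i);
                Files.write(dir.resolve(String.format("%06d.xml", i)),
                        xml.getBytes(StandardCharsets.UTF_8));
            }
            File[] files = dir.toFile().listFiles((d, name) -> name.endsWith(".xml"));
            return readAll(files, 4);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Calling `shutdown()` followed by `awaitTermination(...)` matters here: without it, the list could be read while worker threads are still adding to it.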