Invalid UTF-8 encoding when processing an XML file

Write a small Java script to detect the problematic lines. Reading the file as ISO-8859-1 maps every byte to exactly one char, so `line.getBytes(StandardCharsets.ISO_8859_1)` gives back the original bytes of each line, which can then be decoded strictly as UTF-8. Note that `new String(bytes, StandardCharsets.UTF_8)` never throws: it silently replaces malformed input with U+FFFD. A `CharsetDecoder`, which reports errors by default, is used here instead.

    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.concurrent.atomic.AtomicInteger;

    AtomicInteger lineno = new AtomicInteger();
    Path path = Paths.get("... .xml");
    Files.lines(path, StandardCharsets.ISO_8859_1)
        .forEach(line -> {
            int no = lineno.incrementAndGet();
            byte[] b = line.getBytes(StandardCharsets.ISO_8859_1);
            try {
                // Strict decode: throws on malformed UTF-8.
                StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(b));
            } catch (CharacterCodingException e) {
                System.out.printf("[%d] %s%n%s%n", no, line, e.getMessage());
                //throw new IllegalStateException(e);
            }
        });

One would assume this is a data error. In general it could also be faulty buffered reading: when a multi-byte sequence is split at a buffer boundary, two invalid half sequences can result. That is unlikely in standard library code.

To make sure the decoding call is not optimized away by the JVM, use its result:

    int sowhat = Files.lines(path, StandardCharsets.ISO_8859_1)
        .mapToInt(line -> {
            int no = lineno.incrementAndGet();
            byte[] b = line.getBytes(StandardCharsets.ISO_8859_1);
            try {
                return StandardCharsets.UTF_8.newDecoder()
                        .decode(ByteBuffer.wrap(b)).length();
            } catch (CharacterCodingException e) {
                System.out.printf("[%d] %s%n%s%n", no, line, e.getMessage());
                throw new IllegalStateException(e); // Must throw or return int
            }
        }).sum();
    System.out.println("Ignore this: " + sowhat);

Could it be illegal XML characters (in version 1.0)?

    [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]

Strictly speaking, XML 1.0 forbids only the control characters below #x20 other than tab, line feed and carriage return; the ranges #x7F-#x84 and #x86-#x9F are discouraged rather than illegal, but they are still worth flagging.

    int sowhat = Files.lines(path, StandardCharsets.ISO_8859_1)
        .mapToInt(line -> {
            int no = lineno.incrementAndGet();
            byte[] b = line.getBytes(StandardCharsets.ISO_8859_1);
            if (!legal(b)) {
                System.out.printf("[%d] %s%n", no, line);
                throw new IllegalStateException(
                        "Suspicious XML character in line " + no);
            }
            return b.length; // Must throw or return int
        }).sum();

    static boolean legal(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        for (char ch : s.toCharArray()) {
            int x = ch;
            if ((0 <= x && x <= 8)               // ASCII control chars
                    || (0xB <= x && x <= 0xC)
                    || (0xE <= x && x <= 0x1F)
                    || (0x7f <= x && x <= 0x84)  // DEL + Unicode control chars
                    || (0x86 <= x && x <= 0x9F)) {
                return false;
            }
        }
        return true;
    }

If this does not work either, I have kept you busy long enough. Split the file and validate the parts.
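When splitting the file, it can help to know the byte offset of the first bad sequence rather than only the line number. The following is a minimal sketch, not part of the original answer: the class name `Utf8Check` and the method `firstMalformedOffset` are hypothetical, and it assumes the data fits in memory as a byte array.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class Utf8Check {

    /**
     * Returns the byte offset of the first malformed UTF-8 sequence,
     * or -1 if the whole array decodes cleanly.
     */
    static int firstMalformedOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024); // scratch buffer, reused
        while (true) {
            // endOfInput = true: a truncated trailing sequence counts as malformed.
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On error the input position is left at the start of the bad bytes.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without error
            }
            out.clear(); // overflow: discard decoded chars and continue
        }
    }

    public static void main(String[] args) {
        byte[] good = "abc".getBytes(StandardCharsets.UTF_8);
        // 0xE9 is "é" in ISO-8859-1 but an incomplete lead byte in UTF-8.
        byte[] bad = {'a', 'b', (byte) 0xE9, 'c'};
        System.out.println(firstMalformedOffset(good));
        System.out.println(firstMalformedOffset(bad));
    }
}
```

For a real file, the bytes could come from `Files.readAllBytes(path)`; for very large files a streaming variant would be needed, which this sketch does not cover.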
