
Converting a string containing ISO 8859-1 hex character codes to UTF-8 in Java

I have a string that I believe contains some ISO-8859-1 hex character codes:

String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";

I want to turn it into this:

Áo thun bé gái cột dây xanh biển

I tried the following, but with no luck:

byte[] isoBytes = doc.getBytes("ISO-8859-1");
System.out.println(new String(isoBytes, "UTF-8"));

What is the correct way to convert it? Any help is much appreciated!


慕运维8079593
3 Answers

米琪卡哇伊

Assuming the #xNNNN; sequences are plain old Unicode character representations, I suggest the following approach.

    class Cvt {

        static String convert(String in) {
            String str = in;
            int curPos = 0;
            while (curPos < str.length()) {
                int j = str.indexOf("#x", curPos);
                if (j < 0) // no more #x
                    curPos = str.length();
                else {
                    // search for the terminator after this "#x", not after curPos
                    int k = str.indexOf(';', j + 2);
                    if (k < 0) // unterminated #x
                        curPos = str.length();
                    else { // convert #xNNNN;
                        int n = Integer.parseInt(str.substring(j + 2, k), 16);
                        // toChars also covers code points above U+FFFF (two chars)
                        String ch = new String(Character.toChars(n));
                        str = str.substring(0, j) + ch + str.substring(k + 1);
                        curPos = j + ch.length(); // after ch
                    }
                }
            }
            return str;
        }

        static public void main(String... args) {
            String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
            System.out.println(convert(doc));
        }
    }

This is very similar to the approach in the earlier answer, except that it assumes the characters are Unicode code points rather than 8859-1 code points.

The output is:

    Áo thun bé gái cột dây xanh biển

米脂

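In this case, the code really does obscure the requirement. The requirement is a bit uncertain, but it seems to be decoding a specialized Unicode character entity reference similar to those in HTML and XML, as noted in the comments.

It is also one of the rarer cases where the benefit of the regular expression engine outweighs whatever study is needed to understand the pattern language.

    String input = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";

    // Hex digits between "#x" and ";" are a Unicode codepoint value
    String text = java.util.regex.Pattern.compile("(#x([0-9A-Fa-f]+);)")
        .matcher(input)
        // group 2 is the matched input between the 2nd ( in the pattern and its paired )
        .replaceAll(x -> new String(Character.toChars(Integer.parseInt(x.group(2), 16))));

    System.out.println(text);

The matcher finds the substrings that match the pattern and are candidates for replacement. The replaceAll function replaces each of them with the computed Unicode code point. Because a Unicode code point may be encoded as two char (UTF-16) values, the replacement string has to be built from a char[].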
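For anyone who wants something to paste and run, here is a minimal sketch of the same regex idea as a complete class. It is a variant, not the snippet above: it uses an explicit find() loop with a StringBuilder instead of replaceAll with a lambda (which needs Java 9 or later), so decoded characters such as $ or \ are never treated as replacement syntax. The names EntityDecoder and decode are made up for illustration, and the limit of six hex digits and the choice to leave invalid code points untouched are assumptions, not something the question specifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDecoder {
    // "#x", then 1 to 6 hex digits (captured as group 1), then ";"
    private static final Pattern HEX_REF = Pattern.compile("#x([0-9A-Fa-f]{1,6});");

    // Replaces each #xNNNN; reference with the Unicode code point it names.
    static String decode(String input) {
        Matcher m = HEX_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the previous match
        while (m.find()) {
            out.append(input, last, m.start()); // copy the text between matches
            int codePoint = Integer.parseInt(m.group(1), 16);
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint); // emits two chars above U+FFFF
            } else {
                out.append(m.group()); // assumption: keep invalid references as-is
            }
            last = m.end();
        }
        out.append(input, last, input.length()); // copy the tail
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
        System.out.println(decode(doc));
    }
}
```

Text that does not match the pattern, such as an unterminated #x or non-hex digits, passes through unchanged.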

MM们

Strings in Java have no hex syntax. If you need to support that string format, I would write a helper function that parses the format and builds up a byte array, then decode that as ISO-8859-1.

    import java.io.ByteArrayOutputStream;

    public class translate {

        private static byte[] parseBytesWithHexLiterals(String s) throws Exception {
            final ByteArrayOutputStream baos = new ByteArrayOutputStream();

            while (!s.isEmpty()) {
                if (s.startsWith("#x")) {
                    s = s.substring(2);
                    // Each pair of hex digits becomes one byte, so #x1ED9; yields
                    // the two bytes 0x1E and 0xD9; an odd number of digits fails.
                    while (s.charAt(0) != ';') {
                        int i = Integer.parseInt(s.substring(0, 2), 16);
                        baos.write(i);
                        s = s.substring(2);
                    }
                } else {
                    baos.write(s.substring(0, 1).getBytes("US-ASCII")[0]);
                }
                s = s.substring(1); // drop the ';' or the character just copied
            }

            return baos.toByteArray();
        }

        public static void main(String[] args) throws Exception {
            String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
            byte[] parsedAsISO88591 = parseBytesWithHexLiterals(doc);
            doc = new String(parsedAsISO88591, "ISO-8859-1");
            System.out.println(doc); // Print out the string, which is in Unicode internally.
            byte[] asUTF8 = doc.getBytes("UTF-8"); // Get a UTF-8 version of the string.
        }
    }
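The final getBytes("UTF-8") step is the only point at which UTF-8 actually enters the picture, since a Java String is UTF-16 internally. A small sketch of that step on an already-decoded string follows. The class name Utf8Demo and the helper toHex are hypothetical, introduced only to make the bytes visible. It also shows why ISO-8859-1 cannot carry this text: a character such as ộ (U+1ED9) has no ISO-8859-1 byte, so getBytes substitutes '?'.

```java
import java.nio.charset.StandardCharsets;

class Utf8Demo {
    // Formats bytes as uppercase hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String decoded = "\u00C1o c\u1ED9t"; // "Áo cột", already a normal Java String

        // Encoding to UTF-8 happens only when bytes are requested.
        byte[] utf8 = decoded.getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(utf8)); // C3816F2063E1BB9974

        // ISO-8859-1 has no byte for U+1ED9, so it is replaced by '?' (0x3F).
        byte[] latin1 = decoded.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(toHex(latin1)); // C16F20633F74

        // Decoding the UTF-8 bytes as UTF-8 gives back the same string.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(decoded)); // true
    }
}
```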