Converting a string to XML using Java

I extracted the following data from a PDF as a string (note the uneven spacing and line breaks):


 Virtual Salary                                 25,100.00   EIS EE Contr.                                       7.90
 Virtual Car Allowance                           1,600.00   EPF Employee Contr.                             2,937.00
 Payment Received(Oversea)                       4,265.01   SOCSO Employee Contr.                              19.75

How can I convert this string into XML like the following?


public void testMethod()
{
    String extractedTestFromPDF =
            " Virtual Salary                                 25,100.00   EIS EE Contr.                                       7.90\n" +
            "\t Virtual Car Allowance                           1,600.00   EPF Employee Contr.                             2,937.00\n" +
            " Payment Received(Oversea)                       4,265.01   SOCSO Employee Contr.                              19.75\n";
}

Expected XML:


<xml>
<Data>
    <Allowance>Virtual Salary</Allowance>
    <Allowance_Amount>25,100.00</Allowance_Amount>
</Data>
<Data>
    <Allowance>EIS EE Contr.</Allowance>
    <Allowance_Amount>7.90</Allowance_Amount>
</Data>
<Data>
    <Allowance>Virtual Car Allowance</Allowance>
    <Allowance_Amount>1,600.00</Allowance_Amount>
</Data>
...
</xml>


ITMISS
1 Answer

湖上湖

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

String fixedSizetoXML(String extractedTestFromPDF) {
    String[] lines = extractedTestFromPDF.split("\\R");
    // group 1: allowance name, group 2: amount (preceded by at least two whitespace characters)
    Pattern pattern = Pattern.compile("^\\s*(\\S.*?)\\s\\s+([-\\d,\\.]+)\\s+.*$");
    return "<?xml version=\"1.0\"?>\n<Xml>\n"
        + Stream.of(lines)
            .map(pattern::matcher)
            .filter(Matcher::find)
            .map(m -> String.format("<Data>\n"
                    + "    <Allowance>%s</Allowance>\n"
                    + "    <Allowance_Amount>%s</Allowance_Amount>\n"
                    + "</Data>\n",
                    m.group(1).trim(), m.group(2)))
            .collect(Collectors.joining(""))
        + "</Xml>\n";
}

I took the liberty of adding an XML declaration <?xml ...?> and, for clarity, changing the root element from xml to Xml.

These are records with fixed-length fields. Counting character positions is not entirely safe, though: the input contains a tab (\t), and special characters are a problem too: é may be a single character, but it may also be e followed by a zero-width combining ´. So I use a regular expression instead. It requires at least two whitespace characters before the amount, so the single spaces inside a name such as "EIS EE Contr." do not end the name.

Java 7 (a plain loop instead of streams; \R only exists from Java 8 on, so the lines are split on \r?\n):

String fixedSizetoXML(String extractedTestFromPDF) {
    String[] lines = extractedTestFromPDF.split("\\r?\\n");
    // group 1: allowance name, group 2: amount (preceded by at least two whitespace characters)
    Pattern pattern = Pattern.compile("^\\s*(\\S.*?)\\s\\s+([-\\d,\\.]+)\\s+.*$");
    StringBuilder sb = new StringBuilder(lines.length * 64);
    sb.append("<?xml version=\"1.0\"?>\n<Xml>\n");
    for (String line : lines) {
        Matcher m = pattern.matcher(line);
        if (m.find()) {
            String data = String.format("<Data>\n"
                    + "    <Allowance>%s</Allowance>\n"
                    + "    <Allowance_Amount>%s</Allowance_Amount>\n"
                    + "</Data>\n",
                    m.group(1).trim(), m.group(2));
            sb.append(data);
        }
    }
    sb.append("</Xml>\n");
    return sb.toString();
}
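The expected XML contains one <Data> element per name/amount pair, i.e. two per input line (left column first, then right). The following is a minimal sketch, not the answerer's code, that loops over every match on a line with `while (m.find())` and escapes `&`, `<` and `>` in the names. It assumes Java 8+ (for `\R`), that a name is separated from its amount by at least two whitespace characters, and that a name never contains two consecutive spaces. `PayslipXml`, `toXml` and `escape` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PayslipXml {

    // Assumption: name, then 2+ whitespace chars, then an amount like 25,100.00 or -7.90.
    // The reluctant (.*?) stops the name at the first run of 2+ whitespace chars.
    private static final Pattern ENTRY =
            Pattern.compile("(\\S.*?)\\s{2,}(-?[\\d,]+(?:\\.\\d+)?)(?=\\s|$)");

    // Minimal escaping for element text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String toXml(String text) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<Xml>\n");
        for (String line : text.split("\\R")) {
            Matcher m = ENTRY.matcher(line);
            // find() is called repeatedly, so both columns of a line are emitted
            while (m.find()) {
                sb.append("<Data>\n")
                  .append("    <Allowance>").append(escape(m.group(1).trim())).append("</Allowance>\n")
                  .append("    <Allowance_Amount>").append(m.group(2)).append("</Allowance_Amount>\n")
                  .append("</Data>\n");
            }
        }
        return sb.append("</Xml>\n").toString();
    }

    public static void main(String[] args) {
        String extractedTestFromPDF =
                " Virtual Salary                                 25,100.00   EIS EE Contr.                                       7.90\n"
              + "\t Virtual Car Allowance                           1,600.00   EPF Employee Contr.                             2,937.00\n"
              + " Payment Received(Oversea)                       4,265.01   SOCSO Employee Contr.                              19.75\n";
        System.out.print(toXml(extractedTestFromPDF));
    }
}
```

For the three sample lines this prints six <Data> elements in reading order. A name that itself contains a double space, or a standalone number such as a year followed by two spaces, would be split in the wrong place, so the assumption is worth checking against more of the PDF output.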
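The remark that counting positions is unsafe can be checked directly: "\u00E9" and "e\u0301" both render as é but have lengths 1 and 2, and a tab is one char that spans several display columns. If you still want to work with fixed column positions, here is a sketch that normalizes a line first: NFC composition via `java.text.Normalizer`, then tab expansion. Tab stops every 8 columns are an assumption (the PDF extractor may differ), and `ColumnNormalizer` / `normalizeForColumns` are hypothetical names. NFC only helps where a precomposed character exists, so this reduces the risk rather than removing it.

```java
import java.text.Normalizer;

public class ColumnNormalizer {

    // Assumption: tab stops every 8 columns.
    static final int TAB_WIDTH = 8;

    /** Hypothetical helper: compose characters (NFC), then expand tabs to spaces. */
    static String normalizeForColumns(String line) {
        // NFC turns "e" + U+0301 (combining acute accent) into the single char U+00E9
        String composed = Normalizer.normalize(line, Normalizer.Form.NFC);
        StringBuilder sb = new StringBuilder(composed.length());
        for (int i = 0; i < composed.length(); i++) {
            char c = composed.charAt(i);
            if (c == '\t') {
                // pad with spaces up to the next tab stop
                int spaces = TAB_WIDTH - (sb.length() % TAB_WIDTH);
                for (int k = 0; k < spaces; k++) {
                    sb.append(' ');
                }
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9"; // é as one char
        String decomposed = "e\u0301"; // e followed by a zero-width combining acute accent
        System.out.println(precomposed.length() + " vs " + decomposed.length()); // 1 vs 2
        System.out.println(normalizeForColumns(decomposed).length());            // 1
        System.out.println(normalizeForColumns("\t Virtual Car Allowance").indexOf('V')); // 9
    }
}
```

Column positions are still counted in Java chars here, so characters outside the Basic Multilingual Plane (surrogate pairs) would still count as two.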