Using StringUtils.substringBetween() to get the text between two tags

I have input like this:


<address>
    <addressLine>280 Flinders Mall</addressLine>
    <geoCodeGranularity>PROPERTY</geoCodeGranularity>
</address>
<address type="office">
    <addressLine>IT Park</addressLine>
    <geoCodeGranularity>office Space</geoCodeGranularity>
</address>

I want to capture everything between the <address> tags.


I tried:


File file = new File("test.html");
String testHtml = FileUtils.readFileToString(file);
String title = StringUtils.substringBetween(testHtml, "<address>", "</address>");

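This doesn't work in all cases, because the <address> tag may carry attributes (for example type="office"). How can I get the text between the tags in strings like this?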


MYYA
3 answers

梦里花落0921

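In general, you should not use regular expressions to parse HTML/XML content. Use a real parser instead, for example an XML DOM parser queried with XPath. Since you apparently can't use a parser, you can try the following option with Pattern and Matcher:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressRegex {
    public static void main(String[] args) {
        int count = 0;
        String input = "<address>\n"
                + "<addressLine>280 Flinders Mall</addressLine>\n"
                + "    <geoCodeGranularity>PROPERTY</geoCodeGranularity>\n"
                + "</address>\n"
                + "<address type=\"office\">\n"
                + "    <addressLine>IT Park</addressLine>\n"
                + "    <geoCodeGranularity>office Space</geoCodeGranularity>\n"
                + "</address>";
        // [^>]* lets the opening tag carry attributes; (.*?) lazily stops at the first </address>
        String pattern = "<address[^>]*>(.*?)</address>";
        // DOTALL makes '.' match newlines too, so multi-line content is captured
        Pattern r = Pattern.compile(pattern, Pattern.DOTALL);
        Matcher m = r.matcher(input);
        while (m.find()) {
            count += m.group(1).length();
            System.out.println("Found value: " + m.group(1));
        }
        System.out.println("count = " + count);
    }
}

This finds both <address> elements in your sample data, and the captured contents add up to a count of 198 characters. To make this work with a BufferedReader, you would have to make sure you read one complete <address> element at a time, because the pattern can only match when the opening and closing tags are in the same string.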
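The parser route recommended at the start of this answer could look roughly like the sketch below, which uses only the JDK's built-in DOM and XPath APIs (javax.xml.parsers, javax.xml.xpath). It assumes the file contains only these <address> fragments. They are not well-formed XML on their own because there is no single root element, so the sketch wraps them in a made-up <root> element. If test.html is real HTML with other markup, an HTML-aware parser would be needed instead. The class and method names (AddressXPath, readAddresses) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AddressXPath {

    // Returns "addressLine | geoCodeGranularity" for every <address> element, with or without attributes.
    public static List<String> readAddresses(String fragment) {
        try {
            // ASSUMPTION: the input is a bare sequence of <address> elements, so wrap it
            // in a hypothetical <root> element to make it a well-formed XML document.
            String xml = "<root>" + fragment + "</root>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // "//address" selects elements by name, so type="office" needs no special handling
            NodeList nodes = (NodeList) xpath.evaluate("//address", doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node address = nodes.item(i);
                // Relative XPath expressions are evaluated against the current <address> node
                String line = xpath.evaluate("addressLine", address).trim();
                String granularity = xpath.evaluate("geoCodeGranularity", address).trim();
                result.add(line + " | " + granularity);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the address fragments", e);
        }
    }

    public static void main(String[] args) {
        String input = "<address>\n"
                + "    <addressLine>280 Flinders Mall</addressLine>\n"
                + "    <geoCodeGranularity>PROPERTY</geoCodeGranularity>\n"
                + "</address>\n"
                + "<address type=\"office\">\n"
                + "    <addressLine>IT Park</addressLine>\n"
                + "    <geoCodeGranularity>office Space</geoCodeGranularity>\n"
                + "</address>";
        for (String a : readAddresses(input)) {
            System.out.println(a);
        }
    }
}
```

With the sample data this prints "280 Flinders Mall | PROPERTY" followed by "IT Park | office Space". The parser handles attributes, whitespace and entity decoding, which is the main reason to prefer it over string matching.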

BIG阳

You can read the file into a String and then find the start and end index of each substring you need, like this:

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Address {
    public static void main(String[] args) throws IOException {
        // Complete file path
        File dir = new File("\\..\\..\\Test.html");
        // Read the whole file into a String
        String data = new String(Files.readAllBytes(Paths.get(dir.getAbsolutePath())));
        // Loop over all the <address> tags in the file
        for (int index = data.indexOf("<address"); index >= 0;) {
            // Start index: just after the '>' that closes the opening tag, so any attributes are skipped
            int startIndex = data.indexOf(">", index + 1);
            ++startIndex;
            // End index: the next closing tag (search from startIndex so empty elements are handled)
            int indexOfEnd = data.indexOf("</address>", startIndex);
            if (indexOfEnd < 0) {
                break; // no closing tag: stop instead of throwing StringIndexOutOfBoundsException
            }
            String attributesString = data.substring(startIndex, indexOfEnd);
            // Replace this line with your own logic, e.g. work with attributesString.trim()
            System.out.println(attributesString);
            // The next address starts after the end of the current one
            index = data.indexOf("<address", indexOfEnd + 1);
        }
    }
}
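The same indexOf idea can be packaged as a reusable method. The sketch below (TagContents and contentsOf are hypothetical names) also checks the character right after the tag name, so that a search for <address does not stop at a longer tag such as <addressLine>, and it stops cleanly when a closing tag is missing. It still assumes that <address> elements are not nested inside each other and do not appear inside comments or CDATA sections.

```java
import java.util.ArrayList;
import java.util.List;

public class TagContents {

    // Returns the raw text between every <tag ...> and the following </tag>.
    public static List<String> contentsOf(String data, String tag) {
        List<String> result = new ArrayList<>();
        String open = "<" + tag;
        String close = "</" + tag + ">";
        int from = 0;
        while (true) {
            int index = data.indexOf(open, from);
            if (index < 0) {
                break; // no more opening tags
            }
            int afterName = index + open.length();
            // Only accept "<tag>" or "<tag attr=...>", not longer names like "<tagLine>"
            char next = afterName < data.length() ? data.charAt(afterName) : '\0';
            if (next != '>' && !Character.isWhitespace(next)) {
                from = afterName;
                continue;
            }
            int start = data.indexOf('>', afterName);
            if (start < 0) {
                break; // opening tag is never closed
            }
            start++; // content begins after '>'
            int end = data.indexOf(close, start);
            if (end < 0) {
                break; // missing closing tag
            }
            result.add(data.substring(start, end));
            from = end + close.length(); // continue after this element
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "<address><addressLine>280 Flinders Mall</addressLine></address>\n"
                + "<address type=\"office\"><addressLine>IT Park</addressLine></address>";
        for (String content : contentsOf(data, "address")) {
            System.out.println(content.trim());
        }
    }
}
```

Returning a List keeps the search separate from whatever you do with each match, and the same method works for other tag names.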

慕神8447489

while (scan.hasNextLine()) {
    parser = scan.nextLine();
    // System.out.println(parser);
    // Match both <address> and <address type="..."> but not <addressLine>
    String trimmed = parser.trim();
    if (trimmed.equals("<address>") || trimmed.startsWith("<address ")) {
        // Assumes <addressLine> is on the very next line
        parser = scan.nextLine();
        // System.out.println(parser);
        int startPosition = parser.indexOf("<addressLine>") + "<addressLine>".length();
        int endPosition = parser.indexOf("</addressLine>", startPosition);
        idNumber = parser.substring(startPosition, endPosition); // text of <addressLine>
        // Assumes <geoCodeGranularity> is on the line after that
        parser = scan.nextLine();
        int startPosition1 = parser.indexOf("<geoCodeGranularity>") + "<geoCodeGranularity>".length();
        int endPosition1 = parser.indexOf("</geoCodeGranularity>", startPosition1);
        time = parser.substring(startPosition1, endPosition1); // text of <geoCodeGranularity>
        parser = scan.nextLine();
……
The algorithm has to look something like this if you read the file line by line.
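A complete, runnable version of the line-by-line idea might look like the sketch below. It follows the hint from the first answer: buffer one complete <address ...> ... </address> block at a time (here with a BufferedReader), then extract the child values from that block. It assumes, as in the sample, that each opening and closing <address> tag sits on its own line; blank lines between elements are tolerated. AddressLineReader, readAddresses and valueOf are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class AddressLineReader {

    // Returns the trimmed text of <name>...</name> inside one block, or "" if the tag is missing.
    static String valueOf(String block, String name) {
        String open = "<" + name + ">";
        int start = block.indexOf(open);
        if (start < 0) {
            return "";
        }
        start += open.length();
        int end = block.indexOf("</" + name + ">", start);
        return end < 0 ? "" : block.substring(start, end).trim();
    }

    // Reads line by line; each result entry is {addressLine, geoCodeGranularity}.
    public static List<String[]> readAddresses(Reader source) {
        List<String[]> result = new ArrayList<>();
        StringBuilder block = null; // null means "not inside an <address> block"
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String t = line.trim();
                if (block == null && (t.equals("<address>") || t.startsWith("<address "))) {
                    block = new StringBuilder(); // start buffering; attributes are ignored
                } else if (block != null && t.equals("</address>")) {
                    String b = block.toString(); // one complete <address> block
                    result.add(new String[] {valueOf(b, "addressLine"), valueOf(b, "geoCodeGranularity")});
                    block = null;
                } else if (block != null) {
                    block.append(t).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "<address>\n"
                + "    <addressLine>280 Flinders Mall</addressLine>\n"
                + "    <geoCodeGranularity>PROPERTY</geoCodeGranularity>\n"
                + "</address>\n"
                + "<address type=\"office\">\n"
                + "    <addressLine>IT Park</addressLine>\n"
                + "    <geoCodeGranularity>office Space</geoCodeGranularity>\n"
                + "</address>";
        for (String[] a : readAddresses(new StringReader(input))) {
            System.out.println(a[0] + " | " + a[1]);
        }
    }
}
```

To read the real file, pass a FileReader (or Files.newBufferedReader) instead of the StringReader; the method itself does not care where the lines come from.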