PHP: matching with preg_match_all

My task is to extract data from HTML: for each group of p tags in the HTML I need arrays of data. Here is a sample.


<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 63px; white-space: nowrap;">Title </p>

<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 349px; white-space: nowrap;">1234 </p>

<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 461px; white-space: nowrap;">$30 </p>

<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 563px; white-space: nowrap;">$10,000,000 </p>

<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 777px; white-space: nowrap;">3,000,000 </p>

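This HTML repeats many times with the "Title" and "1234" labels unchanged, then at some point switches to different labels. The "top" and "left" values keep changing throughout the HTML. I am already able to loop over the existing "Title" and "1234" labels in order to match that part.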


$title_label = 'Title';

$number_label = '1234';

preg_match_all('%\d{2}px; white-space: nowrap;">$title_label </p>%', $html_content, $array_match);

$array_cost_name = $array_match[1];

$array_return_name = $array_match[2];

$array_number_name = $array_match[3];

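I then need 3 arrays holding the last 3 label fields. For the sample HTML provided, I expect "$30", "$10,000,000" and "3,000,000" to be the first value of each array.


I don't know how to write a regular expression that handles this. Can anyone help?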


冉冉说
3 Answers

森林海

A regular expression is not the right tool for this task; an XML parser is much easier:

$html = '<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 63px; white-space: nowrap;">Title </p>
<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 349px; white-space: nowrap;">1234 </p>
<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 461px; white-space: nowrap;">$30 </p>
<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 563px; white-space: nowrap;">$10,000,000 </p>
<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 777px; white-space: nowrap;">3,000,000 </p>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);
$parts = $xml->xpath('//p[@class="ft01"]/text()'); // find all texts inside p tags with class ft01
$array_cost_name = (string) $parts[2];
$array_return_name = (string) $parts[3];
$array_number_name = (string) $parts[4];
echo $array_cost_name; // $30
echo $array_return_name; // $10,000,000
echo $array_number_name; // 3,000,000
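Since the question asks for three arrays across many repeated rows, the parser approach can be extended along these lines. This is only a sketch under stated assumptions: every row is assumed to consist of exactly five consecutive p.ft01 cells, the style attributes are shortened, and the second row's values are made up for illustration.

```php
<?php
// Hypothetical input: two rows of five cells each (second row is invented).
$html = <<<'HTML'
<p class="ft01" style="top: 103px; left: 63px;">Title </p>
<p class="ft01" style="top: 103px; left: 349px;">1234 </p>
<p class="ft01" style="top: 103px; left: 461px;">$30 </p>
<p class="ft01" style="top: 103px; left: 563px;">$10,000,000 </p>
<p class="ft01" style="top: 103px; left: 777px;">3,000,000 </p>
<p class="ft01" style="top: 121px; left: 63px;">Title </p>
<p class="ft01" style="top: 121px; left: 349px;">1234 </p>
<p class="ft01" style="top: 121px; left: 461px;">$45 </p>
<p class="ft01" style="top: 121px; left: 563px;">$20,000,000 </p>
<p class="ft01" style="top: 121px; left: 777px;">5,000,000 </p>
HTML;

$title_label  = 'Title';
$number_label = '1234';

// Parse the fragment; suppress libxml warnings that real-world HTML may trigger.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Collect the trimmed text of every p.ft01 cell in document order.
$xpath = new DOMXPath($doc);
$cells = [];
foreach ($xpath->query('//p[@class="ft01"]') as $p) {
    $cells[] = trim($p->textContent);
}

$array_cost_name   = [];
$array_return_name = [];
$array_number_name = [];

// Assumption: cells come in groups of five (two labels, then three values).
foreach (array_chunk($cells, 5) as $row) {
    if (count($row) < 5 || $row[0] !== $title_label || $row[1] !== $number_label) {
        continue; // skip incomplete rows and rows with other labels
    }
    $array_cost_name[]   = $row[2];
    $array_return_name[] = $row[3];
    $array_number_name[] = $row[4];
}

print_r($array_cost_name);
print_r($array_return_name);
print_r($array_number_name);
```

If the real document mixes in other p.ft01 elements between rows, the fixed chunk size of five would no longer hold and the grouping would need a different rule, for example grouping by the "top" value in the style attribute.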

守候你守候我

You can use a simple global regex such as /ace: nowrap;">(.*?) <\/p>/, or anything along those lines, to get the groups you are looking for, then drop the first 2 items to keep only the last 3. Below is an example and a link for testing it.

$html_content = '<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 63px; white-space: nowrap;">Title </p>
<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 349px; white-space: nowrap;">1234 </p>
<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 461px; white-space: nowrap;">$30 </p>
<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 563px; white-space: nowrap;">$10,000,000 </p>
<p class="ft01" style="margin: 0; padding: 0; font-size: 16px; font-family: Times; color: #000000; position: absolute; top: 103px; left: 777px; white-space: nowrap;">3,000,000 </p>';
// Lazy (.*?) so each match stops at the nearest closing tag, even if the HTML is on one line
preg_match_all('/ace: nowrap;">(.*?) <\/p>/', $html_content, $array_match);
// Index 1 holds the captured group; index 0 would hold the full matches
$array_match = array_slice($array_match[1], 2);
print_r($array_match);

http://sandbox.onlinephpfunctions.com/code/5ac69d44ff8168b4b21133c46dfa9c6db6986b6a
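To get the three arrays directly for a given pair of labels, one possible pattern anchors on the two label cells and captures the three cells that follow. This is a sketch rather than a definitive solution: the shortened style attributes, the second "Title" row and the "Other"/"5678" row are made-up sample data, and it assumes the five cells of a row always appear consecutively.

```php
<?php
// Hypothetical input: two rows for the wanted labels, one row with other labels.
$html_content = <<<'HTML'
<p class="ft01" style="top: 103px; left: 63px; white-space: nowrap;">Title </p>
<p class="ft01" style="top: 103px; left: 349px; white-space: nowrap;">1234 </p>
<p class="ft01" style="top: 103px; left: 461px; white-space: nowrap;">$30 </p>
<p class="ft01" style="top: 103px; left: 563px; white-space: nowrap;">$10,000,000 </p>
<p class="ft01" style="top: 103px; left: 777px; white-space: nowrap;">3,000,000 </p>
<p class="ft01" style="top: 121px; left: 63px; white-space: nowrap;">Title </p>
<p class="ft01" style="top: 121px; left: 349px; white-space: nowrap;">1234 </p>
<p class="ft01" style="top: 121px; left: 461px; white-space: nowrap;">$45 </p>
<p class="ft01" style="top: 121px; left: 563px; white-space: nowrap;">$20,000,000 </p>
<p class="ft01" style="top: 121px; left: 777px; white-space: nowrap;">5,000,000 </p>
<p class="ft01" style="top: 139px; left: 63px; white-space: nowrap;">Other </p>
<p class="ft01" style="top: 139px; left: 349px; white-space: nowrap;">5678 </p>
<p class="ft01" style="top: 139px; left: 461px; white-space: nowrap;">$99 </p>
<p class="ft01" style="top: 139px; left: 563px; white-space: nowrap;">$1 </p>
<p class="ft01" style="top: 139px; left: 777px; white-space: nowrap;">2 </p>
HTML;

$title_label  = 'Title';
$number_label = '1234';

// Building blocks: an opening <p ...> tag and a closing </p>, tolerant of whitespace.
$open  = '<p\b[^>]*>\s*';
$close = '\s*</p>\s*';

// Concatenate instead of interpolating inside single quotes, and escape the
// labels with preg_quote() so characters such as "$" or "." are taken literally.
$pattern = '~'
    . $open . preg_quote($title_label, '~') . $close
    . $open . preg_quote($number_label, '~') . $close
    . $open . '([^<]*?)' . $close   // group 1: cost
    . $open . '([^<]*?)' . $close   // group 2: return
    . $open . '([^<]*?)' . $close   // group 3: number
    . '~';

preg_match_all($pattern, $html_content, $array_match);

// With the default PREG_PATTERN_ORDER, index N holds all values of group N.
$array_cost_name   = $array_match[1];
$array_return_name = $array_match[2];
$array_number_name = $array_match[3];

print_r($array_cost_name);
print_r($array_return_name);
print_r($array_number_name);
```

Two details matter here. Variables are not expanded inside single-quoted PHP strings, so the labels have to be concatenated (or the string double-quoted). And a pattern needs capture groups for indexes 1 to 3 of the match array to exist at all.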

喵喔喔

With a regular expression, you can try the following:

preg_match_all('/<p[^>]*>(.*?)<\/p>/s', $html, $out);
$result = $out[1];

This captures all characters between the <p> and </p> tags.
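A small sketch of why quantifier greediness matters for this kind of pattern when several <p> elements sit on the same line. The two-element input string is a made-up example; the greedy form /<p.*>(.*)<\/p>/ is compared with a lazy form that also restricts the opening tag to [^>]*.

```php
<?php
// Hypothetical single-line input with two <p> elements.
$html = '<p class="a">one</p><p class="b">two</p>';

// Greedy: "<p.*>" runs as far right as it can, swallowing the first element.
preg_match_all('/<p.*>(.*)<\/p>/', $html, $greedy);

// Lazy: "[^>]*" stays inside the opening tag, "(.*?)" stops at the nearest </p>.
preg_match_all('/<p[^>]*>(.*?)<\/p>/s', $html, $lazy);

print_r($greedy[1]); // only "two" is captured
print_r($lazy[1]);   // both "one" and "two" are captured
```

When every <p> element is on its own line, as in the sample HTML of the question, both forms return the same captures, because "." does not match a newline unless the s modifier is set.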