Parsing HTML with PHP DOMXpath

I want to use PHP and DOMXpath to retrieve event links and text from an external website. The external site's HTML structure is as follows:


<!-- first -->

<div class="col-sm-12 col-lg-3 me recording-item">

    <div class="recording-item-inner">

        <a class="col-sm-12 recording-name" href="/recordings/191">

        <div class="info">

            <b>Daily Event</b><br>

            <small>29 Jun 2020</small>

        </div></a>

    </div>

</div>

<!-- second -->

<div class="col-sm-12 col-lg-3 me recording-item">

    <div class="recording-item-inner">

        <a class="col-sm-12 recording-name" href="/recordings/190">

        <div class="info">

            <b>Daily Event B</b><br>

            <small>26 Jun 2020</small>

        </div></a>

    </div>

</div>

<!-- third -->

<div class="col-sm-12 col-lg-3 me recording-item">

    <div class="recording-item-inner">

        <a class="col-sm-12 recording-name" href="/recordings/189">

        <div class="info">

            <b>Daily Event C</b><br>

            <small>22 Jun 2020</small>

        </div></a>

    </div>

</div>

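I am trying to retrieve the name, date, and link of the 5 most recent events. At the moment I can only get the latest (single) event, using the code below.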


<?php

function getEvents()

{


    $page = file_get_contents('https://example.com/events');

    $rootUrl = 'https://example.com';


    @$doc = new DOMDocument();

    @$doc->loadHTML($page);


    $xpath = new DomXPath($doc);


    $nodeList = $xpath->query("//div[@class='recording-item']");

    $node = $nodeList->item(0);


    $href = $xpath->evaluate("string(//div[@class='recording-item-inner']/a/@href)");

    $eventUrl = $rootUrl . $href;


    return $eventUrl;


}

?>

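How can I modify this code so that it retrieves the details of the 5 most recent events and prints them as a simple list of items, like this: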


<ul>

  <li>Event 1 - [name], [date], [href]</li>

  <li>Event 2 - [name], [date], [href]</li>

  <li>Event 3 - [name], [date], [href]</li>

  <li>Event 4 - [name], [date], [href]</li>

  <li>Event 5 - [name], [date], [href]</li>

</ul>


慕雪6442864
1 answer

ITMISS

This can be done, but because XPath support in PHP is limited (DOMXPath implements XPath 1.0), it is not the most elegant solution. Start from $nodeList. Since your sample markup contains only 3 events, the code below outputs the requested details for the first two of them; on the real page you can raise $limit to 5:

$nodeList = $xpath->query('//div[./div[@class="recording-item-inner"]]//div[@class="info"]');
$limit = 2;
$i = 0;
// htmlspecialchars() makes the list tags show up as literal text in the browser
echo htmlspecialchars("<ul>", ENT_QUOTES);
echo "<br>";
foreach ($nodeList as $result) {
    if (++$i > $limit) break;
    echo htmlspecialchars("<li>", ENT_QUOTES);
    // childNodes[1] is <b> and childNodes[4] is <small>; these indexes
    // count the whitespace text nodes present in the sample markup
    echo "Event $i - " . $result->childNodes[1]->textContent . ", ";
    echo $result->childNodes[4]->textContent . ", ";
    // the parent of div.info is the <a> element that carries the link
    echo $result->parentNode->getAttribute('href');
    echo htmlspecialchars("</li>", ENT_QUOTES);
    echo "<br>";
}
echo htmlspecialchars("</ul>", ENT_QUOTES);

Output:

<ul>
<li>Event 1 - Daily Event, 29 Jun 2020, /recordings/191</li>
<li>Event 2 - Daily Event B, 26 Jun 2020, /recordings/190</li>
</ul>
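Indexing childNodes depends on the exact whitespace in the page, so a variant that queries relative to each event node may be easier to maintain. The following is only a minimal sketch under a few assumptions: the function names extractEvents and renderEventList are hypothetical, the class names and tags (recording-item, b, small, a) are taken from the sample markup in the question, and the inline $sample string stands in for the page that would normally come from file_get_contents().

```php
<?php
// Hypothetical helpers; the selectors assume the sample markup from the question.

function extractEvents(string $html, string $rootUrl, int $limit = 5): array
{
    $doc = new DOMDocument();
    // Buffer parser warnings instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // @class='recording-item' would not match, because the attribute holds
    // several class names; test for the single token instead.
    $items = $xpath->query(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' recording-item ')]"
    );

    $events = [];
    foreach ($items as $item) {
        if (count($events) >= $limit) {
            break;
        }
        // The leading "." makes each expression relative to the current item.
        $events[] = [
            'name' => trim($xpath->evaluate('string(.//b)', $item)),
            'date' => trim($xpath->evaluate('string(.//small)', $item)),
            'href' => $rootUrl . $xpath->evaluate('string(.//a/@href)', $item),
        ];
    }
    return $events;
}

function renderEventList(array $events): string
{
    $out = "<ul>\n";
    foreach ($events as $i => $event) {
        $out .= sprintf(
            "  <li>Event %d - %s, %s, %s</li>\n",
            $i + 1,
            htmlspecialchars($event['name'], ENT_QUOTES),
            htmlspecialchars($event['date'], ENT_QUOTES),
            htmlspecialchars($event['href'], ENT_QUOTES)
        );
    }
    return $out . "</ul>\n";
}

// Stand-in for the downloaded page (two events copied from the question).
$sample = <<<'HTML'
<div class="col-sm-12 col-lg-3 me recording-item">
    <div class="recording-item-inner">
        <a class="col-sm-12 recording-name" href="/recordings/191">
        <div class="info">
            <b>Daily Event</b><br>
            <small>29 Jun 2020</small>
        </div></a>
    </div>
</div>
<div class="col-sm-12 col-lg-3 me recording-item">
    <div class="recording-item-inner">
        <a class="col-sm-12 recording-name" href="/recordings/190">
        <div class="info">
            <b>Daily Event B</b><br>
            <small>26 Jun 2020</small>
        </div></a>
    </div>
</div>
HTML;

echo renderEventList(extractEvents($sample, 'https://example.com'));
```

Passing the event node as the second argument to evaluate() is what keeps each lookup scoped to one event; without it, an expression starting with // is evaluated against the whole document, which is why the original code always returned the first link. The default limit of 5 matches the number of events asked for in the question.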