How to extract data from this page using simplehtmldom

I'm not familiar with simplehtmldom, other than knowing to avoid it. So I'll offer a solution using PHP's built-in DOM classes instead:

    <?php
    libxml_use_internal_errors(true);

    // get the HTML
    $html = file_get_contents("https://benthamopen.com/browse-by-title/B/1/");

    // create a DOM object and load it up
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // create an XPath object and query it
    $xpath = new DOMXPath($dom);
    $elements = $xpath->query("//div[@style='padding:10px;']");

    // loop through the matches
    foreach ($elements as $el) {
        // skip elements without ISSN
        $text = trim($el->textContent);
        if (strpos($text, "ISSN") !== 0) {
            continue;
        }
        // get the first div inside this thing
        $div = $el->getElementsByTagName("div")->item(0);
        if ($div === null) {
            // nothing to read attributes from
            continue;
        }
        // dump it out
        printf("%s %s %s \n", str_replace("ISSN: ", "", $text), $div->getAttribute("data-title"), $div->getAttribute("data-url"));
    }

XPath can be a bit overwhelming, but for simple searches like this one it isn't much different from CSS selectors. Hopefully the code comments explain everything; if not, let me know!

Output:

    1874-1207 The Open Biomedical Engineering Journal https://benthamopen.com/TOBEJ/home/
    1874-1967 The Open Biology Journal https://benthamopen.com/TOBIOJ/home/
    1874-091X The Open Biochemistry Journal https://benthamopen.com/TOBIOCJ/home/
    1875-0362 The Open Bioinformatics Journal https://benthamopen.com/TOBIOIJ/home/
    1875-3183 The Open Biomarkers Journal https://benthamopen.com/TOBIOMJ/home/
    2665-9956 The Open Biomaterials Science Journal https://benthamopen.com/TOBMSJ/home/
    1874-0707 The Open Biotechnology Journal https://benthamopen.com/TOBIOTJ/home/
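To try the same approach without hitting the live site, here is a minimal offline sketch. The sample markup is an assumption, modeled on what the XPath query and attribute names above imply, and the real page may be structured differently. The function name `extractJournals` is hypothetical and only used for illustration. The comment next to the query shows the roughly equivalent CSS selector.

```php
<?php
// Hypothetical sample markup (an assumption, not copied from the real page).
$html = <<<'HTML'
<html><body>
<div style="padding:10px;">ISSN: 1874-1207<div data-title="The Open Biomedical Engineering Journal" data-url="https://benthamopen.com/TOBEJ/home/"></div></div>
<div style="padding:10px;">No identifier here</div>
</body></html>
HTML;

// Hypothetical helper: returns one array per journal found in the HTML string.
function extractJournals(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $rows = [];
    // XPath: //div[@style='padding:10px;']
    // CSS:   div[style='padding:10px;']
    foreach ($xpath->query("//div[@style='padding:10px;']") as $el) {
        $text = trim($el->textContent);
        // Keep only blocks whose text starts with "ISSN".
        if (strpos($text, "ISSN") !== 0) {
            continue;
        }
        // First descendant div carries the data-* attributes.
        $div = $el->getElementsByTagName("div")->item(0);
        if ($div === null) {
            continue;
        }
        $rows[] = [
            "issn"  => str_replace("ISSN: ", "", $text),
            "title" => $div->getAttribute("data-title"),
            "url"   => $div->getAttribute("data-url"),
        ];
    }
    return $rows;
}

foreach (extractJournals($html) as $row) {
    printf("%s %s %s\n", $row["issn"], $row["title"], $row["url"]);
}
```

Returning arrays rather than printing inside the loop makes it easier to reuse the result, for example to write it to CSV or a database.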
