
Using DOMDocument to save multiple HTML bodies as one

I have a string that contains multiple <html> tags. I want to take the contents of every <body> and join them into a single valid structure. For example:


<html><body><div>Content</div></body></html>

<html><body><div>Content</div></body></html>

<html><body><div>Content</div></body></html>

should become:


<html>
    <body>
        <div>Content</div>
        <div>Content</div>
        <div>Content</div>
    </body>
</html>

My current code looks like this:


    libxml_use_internal_errors(true);
    $newDom = new DOMDocument();

    $newBody = "";

    $newDom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));

    $bodyTags = $newDom->getElementsByTagName("body");

    foreach($bodyTags as $body) {
        $newBody .= $newDom->saveHTML($body);
    }

$newBody now contains all of the <body> tags:


<body><div>Content</div></body>

<body><div>Content</div></body>

<body><div>Content</div></body>

How can I save only the inner HTML of each <body> tag in $newBody?


Edit:

Based on @NigelRen's answer, this is my solution:


    libxml_use_internal_errors(true);
    $newDom = new DOMDocument();

    $newBody = '';
    $newDom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));

    $bodyTags = $newDom->getElementsByTagName("body");

    foreach($bodyTags as $body) {
        foreach ($body->childNodes as $node)   {
            $newBody .= $newDom->saveHTML($node);
        }
    }

    $newDom = new DOMDocument();
    $newDom->loadHTML(mb_convert_encoding($newBody, 'HTML-ENTITIES', 'UTF-8'));
    $newBody = $newDom->saveHTML();


料青山看我应如是
1 answer

ABOUTYOU

This is awkward, because when you use loadHTML() it will attempt to fix up the HTML in the original document. That produces a structure which is not what you might expect.

However, if you have a basic outline of the document, the following will copy the contents of the <body> tags into a new document (see the comments in the code):

    $html = '<html><body><div>Content1</div></body></html><html><body><div>Content2</div></body></html><html><body><div>Content3</div></body></html>';

    libxml_use_internal_errors(true);
    $newDom = new DOMDocument();
    // New document with final code
    $newBody = new DOMDocument();

    $newDom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
    // Set up basic template for new document
    $newBody->loadHTML("<html><body /></html>");
    // Find where to add any new content
    $addBody = $newBody->getElementsByTagName("body")->item(0);
    // Find the existing content to add
    $bodyTags = $newDom->getElementsByTagName("body");

    foreach($bodyTags as $body) {
        // Add all of the contents of the <body> tag into the new document
        foreach ( $body->childNodes as $node )   {
            // Import the node to copy to the new document and add it in
            $addBody->appendChild($newBody->importNode($node, true));
        }
    }

    echo $newBody->saveHTML();

which gives:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body><div>Content1</div><div>Content2</div><div>Content3</div></body></html>

The limitation is that this will not retain anything outside the <body> tags, nor any attributes of the <body> tags themselves.
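Since loadHTML() tries to repair a string that holds several complete documents, one way to avoid depending on that repair is to parse each document on its own. The following is only a sketch under stated assumptions, not part of the original answer: the function name mergeBodies() is hypothetical, the input is assumed to consist of complete <html>...</html> documents placed back to back, and the `<?xml encoding="UTF-8">` prefix is a commonly used workaround (swapped in here for the mb_convert_encoding() call) to make libxml read the input as UTF-8.

```php
<?php
// Hypothetical helper, not from the original post: merges the <body>
// contents of several concatenated HTML documents into one document.
function mergeBodies(string $html): string
{
    libxml_use_internal_errors(true);

    // Target document with an empty <body> to append into.
    $target = new DOMDocument();
    $target->loadHTML('<html><body></body></html>');
    $targetBody = $target->getElementsByTagName('body')->item(0);

    // Assumption: every document ends with </html>, so splitting right
    // after each closing tag yields one fragment per document.
    $fragments = preg_split('~(?<=</html>)~i', $html, -1, PREG_SPLIT_NO_EMPTY);

    foreach ($fragments as $fragment) {
        if (trim($fragment) === '') {
            continue; // skip whitespace between or after documents
        }

        // Parse this fragment as its own document.
        $source = new DOMDocument();
        $source->loadHTML('<?xml encoding="UTF-8">' . $fragment);

        $body = $source->getElementsByTagName('body')->item(0);
        if ($body === null) {
            continue; // nothing to copy from this fragment
        }

        // Copy every child of <body> into the target document.
        foreach ($body->childNodes as $node) {
            $targetBody->appendChild($target->importNode($node, true));
        }
    }

    libxml_clear_errors();

    return $target->saveHTML();
}
```

Calling mergeBodies($html) with the string from the question should return a single document whose <body> holds the <div> elements in their original order. The same limitation applies: attributes on the source <body> tags and anything outside them are not carried over.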