Remove elements between comments from a web page using JS

I am trying to scrape data from this web page:


https://www.biharjobportal.com/bihar-police-constable-bharti/

I managed to remove all the Google ads from the site with this code. That was easy because they share a class name:


    var theaders = document.getElementsByClassName('adsbygoogle');

    // Iterate backwards because the collection is live and shrinks on removal
    for (var i = theaders.length - 1; i >= 0; i--)
    {
        theaders[i].parentElement.removeChild(theaders[i]);
    }

However, the page also contains this element, which has no ID, class name, or anything similar (see the screenshot):

http://img1.sycdn.imooc.com/648abfda0001b09206540312.jpg

All I know is that the elements to remove sit between these comments:


     <!-- WP QUADS Content Ad Plugin v. 2.0.17  -->


    **codes to remove (as in the picture)**


    <!-- WP QUADS Content Ad Plugin v. 2.0.17  -->

I tried to remove all such items using XPath, but nothing happened. This is the code I wrote:


    var badTableEval = document.evaluate(
        "/html/body/div[1]/div/div[1]/main/article/div/div/ul[3]",
        document.documentElement,
        null,
        XPathResult.FIRST_ORDERED_NODE_TYPE,
        null
    );

    if (badTableEval && badTableEval.singleNodeValue) {
        var badTable = badTableEval.singleNodeValue;
        badTable.parentNode.removeChild(badTable);
    }

How can I remove all of these elements from the page?

胡说叔叔
Views: 109 · Answers: 1
1 answer

慕田峪4524236

You can detect comments in the document this way (see the snippet). Devising a clever function to remove the elements between comments was originally left to you, but since you asked for it, the snippet includes a way to remove the elements between two equal comments.

JavaScript:

    const root = document.querySelector("body");
    const allEls = [...root.childNodes];
    const IS_COMMENT = 8; // same value as Node.COMMENT_NODE
    // comments already used as the closing half of a pair; skipping them
    // keeps the content between one pair and the next
    const closers = new Set();

    allEls.forEach((el, i) => {
      if (el.nodeType === IS_COMMENT && !closers.has(el)) {
        // we have a comment. Find the (index of the) next equal comment
        // in [allEls] from this point on
        const subset = allEls.slice(i + 1);
        const hasEqualNextComment = subset
          .findIndex(elss =>
            elss.nodeType === IS_COMMENT &&
            elss.textContent.trim() === el.textContent.trim());
        // if an equal comment has been found, remove every node between
        // the two comments (subset[0] is the first node after the opener)
        if (hasEqualNextComment > -1) {
          closers.add(subset[hasEqualNextComment]);
          subset.slice(0, hasEqualNextComment)
            .forEach(elss =>
              elss.parentNode && elss.parentNode.removeChild(elss));
        }
      }
    });

CSS:

    body {
      font: normal 12px/15px verdana, arial;
      margin: 2rem;
    }

HTML:

    <!-- WP QUADS Content Ad Plugin v. 2.0.17  -->
    <ul>
      <li>item 1</li>
      <li>item 2</li>
      <li>item 3</li>
    </ul>
    <!-- WP QUADS Content Ad Plugin v. 2.0.17  -->
    <!-- other comment -->
    <ul>
      <li>item 4</li>
      <li>item 5</li>
      <li>item 6</li>
    </ul>
    <!-- other comment: the above is kept -->
    <!-- something 2 remove -->
    <div>item 7</div>
    <!--something 2 remove-->
    <div>item 8</div>
    <p>
      <b>The result should show item 4 - item 6, item 8 and the
        text within this paragraph</b>.
      <br><i>Note</i>: this will only work for top level comments
      within the given [root] (so, not for comments that are nested
      within elements).
      <br>Also you may have to clean multiline-comments
      from line endings for comparison.
    </p>
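The answer's note says the approach only covers comments that are direct children of the chosen root, while the markers on the page in the question appear to sit deeper, inside the article markup. As a rough sketch of one way around that (not part of the original answer), the hypothetical helpers `collectComments` and `removeBetweenComments` below walk the whole subtree, pair up the comments whose trimmed text equals a given marker, and remove the sibling nodes between each pair. It assumes the markers come in open/close pairs that share a parent; pairs split across different parents are skipped.

```javascript
const COMMENT_NODE = 8; // same value as Node.COMMENT_NODE

// Hypothetical helper: collect every comment node below `root`,
// in document order, at any nesting depth.
function collectComments(root, found = []) {
  for (const node of root.childNodes) {
    if (node.nodeType === COMMENT_NODE) {
      found.push(node);
    } else {
      collectComments(node, found);
    }
  }
  return found;
}

// Hypothetical helper: remove all nodes between consecutive pairs of
// comments whose trimmed text equals `marker`.
// Returns the number of removed nodes.
function removeBetweenComments(root, marker) {
  const markers = collectComments(root)
    .filter(comment => comment.textContent.trim() === marker);
  let removed = 0;
  // Treat markers 0-1, 2-3, ... as open/close pairs.
  for (let i = 0; i + 1 < markers.length; i += 2) {
    const open = markers[i];
    const close = markers[i + 1];
    // Assumption: both halves of a pair are siblings; skip the pair otherwise.
    if (open.parentNode !== close.parentNode) continue;
    let node = open.nextSibling;
    while (node && node !== close) {
      const next = node.nextSibling; // read before detaching the node
      node.parentNode.removeChild(node);
      removed += 1;
      node = next;
    }
  }
  return removed;
}

// Runs only where a DOM exists; the marker text is the one quoted in the question.
if (typeof document !== "undefined") {
  removeBetweenComments(document.body, "WP QUADS Content Ad Plugin v. 2.0.17");
}
```

The marker comments themselves stay in place, and because each comment is used in exactly one pair, the regular content between one ad block and the next is left alone.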