Ignoring HTML tags in preg_replace

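How can I make this preg_replace ignore HTML tags? The search runs in a foreach loop over the search terms, so if someone searches for "apple span", preg_replace also wraps the "span" inside the tags it has already inserted, and the HTML breaks: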


preg_replace("/($keyword)/i","<span class=\"search_hightlight\">$1</span>",$str);

Thanks in advance!
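One regex-only workaround, sketched here under stated assumptions, is to split the string into tags and text first and run the replacement only on the text pieces, so markup inserted for an earlier keyword is never matched again. highlight_outside_tags() is a hypothetical helper, not code from this thread. It also swaps in preg_quote() so keywords containing regex metacharacters are matched literally. It assumes simple markup (no > inside attribute values, no comments, <script> or CDATA sections), which is exactly the kind of fragility that makes a DOM-based approach attractive.

```php
<?php
// Hypothetical helper (not from the thread): applies the question's highlight only
// to the text between tags. Assumes simple markup: no '>' inside attribute values,
// no comments, <script> blocks or CDATA.
function highlight_outside_tags(string $html, string $keyword): string
{
    // The capturing group keeps the tags in the result, so the array alternates
    // text, tag, text, tag, ... with text always at even indices.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    // preg_quote escapes regex metacharacters and the '/' delimiter in the keyword.
    $pattern = '/(' . preg_quote($keyword, '/') . ')/i';

    foreach ($parts as $i => $part) {
        // Only touch non-empty text chunks; odd indices are the captured tags.
        if ($i % 2 === 0 && $part !== '') {
            $parts[$i] = preg_replace($pattern, '<span class="search_hightlight">$1</span>', $part);
        }
    }
    return implode('', $parts);
}

// The loop from the question, run over the search "apple span":
$str = 'apple <b>span</b>';
foreach (explode(' ', 'apple span') as $keyword) {
    $str = highlight_outside_tags($str, $keyword);
}
echo $str, "\n";
```

With the plain preg_replace, the second pass would also match "span" inside the <span class="search_hightlight"> tag added by the first pass; here only the text "span" inside <b> gets wrapped.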


凤凰求蛊
511 views, 3 answers

临摹微笑

I'd suggest basing the function on DOMDocument and DOMXPath instead of regular expressions. Powerful as regular expressions are, you run into problems like the one you describe, which are not (always) easy to solve reliably with them. The common saying is: don't parse HTML with regular expressions. That is a good rule to keep in mind; like any rule it does not always apply, but it is worth mentioning.

XPath lets you find all texts that contain the search term, ignoring any XML elements in between. Then you only need to wrap those texts in a <span>, which you are already doing.

Edit: finally some code ;)

First, it uses XPath to locate the elements that contain the search text. My query looks like this (it could probably be written better, I'm not a super XPath pro):

'//*[contains(., "'.$search.'")]/*[not(contains(., "'.$search.'"))]/..'

$search holds the text to search for and must not contain any " (double quote) characters, since that would break the query; if you need quotes, see "Cleaning/sanitizing xpath attributes" for a workaround. The query returns every parent element whose text nodes, taken together, form a string that contains the search term.

Because such a list is not easy to process further, I created a TextRange class that represents a list of DOMText nodes. It is very useful for running string operations on a list of text nodes as if they were a single string.

This is the basic skeleton of the routine:

$str = '...'; # some XML
$search = 'text that span';
printf("Searching for: (%d) '%s'\n", strlen($search), $search);

$doc = new DOMDocument;
$doc->loadXML($str);
$xp = new DOMXPath($doc);

$anchor = $doc->getElementsByTagName('body')->item(0);
if (!$anchor)
{
    throw new Exception('Anchor element not found.');
}

// search elements that contain the search text
$r = $xp->query('//*[contains(., "'.$search.'")]/*[not(contains(., "'.$search.'"))]/..', $anchor);
if (!$r)
{
    throw new Exception('XPath failed.');
}

// process search results
foreach($r as $i => $node)
{
    $textNodes = $xp->query('.//child::text()', $node);

    // extract the text node ranges matching $search, splitting nodes where necessary
    $range = new TextRange($textNodes);
    $ranges = array();
    while(FALSE !== $start = strpos($range, $search))
    {
        $base = $range->split($start);
        $range = $base->split(strlen($search));
        $ranges[] = $base;
    }

    // wrap each matching text node in a highlight span
    foreach($ranges as $range)
    {
        foreach($range->getNodes() as $textNode)
        {
            $span = $doc->createElement('span');
            $span->setAttribute('class', 'search_hightlight');
            $textNode = $textNode->parentNode->replaceChild($span, $textNode);
            $span->appendChild($textNode);
        }
    }
}

For my example XML:

<html>
    <body>
        This is some <span>text</span> that span across a page to search in.
    and more text that span</body>
</html>

it produces the following result:

<html>
    <body>
        This is some <span><span class="search_hightlight">text</span></span><span class="search_hightlight"> that span</span> across a page to search in.
    and more <span class="search_hightlight">text that span</span></body>
</html>

This shows that it even finds text that is spread across several tags, which is not easy to do with regular expressions.

You can find the full code here: http://codepad.viper-7.com/U4bxbe (including the TextRange class, which I took from another answer). It does not work correctly on viper codepad because of the old LIBXML version that site uses; it works with my LIBXML version 20707. I have asked a related question about this: XPath query result order.

A word of warning: this example uses a binary string search (strpos) and the resulting byte offsets to split the text nodes with DOMText::splitText. That can produce wrong offsets, because the function expects UTF-8 character offsets. The correct way is to use mb_strpos to get UTF-8 based values. The example works anyway because it only uses US-ASCII, which has the same offsets as UTF-8 for the example data. In real-life use, the $search string should be UTF-8 encoded and mb_strpos should be used instead of strpos:

while(FALSE !== $start = mb_strpos($range, $search, 0, 'UTF-8'))
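The TextRange class itself is only available behind the codepad link. For the simpler case where each match lies inside a single text node, the same wrap-the-text-node idea can be sketched with DOMText::splitText alone, using mb_strpos/mb_strlen character offsets as the warning recommends. highlight_text_nodes() is a hypothetical helper, not the answer's code; it assumes the markup is loaded with loadXML without an XHTML namespace (so //body matches), and it does not handle matches spread across several tags.

```php
<?php
// Hypothetical helper, not the answer's TextRange code: wraps every occurrence of
// $search that lies inside a single text node below <body> in
// <span class="search_hightlight">. Matches spread across several nodes are skipped.
function highlight_text_nodes(DOMDocument $doc, string $search): void
{
    if ($search === '') {
        return; // an empty needle would match at offset 0 forever
    }
    $xp = new DOMXPath($doc);
    // Snapshot the text nodes first, because the loop below changes the tree.
    $textNodes = iterator_to_array($xp->query('//body//text()'));
    $len = mb_strlen($search, 'UTF-8');

    foreach ($textNodes as $node) {
        // mb_strpos returns a character offset, which is what splitText expects.
        while (($start = mb_strpos($node->data, $search, 0, 'UTF-8')) !== false) {
            $match = $node->splitText($start); // $match now begins at the hit
            $node = $match->splitText($len);   // $node is the text after the hit
            $span = $doc->createElement('span');
            $span->setAttribute('class', 'search_hightlight');
            $match->parentNode->replaceChild($span, $match);
            $span->appendChild($match);
        }
    }
}

$doc = new DOMDocument;
$doc->loadXML('<html><body>an apple and an apple</body></html>');
highlight_text_nodes($doc, 'apple');
echo $doc->saveXML($doc->documentElement), "\n";
```

With a byte offset from strpos instead, text such as "café apple" would be split one character too late, because "é" is two bytes in UTF-8.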

慕容3067478

I had to use the following XPath; otherwise it did not find matching elements that have no child elements:

//*[contains(., '$search')]/*[not(contains(., '$search'))]/.. | //*[contains(., '$search') and count(*) = 0]
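To see why the extra branch helps: in a document like <html><body><p>apple</p></body></html>, no element has a child element that fails to contain the text, so the first half of the query returns nothing; the second half picks up the leaf <p>. The sketch below builds the union query in PHP. xpath_literal() and build_search_query() are hypothetical helpers; xpath_literal() shows one common concat()-based way to accept search strings containing quotes, which both queries in this thread cannot handle as written.

```php
<?php
// Hypothetical helper: turns any PHP string into a valid XPath 1.0 string literal.
function xpath_literal(string $s): string
{
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';
    }
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";
    }
    // Both quote types present: split on " and rebuild the string with concat().
    return 'concat("' . implode('", \'"\', "', explode('"', $s)) . '")';
}

// Hypothetical helper: the commenter's union query, with the literal quoted safely.
function build_search_query(string $search): string
{
    $lit = xpath_literal($search);
    return '//*[contains(., ' . $lit . ')]/*[not(contains(., ' . $lit . '))]/..'
        . ' | //*[contains(., ' . $lit . ') and count(*) = 0]';
}

$doc = new DOMDocument;
$doc->loadXML('<html><body><p>apple</p></body></html>');
$xp = new DOMXPath($doc);

foreach ($xp->query(build_search_query('apple')) as $node) {
    echo $node->nodeName, "\n"; // the leaf <p> is found by the second branch
}
```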