如何实现流的 Boyer-Moore 搜索

我将如何着手实施Boyer-Moore Search for streams?我了解如何为我们知道整个字符串长度的给定字符串实现这一点。但是,如果我们不知道字符串的大小(即它是任意长度的字节流)怎么办?

我正在尝试在 PHP 中实现它,因此 PHP 中的任何代码都会有所帮助。

这是我在 PHP 中实现的 Boyer-Moore 搜索:

function BoyerMooreSearch($haystack, $needle) {

    $needleLen = strlen($needle);

    $haystackLen = strlen($haystack);

    $table = computeSkipTable($needle);


    for ($i = $needleLen - 1; $i < $haystackLen;) { 

        $t = $i;


        for ($j = $needleLen - 1; $needle[$j] == $haystack[$i]; $j--, $i--) { 

            if($j == 0) {

                return $i;

            }

        }


        $i = $t;


        if(array_key_exists($haystack[$i], $table)) {

            $i = $i + max($table[$haystack[$i]], 1);

        } else {

            $i += $needleLen;

        }

    }

    return false;

}


function computeSkipTable($string) {

    $len = strlen($string);

    $table = [];


    for ($i=0; $i < $len; $i++) { 

        $table[$string[$i]] = $len - $i - 1; 

    }


    return $table;

}

如果我们给它一个 haystack string"barfoobazquix"和 needle string 就可以正常工作"foo",它会3按预期返回。但是,如果输入 haystack 是一个分成 4 字节块的流怎么办。第一个块是"barf",它不会返回任何匹配项,第二个块是 ,"ooba"它也不会返回任何匹配项,依此类推......


在这种情况下,我们永远无法"foo"在当前实现的流的任何缓冲块中找到子字符串。我正在努力调整当前的实现,以便即使搜索字符串被拆分成多个块也能正常工作。


皈依舞
浏览 175回答 1
1回答

慕的地10843

请注意,您只使用$needleLen流中的最新字符。所以你可以维护一个$needleLen由字符组成的滑动窗口,并根据需要推进它。此外,$haystackLen现在是未知的,应该用 EOF 检查代替。O(n)下面的代码效率低下,因为无论我们是按字符滑动窗口n还是仅按 1 个滑动窗口,滑动窗口总是需要时间。为了实现最佳的滑动复杂度,$window应该将其实现为循环字符缓冲区。// sample callvar_dump(BoyerMooreSearch(STDIN, 'foo'));function BoyerMooreSearch($resource, $needle) {&nbsp; &nbsp; $needleLen = strlen($needle);&nbsp; &nbsp; $table = computeSkipTable($needle);&nbsp; &nbsp; // build the first window&nbsp; &nbsp; if (($window = buildWindow($resource, $needleLen)) === false) {&nbsp; &nbsp; &nbsp; &nbsp; // special case: the text is shorter than the pattern&nbsp; &nbsp; &nbsp; &nbsp; return false;&nbsp; &nbsp; }&nbsp; &nbsp; $i = 0;&nbsp; &nbsp; while (true) {&nbsp; &nbsp; &nbsp; &nbsp; // check for a match&nbsp; &nbsp; &nbsp; &nbsp; $j = $needleLen - 1;&nbsp; &nbsp; &nbsp; &nbsp; while ($j >= 0 && $needle[$j] == $window[$j]) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $j--;&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; if ($j < 0) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return $i;&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; // determine slide distance&nbsp; &nbsp; &nbsp; &nbsp; $t = $i;&nbsp; &nbsp; &nbsp; &nbsp; $last = substr($window, -1);&nbsp; &nbsp; &nbsp; &nbsp; if (array_key_exists($last, $table)) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $i = $i + max($table[$last], 1);&nbsp; &nbsp; &nbsp; &nbsp; } else {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $i += $needleLen;&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; // slide the window&nbsp; &nbsp; &nbsp; &nbsp; if (($window = slideWindow($window, $resource, $i - $t)) === false) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return false;&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; }&nbsp; &nbsp; return false;}/**&nbsp;* Initializes the sliding window to length $len.&nbsp;*&nbsp;* @return string A string of length $len or false if the $resource contains&nbsp;* less than $len characters.&nbsp;*/function buildWindow ($resource, $len) {&nbsp; &nbsp; $result = '';&nbsp; &nbsp; while ($len--) {&nbsp; &nbsp; &nbsp; &nbsp; $result .= fgetc($resource);&nbsp; &nbsp; }&nbsp; &nbsp; return feof($resource) ? false : $result;}/**&nbsp;* Slides $window by $count positions into $resource.&nbsp;*&nbsp;* @return string The new window or false on EOF.&nbsp;*/function slideWindow(&$window, $resource, $count) {&nbsp; &nbsp; $window = substr($window, $count) // discard the first $count chars&nbsp; &nbsp; &nbsp; &nbsp; . fread($resource, $count);&nbsp; &nbsp;// and read $count new chars&nbsp; &nbsp; return feof($resource) ? false : $window;}
打开App,查看更多内容
随时随地看视频慕课网APP