在 php 中使用回溯使用 ASCII 字符代码压缩字符串

我想可以使用 chars ASCII 代码来压缩字符串。我想使用数字模式来压缩它们。因为 ASCII 代码是数字,所以我想在 ASCII 字符代码列表中查找子模式。

理论

这将是我找到的每个模式的格式:

  • [nnn][n][nn],其中:

    • [nnn] 是第一个字符的 ASCII 代码,来自具有相同模式的组编号。

    • [n] 是特定模式/规则的自定义编号(我将在下面解释更多)。

    • [nn] 显示此规则发生的次数。

数字模式没有具体确定。但让我举一些例子:

  1. 相同的字符

  2. 线性增长(每个数字/ascii 都比以前大,其中一个)

  3. 线性减少(每个数字/ascii 都比以前小,其中一个)

现在让我们看看一些情况:

  • “adeflk”变为“097.1.01-100.2.03-108.3.02”

    • 相同的字符,线性增长3倍,线性下降2倍。

  • “rrrrrrrrrrrr”变为“114.1.11”

    • 相同的字符十一次。

  • “tsrqpozh”变为“116.3.06-122.1.01-104.1.01”

    • 线性减少六次,相同的字符,相同的字符。

我添加了点(“.”)和破折号(“-”),以便您可以轻松看到它们。

事实上,我们没有看到好的结果(压缩)。我想将此算法用于大字符串。并添加更多规则(数字模式),我们增加了更改以使结果比原始结果更短。我知道现有的压缩解决方案。我想要这个解决方案,因为结果只有数字,它对我有帮助。

慕仙森
浏览 143回答 2
2回答

白猪掌柜的

搜索渐变/中缀重复并不适合压缩自然语言。使用基于字典的方法压缩自然语言要容易得多(与压缩数据捆绑在一起的动态字典,以及在参考集上训练的预编译字典),因为即使是 ASCII 编码中的重复序列通常也不遵循任何规则。微不足道的几何图案,但当仅观察单个字符的序数表示时,显得相当随机。也就是说,您的算法如此慢的原因是因为您正在探索所有可能的模式,这会导致输入长度的运行时间呈指数增长,准确地说是O(5^n)。对于您自己设定的在一组 5 个任意规则中找到理想压缩的目标来说,这已经是尽可能好的了。如果有的话,您只能将运行时间复杂度降低一个常数因子,但无法摆脱指数运行时间。换句话说,即使您应用了完美的优化,也只会将您可以处理的最大输入长度增加 30-50%,然后您就不可避免地会再次遇到超时。@noam 的解决方案甚至不尝试找到理想的模式,而只是贪婪地使用第一个匹配模式来消耗输入。因此,它会错误地忽略更好的匹配,但作为回报,它也只需查看每个输入字符一次,从而导致线性运行时间复杂度O(n)。当然,您当前的解决方案中有一些细节使得解决起来更加容易,只需基于对规则的简单观察即可。但请注意,当您尝试添加更多规则时,这些假设将会被打破。具体来说,如果您明智地选择尝试规则的顺序,则可以避免大部分回溯:首先尝试开始一个新的几何图案,并且仅当它与前面至少3个字符ord(n[i])=ord(n[0])+i匹配时才接受为匹配。尝试延续当前的几何图案。尝试继续当前的渐变模式。尝试开始新的渐变,并且仅当它与前面至少 2 个字符ord(n[i])=ord(n[0])+i匹配时才接受匹配。尝试最后开始/继续简单的重复,并始终接受。一旦输入中的字符被任何这些规则接受(意味着它已被序列消耗),您将不再需要从它回溯或检查它的任何其他规则,因为您已经找到了最佳的表示形式它。您仍然需要重新检查添加到序列中的每个后续字符的规则,因为可能需要渐变规则的后缀作为几何规则的前缀。一般来说,规则集中允许这样做的模式是这样的事实:对于具有较高优先级的每个规则,该规则的匹配项不能在任何后续规则中具有更好的匹配项。如果您愿意,您可以轻松地为您的集合中的每对可能规则进行正式证明。如果您想测试您的实现,您应该专门测试诸如ABDHIK. 即使与H当前运行的几何数列相匹配ABDH,但将其作为新的几何数列的起点HIK无疑是更好的选择。

慕虎7371278

我对你的问题提出了一个初步的解决方案。请注意:你永远不会得到只有一个字母的序列,因为每两个连续的字母都是具有一定差异的“线性增长”。我的解决方案不是很干净。例如,您可以将$matches和组合$rules到一个数组中。我的解决方案是幼稚和贪婪的。例如,在示例中adeflk,序列def是 3 的序列,但因为我的解决方案是贪婪的,所以它会考虑ad作为 2 的序列,并ef作为另一个 2 的序列。话虽如此,您仍然可以改进我的代码。该代码很难测试。您可能应该使用 OOP 并将代码划分为许多易于单独测试的小方法。<?phpfunction compress($string, $rules, $matches) {&nbsp; &nbsp; if ($string === '') {&nbsp; &nbsp; &nbsp; &nbsp; return getBestMatch($matches);&nbsp; &nbsp; }&nbsp; &nbsp; $currentCharacter = $string[0];&nbsp; &nbsp; $matchFound = false;&nbsp; &nbsp; foreach ($rules as $index => &$rule) {&nbsp; &nbsp; &nbsp; &nbsp; if ($rule['active']) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $soFarLength = strlen($matches[$index]);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if ($soFarLength === 0) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $matchFound = true;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $matches[$index] = $currentCharacter;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; } elseif ($rule['callback']($currentCharacter, $matches[$index])) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $matches[$index] .= $currentCharacter;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $matchFound = true;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; } else {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $rule['active'] = false;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; }&nbsp; &nbsp; if ($matchFound) {&nbsp; &nbsp; &nbsp; &nbsp; return compress(substr($string, 1), $rules, $matches);&nbsp; &nbsp; } else {&nbsp; &nbsp; &nbsp; &nbsp; return getBestMatch($matches) . startNewSequence($string);&nbsp; &nbsp; }}function getBestMatch($matches) {&nbsp; &nbsp; $rule = -1;&nbsp; &nbsp; $length = -1;&nbsp; &nbsp; foreach ($matches as $index => $match) {&nbsp; &nbsp; &nbsp; &nbsp; if (strlen($match) > $length) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $length = strlen($match);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $rule = $index;&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; }&nbsp; &nbsp; if ($length <= 0) {&nbsp; &nbsp; &nbsp; &nbsp; return '';&nbsp; &nbsp; }&nbsp; &nbsp; return ord($matches[$rule][0]) . '.' . $rule . '.' . $length . "\n";}function startNewSequence($string) {&nbsp; &nbsp; $rules = [&nbsp; &nbsp; &nbsp; &nbsp; // rule number 1 - all characters are the same&nbsp; &nbsp; &nbsp; &nbsp; 1 => [&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'active' => true,&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'callback' => function ($a, $b) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return $a === substr($b, -1);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; ],&nbsp; &nbsp; &nbsp; &nbsp; // rule number 2 - ASCII code of current letter is one more than the last letter ("linear growth")&nbsp; &nbsp; &nbsp; &nbsp; 2 => [&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'active' => true,&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'callback' => function ($a, $b) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return ord($a) === (1 + ord(substr($b, -1)));&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; ],&nbsp; &nbsp; &nbsp; &nbsp; // rule number 3 - ASCII code is a geometric progression. The ord() of each character increases with each step.&nbsp; &nbsp; &nbsp; &nbsp; 3 => [&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'active' => true,&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'callback' => function ($a, $b) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if (strlen($b) == 1) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return ord($a) > ord($b);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $lastCharOrd = ord(substr($b, -1));&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $oneBeforeLastCharOrd = ord(substr($b, -2, 1));&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $lastDiff = $lastCharOrd - $oneBeforeLastCharOrd;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $currentOrd = ord($a);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return ($currentOrd - $lastCharOrd) === ($lastDiff + 1);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; ],&nbsp; &nbsp; &nbsp; &nbsp; // rule number 4 - ASCII code of current letter is one less than the last letter ("linear decrease")&nbsp; &nbsp; &nbsp; &nbsp; 4 => [&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'active' => true,&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'callback' => function ($a, $b) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return ord($a) === (ord(substr($b, -1)) - 1);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; ],&nbsp; &nbsp; &nbsp; &nbsp; // rule number 5 - ASCII code is a negative geometric progression. The ord() of each character decreases by one&nbsp; &nbsp; &nbsp; &nbsp; // with each step.&nbsp; &nbsp; &nbsp; &nbsp; 5 => [&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'active' => true,&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'callback' => function ($a, $b) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if (strlen($b) == 1) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return ord($a) < ord($b);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $lastCharOrd = ord(substr($b, -1));&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $oneBeforeLastCharOrd = ord(substr($b, -2, 1));&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $lastDiff = $lastCharOrd - $oneBeforeLastCharOrd;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $currentOrd = ord($a);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return ($currentOrd - $lastCharOrd) === ($lastDiff - 1);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; ],&nbsp; &nbsp; ];&nbsp; &nbsp; $matches = [&nbsp; &nbsp; &nbsp; &nbsp; 1 => '',&nbsp; &nbsp; &nbsp; &nbsp; 2 => '',&nbsp; &nbsp; &nbsp; &nbsp; 3 => '',&nbsp; &nbsp; &nbsp; &nbsp; 4 => '',&nbsp; &nbsp; &nbsp; &nbsp; 5 => '',&nbsp; &nbsp; ];&nbsp; &nbsp; return compress($string, $rules, $matches);}echo startNewSequence('tsrqpozh');
打开App,查看更多内容
随时随地看视频慕课网APP