Python: Reversibly encode an alphanumeric string to an integer

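I want to convert a string (composed of alphanumeric characters) into an integer, and then convert that integer back into the string:

string --> int --> string

In other words, I want to represent an alphanumeric string as an integer.

I found a working solution, which I have included in an answer, but I don't think it is the best one, and I am interested in other ideas and approaches.

Please don't mark this as a duplicate just because many similar questions already exist. I specifically want a simple way to convert a string into an integer and back again.

This should work for strings containing alphanumeric characters, i.e. strings made up of digits and letters.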


开满天机
3 Answers

侃侃尔雅

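Recall that a string can be encoded to bytes, which can then be encoded to an integer. The encodings can then be reversed to get the bytes back, followed by the original string.

This encoder uses binascii to produce the same integer encoding as in charel-f's answer. I believe it is identical because I tested it extensively.

    from binascii import hexlify, unhexlify

    class BytesIntEncoder:

        @staticmethod
        def encode(b: bytes) -> int:
            # Read the hex representation of the bytes as a base-16 number.
            return int(hexlify(b), 16) if b != b'' else 0

        @staticmethod
        def decode(i: int) -> bytes:
            # '%x' % i must have an even number of hex digits for unhexlify;
            # this holds for alphanumeric ASCII, whose first byte is >= 0x30.
            return unhexlify('%x' % i) if i != 0 else b''

If you are using Python <3.6, remove the optional type annotations.

Quick test:

    >>> s = 'Test123'
    >>> b = s.encode()
    >>> b
    b'Test123'
    >>> BytesIntEncoder.encode(b)
    23755444588720691
    >>> BytesIntEncoder.decode(_)
    b'Test123'
    >>> _.decode()
    'Test123'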
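For comparison, the same bytes-to-integer mapping can be sketched with the built-in int.from_bytes and int.to_bytes, which avoids the round trip through a hex string. This is an alternative illustration rather than the answer author's code, and the function names str_to_int and int_to_str are made up for the example. Like the hexlify version, it assumes the first byte is nonzero, which always holds for alphanumeric ASCII.

```python
def str_to_int(s: str) -> int:
    """Interpret the UTF-8 bytes of s as one big-endian unsigned integer."""
    return int.from_bytes(s.encode("utf-8"), "big")


def int_to_str(i: int) -> str:
    """Invert str_to_int: rebuild the bytes, then decode them."""
    num_bytes = (i.bit_length() + 7) // 8  # smallest byte count that holds i
    return i.to_bytes(num_bytes, "big").decode("utf-8")


n = str_to_int("Test123")
print(n)              # 23755444588720691, the same value as the hexlify-based encoder
print(int_to_str(n))  # Test123
```

The empty string maps to 0 and back without a special case, because zero bytes read as 0 and 0 needs zero bytes.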

慕后森

Assuming the character set is strictly alphanumeric, i.e. a-z, A-Z and 0-9, each character needs only 6 bits. Encoding with 8-bit bytes is therefore, in theory, an inefficient use of memory.

This answer converts the input bytes into a sequence of 6-bit integers and uses bitwise operations to pack these small integers into one large integer. Whether this actually translates into real-world storage efficiency can be measured with sys.getsizeof, and is more likely for larger strings.

This implementation tailors the encoding to the chosen character set. For example, if you only work with string.ascii_lowercase (5 bits) rather than string.ascii_uppercase + string.digits (6 bits), the encoding becomes correspondingly more efficient.

Unit tests are also included.

    import string

    class BytesIntEncoder:

        def __init__(self, chars: bytes = (string.ascii_letters + string.digits).encode()):
            num_chars = len(chars)
            # Map each allowed character to a code in 1..num_chars; 0 is never
            # used, so the end of the packed data can be detected when decoding.
            translation = ''.join(chr(i) for i in range(1, num_chars + 1)).encode()
            self._translation_table = bytes.maketrans(chars, translation)
            self._reverse_translation_table = bytes.maketrans(translation, chars)
            self._num_bits_per_char = (num_chars + 1).bit_length()

        def encode(self, chars: bytes) -> int:
            num_bits_per_char = self._num_bits_per_char
            output, bit_idx = 0, 0
            # The first character occupies the lowest-order bits.
            for chr_idx in chars.translate(self._translation_table):
                output |= (chr_idx << bit_idx)
                bit_idx += num_bits_per_char
            return output

        def decode(self, i: int) -> bytes:
            maxint = (2 ** self._num_bits_per_char) - 1  # bit mask for one character
            output = bytes(((i >> offset) & maxint) for offset in range(0, i.bit_length(), self._num_bits_per_char))
            return output.translate(self._reverse_translation_table)

    # Test
    import itertools
    import random
    import unittest

    class TestBytesIntEncoder(unittest.TestCase):

        chars = string.ascii_letters + string.digits
        encoder = BytesIntEncoder(chars.encode())

        def _test_encoding(self, b_in: bytes):
            i = self.encoder.encode(b_in)
            self.assertIsInstance(i, int)
            b_out = self.encoder.decode(i)
            self.assertIsInstance(b_out, bytes)
            self.assertEqual(b_in, b_out)
            # print(b_in, i)

        def test_thoroughly_with_small_str(self):
            for s_len in range(4):
                for s in itertools.combinations_with_replacement(self.chars, s_len):
                    s = ''.join(s)
                    b_in = s.encode()
                    self._test_encoding(b_in)

        def test_randomly_with_large_str(self):
            for s_len in range(256):
                # Later True keys overwrite earlier ones, so the last
                # matching condition decides the sample count.
                num_samples = {s_len <= 16: 2 ** s_len,
                               16 < s_len <= 32: s_len ** 2,
                               s_len > 32: s_len * 2,
                               s_len > 64: s_len,
                               s_len > 128: 2}[True]
                # print(s_len, num_samples)
                for _ in range(num_samples):
                    b_in = ''.join(random.choices(self.chars, k=s_len)).encode()
                    self._test_encoding(b_in)

    if __name__ == '__main__':
        unittest.main()

Usage example:

    >>> encoder = BytesIntEncoder()
    >>> s = 'Test123'
    >>> b = s.encode()
    >>> b
    b'Test123'
    >>> encoder.encode(b)
    3908257788270
    >>> encoder.decode(_)
    b'Test123'
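To make the 6-bit versus 8-bit comparison concrete, here is a minimal sketch that packs a string both ways and prints the resulting bit lengths next to sys.getsizeof. The helper names pack and as_bytes_int are made up for this example, and pack is a simplified stand-in for the encoder above (same 1-based codes, same bit order), not a replacement for it. The sizes reported by sys.getsizeof depend on the Python implementation, so no expected values are given for them.

```python
import string
import sys

ALPHABET = string.ascii_letters + string.digits  # 62 symbols, the answer's default set
BITS = (len(ALPHABET) + 1).bit_length()          # 6 bits; codes start at 1, so 0 is unused


def pack(s: str) -> int:
    """Pack each character's 1-based alphabet index into BITS bits, first char lowest."""
    return sum((ALPHABET.index(ch) + 1) << (k * BITS) for k, ch in enumerate(s))


def as_bytes_int(s: str) -> int:
    """Plain 8-bits-per-character encoding, for comparison."""
    return int.from_bytes(s.encode(), "big")


for s in ("Test123", "a" * 100):
    packed, plain = pack(s), as_bytes_int(s)
    print(len(s), packed.bit_length(), plain.bit_length(),
          sys.getsizeof(packed), sys.getsizeof(plain))
```

For 'Test123' the packed integer needs 42 bits against 55 for the byte-based one, but a gap that small may not change the object size at all, which is why the saving is more likely to show up for longer strings.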