构建大型词到索引到词词典的最有效数据结构是什么?

我想索引大量的字符串(将每个字符串映射到一个数值),但也能够从其数字索引中检索每个字符串。

由于内存问题,使用哈希表或 python dict 不是一种选择,所以我决定使用基数树来存储字符串,我可以非常快速地检索任何字符串的索引并处理非常大量的字符串。

我的问题是我还需要从它们的数字索引中检索字符串,如果我维护一个“反向索引”列表 [string1, string2, ..., stringn],我将失去 Trie 的内存优势。

我想也许“反向索引”可能是指向某种 Trie 结构的最后一个节点的指针列表,但首先,python 中没有指针,其次我不确定我是否可以拥有“节点级” " 访问我当前使用的 Trie 结构。

这种数据结构是否已经存在?如果不是,你将如何在 python 中做到这一点?


交互式爱情
浏览 165回答 2
2回答

慕无忌1623718

您需要两个同步的数据结构用于键和值查找,每个都保存对另一个叶节点的引用。ID 查找的结构可以是任何具有足够效率的结构——平衡树、哈希表、另一个树。为了能够从叶节点引用中提取值,trie 需要允许 1) 叶节点引用本身(不一定是真正的 Python 引用,它的 API 可以使用的任何东西);2)走上trie从该引用中提取单词。请注意,引用实际上是一个唯一的整数,因此如果您的 ID 不大于一个整数,那么将某些内容作为 ID 重用是有意义的——例如,trie 节点引用自身。然后,如果 trie API 可以验证这样的引用(即判断它是否具有具有这样引用的已用节点),这将充当 ID 查找,您根本不需要第二个结构!这样,尽管进程和运行之间的引用值(有效的内存地址)会发生变化,但 ID 将是非持久的。

忽然笑

我在回答自己,因为我最终只使用 python3 内置函数创建了我自己的数据结构,它非常适合我遇到的单词到索引到单词的问题。我试图让它干净高效,但显然还有改进的空间,C 绑定会更好。所以最终的结果是一个 indexedtrie 类,它看起来像一个 python dict(或者 defaultdict,如果你用 default_factory 参数调用它)但也可以像列表一样查询,因为自动维护了一种“反向索引”。存储在内部基数树中的键可以是任何可下标的对象(字节、字符串、元组、列表)以及您想要在其中存储任何内容的值。indextrie 类也是可选的,您可以从 radix 尝试的关于“前缀搜索”和此类事情的优势中受益!树中的每个键都与一个唯一的整数索引相关联,您可以检索带有索引的键或带有键的索引,整个过程快速且内存安全,所以我个人认为这是最好的数据结构之一世界,它应该被集成到 python 标准库中 :)。话不多说,下面是代码,随意改编和使用:"""A Python3 indexed trie class.An indexed trie's key can be any subscriptable object.&nbsp;Keys of the indexed trie are stored using a "radix trie", a space-optimized data-structure which has many advantages (see https://en.wikipedia.org/wiki/Radix_tree).Also, each key in the indexed trie is associated to a unique index which is build dynamically.Indexed trie is used like a python dictionary (and even a collections.defaultdict if you want to) but its values can also be accessed or updated (but not created) like a list!Example:&nbsp; &nbsp; >>> t = indextrie()&nbsp; &nbsp; >>> t["abc"] = "hello"&nbsp; &nbsp; >>> t[0]&nbsp; &nbsp; 'hello'&nbsp; &nbsp; >>> t["abc"]&nbsp; &nbsp; 'hello'&nbsp; &nbsp; >>> t.index2key(0)&nbsp; &nbsp; 'abc'&nbsp; &nbsp; >>> t.key2index("abc")&nbsp; &nbsp; 0&nbsp; &nbsp; >>> t[:]&nbsp; &nbsp; [0]&nbsp; &nbsp; >>> print(t)&nbsp; &nbsp; {(0, 'abc'): hello}"""__author__ = "@fbparis"_SENTINEL = object()class _Node(object):&nbsp; &nbsp; """&nbsp; &nbsp; A single node in the trie.&nbsp; &nbsp; """&nbsp; &nbsp; __slots__ = "_children", "_parent", "_index", "_key"&nbsp; &nbsp; def __init__(self, key, parent, index=None):&nbsp; &nbsp; &nbsp; &nbsp; self._children = set()&nbsp; &nbsp; &nbsp; &nbsp; self._key = key&nbsp; &nbsp; &nbsp; &nbsp; self._parent = parent&nbsp; &nbsp; &nbsp; &nbsp; self._index = index&nbsp; &nbsp; &nbsp; &nbsp; self._parent._children.add(self)class IndexedtrieKey(object):&nbsp; &nbsp; """&nbsp; &nbsp; A pair (index, key) acting as an indexedtrie's key&nbsp; &nbsp; """&nbsp; &nbsp; __slots__ = "index", "key"&nbsp; &nbsp; def __init__(self, index, key):&nbsp; &nbsp; &nbsp; &nbsp; self.index = index&nbsp; &nbsp; &nbsp; &nbsp; self.key = key&nbsp; &nbsp; def __repr__(self):&nbsp; &nbsp; &nbsp; &nbsp; return "(%d, %s)" % (self.index, self.key)class indexedtrie(object):&nbsp; &nbsp; """&nbsp; &nbsp; The indexed trie data-structure.&nbsp; &nbsp; """&nbsp; &nbsp; __slots__ = "_children", "_indexes", "_values", "_nodescount", "_default_factory"&nbsp; &nbsp; def __init__(self, items=None, default_factory=_SENTINEL):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; A list of items can be passed to initialize the indexed trie.&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; self._children = set()&nbsp; &nbsp; &nbsp; &nbsp; self.setdefault(default_factory)&nbsp; &nbsp; &nbsp; &nbsp; self._indexes = []&nbsp; &nbsp; &nbsp; &nbsp; self._values = []&nbsp; &nbsp; &nbsp; &nbsp; self._nodescount = 0 # keeping track of nodes count is purely informational&nbsp; &nbsp; &nbsp; &nbsp; if items is not None:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; for k, v in items:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if isinstance(k, IndexedtrieKey):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self.__setitem__(k.key, v)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self.__setitem__(k, v)&nbsp; &nbsp; @classmethod&nbsp; &nbsp; def fromkeys(cls, keys, value=_SENTINEL, default_factory=_SENTINEL):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; Build a new indexedtrie from a list of keys.&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; obj = cls(default_factory=default_factory)&nbsp; &nbsp; &nbsp; &nbsp; for key in keys:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if value is _SENTINEL:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if default_factory is not _SENTINEL:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; obj[key] = obj._default_factory()&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; obj[key] = None&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; obj[key] = value&nbsp; &nbsp; &nbsp; &nbsp; return obj&nbsp; &nbsp; @classmethod&nbsp; &nbsp; def fromsplit(cls, keys, value=_SENTINEL, default_factory=_SENTINEL):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; Build a new indexedtrie from a splitable object.&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; obj = cls(default_factory=default_factory)&nbsp; &nbsp; &nbsp; &nbsp; for key in keys.split():&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if value is _SENTINEL:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if default_factory is not _SENTINEL:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; obj[key] = obj._default_factory()&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; obj[key] = None&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; obj[key] = value&nbsp; &nbsp; &nbsp; &nbsp; return obj&nbsp; &nbsp; def setdefault(self, factory=_SENTINEL):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; if factory is not _SENTINEL:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # indexed trie will act like a collections.defaultdict except in some cases because the __missing__&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # method is not implemented here (on purpose).&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # That means that simple lookups on a non existing key will return a default value without adding&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # the key, which is the more logical way to do.&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Also means that if your default_factory is for example "list", you won't be able to create new&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # items with "append" or "extend" methods which are updating the list itself.&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Instead you have to do something like trie["newkey"] += [...]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; try:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; _ = factory()&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; except TypeError:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # a default value is also accepted as default_factory, even "None"&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._default_factory = lambda: factory&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._default_factory = factory&nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._default_factory = _SENTINEL&nbsp; &nbsp; def copy(self):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; Return a pseudo-shallow copy of the indexedtrie.&nbsp; &nbsp; &nbsp; &nbsp; Keys and nodes are deepcopied, but if you store some referenced objects in values, only the references will be copied.&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; return self.__class__(self.items(), default_factory=self._default_factory)&nbsp; &nbsp; def __len__(self):&nbsp; &nbsp; &nbsp; &nbsp; return len(self._indexes)&nbsp; &nbsp; def __repr__(self):&nbsp; &nbsp; &nbsp; &nbsp; if self._default_factory is not _SENTINEL:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; default = ", default_value=%s" % self._default_factory()&nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; default = ""&nbsp; &nbsp; &nbsp; &nbsp; return "<%s object at %s: %d items, %d nodes%s>" % (self.__class__.__name__, hex(id(self)), len(self), self._nodescount, default)&nbsp; &nbsp; def __str__(self):&nbsp; &nbsp; &nbsp; &nbsp; ret = ["%s: %s" % (k, v) for k, v in self.items()]&nbsp; &nbsp; &nbsp; &nbsp; return "{%s}" % ", ".join(ret)&nbsp; &nbsp; def __iter__(self):&nbsp; &nbsp; &nbsp; &nbsp; return self.keys()&nbsp; &nbsp; def __contains__(self, key_or_index):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; Return True if the key or index exists in the indexed trie.&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; if isinstance(key_or_index, IndexedtrieKey):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return key_or_index.index >= 0 and key_or_index.index < len(self)&nbsp; &nbsp; &nbsp; &nbsp; if isinstance(key_or_index, int):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return key_or_index >= 0 and key_or_index < len(self)&nbsp; &nbsp; &nbsp; &nbsp; if self._seems_valid_key(key_or_index):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; try:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node = self._get_node(key_or_index)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; except KeyError:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return False&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return node._index is not None&nbsp; &nbsp; &nbsp; &nbsp; raise TypeError("invalid key type")&nbsp; &nbsp; def __getitem__(self, key_or_index):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; if isinstance(key_or_index, IndexedtrieKey):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return self._values[key_or_index.index]&nbsp; &nbsp; &nbsp; &nbsp; if isinstance(key_or_index, int) or isinstance(key_or_index, slice):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return self._values[key_or_index]&nbsp; &nbsp; &nbsp; &nbsp; if self._seems_valid_key(key_or_index):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; try:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node = self._get_node(key_or_index)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; except KeyError:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if self._default_factory is _SENTINEL:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; raise&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return self._default_factory()&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if node._index is None:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if self._default_factory is _SENTINEL:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; raise KeyError&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return self._default_factory()&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return self._values[node._index]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; raise TypeError("invalid key type")&nbsp; &nbsp; def __setitem__(self, key_or_index, value):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; if isinstance(key_or_index, IndexedtrieKey):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._values[key_or_index.index] = value&nbsp; &nbsp; &nbsp; &nbsp; elif isinstance(key_or_index, int):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._values[key_or_index] = value&nbsp; &nbsp; &nbsp; &nbsp; elif isinstance(key_or_index, slice):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; raise NotImplementedError&nbsp; &nbsp; &nbsp; &nbsp; elif self._seems_valid_key(key_or_index):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; try:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node = self._get_node(key_or_index)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; except KeyError:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # create a new node&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._add_node(key_or_index, value)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if node._index is None:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # if node exists but not indexed, we index it and update the value&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._add_to_index(node, value)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # else we update its value&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._values[node._index] = value&nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; raise TypeError("invalid key type")&nbsp; &nbsp; def __delitem__(self, key_or_index):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; if isinstance(key_or_index, IndexedtrieKey):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node = self._indexes[key_or_index.index]&nbsp; &nbsp; &nbsp; &nbsp; elif isinstance(key_or_index, int):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node = self._indexes[key_or_index]&nbsp; &nbsp; &nbsp; &nbsp; elif isinstance(key_or_index, slice):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; raise NotImplementedError&nbsp; &nbsp; &nbsp; &nbsp; elif self._seems_valid_key(key_or_index):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node = self._get_node(key_or_index)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if node._index is None:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; raise KeyError&nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; raise TypeError("invalid key type")&nbsp; &nbsp; &nbsp; &nbsp; # switch last index with deleted index (except if deleted index is last index)&nbsp; &nbsp; &nbsp; &nbsp; last_node, last_value = self._indexes.pop(), self._values.pop()&nbsp; &nbsp; &nbsp; &nbsp; if node._index != last_node._index:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; last_node._index = node._index&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._indexes[node._index] = last_node&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._values[node._index] = last_value&nbsp; &nbsp; &nbsp; &nbsp; if len(node._children) > 1:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; #case 1: node has more than 1 child, only turn index off&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node._index = None&nbsp; &nbsp; &nbsp; &nbsp; elif len(node._children) == 1:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # case 2: node has 1 child&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; child = node._children.pop()&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; child._key = node._key + child._key&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; child._parent = node._parent&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node._parent._children.add(child)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node._parent._children.remove(node)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; del(node)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._nodescount -= 1&nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # case 3: node has no child, check the parent node&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parent = node._parent&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parent._children.remove(node)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; del(node)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._nodescount -= 1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if hasattr(parent, "_index"):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if parent._index is None and len(parent._children) == 1:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node = parent._children.pop()&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node._key = parent._key + node._key&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node._parent = parent._parent&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parent._parent._children.add(node)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parent._parent._children.remove(parent)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; del(parent)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._nodescount -= 1&nbsp; &nbsp; @staticmethod&nbsp; &nbsp; def _seems_valid_key(key):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; Return True if "key" can be a valid key (must be subscriptable).&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; try:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; _ = key[:0]&nbsp; &nbsp; &nbsp; &nbsp; except TypeError:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return False&nbsp; &nbsp; &nbsp; &nbsp; return True&nbsp; &nbsp; def keys(self, prefix=None):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; Yield keys stored in the indexedtrie where key is a IndexedtrieKey object.&nbsp; &nbsp; &nbsp; &nbsp; If prefix is given, yield only keys of items with key matching the prefix.&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; if prefix is None:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; for i, node in enumerate(self._indexes):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; yield IndexedtrieKey(i, self._get_key(node))&nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if self._seems_valid_key(prefix):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; empty = prefix[:0]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; children = [(empty, prefix, child) for child in self._children]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; while len(children):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; _children = []&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; for key, prefix, child in children:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if prefix == child._key[:len(prefix)]:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; _key = key + child._key&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; _children.extend([(_key, empty, _child) for _child in child._children])&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if child._index is not None:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; yield IndexedtrieKey(child._index, _key)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; elif prefix[:len(child._key)] == child._key:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; _prefix = prefix[len(child._key):]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; _key = key + prefix[:len(child._key)]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; _children.extend([(_key, _prefix, _child) for _child in child._children])&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; children = _children&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; raise ValueError("invalid prefix type")&nbsp; &nbsp; def values(self, prefix=None):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; Yield values stored in the indexedtrie.&nbsp; &nbsp; &nbsp; &nbsp; If prefix is given, yield only values of items with key matching the prefix.&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; if prefix is None:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; for value in self._values:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; yield value&nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; for key in self.keys(prefix):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; yield self._values[key.index]&nbsp; &nbsp; def items(self, prefix=None):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; Yield (key, value) pairs stored in the indexedtrie where key is a IndexedtrieKey object.&nbsp; &nbsp; &nbsp; &nbsp; If prefix is given, yield only (key, value) pairs of items with key matching the prefix.&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; for key in self.keys(prefix):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; yield key, self._values[key.index]&nbsp; &nbsp; def show_tree(self, node=None, level=0):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; Pretty print the internal trie (recursive function).&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; if node is None:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node = self&nbsp; &nbsp; &nbsp; &nbsp; for child in node._children:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; print("-" * level + "<key=%s, index=%s>" % (child._key, child._index))&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if len(child._children):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self.show_tree(child, level + 1)&nbsp; &nbsp; def _get_node(self, key):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; Return the node associated to key or raise a KeyError.&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; children = self._children&nbsp; &nbsp; &nbsp; &nbsp; while len(children):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; notfound = True&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; for child in children:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if key == child._key:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return child&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if child._key == key[:len(child._key)]:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; children = child._children&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; key = key[len(child._key):]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; notfound = False&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; break&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if notfound:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; break&nbsp; &nbsp; &nbsp; &nbsp; raise KeyError&nbsp; &nbsp; def _add_node(self, key, value):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; Add a new key in the trie and updates indexes and values.&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; children = self._children&nbsp; &nbsp; &nbsp; &nbsp; parent = self&nbsp; &nbsp; &nbsp; &nbsp; moved = None&nbsp; &nbsp; &nbsp; &nbsp; done = len(children) == 0&nbsp; &nbsp; &nbsp; &nbsp; # we want to insert key="abc"&nbsp; &nbsp; &nbsp; &nbsp; while not done:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; done = True&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; for child in children:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # assert child._key != key # uncomment if you don't trust me&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if child._key == key[:len(child._key)]:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # case 1: child's key is "ab", insert "c" in child's children&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parent = child&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; children = child._children&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; key = key[len(child._key):]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; done = len(children) == 0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; break&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; elif key == child._key[:len(key)]:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # case 2: child's key is "abcd", we insert "abc" in place of the child&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # child's parent will be the inserted node and child's key is now "d"&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parent = child._parent&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; moved = child&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parent._children.remove(moved)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; moved._key = moved._key[len(key):]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; break&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; elif type(key) is type(child._key): # don't mess it up&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # find longest common prefix&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; prefix = key[:0]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; for i, c in enumerate(key):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if child._key[i] != c:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; prefix = key[:i]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; break&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if prefix:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # case 3: child's key is abd, we spawn a new node with key "ab"&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # to replace child ; child's key is now "d" and child's parent is&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # the new created node.&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # the new node will also be inserted as a child of this node&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # with key "c"&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node = _Node(prefix, child._parent)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._nodescount += 1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; child._parent._children.remove(child)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; child._key = child._key[len(prefix):]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; child._parent = node&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node._children.add(child)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; key = key[len(prefix):]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parent = node&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; break&nbsp; &nbsp; &nbsp; &nbsp; # create the new node&nbsp; &nbsp; &nbsp; &nbsp; node = _Node(key, parent)&nbsp; &nbsp; &nbsp; &nbsp; self._nodescount += 1&nbsp; &nbsp; &nbsp; &nbsp; if moved is not None:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # if we have moved an existing node, update it&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; moved._parent = node&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node._children.add(moved)&nbsp; &nbsp; &nbsp; &nbsp; self._add_to_index(node, value)&nbsp; &nbsp; def _get_key(self, node):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; Rebuild key from a terminal node.&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; key = node._key&nbsp; &nbsp; &nbsp; &nbsp; while node._parent is not self:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node = node._parent&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; key = node._key + key&nbsp; &nbsp; &nbsp; &nbsp; return key&nbsp; &nbsp; def _add_to_index(self, node, value):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; Add a new node to the index.&nbsp; &nbsp; &nbsp; &nbsp; Also record its value.&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; node._index = len(self)&nbsp; &nbsp; &nbsp; &nbsp; self._indexes.append(node)&nbsp; &nbsp; &nbsp; &nbsp; self._values.append(value)&nbsp; &nbsp; def key2index(self, key):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; key -> index&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; if self._seems_valid_key(key):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node = self._get_node(key)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if node._index is not None:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return node._index&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; raise KeyError&nbsp; &nbsp; &nbsp; &nbsp; raise TypeError("invalid key type")&nbsp; &nbsp; def index2key(self, index):&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; index or IndexedtrieKey -> key.&nbsp; &nbsp; &nbsp; &nbsp; """&nbsp; &nbsp; &nbsp; &nbsp; if isinstance(index, IndexedtrieKey):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; index = index.index&nbsp; &nbsp; &nbsp; &nbsp; elif not isinstance(index, int):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; raise TypeError("index must be an int")&nbsp; &nbsp; &nbsp; &nbsp; if index < 0 or index > len(self._indexes):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; raise IndexError&nbsp; &nbsp; &nbsp; &nbsp; return self._get_key(self._indexes[index])分享编辑跟随
打开App,查看更多内容
随时随地看视频慕课网APP

相关分类

Python