牧羊人nacy
You basically have three solutions: 1) write your own implementation of diff; 2) hack the difflib module; 3) find a workaround.

Your own implementation

In case 1), you can look at this question and read some books, such as CLRS or Robert Sedgewick's books.

Hacking the difflib module

In case 2), look at the source code: get_matching_blocks calls find_longest_match at line 479. At the core of find_longest_match is the b2j dictionary, which maps each element of list b to the list of its indices in b; the elements of list a are looked up in it. If you overwrite this dictionary, you can achieve what you want. Here is the standard version:

    >>> import difflib
    >>> from difflib import SequenceMatcher
    >>> list3 = ["orange","apple","lemons","grape"]
    >>> list4 = ["pears", "oranges","apple", "lemon", "cherry", "grapes"]
    >>> s = SequenceMatcher(None, list3, list4)
    >>> s.get_matching_blocks()
    [Match(a=1, b=2, size=1), Match(a=4, b=6, size=0)]
    >>> [(b.a+i, b.b+i, list3[b.a+i], list4[b.b+i]) for b in s.get_matching_blocks() for i in range(b.size)]
    [(1, 2, 'apple', 'apple')]

Here is the hacked version:

    >>> s = SequenceMatcher(None, list3, list4)
    >>> s.b2j
    {'pears': [0], 'oranges': [1], 'apple': [2], 'lemon': [3], 'cherry': [4], 'grapes': [5]}
    >>> s.b2j = {**s.b2j, 'orange':s.b2j['oranges'], 'lemons':s.b2j['lemon'], 'grape':s.b2j['grapes']}
    >>> s.b2j
    {'pears': [0], 'oranges': [1], 'apple': [2], 'lemon': [3], 'cherry': [4], 'grapes': [5], 'orange': [1], 'lemons': [3], 'grape': [5]}
    >>> s.get_matching_blocks()
    [Match(a=0, b=1, size=3), Match(a=3, b=5, size=1), Match(a=4, b=6, size=0)]
    >>> [(b.a+i, b.b+i, list3[b.a+i], list4[b.b+i]) for b in s.get_matching_blocks() for i in range(b.size)]
    [(0, 1, 'orange', 'oranges'), (1, 2, 'apple', 'apple'), (2, 3, 'lemons', 'lemon'), (3, 5, 'grape', 'grapes')]

This is not hard to automate, but I don't recommend this solution, because there is a very simple workaround.

Workaround

The idea is to group the words by family:

    families = [{"pears", "peras"}, {"orange", "oranges", "naranjas"}, {"apple", "manzana"}, {"lemons", "lemon", "limón"}, {"cherry", "cereza"}, {"grape", "grapes"}]

It is now easy to create a dictionary that maps every word of a family to one of those words (let's call it the main word):

    >>> d = {w:main for main, *alternatives in map(list, families) for w in alternatives}
    >>> d
    {'pears': 'peras', 'orange': 'naranjas', 'oranges': 'naranjas', 'manzana': 'apple', 'lemons': 'limón', 'lemon': 'limón', 'cherry': 'cereza', 'grape': 'grapes'}

Note that main, *alternatives in map(list, families) uses the star operator to unpack each family into a main word (the first element of the list) and a list of alternatives:

    >>> head, *tail = [1,2,3,4,5]
    >>> head
    1
    >>> tail
    [2, 3, 4, 5]

Because the families are sets, which are unordered, the word that ends up as the main word of a family may differ from one run to another. This does not matter, as long as both lists are converted with the same dictionary.

Then you can convert the lists to use only the main words:

    >>> list3=["orange","apple","lemons","grape"]
    >>> list4=["pears", "oranges","apple", "lemon", "cherry", "grapes"]
    >>> list5=["peras", "naranjas", "manzana", "limón", "cereza", "uvas"]
    >>> [d.get(w, w) for w in list3]
    ['naranjas', 'apple', 'limón', 'grapes']
    >>> [d.get(w, w) for w in list4]
    ['peras', 'naranjas', 'apple', 'limón', 'cereza', 'grapes']
    >>> [d.get(w, w) for w in list5]
    ['peras', 'naranjas', 'apple', 'limón', 'cereza', 'uvas']

The expression d.get(w, w) returns d[w] if w is a key, else w itself. Hence, the words that belong to a family are converted to the main word of that family, and the other words are left untouched. These lists are easy to compare with difflib.

Important: the time complexity of converting the lists is negligible compared to that of the diff algorithm, so you should not see a difference.

Full code

As a bonus, the full code:

    from difflib import SequenceMatcher

    def match_seq(list1, list2):
        """A generator that yields matches of list1 vs list2"""
        s = SequenceMatcher(None, list1, list2)
        for block in s.get_matching_blocks():
            for i in range(block.size):
                yield block.a + i, block.b + i  # you don't need to store the matches, just yield them

    def create_convert(*families):
        """Return a converter function that converts a list to the same list with only main words"""
        d = {w:main for main, *alternatives in map(list, families) for w in alternatives}
        return lambda L: [d.get(w, w) for w in L]

    families = [{"pears", "peras"}, {"orange", "oranges", "naranjas"}, {"apple", "manzana"}, {"lemons", "lemon", "limón"}, {"cherry", "cereza"}, {"grape", "grapes", "uvas"}]
    convert = create_convert(*families)

    list3=["orange","apple","lemons","grape"]
    list4=["pears", "oranges","apple", "lemon", "cherry", "grapes"]
    list5=["peras", "naranjas", "manzana", "limón", "cereza", "uvas"]

    print("list3 vs list4")
    for a,b in match_seq(convert(list3), convert(list4)):
        print(a,b, list3[a],list4[b])

    # list3 vs list4
    # 0 1 orange oranges
    # 1 2 apple apple
    # 2 3 lemons lemon
    # 3 5 grape grapes

    print("list3 vs list5")
    for a,b in match_seq(convert(list3), convert(list5)):
        print(a,b, list3[a],list5[b])

    # list3 vs list5
    # 0 1 orange naranjas
    # 1 2 apple manzana
    # 2 3 lemons limón
    # 3 5 grape uvas
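The main, *alternatives unpacking above takes whichever word comes first when a set is turned into a list, so the main word of a family is not fixed. If a reproducible mapping is wanted, one option is to pick the main word explicitly. This is only a minimal sketch, assuming the alphabetically smallest word of each family is an acceptable main word; the helper name build_canonical_map is hypothetical and not part of the answer:

```python
from difflib import SequenceMatcher


def build_canonical_map(families):
    """Map every word to the smallest word (by string order) of its family."""
    canonical = {}
    for family in families:
        main = min(family)  # deterministic, unlike set iteration order
        for word in family:
            canonical[word] = main
    return canonical


families = [
    {"pears", "peras"},
    {"orange", "oranges", "naranjas"},
    {"apple", "manzana"},
    {"lemons", "lemon", "limón"},
    {"cherry", "cereza"},
    {"grape", "grapes", "uvas"},
]
canonical = build_canonical_map(families)

list3 = ["orange", "apple", "lemons", "grape"]
list5 = ["peras", "naranjas", "manzana", "limón", "cereza", "uvas"]

# Words outside every family are left untouched by .get(w, w).
converted3 = [canonical.get(w, w) for w in list3]
converted5 = [canonical.get(w, w) for w in list5]

matcher = SequenceMatcher(None, converted3, converted5)
pairs = [
    (block.a + i, block.b + i)
    for block in matcher.get_matching_blocks()
    for i in range(block.size)
]
print(pairs)
```

The matching itself is unchanged; only the choice of the main word becomes stable across runs.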
慕的地10843
Here is an approach that uses a class inheriting from UserString and overriding __eq__() and __hash__(), so that strings considered synonyms evaluate as equal:

    import collections
    from difflib import SequenceMatcher


    class SynonymString(collections.UserString):
        def __init__(self, seq, synonyms, inverse_synonyms):
            super().__init__(seq)
            self.synonyms = synonyms
            self.inverse_synonyms = inverse_synonyms

        def __eq__(self, other):
            # Equal if this string is listed as a synonym of the other one
            if self.synonyms.get(other) and self.data in self.synonyms.get(other):
                return True
            return self.data == other

        def __hash__(self):
            # Synonyms hash like the key word of their group
            if str(self.data) in self.inverse_synonyms:
                return hash(self.inverse_synonyms[self.data])
            return hash(self.data)


    def match_seq_syn(list1, list2, synonyms):
        # Map every synonym back to the key word of its group
        inverse_synonyms = {
            string: key for key, value in synonyms.items() for string in value
        }

        list1 = [SynonymString(s, synonyms, inverse_synonyms) for s in list1]
        list2 = [SynonymString(s, synonyms, inverse_synonyms) for s in list2]

        output = []
        s = SequenceMatcher(None, list1, list2)
        blocks = s.get_matching_blocks()

        for bl in blocks:
            for bi in range(bl.size):
                cur_a = bl.a + bi
                cur_b = bl.b + bi
                output.append((cur_a, cur_b))
        return output


    list3 = ["orange", "apple", "lemons", "grape"]
    list5 = ["peras", "naranjas", "manzana", "limón", "cereza", "uvas"]

    synonyms = {
        "orange": ["oranges", "naranjas"],
        "apple": ["manzana"],
        "pears": ["peras"],
        "lemon": ["lemons", "limón"],
        "cherry": ["cereza"],
        "grape": ["grapes", "uvas"],
    }

    for a, b in match_seq_syn(list3, list5, synonyms):
        print(a, b, list3[a], list5[b])

Result (comparing lists 3 and 5):

    0 1 orange naranjas
    1 2 apple manzana
    2 3 lemons limón
    3 5 grape uvas
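SequenceMatcher stores the elements of the second sequence as dictionary keys, so two synonyms only match if they hash to the same value and also compare equal. A minimal sketch of the same idea, assuming each wrapper precomputes a single canonical key used for both operations; the names CanonicalString, canonical_of and match_seq_canonical are hypothetical and not part of the answer above:

```python
from collections import UserString
from difflib import SequenceMatcher


class CanonicalString(UserString):
    """String wrapper that compares and hashes by a canonical key."""

    def __init__(self, seq, canonical_of):
        super().__init__(seq)
        # canonical_of maps each synonym to the key word of its group;
        # words without an entry are their own key.
        self.key = canonical_of.get(self.data, self.data)

    def __eq__(self, other):
        if isinstance(other, CanonicalString):
            return self.key == other.key
        return NotImplemented  # do not claim equality with plain strings

    def __hash__(self):
        return hash(self.key)


def match_seq_canonical(list1, list2, canonical_of):
    """Return (index in list1, index in list2) pairs of matching elements."""
    wrapped1 = [CanonicalString(s, canonical_of) for s in list1]
    wrapped2 = [CanonicalString(s, canonical_of) for s in list2]
    matcher = SequenceMatcher(None, wrapped1, wrapped2)
    return [
        (block.a + i, block.b + i)
        for block in matcher.get_matching_blocks()
        for i in range(block.size)
    ]


synonyms = {
    "orange": ["oranges", "naranjas"],
    "apple": ["manzana"],
    "pears": ["peras"],
    "lemon": ["lemons", "limón"],
    "cherry": ["cereza"],
    "grape": ["grapes", "uvas"],
}
canonical_of = {word: key for key, words in synonyms.items() for word in words}

list3 = ["orange", "apple", "lemons", "grape"]
list5 = ["peras", "naranjas", "manzana", "limón", "cereza", "uvas"]

pairs = match_seq_canonical(list3, list5, canonical_of)
for a, b in pairs:
    print(a, b, list3[a], list5[b])
```

Deriving both __eq__() and __hash__() from one key keeps the two methods consistent with each other by construction, at the cost of no longer comparing equal to plain str objects.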
呼唤远方
So, let's say you want to fill lists with the elements that should match each other. I did not use any library, only generators. I am not sure about the efficiency, and I have only tried this code once, but I think it should work pretty well.

    orange_list = ["orange", "oranges"]  # Fill this with orange matching words
    pear_list = ["pear", "pears"]
    lemon_list = ["lemon", "lemons"]
    apple_list = ["apple", "apples"]
    grape_list = ["grape", "grapes"]

    lists = [orange_list, pear_list, lemon_list, apple_list, grape_list]  # Put your matching lists inside this list

    def match_seq_bol(list1, list2):
        output = []
        for x in list1:
            for lst in lists:
                # Generator of the words in list2 that share a matching list with x
                matches = (y for y in list2 if (x in lst and y in lst))
                for i in matches:
                    # index() returns the first occurrence of a word
                    output.append((list1.index(x), list2.index(i), x, i))
        return output

    list3 = ["orange","apple","lemons","grape"]
    list4 = ["pears", "oranges","apple", "lemon", "cherry", "grapes"]

    print(match_seq_bol(list3, list4))

match_seq_bol() stands for "match sequences based on lists".

The output of matching list3 and list4 will be:

    [
        (0, 1, 'orange', 'oranges'),
        (1, 2, 'apple', 'apple'),
        (2, 3, 'lemons', 'lemon'),
        (3, 5, 'grape', 'grapes')
    ]
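On the efficiency question, the repeated scans over every matching list and the index() calls in match_seq_bol() can be replaced by dictionary lookups. A minimal sketch, assuming each word belongs to at most one matching list; match_by_group and its groups parameter are hypothetical names, not part of the code above:

```python
def match_by_group(list1, list2, groups):
    """Return (i, j, word1, word2) for every pair of words in the same group."""
    # Map each word to the index of the group it belongs to.
    group_of = {word: gid for gid, group in enumerate(groups) for word in group}

    # Record, for each group, the positions in list2 where its words occur.
    positions = {}
    for j, word in enumerate(list2):
        gid = group_of.get(word)
        if gid is not None:
            positions.setdefault(gid, []).append(j)

    output = []
    for i, word in enumerate(list1):
        gid = group_of.get(word)  # None for words outside every group
        for j in positions.get(gid, []):
            output.append((i, j, word, list2[j]))
    return output


groups = [
    ["orange", "oranges"],
    ["pear", "pears"],
    ["lemon", "lemons"],
    ["apple", "apples"],
    ["grape", "grapes"],
]
list3 = ["orange", "apple", "lemons", "grape"]
list4 = ["pears", "oranges", "apple", "lemon", "cherry", "grapes"]

result = match_by_group(list3, list4, groups)
print(result)
```

Using enumerate() instead of index() also means repeated words are reported at their own positions rather than at the first occurrence.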