How to sequentially replace patterns in Python when the pattern list contains duplicates

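I have a list of patterns and a list of replacements. The pattern list contains duplicate elements, but each occurrence corresponds to a different replacement.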


txt='132GOasmHOMEwokdslNOWsdwkGO239NOW'

pattern=['GO','HOME','NOW','GO','NOW']

REPLACEMENT=['why','nope','later','aha','genes']

The desired output is 132whyasmnopewokdsllatersdwkaha239genes


What is the most efficient way to perform this sequential replacement?


茅侃侃
134 views, 3 answers
3 Answers

蝴蝶刀刀

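txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']
for i, x in enumerate(pattern):
    txt = txt.replace(x, REPLACEMENT[i], 1)

Interestingly, here is a timing test, since the question asks for the most efficient approach. This is Python 2 code; on Python 3, use range instead of xrange and print() as a function.

import re
import time

pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']

# 1. zip
t = time.time()
for z in xrange(1000000):
    txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    for a, b in zip(pattern, REPLACEMENT):
        txt = txt.replace(a, b, 1)
print time.time() - t

# 2. enumerate
t = time.time()
for z in xrange(1000000):
    txt2 = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    for i, x in enumerate(pattern):
        txt2 = txt2.replace(x, REPLACEMENT[i], 1)
print time.time() - t

# 3. dict built from the reversed lists
t = time.time()
for z in xrange(1000000):
    txt3 = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
    for k in x:
        txt3 = txt3.replace(k, x[k], 1)
print time.time() - t

# 4. re.sub with an iterator over the replacements
# Note: '\b' in a non-raw string is a backspace character, not a regex word boundary.
t = time.time()
for z in xrange(1000000):
    txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    new_d = iter(REPLACEMENT)
    new_result = re.sub('\b' + '|'.join(pattern) + '\b', lambda _: next(new_d), txt)
print time.time() - t

The results, in seconds and in the order above, are:

2.57099986076
2.48500013351
3.50499987602
4.23699998856

As you can see, enumerate is slightly faster than zip here (2.49 s versus 2.57 s), and the other two are not in the same range.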
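A note on the fourth variant: in a non-raw Python string literal, '\b' is the backspace character rather than the regex word-boundary token, and the prefix and suffix concatenated around '|'.join(pattern) attach only to the first and last alternatives. The following is a minimal Python 3 sketch of the same callback idea with the boundaries dropped and the patterns escaped. The function name `replace_in_match_order` is hypothetical, and the sketch assumes the replacement list is ordered the same way the matches appear in the text, since the i-th match simply receives the i-th replacement.

```python
import re

def replace_in_match_order(txt, pattern, replacement):
    """Replace each match with the next replacement, in order of appearance."""
    # Distinct patterns, longest first, so a longer alternative wins over its prefix.
    alternatives = sorted(set(pattern), key=len, reverse=True)
    regex = re.compile('|'.join(re.escape(p) for p in alternatives))
    remaining = iter(replacement)
    # Once the replacements run out, leave any further matches unchanged.
    return regex.sub(lambda m: next(remaining, m.group(0)), txt)

txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']
result = replace_in_match_order(txt, pattern, REPLACEMENT)
print(result)  # 132whyasmnopewokdsllatersdwkaha239genes

# '\b' is one character (backspace); r'\b' is the two-character regex token.
print(len('\b'), len(r'\b'))  # 1 2
```

Compiling the regex once outside any timing loop would also keep pattern construction out of the measurement.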

眼眸繁星

You can iterate over both lists at the same time and replace only the first instance of the pattern each time:

for a, b in zip(pattern, REPLACEMENT):
    txt = txt.replace(a, b, 1)
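One property of this loop is that every replace(a, b, 1) call searches from the start of the string, so the text is rescanned for each pattern and a pattern can match inside text produced by an earlier replacement. Below is a minimal sketch of a cursor-based variant, assuming each pattern is meant to match after the previous match. `replace_left_to_right` is a hypothetical helper name, not something from the answer above.

```python
def replace_left_to_right(txt, pattern, replacement):
    """Replace each pattern once, searching only after the previous match."""
    pieces = []
    pos = 0  # index in txt just past the previous match
    for pat, rep in zip(pattern, replacement):
        found = txt.find(pat, pos)
        if found == -1:
            continue  # pattern does not occur after the cursor; skip it
        pieces.append(txt[pos:found])  # untouched text before the match
        pieces.append(rep)
        pos = found + len(pat)
    pieces.append(txt[pos:])  # untouched tail
    return ''.join(pieces)

txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']
print(replace_left_to_right(txt, pattern, REPLACEMENT))
# 132whyasmnopewokdsllatersdwkaha239genes
```

Because the cursor only moves forward, the input is scanned at most once in total and inserted replacement text is never searched again. Whether that behavior is wanted depends on the intended semantics, which the question does not spell out.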

慕盖茨4494581

Using a dict reduces the number of items you need to iterate over, which can be valuable for some long inputs.

txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']
# The dict keeps one entry per distinct pattern. Reversing both lists first
# means the earliest replacement for each pattern is the one that is kept,
# so each distinct pattern is replaced once.
x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
for k in x:
    txt = txt.replace(k, x[k], 1)
print(txt)

Edit: For fun, I added a benchmark to back up the claim that reducing the number of items to iterate over can be valuable for some long inputs. When you use a trivial test data set, the most efficient approach is not always obvious.

#! /usr/bin/env python
# -*- coding: UTF8 -*-

def alpha(pattern, REPLACEMENT, txt):
    # zip
    for a, b in zip(pattern, REPLACEMENT):
        txt = txt.replace(a, b, 1)

def beta(pattern, REPLACEMENT, txt):
    # enumerate
    for i, x in enumerate(pattern):
        txt = txt.replace(x, REPLACEMENT[i], 1)

def gamma(pattern, REPLACEMENT, txt):
    # dict with duplicate patterns removed
    x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
    for k in x:
        txt = txt.replace(k, x[k], 1)

def delta(pattern, REPLACEMENT, txt):
    # re.sub with an iterator; '\b' here is a backspace character, not a word boundary
    new_d = iter(REPLACEMENT)
    new_result = re.sub('\b' + '|'.join(pattern) + '\b', lambda _: next(new_d), txt)

if __name__ == '__main__':
    import timeit, re
    txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
    REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']

    print("Trivial inputs:  len(pattern): {}, len(REPLACEMENT): {}, len(txt): {}".format(len(pattern), len(REPLACEMENT), len(txt)))
    print("alpha: ", timeit.timeit("alpha(pattern, REPLACEMENT, txt)", setup="from __main__ import alpha, txt, pattern, REPLACEMENT"))
    print("beta:  ", timeit.timeit("beta(pattern, REPLACEMENT, txt)", setup="from __main__ import beta, txt, pattern, REPLACEMENT"))
    print("gamma: ", timeit.timeit("gamma(pattern, REPLACEMENT, txt)", setup="from __main__ import gamma, txt, pattern, REPLACEMENT"))
    print("delta: ", timeit.timeit("delta(pattern, REPLACEMENT, txt)", setup="from __main__ import delta, txt, pattern, REPLACEMENT"))
    print("")

    # Small inputs: append three more copies of the original data.
    txtcopy = txt
    patterncopy = pattern.copy()
    REPLACEMENTcopy = REPLACEMENT.copy()
    for _ in range(3):
        txt = txt + txtcopy
        pattern.extend(patterncopy)
        REPLACEMENT.extend(REPLACEMENTcopy)

    print("Small inputs: len(pattern): {}, len(REPLACEMENT): {}, len(txt): {}".format(len(pattern), len(REPLACEMENT), len(txt)))
    print("alpha: ", timeit.timeit("alpha(pattern, REPLACEMENT, txt)", setup="from __main__ import alpha, txt, pattern, REPLACEMENT"))
    print("beta:  ", timeit.timeit("beta(pattern, REPLACEMENT, txt)", setup="from __main__ import beta, txt, pattern, REPLACEMENT"))
    print("gamma: ", timeit.timeit("gamma(pattern, REPLACEMENT, txt)", setup="from __main__ import gamma, txt, pattern, REPLACEMENT"))
    print("delta: ", timeit.timeit("delta(pattern, REPLACEMENT, txt)", setup="from __main__ import delta, txt, pattern, REPLACEMENT"))
    print("")

    # Larger inputs: start over and append 300 more copies of the original data.
    txt = txtcopy
    pattern = patterncopy.copy()
    REPLACEMENT = REPLACEMENTcopy.copy()
    for _ in range(300):
        txt = txt + txtcopy
        pattern.extend(patterncopy)
        REPLACEMENT.extend(REPLACEMENTcopy)

    print("Larger inputs: len(pattern): {}, len(REPLACEMENT): {}, len(txt): {}".format(len(pattern), len(REPLACEMENT), len(txt)))
    print("alpha: ", timeit.timeit("alpha(pattern, REPLACEMENT, txt)", setup="from __main__ import alpha, txt, pattern, REPLACEMENT"))
    print("beta:  ", timeit.timeit("beta(pattern, REPLACEMENT, txt)", setup="from __main__ import beta, txt, pattern, REPLACEMENT"))
    print("gamma: ", timeit.timeit("gamma(pattern, REPLACEMENT, txt)", setup="from __main__ import gamma, txt, pattern, REPLACEMENT"))
    print("delta: ", timeit.timeit("delta(pattern, REPLACEMENT, txt)", setup="from __main__ import delta, txt, pattern, REPLACEMENT"))

Results (seconds per 1,000,000 calls, the timeit default):

Trivial inputs:  len(pattern): 5, len(REPLACEMENT): 5, len(txt): 33
alpha:  4.60048107800003
beta:   4.169088881999869
gamma:  5.7612637450001785
delta:  11.371387353000046

Small inputs: len(pattern): 20, len(REPLACEMENT): 20, len(txt): 132
alpha:  17.281149661999734
beta:   15.131949634000193
gamma:  7.339897444000144
delta:  26.50896787900001

Larger inputs: len(pattern): 1505, len(REPLACEMENT): 1505, len(txt): 9933
alpha:  18766.660852467998
beta:   17640.960064803
gamma:  64.01868645999639
delta:  901.3577002189995

So, for trivial inputs, the enumerate solution is a bit faster than zip and much faster than the iter/re.sub solution. When the input length grows slightly, the cost of not removing duplicates starts to show, and my solution runs in less than half the time. On long inputs with many duplicates, @eatmeimadanish's solution takes 27555% of the time needed when duplicates are removed. Ouch.
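Since the benchmarked functions return nothing, the timings alone do not show whether the variants produce the same text. The sketch below adds return statements (an assumption; the functions above have none) and compares the zip loop with the dict-based loop on the question's input. Because the dict holds only one replacement per distinct pattern, the second GO and the second NOW are left unchanged.

```python
def alpha(pattern, replacement, txt):
    """zip loop: one replace call per list entry, duplicates included."""
    for a, b in zip(pattern, replacement):
        txt = txt.replace(a, b, 1)
    return txt

def gamma(pattern, replacement, txt):
    """dict loop: one replace call per distinct pattern."""
    x = dict(zip(reversed(pattern), reversed(replacement)))
    for k in x:
        txt = txt.replace(k, x[k], 1)
    return txt

txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']

alpha_out = alpha(pattern, REPLACEMENT, txt)
gamma_out = gamma(pattern, REPLACEMENT, txt)
print(alpha_out)  # 132whyasmnopewokdsllatersdwkaha239genes
print(gamma_out)  # 132whyasmnopewokdsllatersdwkGO239NOW
```

A check like this before timing helps confirm that the candidates being compared do the same amount of work.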