蝴蝶刀刀
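Replace the patterns one at a time, in order, limiting each replace to a single occurrence:

    txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
    REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']

    # Replace only the first remaining occurrence of each pattern.
    for i, x in enumerate(pattern):
        txt = txt.replace(x, REPLACEMENT[i], 1)

Interestingly, here is a timing test, since the question asks for the most efficient approach.

    # Python 2 code (xrange, print statement), as originally timed.
    import re
    import time

    pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
    REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']

    # 1. zip
    t = time.time()
    for z in xrange(1000000):
        txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
        for a, b in zip(pattern, REPLACEMENT):
            txt = txt.replace(a, b, 1)
    print time.time() - t

    # 2. enumerate
    t = time.time()
    for z in xrange(1000000):
        txt2 = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
        for i, x in enumerate(pattern):
            txt2 = txt2.replace(x, REPLACEMENT[i], 1)
    print time.time() - t

    # 3. dict built from the reversed lists (duplicate patterns collapse)
    t = time.time()
    for z in xrange(1000000):
        txt3 = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
        x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
        for k in x:
            txt3 = txt3.replace(k, x[k], 1)
    print time.time() - t

    # 4. re.sub with an iterator over the replacements
    # Note: '\b' in a non-raw string is a backspace character, not a
    # word boundary; the bare alternatives (HOME, NOW, GO) still match.
    t = time.time()
    for z in xrange(1000000):
        txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
        new_d = iter(REPLACEMENT)
        new_result = re.sub('\b' + '|'.join(pattern) + '\b', lambda _: next(new_d), txt)
    print time.time() - t

The results are:

    2.57099986076
    2.48500013351
    3.50499987602
    4.23699998856

As you can see, enumerate is slightly faster than zip (2.49 s vs. 2.57 s), while the other two (3.50 s and 4.24 s) are not in the same range.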
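For readers on Python 3, the zip/enumerate comparison above could be reproduced roughly as follows. This is only a sketch: the function names `via_zip` and `via_enumerate` and the repetition count of 100,000 are assumptions made for illustration, not part of the answer, and the timings will differ from the Python 2 figures quoted above.

```python
import timeit

TXT = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
PATTERN = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']


def via_zip(txt=TXT):
    """Pair each pattern with its replacement using zip (hypothetical name)."""
    for old, new in zip(PATTERN, REPLACEMENT):
        txt = txt.replace(old, new, 1)  # first remaining occurrence only
    return txt


def via_enumerate(txt=TXT):
    """Look up the replacement by index using enumerate (hypothetical name)."""
    for i, old in enumerate(PATTERN):
        txt = txt.replace(old, REPLACEMENT[i], 1)
    return txt


if __name__ == '__main__':
    # Both variants should produce the same string.
    print(via_zip())
    print(via_enumerate())
    # Repetition count chosen arbitrarily to keep the run short.
    for func in (via_zip, via_enumerate):
        print(func.__name__, timeit.timeit(func, number=100_000))
```

Using `timeit` instead of manual `time.time()` bookkeeping avoids having to reset the timer between runs; the gap between the two variants is small enough that it may vary from run to run.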
慕盖茨4494581
Using a dict reduces the number of items you need to iterate over, which can be valuable for some long inputs.

    txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
    REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']

    # Building the dict from the reversed lists keeps, for each duplicated
    # pattern, the replacement that appears first in the original order.
    x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
    for k in x:
        txt = txt.replace(k, x[k], 1)
    print(txt)

Edit: For fun, I added a benchmark to back this up, showing that reducing the number of items to iterate over can be valuable for some long inputs. When you use a trivial test data set, the most efficient approach is not always obvious.

    #! /usr/bin/env python
    # -*- coding: UTF8 -*-
    import re
    import timeit

    def alpha(pattern, REPLACEMENT, txt):
        # zip
        for a, b in zip(pattern, REPLACEMENT):
            txt = txt.replace(a, b, 1)

    def beta(pattern, REPLACEMENT, txt):
        # enumerate
        for i, x in enumerate(pattern):
            txt = txt.replace(x, REPLACEMENT[i], 1)

    def gamma(pattern, REPLACEMENT, txt):
        # dict (duplicate patterns removed)
        x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
        for k in x:
            txt = txt.replace(k, x[k], 1)

    def delta(pattern, REPLACEMENT, txt):
        # re.sub with an iterator; '\b' is kept exactly as benchmarked
        # (in a non-raw string it is a backspace, not a word boundary)
        new_d = iter(REPLACEMENT)
        new_result = re.sub('\b' + '|'.join(pattern) + '\b', lambda _: next(new_d), txt)

    if __name__ == '__main__':
        txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
        pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
        REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']

        print("Trivial inputs: len(pattern): {}, len(REPLACEMENT): {}, len(txt): {}".format(len(pattern), len(REPLACEMENT), len(txt)))
        print("alpha: ", timeit.timeit("alpha(pattern, REPLACEMENT, txt)", setup="from __main__ import alpha, txt, pattern, REPLACEMENT"))
        print("beta: ", timeit.timeit("beta(pattern, REPLACEMENT, txt)", setup="from __main__ import beta, txt, pattern, REPLACEMENT"))
        print("gamma: ", timeit.timeit("gamma(pattern, REPLACEMENT, txt)", setup="from __main__ import gamma, txt, pattern, REPLACEMENT"))
        print("delta: ", timeit.timeit("delta(pattern, REPLACEMENT, txt)", setup="from __main__ import delta, txt, pattern, REPLACEMENT"))
        print("")

        # Grow the inputs to 4 copies of the original data.
        txtcopy = txt
        patterncopy = pattern.copy()
        REPLACEMENTcopy = REPLACEMENT.copy()
        for _ in range(3):
            txt = txt + txtcopy
            pattern.extend(patterncopy)
            REPLACEMENT.extend(REPLACEMENTcopy)

        print("Small inputs: len(pattern): {}, len(REPLACEMENT): {}, len(txt): {}".format(len(pattern), len(REPLACEMENT), len(txt)))
        print("alpha: ", timeit.timeit("alpha(pattern, REPLACEMENT, txt)", setup="from __main__ import alpha, txt, pattern, REPLACEMENT"))
        print("beta: ", timeit.timeit("beta(pattern, REPLACEMENT, txt)", setup="from __main__ import beta, txt, pattern, REPLACEMENT"))
        print("gamma: ", timeit.timeit("gamma(pattern, REPLACEMENT, txt)", setup="from __main__ import gamma, txt, pattern, REPLACEMENT"))
        print("delta: ", timeit.timeit("delta(pattern, REPLACEMENT, txt)", setup="from __main__ import delta, txt, pattern, REPLACEMENT"))
        print("")

        # Reset, then grow the inputs to 301 copies of the original data.
        txt = txtcopy
        pattern = patterncopy.copy()
        REPLACEMENT = REPLACEMENTcopy.copy()
        for _ in range(300):
            txt = txt + txtcopy
            pattern.extend(patterncopy)
            REPLACEMENT.extend(REPLACEMENTcopy)

        print("Larger inputs: len(pattern): {}, len(REPLACEMENT): {}, len(txt): {}".format(len(pattern), len(REPLACEMENT), len(txt)))
        print("alpha: ", timeit.timeit("alpha(pattern, REPLACEMENT, txt)", setup="from __main__ import alpha, txt, pattern, REPLACEMENT"))
        print("beta: ", timeit.timeit("beta(pattern, REPLACEMENT, txt)", setup="from __main__ import beta, txt, pattern, REPLACEMENT"))
        print("gamma: ", timeit.timeit("gamma(pattern, REPLACEMENT, txt)", setup="from __main__ import gamma, txt, pattern, REPLACEMENT"))
        print("delta: ", timeit.timeit("delta(pattern, REPLACEMENT, txt)", setup="from __main__ import delta, txt, pattern, REPLACEMENT"))

Results:

    Trivial inputs: len(pattern): 5, len(REPLACEMENT): 5, len(txt): 33
    alpha: 4.60048107800003
    beta: 4.169088881999869
    gamma: 5.7612637450001785
    delta: 11.371387353000046

    Small inputs: len(pattern): 20, len(REPLACEMENT): 20, len(txt): 132
    alpha: 17.281149661999734
    beta: 15.131949634000193
    gamma: 7.339897444000144
    delta: 26.50896787900001

    Larger inputs: len(pattern): 1505, len(REPLACEMENT): 1505, len(txt): 9933
    alpha: 18766.660852467998
    beta: 17640.960064803
    gamma: 64.01868645999639
    delta: 901.3577002189995

So, for trivial inputs, the enumerate solution (beta) is a bit faster than zip (alpha) and much faster than the iter/re.sub variant (delta). When the input length increases slightly, the cost of not removing duplicates starts to show, and my solution (gamma) runs in less than half the time. On long inputs with many duplicates, @eatmeimadanish's solution takes about 27555% of the time needed when duplicates are removed, roughly 275 times as long. Ouch.
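One thing the benchmark does not check is whether the variants return the same string. The sketch below compares the sequential approach with the dict approach on the sample input; the function names `sequential` and `deduplicated` are hypothetical and introduced only for this illustration. Because the dict keeps one entry per distinct pattern, the second `GO` and second `NOW` appear to be left untouched, so the speedup comes with a change in behavior when patterns repeat.

```python
TXT = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
PATTERN = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']


def sequential(txt, pattern, replacement):
    """Apply every (pattern, replacement) pair in order, one hit each."""
    for old, new in zip(pattern, replacement):
        txt = txt.replace(old, new, 1)
    return txt


def deduplicated(txt, pattern, replacement):
    """Apply one replacement per distinct pattern, as in the dict approach."""
    # Reversing first means the earliest replacement wins for duplicates.
    mapping = dict(zip(reversed(pattern), reversed(replacement)))
    for old, new in mapping.items():
        txt = txt.replace(old, new, 1)
    return txt


print(sequential(TXT, PATTERN, REPLACEMENT))
# 132whyasmnopewokdsllatersdwkaha239genes
print(deduplicated(TXT, PATTERN, REPLACEMENT))
# 132whyasmnopewokdsllatersdwkGO239NOW
```

If every listed pattern must consume its own occurrence, the sequential form is the one that matches that requirement; the dict form fits cases where each distinct pattern should be replaced only once.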