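I'm trying to write a program that steps through a DNA sequence in elements of a defined length, and I can't make sense of the output I get from my loop. For the first four iterations it seems to shift the frame correctly, but then it appears to jump back to earlier parts of the sequence. I've tried hard to understand this behaviour, but I'm too new to programming to work it out, so any help is much appreciated.
Here is my code: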
    import regex as re

    seq = "ACTGCATTTTGCATTTT"
    search = "TGCATTTTG"

    def kmers(text, n):
        for a in text:
            b = text[text.index(a):text.index(a) + n]
            c = len(re.findall(b, text, overlapped=True))
            print("the count for " + b + " is " + str(c))

    kmers(seq, 3)
And my output:
the count for ACT is 1
the count for CTG is 1
the count for TGC is 2
the count for GCA is 2
#I expected 'CAT' next, from here on I don't understand the behaviour
the count for CTG is 1
the count for ACT is 1
the count for TGC is 2
the count for TGC is 2
the count for TGC is 2
the count for TGC is 2
the count for GCA is 2
the count for CTG is 1
the count for ACT is 1
the count for TGC is 2
the count for TGC is 2
the count for TGC is 2
the count for TGC is 2
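Obviously I eventually want to remove duplicates and so on, but I'm stuck on why my for loop doesn't behave the way I expected, and that has stopped me from improving the program any further.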
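One possible reading of the output above, offered as a sketch rather than a definitive answer: `str.index` returns the position of the first occurrence of its argument, so the second `C` in the sequence maps back to index 1 and the slice starts over at `CTG` instead of moving on to `CAT`. The snippet below first illustrates that, then steps over positions with `range` instead of over characters. The names `first_positions` and `kmer_counts` are made up for this illustration, and it swaps the `regex` call for `collections.Counter` from the standard library, which counts overlapping windows because every start position is visited once.

```python
from collections import Counter

seq = "ACTGCATTTTGCATTTT"

# str.index returns the position of the FIRST match, so repeated
# characters all map back to the same starting position.
# Characters looked up here: A, C, T, G, C, A
first_positions = [seq.index(ch) for ch in seq[:6]]
print(first_positions)

def kmer_counts(text, n):
    """Count every length-n window, moving one position at a time.

    Hypothetical helper for illustration; not the original kmers().
    """
    # Stop at len(text) - n so the last window is still n characters long.
    windows = (text[i:i + n] for i in range(len(text) - n + 1))
    return Counter(windows)

counts = kmer_counts(seq, 3)
for kmer, count in counts.items():
    print("the count for " + kmer + " is " + str(count))
```

Because the counts are stored per k-mer, each one is printed only once, which would also take care of the duplicates mentioned above.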
慕勒3428872