Delete lines that contain the same letter more than 3 times

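I have this code, which deletes a line if it contains the same letter 3 times in a row. I need it to also delete the line when a letter occurs more than 3 times without the occurrences being adjacent (separated).

By "more than 3 times (separated)" I mean a line such as BAABAAG. This line contains the letter A four times, but my code does not delete it, because the four As are not next to one another.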


bad_words = ['AAA','BBB','CCC','DDD','EEE','FFF','GGG','HHH','III','JJJ','KKK','LLL','MMM','NNN','OOO','PPP','QQQ','RRR','SSS','TTT','UUU','VVV','WWW','XXX','YYY','ZZZ','111','222','333','444','555','666','777','888','999','000']


with open('7.csv') as oldfile, open('new7.csv', 'w') as newfile:
    for line in oldfile:
        if not any(bad_word in line for bad_word in bad_words):
            newfile.write(line)
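
A quick check of why BAABAAG slips through: the `in` test only finds three copies of a character in a row, while `str.count` counts every occurrence. This is a minimal sketch; the helper name `has_adjacent_run` is made up for illustration and is not part of the code above.

```python
def has_adjacent_run(line, run_length=3):
    """Return True if any character appears run_length times in a row."""
    return any(ch * run_length in line for ch in set(line))

line = "BAABAAG"
print(has_adjacent_run(line))  # False: there is no 'AAA' substring
print(line.count("A"))         # 4: 'A' occurs four times in total
```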

Sample file:


BAABAAB

BAABAAC

BAABAAD

BAABAAE

BAABAAF

BAABAAG

BAABAAH

BAABAAI

BAABAAJ

BAABAAK

BAABAAL

BAABAAM

BAABAAN

BAABAAO

BAABAAP

BAABAAQ


30秒到达战场
4 Answers

慕婉清6462132

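There is no need to build the bad_words list explicitly, and the repeat count can be kept in a variable, repeater:

repeater = 3
newlist = []

with open('input.txt') as f:
    x = f.readlines()

for val in x:
    word = val.split('\n')[0]  # drop the trailing newline
    flag = True
    for letter in word:
        # look for `repeater` adjacent copies of the upper-cased letter
        if letter.upper() * repeater in word:
            flag = False
            break
    if flag:
        newlist.append(word)

# set() removes duplicate lines and does not preserve their order
newlist = list(set(newlist))

with open('output.txt', mode='w', encoding='utf-8') as newfile:
    for value in newlist:
        newfile.write(value + "\n")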

四季花海

You can write a function that checks whether any character occurs more than 3 times, and then call it from your code:

def letter_count(line):
    counts = dict()
    for l in line:
        if l in counts:
            counts[l] += 1
        else:
            counts[l] = 1
    # True when the most frequent character occurs more than 3 times
    return counts[max(counts, key=lambda x: counts[x])] > 3

Then call it in your code like this:

with open('7.csv') as oldfile, open('new7.csv', 'w') as newfile:
    for line in oldfile:
        # keep the line only when no character occurs more than 3 times
        if not letter_count(line):
            newfile.write(line)

慕妹3242003

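You can use a Counter to check how often each character occurs on a line, and write the line only if none of the counts reaches the threshold:

from collections import Counter

threshold = 3

with open('7.csv') as oldfile, open('new7.csv', 'w') as newfile:
    for line in oldfile:
        counts = Counter(line)
        # keep the line only when every character occurs fewer than `threshold` times
        if all(count < threshold for count in counts.values()):
            newfile.write(line)

This uses the all() function to make sure that no character occurs threshold times or more.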
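
Whether a line with exactly three copies of a character is kept depends on the comparison: `count < threshold` with `threshold = 3` drops it, while `count <= threshold` would keep it. The sketch below prints the highest per-character count for a few made-up lines so the boundary is easy to see; `max_repeats` is a hypothetical helper name, and the trailing newline is stripped as an assumption so that it is not counted.

```python
from collections import Counter

def max_repeats(line):
    """Highest number of times any single character occurs (newline ignored)."""
    text = line.rstrip("\n")
    # default=0 covers an empty line, where Counter has no values
    return max(Counter(text).values(), default=0)

for sample in ["BAABAAG\n", "AAABCDE\n", "ABCDEFG\n"]:
    print(sample.strip(), max_repeats(sample))
# BAABAAG 4
# AAABCDE 3
# ABCDEFG 1
```

With `<` and a threshold of 3, the first two lines would be removed; with `<=`, only the first.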

神不在的星期二

Use a list of single characters instead of a list of triples, together with str.count(). Wrapping the filter logic in a small function is probably a good idea as well.

def f(line, chars, limit):
    # return False as soon as one character occurs more than `limit` times
    for char in chars:
        if line.count(char) > limit:
            return False
    return True

bad_chars = ['A','B', ...]

with open('7.csv', 'r') as oldfile, open('new7.csv', 'w') as newfile:
    for line in oldfile:
        if f(line, bad_chars, 3):
            newfile.write(line)
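
As an end-to-end illustration of the same counting idea, the sketch below checks every distinct character found on the line rather than a fixed character list, and uses in-memory `io.StringIO` objects in place of 7.csv and new7.csv so that it runs on its own. The function name `filter_lines` and the three input lines are assumptions made for this example.

```python
import io

def filter_lines(src, dst, limit=3):
    """Copy lines from src to dst, skipping any line in which a single
    character occurs more than `limit` times."""
    for line in src:
        text = line.rstrip("\n")  # do not count the newline
        if all(text.count(ch) <= limit for ch in set(text)):
            dst.write(line)

src = io.StringIO("BAABAAG\nABCDEFG\nAAABCDE\n")
dst = io.StringIO()
filter_lines(src, dst)
print(dst.getvalue(), end="")
# ABCDEFG
# AAABCDE
```

BAABAAG is dropped because A occurs four times; AAABCDE stays because three occurrences do not exceed the limit.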