开满天机
Partial solutions using regular expressions have since been posted. Note that the solution below does not use regular expressions; instead, it finds any run of a single repeated character (compared case-insensitively) of length 6 or more.

Test data:

    number,sequence,status
    1,kjhfklashfkldflkhasdfl,0
    2,aaaaaljgkldfkjgldkfjgfldj,0
    3,bbbbbbjigdfsjgjg,0
    4,ccCccCCcjjfijsdfjsdf,0
    5,klsjdflsjdfhdddddjnjlkhngjk,0
    6,kjkljfhnlasjkdfheeeeeeejjjeeeeeeeeeekjdkljfleeef,0
    7,jhfshffFffFFFFffkljjjj908u89,0

Code to find MNRs of length 6 or greater:

    import csv

    def contains_mnr(sequence):
        # Sentinel: choose a character that is sure not to be in the sequence
        start_char = "$"
        count = 0
        # Lower-case the sequence so that "c" and "C" count as the same character
        for char in sequence.lower():
            if char == start_char:
                count += 1
            else:
                # A new run starts here
                start_char = char
                count = 1
            if count >= 6:
                return True
        return False

    # newline="" lets the csv module handle line endings itself
    with open("input.csv", "r", newline="") as input_file:
        with open("output.csv", "w", newline="") as output_file:
            reader = csv.DictReader(input_file, dialect="unix")
            writer = csv.writer(output_file, dialect="unix")
            writer.writerow(reader.fieldnames)
            for row in reader:
                if contains_mnr(row["sequence"]):
                    writer.writerow([row["number"], row["sequence"], row["status"]])

Note that the CSV dialect may need to be adjusted depending on the system that runs the code and the system that produced the data file.

Output for the test data given above:

    "number","sequence","status"
    "3","bbbbbbjigdfsjgjg","0"
    "4","ccCccCCcjjfijsdfjsdf","0"
    "6","kjkljfhnlasjkdfheeeeeeejjjeeeeeeeeeekjdkljfleeef","0"
    "7","jhfshffFffFFFFffkljjjj908u89","0"
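As a cross-check on the run-length logic in `contains_mnr`, the same test can be sketched with `itertools.groupby`, which groups consecutive equal characters. This is only an illustrative alternative, not part of the original answer: the names `longest_run` and `contains_mnr_groupby` and the `min_length` parameter are assumptions introduced here.

```python
from itertools import groupby


def longest_run(sequence):
    """Return the length of the longest case-insensitive run of one character."""
    # groupby yields one group per stretch of consecutive equal characters
    run_lengths = (sum(1 for _ in group) for _, group in groupby(sequence.lower()))
    # default=0 covers the empty string, which has no runs at all
    return max(run_lengths, default=0)


def contains_mnr_groupby(sequence, min_length=6):
    """Return True if any single character repeats at least min_length times in a row."""
    return longest_run(sequence) >= min_length


# Two sequences taken from the test data above
for seq in ("aaaaaljgkldfkjgldkfjgfldj", "ccCccCCcjjfijsdfjsdf"):
    print(seq, longest_run(seq), contains_mnr_groupby(seq))
```

Unlike the early-returning loop, this version always scans the whole sequence, so it trades a little speed for also reporting how long the longest run is, which can help when choosing the threshold.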