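I'm a beginner at programming, so I decided to take the CS50 course. For Problem Set 6 (Python) I wrote code that works for the small database but fails for the large one, so I'm only asking for help with the idea. Here is the course page; you can download the files there (from Google Drive).
My code: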
import csv
from sys import argv


class DnaTest(object):
    """CLASS HELP: the DNA test, simply give DNA sequence to the program, and it searches in the database to
    determine the person who owns the sample.
    type the following in cmd to run the program:
    python dna.py databases/small.csv sequences/1.txt """

    def __init__(self):
        # get filename from the command line without directory names "database" and "sequence"
        self.sequence_argv = str(argv[2][10:])
        self.database_argv = str(argv[1][10:])
        # Automatically open and close the database file
        with open(f"databases/{self.database_argv}", 'r') as database_file:
            self.database_file = database_file.readlines()
        # Automatically open and close the sequence file
        with open(f"sequences/{self.sequence_argv}", 'r') as sequence_file:
            self.sequence_file = sequence_file.readline()
        # Read CSV file as a dictionary, function: compare_database_with_sequence()
        self.csv_database_dictionary = csv.DictReader(self.database_file)
        # Read CSV file to take the first row, function: get_str_list()
        self.reader = csv.reader(self.database_file)
        # computed dictionary from the sequence file
        self.dict_from_sequence = {}

    # returns the first row of the CSV file (database file)
    def get_str_list(self):
        # get first row from CSV file
        self.keys = next(self.reader)
        # remove 'name' from list, get STR only.
        self.keys.remove("name")
        return self.keys
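One possible simplification of get_str_list is sketched below. It is an assumption on my part, not the original code. It reads the header through the `fieldnames` attribute of `csv.DictReader` instead of creating a second `csv.reader` over the same lines, and then compares already-computed STR counts against each database row. The names `load_database`, `find_match` and `DATABASE_CSV`, the made-up sample rows, and the "No match" return value are all hypothetical and only for illustration.

```python
import csv
import io

# Made-up database contents for illustration only (NOT the real CS50 file).
DATABASE_CSV = """name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
"""


def load_database(text):
    """Return (list of STR names, list of row dicts) from CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    # fieldnames holds the header row; drop "name" to keep only the STR columns
    str_names = [field for field in reader.fieldnames if field != "name"]
    # The remaining rows, each one a dict keyed by the header fields
    rows = list(reader)
    return str_names, rows


def find_match(rows, str_counts):
    """Return the name whose STR values all equal str_counts, else "No match" (hypothetical)."""
    for row in rows:
        # CSV values are strings, so convert them before comparing with the int counts
        if all(int(row[name]) == count for name, count in str_counts.items()):
            return row["name"]
    return "No match"


if __name__ == "__main__":
    str_names, rows = load_database(DATABASE_CSV)
    print(str_names)  # ['AGATC', 'AATG', 'TATC']
    print(find_match(rows, {"AGATC": 4, "AATG": 1, "TATC": 5}))  # Bob
```

With the real files, you would open the database with `open(path, newline="")` and pass the file object to `csv.DictReader` directly instead of using `io.StringIO`. Because the STR names come from the header, the same code handles the large database, which has more STR columns than the small one.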
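The problem:
In the function get_str_count_from_sequence(self) I used count, and that works, but the task needs the number of *consecutive* repeats. In a sequence file (for example 5.txt) the required STR also appears in places that are not consecutive, so count gives the wrong number and I can't get the length of each consecutive run. I searched but didn't find anything simple. Some people use the regex module and some use the re module, but I haven't found a solution yet.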
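A minimal sketch illustrates the difference. `str.count` returns the number of non-overlapping occurrences anywhere in the string, while the database stores the longest run of back-to-back repeats. Both helper names, `longest_run` and `longest_run_re`, are hypothetical. The first scans from the end of the sequence and records, for each position, how many copies of the STR start there one after another. The second uses the standard-library `re` module with a pattern such as `(?:AGATC)+` and keeps the longest match.

```python
import re


def longest_run(sequence: str, unit: str) -> int:
    """Longest number of back-to-back copies of `unit` in `sequence` (hypothetical helper)."""
    k = len(unit)
    if k == 0:
        raise ValueError("unit must be a non-empty string")
    n = len(sequence)
    # runs[i] = how many copies of `unit` start at position i, one right after another.
    # The list is padded by k entries so runs[i + k] is always a valid index.
    runs = [0] * (n + k)
    # Walk backwards so runs[i + k] is already known when we reach position i.
    for i in range(n - k, -1, -1):
        if sequence[i:i + k] == unit:
            runs[i] = runs[i + k] + 1
    return max(runs)


def longest_run_re(sequence: str, unit: str) -> int:
    """Same result using the standard-library re module (hypothetical helper)."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    # (?:UNIT)+ matches one or more consecutive copies of the escaped unit
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    # Each match length divided by the unit length is the size of that run
    return max((len(m.group()) // len(unit) for m in pattern.finditer(sequence)), default=0)


if __name__ == "__main__":
    sample = "AGATCAGATCTTAGATC"
    print(sample.count("AGATC"))           # 3: every occurrence, consecutive or not
    print(longest_run(sample, "AGATC"))    # 2: longest consecutive run
    print(longest_run_re(sample, "AGATC"))  # 2
```

Calling either function once for each STR name from the database header gives the counts to compare with each row. The loop version avoids regular expressions entirely. The `re` version is shorter, but it relies on the greedy `+` taking as many consecutive copies as possible at each match.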