
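Is it possible to search a txt file for words from a list and return the line above?

I have a txt file containing sentences, and I am able to find the words from a list in it. I want to put the line above each "found line" into a separate list. I tried it with the code below, but it only returns [].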


Here is my code:


fname_in = "test.txt"
lv_pos = []
search_list = ['word1', 'word2']

with open (fname_in, 'r') as f:
    file_l1 = [line.split('\n') for line in f.readlines()]
    counter = 0

    for word in search_list:
        if word in file_l1:
            l_pos.append(file_l1[counter - 1])

    counter += 1

print(l_pos)

The text file looks like this:


Bla bla bla
I want this line1.
I found this line with word1.
Bla bla bla
I want this line2.
I found this line with word2.

The result I want is:


l_pos = ['I want this line1.','I want this line2.']


慕盖茨4494581
150 views, 3 answers
3 Answers

jeck猫

First, your code has a few typos: in some places you wrote l_pos and in others lv_pos.

The other problem is that I don't think you realized file_l1 is a list of lists, so the test "if word in file_l1:" does not do what you think it does. You need to check each word against each of those sublists.

Here is some working code based on your own:

fname_in = "simple_test.txt"
l_pos = []
search_list = ['word1', 'word2']

with open(fname_in) as f:
    lines = f.read().splitlines()
    for i, line in enumerate(lines):
        for word in search_list:
            # i > 0: the first line has no line above it (lines[-1] would wrap to the last line).
            if i > 0 and word in line:
                l_pos.append(lines[i - 1])

print(l_pos)  # -> ['I want this line1.', 'I want this line2.']

Update

Here is another approach that does not read the whole file into memory at once, so it needs less memory:

from collections import deque

fname_in = "simple_test.txt"
l_pos = []
search_list = ['word1', 'word2']

with open(fname_in) as file:
    lines = (line.rstrip('\n') for line in file)  # Generator expression.
    try:  # Create and initialize a sliding window.
        # Wrap the first line in a list so the window holds the line itself, not its characters.
        sw = deque([next(lines)], maxlen=2)
    except StopIteration:  # File with less than 1 line.
        pass
    for line in lines:
        sw.append(line)
        for word in search_list:
            if word in sw[1]:
                l_pos.append(sw[0])

print(l_pos)  # -> ['I want this line1.', 'I want this line2.']
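To make the list-of-lists point above concrete, here is a small sketch of what the question's expression actually builds. The in-memory sample (via io.StringIO) is an assumption that stands in for the real file, and the variable names other than file_l1 are only illustrative.

```python
import io

# Stand-in for the text file from the question (an assumption for this demo).
sample = io.StringIO(
    "Bla bla bla\n"
    "I want this line1.\n"
    "I found this line with word1.\n"
)

# Same expression as in the question: split('\n') turns every line into a list.
file_l1 = [line.split('\n') for line in sample.readlines()]
first_row = file_l1[0]  # a two-element list: the text and an empty string

# 'in' on the outer list compares the word with each inner list, never with the text.
word_in_outer_list = 'word1' in file_l1

# A substring test has to be made against the line string itself.
matching_rows = [i for i, parts in enumerate(file_l1) if 'word1' in parts[0]]

print(first_row, word_in_outer_list, matching_rows)
```

Because the membership test on the outer list is always False here, nothing is ever appended, which is consistent with the empty result described in the question.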

森林海

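In the second line of your example you wrote lv_pos instead of l_pos. Inside the with statement, I think you can fix it like this:

fname_in = "test.txt"
l_pos = []
search_list = ['word1', 'word2']

with open(fname_in, 'r') as f:
    file_l1 = f.read().splitlines()  # one string per line, newlines removed
    for line in range(1, len(file_l1)):  # start at 1: the first line has nothing above it
        for word in search_list:
            # Substring test: splitting on " " would leave "word1." with its period, which never equals "word1".
            if word in file_l1[line]:
                l_pos.append(file_l1[line - 1])

print(l_pos)

I'm not thrilled with this solution, but I think it fixes your code with minimal changes.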
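One possible way to avoid indexing by position at all is to remember the previous line while looping. The sketch below is only an illustration of that idea: the function name lines_before_matches and the sample_lines list are hypothetical and do not come from the answers.

```python
def lines_before_matches(lines, words):
    """Return the line preceding each line that contains any of `words`."""
    found = []
    prev = None  # nothing precedes the first line
    for raw in lines:
        line = raw.rstrip('\n')
        # any() stops at the first matching word, so a line is counted once.
        if prev is not None and any(word in line for word in words):
            found.append(prev)
        prev = line
    return found


# Hypothetical sample mirroring the text file from the question.
sample_lines = [
    "Bla bla bla\n",
    "I want this line1.\n",
    "I found this line with word1.\n",
    "Bla bla bla\n",
    "I want this line2.\n",
    "I found this line with word2.\n",
]
result = lines_before_matches(sample_lines, ['word1', 'word2'])
print(result)
```

Since the function accepts any iterable of strings, an open file object could be passed in directly instead of a list.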

饮歌长啸

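Treat the file as a collection of pairs of a line and the line before it:

# lines is assumed to be the list of lines read from the file, for example:
with open("test.txt") as f:
    lines = f.read().splitlines()

[prev for prev, this in zip(lines, lines[1:])
      if 'word1' in this or 'word2' in this]
# ['I want this line1.', 'I want this line2.']

This approach can be extended to cover any number of words:

words = {'word1', 'word2'}
[prev for prev, this in zip(lines, lines[1:])
      if any(word in this for word in words)]
# ['I want this line1.', 'I want this line2.']

Finally, if you care about whole words rather than substring occurrences (such as "thisisnotword1"), you should tokenize the lines properly, for example with nltk.word_tokenize():

# Requires the nltk package and its tokenizer data to be installed.
from nltk import word_tokenize
[prev for prev, this in zip(lines, lines[1:])
      if words & set(word_tokenize(this))]
# ['I want this line1.', 'I want this line2.']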
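If installing a tokenizer is not an option, a standard-library regular expression with word boundaries may be a lighter way to match whole words only. This swaps in a different technique from the nltk tokenizer mentioned above, and the function name lines_before_whole_words and the sample data are hypothetical.

```python
import re


def lines_before_whole_words(lines, words):
    """Return the line before each line containing one of `words` as a whole word."""
    # \b marks a word boundary; re.escape keeps special characters in words literal.
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b')
    return [prev for prev, this in zip(lines, lines[1:]) if pattern.search(this)]


# Hypothetical sample: the last line embeds word1 inside a longer token.
lines = [
    "I want this line1.",
    "I found this line with word1.",
    "Skip me.",
    "thisisnotword1 should not match.",
]
result = lines_before_whole_words(lines, ['word1', 'word2'])
print(result)
```

The boundary test treats letters, digits, and underscores as word characters, so punctuation such as a trailing period does not prevent a match, while an embedded occurrence like the one in the last sample line is ignored.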