What Python code reads only whole words from a text file (a lexical analysis that detects only whole words)?

5 Answers

FFIVE

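You should use spaCy to tokenize your text, because natural language tends to be tricky, with all its exceptions and edge cases:

    from spacy.lang.en import English

    nlp = English()
    # The blank English pipeline already carries a tokenizer with the default
    # settings for English, including punctuation rules and exceptions
    tokenizer = nlp.tokenizer

    with open("User_Input.txt", "r") as f:
        for line, txt_line in enumerate(f, start=1):
            for word in tokenizer(txt_line):
                if word.is_space:  # skip the trailing newline token
                    continue
                # word.idx is the character offset of the token within the line
                print(f'Word {word.text} found at line {line}; pos: {word.idx}')

Alternatively, you can use textblob as follows:

    from textblob import TextBlob

    with open("User_Input.txt", "r") as f:
        for line, txt_line in enumerate(f, start=1):
            # blob.words is the list of words in the line, punctuation removed
            for index, word in enumerate(TextBlob(txt_line).words):
                print(f'Word {word} found in position {index} at line {line}')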


噜噜哒

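Use nltk to tokenize your text reliably. Also, keep in mind that the words in the text may be in mixed case, so convert them to lowercase before searching.

    import nltk

    # txt is the text read from the file, as a single string
    words = nltk.word_tokenize(txt.lower())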
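As a rough illustration of the lowercase-before-searching advice, here is a minimal sketch. It assumes a plain `re.findall(r"\w+", ...)` as a stand-in tokenizer instead of `nltk.word_tokenize`, so that it runs without any downloads; the function name `find_case_insensitive` and the sample sentence are made up for the example.

```python
import re


def find_case_insensitive(text, targets):
    """Return the targets that occur in text as whole words, ignoring case."""
    # Stand-in tokenizer (an assumption, not nltk): runs of word characters,
    # taken from the lowercased text.
    tokens = set(re.findall(r"\w+", text.lower()))
    # Lowercase the search terms too, so both sides are compared in one case.
    return [t for t in targets if t.lower() in tokens]


sample = "Test1 passed, but TEST2 and test30 failed."
found = find_case_insensitive(sample, ["test1", "test2", "test3"])
print(found)
```

Because the comparison is made on whole tokens, `test30` does not count as a hit for `test3`.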


狐的传说

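Regular expressions in general, and the \b term in particular (it means "word boundary"), are how I separate words from other arbitrary characters. Here is an example:

    import re

    # words with arbitrary characters in between
    data = """now is;  the time for, all-good-men
    to come\t to the, aid of their... country"""

    exp = re.compile(r"\b\w+")
    pos = 0
    while True:
        m = exp.search(data, pos)  # find the next word at or after pos
        if not m:
            break
        print(m.group(0))
        pos = m.end(0)  # continue right after the word just found

Result:

    now
    is
    the
    time
    for
    all
    good
    men
    to
    come
    to
    the
    aid
    of
    their
    country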
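The same word-boundary pattern can also report where each word sits. The sketch below is one possible variation, not the answerer's code: it swaps the manual `search` loop for `re.finditer` and splits the text into lines first. The helper name `words_with_positions` and the sample text are assumptions for illustration.

```python
import re

WORD = re.compile(r"\b\w+")


def words_with_positions(text):
    """Yield (line_number, column, word) for every word in text."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        # finditer walks the matches left to right, like the search loop does
        for m in WORD.finditer(line):
            yield line_no, m.start(), m.group(0)


sample = "now is;  the time\nfor all-good-men"
hits = list(words_with_positions(sample))
for hit in hits:
    print(hit)
```

Line numbers start at 1 and columns at 0, so the first word comes out as `(1, 0, 'now')`.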


倚天杖

You can use a regular expression:

    import re

    words_to_find = ["test1", "test2", "test3"]  # converted this to a list to use `in`

    with open("User_Input.txt", "r") as f:
        for line, txt in enumerate(f, start=1):
            rx = re.findall(r'(\w+)', txt)  # rx will be a list containing all the words in `txt`

            # you can iterate over every word in the line
            for word in rx:  # for every word in the regex list
                if word in words_to_find:
                    print(word)

            # or you can iterate through your search terms only
            # note that this reports each word in `words_to_find` at most once per line
            for word in words_to_find:  # `test1`, `test2`, `test3`...
                if word in rx:  # if `test1` is present in this line's list of words...
                    print(word)

What the code above does is apply the (\w+) regular expression to your text string and return a list of matches. Here the expression matches every run of word characters (letters, digits and underscores), so words are split on spaces, punctuation and any other non-word character.
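To make the behaviour of `\w+` concrete, here is a small sketch that runs `re.findall` on a few strings. The sample strings are invented for the demonstration.

```python
import re

# Hypothetical inputs chosen to show where \w+ splits and where it does not
samples = ["all-good-men", "don't stop", "test_1, test2"]

results = {s: re.findall(r"\w+", s) for s in samples}
for s, words in results.items():
    print(f"{s!r} -> {words}")
```

Hyphens and apostrophes act as separators (`don't` becomes `don` and `t`), while the underscore counts as a word character, so `test_1` stays in one piece. If contractions or hyphenated words matter, a dedicated tokenizer is probably the safer choice.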


慕容森

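If you are trying to find the words test1, test2 or test3 in a text file, you don't need to increment the line value manually. Assuming each word in the text file is on its own line, the following code works:

    words_to_find = ("test1", "test2", "test3")

    with open("User_Input.txt", "r") as f:
        # enumerate supplies the line number, which also stays correct when
        # the same line appears more than once in the file
        for line_no, line in enumerate(f, start=1):
            txt = line.strip('\n')
            for word in words_to_find:
                if word in txt:  # substring test: "test1" also matches "test10"
                    print(f"Word: '{word}' found at line {line_no}, "
                          f"pos: {txt.index(word)}")

I don't know what the position is supposed to mean.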
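The `in` test above is a substring check, so it only finds whole words under the one-word-per-line assumption. A minimal sketch of a whole-word variant follows, assuming `\b` boundaries around an alternation of the escaped search words. The helper name `find_whole_words` and the sample lines are hypothetical, and "pos" is taken here to mean the character offset within the line.

```python
import re


def find_whole_words(lines, words_to_find):
    """Return (line_number, position, word) for each whole-word match."""
    # \b on both sides keeps "test1" from matching inside "test10";
    # re.escape guards against search words containing regex metacharacters.
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, words_to_find)) + r")\b"
    )
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for m in pattern.finditer(line):
            hits.append((line_no, m.start(), m.group(0)))
    return hits


lines = ["test10 is not a match", "but test1 and test3 are"]
hits = find_whole_words(lines, ("test1", "test2", "test3"))
print(hits)
```

An open file object can be passed in place of the list, since the function only iterates over its lines.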
