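My problem with a for loop combined with yield

I have a program that joins words split by an asterisk. It removes the asterisk and concatenates the first part of the word (the part before the asterisk) with its second part (the part after the asterisk). It works well except for one major problem: the second part (after the asterisk) still appears in the output. For example, the program joins ['presi', '*', 'dent'], but 'dent' is still in the output. I haven't been able to work out where my code goes wrong. Here is the code: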


from collections import defaultdict
import nltk
from nltk.tokenize import word_tokenize
import re
import os
import sys
from pathlib import Path


def main():
    while True:
        try:
            file_to_open = Path(input("\nPlease, insert your file path: "))

            with open(file_to_open) as f:
                words = word_tokenize(f.read().lower())
                break
        except FileNotFoundError:
            print("\nFile not found. Better try again")
        except IsADirectoryError:
            print("\nIncorrect Directory path.Try again")

    word_separator = '*'

    with open('Fr-dictionary2.txt') as fr:
        dic = word_tokenize(fr.read().lower())

    def join_asterisk(ary):
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            if w2 == word_separator:
                word = w1 + w3
                yield (word, word in dic)
            elif w1 != word_separator and w1 in dic:
                yield (w1, True)

    correct_words = []
    incorrect_words = []
    correct_words = [w for w, correct in join_asterisk(words) if correct]
    incorrect_words = [w for w, correct in join_asterisk(words) if not correct]
    text = ' '.join(correct_words)


Could anyone help me spot the error here?


Input example:


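The commitments of the President* of the Republic are also those of the railway company's leaders, he argued before various elected officials from the Grand-Est at a meeting at the Elysee Palace.


On July 1, 2017, President of the Republic Emmanuel Macron (right) with Guillaume Pepy, head of the SNCF, at Montparnasse station in Paris. GEOFFROY VAN DER HASSELT / AFP


The exasperation SNCF users sometimes feel over cancelled trains or disrupted service seems to have reached the President of the Republic as well. As part of the Great Debate, on Tuesday, February 26, before elected officials at the Elysee Palace, Emmanuel Macron had very harsh words for the SNCF, which closed the Saint-Dié - Epinal line on December 23, 2018, even though the head of state had promised during a trip to the Vosges in April 2018 that it would keep operating.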



慕丝7291255
356 views, 2 answers
2 answers

慕的地8271018

The two extra words are (I assume) both in your dictionary, so each one is yielded a second time two iterations of the for loop later, when it becomes w1 and hits this branch:

elif w1 != word_separator and w1 in dic:
    yield (w1, True)

Redesigning your join_asterisk function seems like the best approach, because any attempt to patch the existing function to skip these words would be very hacky. Here is one way to redesign it so that it skips words already consumed as the second half of a word split by '*':

incorrect_words = []

def join_asterisk(array):
    # Pad with two empty strings so ary[i+1] and ary[i+2] always exist
    ary = array + ['', '']
    i, size = 0, len(ary)
    while i < size - 2:
        if ary[i+1] == word_separator:
            if ary[i] + ary[i+2] in dic:
                yield ary[i] + ary[i+2]
            else:
                incorrect_words.append(ary[i] + ary[i+2])
            # Skip the separator and the second half of the word
            i += 2
        elif ary[i] in dic:
            yield ary[i]
        i += 1

If you would like it to stay closer to your original function, it can be modified to:

def join_asterisk(array):
    ary = array + ['', '']
    i, size = 0, len(ary)
    while i < size - 2:
        if ary[i+1] == word_separator:
            concat_word = ary[i] + ary[i+2]
            yield (concat_word, concat_word in dic)
            i += 2
        else:
            yield (ary[i], ary[i] in dic)
        i += 1
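To see the duplicate described in this answer happen, here is a minimal reproduction. The token list and the `dic` set are made-up stand-ins for the tokenized file and dictionary (assumptions, not the asker's data), and the generator is the question's logic renamed `join_asterisk_original` for illustration. Printing the windows produced by `zip` shows that consecutive triples overlap, so 'dent' comes back as `w1` two iterations after it was merged.

```python
# Hypothetical stand-ins for the question's tokenized file and dictionary.
words = ['du', 'presi', '*', 'dent', 'de', 'la']
dic = {'du', 'president', 'dent', 'de', 'la'}
word_separator = '*'


def join_asterisk_original(words):
    """The question's generator logic, reading from its own parameter."""
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        if w2 == word_separator:
            word = w1 + w3
            yield (word, word in dic)
        elif w1 != word_separator and w1 in dic:
            yield (w1, True)


# Each window shares two tokens with the previous one.
windows = list(zip(words, words[1:], words[2:]))
result = list(join_asterisk_original(words))

print(windows)
# [('du', 'presi', '*'), ('presi', '*', 'dent'), ('*', 'dent', 'de'), ('dent', 'de', 'la')]
print(result)
# [('du', True), ('president', True), ('dent', True)]
```

The same run also shows that the last two tokens ('de' and 'la') never appear as `w1`, which is the gap the `['', '']` padding in the redesigned function covers.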

撒科打诨

I think this alternative implementation of join_asterisk does what you intend:

def join_asterisk(words, word_separator):
    if not words:
        return
    # Whether the previous word was a separator
    prev_sep = (words[0] == word_separator)
    # Next word to yield
    current = words[0] if not prev_sep else ''
    # Iterate words
    for word in words[1:]:
        # Skip separator
        if word == word_separator:
            prev_sep = True
        else:
            # If neither this nor the previous word was a separator
            if not prev_sep:
                # Yield current word and clear
                yield current
                current = ''
            # Add word to current
            current += word
            prev_sep = False
    # Yield last word if list did not finish with a separator
    if not prev_sep:
        yield current

words = ['les', 'engagements', 'du', 'prési', '*', 'dent', 'de', 'la', 'républi', '*', 'que', 'sont', 'aussi', 'ceux', 'des', 'dirigeants', 'de', 'la', 'société', 'ferroviaire']
word_separator = '*'
print(list(join_asterisk(words, word_separator)))
# ['les', 'engagements', 'du', 'président', 'de', 'la', 'république', 'sont', 'aussi', 'ceux', 'des', 'dirigeants', 'de', 'la', 'société', 'ferroviaire']
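A generator that yields plain joined words, like the one in this answer, still has to be connected back to the question's `correct_words` / `incorrect_words` lists. The sketch below is one possible way to do that in a single pass. It makes several assumptions: `split_by_dictionary`, `tokens` and `dictionary` are hypothetical names, the sample data is invented, the joiner is a compact index-based variant written for this example rather than the answer's exact code, and every word missing from the dictionary is reported as incorrect (the question's code only reports merged words). The dictionary is a `set` so that `in` is a hash lookup instead of a scan over a token list.

```python
def join_asterisk(words, word_separator='*'):
    """Merge the tokens on either side of each separator (illustrative variant)."""
    i = 0
    while i < len(words):
        word = words[i]
        # Absorb every "<separator> <next token>" pair following this token.
        while i + 2 < len(words) and words[i + 1] == word_separator:
            word += words[i + 2]
            i += 2
        # A stray separator (leading or trailing) is dropped, not yielded.
        if word != word_separator:
            yield word
        i += 1


def split_by_dictionary(tokens, dictionary):
    """Consume the generator once and sort each word into one of two lists."""
    correct, incorrect = [], []
    for word in join_asterisk(tokens):
        (correct if word in dictionary else incorrect).append(word)
    return correct, incorrect


# Invented sample data; 'xyzzy' stands in for a word missing from the dictionary.
tokens = ['du', 'prési', '*', 'dent', 'de', 'la', 'républi', '*', 'que', 'xyzzy']
dictionary = {'du', 'président', 'dent', 'de', 'la', 'république', 'que'}

correct, incorrect = split_by_dictionary(tokens, dictionary)
print(correct)    # ['du', 'président', 'de', 'la', 'république']
print(incorrect)  # ['xyzzy']
```

Consuming the generator once also avoids running the joining logic twice, which the two list comprehensions in the question do.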

Related categories

Python