I am using Python 3.6.8.
I have a text file, for example:
###
books 22 feb 2017 21 april 2018
books 22 feb 2017 21
22 feb 2017 21 april
feb 2017 21 april 2018
$$$
###
risk true stories people never thought they d dare share
risk true stories people never
true stories people never thought
stories people never thought they
people never thought they d
never thought they d dare
thought they d dare share
$$$
###
everyone hanging out without me mindy kaling non fiction
everyone hanging out without me
hanging out without me mindy
out without me mindy kaling
without me mindy kaling non
me mindy kaling non fiction
$$$
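The file is generated with the following code (ngrams is assumed here to be NLTK's helper, and books is the list of original sentences):
from nltk import ngrams  # assumption: ngrams comes from NLTK

booksWithNGrams = []
for line_no, line in enumerate(books):
    tokens = line.split(" ")
    output = list(ngrams(tokens, 5))
    booksWithNGrams.append("###")  # start of block
    booksWithNGrams.append(books[line_no])  # original line
    for x in output:  # one line per 5-gram
        booksWithNGrams.append(' '.join(x))
    booksWithNGrams.append("$$$")  # end of block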
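As you can see, each sentence together with its n-grams starts with ### and ends with $$$, so the start and end of every block are clearly defined.
Given a sentence, I want to delete the block that contains it. For example, if I input 22 feb 2017 21 april, I want to delete: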
###
books 22 feb 2017 21 april 2018
books 22 feb 2017 21
22 feb 2017 21 april
feb 2017 21 april 2018
$$$
How can I do this?
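One possible approach, sketched under a few assumptions: the file has already been read into a list of lines, a block is everything from a ### line through the next $$$ line, and "contains the sentence" means that some line inside the block equals the input exactly (ignoring surrounding whitespace). The function name remove_blocks and its parameters are hypothetical, not part of the original code.

```python
def remove_blocks(lines, sentence, start="###", end="$$$"):
    """Return a copy of lines without any start..end block containing sentence.

    Hypothetical helper: a block is dropped when one of the lines between its
    markers equals `sentence` after stripping whitespace.
    """
    kept = []
    block = None  # None while we are outside a block
    for line in lines:
        text = line.strip()
        if block is None:
            if text == start:
                block = [line]      # open a new block
            else:
                kept.append(line)   # line outside any block: keep it
        else:
            block.append(line)
            if text == end:
                body = [l.strip() for l in block[1:-1]]
                if sentence not in body:
                    kept.extend(block)  # no match: keep the whole block
                block = None
    if block is not None:
        kept.extend(block)  # unterminated block: leave it untouched
    return kept


# Small sample taken from the file shown above.
sample = [
    "###",
    "books 22 feb 2017 21 april 2018",
    "books 22 feb 2017 21",
    "22 feb 2017 21 april",
    "feb 2017 21 april 2018",
    "$$$",
    "###",
    "everyone hanging out without me mindy kaling non fiction",
    "everyone hanging out without me",
    "$$$",
]
remaining = remove_blocks(sample, "22 feb 2017 21 april")
print("\n".join(remaining))
```

With this sample, only the second block (the "everyone hanging out" one) should remain. Buffering each block until its $$$ line is seen keeps the logic to a single pass and avoids deleting from the list while iterating over it. If a substring match is wanted instead of an exact line match, the membership test on body could be changed accordingly.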
catspeake