How do I create windows/chunks for a list of sentences?

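I have a list of sentences and I want to create skipgrams (window size = 3), but I DON'T want the windows to span across sentences, since the sentences are unrelated to each other.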


So, if I have the following sentences:


[["my name is John"] , ["This PC is black"]]

The triplets would be:


[my name is]

[name is John]

[This PC is]

[PC is black]

What is the best way to do this?


万千封印
3 Answers

一只斗牛犬

Here is a simple function that does this.

    def skipgram(corpus, window_size=3):
        sg = []
        for sent in corpus:
            # Each sentence is a one-element list holding the raw string.
            sent = sent[0].split()
            if len(sent) <= window_size:
                # Sentences no longer than the window are kept whole.
                sg.append(sent)
            else:
                # Slide the window inside this sentence only.
                for i in range(0, len(sent) - window_size + 1):
                    sg.append(sent[i: i + window_size])
        return sg

    corpus = [["my name is John"], ["This PC is black"]]
    skipgram(corpus)

人到中年有点甜

You don't really want skipgrams as such; you want chunks of a given size. Try this:

    from lazyme import per_chunk

    tokens = "my name is John".split()
    list(per_chunk(tokens, 2))

[out]:

    [('my', 'name'), ('is', 'John')]

If you want a rolling window, i.e. ngrams:

    from lazyme import per_window

    tokens = "my name is John".split()
    list(per_window(tokens, 2))

[out]:

    [('my', 'name'), ('name', 'is'), ('is', 'John')]

The same with ngrams in NLTK:

    from nltk import ngrams

    tokens = "my name is John".split()
    list(ngrams(tokens, 2))

[out]:

    [('my', 'name'), ('name', 'is'), ('is', 'John')]

If you want actual skipgrams, see "How to compute skipgrams in python?":

    from nltk import skipgrams

    tokens = "my name is John".split()
    list(skipgrams(tokens, n=2, k=1))

[out]:

    [('my', 'name'),
     ('my', 'is'),
     ('name', 'is'),
     ('name', 'John'),
     ('is', 'John')]
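
To make the difference between chunks, rolling windows and skipgrams concrete, here is a minimal standard-library sketch of the three ideas. The function names `chunks`, `windows` and `skip_bigrams` are made up for this illustration; this is not how lazyme or NLTK implement them, and `skip_bigrams` only covers the pair (n=2) case.

```python
def chunks(tokens, n):
    """Non-overlapping groups of n tokens (a shorter trailing group is kept)."""
    return [tuple(tokens[i:i + n]) for i in range(0, len(tokens), n)]


def windows(tokens, n):
    """Overlapping windows of n consecutive tokens, i.e. n-grams."""
    # When len(tokens) < n the range is empty, so no window is produced.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, k):
    """Ordered token pairs with at most k tokens skipped between the two."""
    return [
        (tokens[i], tokens[j])
        for i in range(len(tokens))
        for j in range(i + 1, min(i + k + 2, len(tokens)))
    ]


tokens = "my name is John".split()
print(chunks(tokens, 2))        # [('my', 'name'), ('is', 'John')]
print(windows(tokens, 2))       # [('my', 'name'), ('name', 'is'), ('is', 'John')]
print(skip_bigrams(tokens, 1))  # adjacent pairs plus pairs that skip one token
```

For the question as asked (window size 3, consecutive words), `windows` is the relevant one; skipgrams additionally produce pairs such as ('my', 'is') that jump over a word.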

慕村225694

Try this!

    from nltk import ngrams

    def generate_ngrams(sentences, window_size=3):
        for sentence in sentences:
            # Build n-grams one sentence at a time so no window crosses a boundary.
            yield from ngrams(sentence[0].split(), window_size)

    sentences = [["my name is John"], ["This PC is black"]]

    for c in generate_ngrams(sentences, 3):
        print(c)

    # output:
    # ('my', 'name', 'is')
    # ('name', 'is', 'John')
    # ('This', 'PC', 'is')
    # ('PC', 'is', 'black')
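
For comparison, here is a rough sketch without NLTK that also shows why the windows have to be built per sentence. `sentence_windows` and `flat_windows` are hypothetical names used only here; the input is assumed to have the same shape as in the question (each sentence wrapped in a one-element list).

```python
def sentence_windows(sentences, n=3):
    """Yield n-token windows that never cross a sentence boundary."""
    for sentence in sentences:
        tokens = sentence[0].split()
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def flat_windows(sentences, n=3):
    """Naive version: joins all tokens first, so windows leak across sentences."""
    tokens = [tok for sentence in sentences for tok in sentence[0].split()]
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentences = [["my name is John"], ["This PC is black"]]

per_sentence = list(sentence_windows(sentences, 3))
flattened = flat_windows(sentences, 3)

print(per_sentence)  # 4 windows, 2 from each sentence
print(flattened)     # 6 windows, including ('is', 'John', 'This')
```

The flattened version yields two extra windows, ('is', 'John', 'This') and ('John', 'This', 'PC'), which mix unrelated sentences. Looping over the sentences first avoids that.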
