Finding the longest common sequence of words from a list of phrases in Python

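I have searched a lot for a solution and did find similar questions. One answer returns the longest sequence of CHARACTERS, which need not occur in every string of the input list. Another answer returns the longest common sequence of WORDS, which must occur in every string of the input list.

I am looking for a combination of the two. That is, I want the longest common sequence of words that need not appear in every word/phrase of the input list.

Here are some examples of what is expected: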

['exterior lighting', 'interior lighting']-->'lighting'

['ambient lighting', 'ambient light']-->'ambient'

['led turn signal lamp', 'turn signal lamp', 'signal and ambient lamp', 'turn signal light']-->'turn signal lamp'

['ambient lighting', 'infrared light']-->''

Thanks


慕斯王
2 answers

SMILET

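This code counts how often each word occurs across the list, drops the words that occur only once, and sorts the remaining words from most to least common. Note that it ranks individual words, not multi-word sequences.

    lst = ['led turn signal lamp', 'turn signal lamp', 'signal and ambient lamp', 'turn signal light']

    d = {}        # word -> number of occurrences across all phrases
    d_words = {}  # only the words that occur more than once

    for i in lst:
        for j in i.split():
            if j in d:
                d[j] = d[j] + 1
            else:
                d[j] = 1

    # keep only the words that occur more than once
    for k, v in d.items():
        if v != 1:
            d_words[k] = v

    # most common words first
    sorted_words = sorted(d_words, key=d_words.get, reverse=True)
    print(sorted_words)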

波斯汪

A rather crude solution, but I think it works:

    from nltk.util import everygrams
    import pandas as pd

    def get_word_sequence(phrases):
        ngrams = []
        for phrase in phrases:
            phrase_split = phrase.split()
            # all contiguous word n-grams of this phrase, as tuples
            ngrams.append(list(everygrams(phrase_split)))
        ngrams = [i for j in ngrams for i in j]  # unpack it

        counts_per_ngram_series = pd.Series(ngrams).value_counts()
        counts_per_ngram_df = pd.DataFrame({'ngram': counts_per_ngram_series.index,
                                            'count': counts_per_ngram_series.values})
        # discard the pandas Series
        del counts_per_ngram_series

        # filter out the ngrams that appear only once
        # (.copy() so the column assignment below works on an independent frame)
        counts_per_ngram_df = counts_per_ngram_df[counts_per_ngram_df['count'] > 1].copy()
        if not counts_per_ngram_df.empty:
            # populate the ngramsize column (number of words in the n-gram)
            counts_per_ngram_df['ngramsize'] = counts_per_ngram_df['ngram'].str.len()
            # sort by ngramsize and then by count, both descending
            counts_per_ngram_df.sort_values(['ngramsize', 'count'], inplace=True, ascending=[False, False])
            # get the top ngram
            top_ngram = " ".join(*counts_per_ngram_df.head(1).ngram.values)
            return top_ngram
        return ''
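For comparison, here is a minimal sketch of the same n-gram counting idea using only the standard library. It is not the code from the answer above: the function name `longest_shared_word_sequence` is made up for illustration, and it assumes that "common" means the sequence occurs in at least two phrases, with each phrase counted at most once per n-gram.

```python
from collections import Counter


def longest_shared_word_sequence(phrases):
    """Return the longest word n-gram found in at least two phrases, or ''."""
    counts = Counter()
    for phrase in phrases:
        words = phrase.split()
        # Every contiguous word n-gram of this phrase, as a tuple of words.
        ngrams = [
            tuple(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)
        ]
        # Assumption: count each n-gram once per phrase, so a repeat inside
        # a single phrase does not make it look "shared".
        counts.update(list(dict.fromkeys(ngrams)))

    shared = [ngram for ngram, n in counts.items() if n > 1]
    if not shared:
        return ''
    # Prefer more words, then more phrases containing the n-gram.
    # On a full tie, max() keeps the n-gram that was encountered first.
    best = max(shared, key=lambda ngram: (len(ngram), counts[ngram]))
    return ' '.join(best)


print(longest_shared_word_sequence(
    ['led turn signal lamp', 'turn signal lamp',
     'signal and ambient lamp', 'turn signal light']))
```

Enumerating every n-gram is quadratic in the number of words per phrase, which should be fine for short phrases like the ones in the question but may not suit long texts.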