Most frequent words in sentences, grouped by category

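I am trying to get the 10 most frequent words for each category. I have already seen this answer, but I cannot quite adapt it to produce the output I want.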

category | sentence
A        | cat runs over big dog
A        | dog runs over big cat
B        | random sentences include words
C        | including this one

Desired output:

category | word/frequency
A        | runs: 2
         | cat: 2
         | dog: 2
         | over: 2
         | big: 2
B        | random: 1
C        | including: 1

Because my DataFrame is very large, I only want the 10 most frequently occurring words. I have also looked at this answer:

df.groupby('subreddit').agg(lambda x: nltk.FreqDist([w for wordlist in x for w in wordlist]))

However, this approach also returns letter counts.
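A likely reason for the letter counts: if the grouped column holds plain strings rather than lists of words, the inner loop of that comprehension iterates over each string one character at a time. The sketch below illustrates the difference with the standard library's collections.Counter instead of nltk.FreqDist (a substitution made here only to keep the example dependency-free); the sentences list is sample data taken from category A in the question.

```python
from collections import Counter

# Sample data: the two category-A sentences from the question.
sentences = ["cat runs over big dog", "dog runs over big cat"]

# Iterating over a string yields single characters, so this counts letters
# (and spaces), which mirrors what happens when the column is not tokenized.
letter_counts = Counter(ch for s in sentences for ch in s)

# Splitting each sentence first yields whole words, so this counts words.
word_counts = Counter(w for s in sentences for w in s.split())

print(letter_counts.most_common(3))
print(word_counts.most_common(3))
```

Splitting the sentences into word lists before aggregating should therefore make the quoted groupby approach count words instead of letters.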

萧十郎


3 Answers

元芳怎么了

import pandas as pd

# Split each sentence into one column per word
df1 = pd.DataFrame(df['sentence'].str.split(' ').tolist())
# Add the category column, which was not carried over by the split
df1['category'] = df['category']
# Melt the word columns into a single 'value' column
df1 = pd.melt(df1, id_vars='category', value_vars=df1.columns[:-1].tolist())
# Group by category and word, then count; name the count column explicitly
# (a bare reset_index() would create a column named 0)
df1 = df1.groupby(['category', 'value']).size().reset_index(name='count')
# Keep the 10 largest counts (across all categories, not per category)
df1 = df1.nlargest(10, 'count')
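The question asks for the most frequent words within each category, whereas nlargest(10, ...) keeps the 10 largest counts across the whole frame. A minimal sketch of a per-category variant follows, assuming pandas 0.25 or later for DataFrame.explode. The helper name top_words and its parameter n are hypothetical and introduced only for illustration, and the sample frame reproduces the data from the question.

```python
import pandas as pd

# Sample data reproduced from the question.
df = pd.DataFrame({
    "category": ["A", "A", "B", "C"],
    "sentence": [
        "cat runs over big dog",
        "dog runs over big cat",
        "random sentences include words",
        "including this one",
    ],
})


def top_words(frame, n=10):
    """Hypothetical helper: the n most frequent words per category."""
    # One row per (category, word): split on whitespace, then explode the lists.
    words = frame.assign(word=frame["sentence"].str.split()).explode("word")
    # Count occurrences of each word within each category.
    counts = words.groupby(["category", "word"]).size().reset_index(name="count")
    # Order by descending count inside each category, then keep the first n rows.
    counts = counts.sort_values(["category", "count"], ascending=[True, False])
    return counts.groupby("category").head(n).reset_index(drop=True)


result = top_words(df, n=10)
print(result)
```

Exploding produces one row per word, which avoids the wide, NaN-padded intermediate frame that the split-and-melt approach creates. Ties within a category are kept in whatever order the sort leaves them.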
