How to find the frequency of repeated sentences in a file


I have a data frame and need to find the top 20 repeated sentences using Python. Please let me know how to do this.

Column A

Hello How are you?

This ticket is not valid

How are things at you end?

Hello How are you?

How can I help you?

Please help me with tickets

This ticket is not valid

Hello How are you?

Expected output

Column A                    Frequency of Repeated sentence

Hello How are you?          3

This ticket is not valid    2

How can I help you?         1

Code so far

import pandas as pd

df = pd.read_csv("C:\\Users\\aaa\\abc\\Analysis\\chat.csv", encoding="ISO-8859-1")

# Number of space-separated words in each sentence
df['word_count'] = df['Column A'].apply(lambda x: len(str(x).split(" ")))

df[['Column A','word_count']].head()

# For each distinct sentence, count the rows that are duplicates of one another
for i, g in df.groupby('Column A'):
    print('Frequency of repeating sentence : {}'.format(g['Column A'].duplicated(keep=False).sum()))

I need the result in a data frame that can be written to CSV, with "Column A" and "Frequency" columns in the final result.

临摹微笑

202 views, 4 answers

4 Answers

郎朗坤

Here is one way, using .value_counts:

df['ColumnA'].value_counts()

To add it as a column, you can do the following:

df['Frequency'] = df['ColumnA'].map(df['ColumnA'].value_counts())
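The question asks for a two-column result that can be written to CSV, so here is a minimal sketch of how value_counts could be taken that far. It assumes the column is named 'Column A' as in the question (not 'ColumnA'), rebuilds the sample sentences inline instead of reading chat.csv, and writes to an in-memory buffer; the variable names are illustrative only.

```python
import io
import pandas as pd

# Sample sentences copied from the question; real data would come from pd.read_csv.
sentences = [
    "Hello How are you?",
    "This ticket is not valid",
    "How are things at you end?",
    "Hello How are you?",
    "How can I help you?",
    "Please help me with tickets",
    "This ticket is not valid",
    "Hello How are you?",
]
df = pd.DataFrame({"Column A": sentences})

# value_counts() sorts by count in descending order; head(20) keeps the top 20.
top = df["Column A"].value_counts().head(20)

# Convert the Series into a frame with explicit "Column A" and "Frequency" columns.
result = top.rename_axis("Column A").reset_index(name="Frequency")

# Written to a buffer here; pass a file path to to_csv to create an actual file.
buffer = io.StringIO()
result.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Naming the index with rename_axis before reset_index keeps the column names the same regardless of how the installed pandas version labels the output of value_counts.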


隔江千里

Try this:

df['count'] = df.groupby('ColumnA')['ColumnA'].transform('count')

df = df.sort_values(by='count', ascending=False)

print(df.head(20))
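If a separate summary table is wanted instead of an extra column on the original frame, groupby followed by size() is one option. The sketch below assumes the column name 'Column A' from the question and uses a few of its sample sentences; freq is a hypothetical variable name.

```python
import pandas as pd

# A few of the sample sentences from the question.
df = pd.DataFrame({"Column A": [
    "Hello How are you?",
    "This ticket is not valid",
    "Hello How are you?",
    "How can I help you?",
    "This ticket is not valid",
    "Hello How are you?",
]})

freq = (
    df.groupby("Column A")
      .size()                                     # rows per distinct sentence
      .reset_index(name="Frequency")              # back to a two-column frame
      .sort_values("Frequency", ascending=False)  # most frequent first
      .head(20)
)
print(freq)
```

Unlike adding a count column, this yields one row per distinct sentence, so no de-duplication step is needed before taking the top 20.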


慕的地8271018

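df['count'] = df.groupby('Sentence')['Sentence'].transform('count')

df = df.sort_values(by = 'count', ascending = False)

df.head(20)

This adds a "count" column to the original data frame containing the frequency of the corresponding sentence. transform() returns a Series aligned with the original data frame.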
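Because transform() keeps one row per original row, a repeated sentence shows up several times in head(20). A small sketch of one way to handle that, assuming the column is called 'Column A' as in the question and that only one row per distinct sentence is wanted:

```python
import pandas as pd

# Sample sentences copied from the question.
df = pd.DataFrame({"Column A": [
    "Hello How are you?",
    "This ticket is not valid",
    "How are things at you end?",
    "Hello How are you?",
    "How can I help you?",
    "Please help me with tickets",
    "This ticket is not valid",
    "Hello How are you?",
]})

# transform('count') returns a Series with the same length and index as df.
df["Frequency"] = df.groupby("Column A")["Column A"].transform("count")

top = (
    df.drop_duplicates(subset="Column A")   # keep one row per sentence
      .sort_values("Frequency", ascending=False, kind="stable")
      .head(20)
      .reset_index(drop=True)
)
print(top)
```

The stable sort is a choice made here so that sentences with equal counts stay in order of first appearance.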


慕哥9229398
