I have a DataFrame and need to find the top 20 most repeated sentences using Python. Please let me know how to do this.
Column A
Hello How are you?
This ticket is not valid
How are things at you end?
Hello How are you?
How can I help you?
Please help me with tickets
This ticket is not valid
Hello How are you?
Expected output
Column A Frequency of Repeated sentence
Hello How are you? 3
This ticket is not valid 2
How can I help you? 1
.
.
.
Code so far
import pandas as pd

df = pd.read_csv("C:\\Users\\aaa\\abc\\Analysis\\chat.csv", encoding="ISO-8859-1")
df['word_count'] = df['Column A'].apply(lambda x: len(str(x).split(" ")))
df[['Column A', 'word_count']].head()
for i, g in df.groupby('Column A'):
    print('Frequency of repeating sentence : {}'.format(g['Column A'].duplicated(keep=False).sum()))
I need the result as a DataFrame with "Column A" and "Frequency" columns, so that the final result can be written to a CSV file.
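
One possible approach, as a minimal sketch rather than a definitive answer: `Series.value_counts()` counts how often each distinct sentence occurs and sorts the counts in descending order, so taking the first 20 rows gives the most frequent sentences. The helper name `top_repeated_sentences` and the output column name `Frequency` are assumptions made for illustration, and the sample data is copied from the question instead of being read from `chat.csv`.

```python
import pandas as pd


def top_repeated_sentences(df, column="Column A", n=20):
    """Return the n most frequent values of `column` with their counts.

    Hypothetical helper for illustration; not part of pandas.
    """
    # value_counts() counts each distinct sentence, most frequent first.
    counts = df[column].value_counts().head(n)
    # Turn the Series into a two-column DataFrame: sentence and frequency.
    return counts.rename_axis(column).reset_index(name="Frequency")


# Sample data taken from the question.
df = pd.DataFrame({
    "Column A": [
        "Hello How are you?",
        "This ticket is not valid",
        "How are things at you end?",
        "Hello How are you?",
        "How can I help you?",
        "Please help me with tickets",
        "This ticket is not valid",
        "Hello How are you?",
    ]
})

result = top_repeated_sentences(df)
print(result)

# With no path argument, to_csv() returns the CSV text as a string.
# Pass a file path instead to write the result to disk.
csv_text = result.to_csv(index=False)
```

Sentences that occur only once are still included with a frequency of 1, which matches the expected output above. If only true repeats are wanted, filtering with `result[result["Frequency"] > 1]` should do it.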