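I am using Python to create a word cloud from the keywords of a journal. My problem is that I don't want the individual words inside a keyword to be split apart; each keyword should be treated as a single unit. I managed to do this by replacing the space character ' ' with '_', but now the final image of course contains the underscore characters. Here is the code: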
import numpy as np
import pandas as pd
from os import path
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import matplotlib.pyplot as plt
# Load in the dataframe
df = pd.read_csv("input/cmame_0.csv")
l = df['Author Keywords'].str.split(';', expand=False).tolist()
text = ';'.join([item for sublist in l if isinstance(sublist,list) for item in sublist])
text = text.replace(" ", "_")
stopwords = set(STOPWORDS)
# Create and generate a word cloud image:
wordcloud = WordCloud(stopwords=stopwords,
                      max_font_size=50,
                      max_words=100,
                      background_color="white").generate(text)
# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
This produces a word cloud in which the multi-word keywords are joined by underscores.
I could probably use a regular expression here, but I can't seem to find the right one.
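One direction that might avoid both the regular expression and the underscore trick is to count the whole keywords yourself and pass the counts to `WordCloud.generate_from_frequencies`, which takes a mapping of phrase to frequency and does not tokenize the phrases further. The following is only a sketch under assumptions: the helper names `keyword_frequencies` and `make_cloud` are hypothetical, keywords are assumed to be separated by `;` as in the code above, and capitalization is left untouched. Note that `generate_from_frequencies` does not apply the `stopwords` setting, so any filtering would have to happen while counting.

```python
from collections import Counter


def keyword_frequencies(cells, sep=";"):
    """Count whole keywords (inner spaces kept) from ';'-separated strings.

    Hypothetical helper: `cells` is any iterable, e.g. df['Author Keywords'].
    """
    counts = Counter()
    for cell in cells:
        if not isinstance(cell, str):  # skip NaN / missing cells
            continue
        for keyword in cell.split(sep):
            keyword = keyword.strip()  # drop padding around the separator
            if keyword:
                counts[keyword] += 1
    return counts


def make_cloud(frequencies):
    """Build a word cloud from a {keyword: count} mapping (hypothetical helper)."""
    # Imported here so the counting helper works without wordcloud installed.
    from wordcloud import WordCloud

    cloud = WordCloud(max_font_size=50,
                      max_words=100,
                      background_color="white")
    return cloud.generate_from_frequencies(frequencies)
```

With the data frame from the question, this could be used as `wordcloud = make_cloud(keyword_frequencies(df['Author Keywords']))`, followed by the same `plt.imshow` calls as before.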