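For my current project, I plan to group a Pandas DataFrame by stock_symbol as the first criterion and by quarter as the second.
From other threads I have seen that a construct like group_data = df.groupby(['stock_symbol', 'quarter']) might be a solution for this. In my case, however, the only terminal output I get is <pandas.core.groupby.generic.DataFrameGroupBy object at 0x11fdcbf10>.
Can anyone spot the flaw in my reasoning? The relevant part of the code is shown below: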
import pandas as pd

# Convert the date column to datetimes
df['date'] = pd.to_datetime(df['date'])
# Add a 'quarter' column (calendar quarters such as 2021Q1)
df['quarter'] = df['date'].dt.to_period('Q')
# Group by stock symbol first, then by quarter
group_data = df.groupby(['stock_symbol', 'quarter'])
print(group_data)
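A minimal sketch of why only the object's repr is printed: df.groupby(...) returns a lazy DataFrameGroupBy that merely describes how the rows are split; nothing is computed until an aggregation, an apply, or an iteration is requested. The sample data below is invented (only the column names follow the question), and size() and iteration are just two of several ways to materialise the groups.

```python
import pandas as pd

# Invented sample data; only the column names mirror the question.
df = pd.DataFrame({
    "stock_symbol": ["AAPL", "AAPL", "AAPL", "MSFT"],
    "date": ["2021-01-15", "2021-02-20", "2021-04-10", "2021-01-05"],
})
df["date"] = pd.to_datetime(df["date"])
df["quarter"] = df["date"].dt.to_period("Q")

group_data = df.groupby(["stock_symbol", "quarter"])

# Printing group_data only shows its repr, because no computation has run yet.
# Asking for an aggregate forces the split-apply-combine, e.g. rows per group:
sizes = group_data.size()
print(sizes)

# Alternatively, iterate over ((symbol, quarter), sub-DataFrame) pairs.
for (symbol, quarter), sub_df in group_data:
    print(symbol, quarter, len(sub_df))
```

Group keys are sorted by default, so the groups appear as (AAPL, 2021Q1), (AAPL, 2021Q2), (MSFT, 2021Q1).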
The function to be applied in this operation is shown below:
from sklearn.feature_extraction.text import CountVectorizer

# Word frequency analysis: return the n most frequent bigrams in a corpus
def get_top_n_bigram(corpus, n=None):
    vec = CountVectorizer(ngram_range=(2, 2), stop_words='english').fit(corpus)
    bag_of_words = vec.transform(corpus)
    # Total count of each bigram across all documents (1 x vocabulary matrix)
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]
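One way to apply a per-corpus function like get_top_n_bigram to every (stock_symbol, quarter) group is to select the text column after grouping and call apply, which passes each group's documents in as a Series. This is a sketch under assumptions: the column name "text" and the sample data are hypothetical, and to keep the example dependency-free the CountVectorizer function is replaced by a hypothetical stand-in with the same signature and return shape (naive whitespace tokenizer, no stop-word removal).

```python
from collections import Counter

import pandas as pd


def get_top_n_bigram(corpus, n=None):
    # Hypothetical stand-in for the CountVectorizer version: same signature and
    # return shape (list of (bigram, count), most frequent first), but it uses a
    # naive lowercase whitespace tokenizer and removes no stop words.
    counts = Counter()
    for doc in corpus:
        tokens = doc.lower().split()
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    # most_common(None) returns every bigram, mirroring words_freq[:None].
    return counts.most_common(n)


# Invented sample data; "text" is a hypothetical name for the column that
# holds the documents to analyse.
df = pd.DataFrame({
    "stock_symbol": ["AAPL", "AAPL", "AAPL", "MSFT"],
    "date": pd.to_datetime(["2021-01-15", "2021-02-20", "2021-04-10", "2021-01-05"]),
    "text": [
        "strong iphone sales",
        "strong iphone demand",
        "new chip launch",
        "cloud revenue growth",
    ],
})
df["quarter"] = df["date"].dt.to_period("Q")

# Each group's "text" values arrive in the lambda as a Series of documents.
# The result is a Series indexed by (stock_symbol, quarter) whose values are
# lists of (bigram, count) tuples.
top_bigrams = (
    df.groupby(["stock_symbol", "quarter"])["text"]
    .apply(lambda texts: get_top_n_bigram(texts, n=2))
)
print(top_bigrams)
```

With the original CountVectorizer-based function the groupby call stays the same; one caveat is that CountVectorizer raises a ValueError ("empty vocabulary") for a group whose documents contain only stop words, so such groups may need to be filtered out or caught.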
慕斯王