Applying a function by group on a pandas dataset: Cronbach's alpha by group in Python

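Suppose I have a dataset (sim_data) with 16 variables: psychological data (15 questionnaire items) plus a categorical variable (country) in the first column.

I can easily get means and standard deviations by group with: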

sim_data.groupby("country").describe()

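However, I want to apply a function from a specific package (Cronbach's alpha from pingouin: pip install pingouin, then import pingouin as pg) to this data and get the results by group, as I did above. The following code does not work: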

pg.cronbach_alpha(sim_data.groupby("country"))

Neither does this:

sim_data.groupby('country').apply(lambda grp: pg.cronbach_alpha())

Important notes:

If you want to reproduce this, here are my routine and dataset.

I am a heavy R user, and I am translating the following code to Python:


sim_data %>%
  select('step_bfi1_ab_cor':'step_bfi39_ab_cor', "country") %>%
  nest(-country) %>%
  mutate(result=map(data, ~psych::alpha(.)$total)) %>%
  select(country,result) %>%
  unnest()

Suggestions are welcome. If there is another (more elegant) way to solve my problem, please let me know. Thanks.


蝴蝶不菲
2 answers

慕婉清6462132

In general, base R (rather than tidy R) translates more easily to Python pandas. Your R code appears to subset the data frame by the country column and run each subset through psych::alpha(), then return the extracted statistics in a data frame with a country indicator. You can do exactly the same in base R with by, which translates to a pandas groupby inside a list comprehension. Note that psych::alpha appears to return more statistics than pingouin.cronbach_alpha. The code below is untested; adjust the fields and return values as needed.

Base R

# DEFINE METHOD
run_cronbach_alpha <- function(sub) {
    # Drop the grouping column so only the items are passed to alpha()
    results <- psych::alpha(sub[setdiff(names(sub), "country")])$total
    # RETURNS LIST
    data.frame(country = sub$country[1],
               raw_alpha = results$raw_alpha,
               std.alpha = results$std.alpha,
               G6 = results$G6,
               average_r = results$average_r,
               median_r = results$median_r,
               mean = results$mean,
               sd = results$sd)
}

# FILTER COLUMNS IN DATA FRAME
sim_short <- sim_data[c("step_bfi1_ab_cor", ..., "step_bfi39_ab_cor", "country")]

# RUN METHOD BY COUNTRY SUBSETS TO RETURN DF LIST
results_df_list <- by(sim_short, sim_short$country, run_cronbach_alpha)

# ROW BIND ALL DFs TO SINGLE FINAL DATA FRAME
results_df <- do.call(rbind.data.frame, results_df_list)

Python pandas

import pandas as pd
import pingouin as pg

# DEFINE METHOD
def run_cronbach_alpha(c, sub):
    results = pg.cronbach_alpha(sub.drop(["country"], axis="columns"))
    # RETURNS TUPLE
    return pd.DataFrame({'country': c, 'cronbach_alpha': results[0]}, index=[0])

# FILTER COLUMNS IN DATA FRAME
sim_short = sim_data.reindex(["step_bfi1_ab_cor", ..., "step_bfi39_ab_cor", "country"],
                             axis='columns')

# RUN METHOD BY COUNTRY SUBSETS TO RETURN DF LIST
results_df_list = [run_cronbach_alpha(i, df) for i, df in sim_short.groupby("country")]

# CONCATENATE ALL DFs TO SINGLE FINAL DATA FRAME
results_df = pd.concat(results_df_list)
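As an alternative to collecting one-row data frames in a list, the same per-country computation can be sketched with groupby(...).apply. To keep the example runnable without pingouin, it uses a hand-written cronbach_alpha based on the textbook formula; that function, the toy data, and the item1..item3 column names are illustrative assumptions and not part of the original dataset. It should be possible to swap in pg.cronbach_alpha(data=grp)[0] inside the apply call, but that variant is not tested here.

```python
import numpy as np
import pandas as pd


def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha from the textbook formula (stand-in for pingouin)."""
    items = items.dropna()                        # listwise deletion (an assumption)
    k = items.shape[1]                            # number of items
    item_var = items.var(axis=0, ddof=1).sum()    # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the sum score
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical toy data: two countries, three items sharing one latent trait.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
toy = pd.DataFrame({
    "country": np.repeat(["A", "B"], n // 2),
    "item1": latent + rng.normal(scale=0.5, size=n),
    "item2": latent + rng.normal(scale=0.5, size=n),
    "item3": latent + rng.normal(scale=0.5, size=n),
})
item_cols = ["item1", "item2", "item3"]

# Select the item columns first so the grouping column never reaches the function.
alpha_by_country = (
    toy.groupby("country")[item_cols]
       .apply(cronbach_alpha)
       .rename("cronbach_alpha")
       .reset_index()
)
print(alpha_by_country)
```

Selecting the item columns before apply avoids having to drop "country" inside the function, and the result comes back as a single data frame with one row per country.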

慕村9548890

def run_cronbach_alpha(c, sub):
    results = pg.cronbach_alpha(sub.drop(["country"], axis="columns"))
    # RETURNS TUPLE
    return pd.DataFrame({'country': c, 'cronbach_alpha': results[0]}, index=["Result"])

# RUN METHOD BY COUNTRY SUBSETS TO RETURN DF LIST
results_df_list = [run_cronbach_alpha(i, df) for i, df in sim_data.groupby("country")]
results_df_list