
Collapse / concatenate / aggregate a column to a single comma-separated string within each group

I want to aggregate one column of a data frame by two grouping variables and separate the individual values with commas.


Here is some data:


data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = rep(1:2, 3), C = c(5:10))
data
#     A B  C
# 1 111 1  5
# 2 111 2  6
# 3 111 1  7
# 4 222 2  8
# 5 222 1  9
# 6 222 2 10

"A" and "B" are the grouping variables, and "C" is the variable I want to collapse into a comma-separated character string. I tried:


library(plyr)
ddply(data, .(A,B), summarise, test = list(C))
#     A B  test
# 1 111 1  5, 7
# 2 111 2     6
# 3 222 1     9
# 4 222 2 8, 10

But when I try to convert the test column to character, it turns into this:


ddply(data, .(A,B), summarise, test = as.character(list(C)))
#     A B     test
# 1 111 1  c(5, 7)
# 2 111 2        6
# 3 222 1        9
# 4 222 2 c(8, 10)

How can I keep the character format and separate the values with commas? For example, row 1 should be just "5,7", not c(5,7).


慕后森
3 answers

隔江千里

Here are some options using toString, a utility function that joins the elements of a vector into one string, separated by a comma and a space. If you don't want that separator, you can use paste() with the collapse argument instead.

data.table

# alternative using data.table
library(data.table)
as.data.table(data)[, toString(C), by = list(A, B)]

aggregate

This uses no packages:

# alternative using aggregate from the stats package in the core of R
aggregate(C ~., data, toString)

sqldf

Here is an alternative that uses the SQL function group_concat via the sqldf package:

library(sqldf)
sqldf("select A, B, group_concat(C) C from data group by A, B", method = "raw")

dplyr

A dplyr alternative:

library(dplyr)
data %>%
  group_by(A, B) %>%
  summarise(test = toString(C)) %>%
  ungroup()

plyr

# plyr
library(plyr)
ddply(data, .(A,B), summarize, C = toString(C))
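To make the toString() versus paste() point concrete, here is a minimal base R sketch. The vector x and the chosen separators are made up for illustration and stand in for one group's values of C; only toString() and paste() themselves come from the answer.

```r
# Hypothetical values standing in for one group's C column
x <- c(5, 7)

# toString() always joins with ", " (a comma followed by a space)
with_default <- toString(x)

# paste() with collapse lets you pick the separator yourself
no_space  <- paste(x, collapse = ",")
semicolon <- paste(x, collapse = "; ")

print(with_default)
print(no_space)
print(semicolon)
```

If the exact "5,7" form from the question (no space) is what you need, the paste(x, collapse = ",") variant is the one to drop into any of the grouped calls above in place of toString.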

Helenr

Change where you put as.character:

> out <- ddply(data, .(A, B), summarise, test = list(as.character(C)))
> str(out)
'data.frame':   4 obs. of  3 variables:
 $ A   : num  111 111 222 222
 $ B   : int  1 2 1 2
 $ test:List of 4
  ..$ : chr  "5" "7"
  ..$ : chr "6"
  ..$ : chr "9"
  ..$ : chr  "8" "10"
> out
    A B  test
1 111 1  5, 7
2 111 2     6
3 222 1     9
4 222 2 8, 10

Note, however, that each item is still a vector of separate character elements rather than one string. That is, the first cell is not an actual string that looks like "5, 7"; it holds two strings, "5" and "7", which R prints with a comma between them.

Compare with the following:

> out2 <- ddply(data, .(A, B), summarise, test = paste(C, collapse = ", "))
> str(out2)
'data.frame':   4 obs. of  3 variables:
 $ A   : num  111 111 222 222
 $ B   : int  1 2 1 2
 $ test: chr  "5, 7" "6" "9" "8, 10"
> out2
    A B  test
1 111 1  5, 7
2 111 2     6
3 222 1     9
4 222 2 8, 10

The comparable solution in base R is, of course, aggregate:

> A1 <- aggregate(C ~ A + B, data, function(x) c(as.character(x)))
> str(A1)
'data.frame':   4 obs. of  3 variables:
 $ A: num  111 222 111 222
 $ B: int  1 1 2 2
 $ C:List of 4
  ..$ 0: chr  "5" "7"
  ..$ 1: chr "9"
  ..$ 2: chr "6"
  ..$ 3: chr  "8" "10"
> A2 <- aggregate(C ~ A + B, data, paste, collapse = ", ")
> str(A2)
'data.frame':   4 obs. of  3 variables:
 $ A: num  111 222 111 222
 $ B: int  1 1 2 2
 $ C: chr  "5, 7" "9" "6" "8, 10"
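Since the two results print identically, it can help to check the column types in code instead of by eye. The following is a rough base R sketch that rebuilds the question's data and the two aggregate() results; the helper variable deparsed is my own addition to show where the "c(5, 7)" text in the question comes from (as.character() applied to a list deparses each element).

```r
data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = rep(1:2, 3), C = c(5:10))

# as.character() on a list deparses each element, which yields "c(5, 7)"
deparsed <- as.character(list(c(5, 7)))

# List column: each cell holds a character vector of one or more elements
A1 <- aggregate(C ~ A + B, data, function(x) c(as.character(x)))

# Character column: each cell is exactly one string
A2 <- aggregate(C ~ A + B, data, paste, collapse = ", ")

print(deparsed)
print(is.list(A1$C))       # list column
print(is.character(A2$C))  # plain character column
print(lengths(A1$C))       # number of elements stored in each cell
```

Only the A2 form can be written out or compared as an ordinary string column; the A1 form needs to be collapsed first.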

森林海

Here is a stringr / tidyverse solution:

library(tidyverse)
library(stringr)

data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = rep(1:2, 3), C = c(5:10))

data %>%
  group_by(A, B) %>%
  summarize(test = str_c(C, collapse = ", "))
# A tibble: 4 x 3
# Groups:   A [2]
#       A     B test
#   <dbl> <int> <chr>
# 1   111     1 5, 7
# 2   111     2 6
# 3   222     1 9
# 4   222     2 8, 10