慕哥6287543
Using aggregate:

    aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
      Category  x
    1    First 30
    2   Second  5
    3    Third 34

In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be combined via cbind:

    aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...

(incorporating @thelatemail's comment) aggregate also has a formula interface:

    aggregate(Frequency ~ Category, x, sum)

Or, if you want to aggregate multiple columns, you can use the . notation (it works for a single column too):

    aggregate(. ~ Category, x, sum)

Or tapply:

    tapply(x$Frequency, x$Category, FUN=sum)
     First Second  Third
        30      5     34

Using this data:

    x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                      "Third", "Third", "Second")),
                    Frequency=c(10,15,5,2,14,20,3))
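The cbind and multi-dimension variants are only hinted at above, so here is a minimal sketch of both. Note that the Region and Metric2 columns, and the names x2, by_list and by_formula, are made up for illustration and are not part of the original data.

```r
# Hypothetical data: Region and Metric2 do not appear in the original example.
x2 <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Region    = c("A", "B", "A", "A", "B", "B", "A"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = c(1, 2, 3, 4, 5, 6, 7)
)

# Two grouping dimensions in the `by` list, two measures combined with cbind().
by_list <- aggregate(
  cbind(Frequency = x2$Frequency, Metric2 = x2$Metric2),
  by  = list(Category = x2$Category, Region = x2$Region),
  FUN = sum
)

# The same aggregation written with the formula interface.
by_formula <- aggregate(cbind(Frequency, Metric2) ~ Category + Region,
                        data = x2, FUN = sum)

print(by_list)
print(by_formula)
```

Both calls should return one row per Category/Region combination that actually occurs in the data, with the summed Frequency and Metric2 as columns.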
慕工程0101907
The answer provided by rcs is simple and works. However, if you are handling larger datasets and need better performance, there is a faster alternative:

    library(data.table)
    data = data.table(Category=c("First","First","First","Second","Third", "Third", "Second"),
                      Frequency=c(10,15,5,2,14,20,3))
    data[, sum(Frequency), by = Category]
    #    Category V1
    # 1:    First 30
    # 2:   Second  5
    # 3:    Third 34
    system.time(data[, sum(Frequency), by = Category] )
    #    user  system elapsed
    #   0.008   0.001   0.009

Let's compare that with the same operation using a data.frame and the aggregate call from the answer above:

    data = data.frame(Category=c("First","First","First","Second","Third", "Third", "Second"),
                      Frequency=c(10,15,5,2,14,20,3))
    system.time(aggregate(data$Frequency, by=list(Category=data$Category), FUN=sum))
    #    user  system elapsed
    #   0.008   0.000   0.015

If you want to keep the column name, this is the syntax:

    data[,list(Frequency=sum(Frequency)),by=Category]
    #    Category Frequency
    # 1:    First        30
    # 2:   Second         5
    # 3:    Third        34

With larger datasets the difference becomes more noticeable, as the code below shows:

    # 300,000 rows; Frequency gets one value per row so the column lengths match
    data = data.table(Category=rep(c("First", "Second", "Third"), 100000),
                      Frequency=rnorm(300000))
    system.time( data[,sum(Frequency),by=Category] )
    #    user  system elapsed
    #   0.055   0.004   0.059

    data = data.frame(Category=rep(c("First", "Second", "Third"), 100000),
                      Frequency=rnorm(300000))
    system.time( aggregate(data$Frequency, by=list(Category=data$Category), FUN=sum) )
    #    user  system elapsed
    #   0.287   0.010   0.296

For multiple aggregations, you can combine lapply and .SD as follows:

    data[, lapply(.SD, sum), by = Category]
    #    Category Frequency
    # 1:    First        30
    # 2:   Second         5
    # 3:    Third        34
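Since the lapply/.SD example only has one value column, here is a small sketch with a second measure, assuming the data.table package is installed. The Metric2 column and the names dt, sums and freq_only are hypothetical, added only to show several columns being aggregated at once.

```r
library(data.table)

# Hypothetical second measure column (Metric2), not in the original data.
dt <- data.table(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = c(1, 2, 3, 4, 5, 6, 7)
)

# .SD holds every non-grouping column, so this sums Frequency and Metric2 per Category.
sums <- dt[, lapply(.SD, sum), by = Category]

# .SDcols restricts .SD to the listed columns.
freq_only <- dt[, lapply(.SD, sum), by = Category, .SDcols = "Frequency"]

print(sums)
print(freq_only)
```

Limiting the work with .SDcols is probably worth doing when the table also contains non-numeric columns, which sum cannot handle.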