Apply several summary functions to several variables by group in one call

I have the following data frame:

x <- read.table(text = "  id1 id2 val1 val2
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8", header = TRUE)

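I want to compute the mean of val1 and val2 grouped by id1 and id2, and at the same time count the number of rows for each id1-id2 combination. I can perform each calculation separately: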

# calculate mean
aggregate(. ~ id1 + id2, data = x, FUN = mean)

# count rows
aggregate(. ~ id1 + id2, data = x, FUN = length)

To do both calculations in one call, I tried

do.call("rbind", aggregate(. ~ id1 + id2, data = x, FUN = function(x) data.frame(m = mean(x), n = length(x))))

However, I get garbled output along with a warning:

#     m   n
# id1 1   2
# id2 1   1
#     1.5 2
#     2   2
#     3.5 2
#     3   2
#     6.5 2
#     8   2
#     7   2
#     6   2
# Warning message:
#   In rbind(id1 = c(1L, 2L, 1L, 2L), id2 = c(1L, 1L, 2L, 2L), val1 = list( :
#   number of columns of result is not a multiple of vector length (arg 1)

I could use the plyr package, but my data set is quite large, and plyr becomes very slow (almost unusable) as the size of the data set grows.

How can I use aggregate or another function to perform several calculations in one call?


慕尼黑8549860
3 Answers

波斯汪

Given this in the question:

"I could use the plyr package, but my data set is quite large, and plyr becomes very slow (almost unusable) as the size of the data set grows."

then in data.table (1.9.4+) you could try:

> DT
   id1 id2 val1 val2
1:   a   x    1    9
2:   a   x    2    4
3:   a   y    3    5
4:   a   y    4    9
5:   b   x    1    7
6:   b   y    4    4
7:   b   x    3    9
8:   b   y    2    8

> DT[ , .(mean(val1), mean(val2), .N), by = .(id1, id2)]   # simplest
   id1 id2  V1  V2 N
1:   a   x 1.5 6.5 2
2:   a   y 3.5 7.0 2
3:   b   x 2.0 8.0 2
4:   b   y 3.0 6.0 2

> DT[ , .(val1.m = mean(val1), val2.m = mean(val2), count = .N), by = .(id1, id2)]  # named
   id1 id2 val1.m val2.m count
1:   a   x    1.5    6.5     2
2:   a   y    3.5    7.0     2
3:   b   x    2.0    8.0     2
4:   b   y    3.0    6.0     2

> DT[ , c(lapply(.SD, mean), count = .N), by = .(id1, id2)]   # mean over all columns
   id1 id2 val1 val2 count
1:   a   x  1.5  6.5     2
2:   a   y  3.5  7.0     2
3:   b   x  2.0  8.0     2
4:   b   y  3.0  6.0     2

For timings comparing aggregate (used in the question and all 3 other answers) with data.table, see this benchmark (the agg and agg.x cases).
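The transcript above starts from an existing DT. As a minimal sketch, assuming DT is just the question's data frame converted with as.data.table(), the last variant can be run end to end like this. The explicit .SDcols argument and the name res are additions for illustration and are not part of the answer:

```r
library(data.table)

# Toy data copied from the question.
x <- read.table(text = "  id1 id2 val1 val2
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8", header = TRUE)

# Assumption: DT is simply the data frame converted to a data.table.
DT <- as.data.table(x)

# Mean of the chosen value columns plus a row count per id1-id2 group.
# .SDcols names the columns that end up in .SD; .N is the group size.
res <- DT[, c(lapply(.SD, mean), count = .N),
          by = .(id1, id2), .SDcols = c("val1", "val2")]
res
```

Naming the columns in .SDcols keeps the call safe if the table later gains non-numeric columns that mean() should not touch.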

回首忆惘然

You could add a count column, aggregate with sum, then scale back to get the mean:

x$count <- 1
agg <- aggregate(. ~ id1 + id2, data = x, FUN = sum)
agg
#   id1 id2 val1 val2 count
# 1   a   x    3   13     2
# 2   b   x    4   16     2
# 3   a   y    7   14     2
# 4   b   y    6   12     2

agg[c("val1", "val2")] <- agg[c("val1", "val2")] / agg$count
agg
#   id1 id2 val1 val2 count
# 1   a   x  1.5  6.5     2
# 2   b   x  2.0  8.0     2
# 3   a   y  3.5  7.0     2
# 4   b   y  3.0  6.0     2

It has the advantage of preserving your column names and creating a single count column.
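Another way to stay with aggregate, sketched here as a variation and not taken from the answer above, is to let FUN return a named vector instead of a data frame. aggregate then stores each value column as a small matrix, which do.call(data.frame, ...) can flatten into ordinary columns. The names agg2 and res are made up for this example:

```r
# Toy data copied from the question.
x <- read.table(text = "  id1 id2 val1 val2
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8", header = TRUE)

# FUN returns a named vector, so val1 and val2 each become
# a two-column matrix (m = mean, n = row count) inside the result.
agg2 <- aggregate(. ~ id1 + id2, data = x,
                  FUN = function(v) c(m = mean(v), n = length(v)))

# Flatten the matrix columns into plain columns val1.m, val1.n, val2.m, val2.n.
res <- do.call(data.frame, agg2)
res
```

Unlike the sum-then-divide approach, this repeats the count once per value column, so one of the .n columns can be dropped afterwards if a single count is enough.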