Keep only the minimum value for each factor level

I've run into a problem that has been bugging me for a while, and I hope someone here can help.


I have the following data frame:


f <- c('a','a','b','b','b','c','d','d','d','d')
v1 <- c(1.3,10,2,10,10,1.1,10,3.1,10,10)
v2 <- c(1:10)
df <- data.frame(f,v1,v2)

f is a factor; v1 and v2 are values. For each level of f, I want to keep only one row: the row with the lowest v1 value within that factor level.


f   v1   v2
a   1.3  1
b   2    3
c   1.1  6
d   3.1  8

I have tried various approaches with aggregate, ddply, by, tapply and so on, but nothing seems to work. Any suggestions would be much appreciated.


蛊毒传说
3 Answers

潇潇雨雨

Building on DWin's solution, tapply can be avoided by using ave:

df[ df$v1 == ave(df$v1, df$f, FUN=min), ]

As shown below, this gives a further speed-up. Note that the result also depends on the number of levels. I mention this because ave is too often forgotten, even though it is one of the more powerful functions in R.

f <- rep(letters[1:20],10000)
v1 <- rnorm(20*10000)
v2 <- 1:(20*10000)
df <- data.frame(f,v1,v2)

> system.time(df[ df$v1 == ave(df$v1, df$f, FUN=min), ])
   user  system elapsed
   0.05    0.00    0.05

> system.time(df[ df$v1 %in% tapply(df$v1, df$f, min), ])
   user  system elapsed
   0.25    0.03    0.29

> system.time(lapply(split(df, df$f), FUN = function(x) {
+             vec <- which(x[3] == min(x[3]))
+             return(x[vec, ])
+         })
+  .... [TRUNCATED]
   user  system elapsed
   0.56    0.00    0.58

> system.time(df[tapply(1:nrow(df),df$f,function(i) i[which.min(df$v1[i])]),]
+ )
   user  system elapsed
   0.17    0.00    0.19

> system.time( ddply(df, .var = "f", .fun = function(x) {
+     return(subset(x, v1 %in% min(v1)))
+     }
+ )
+ )
   user  system elapsed
   0.28    0.00    0.28
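To make the ave approach concrete, here is a minimal sketch that applies it to the small data frame from the question. The names group_min, res_ave, idx and res_one are illustrative only and do not appear in the original posts. The second variant is the which.min form from the timings above, which may be preferable if a group can contain tied minima.

```r
# Data frame from the question
f  <- c('a','a','b','b','b','c','d','d','d','d')
v1 <- c(1.3,10,2,10,10,1.1,10,3.1,10,10)
v2 <- 1:10
df <- data.frame(f, v1, v2)

# ave() returns, for every row, the minimum v1 within that row's group
group_min <- ave(df$v1, df$f, FUN = min)

# Keep the rows whose v1 equals the minimum of their own group.
# If a group has tied minima, all tied rows are kept.
res_ave <- df[df$v1 == group_min, ]

# Variant that keeps exactly one row per level, even with ties:
# which.min() returns the position of the first minimum in each group.
idx <- tapply(seq_len(nrow(df)), df$f, function(i) i[which.min(df$v1[i])])
res_one <- df[as.integer(idx), ]

print(res_ave)
print(res_one)
```

On the question's data both variants select rows 1, 3, 6 and 8, which matches the desired output. One caveat about the `%in% tapply(...)` form in the timings: it compares each v1 against the minima of all groups, so a row could be kept merely because its value equals another group's minimum. Comparing against ave's per-row result avoids that.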