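How to select the first and last row within a grouping variable in a data frame?

How can I select the first and last row for each unique id in the following data frame?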


tmp <- structure(list(id = c(15L, 15L, 15L, 15L, 21L, 21L, 22L, 22L, 
22L, 23L, 23L, 23L, 24L, 24L, 24L, 24L), d = c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), gr = c(2L, 1L, 
1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L), mm = c(3.4, 
4.9, 4.4, 5.5, 4, 3.8, 4, 4.9, 4.6, 2.7, 4, 3, 3, 2, 4, 2), area = c(1L, 
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L)), .Names = c("id", 
"d", "gr", "mm", "area"), class = "data.frame", row.names = c(NA, 
-16L))

tmp
#>    id d gr  mm area
#> 1  15 1  2 3.4    1
#> 2  15 1  1 4.9    2
#> 3  15 1  1 4.4    1
#> 4  15 1  1 5.5    2
#> 5  21 1  1 4.0    2
#> 6  21 1  2 3.8    2
#> 7  22 1  1 4.0    2
#> 8  22 1  1 4.9    2
#> 9  22 1  2 4.6    2
#> 10 23 1  1 2.7    2
#> 11 23 1  1 4.0    2
#> 12 23 1  2 3.0    2
#> 13 24 1  1 3.0    2
#> 14 24 1  1 2.0    3
#> 15 24 1  1 4.0    2
#> 16 24 1  2 2.0    3


Qyouu
3 Answers

心有法竹

A plyr solution (tmp is your data frame):

library("plyr")
ddply(tmp, .(id), function(x) x[c(1, nrow(x)), ])
#    id d gr  mm area
# 1  15 1  2 3.4    1
# 2  15 1  1 5.5    2
# 3  21 1  1 4.0    2
# 4  21 1  2 3.8    2
# 5  22 1  1 4.0    2
# 6  22 1  2 4.6    2
# 7  23 1  1 2.7    2
# 8  23 1  2 3.0    2
# 9  24 1  1 3.0    2
# 10 24 1  2 2.0    3

Or with dplyr (see also here):

library("dplyr")
tmp %>%
  group_by(id) %>%
  slice(c(1, n())) %>%
  ungroup()
# # A tibble: 10 × 5
#       id     d    gr    mm  area
#    <int> <int> <int> <dbl> <int>
# 1     15     1     2   3.4     1
# 2     15     1     1   5.5     2
# 3     21     1     1   4.0     2
# 4     21     1     2   3.8     2
# 5     22     1     1   4.0     2
# 6     22     1     2   4.6     2
# 7     23     1     1   2.7     2
# 8     23     1     2   3.0     2
# 9     24     1     1   3.0     2
# 10    24     1     2   2.0     3
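For readers without plyr or dplyr installed, the same per-group indexing idea can be sketched in base R with split() and lapply(). This is only an illustration: the small data frame `demo` and the helper `first_last` are made-up names, and wrapping the indices in `unique()` is an assumption about the desired behavior (a single-row group is returned once, whereas `x[c(1, nrow(x)), ]` would return that row twice).

```r
# Hypothetical toy data: id 21 has a single row.
demo <- data.frame(
  id = c(15L, 15L, 15L, 21L, 22L, 22L),
  mm = c(3.4, 4.9, 5.5, 4.0, 4.0, 4.6)
)

# Keep the first and last row of one group; unique() avoids
# duplicating the row when the group has only one row.
first_last <- function(x) x[unique(c(1, nrow(x))), , drop = FALSE]

# Split by id, apply the helper to each piece, and stack the pieces again.
res <- do.call(rbind, lapply(split(demo, demo$id), first_last))
rownames(res) <- NULL
res
```

Note that split() groups by id value, so the result is ordered by id rather than by original row position.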

千万里不及你

Here is a solution in base R. If there are multiple groups with the same id, this code returns the first and last row for each of those individual groups.

This solution may be a little more intuitive than my other answer further below:

lmy.df = read.table(text = '
     id    d    gr     mm  area
     15    1     2   3.40     1
     15    1     1   4.90     2
     15    1     1   4.40     1
     15    1     1   5.50     2
     21    1     1   4.00     2
     21    1     2   3.80     2
     22    1     1   4.00     2
     23    1     1   2.70     2
     23    1     1   4.00     2
     23    1     2   3.00     2
     24    1     1   3.00     2
     24    1     1   2.00     3
     24    1     1   4.00     2
     24    1     2   2.00     3
', header = TRUE)

# aggregate() applies FUN to each column within each id, so head(x, 1)
# and tail(x, 1) pick the first and last value of every column.
# The results are stored under the names head and tail; calls such as
# tail(x, 1) still resolve to the functions.
head <- aggregate(lmy.df, by=list(lmy.df$id), FUN = function(x) { first = head(x,1) } )
tail <- aggregate(lmy.df, by=list(lmy.df$id), FUN = function(x) {  last = tail(x,1) } )

head$order = 'first'
tail$order = 'last'

my.output <- rbind(head, tail)
my.output
#   Group.1 id d gr  mm area order
#1       15 15 1  2 3.4    1 first
#2       21 21 1  1 4.0    2 first
#3       22 22 1  1 4.0    2 first
#4       23 23 1  1 2.7    2 first
#5       24 24 1  1 3.0    2 first
#6       15 15 1  1 5.5    2  last
#7       21 21 1  2 3.8    2  last
#8       22 22 1  1 4.0    2  last
#9       23 23 1  2 3.0    2  last
#10      24 24 1  2 2.0    3  last

Since posting my original answer I have learned that it is better to use lapply than apply. This is because apply does not work if every group has the same number of rows. See here: "Error when numbering rows by group".

lmy.df = read.table(text = '
     id    d    gr     mm  area
     15    1     2   3.40     1
     15    1     1   4.90     2
     15    1     1   4.40     1
     15    1     1   5.50     2
     21    1     1   4.00     2
     21    1     2   3.80     2
     22    1     1   4.00     2
     23    1     1   2.70     2
     23    1     1   4.00     2
     23    1     2   3.00     2
     24    1     1   3.00     2
     24    1     1   2.00     3
     24    1     1   4.00     2
     24    1     2   2.00     3
', header = TRUE)

# Length of each run of consecutive identical ids
lmy.seq <- rle(lmy.df$id)$lengths
# Position of each row counted from the start and from the end of its run
lmy.df$first <- unlist(lapply(lmy.seq, function(x) seq(1,x)))
lmy.df$last  <- unlist(lapply(lmy.seq, function(x) seq(x,1,-1)))
lmy.df

# Keep rows that are first or last within their run
lmy.df2 <- lmy.df[lmy.df$first==1 | lmy.df$last == 1,]
lmy.df2
#   id d gr  mm area first last
#1  15 1  2 3.4    1     1    4
#4  15 1  1 5.5    2     4    1
#5  21 1  1 4.0    2     1    2
#6  21 1  2 3.8    2     2    1
#7  22 1  1 4.0    2     1    1
#8  23 1  1 2.7    2     1    3
#10 23 1  2 3.0    2     3    1
#11 24 1  1 3.0    2     1    4
#14 24 1  2 2.0    3     4    1

Here is an example in which every group has two rows:

lmy.df = read.table(text = '
     id    d    gr     mm  area
     15    1     2   3.40     1
     15    1     1   4.90     2
     21    1     1   4.00     2
     21    1     2   3.80     2
     22    1     1   4.00     2
     22    1     1   6.00     2
     23    1     1   2.70     2
     23    1     2   3.00     2
     24    1     1   3.00     2
     24    1     2   2.00     3
', header = TRUE)

lmy.seq <- rle(lmy.df$id)$lengths
lmy.df$first <- unlist(lapply(lmy.seq, function(x) seq(1,x)))
lmy.df$last  <- unlist(lapply(lmy.seq, function(x) seq(x,1,-1)))
lmy.df

lmy.df2 <- lmy.df[lmy.df$first==1 | lmy.df$last == 1,]
lmy.df2
#   id d gr  mm area first last
#1  15 1  2 3.4    1     1    2
#2  15 1  1 4.9    2     2    1
#3  21 1  1 4.0    2     1    2
#4  21 1  2 3.8    2     2    1
#5  22 1  1 4.0    2     1    2
#6  22 1  1 6.0    2     2    1
#7  23 1  1 2.7    2     1    2
#8  23 1  2 3.0    2     2    1
#9  24 1  1 3.0    2     1    2
#10 24 1  2 2.0    3     2    1

Original answer (judging by the output, my.df here holds the 16 rows from the question):

my.seq <- data.frame(rle(my.df$id)$lengths)

my.df$first <- unlist(apply(my.seq, 1, function(x) seq(1,x)))
my.df$last  <- unlist(apply(my.seq, 1, function(x) seq(x,1,-1)))

my.df2 <- my.df[my.df$first==1 | my.df$last == 1,]

my.df2

   id d gr  mm area first last
1  15 1  2 3.4    1     1    4
4  15 1  1 5.5    2     4    1
5  21 1  1 4.0    2     1    2
6  21 1  2 3.8    2     2    1
7  22 1  1 4.0    2     1    3
9  22 1  2 4.6    2     3    1
10 23 1  1 2.7    2     1    3
12 23 1  2 3.0    2     3    1
13 24 1  1 3.0    2     1    4
16 24 1  2 2.0    3     4    1
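Two points from this answer may be easier to see on a tiny example: why apply() gets in the way when all groups have the same size, and what "multiple groups with the same id" means for the rle() approach. The sketch below uses made-up vectors and names (`equal_runs`, `via_apply`, `via_lapply`, `df`, `by_run`); it is an illustration under those assumptions, not part of the original answer.

```r
# Run lengths when every group has exactly two rows (hypothetical ids).
equal_runs <- rle(c(15L, 15L, 21L, 21L, 22L, 22L))$lengths

# apply() simplifies equal-length results into a matrix, and unlist()
# leaves a matrix unchanged, so it cannot be used directly as a column.
via_apply <- apply(data.frame(equal_runs), 1, function(x) seq(1, x))
is.matrix(via_apply)

# lapply() always returns a list, so unlist() gives one value per row.
via_lapply <- unlist(lapply(equal_runs, function(x) seq(1, x)))
via_lapply

# rle() works on runs of consecutive equal values, so an id that
# reappears later (id 1 below) is treated as a separate group.
df <- data.frame(id = c(1L, 1L, 2L, 2L, 2L, 1L, 1L), v = 1:7)
runs <- rle(df$id)$lengths
df$first <- unlist(lapply(runs, function(x) seq(1, x)))
df$last  <- unlist(lapply(runs, function(x) seq(x, 1, -1)))
by_run <- df[df$first == 1 | df$last == 1, ]
by_run
```

If the ids are not stored contiguously and each id should count as a single group, sorting by id before calling rle() is one option.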