How do I extract the first n rows of each group?

I have a data.table dt. It is sorted first by the column date (my grouping variable) and then by the column age:


library(data.table)
setkeyv(dt, c("date", "age")) # Sorts table first by column "date" then by "age"

> dt
         date age     name
1: 2000-01-01   3   Andrew
2: 2000-01-01   4      Ben
3: 2000-01-01   5  Charlie
4: 2000-01-02   6     Adam
5: 2000-01-02   7      Bob
6: 2000-01-02   8 Campbell

My question: is it possible to extract the first two rows for each unique date? Or, more generally:


How do I extract the first n rows within each group?


In this example, the result dt.f would be:


> dt.f = ???????? # function of dt to extract the first 2 rows per unique date
> dt.f
         date age   name
1: 2000-01-01   3 Andrew
2: 2000-01-01   4    Ben
3: 2000-01-02   6   Adam
4: 2000-01-02   7    Bob

P.S. Here is the code that creates the data.table above:


install.packages("data.table")
library(data.table)
date <- c("2000-01-01","2000-01-01","2000-01-01",
    "2000-01-02","2000-01-02","2000-01-02")
age <- c(3,4,5,6,7,8)
name <- c("Andrew","Ben","Charlie","Adam","Bob","Campbell")
dt <- data.table(date, age, name)
setkeyv(dt,c("date","age")) # Sorts table first by column "date" then by "age"


神不在的星期二
2 answers

开满天机

Yes, just use .SD and index it as needed.

DT[, .SD[1:2], by=date]
         date age   name
1: 2000-01-01   3 Andrew
2: 2000-01-01   4    Ben
3: 2000-01-02   6   Adam
4: 2000-01-02   7    Bob

Edited per @eddi's suggestion, which was to use this instead for speed:

DT[DT[, .I[1:2], by = date]$V1]

# using a slightly larger data set
> microbenchmark(SDstyle=DT[, .SD[1:2], by=date], IStyle=DT[DT[, .I[1:2], by = date]$V1], times=200L)
Unit: milliseconds
    expr       min        lq    median        uq      max neval
 SDstyle 13.567070 16.224797 22.170302 24.239881 88.26719   200
  IStyle  1.675185  2.018773  2.168818  2.269292 11.31072   200
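One caveat worth sketching: indexing with 1:n asks for n rows even when a group has fewer, and data.table fills the missing positions with NA. Below is a minimal sketch of a variant that uses head() instead, for both the .SD and the .I style. The third date and the name "Dan" are hypothetical additions to the question's data, included only to create a one-row group.

```r
library(data.table)

# Data from the question, plus a hypothetical extra date with a single row
dt <- data.table(
  date = c(rep("2000-01-01", 3), rep("2000-01-02", 3), "2000-01-03"),
  age  = c(3, 4, 5, 6, 7, 8, 9),
  name = c("Andrew", "Ben", "Charlie", "Adam", "Bob", "Campbell", "Dan")
)
setkeyv(dt, c("date", "age"))

n <- 2

# head() stops at the end of the group, so short groups are not padded
first_n_sd <- dt[, head(.SD, n), by = date]

# Same idea with row numbers: collect the wanted indices, then subset once
first_n_i <- dt[dt[, head(.I, n), by = date]$V1]

# For comparison: 1:n indexes past the end of the one-row group
padded <- dt[, .SD[1:n], by = date]
```

Here first_n_sd and first_n_i both hold five rows (two, two, and one per date), while padded holds six, the last of which has NA in age and name. If every group is known to have at least n rows, the 1:n forms from the answer give the same result.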