Numbering rows within groups in a data frame


Working with a data frame similar to this:


set.seed(100)
df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)), val = runif(15))
df <- df[order(df$cat, df$val), ]
df

   cat        val
1  aaa 0.05638315
2  aaa 0.25767250
3  aaa 0.30776611
4  aaa 0.46854928
5  aaa 0.55232243
6  bbb 0.17026205
7  bbb 0.37032054
8  bbb 0.48377074
9  bbb 0.54655860
10 bbb 0.81240262
11 ccc 0.28035384
12 ccc 0.39848790
13 ccc 0.62499648
14 ccc 0.76255108
15 ccc 0.88216552

I am trying to add a column that numbers the rows within each group. Doing it this way obviously isn't using the power of R:


df$num <- 1
for (i in 2:(length(df[,1]))) {
  if (df[i,"cat"]==df[(i-1),"cat"]) {
    df[i,"num"]<-df[i-1,"num"]+1
  }
}
df

   cat        val num
1  aaa 0.05638315   1
2  aaa 0.25767250   2
3  aaa 0.30776611   3
4  aaa 0.46854928   4
5  aaa 0.55232243   5
6  bbb 0.17026205   1
7  bbb 0.37032054   2
8  bbb 0.48377074   3
9  bbb 0.54655860   4
10 bbb 0.81240262   5
11 ccc 0.28035384   1
12 ccc 0.39848790   2
13 ccc 0.62499648   3
14 ccc 0.76255108   4
15 ccc 0.88216552   5

What would be a good way to do this?


动漫人物

慕姐8265434

Since I am making this r-faq question more complete, here is a base R option using sequence and rle:

df$num <- sequence(rle(df$cat)$lengths)

which gives the intended result:

> df
   cat        val num
4  aaa 0.05638315   1
2  aaa 0.25767250   2
1  aaa 0.30776611   3
5  aaa 0.46854928   4
3  aaa 0.55232243   5
10 bbb 0.17026205   1
8  bbb 0.37032054   2
6  bbb 0.48377074   3
9  bbb 0.54655860   4
7  bbb 0.81240262   5
13 ccc 0.28035384   1
14 ccc 0.39848790   2
11 ccc 0.62499648   3
15 ccc 0.76255108   4
12 ccc 0.88216552   5

If df$cat is a factor variable, you need to wrap it in as.character first:

df$num <- sequence(rle(as.character(df$cat))$lengths)
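One thing worth keeping in mind with this approach: rle counts runs of consecutive equal values, so it relies on the rows already being sorted by group, as they are in the question. The sketch below illustrates this on a small made-up vector (`x`, `run_num`, `o` and `sorted_num` are hypothetical names, not part of the original data).

```r
# Hypothetical toy data, not from the question: the "aaa" rows are interleaved
x <- c("aaa", "bbb", "aaa", "aaa")

# rle() sees three runs (aaa, bbb, aaa aaa), so numbering restarts in the
# second "aaa" run instead of continuing from the first
run_num <- sequence(rle(x)$lengths)

# Sorting by group first yields one run per group
o <- order(x)
sorted_num <- sequence(rle(x[o])$lengths)

print(run_num)     # numbering per run
print(sorted_num)  # numbering per group, in sorted row order
```

If the data cannot be sorted beforehand, a group-aware approach (rather than a run-based one) is probably the safer choice.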

繁花不似锦

Here is an option that uses a for loop over groups rather than over rows (as the OP did):

for (i in unique(df$cat)) df$num[df$cat == i] <- seq_len(sum(df$cat == i))
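For comparison, the same per-group numbering can be sketched without an explicit loop using base R's ave(). This is a different technique from the loop in this answer, shown only to check that the two agree on the question's data; the column names `num_loop` and `num_ave` are made up for the illustration.

```r
# Rebuild the data frame from the question
set.seed(100)
df <- data.frame(cat = c(rep("aaa", 5), rep("bbb", 5), rep("ccc", 5)),
                 val = runif(15), stringsAsFactors = FALSE)
df <- df[order(df$cat, df$val), ]

# Loop over groups, as in the answer (column initialised first for clarity)
df$num_loop <- NA_integer_
for (i in unique(df$cat)) {
  df$num_loop[df$cat == i] <- seq_len(sum(df$cat == i))
}

# ave() applies FUN within each level of the grouping variable and puts the
# results back in the original row positions
df$num_ave <- ave(seq_along(df$cat), df$cat, FUN = seq_along)

print(df)
```

Unlike the rle approach, both of these number rows by group membership, so they do not depend on the rows being sorted by cat.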

BIG阳

I would like to add a data.table variant using the rank() function. It offers the additional possibility of changing the ordering, which makes it a bit more flexible than the seq_len() solution and quite similar to the row number functions found in an RDBMS.

# Variant with ascending ordering
library(data.table)
dt <- data.table(df)
dt[, .( val
   , num = rank(val))
    , by = list(cat)][order(cat, num),]

    cat        val num
 1: aaa 0.05638315   1
 2: aaa 0.25767250   2
 3: aaa 0.30776611   3
 4: aaa 0.46854928   4
 5: aaa 0.55232243   5
 6: bbb 0.17026205   1
 7: bbb 0.37032054   2
 8: bbb 0.48377074   3
 9: bbb 0.54655860   4
10: bbb 0.81240262   5
11: ccc 0.28035384   1
12: ccc 0.39848790   2
13: ccc 0.62499648   3
14: ccc 0.76255108   4

# Variant with descending ordering
dt[, .( val
   , num = rank(-val))
    , by = list(cat)][order(cat, num),]
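A caveat that may matter when val contains duplicates: rank() defaults to ties.method = "average", so tied values receive fractional ranks rather than distinct row numbers. The base R sketch below uses a small made-up vector (`val`, `avg_rank` and `row_num` are hypothetical names) to show the difference; passing ties.method = "first" is one way to get closer to row-number behaviour.

```r
# Hypothetical values with a tie, not taken from the question's data
val <- c(0.2, 0.5, 0.2)

# Default ties.method = "average": the two tied values share rank 1.5
avg_rank <- rank(val)

# ties.method = "first": ties are broken by position, giving distinct integers
row_num <- rank(val, ties.method = "first")

print(avg_rank)
print(row_num)
```

The question's data has no ties within a group, so the default works there; the same ties.method argument can be passed to rank() inside the data.table call if needed.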