Stratified random sampling from a data frame


I have a data frame in the following format:


head(subset)
# ants  0 1 1 0 1
# age   1 2 2 1 3
# lc    1 1 0 1 0

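I need to create a new data frame with random samples drawn according to age and lc. For example, I want 30 samples from age: 1 and lc: 1, 30 samples from age: 1 and lc: 0, and so on.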


I did look at random sampling approaches such as:


newdata <- function(subset, age, 30)

But this is not the code I want.


叮当猫咪

牛魔王的故事

I would suggest using either stratified from my "splitstackshape" package, or sample_n from the "dplyr" package:

## Sample data
library(data.table)  # provides data.table()
set.seed(1)
n <- 1e4
d <- data.table(age = sample(1:5, n, TRUE),
                lc = rbinom(n, 1, .5),
                ants = rbinom(n, 1, .7))
# table(d$age, d$lc)

For stratified, you basically specify the dataset, the stratifying columns, and either an integer giving the size you want from each group or a decimal giving the fraction you want returned (for example, .1 means 10% from each group).

library(splitstackshape)
set.seed(1)
out <- stratified(d, c("age", "lc"), 30)
head(out)
#    age lc ants
# 1:   1  0    1
# 2:   1  0    0
# 3:   1  0    1
# 4:   1  0    1
# 5:   1  0    0
# 6:   1  0    1

table(out$age, out$lc)
#
#      0  1
#   1 30 30
#   2 30 30
#   3 30 30
#   4 30 30
#   5 30 30

For sample_n, you first create a grouped table (using group_by) and then specify the number of observations you want. If you want proportional sampling instead, use sample_frac.

library(dplyr)
set.seed(1)
out2 <- d %>%
  group_by(age, lc) %>%
  sample_n(30)
# table(out2$age, out2$lc)
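If installing a package is not an option, the same idea can be sketched in base R. This is only a rough illustration, not how stratified or sample_n are implemented: stratified_sample is a hypothetical helper name, and the choices to round fractional sizes up and to cap the sample at the group size are assumptions made for this example.

```r
# Hypothetical helper (not part of any package): sample rows within each
# combination of the grouping columns, using base R only.
stratified_sample <- function(df, group_cols, size) {
  # One vector of row indices per observed combination of the grouping columns
  idx_by_group <- split(seq_len(nrow(df)), df[group_cols], drop = TRUE)
  picked <- lapply(idx_by_group, function(idx) {
    # Assumption: size < 1 is a fraction of the group (rounded up),
    # otherwise a row count capped at the group size
    k <- if (size < 1) ceiling(length(idx) * size) else min(size, length(idx))
    # Index via sample.int() rather than sample(idx, k), because sample()
    # treats a single number x as the range 1:x
    idx[sample.int(length(idx), k)]
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

# Same kind of sample data as above, as a plain data.frame
set.seed(1)
n <- 1e4
d <- data.frame(age  = sample(1:5, n, replace = TRUE),
                lc   = rbinom(n, 1, 0.5),
                ants = rbinom(n, 1, 0.7))

out3 <- stratified_sample(d, c("age", "lc"), 30)
table(out3$age, out3$lc)  # counts per age/lc combination
```

Each cell of the table holds 30 rows as long as every age/lc combination has at least 30 rows to draw from; smaller groups are returned whole. Passing a value below 1, such as stratified_sample(d, c("age", "lc"), 0.1), samples that fraction of each group instead.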
