Stratified random sampling from a data frame

I have a data frame in the following format:

head(subset)

# ants 0 1 1 0 1

# age 1 2 2 1 3

# lc 1 1 0 1 0

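I need to create a new data frame of random samples stratified by age and lc. For example, I want 30 samples from age: 1 and lc: 1, 30 samples from age: 1 and lc: 0, and so on.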

I did look at random sampling methods such as:

newdata <- function(subset, age, 30)

but that is not the code I want.

叮当猫咪

Answers

牛魔王的故事

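I would suggest using either stratified from my "splitstackshape" package, or sample_n from the "dplyr" package:

## Sample data
library(data.table)  # needed for data.table() below
set.seed(1)
n <- 1e4
d <- data.table(age = sample(1:5, n, replace = TRUE),
                lc = rbinom(n, 1, .5),
                ants = rbinom(n, 1, .7))
# table(d$age, d$lc)

For stratified, you basically specify the dataset, the stratifying columns, and either an integer giving the size wanted from each group or a decimal giving the fraction to return (for example, .1 means 10% from each group).

library(splitstackshape)
set.seed(1)
out <- stratified(d, c("age", "lc"), 30)
head(out)
#    age lc ants
# 1:   1  0    1
# 2:   1  0    0
# 3:   1  0    1
# 4:   1  0    1
# 5:   1  0    0
# 6:   1  0    1

table(out$age, out$lc)
#    
#      0  1
#   1 30 30
#   2 30 30
#   3 30 30
#   4 30 30
#   5 30 30

For sample_n, you first create a grouped table (using group_by) and then specify the number of observations you want. If you want proportional sampling, use sample_frac instead.

library(dplyr)
set.seed(1)
out2 <- d %>%
  group_by(age, lc) %>%
  sample_n(30)
# table(out2$age, out2$lc)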
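If neither package is available, the same idea can be sketched in base R. This is a minimal illustration, not how splitstackshape or dplyr do it. The helper name stratified_sample and the result name out3 are made up for this example. It splits the row indices by every combination of the grouping columns and samples within each piece without replacement. It also assumes that a group with fewer rows than requested should simply be returned whole.

```r
# Hypothetical helper (not part of any package mentioned above): draw up to
# `size` rows from every observed combination of the grouping columns.
stratified_sample <- function(df, group_cols, size) {
  # One vector of row indices per observed stratum (age x lc combination)
  strata <- split(seq_len(nrow(df)), df[group_cols], drop = TRUE)
  picked <- lapply(strata, function(rows) {
    # Sample positions within `rows`, so a one-row stratum is handled safely;
    # strata smaller than `size` are returned in full (an assumption)
    rows[sample.int(length(rows), min(size, length(rows)))]
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

# Same simulated data as in the answer, but as a plain data.frame
set.seed(1)
n <- 1e4
d <- data.frame(age  = sample(1:5, n, replace = TRUE),
                lc   = rbinom(n, 1, 0.5),
                ants = rbinom(n, 1, 0.7))

out3 <- stratified_sample(d, c("age", "lc"), 30)
table(out3$age, out3$lc)  # each age x lc cell should hold 30 rows
```

Capping at the group size is a design choice made for this sketch. If every stratum must contribute exactly 30 rows, check the group sizes first or sample with replacement.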


