Replacing NAs with the latest non-NA value

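In a data.frame (or data.table), I would like to "fill forward" NAs with the closest previous non-NA value. A simple example, using a vector (instead of a data.frame), is the following: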


> y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)

I would like a function fill.NAs() that allows me to construct yy such that:


> yy

[1] NA  2  2  2  2  3  3  4  4  4

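I need to repeat this operation for many (~1 TB in total) small data.frames (~30-50 MB each), where a row that is NA is NA in all of its entries. What is a good way to approach the problem?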


The ugly solution I cooked up uses this function:


last <- function (x){
    x[length(x)]
}

fill.NAs <- function(isNA){
    if (isNA[1] == 1) {
        isNA[1:max({which(isNA==0)[1]-1},1)] <- 0 # first is NAs
                                                  # can't be forward filled
    }
    isNA.neg <- isNA.pos <- isNA.diff <- diff(isNA)
    isNA.pos[isNA.diff < 0] <- 0
    isNA.neg[isNA.diff > 0] <- 0
    which.isNA.neg <- which(as.logical(isNA.neg))
    if (length(which.isNA.neg)==0) return(NULL) # generates warnings later, but works
    which.isNA.pos <- which(as.logical(isNA.pos))
    which.isNA <- which(as.logical(isNA))
    if (length(which.isNA.neg)==length(which.isNA.pos)){
        replacement <- rep(which.isNA.pos[2:length(which.isNA.neg)],
                           which.isNA.neg[2:max(length(which.isNA.neg)-1,2)] -
                           which.isNA.pos[1:max(length(which.isNA.neg)-1,1)])
        replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
    } else {
        replacement <- rep(which.isNA.pos[1:length(which.isNA.neg)], which.isNA.neg - which.isNA.pos[1:length(which.isNA.neg)])
        replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
    }
    replacement
}

The function fill.NAs is used as follows:


y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
isNA <- as.numeric(is.na(y))
replacement <- fill.NAs(isNA)
if (length(replacement)){
    which.isNA <- which(as.logical(isNA))
    to.replace <- which.isNA[which(isNA==0)[1]:length(which.isNA)]
    y[to.replace] <- y[replacement]
}

which yields


> y

[1] NA  2  2  2  2  3  3  4  4  4

...which seems to work. But, man, is it ugly! Any suggestions?
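
For comparison, here is a minimal base-R sketch of the same forward fill. It is not taken from the question: the name fill_forward and the example data.frame df are made up for illustration. The idea is to record the positions of the non-NA values, then use a running count of non-NA values (cumsum) to pick, for every element, the position of the most recent one. Leading NAs have a count of zero and stay NA.

```r
# Hypothetical helper (not from the original post): forward-fill NAs in a
# vector with the most recent non-NA value. Leading NAs remain NA.
fill_forward <- function(x) {
  not_na <- !is.na(x)
  # Positions of the non-NA values, with NA prepended so that a running
  # count of 0 (nothing seen yet) maps to NA.
  pos <- c(NA, which(not_na))
  # cumsum(not_na)[i] is the number of non-NA values seen up to position i.
  x[pos[cumsum(not_na) + 1]]
}

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
yy <- fill_forward(y)
print(yy)

# Assumed data.frame layout: a row that is NA is NA in every column,
# so filling each column independently keeps the rows consistent.
df <- data.frame(a = c(NA, 1, NA, 3),
                 b = c(NA, "x", NA, "z"),
                 stringsAsFactors = FALSE)
df[] <- lapply(df, fill_forward)
print(df)
```

Because it only uses vectorised indexing, this should scale reasonably to many small data.frames, though that has not been measured here. If installing packages is an option, zoo::na.locf(y, na.rm = FALSE) performs the same kind of fill.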


慕田峪4524236