Subset data to contain only columns whose names match a condition

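Is there a way to subset data based on column names that start with a particular string? I have some columns named like ABC_1, ABC_2, ABC_3 and some like XYZ_1, XYZ_2, XYZ_3, let's say.

How can I subset my df based only on the columns whose names contain those pieces of text (say, ABC or XYZ)? I could use indices, but the columns are too scattered across the data, so that would mean too much hard-coding.

Also, I only want to include rows where any of these columns has a value > 0, so if any of the 6 columns above has a 1 in a row, that row makes the cut into my final data frame.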


海绵宝宝撒
3 answers

慕姐8265434

Try grepl on the names of your data.frame. grepl matches a regular expression against a target and returns TRUE if a match is found and FALSE otherwise. The function is vectorised, so you can pass it a vector of strings to match and you get back a vector of boolean values.

Example

    #  Data
    df <- data.frame( ABC_1 = runif(3),
                      ABC_2 = runif(3),
                      XYZ_1 = runif(3),
                      XYZ_2 = runif(3) )

    #      ABC_1     ABC_2     XYZ_1     XYZ_2
    #1 0.3792645 0.3614199 0.9793573 0.7139381
    #2 0.1313246 0.9746691 0.7276705 0.0126057
    #3 0.7282680 0.6518444 0.9531389 0.9673290

    #  Use grepl
    df[ , grepl( "ABC" , names( df ) ) ]
    #      ABC_1     ABC_2
    #1 0.3792645 0.3614199
    #2 0.1313246 0.9746691
    #3 0.7282680 0.6518444

    #  grepl returns a logical vector like this, which is what we use to subset columns
    grepl( "ABC" , names( df ) )
    #[1]  TRUE  TRUE FALSE FALSE

To answer the second part, I would create the subset data.frame and then build a vector that indexes the rows to keep (a logical vector), like this:

    set.seed(1)
    df <- data.frame( ABC_1 = sample(0:1, 3, replace = TRUE),
                      ABC_2 = sample(0:1, 3, replace = TRUE),
                      XYZ_1 = sample(0:1, 3, replace = TRUE),
                      XYZ_2 = sample(0:1, 3, replace = TRUE) )

    # We will want to discard the second row because 'all' ABC values are 0:
    #  ABC_1 ABC_2 XYZ_1 XYZ_2
    #1     0     1     1     0
    #2     0     0     1     0
    #3     1     1     1     0

    df1 <- df[ , grepl( "ABC" , names( df ) ) ]

    # TRUE for each row in which at least one ABC value is > 0
    ind <- apply( df1 , 1 , function(x) any( x > 0 ) )

    df1[ ind , ]
    #  ABC_1 ABC_2
    #1     0     1
    #3     1     1
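
The question mentions both prefixes and names that start with the string, so here is a minimal sketch of how the same idea might be extended. The data frame, the `id` and `other` columns, and the variable names are made up for illustration; the pattern is anchored with `^`, and `rowSums` stands in for the `apply` call.

```r
# Hypothetical example data: target columns scattered among unrelated ones
df <- data.frame(
  id    = 1:4,
  ABC_1 = c(0, 0, 1, 0),
  XYZ_1 = c(0, 0, 0, 0),
  other = c("a", "b", "c", "d"),
  ABC_2 = c(1, 0, 0, 0),
  XYZ_2 = c(0, 0, 2, 0)
)

# "^" anchors the match at the start of the name; "|" means ABC or XYZ
keep_cols <- grepl("^(ABC|XYZ)_", names(df))

# drop = FALSE keeps a data.frame even if only one column matches
sub <- df[, keep_cols, drop = FALSE]

# sub > 0 is a logical matrix; a row is kept if it has at least one TRUE
keep_rows <- rowSums(sub > 0) > 0
result <- sub[keep_rows, , drop = FALSE]
result
```

If the columns can contain NA, `rowSums(sub > 0, na.rm = TRUE)` would treat missing values as not matching rather than propagating NA into the row index.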

慕丝7291255

You can also use starts_with with dplyr's select(), like this:

    df <- df %>% dplyr::select(starts_with("ABC"))
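
For the row condition in the question, a dplyr pipeline could look roughly like the sketch below. It assumes a reasonably recent dplyr (`if_any()` was added in version 1.0.4), and the example data frame is invented for illustration.

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(
  id    = 1:3,
  ABC_1 = c(0, 0, 1),
  ABC_2 = c(1, 0, 1),
  XYZ_1 = c(0, 0, 0)
)

result <- df %>%
  # starts_with() accepts a character vector and takes the union of the matches
  select(starts_with(c("ABC", "XYZ"))) %>%
  # keep rows in which any of the remaining columns has a value > 0
  filter(if_any(everything(), ~ .x > 0))

result
```

Selecting first and filtering second keeps the `filter()` condition simple, because by then every remaining column is one of the target columns.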

杨__羊羊

Just in case, for data.table users, the following works for me:

    df[, grep("ABC", names(df)), with = FALSE]
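
A slightly fuller data.table sketch that also covers the row filter from the question might look like this. The table `dt` and its values are hypothetical, and `.SDcols` is used here as an alternative to `with = FALSE`.

```r
library(data.table)

# Hypothetical example data
dt <- data.table(
  id    = 1:3,
  ABC_1 = c(0, 0, 1),
  ABC_2 = c(1, 0, 1),
  XYZ_1 = c(0, 0, 0)
)

# value = TRUE makes grep() return the matching names instead of positions
cols <- grep("^(ABC|XYZ)_", names(dt), value = TRUE)

# .SDcols restricts .SD to the selected columns
sub <- dt[, .SD, .SDcols = cols]

# Keep rows in which any selected value is > 0
result <- sub[rowSums(sub > 0) > 0]
result
```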