Select rows where a column contains a string like 'hsa..' (partial string matching)


I have a 371MB text file containing micro RNA data. Basically, I only want to select the rows that contain information about human microRNAs.

I have read the file in with read.table. Normally I would do this with sqldf, if it had 'like' syntax (select * from <> where miRNA like 'hsa'). Unfortunately, sqldf doesn't support that syntax.

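How can I do this in R? I looked through Stack Overflow and did not find an example of partial string matching. I even installed the stringr package, but it doesn't quite do what I need.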

What I would like to do is something like this, selecting all rows with hsa-*:

selectedRows <- conservedData[, conservedData$miRNA %like% "hsa-"]

Of course, this is not correct syntax.

Can someone help me with this? Thanks very much for reading.

阿斯达


犯罪嫌疑人X
3 Answers

蝴蝶刀刀

I noticed you mentioned a %like% function in your current approach. I don't know whether that is a reference to %like% from "data.table", but if it is, you can certainly use it as follows. Note that the object does not have to be a data.table (but also keep in mind that the subsetting methods for data.frames and data.tables are not the same):

library(data.table)
mtcars[rownames(mtcars) %like% "Merc", ]
iris[iris$Species %like% "osa", ]

If that is what you have, then perhaps you simply mixed up the row and column positions when subsetting the data.

If you don't want to load a package, you can use grep() to search for matching strings. Here is an example with the mtcars dataset, where we match all rows whose row names contain "Merc":

mtcars[grep("Merc", rownames(mtcars)), ]
#              mpg cyl  disp  hp drat   wt qsec vs am gear carb
# Merc 240D   24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
# Merc 230    22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
# Merc 280    19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
# Merc 280C   17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
# Merc 450SE  16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
# Merc 450SL  17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
# Merc 450SLC 15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3

Another example, using the iris dataset to search for the string "osa":

irisSubset <- iris[grep("osa", iris$Species), ]
head(irisSubset)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

For your problem, try:

selectedRows <- conservedData[grep("hsa-", conservedData$miRNA), ]
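Because miRBase-style names such as hsa-miR-21 start with a species prefix, anchoring the pattern to the start of the string may express the intent more precisely than a plain substring search. The sketch below uses a small hypothetical stand-in for conservedData (the real column values are not shown in the question) and assumes the miRNA column may be a factor, which read.table() returned by default before R 4.0.0. grepl() accepts a factor directly, whereas startsWith() (base R 3.3.0 and later) needs an explicit as.character().

```r
# Hypothetical stand-in for conservedData: the real file's contents are not
# shown, so these miRBase-style names are made up for illustration.
# The column is a factor, as read.table() returned by default before R 4.0.0.
conservedData <- data.frame(
  miRNA = factor(c("hsa-let-7a", "mmu-let-7a", "hsa-miR-21", "rno-miR-21")),
  score = c(0.9, 0.8, 0.7, 0.6)
)

# "^" anchors the regular expression, so only names that START with "hsa-"
# match. grepl() returns one TRUE/FALSE per row and coerces the factor itself.
human <- conservedData[grepl("^hsa-", conservedData$miRNA), ]

# Regex-free alternative: startsWith() requires a character vector,
# so the factor is converted first.
human2 <- conservedData[startsWith(as.character(conservedData$miRNA), "hsa-"), ]

print(human)
```

Using a logical index from grepl() rather than the integer positions from grep() also makes the inverse selection straightforward: prefixing the condition with ! keeps every non-human row.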

湖上湖

Try str_detect() from the stringr package, which detects whether a pattern is present in a string. Here is an approach that also uses the %>% pipe and filter() from the dplyr package:

library(stringr)
library(dplyr)

CO2 %>%
  filter(str_detect(Treatment, "non"))

   Plant        Type  Treatment conc uptake
1    Qn1      Quebec nonchilled   95   16.0
2    Qn1      Quebec nonchilled  175   30.4
3    Qn1      Quebec nonchilled  250   34.8
4    Qn1      Quebec nonchilled  350   37.2
5    Qn1      Quebec nonchilled  500   35.3
...

This filters the sample CO2 dataset (which ships with R) down to the rows where the Treatment variable contains the substring "non". You can choose whether str_detect looks for a fixed match or uses a regular expression; see the stringr package documentation.
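For comparison, the same filter can be sketched in base R without loading stringr or dplyr. This is a swapped-in technique (grepl() with subset()), not the answer's code; the fixed = TRUE variant is shown as the base-R way to request a literal rather than a regular-expression match, similar in spirit to stringr's fixed() modifier.

```r
# Base-R counterpart of CO2 %>% filter(str_detect(Treatment, "non")).
# grepl() gives one TRUE/FALSE per row; subset() keeps the TRUE rows.
nonchilled <- subset(CO2, grepl("non", Treatment))

# fixed = TRUE treats "non" as a literal string instead of a regular
# expression, which matters when a pattern contains characters like "." or "*".
nonchilled_fixed <- subset(CO2, grepl("non", Treatment, fixed = TRUE))

head(nonchilled)
```

Since "non" contains no regular-expression metacharacters, both calls select the same rows here; the difference only shows up for patterns such as "hsa.." where "." would otherwise match any character.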

翻阅古今

LIKE should work in SQLite:

require(sqldf)
df <- data.frame(name = c('bob','robert','peter'), id = c(1,2,3))
sqldf("select * from df where name LIKE '%er%'")

    name id
1 robert  2
2  peter  3
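If it helps to map SQL patterns onto base R, here is a rough sketch of equivalent grepl() calls, assuming sqldf's default SQLite backend. In LIKE, % matches any run of characters, so '%er%' means "contains er" and a pattern such as 'hsa-%' (the question's case) means "starts with hsa-", which corresponds to the anchored regex "^hsa-". SQLite's LIKE is case-insensitive for ASCII letters by default, hence ignore.case = TRUE below. The 'rob%' prefix query is a hypothetical example added for illustration.

```r
df <- data.frame(name = c("bob", "robert", "peter"), id = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# SQL: select * from df where name LIKE '%er%'
# '%' on both sides means "er" may appear anywhere in the string.
# ignore.case = TRUE mirrors SQLite's default case-insensitive LIKE.
contains_er <- df[grepl("er", df$name, ignore.case = TRUE), ]

# SQL: select * from df where name LIKE 'rob%'  (hypothetical prefix match)
# A trailing '%' only becomes an anchored regular expression.
starts_rob <- df[grepl("^rob", df$name, ignore.case = TRUE), ]

print(contains_er)
print(starts_rob)
```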