Is there any function to filter data in R or Python?

I'm new to R and I can't figure out how to filter the data the way I need.

Below is the data (326 rows and 6 columns):

Dataset

Here is a small example:

    Author,Commenid,Parentid,Submissionid,Score,Stance
    User1 ,  333c ,    222b ,   111b     , 10 ,  Positive
    User2 ,  444c ,    333c ,    5hdc    , 15 ,  Neutral
    User3 ,  222b ,    555d ,    23er    , 20 ,  Negative
    User4 ,  555d ,    666f ,    111b    , 11 ,  Positive

Here, User1 has replied to User2, User3 has replied to User1, and User4 has replied to User3.

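I want to filter down to the users whose Commenid and Parentid match, that is, pair each user with the user whose Commenid equals their Parentid. For the example above, the filtered data would be: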


    Author   Score   Stance     Reply   Score   Stance
    User2    15      Neutral    User1   10      Positive
    User1    10      Positive   User3   20      Negative
    User3    20      Negative   User4   11      Positive
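The pairing in this table can be sketched in Python with only the standard library. This is a minimal sketch, assuming that Parentid values refer to Commenid values in the same file and that each Commenid is unique; the names SAMPLE, load_rows and pair_rows are hypothetical. Each row is joined to the row whose Commenid equals its Parentid, which yields the three rows of the table (in file order rather than the order shown).

```python
import csv
import io

# Sample rows from the question; whitespace around fields is stripped in load_rows.
SAMPLE = """Author,Commenid,Parentid,Submissionid,Score,Stance
User1 ,  333c ,    222b ,   111b     , 10 ,  Positive
User2 ,  444c ,    333c ,    5hdc    , 15 ,  Neutral
User3 ,  222b ,    555d ,    23er    , 20 ,  Negative
User4 ,  555d ,    666f ,    111b    , 11 ,  Positive
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with surrounding whitespace removed."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k.strip(): v.strip() for k, v in row.items()} for row in reader]


def pair_rows(rows):
    """Pair each row with the row whose Commenid equals its Parentid.

    Rows whose Parentid does not appear as any Commenid are dropped.
    """
    by_comment = {row["Commenid"]: row for row in rows}  # assumes Commenid is unique
    pairs = []
    for row in rows:
        other = by_comment.get(row["Parentid"])
        if other is not None:
            pairs.append((row["Author"], int(row["Score"]), row["Stance"],
                          other["Author"], int(other["Score"]), other["Stance"]))
    return pairs


if __name__ == "__main__":
    for pair in pair_rows(load_rows(SAMPLE)):
        print(*pair)
```

User4 is dropped because its Parentid (666f) matches no Commenid, just as in the table.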

I've tried a lot but couldn't figure it out. Can anyone help me do exactly this (in R or Python)?


胡子哥哥

慕慕森

Here is a base R answer. First, match the Commenid column against Parentid. Then create a dataset with the Author column and a Reply column holding the author found by that match. Keep only the rows without NA values, and join (merge) the result with the original data to get the other columns.

    i <- with(df1, match(Commenid, Parentid))
    res <- data.frame(Author = df1$Author, Reply = df1$Author[i])
    res <- res[complete.cases(res), ]
    merge(res, df1)
    #  Author Reply Commenid Parentid Submissionid
    #1  User1 User2     333c     222b         111b
    #2  User3 User1     222b     555d         23er
    #3  User4 User3     555d     666f         111b

A dplyr solution could be:

    library(dplyr)
    df1 %>%
      mutate(i = match(Commenid, Parentid),
             Reply = Author[i]) %>%
      filter(!is.na(i)) %>%
      select(Author, Reply, everything(), -i)

Data

    df1 <- read.csv(text = "Author,Commenid,Parentid,Submissionid
    User1 ,  333c ,    222b ,   111b
    User2 ,  444c ,    333c ,    5hdc
    User3 ,  222b ,    555d ,    23er
    User4 ,  555d ,    666f ,    111b
    ")
    df1[] <- lapply(df1, trimws)

Edit

With the new data and the problem described in the comments, here is a dplyr solution. After doing essentially the same as above, it joins the result with the original dataset to get the reply author's Score and Stance, and reorders the columns.

    library(dplyr)
    df2 %>%
      mutate(i = match(Commenid, Parentid),
             Reply = Author[i]) %>%
      filter(!is.na(i)) %>%
      select(-i) %>%
      select(Author, Score, Stance, Reply, everything()) %>%
      # Score.x/Stance.x belong to Author, Score.y/Stance.y to Reply
      left_join(df2 %>% select(Author, Score, Stance), by = c("Reply" = "Author")) %>%
      # move the columns whose names end in "id" to the end
      select(-matches("id$"), everything(), matches("id$"))

New data

    df2 <- read.csv(text = "Author,Commenid,Parentid,Submissionid, Score, Stance
    User1 ,  333c ,    222b ,   111b     , 10 ,  Positive
    User2 ,  444c ,    333c ,    5hdc    , 15 ,  Neutral
    User3 ,  222b ,    555d ,    23er    , 20 ,  Negative
    User4 ,  555d ,    666f ,    111b    , 11 ,  Positive")
    names(df2) <- trimws(names(df2))
    df2[] <- lapply(df2, trimws)
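For readers coming from Python, the core of this answer is R's match(), which returns, for each Commenid, the position of the first row whose Parentid equals it, or NA when there is none. A rough Python equivalent on the four sample rows is sketched below; r_match is a hypothetical helper, not a library function, and it returns 0-based indices and None instead of R's 1-based indices and NA.

```python
def r_match(x, table):
    """Mimic R's match(): for each value in x, the 0-based index of its FIRST
    occurrence in table, or None when absent (R gives a 1-based index or NA)."""
    first = {}
    for idx, value in enumerate(table):
        first.setdefault(value, idx)  # keep only the first occurrence, like match()
    return [first.get(value) for value in x]


authors = ["User1", "User2", "User3", "User4"]
comment_ids = ["333c", "444c", "222b", "555d"]
parent_ids = ["222b", "333c", "555d", "666f"]

# i <- match(Commenid, Parentid); Reply <- Author[i]; drop rows where i is NA
i = r_match(comment_ids, parent_ids)
pairs = [(authors[k], authors[j]) for k, j in enumerate(i) if j is not None]
print(pairs)  # same Author/Reply pairs as the merge() output above
```

Because match() keeps only the first occurrence, a comment with several replies contributes just one Reply in this approach; the nested-loop answer below keeps every pair.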

慕侠2389804

You can compare each user with every other user, and if one user's Parentid equals the other's Commenid, print the pair. Here is how you could do it in Python:

    # dataset: a list of dicts, one per row, keyed by the CSV header
    for u1 in dataset:
        for u2 in dataset:
            if u1['Parentid'] == u2['Commenid']:
                print(u1['Author'], 'replied to', u2['Author'])
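The nested loop compares every pair of rows, so it does n² comparisons (about 106,000 for the 326-row dataset), which is still fast at that size. A sketch of an equivalent that indexes the rows by Commenid once and then does one lookup per row is shown below; reply_pairs is a hypothetical name, and dataset is assumed to be a list of dicts keyed by the CSV header (only the columns used are included).

```python
from collections import defaultdict

dataset = [
    {"Author": "User1", "Commenid": "333c", "Parentid": "222b"},
    {"Author": "User2", "Commenid": "444c", "Parentid": "333c"},
    {"Author": "User3", "Commenid": "222b", "Parentid": "555d"},
    {"Author": "User4", "Commenid": "555d", "Parentid": "666f"},
]


def reply_pairs(rows):
    """Same pairs, in the same order, as the nested loop: (u1, u2) whenever
    u1's Parentid equals u2's Commenid."""
    # Commenid -> rows with that id (a list, in case an id repeats)
    by_comment = defaultdict(list)
    for row in rows:
        by_comment[row["Commenid"]].append(row)
    pairs = []
    for u1 in rows:
        for u2 in by_comment.get(u1["Parentid"], []):
            pairs.append((u1["Author"], u2["Author"]))
    return pairs


for replier, parent_author in reply_pairs(dataset):
    print(replier, "replied to", parent_author)
```

Keeping a list per Commenid makes the result identical to the loop's, pairs and order alike, and several replies to the same comment are all reported.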
