慕慕森
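Here is a base R answer. First, `match` the column `Commenid` against `Parentid`. Then create a data set with the `Author` column and a `Reply` column holding the replying authors found by that match. Keep only the rows with no `NA` values and `merge` them with the original data to get the other columns.

    # Row index of the comment whose Parentid equals each Commenid (NA if none)
    i <- with(df1, match(Commenid, Parentid))
    res <- data.frame(Author = df1$Author, Reply = df1$Author[i])
    # Drop authors nobody replied to
    res <- res[complete.cases(res), ]
    merge(res, df1)
    #  Author Reply Commenid Parentid Submissionid
    #1  User1 User2     333c     222b         111b
    #2  User3 User1     222b     555d         23er
    #3  User4 User3     555d     666f         111b

A `dplyr` solution could be

    library(dplyr)

    df1 %>%
      mutate(i = match(Commenid, Parentid),
             Reply = Author[i]) %>%
      filter(!is.na(i)) %>%
      select(-i) %>%                          # drop the helper index
      select(Author, Reply, everything())     # put Author and Reply first

Data

    df1 <- read.csv(text = "Author,Commenid,Parentid,Submissionid
    User1 , 333c , 222b , 111b
    User2 , 444c , 333c , 5hdc
    User3 , 222b , 555d , 23er
    User4 , 555d , 666f , 111b
    ")
    # Strip the padding around every value
    df1[] <- lapply(df1, trimws)

Edit

With the new data and the problem described in the comments, here is a `dplyr` solution. After doing basically the same as above, it joins the result with the original data set to pick up the `Score` and `Stance` of the replying author, then reorders the columns so that the id columns come last.

    library(dplyr)

    df2 %>%
      mutate(i = match(Commenid, Parentid),
             Reply = Author[i]) %>%
      filter(!is.na(i)) %>%
      select(-i) %>%
      select(Author, Score, Stance, Reply, everything()) %>%
      # Look up the replying author's Score and Stance
      left_join(df2 %>% select(Author, Score, Stance),
                by = c("Reply" = "Author")) %>%
      # Move the *id columns to the end
      select(-matches("id$"), everything(), matches("id$"))

New data

    df2 <- read.csv(text = "Author,Commenid,Parentid,Submissionid, Score, Stance
    User1 , 333c , 222b , 111b , 10 , Positive
    User2 , 444c , 333c , 5hdc , 15 , Neutral
    User3 , 222b , 555d , 23er , 20 , Negative
    User4 , 555d , 666f , 111b , 11 , Positive
    ")
    # Strip the padding around the column names and every value
    names(df2) <- trimws(names(df2))
    df2[] <- lapply(df2, trimws)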
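To see what the `match` step is doing, here is a minimal sketch on the four example comments. The vectors `author`, `commenid`, `parentid` and the data frame `pairs` are names made up for this illustration, with the ids typed in by hand so the snippet runs on its own.

```r
# Values copied from the example data (hypothetical stand-alone vectors).
author   <- c("User1", "User2", "User3", "User4")
commenid <- c("333c", "444c", "222b", "555d")
parentid <- c("222b", "333c", "555d", "666f")

# For each comment id, the position of the first row that names it as parent.
# NA means no row has that comment as its parent.
i <- match(commenid, parentid)
i
# [1]  2 NA  1  3

# Indexing Author by i gives the replying author; NA propagates.
reply <- author[i]

# Keep only the comments that received a reply.
pairs <- data.frame(Author = author, Reply = reply)
pairs <- pairs[!is.na(pairs$Reply), ]
```

Note that `match` returns only the first matching position, so this approach assumes each comment has at most one reply, as in the example data. If a comment can have several replies, a self-join such as `merge(df1, df1, by.x = "Commenid", by.y = "Parentid")` may be a better fit.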