Cumulatively paste (concatenate) values grouped by another variable

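I have a problem with a data frame in R. I want to paste together the contents of cells in different rows, based on the value of the cells in another column. My problem is that I want the output to be built up progressively (cumulatively). The output vector must be the same length as the input vector. Here is a sample table similar to the one I am working with: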


id <- c("a", "a", "a", "b", "b", "b")
content <- c("A", "B", "A", "B", "C", "B")
(testdf <- data.frame(id, content, stringsAsFactors=FALSE))
#  id content
#1  a       A
#2  a       B
#3  a       A
#4  b       B
#5  b       C
#6  b       B

This is what I would like the result to look like:


result <- c("A", "A B", "A B A", "B", "B C", "B C B")
result
#[1] "A"     "A B"   "A B A" "B"     "B C"   "B C B"

I do NOT need something like this, which collapses each group into a single row:


library(plyr)  # provides ddply() and .()
ddply(testdf, .(id), summarize, content_concatenated = paste(content, collapse = " "))
#  id content_concatenated
#1  a                A B A
#2  b                B C B


大话西游666
3 Answers

侃侃尔雅

You can define a "cumulative paste" function using Reduce:

cumpaste = function(x, .sep = " ")
    Reduce(function(x1, x2) paste(x1, x2, sep = .sep), x, accumulate = TRUE)

cumpaste(letters[1:3], "; ")
#[1] "a"       "a; b"    "a; b; c"

Reduce's loop avoids re-concatenating the elements from the start, because it extends the previous concatenation by the next element.

Applying it by group:

ave(as.character(testdf$content), testdf$id, FUN = cumpaste)
#[1] "A"     "A B"   "A B A" "B"     "B C"   "B C B"

Another idea is to concatenate the whole vector once at the beginning and then take progressively longer substrings of it:

cumpaste2 = function(x, .sep = " ")
{
    # build the full string once
    concat = paste(x, collapse = .sep)
    # end position of each prefix: the first element's width, then each
    # following element's width plus the separator's width, accumulated
    substring(concat, 1L, cumsum(c(nchar(x[[1L]]), nchar(x[-1L]) + nchar(.sep))))
}

cumpaste2(letters[1:3], " ;@-")
#[1] "a"            "a ;@-b"       "a ;@-b ;@-c"

This also seems to be somewhat faster:

set.seed(077)
X = replicate(1e3, paste(sample(letters, sample(0:5, 1), TRUE), collapse = ""))
identical(cumpaste(X, " --- "), cumpaste2(X, " --- "))
#[1] TRUE
microbenchmark::microbenchmark(cumpaste(X, " --- "), cumpaste2(X, " --- "), times = 30)
#Unit: milliseconds
#                  expr      min       lq     mean   median       uq      max neval cld
#  cumpaste(X, " --- ") 21.19967 21.82295 26.47899 24.83196 30.34068 39.86275    30   b
# cumpaste2(X, " --- ") 14.41291 14.92378 16.87865 16.03339 18.56703 23.22958    30  a

...which makes it a cumpaste_faster.
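As a small illustration of why the ave call fits the "same length as the input" requirement, here is a minimal sketch on a hypothetical data frame whose groups are interleaved rather than contiguous. The data frame df and the column name content_cum are assumptions made up for this example. cumpaste is repeated from the answer so the snippet runs on its own.

```r
# cumpaste as defined in the answer: each step extends the previous result
cumpaste <- function(x, .sep = " ")
    Reduce(function(x1, x2) paste(x1, x2, sep = .sep), x, accumulate = TRUE)

# hypothetical data: rows of groups "a" and "b" alternate
df <- data.frame(id = c("a", "b", "a", "b"),
                 content = c("A", "B", "C", "D"),
                 stringsAsFactors = FALSE)

# ave() applies cumpaste within each id and puts every result back
# at the position of the row it came from
df$content_cum <- ave(df$content, df$id, FUN = cumpaste)
df$content_cum
#[1] "A"   "B"   "A C" "B D"
```

Because ave returns its result in the original row order, the data does not need to be sorted by id first.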

翻阅古今

You could also try dplyr:

library(dplyr)
res <- testdf %>%
        mutate(n=row_number()) %>%
        group_by(id) %>%
        mutate(n1=n[1L]) %>%
        rowwise() %>%
        do(data.frame(cont_concat= paste(content[.$n1:.$n],collapse=" "),stringsAsFactors=F))

res$cont_concat
#[1] "A"     "A B"   "A B A" "B"     "B C"   "B C B"

翻翻过去那场雪

Here is a ddply approach that uses sapply and subsetting to paste the values together incrementally:

library(plyr)
ddply(testdf, .(id), mutate, content_concatenated = sapply(seq_along(content), function(x) paste(content[seq(x)], collapse = " ")))

  id content content_concatenated
1  a       A                    A
2  a       B                  A B
3  a       A                A B A
4  b       B                    B
5  b       C                  B C
6  b       B                B C B
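The same prefix-by-prefix idea can also be sketched without plyr, assuming base R's ave does the grouping. The helper name cumpaste_prefix is hypothetical and does not come from the answer. vapply is used in place of sapply only to pin the return type to character.

```r
# hypothetical helper: for each position i, paste the first i elements
cumpaste_prefix <- function(x, sep = " ")
    vapply(seq_along(x),
           function(i) paste(x[seq_len(i)], collapse = sep),
           character(1))

# sample data from the question
testdf <- data.frame(id = c("a", "a", "a", "b", "b", "b"),
                     content = c("A", "B", "A", "B", "C", "B"),
                     stringsAsFactors = FALSE)

# apply the helper within each id, keeping one result per input row
testdf$content_concatenated <- ave(testdf$content, testdf$id, FUN = cumpaste_prefix)
testdf$content_concatenated
#[1] "A"     "A B"   "A B A" "B"     "B C"   "B C B"
```

This version re-pastes every prefix from the start, so it does more work per group than the Reduce-based approach, which only appends the next element.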