猿问

收集多组列

收集多组列

我有一项在线调查的数据,在调查中,被调查者反复回答1到3次问题。调查软件(Qualtrics)将这些数据记录在多个列中,也就是说,调查中的Q3.2将包含列。Q3.2.1.Q3.2.2.,和Q3.2.3.:

df <- data.frame(
  id = 1:10,
  time = as.Date('2009-01-01') + 0:9,
  Q3.2.1. = rnorm(10, 0, 1),
  Q3.2.2. = rnorm(10, 0, 1),
  Q3.2.3. = rnorm(10, 0, 1),
  Q3.3.1. = rnorm(10, 0, 1),
  Q3.3.2. = rnorm(10, 0, 1),
  Q3.3.3. = rnorm(10, 0, 1))# Sample data

   id       time    Q3.2.1.     Q3.2.2.    Q3.2.3.     Q3.3.1.    Q3.3.2.     Q3.3.3.1   1 2009-01-01 -0.2059165 -0.29177677 -0.7107192  1.52718069 -0.4484351 -1.215506002   2 2009-01-02 -0.1981136 -1.19813815  1.1750200 -0.40380049 -1.8376094  1.035884823   3 2009-01-03  0.3514795 -0.27425539  1.1171712 -1.02641801 -2.0646661 -0.35353058...

我想将所有QN.N*列合并成整洁的单个QN.N列,最终得到如下的结果:

   id       time loop_number        Q3.2        Q3.31   1 2009-01-01           1 -0.20591649  1.527180692   2 2009-01-02           1 -0.19811357 -0.403800493   3 2009-01-03           1  0.35147949 -1.02641801...11  1 2009-01-01           2 -0.29177677  -0.448435112  2 2009-01-02           2 -1.19813815  -1.837609413  3 2009-01-03           2 -0.27425539  -2.0646661...21  1 2009-01-01           3 -0.71071921 -1.2155060022  2 2009-01-02           3  1.17501999  1.0358848223  3 2009-01-03           3  1.11717121 -0.35353058...

这个tidyr库有gather()函数,它对组合非常有用。一组栏:


得到的数据帧有30行,如预期的那样(10个个体,每个循环3个循环)。但是,收集第二组列不能正常工作-它成功地使这两组合并的列正常工作。Q3.2Q3.3,但最后为90行,而不是30行(10个个体的所有组合,Q3.2的3个循环,Q3.3的3个循环;实际数据中每组列的组合将大幅度增加):
df %>% gather(loop_number, Q3.2, starts_with("Q3.2")) %>% 
  gather(loop_number, Q3.3, starts_with("Q3.3")) %>%
  mutate(loop_number = str_sub(loop_number,-2,-2))


   id       time loop_number        Q3.2        Q3.31   1 2009-01-01           1 -0.20591649  1.527180692   2 2009-01-02           1 -0.19811357 -0.403800493   3 2009-01-03           1  0.35147949 -1.02641801...89  9 2009-01-09           3 -0.58581232 -0.1318702490 10 2009-01-10           3 -2.33393981 -0.48502131

是否有一种方法可以使用多个调用gather()像这样,在保持正确的行数的同时组合这样的列的小子集?


慕后森
浏览 516回答 3
3回答

噜噜哒

这种方法对我来说似乎很自然:df %>%&nbsp; gather(key, value, -id, -time) %>%&nbsp; extract(key, c("question", "loop_number"), "(Q.\\..)\\.(.)") %>%&nbsp; spread(question, value)首先收集所有问题列,使用extract()分离成question和loop_number,然后spread()回到列中。#>&nbsp; &nbsp; id&nbsp; &nbsp; &nbsp; &nbsp;time loop_number&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Q3.2&nbsp; &nbsp; &nbsp; &nbsp; Q3.3#> 1&nbsp; &nbsp;1 2009-01-01&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1&nbsp; 0.142259203 -0.35842736#> 2&nbsp; &nbsp;1 2009-01-01&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2&nbsp; 0.061034802&nbsp; 0.79354061#> 3&nbsp; &nbsp;1 2009-01-01&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3 -0.525686204 -0.67456611#> 4&nbsp; &nbsp;2 2009-01-02&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 -1.044461185 -1.19662936#> 5&nbsp; &nbsp;2 2009-01-02&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2&nbsp; 0.393808163&nbsp; 0.42384717

Qyouu

这可以用reshape..这是有可能的dplyr尽管如此。&nbsp; colnames(df) <- gsub("\\.(.{2})$", "_\\1", colnames(df))&nbsp; colnames(df)[2] <- "Date"&nbsp; res <- reshape(df, idvar=c("id", "Date"), varying=3:8, direction="long", sep="_")&nbsp; row.names(res) <- 1:nrow(res)&nbsp; &nbsp;head(res)&nbsp; #&nbsp; id&nbsp; &nbsp; &nbsp; &nbsp;Date time&nbsp; &nbsp; &nbsp; &nbsp;Q3.2&nbsp; &nbsp; &nbsp; &nbsp;Q3.3&nbsp; #1&nbsp; 1 2009-01-01&nbsp; &nbsp; 1&nbsp; 1.3709584&nbsp; 0.4554501&nbsp; #2&nbsp; 2 2009-01-02&nbsp; &nbsp; 1 -0.5646982&nbsp; 0.7048373&nbsp; #3&nbsp; 3 2009-01-03&nbsp; &nbsp; 1&nbsp; 0.3631284&nbsp; 1.0351035&nbsp; #4&nbsp; 4 2009-01-04&nbsp; &nbsp; 1&nbsp; 0.6328626 -0.6089264&nbsp; #5&nbsp; 5 2009-01-05&nbsp; &nbsp; 1&nbsp; 0.4042683&nbsp; 0.5049551&nbsp; #6&nbsp; 6 2009-01-06&nbsp; &nbsp; 1 -0.1061245 -1.7170087或使用dplyr&nbsp; library(tidyr)&nbsp; library(dplyr)&nbsp; colnames(df) <- gsub("\\.(.{2})$", "_\\1", colnames(df))&nbsp; df %>%&nbsp; &nbsp; &nbsp;gather(loop_number, "Q3", starts_with("Q3")) %>%&nbsp;&nbsp; &nbsp; &nbsp;separate(loop_number,c("L1", "L2"), sep="_") %>%&nbsp;&nbsp; &nbsp; &nbsp;spread(L1, Q3) %>%&nbsp; &nbsp; &nbsp;select(-L2) %>%&nbsp; &nbsp; &nbsp;head()&nbsp; #&nbsp; id&nbsp; &nbsp; &nbsp; &nbsp;time&nbsp; &nbsp; &nbsp; &nbsp;Q3.2&nbsp; &nbsp; &nbsp; &nbsp;Q3.3&nbsp; #1&nbsp; 1 2009-01-01&nbsp; 1.3709584&nbsp; 0.4554501&nbsp; #2&nbsp; 1 2009-01-01&nbsp; 1.3048697&nbsp; 0.2059986&nbsp; #3&nbsp; 1 2009-01-01 -0.3066386&nbsp; 0.3219253&nbsp; #4&nbsp; 2 2009-01-02 -0.5646982&nbsp; 0.7048373&nbsp; #5&nbsp; 2 2009-01-02&nbsp; 2.2866454 -0.3610573&nbsp; #6&nbsp; 2 2009-01-02 -1.7813084 -0.7838389
随时随地看视频慕课网APP
我要回答