How to get unique IDs and spread each row of another column into separate columns, in R and Python

My data has the following form:


groups <- c("4","4.2","4.2.1","4.2.1.1", "1", "1.2", "1.2.1", "1.2.1.2","1.2.1.2.1")

x <- data.frame(ID = c(rep("samp_1", 4), rep("samp_2", 5)), Group = groups)

How can I get this?


ID       col_1   col_2   col_3   col_4     col_5

samp_1   4       4.2     4.2.1   4.2.1.1   NA

samp_2   1       1.2     1.2.1   1.2.1.2   1.2.1.2.1

Each column is determined by the length of the string, so every value in column 4 has length 4 (or length 7 if the dots are counted).


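I'm looking for the most generic solution possible (e.g. using loops, with as few packages as possible), because I need to implement it in both R and Python.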


蓝山帝景
3 answers

HUH函数

In R, we can use rowid (from data.table) to create a column holding the new names, and then reshape the data to "wide" format with pivot_wider (from tidyr):

library(dplyr)
library(tidyr)       # provides pivot_wider
library(data.table)  # provides rowid
library(stringr)

x %>%
  mutate(name = str_c('col_', rowid(ID))) %>%
  pivot_wider(names_from = name, values_from = Group)
# A tibble: 2 x 6
#  ID     col_1 col_2 col_3 col_4   col_5
#  <chr>  <chr> <chr> <chr> <chr>   <chr>
#1 samp_1 4     4.2   4.2.1 4.2.1.1 <NA>
#2 samp_2 1     1.2   1.2.1 1.2.1.2 1.2.1.2.1

Or using data.table:

library(data.table)
dcast(setDT(x), ID ~ paste0('col_', rowid(ID)), value.var = 'Group')
#       ID col_1 col_2 col_3   col_4     col_5
#1: samp_1     4   4.2 4.2.1 4.2.1.1      <NA>
#2: samp_2     1   1.2 1.2.1 1.2.1.2 1.2.1.2.1

Or base R with reshape:

reshape(transform(x, name = paste0('col_', ave(seq_along(ID), ID,
    FUN = seq_along))), idvar = 'ID', direction = 'wide', timevar = 'name')

aluckdog

Great options from akrun. If the data are a bit messy (for example, the rows are not in order), you might want to try this instead, which names each column after the number of dots in the value:

library(dplyr)
library(tidyr)
library(stringr)

x %>%
  mutate(temp = str_c('col_', str_count(Group, "\\."))) %>%
  pivot_wider(names_from = temp, values_from = Group) %>%
  select(ID, order(colnames(.)))

Data:

groups <- c("41.2","4","4.2.1","4.2.1.1", "1", "1.2", "1.2.1", "1.2.1.2","1.2.1.2.1")
x <- data.frame(ID = c(rep("samp_1", 4), rep("samp_2", 5)), Group = groups)

Result:

# A tibble: 2 x 6
  ID     col_0 col_1 col_2 col_3   col_4
  <chr>  <chr> <chr> <chr> <chr>   <chr>
1 samp_1 4     41.2  4.2.1 4.2.1.1 NA
2 samp_2 1     1.2   1.2.1 1.2.1.2 1.2.1.2.1
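Since the question also asks for Python with as few packages as possible, the same dot-counting idea could be sketched in plain Python roughly as follows. This is only an illustration, not part of the answer above: the names ids, rows, header and table are made up for the example, and missing cells are filled with None.

```python
# Same "messy" data as above: "41.2" appears before "4" for samp_1.
ids = ["samp_1"] * 4 + ["samp_2"] * 5
groups = ["41.2", "4", "4.2.1", "4.2.1.1", "1", "1.2", "1.2.1", "1.2.1.2", "1.2.1.2.1"]

# Map each ID to {depth: value}, where depth is the number of dots in the value.
rows = {}
for sample_id, group in zip(ids, groups):
    depth = group.count(".")
    rows.setdefault(sample_id, {})[depth] = group

# Build a rectangular table; depths that are absent for an ID become None.
max_depth = max(depth for row in rows.values() for depth in row)
header = ["ID"] + [f"col_{d}" for d in range(max_depth + 1)]
table = [
    [sample_id] + [row.get(d) for d in range(max_depth + 1)]
    for sample_id, row in rows.items()
]

for line in [header] + table:
    print(line)
```

Because each value is placed by its own depth, the order of the input rows does not matter. Two values with the same depth under one ID would overwrite each other in this sketch, so that case would need separate handling.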

慕虎7371278

You can try this in Python:

import pandas as pd
import numpy as np

df = pd.DataFrame({'ID': np.repeat(["samp_1", "samp_2"], [4, 5]),
                   'groups': ["4", "4.2", "4.2.1", "4.2.1.1", "1", "1.2", "1.2.1", "1.2.1.2", "1.2.1.2.1"],
})
df['entry'] = df.groupby(['ID']).cumcount() + 1

We number the rows within each ID and add that number as the entry column. Then we pivot, as in R, using that column to supply the column names, and finally reset the index:

df.pivot(values='groups', columns='entry', index='ID').reset_index()

entry      ID  1    2      3        4          5
0      samp_1  4  4.2  4.2.1  4.2.1.1        NaN
1      samp_2  1  1.2  1.2.1  1.2.1.2  1.2.1.2.1
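If pandas is not an option, the same row-numbering idea can probably be written with a plain loop and no packages at all, which is closer to what the question asks for. The following is a minimal sketch under that assumption; wide, header and table are names chosen for the example, and short rows are padded with None rather than NaN.

```python
ids = ["samp_1"] * 4 + ["samp_2"] * 5
groups = ["4", "4.2", "4.2.1", "4.2.1.1", "1", "1.2", "1.2.1", "1.2.1.2", "1.2.1.2.1"]

# Collect the values of each ID in order of appearance.
wide = {}
for sample_id, group in zip(ids, groups):
    wide.setdefault(sample_id, []).append(group)

# Pad shorter rows with None so that every row has the same width.
width = max(len(values) for values in wide.values())
header = ["ID"] + [f"col_{i}" for i in range(1, width + 1)]
table = [
    [sample_id] + values + [None] * (width - len(values))
    for sample_id, values in wide.items()
]

for line in [header] + table:
    print(line)
```

The loop relies on dictionaries keeping insertion order (guaranteed since Python 3.7), so the IDs come out in the order they first appear. The same structure translates to R fairly directly with split() and a loop over the resulting list.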