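Question
I want to split one column of a pandas DataFrame into two. In the Percentage column (see below), every entry starts with an uppercase letter. I want to split the 'Percentage' column immediately after that letter and move the letter into a new column labeled "Amino Acid".
Current code: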
import pandas as pd
df = pd.read_csv('foo.csv')
df['Amino Acid'], df['Percentage'] = zip(*df['Percentage'].map(lambda x: x.split('[^a-zA-Z]')))
df.to_csv('bar.csv',index=False)
Example input data
+-----------------------------+-------+-----+-----------+---------------------------------------------------------------------------------------------+
| Species | ID | OGT | DB | Percentage |
+-----------------------------+-------+-----+-----------+---------------------------------------------------------------------------------------------+
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | E is 8.333003365670164% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | R is 6.310991522830762% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | A is 10.22668778459711% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
+-----------------------------+-------+-----+-----------+---------------------------------------------------------------------------------------------+
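One likely reason the current code fails: Python's `str.split` treats its argument as a literal separator, not a regular expression, so `'[^a-zA-Z]'` is never found and each entry comes back as a one-element list. A possible approach, sketched below, is to use `Series.str.extract` with two capture groups instead. The small inline DataFrame is a stand-in for `foo.csv` (an assumption for illustration, with only two of the columns), and the pattern assumes every entry starts with exactly one uppercase letter followed by whitespace, as in the sample rows.

```python
import pandas as pd

# Stand-in for pd.read_csv('foo.csv'): two rows copied from the example input.
df = pd.DataFrame({
    "Species": ["Halogeometricum borinquense"] * 2,
    "Percentage": [
        "E is 8.333003365670164% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa",
        "R is 6.310991522830762% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa",
    ],
})

# Group 1: the leading uppercase letter.
# Group 2: the rest of the entry, after any whitespace that follows the letter.
parts = df["Percentage"].str.extract(r"^([A-Z])\s*(.*)$")

df["Amino Acid"] = parts[0]
df["Percentage"] = parts[1]

print(df[["Amino Acid", "Percentage"]])
```

With the real file, the inline DataFrame would be replaced by `pd.read_csv('foo.csv')` and the result written out with `df.to_csv('bar.csv', index=False)` as in the original code. Entries that do not match the pattern would come back as NaN in both new columns, so it may be worth checking for those before saving.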