
How to extract part of a string in a pandas DataFrame cell and create a new column containing it

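I have a DataFrame in which one column holds a long string packed with information. I need to break that information out into separate columns and add them to the DataFrame.

I can create empty columns, but I don't know whether elements can be extracted from the string or whether the string can be split into columns.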

Example data row

0    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs

Desired output

Row number, Volts, Wfm, Sclk, Image, Segments

1 , 17 , BF27 , 100 , 1in24 , 24

Data

                                              Comments  Image
0    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs      0
1    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs      0
2    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs      0
3    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs      0
4    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs      0
..                                                 ...    ...
706  Row 2 Ch475 Vi 17.5V BF27 Sclk 100ns 1in24 24segs      0
707  Row 2 Ch475 Vi 17.5V BF27 Sclk 100ns 1in24 24segs      0
708  Row 2 Ch475 Vi 17.5V BF27 Sclk 100ns 1in24 24segs      0
709  Row 2 Ch475 Vi 17.5V BF27 Sclk 100ns 1in24 24segs      0
710  Row 2 Ch475 Vi 17.5V BF27 Sclk 100ns 1in24 24segs      0

Code


import pandas as pd
import numpy as np

path = "/Users/.../Desktop/tk_gui_grid/"
file = "orig_data.txt"
filepath = path+file

df = pd.read_csv(filepath, sep='\t', lineterminator='\r')

com = df.loc[:,['Comments']]
dfLen = len(com)

image = [0]*dfLen
com['Image'] = image

print(com)


慕桂英3389331
2 Answers

神不在的星期二

Here is a quick solution using a regular expression with named capture groups.

Advantage of a regex over split:

Some have commented that a regex is not required, and that is true. From a data-validation standpoint, however, a regex helps keep "stray" data from slipping in unnoticed. A "blind" split() simply cuts the data on a single character; but what if the source data changes? split() is blind to that. A regex, by contrast, helps surface the problem because the pattern simply will not match. Yes, you may get missing values (str.extract returns NaN for rows that do not match) or an error further downstream, but that is a good thing: you are alerted that the data format has changed, which gives you the chance to fix the problem or update the regex pattern.

Source data:

Mocked up with additional rows for demonstration.

0    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs
1    Row 2 Ch475 Vi 17.1V BF27 Sclk 101ns 1in24 25segs
2    Row 3 Ch475 Vi 17.2V BF27 Sclk 102ns 1in24 26segs
3    Row 4 Ch475 Vi 17.3V BF27 Sclk 103ns 1in24 27segs
4    Row 5 Ch475 Vi 17.4V BF27 Sclk 104ns 1in24 28segs

Code:

import pandas as pd
import re

path = './orig_data.txt'
cols = ['rownumber', 'volts', 'wfm', 'sclk', 'image', 'segment']

# Each line of the file starts with its index digits, hence the leading ^\d+\s+
exp = re.compile(r'^\d+\s+Row\s'
                 r'(?P<rownumber>\d+).*\s'
                 r'(?P<volts>\d+\.\d+)V\s'
                 r'(?P<wfm>\w+)\sSclk\s'
                 r'(?P<sclk>\d+)ns\s'
                 r'(?P<image>\w+)\s'
                 r'(?P<segment>\d+)segs.*$')

# sep='|' does not occur in the data, so each line is read as a single column
df = pd.read_csv(path, sep='|', header=None, names=['comment'])
df[cols] = df['comment'].str.extract(exp, expand=True)

Output:

                                             comment rownumber volts   wfm  \
0  0    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in2...         1  17.0  BF27
1  1    Row 2 Ch475 Vi 17.1V BF27 Sclk 101ns 1in2...         2  17.1  BF27
2  2    Row 3 Ch475 Vi 17.2V BF27 Sclk 102ns 1in2...         3  17.2  BF27
3  3    Row 4 Ch475 Vi 17.3V BF27 Sclk 103ns 1in2...         4  17.3  BF27
4  4    Row 5 Ch475 Vi 17.4V BF27 Sclk 104ns 1in2...         5  17.4  BF27

  sclk  image segment
0  100  1in24      24
1  101  1in24      25
2  102  1in24      26
3  103  1in24      27
4  104  1in24      28
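
To illustrate the validation point, here is a minimal sketch that applies a similar pattern directly to a Comments column (so without the leading index digits that the file-based pattern expects), flags rows that fail to match, and converts the numeric fields. The sample frame, its deliberately malformed third row, and the names pattern, extracted, bad_rows and result are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample mirroring the question's 'Comments' column.
# The third row is deliberately malformed (no voltage reading).
df = pd.DataFrame({"Comments": [
    "Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs",
    "Row 2 Ch475 Vi 17.5V BF27 Sclk 100ns 1in24 24segs",
    "Row 3 Ch475 Vi n/a BF27 Sclk 100ns 1in24 24segs",
]})

# Same named groups as the answer, minus the leading index digits.
pattern = (
    r"^Row\s(?P<rownumber>\d+).*\s"
    r"(?P<volts>\d+\.\d+)V\s"
    r"(?P<wfm>\w+)\sSclk\s"
    r"(?P<sclk>\d+)ns\s"
    r"(?P<image>\w+)\s"
    r"(?P<segment>\d+)segs$"
)

# Rows that do not match the pattern come back as NaN in every group.
extracted = df["Comments"].str.extract(pattern)

# Collect the offending rows so a format change is visible, not silent.
bad_rows = df.loc[extracted["rownumber"].isna(), "Comments"]

# str.extract yields strings; convert the numeric fields explicitly.
numeric_cols = ["rownumber", "volts", "sclk", "segment"]
extracted = extracted.assign(
    **{col: pd.to_numeric(extracted[col]) for col in numeric_cols}
)

result = df.join(extracted)
print(result)
print(f"{len(bad_rows)} row(s) did not match the pattern")
```

Whether to drop, repair, or raise on the rows in bad_rows depends on the data; the sketch only reports them.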

胡说叔叔

You need to treat the Series values as strings (through the .str accessor) and then split them. After that you can access each element by index.

df['Comments'].str.split(' ')
0    [Row, 1, Ch475, Vi, 17.0V, BF27, Sclk, 100ns, ...

df['Comments'].str.split(' ').str[0]
Out[7]:
0    Row

df['Comments'].str.split(' ').str[4]
Out[8]:
0    17.0V

Once you know how to access each element of the split, you can assign it to a new column in the DataFrame, for example:

df['RowNumber'] = df['Comments'].str.split(' ').str[1]
df['Volts'] = df['Comments'].str.split(' ').str[4]
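
The split approach can also be done in a single pass. The following is a minimal sketch, assuming every row has the same ten whitespace-separated tokens as the sample row; it splits once with expand=True and strips the unit suffixes. The sample frame, the parts variable, and the Wfm, Sclk, Image and Segments column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample mirroring the question's 'Comments' column.
df = pd.DataFrame({"Comments": [
    "Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs",
    "Row 2 Ch475 Vi 17.5V BF27 Sclk 100ns 1in24 24segs",
]})

# Split once on whitespace; expand=True returns one column per token.
parts = df["Comments"].str.split(expand=True)

# Token positions are assumed from the sample row:
# 0=Row 1=<row> 2=Ch475 3=Vi 4=<volts>V 5=<wfm> 6=Sclk 7=<sclk>ns 8=<image> 9=<n>segs
df["RowNumber"] = parts[1].astype(int)
df["Volts"] = parts[4].str.rstrip("V").astype(float)
df["Wfm"] = parts[5]
df["Sclk"] = parts[7].str.replace("ns", "", regex=False).astype(int)
df["Image"] = parts[8]
df["Segments"] = parts[9].str.replace("segs", "", regex=False).astype(int)

print(df.drop(columns="Comments"))
```

Splitting once avoids repeating str.split for every new column, but it relies entirely on token position, so a row with an extra or missing token would shift values into the wrong columns or fail during the type conversion.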