Splitting a pandas column by delimiter when rows come in two different sizes

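I am trying to split one column of a pandas DataFrame into multiple columns using a space delimiter. I realized that some rows have a date field, so they need extra columns compared with rows that have no date field. Here are sample values from the column: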


DA Firstname Lastname 09/30/2020 07:44 AM 9/23/2020 6:06:38 PM

JW Firstname Lastname 10/25/2020 11:06 AM None

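The first row does not fit a plain space delimiter because it contains 8 spaces. The second row works with my dataset because it contains 6 spaces. Is there a way to split so that each date/time stays together as a single field?


The columns I want are ["Initial" "Firstname" "lastname" "date/time1" "date/time2"], where the "date/time2" column can also contain "None".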


The code I tried is:


dataset= pd.read_csv("newOutput6",encoding = "ISO-8859-1", delimiter="\t", names = ['Name','Date'], index=False)

tmpDF = pd.DataFrame(columns=['Initals','FName','LName','SignupTime','Waiver'])

tmpDF[['Initals','FName','LName','SignupTime','Waiver']] = dataset['Name'].str.split(' ', expand=True)

Index 16 is the row that does not follow the usual format; I suspect a regular expression is needed to handle it.



呼如林
1 answer

萧十郎

If the first and last names contain no spaces (otherwise there is no way to tell them apart):

pattern = (r'^(?P<Initials>\w+)\s'
           + r'(?P<FName>\w+)\s'
           + r'(?P<LName>\w+)\s'
           + r'(?P<SignupTime>\d+/\d+/\d+ \d+:\d+ \w+)\s'
           + r'(?P<Waiver>.*)')

df['name'].str.extract(pattern)

Output:

  Initials      FName     LName           SignupTime                Waiver
0       DA  Firstname  Lastname  09/30/2020 07:44 AM  9/23/2020 6:06:38 PM
1       JW  Firstname  Lastname  10/25/2020 11:06 AM                  None

Update: for optional initials, you can try this pattern:

pattern = (r'^(?P<Initials>\w+\s)?'    # make initial optional
           + r'(?P<FName>\w+)\s+'
           + r'(?P<LName>\w+)\s+'
           + r'(?P<SignupTime>\d+/\d+/\d+ \d+:\d+ \w+)\s'
           + r'(?P<Waiver>.*)')

Note that now, if Initials is present, it will carry a trailing space, which is easy to strip off.
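For anyone who wants to try the optional-initials pattern end to end, here is a minimal self-contained sketch. It assumes the raw text lives in a column called `name`, as in the answer. The third sample row, which has no initials, is made up purely to exercise the optional group. Stripping the trailing space with `.str.strip()` is one possible way to handle it, not necessarily the one the answer had in mind.

```python
import pandas as pd

# The first two rows come from the question. The third row (no initials) is a
# hypothetical example added only to show the optional group in action.
df = pd.DataFrame({
    "name": [
        "DA Firstname Lastname 09/30/2020 07:44 AM 9/23/2020 6:06:38 PM",
        "JW Firstname Lastname 10/25/2020 11:06 AM None",
        "Firstname Lastname 10/26/2020 09:15 AM None",
    ]
})

# Raw strings keep the regex backslashes away from Python's escape handling.
pattern = (
    r"^(?P<Initials>\w+\s)?"                        # optional initials plus one space
    r"(?P<FName>\w+)\s+"                            # first name (assumed to have no spaces)
    r"(?P<LName>\w+)\s+"                            # last name (assumed to have no spaces)
    r"(?P<SignupTime>\d+/\d+/\d+ \d+:\d+ \w+)\s"    # date, time and AM/PM as one field
    r"(?P<Waiver>.*)"                               # everything else: a second timestamp or "None"
)

# Each named group becomes a column; rows that do not match yield NaN everywhere.
result = df["name"].str.extract(pattern)

# The optional group captures its trailing space, so remove it.
# Missing initials stay NaN because .str methods skip missing values.
result["Initials"] = result["Initials"].str.strip()

print(result)
```

Because the names are matched with `\w+`, a hyphenated or apostrophized name would make the whole row fail to match and come back as NaN, so it is worth checking for all-NaN rows after extracting.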