智慧大石
A str.extract call will work here:

    p = r'^(?P<Last_Name>.*), (?P<First_Name>\S+)\b\s*(?P<Middle_Name>.*)'
    u = df.loc[df.Type == "I", 'Complete_Name'].str.extract(p)
    pd.concat([df, u], axis=1).fillna('')

       ID               Complete_Name Type     Last_Name First_Name   Middle_Name
    0   1                  JERRY, Ben    I         JERRY        Ben
    1   2          VON HELSINKI, Olga    I  VON HELSINKI       Olga
    2   3  JENSEN, James Goodboy Dean    I        JENSEN      James  Goodboy Dean
    3   4                 THE COMPANY    C
    4   5         CRUZ, Juan S. de la    I          CRUZ       Juan      S. de la

Regex breakdown

    ^                    # Start-of-line
    (?P<Last_Name>       # First named capture group - Last Name
        .*               # Match anything until...
    )
    ,                    # ...we see a comma
    \s                   # whitespace
    (?P<First_Name>      # Second capture group - First Name
        \S+              # Match all non-whitespace characters
    )
    \b                   # Word boundary
    \s*                  # Optional whitespace chars (mostly housekeeping)
    (?P<Middle_Name>     # Third capture group - Zero or more middle names
        .*               # Match everything till the end of string
    )
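The breakdown above can also be tried outside pandas. The following is a minimal sketch using Python's standard `re` module in verbose mode; the names `NAME_RE` and `split_name` are hypothetical helpers introduced only for this illustration and are not part of the original answer.

```python
import re

# Same pattern as in the answer, written in verbose mode so the comments
# live inside the regex. Verbose mode ignores literal whitespace, so the
# ", " separator is spelled ",\s" here.
NAME_RE = re.compile(
    r"""
    ^(?P<Last_Name>.*)      # greedy: anything before the comma
    ,\s                     # the comma and one whitespace character
    (?P<First_Name>\S+)\b   # a run of non-whitespace ending at a word boundary
    \s*                     # optional separating whitespace
    (?P<Middle_Name>.*)     # whatever remains (may be empty)
    """,
    re.VERBOSE,
)


def split_name(complete_name):
    """Return a dict of name parts, or None when the pattern does not match."""
    match = NAME_RE.match(complete_name)
    return match.groupdict() if match else None
```

A name without a comma, such as 'THE COMPANY', does not match at all, which is why str.extract leaves those rows as NaN before the fillna('') step.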
有只小跳蛙
I think you can do it like this:

    # take the Complete_Name column (Type 'I' rows only) and split it on the comma
    df2 = (df.loc[df['Type'].eq('I'), 'Complete_Name'].str
           .split(',', expand=True)
           .fillna(''))

    # remove extra spaces
    for col in df2.columns:
        df2[col] = df2[col].str.strip()

    # split the remaining name on the first space and join it
    # (n must be passed by keyword in recent pandas versions)
    df2 = pd.concat([df2[0], df2[1].str.split(' ', n=1, expand=True)], axis=1)
    df2.columns = ['last', 'first', 'middle']

    # join the data frames
    df = pd.concat([df[['ID', 'Complete_Name', 'Type']], df2], axis=1)

    # rearrange columns - not necessary though
    df = df[['ID', 'Complete_Name', 'Type', 'first', 'middle', 'last']]

    # replace missing values (the Type 'C' row and empty middle names) with empty strings
    df = df.fillna('')

       ID               Complete_Name Type  first        middle          last
    0   1                  JERRY, Ben    I    Ben                       JERRY
    1   2          VON HELSINKI, Olga    I   Olga                VON HELSINKI
    2   3  JENSEN, James Goodboy Dean    I  James  Goodboy Dean        JENSEN
    3   4                 THE COMPANY    C
    4   5         CRUZ, Juan S. de la    I   Juan      S. de la          CRUZ
MM们
Here is another answer that uses some simple lambda functions.

    import numpy as np
    import pandas as pd

    """ Create data and data frame """
    info_dict = {
        'ID': [1,2,3,4,5,],
        'Complete_Name':[
            'JERRY, Ben',
            'VON HELSINKI, Olga',
            'JENSEN, James Goodboy Dean',
            'THE COMPANY',
            'CRUZ, Juan S. de la',
            ],
        'Type':['I','I','I','C','I',],
        }

    data = pd.DataFrame(info_dict, columns = info_dict.keys())

    """ List of columns to add """
    name_cols = [
        'First Name',
        'Middle Name',
        'Last Name',
        ]

    """
    Use partition() to separate first and middle names into Pandas series.
    Note: data[data['Type'] == 'I']['Complete_Name'] will allow us to target only the
    values that we want.
    """
    NO_LAST_NAMES = data[data['Type'] == 'I']['Complete_Name'].apply(lambda x: str(x).partition(',')[2].strip())
    LAST_NAMES = data[data['Type'] == 'I']['Complete_Name'].apply(lambda x: str(x).partition(',')[0].strip())

    # We can use index positions to quickly add columns to the dataframe.
    # The partition() function keeps the delimiter itself at index 1, so we'll use
    # the 0 and 2 index positions for first and middle names.
    data[name_cols[0]] = NO_LAST_NAMES.str.partition(' ')[0]
    data[name_cols[1]] = NO_LAST_NAMES.str.partition(' ')[2]

    # Finally, we'll add our Last Names column
    data[name_cols[2]] = LAST_NAMES

    # Optional: We can replace all blank values with numpy.nan values using regular expressions.
    # (np.nan is used because the np.NaN alias was removed in NumPy 2.0.)
    data = data.replace(r'^$', np.nan, regex=True)

Then you should get a result like this:

       ID               Complete_Name Type First Name   Middle Name     Last Name
    0   1                  JERRY, Ben    I        Ben           NaN         JERRY
    1   2          VON HELSINKI, Olga    I       Olga           NaN  VON HELSINKI
    2   3  JENSEN, James Goodboy Dean    I      James  Goodboy Dean        JENSEN
    3   4                 THE COMPANY    C        NaN           NaN           NaN
    4   5         CRUZ, Juan S. de la    I       Juan      S. de la          CRUZ

Alternatively, replace the NaN values with empty strings:

    data = data.replace(np.nan, r'', regex=False)

Then you have:

       ID               Complete_Name Type First Name   Middle Name     Last Name
    0   1                  JERRY, Ben    I        Ben                       JERRY
    1   2          VON HELSINKI, Olga    I       Olga                VON HELSINKI
    2   3  JENSEN, James Goodboy Dean    I      James  Goodboy Dean        JENSEN
    3   4                 THE COMPANY    C
    4   5         CRUZ, Juan S. de la    I       Juan      S. de la          CRUZ
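To see what the partition() calls do to a single value, here is a small sketch in plain Python with no pandas involved. The function name `split_complete_name` is a hypothetical helper introduced for this illustration; it mirrors the two partition steps of the answer but is not the answer's own code.

```python
def split_complete_name(complete_name):
    """Split 'LAST, First Middle...' into (first, middle, last) using str.partition."""
    # partition(',') returns (text before, the delimiter, text after);
    # if there is no comma, the last two items are empty strings.
    last, _, rest = complete_name.partition(",")
    # Split the remainder on the first space only.
    first, _, middle = rest.strip().partition(" ")
    return first, middle.strip(), last.strip()


print(split_complete_name("CRUZ, Juan S. de la"))
```

Note that a value without a comma, such as 'THE COMPANY', ends up entirely in the last-name slot here, which is presumably why the answer filters on Type == 'I' before splitting.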