明月笑刀无情
This splits the Seatblocks by space and gives each block its own row (the session assumes `from pandas import Series`).

    In [43]: df
    Out[43]:
       CustNum     CustomerName  ItemQty Item                 Seatblocks  ItemExt
    0    32363  McCartney, Paul        3  F04               2:218:10:4,6       60
    1    31316     Lennon, John       25  F01  1:13:36:1,12 1:13:37:1,13      300

    In [44]: s = df['Seatblocks'].str.split(' ').apply(Series, 1).stack()

    In [45]: s.index = s.index.droplevel(-1) # to line up with df's index

    In [46]: s.name = 'Seatblocks' # needs a name to join

    In [47]: s
    Out[47]:
    0    2:218:10:4,6
    1    1:13:36:1,12
    1    1:13:37:1,13
    Name: Seatblocks, dtype: object

    In [48]: del df['Seatblocks']

    In [49]: df.join(s)
    Out[49]:
       CustNum     CustomerName  ItemQty Item  ItemExt    Seatblocks
    0    32363  McCartney, Paul        3  F04       60  2:218:10:4,6
    1    31316     Lennon, John       25  F01      300  1:13:36:1,12
    1    31316     Lennon, John       25  F01      300  1:13:37:1,13

Or, to give each colon-separated string its own column:

    In [50]: df.join(s.apply(lambda x: Series(x.split(':'))))
    Out[50]:
       CustNum     CustomerName  ItemQty Item  ItemExt  0    1   2     3
    0    32363  McCartney, Paul        3  F04       60  2  218  10   4,6
    1    31316     Lennon, John       25  F01      300  1   13  36  1,12
    1    31316     Lennon, John       25  F01      300  1   13  37  1,13

This is a little ugly, but maybe someone will chime in with a prettier solution.
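For readers who want to run the steps above as a script rather than an interactive session, here is a minimal sketch. It is not the exact code from the answer: it swaps `.apply(Series, 1)` for `str.split(' ', expand=True)`, and it adds an explicit `.dropna()` after `stack()` on the assumption that some pandas versions keep the padding values that `expand=True` produces for shorter rows. The variable names `blocks`, `result`, and `parts` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'CustNum': [32363, 31316],
    'CustomerName': ['McCartney, Paul', 'Lennon, John'],
    'ItemQty': [3, 25],
    'Item': ['F04', 'F01'],
    'Seatblocks': ['2:218:10:4,6', '1:13:36:1,12 1:13:37:1,13'],
    'ItemExt': [60, 300],
})

# One column per space-separated block, then stack the columns into rows.
# dropna() is an added safeguard (assumption): it discards the padding
# created for rows that contain fewer blocks than the widest row.
blocks = df['Seatblocks'].str.split(' ', expand=True).stack().dropna()

# Drop the inner index level so the index lines up with df's index.
blocks.index = blocks.index.droplevel(-1)
blocks.name = 'Seatblocks'  # a Series needs a name to be joined

# Replace the original column with one row per seat block.
result = df.drop(columns='Seatblocks').join(blocks)

# Optionally, give each colon-separated field its own column.
parts = blocks.str.split(':', expand=True)

print(result)
print(parts)
```

Because the join is on the index, every other column of a customer's row is repeated once per seat block.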
繁星点点滴滴
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'ItemQty': {0: 3, 1: 25},
                       'Seatblocks': {0: '2:218:10:4,6', 1: '1:13:36:1,12 1:13:37:1,13'},
                       'ItemExt': {0: 60, 1: 300},
                       'CustomerName': {0: 'McCartney, Paul', 1: 'Lennon, John'},
                       'CustNum': {0: 32363, 1: 31316},
                       'Item': {0: 'F04', 1: 'F01'}},
                      columns=['CustNum','CustomerName','ItemQty','Item','Seatblocks','ItemExt'])

    print (df)
       CustNum     CustomerName  ItemQty Item                 Seatblocks  ItemExt
    0    32363  McCartney, Paul        3  F04               2:218:10:4,6       60
    1    31316     Lennon, John       25  F01  1:13:36:1,12 1:13:37:1,13      300

Another similar solution, written as a method chain, uses reset_index and rename:

    print (df.drop('Seatblocks', axis=1)
                 .join
                 (
                 df.Seatblocks
                 .str
                 .split(expand=True)
                 .stack()
                 .reset_index(drop=True, level=1)
                 .rename('Seatblocks')
                 ))

       CustNum     CustomerName  ItemQty Item  ItemExt    Seatblocks
    0    32363  McCartney, Paul        3  F04       60  2:218:10:4,6
    1    31316     Lennon, John       25  F01      300  1:13:36:1,12
    1    31316     Lennon, John       25  F01      300  1:13:37:1,13

If the column contains no NaN values, the fastest solution is a list comprehension passed to the DataFrame constructor:

    df = pd.DataFrame(['a b c']*100000, columns=['col'])

    In [141]: %timeit (pd.DataFrame(dict(zip(range(3), [df['col'].apply(lambda x : x.split(' ')[i]) for i in range(3)]))))
    1 loop, best of 3: 211 ms per loop

    In [142]: %timeit (pd.DataFrame(df.col.str.split().tolist()))
    10 loops, best of 3: 87.8 ms per loop

    In [143]: %timeit (pd.DataFrame(list(df.col.str.split())))
    10 loops, best of 3: 86.1 ms per loop

    In [144]: %timeit (df.col.str.split(expand=True))
    10 loops, best of 3: 156 ms per loop

    In [145]: %timeit (pd.DataFrame([ x.split() for x in df['col'].tolist()]))
    10 loops, best of 3: 54.1 ms per loop

But if the column contains NaN, the only approach that works is str.split with the parameter expand=True, which returns a DataFrame (documentation). That extra handling explains why it is slower:

    df = pd.DataFrame(['a b c']*10, columns=['col'])
    df.loc[0] = np.nan
    print (df.head())
         col
    0    NaN
    1  a b c
    2  a b c
    3  a b c
    4  a b c

    print (df.col.str.split(expand=True))
         0     1     2
    0  NaN  None  None
    1    a     b     c
    2    a     b     c
    3    a     b     c
    4    a     b     c
    5    a     b     c
    6    a     b     c
    7    a     b     c
    8    a     b     c
    9    a     b     c
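To make the NaN caveat concrete, the sketch below runs the fastest variant (the list comprehension) and `str.split(expand=True)` on a small column with one missing value. It assumes the missing value is stored as a float NaN, as in the example above. The flag `comprehension_ok` and the name `expanded` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['a b c'] * 4})
df.loc[0, 'col'] = np.nan  # introduce one missing value

# The list comprehension calls .split() on every element, so it fails
# on the NaN entry, which is a float rather than a string.
try:
    pd.DataFrame([x.split() for x in df['col'].tolist()])
    comprehension_ok = True
except AttributeError:
    comprehension_ok = False

# str.split(expand=True) skips missing values and leaves that row empty.
expanded = df['col'].str.split(expand=True)

print(comprehension_ok)
print(expanded)
```

If the list comprehension's speed matters, one option is to drop or fill the missing rows first (for example with `dropna()`), at the cost of losing those rows from the result.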