How do I split a pandas Series on one of several alternatives?

I have a pandas DataFrame with a single string column. I want to separate the city name from the rest of each string.

Here is my MWE:

import numpy as np
import pandas as pd

data = """\
2930 Beverly Glen Circle Los Angeles
435 S. La Cienega Blvd. Los Angeles
12224 Ventura Blvd. Studio City
9570 Wilshire Blvd. Beverly Hills
26025 Pacific Coast Hwy. Malibu""".split('\n')

df = pd.DataFrame(data)
print(df)

cities = ['Los Angeles', 'Studio City', 'Beverly Hills','Malibu']

pat = '|'.join([r'(.*)\s({city})' for city in cities])
df = df[0].str.extract(pat,expand=True)
df


How can I get the following output?


                                      0  addr                      city
0  2930 Beverly Glen Circle Los Angeles  2930 Beverly Glen Circle  Los Angeles
1   435 S. La Cienega Blvd. Los Angeles  435 S. La Cienega Blvd.   Los Angeles
2       12224 Ventura Blvd. Studio City  12224 Ventura Blvd.       Studio City
3     9570 Wilshire Blvd. Beverly Hills  9570 Wilshire Blvd.       Beverly Hills
4       26025 Pacific Coast Hwy. Malibu  26025 Pacific Coast Hwy.  Malibu


MYYA
2 answers

慕雪6442864

You can try Series.str.split, splitting on the whitespace that immediately precedes one of the city names:

pat = '|'.join([rf'\s(?={city})' for city in cities])
df1 = df[0].str.split(pat, expand=True).rename(columns={0: 'addr', 1: 'city'})
df = pd.concat([df[0], df1], axis=1)

Alternatively, you can use Series.str.extract with named capture groups:

pat = r'(?P<addr>.*)?\s' + r'(?P<city>' + '|'.join(cities) + r')'
df = pd.concat([df[0], df[0].str.extract(pat, expand=True)], axis=1)

Result:

# print(df)
                                      0                      addr           city
0  2930 Beverly Glen Circle Los Angeles  2930 Beverly Glen Circle    Los Angeles
1   435 S. La Cienega Blvd. Los Angeles   435 S. La Cienega Blvd.    Los Angeles
2       12224 Ventura Blvd. Studio City       12224 Ventura Blvd.    Studio City
3     9570 Wilshire Blvd. Beverly Hills       9570 Wilshire Blvd.  Beverly Hills
4       26025 Pacific Coast Hwy. Malibu  26025 Pacific Coast Hwy.         Malibu
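
For reference, here is a self-contained sketch of the str.extract variant above. Two details are my own assumptions and not part of the answer: the city names are passed through re.escape in case a name ever contains a regex metacharacter, and the pattern ends with $ so that the city must be the last thing in the string. The variable names city_alt, parts and result are only illustrative.

```python
import re

import pandas as pd

data = [
    "2930 Beverly Glen Circle Los Angeles",
    "435 S. La Cienega Blvd. Los Angeles",
    "12224 Ventura Blvd. Studio City",
    "9570 Wilshire Blvd. Beverly Hills",
    "26025 Pacific Coast Hwy. Malibu",
]
cities = ['Los Angeles', 'Studio City', 'Beverly Hills', 'Malibu']

df = pd.DataFrame(data)

# Assumption: escape each name so it is matched literally
city_alt = '|'.join(re.escape(c) for c in cities)

# Named groups become the column names of the extracted frame;
# the trailing $ (also an assumption) requires the city at the end
pat = rf'(?P<addr>.*)\s(?P<city>{city_alt})$'

parts = df[0].str.extract(pat, expand=True)
result = pd.concat([df[0], parts], axis=1)
print(result)
```

Because the groups are named, no rename step is needed afterwards.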

梵蒂冈之花

You should move the alternatives into a single capture group:

import pandas as pd

data = """\
2930 Beverly Glen Circle Los Angeles
435 S. La Cienega Blvd. Los Angeles
12224 Ventura Blvd. Studio City
9570 Wilshire Blvd. Beverly Hills
26025 Pacific Coast Hwy. Malibu""".split('\n')

df = pd.DataFrame(data)
print(df)

cities = ['Los Angeles', 'Studio City', 'Beverly Hills','Malibu']
c = '|'.join(cities)
pat = fr'(.*?)\s({c})'  # fixed pattern with f and r
df = df[0].str.extract(pat,expand=True)
print(df)

Output:

                          0              1
0  2930 Beverly Glen Circle    Los Angeles
1   435 S. La Cienega Blvd.    Los Angeles
2       12224 Ventura Blvd.    Studio City
3       9570 Wilshire Blvd.  Beverly Hills
4  26025 Pacific Coast Hwy.         Malibu
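
To see why the pattern in the question does not behave as intended, the sketch below compares three variants using only the standard re module. It is an illustration, not part of either answer, and the names broken, per_city and single are made up for it. The question's pattern has no f prefix, so {city} is never interpolated. Adding the prefix alone still produces two capture groups per city, and str.extract returns one column per group, so the result would be eight mostly empty columns.

```python
import re

cities = ['Los Angeles', 'Studio City', 'Beverly Hills', 'Malibu']
sample = "12224 Ventura Blvd. Studio City"

# Pattern from the question: raw string only, "{city}" stays literal text
broken = '|'.join([r'(.*)\s({city})' for city in cities])

# With the f prefix the names are interpolated, but every
# alternative carries its own pair of capture groups
per_city = '|'.join([rf'(.*)\s({city})' for city in cities])

# One alternation inside a single group keeps the total at two groups
c = '|'.join(cities)
single = rf'(.*?)\s({c})'

print(broken)
print(re.compile(per_city).groups)        # number of capture groups
print(re.match(per_city, sample).groups())
print(re.match(single, sample).groups())
```

With per_city, only the pair of groups belonging to the matching alternative is filled and the rest are None, which is what shows up as NaN columns in pandas.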