Splitting a DataFrame column based on items in a list

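I have the table below and want to split each row into three columns: state, postcode and city. State and postcode are easy, but I can't extract the city. My idea was to split each string after the street synonym and before the state, but I seem to have gotten the loop wrong, because it only ever uses the last item in my list.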


Input data:


    Address Text
0   11 North Warren Circle Lisbon Falls ME 04252
1   227 Cony Street Augusta ME 04330
2   70 Buckner Drive Battle Creek MI
3   718 Perry Street Big Rapids MI
4   14857 Martinsville Road Van Buren MI
5   823 Woodlawn Ave Dallas TX 75208
6   2525 Washington Avenue Waco TX 76710
7   123 South Main St Dallas TX 75201

The output I am trying to get (for all rows; I only wrote out the first two to save time):


    City          State    Postcode
0   Lisbon Falls  ME       04252
1   Augusta       ME       04330

My code:


# Extract postcode and state
df["Zip"] = df["Address Text"].str.extract(r'(\d{5})', expand=True)
df["State"] = df["Address Text"].str.extract(r'([A-Z]{2})', expand=True)

# Split after these substrings
street_synonyms = ["Circle", "Street", "Drive", "Road", "Ave", "Avenue", "St"]

# This is where I got stuck
df["Syn"] = df["Address Text"].apply(lambda x: x.split(syn))
df
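The loop itself is not shown in the code above, so the following is only a guess at the pattern behind the "only the last item is used" symptom: if every iteration assigns to the same column, each pass overwrites the previous one, and only the split for the final synonym survives. A minimal sketch, using two sample rows and a shortened, hypothetical synonym list:

```python
import pandas as pd

# Two sample rows taken from the input data above.
df = pd.DataFrame({"Address Text": ["227 Cony Street Augusta ME 04330",
                                    "123 South Main St Dallas TX 75201"]})

# Shortened list for illustration only.
street_synonyms = ["Street", "St"]

# Assumed shape of the problematic loop: each pass overwrites df["Syn"],
# so after the loop the column only reflects the last synonym, "St".
for syn in street_synonyms:
    df["Syn"] = df["Address Text"].apply(lambda x: x.split(syn))

last_only = df["Syn"].tolist()
print(last_only)
```

Here the first row ends up split on "St" in the middle of "Street", because the result for "Street" was discarded when the loop moved on to the next item.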


动漫人物
1 Answer

30秒到达战场

Here is one approach:

import pandas as pd

# data
df = pd.DataFrame(
    ['11 North Warren Circle Lisbon Falls ME 04252',
     '227 Cony Street Augusta ME 04330',
     '70 Buckner Drive Battle Creek MI',
     '718 Perry Street Big Rapids MI',
     '14857 Martinsville Road Van Buren MI',
     '823 Woodlawn Ave Dallas TX 75208',
     '2525 Washington Avenue Waco TX 76710',
     '123 South Main St Dallas TX 75201'],
    columns=['Address Text'])

# Extract postcode and state
df["Zip"] = df["Address Text"].str.extract(r'(\d{5})', expand=True)
df["State"] = df["Address Text"].str.extract(r'([A-Z]{2})', expand=True)

# Split after these substrings
street_synonyms = ["Circle", "Street", "Drive", "Road", "Ave", "Avenue", "St"]

def find_city(address, state, street_synonyms):
    # Return on the first synonym found, instead of overwriting per iteration
    for syn in street_synonyms:
        if syn in address:
            # remove street
            city = address.split(syn)[-1]
            # remove State and postcode
            city = city.split(state)[0]
            return city

df['City'] = df.apply(lambda x: find_city(x['Address Text'], x['State'], street_synonyms), axis=1)

print(df[['City', 'State', 'Zip']])

"""
             City State    Zip
0   Lisbon Falls     ME  04252
1        Augusta     ME  04330
2   Battle Creek     MI    NaN
3     Big Rapids     MI    NaN
4      Van Buren     MI  14857
5         Dallas     TX  75208
6       nue Waco     TX  76710
7         Dallas     TX  75201
"""

Two rows in this output are off: row 4 reports the street number 14857 as the Zip, and row 6 gives "nue Waco" as the city.
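Both stray values in the output above (Zip 14857 in row 4, city "nue Waco" in row 6) come from unanchored matching: `\d{5}` takes the first five-digit run in the string, and "Ave" is found inside "Avenue". One possible refinement, offered as a sketch rather than as the answerer's method, is a single regular expression anchored at the end of the address and passed to `Series.str.extract` with named groups. The pattern and the `parts` variable are assumptions made for this illustration, and it has only been reasoned through against the eight sample rows:

```python
import re
import pandas as pd

df = pd.DataFrame({"Address Text": [
    "11 North Warren Circle Lisbon Falls ME 04252",
    "227 Cony Street Augusta ME 04330",
    "70 Buckner Drive Battle Creek MI",
    "718 Perry Street Big Rapids MI",
    "14857 Martinsville Road Van Buren MI",
    "823 Woodlawn Ave Dallas TX 75208",
    "2525 Washington Avenue Waco TX 76710",
    "123 South Main St Dallas TX 75201",
]})

street_synonyms = ["Circle", "Street", "Drive", "Road", "Ave", "Avenue", "St"]

# Whole-word alternation, so "Ave" cannot match inside "Avenue".
street = "|".join(re.escape(s) for s in street_synonyms)

pattern = (
    rf"\b(?:{street})\b\s+"      # street type as a whole word
    r"(?P<City>.+?)\s+"          # city: everything up to the state
    r"(?P<State>[A-Z]{2})"       # two-letter state code
    r"(?:\s+(?P<Zip>\d{5}))?$"   # optional 5-digit ZIP, only at the very end
)

# Named groups become the column names of the returned DataFrame.
parts = df["Address Text"].str.extract(pattern)
print(parts)
```

With this pattern, row 4 has no Zip (the address has none) and row 6 yields "Waco". Real-world addresses are messier than this sample, for example street names that themselves contain a street type, so the pattern would likely need adjusting for other data.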