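I have the table below and want to split each row into three columns: state, postcode, and city. State and postcode were easy, but I can't extract the city. My idea was to split each string after the street synonym and before the state, but I seem to have the loop wrong, because it only ever uses the last item in my list.
Input data: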
Address Text
0 11 North Warren Circle Lisbon Falls ME 04252
1 227 Cony Street Augusta ME 04330
2 70 Buckner Drive Battle Creek MI
3 718 Perry Street Big Rapids MI
4 14857 Martinsville Road Van Buren MI
5 823 Woodlawn Ave Dallas TX 75208
6 2525 Washington Avenue Waco TX 76710
7 123 South Main St Dallas TX 75201
The output I'm trying to achieve (for all rows, but I've only written the first two to save time):
City State Postcode
0 Lisbon Falls ME 04252
1 Augusta ME 04330
My code:
# Extract postcode and state (postcode anchored to the end so a 5-digit house number is not matched)
df["Zip"] = df["Address Text"].str.extract(r'(\d{5})$', expand = True)
df["State"] = df["Address Text"].str.extract(r'([A-Z]{2})', expand = True)
# Split after these substrings
street_synonyms = ["Circle", "Street", "Drive", "Road", "Ave", "Avenue", "St"]
# This is where I got stuck: each pass overwrites "Syn", so only the last synonym is used
for syn in street_synonyms:
    df["Syn"] = df["Address Text"].apply(lambda x: x.split(syn))
df
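One possible way to do this without a loop, as a rough sketch rather than a definitive answer: build a single regular expression that looks for any street synonym, then captures everything up to a two-letter state at the end of the string as the city. The names `STREET_SYNONYMS`, `ADDRESS_RE`, and `split_address` are made up for this illustration, and it assumes every address contains exactly one of the listed synonyms and that the city name itself does not start with one (a city like "St Louis" would need extra handling). Anchoring the postcode to the end of the string also keeps a five-digit house number such as 14857 from being taken as the postcode.

```python
import re

# Street synonyms from the question; this list is assumed to be complete.
STREET_SYNONYMS = ["Circle", "Street", "Drive", "Road", "Avenue", "Ave", "St"]

# synonym, then city (lazy), then 2-letter state, then an optional 5-digit zip at the end
ADDRESS_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, STREET_SYNONYMS)) + r")\b\s+"
    r"(?P<City>.+?)\s+"
    r"(?P<State>[A-Z]{2})"
    r"(?:\s+(?P<Zip>\d{5}))?\s*$"
)


def split_address(text):
    """Return a dict with City, State and Zip (None for anything not found)."""
    match = ADDRESS_RE.search(text)
    if match is None:
        return {"City": None, "State": None, "Zip": None}
    return match.groupdict()


print(split_address("11 North Warren Circle Lisbon Falls ME 04252"))
print(split_address("14857 Martinsville Road Van Buren MI"))
```

With pandas, the same pattern string should work directly, for example `df["Address Text"].str.extract(ADDRESS_RE.pattern)`, which returns one column per named group (`City`, `State`, `Zip`).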