Parsing JSON-formatted elements in a DataFrame

I have a DataFrame containing geoids that looks like this:

http://img3.mukewang.com/6435058b00013b1803860164.jpg

What I want to do is put every geoid number for each msaid into a list. Ideally, I would end up with a DataFrame that looks like this:

http://img1.mukewang.com/643505940001f48901290097.jpg

I hope this makes sense. Any help would be greatly appreciated.

Here are two examples:

159 [{"geoid":"02020000101"},{"geoid":"02020000204"},{"geoid":"02020000300"},{"geoid":"02020000400"},{"geoid":"02020000500"},{"geoid":"02020000600"},{"geoid":"02020000802"},{"geoid":"02020000901"},{"geoid":"02020000902"},{"geoid":"02020001000"},{"geoid":"02020001500"},{"geoid":"02020001601"},{"geoid":"02020001602"},{"geoid":"02020001701"},{"geoid":"02020001802"},{"geoid":"02020001900"},{"geoid":"02020002000"},{"geoid":"02020002100"},{"geoid":"02020002201"},{"geoid":"02020002400"},{"geoid":"02020002501"},{"geoid":"02020002502"},{"geoid":"02020002601"},{"geoid":"02020002712"},{"geoid":"02020002811"},{"geoid":"02020002812"},{"geoid":"02020002813"},{"geoid":"02122000100"},{"geoid":"02122000300"},{"geoid":"02170001300"},{"geoid":"02170000300"},{"geoid":"02170001100"},{"geoid":"02170000800"},{"geoid":"02261000300"},{"geoid":"02290000400"},{"geoid":"02240000400"},{"geoid":"02170000102"},{"geoid":"02170000402"},{"geoid":"02170000101"},{"geoid":"02170001201"},{"geoid":"02170001001"},{"geoid":"02170000706"},{"geoid":"02170001202"},{"geoid":"02170001004"},{"geoid":"02170000705"},{"geoid":"02170000603"},{"geoid":"02020000102"},{"geoid":"02020000201"},{"geoid":"02020000202"},{"geoid":"02020000203"},{"geoid":"02020000701"},{"geoid":"02020000702"},{"geoid":"02020000703"},{"geoid":"02020000801"},{"geoid":"02020001100"},{"geoid":"02020001200"},


Smart猫小萌
2 answers

翻过高山走不出你

I downloaded the file and saved it on my computer as a csv file. Then I ran the following code.

    import pandas as pd

    df = pd.read_csv('parse_this.csv')

    # remove the surrounding brackets and convert the string to a list
    df.tracts = df.tracts.apply(lambda x: x.strip('][').split(','))

    # explode the tracts series so each element gets its own row
    df = df.explode('tracts')

    # reset the index and rename the column
    df.reset_index(drop=True, inplace=True)
    df.rename(columns={"tracts": "geoid"}, inplace=True)

    # strip the extra characters to keep only the geoid number
    df.geoid = df.geoid.apply(lambda x: x.strip('geoid{}:""'))
    df
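
The string stripping above works as long as each cell has exactly the format shown in the question. Since the cells look like JSON, one alternative is to parse them with the standard `json` module instead. The following is only a minimal sketch: the two-row DataFrame is made-up sample data standing in for `parse_this.csv`, the column names `msaid` and `tracts` are taken from the question and the answer above, and it assumes every cell holds a complete, valid JSON array (the sample pasted in the question is cut off, so it would not parse as-is).

```python
import json

import pandas as pd

# Hypothetical sample data standing in for parse_this.csv
df = pd.DataFrame({
    "msaid": [159, 160],
    "tracts": [
        '[{"geoid":"02020000101"},{"geoid":"02020000204"}]',
        '[{"geoid":"02122000100"}]',
    ],
})

# Parse each JSON string and keep only the geoid values as a Python list
df["geoid"] = df["tracts"].apply(
    lambda s: [item["geoid"] for item in json.loads(s)]
)

# One row per (msaid, geoid) pair
long_df = df.drop(columns="tracts").explode("geoid").reset_index(drop=True)
print(long_df)
```

Parsing the JSON keeps the geoids as strings, so leading zeros are preserved, and it does not depend on which characters happen to appear in the key name.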

江户川乱折腾

I hope this example helps:

    import pandas as pd

    # creating a dataframe for example:
    d = [{'A': 3, 'B': [{'id': '001'}, {'id': '002'}]},
         {'A': 4, 'B': [{'id': '003'}, {'id': '004'}]},
         {'A': 5, 'B': [{'id': '005'}, {'id': '006'}]},
         {'A': 6, 'B': [{'id': '007'}, {'id': '008'}]}]
    df = pd.DataFrame(d)
    df

       A                               B
    0  3  [{'id': '001'}, {'id': '002'}]
    1  4  [{'id': '003'}, {'id': '004'}]
    2  5  [{'id': '005'}, {'id': '006'}]
    3  6  [{'id': '007'}, {'id': '008'}]

    # apply an explode to the column B and reset index
    df = df.explode('B')
    df.reset_index(drop=True, inplace=True)
    df

    # now it looks like this
       A              B
    0  3  {'id': '001'}
    1  3  {'id': '002'}
    2  4  {'id': '003'}
    3  4  {'id': '004'}
    4  5  {'id': '005'}
    5  5  {'id': '006'}
    6  6  {'id': '007'}
    7  6  {'id': '008'}

    # now we need to remove the extra text and rename the column from B to id
    df.B = df.B.apply(lambda x: x['id'])
    df.rename(columns={"B": "id"}, inplace=True)

    # this is the final product:
    df

       A   id
    0  3  001
    1  3  002
    2  4  003
    3  4  004
    4  5  005
    5  5  006
    6  6  007
    7  6  008
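
If the goal is one list of ids per key rather than one row per id (the question can be read either way), a rough sketch along the same lines is shown below. It reuses the made-up `A`/`B` example from this answer; the names `ids`, `long_df`, and `regrouped` are just illustrative.

```python
import pandas as pd

# Same kind of example data as above, shortened to two rows
d = [{'A': 3, 'B': [{'id': '001'}, {'id': '002'}]},
     {'A': 4, 'B': [{'id': '003'}, {'id': '004'}]}]
df = pd.DataFrame(d)

# Option 1: pull the ids out of each list of dicts directly, no explode needed
df["ids"] = df["B"].apply(lambda items: [item["id"] for item in items])

# Option 2: starting from the long (exploded) form, collect the ids back
# into one list per value of A
long_df = df[["A", "ids"]].explode("ids").rename(columns={"ids": "id"})
regrouped = long_df.groupby("A")["id"].agg(list).reset_index()
print(regrouped)
```

Option 1 is the shorter route when the data is still nested. Option 2 is useful if the DataFrame has already been exploded, as in the final product above.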