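My goal is to concatenate multiple pandas DataFrames into a single DataFrame, one per iteration. I am scraping a table from each page and creating a DataFrame from it. The commented code is below.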
import re

import pandas as pd
import requests
from lxml import html


def visit_table_links():
    links = grab_initial_links()
    df_final = None
    for obi in links:
        resp = requests.get(obi[1])
        tree = html.fromstring(resp.content)
        dflist = []
        for attr in tree.xpath('//th[contains(normalize-space(text()), "sometext")]/ancestor::table/tbody/tr'):
            population = attr.xpath('normalize-space(string(.//td[2]))')
            try:
                population = population.replace(',', '')
                population = int(population)
                year = attr.xpath('normalize-space(string(.//td[1]))')
                year = re.findall(r'\d+', year)
                year = ''.join(year)
                year = int(year)
                # append a row to the list: three values, the first two integers, the last a string
                dflist.append([year, population, obi[0]])
            except Exception as e:
                pass
        # create a DataFrame, which works fine
        df = pd.DataFrame(dflist, columns = ['Year', 'Population', 'Municipality'])
        # the first time df_final is None, so df simply becomes df_final
        # after that df_final is the previous DataFrame, so concat it with the new one
        if df_final != None:
            df_final = pd.concat(df_final, df)
        else:
            df_final = df

visit_table_links()
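For comparison, a common pattern is to collect the per-page DataFrames in a list and call pd.concat once after the loop, instead of growing df_final inside it. The sketch below is only an illustration: build_frame is a hypothetical stand-in for the scraping step, and the second page's rows are made up.

```python
import pandas as pd

COLUMNS = ['Year', 'Population', 'Municipality']


def build_frame(rows):
    # Hypothetical stand-in for the scraping step: rows is a list of
    # [year, population, municipality] lists.
    return pd.DataFrame(rows, columns=COLUMNS)


# Sample input; the 'Example Town' rows are invented for illustration.
pages = [
    [[1970, 10193, 'Cape Coral'], [1980, 32103, 'Cape Coral']],
    [[1970, 100, 'Example Town'], [1980, 200, 'Example Town']],
]

# One DataFrame per page, kept in a plain Python list.
frames = [build_frame(rows) for rows in pages]

# pd.concat takes a single list of DataFrames; ignore_index=True
# renumbers the rows 0..n-1 in the result.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Concatenating once at the end also avoids copying the accumulated rows on every iteration.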
These are the DataFrames coming in.
First DataFrame:
Year Population Municipality
0 1970 10193 Cape Coral
1 1980 32103 Cape Coral
2 1990 74991 Cape Coral
3 2000 102286 Cape Coral
4 2010 154305 Cape Coral
5 2018 189343 Cape Coral
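I have searched through many threads and exhausted my resources. I am new to pandas and do not understand why this is happening.
At first I thought it was caused by duplicate indexes, so I used uuid.uuid4().int as the index, and got the same error: df.set_index('ID', drop=True, inplace=True)
Any guidance would be very helpful, thanks.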
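As a quick check on the duplicate-index theory, here is a minimal sketch with two small frames (values taken from the table above). It suggests that repeated index labels alone do not make pd.concat fail along the row axis, so a unique ID column is probably not needed for this step.

```python
import pandas as pd

a = pd.DataFrame({'Year': [1970, 1980], 'Population': [10193, 32103]})
b = pd.DataFrame({'Year': [1990, 2000], 'Population': [74991, 102286]})

# Both frames carry the default index 0, 1; concat keeps those labels as they are.
stacked = pd.concat([a, b])
print(list(stacked.index))     # [0, 1, 0, 1]

# ignore_index=True builds a fresh 0..n-1 index instead.
renumbered = pd.concat([a, b], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]
```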
Edit 1:
Sorry for not making it clear that the error comes from
df_final = pd.concat(df_final, df)
when I try to concatenate the current DataFrame with the previous one.
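A note on the call shape: pd.concat expects its first argument to be one iterable of pandas objects. The sketch below uses two made-up one-row frames to show that passing two DataFrames positionally is rejected with a TypeError (the exact message differs between pandas versions), while the same frames wrapped in a list are accepted.

```python
import pandas as pd

a = pd.DataFrame({'Year': [1970], 'Population': [10193]})
b = pd.DataFrame({'Year': [1980], 'Population': [32103]})

try:
    pd.concat(a, b)  # two positional DataFrames instead of one list
    positional_error = None
except TypeError as exc:
    positional_error = exc

print(type(positional_error).__name__)

# The same two frames wrapped in a list are accepted.
both = pd.concat([a, b], ignore_index=True)
print(both)
```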
Edit 2:
Passing the arguments as a list
df_final = pd.concat([df_final, df])
still gives the same error.
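Since the list form still fails, the `if df_final != None:` line may be worth checking too. This is an assumption, because the traceback is not shown here: `!=` on a DataFrame compares element by element and returns another DataFrame, which `if` cannot reduce to a single True or False. A minimal sketch with made-up one-row frames, using an identity check instead:

```python
import pandas as pd

df_final = pd.DataFrame({'Year': [1970], 'Population': [10193]})
df = pd.DataFrame({'Year': [1980], 'Population': [32103]})

try:
    if df_final != None:  # elementwise comparison, the result is a DataFrame
        pass
    truth_error = None
except ValueError as exc:
    truth_error = exc

print(type(truth_error).__name__)

# "is not None" is an identity check and returns a plain bool.
if df_final is not None:
    df_final = pd.concat([df_final, df], ignore_index=True)
else:
    df_final = df

print(df_final)
```

On the first iteration df_final is still None, so the comparison is an ordinary one and the problem would only show up from the second page onward.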