Exporting web-scraped tables to Excel

I can't get Pandas to export some web-scraped data in the format I want.

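I want to visit each URL in URLs, grab various elements from that page, and put them into an Excel spreadsheet with the specified column names. Then I want to visit the next URL and put its data on the next row of the Excel sheet, so that I end up with a sheet of 6 columns and three rows of data, one row per plant (each plant is on a separate URL).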

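At the moment I get this error: ValueError: Length mismatch: Expected axis has 18 elements, new values have 6 elements. The new records are being placed side by side horizontally instead of on a new row in Excel, and Pandas isn't expecting that.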
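The mismatch can be reproduced without any network access. This is a minimal sketch, assuming the scraping loop yields 6 strings per page for 3 pages; the `field0` ... `field17` values are made-up placeholders, not real scraped data.

```python
import pandas as pd

# Hypothetical placeholder values: 3 pages x 6 fields, all appended
# to a single flat list, as the scraping loop does.
flat = ["field{}".format(i) for i in range(18)]

df = pd.DataFrame(flat)   # shape (18, 1): one column, 18 rows
transposed = df.T         # shape (1, 18): one row, 18 columns

columns = ['Status', 'Type', 'Capacity', 'Feedstock', 'Address1', 'Address2']
try:
    transposed.columns = columns   # 6 names offered for 18 columns
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(transposed.shape)
print(error_message)
```

Transposing a single 18-element column gives one row with 18 columns, so assigning 6 column names cannot succeed.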

Can anyone help? Thanks

import csv
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import numpy as np
from urllib2 import urlopen  # Python 2 module; on Python 3 this is urllib.request
import bs4
from bs4 import BeautifulSoup

URLs = ["http://adbioresources.org/map/ajax-single/27881",
        "http://adbioresources.org/map/ajax-single/27967",
        "http://adbioresources.org/map/ajax-single/27880"]

# Values from every page are appended to this one flat list
mylist = []

for plant in URLs:
    soup = BeautifulSoup(urlopen(plant), 'lxml')
    # Table cells
    table = soup.find_all('td')
    for td in table:
        mylist.append(td.text)
    # Headings
    heading2 = soup.find_all('h2')
    for h2 in heading2:
        mylist.append(h2.text)
    # Paragraphs
    para = soup.find_all('p')
    for p in para:
        mylist.append(p.text)

df = pd.DataFrame(mylist)
transposed_df = df.T
# This assignment raises the ValueError: 6 names for 18 columns
transposed_df.columns = ['Status', 'Type', 'Capacity', 'Feedstock', 'Address1', 'Address2']

writer = ExcelWriter('Pandas-Example.xlsx')
transposed_df.to_excel(writer, 'Sheet1', index=False)
writer.save()
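One possible way to get one row per plant is to collect a separate list for each page and pass the list of rows to the DataFrame constructor. This is only a sketch under stated assumptions: each page is assumed to yield exactly 6 strings (its `td`, `h2` and `p` elements, in that order), the helper names `extract_row`, `build_frame` and `scrape` are hypothetical, and the `SAMPLE` markup is made up so the demo runs offline. It uses Python 3's `urllib.request` and the standard-library `html.parser` in place of `urllib2` and `lxml`, and it strips whitespace from each value, which the original loop does not do.

```python
from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen

import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ['Status', 'Type', 'Capacity', 'Feedstock', 'Address1', 'Address2']


def extract_row(html):
    """Return the text of every <td>, then <h2>, then <p> found on one page."""
    soup = BeautifulSoup(html, 'html.parser')
    row = []
    for tag_name in ('td', 'h2', 'p'):
        row.extend(tag.get_text(strip=True) for tag in soup.find_all(tag_name))
    return row


def build_frame(rows, columns=COLUMNS):
    """Build a DataFrame with one row per page, checking the field count first."""
    for i, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError("page {} yielded {} fields, expected {}".format(
                i, len(row), len(columns)))
    return pd.DataFrame(rows, columns=columns)


def scrape(urls):
    """Fetch each URL and collect one row per page (needs network access)."""
    return build_frame([extract_row(urlopen(url).read()) for url in urls])


# Offline demo on made-up HTML (hypothetical markup, not the real pages).
SAMPLE = ("<table><tr><td>Operational</td><td>Farm</td><td>500</td>"
          "<td>Slurry</td></tr></table><h2>Line one</h2><p>Line two</p>")
demo = build_frame([extract_row(SAMPLE)] * 3)
print(demo.shape)
```

With real pages, `scrape(URLs).to_excel('Pandas-Example.xlsx', sheet_name='Sheet1', index=False)` would write the result, provided an Excel engine such as openpyxl is installed. If a page returns more or fewer than 6 elements, `build_frame` raises an error naming that page instead of silently misaligning the columns.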

呼唤远方



2 answers