
HTML table to a proper Excel sheet in Python

I'm new to Python and am struggling to get web-scraped data into a clean Excel sheet. This is the table I'm trying to scrape and reproduce in Python: HTML Table.


Here is what the HTML page looks like:


</div>

    <section id="first" style="display:none" aria-label="Power situation graph section">

        <div class="gridModule-2up">

            <div class="prognos_controls hidden" data-proggraph="1">

                Show data for:

                <button value="1" onclick="this.blur();" type="button" class="btn  btn--secondary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Yesterday</button>

                <button value="2" onclick="this.blur();" type="button" class="btn  btn--tertiary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Today</button>

                <button value="3" onclick="this.blur();" type="button" class="btn  btn--secondary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Tomorrow</button>

            </div>

            <table summary="Consumption" id="prognos_datatable_total" class="prognos_datatable scrollable">

                <thead>

                    <tr>

                                <th data-sheets-numberformat="[null,1]"></th>

                                <th data-sheets-value="[null,2,'17/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-17</th>

                                <th data-sheets-numberformat="[null,1]"></th>

                                <th data-sheets-value="[null,2,'18/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-18</th>

                                <th data-sheets-numberformat="[null,1]"></th>

                                <th data-sheets-value="[null,2,'19/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-19</th>


                    </tr>


慕容森
2 answers

呼如林

The problem is with the escape characters. Calling strip() on each cell's text removes the leading and trailing whitespace, including newlines and tabs.

from bs4 import BeautifulSoup

with open("sample.html", "r") as f:
    contents = f.read()
    soup = BeautifulSoup(contents, 'lxml')
    extract = soup.find("table")
    # added strip() to remove leading and trailing characters
    table = [[item.text.strip() for item in row_data.select("th,td")]
             for row_data in extract.select("tr")]
    for item in table:
        print(' '.join(item))
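The code above only prints the rows. Since `table` is a list of lists, one way to get it into something Excel can open is the standard-library `csv` module. This is a minimal sketch, not part of the original answer: the `write_rows` helper and the `table.csv` file name are made up for illustration, and the one-row `table` here is just the header row from the question's HTML standing in for the real scraped list.

```python
import csv

# Stand-in for the `table` list built by the BeautifulSoup code above.
# This single row is the header row from the question's HTML.
table = [["", "2020-02-17", "", "2020-02-18", "", "2020-02-19"]]


def write_rows(rows, path):
    """Write a list of rows (each a list of strings) to a CSV file."""
    # newline="" lets the csv module control line endings itself;
    # utf-8-sig adds a BOM, which helps Excel detect the encoding.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)


# "table.csv" is a hypothetical output name.
write_rows(table, "table.csv")
```

Empty header cells stay as empty columns, so the layout of the original table is preserved when the file is opened in Excel.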

烙印99

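Try going with pandas here. Under the hood it parses the HTML with lxml or BeautifulSoup. I couldn't test this on your URL because you didn't provide it.

import pandas as pd

url = 'myURLlink'
df = pd.read_html(url)[1]
df.to_csv('file.csv', index=False)
print(df.to_string())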
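Picking the table with `[1]` depends on its position on the page. As a hedged alternative, `pd.read_html` accepts an `attrs` filter, so the table could be selected by the `id` shown in the question instead. The sketch below is untested against the real page: the HTML string is a simplified stand-in, and the "Hour" header and all body values are invented for illustration. The `read_table_by_id` helper is also hypothetical.

```python
import io

import pandas as pd

# Hypothetical, simplified stand-in for the question's page: the "Hour"
# header and every value in the body rows are made up for illustration.
html = """
<table id="other"><tr><td>ignored</td></tr></table>
<table id="prognos_datatable_total">
  <thead>
    <tr><th>Hour</th><th>2020-02-17</th><th>2020-02-18</th></tr>
  </thead>
  <tbody>
    <tr><td>00:00</td><td>100</td><td>110</td></tr>
    <tr><td>01:00</td><td>95</td><td>105</td></tr>
  </tbody>
</table>
"""


def read_table_by_id(html_text, table_id):
    """Return the first table whose id attribute equals table_id."""
    # attrs filters on the <table> element's attributes, so the result
    # does not depend on where the table sits on the page.
    tables = pd.read_html(io.StringIO(html_text), attrs={"id": table_id})
    return tables[0]


df = read_table_by_id(html, "prognos_datatable_total")
print(df.to_string(index=False))

# Writing .xlsx needs an Excel engine such as openpyxl to be installed:
# df.to_excel("file.xlsx", index=False)
```

For a real `.xlsx` file rather than a CSV, the commented-out `to_excel` call is the usual route. The `file.xlsx` name is a placeholder.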