How do I extract HTML tables from a file using pandas?

    import pandas as pd

    # list to collect the DataFrames from every table in every file
    df_list = list()

    # list of files to load
    list_of_files = ['test.html']

    # iterate through the files
    for file in list_of_files:

        # create a list of DataFrames from the tables in the file
        # (match='Game Name' keeps only tables containing that text)
        dfl = pd.read_html(file, match='Game Name')

        # fix the headers and columns
        for d in dfl:
            # use row 1 as the column headers
            d.columns = d.iloc[1]
            # use row 0, column 0 as the platform
            d['platform'] = d.iloc[0, 0]
            # keep rows 2 and below as the data; rows 0 and 1 were the headers
            d = d.iloc[2:]
            # append the cleaned DataFrame to df_list
            df_list.append(d.copy())

    # combine everything into a single DataFrame
    df = pd.concat(df_list).reset_index(drop=True)

    # create a list of dicts from df
    records = df.to_dict('records')
    print(records)

    [out]:
    [{'Game Name': 'GoW', 'Price': '49.99', 'platform': 'PS4'},
     {'Game Name': 'FF VII R', 'Price': '59.99', 'platform': 'PS4'},
     {'Game Name': 'Gears 5', 'Price': '49.99', 'platform': 'XBX'},
     {'Game Name': 'Forza 5', 'Price': '59.99', 'platform': 'XBX'}]
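Because `test.html` itself isn't shown, here is a minimal sketch of only the header/platform clean-up step, run on DataFrames built by hand instead of parsed from HTML. It assumes each raw table has the layout the loop implies: the platform name in row 0 (assumed here to be repeated across both columns, as for a header cell spanning the table), the real headers in row 1, and data from row 2 on. `clean_tables` is a hypothetical helper name, and the sample rows are copied from the printed output.

```python
import pandas as pd


def clean_tables(raw_tables):
    """Hypothetical helper: apply the answer's header/platform fix-up.

    Assumes each raw table has the platform in row 0, column 0,
    the real column headers in row 1, and data rows from row 2 on.
    """
    cleaned = []
    for d in raw_tables:
        d = d.copy()                  # avoid mutating the caller's frame
        d.columns = d.iloc[1]         # row 1 holds the real headers
        d["platform"] = d.iloc[0, 0]  # row 0, column 0 holds the platform
        d = d.iloc[2:]                # keep only the data rows
        cleaned.append(d)
    df = pd.concat(cleaned).reset_index(drop=True)
    df.columns.name = None            # drop the label left over from iloc[1]
    return df


# Raw frames shaped like the assumed read_html output (integer column labels).
ps4 = pd.DataFrame([["PS4", "PS4"], ["Game Name", "Price"],
                    ["GoW", "49.99"], ["FF VII R", "59.99"]])
xbx = pd.DataFrame([["XBX", "XBX"], ["Game Name", "Price"],
                    ["Gears 5", "49.99"], ["Forza 5", "59.99"]])

df = clean_tables([ps4, xbx])
records = df.to_dict("records")
print(records)
```

Building the frames directly keeps the sketch independent of `lxml` or `html5lib`, one of which `pd.read_html` needs to actually parse a file. With real input you would pass the list returned by `pd.read_html(file, match='Game Name')` to the same function.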

