How can I load a data text file that contains many comment lines into pandas?

The issue is that the file starts with 77 lines of comment text for 'Global Average Temperature Anomaly with Sea Ice Temperature Inferred from Air Temperatures', two of which are header rows. A block of data follows, then two more header rows, and then a new set of data for 'Global Average Temperature Anomaly with Sea Ice Temperature Inferred from Water Temperatures'.

This solution splits the two tables in the file into separate dataframes. It is not as nice as the other answer, but the data is correctly separated into different dataframes. The headers were a pain; it would probably be easier to create a custom header by hand and skip the lines of code that separate the headers from the text. The important point is that the air-inferred and water-inferred data are kept apart.

```python
import math

import pandas as pd
import requests

# read the file with requests
url = 'http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt'
response = requests.get(url)
data = response.text

# convert data into a list of lines, dropping the leading '% ' comment marker
data = [d.strip().replace('% ', '') for d in data.split('\n')]

# specify the data from the ranges in the file (the line offsets are hard-coded)
air_header1 = data[74].split()  # first header row; used below to prefix the column names
air_header2 = [v.strip() for v in data[75].split(',')]
# combine the 2 parts of the header into a single header
air_header = air_header2[:2] + [f'{air_header1[math.floor(i/2)]}_{v}' for i, v in enumerate(air_header2[2:])]
air_data = [v.split() for v in data[77:2125]]

h2o_header1 = data[2129].split()  # first header row; used below to prefix the column names
h2o_header2 = [v.strip() for v in data[2130].split(',')]
# combine the 2 parts of the header into a single header
h2o_header = h2o_header2[:2] + [f'{h2o_header1[math.floor(i/2)]}_{v}' for i, v in enumerate(h2o_header2[2:])]
h2o_data = [v.split() for v in data[2132:4180]]

# create the dataframes (every value is still a string at this point)
air = pd.DataFrame(air_data, columns=air_header)
h2o = pd.DataFrame(h2o_data, columns=h2o_header)
```

Without the header code

Using a manually created header list simplifies the code.

```python
import pandas as pd
import requests

# read the file with requests
url = 'http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt'
response = requests.get(url)
data = response.text

# convert data into a list of lines, dropping the leading '% ' comment marker
data = [d.strip().replace('% ', '') for d in data.split('\n')]

# manually created header
headers = ['Year', 'Month', 'Monthly_Anomaly', 'Monthly_Unc.',
           'Annual_Anomaly', 'Annual_Unc.',
           'Five-year_Anomaly', 'Five-year_Unc.',
           'Ten-year_Anomaly', 'Ten-year_Unc.',
           'Twenty-year_Anomaly', 'Twenty-year_Unc.']

# separate the air and h2o data (the line offsets are hard-coded)
air_data = [v.split() for v in data[77:2125]]
h2o_data = [v.split() for v in data[2132:4180]]

# create the dataframes (every value is still a string at this point)
air = pd.DataFrame(air_data, columns=headers)
h2o = pd.DataFrame(h2o_data, columns=headers)
```

air

```
   Year Month Monthly_Anomaly Monthly_Unc. Annual_Anomaly Annual_Unc. Five-year_Anomaly Five-year_Unc. Ten-year_Anomaly Ten-year_Unc. Twenty-year_Anomaly Twenty-year_Unc.
0  1850     1          -0.777        0.412            NaN         NaN               NaN            NaN              NaN           NaN                 NaN              NaN
1  1850     2          -0.239        0.458            NaN         NaN               NaN            NaN              NaN           NaN                 NaN              NaN
2  1850     3          -0.426        0.447            NaN         NaN               NaN            NaN              NaN           NaN                 NaN              NaN
```

h2o

```
   Year Month Monthly_Anomaly Monthly_Unc. Annual_Anomaly Annual_Unc. Five-year_Anomaly Five-year_Unc. Ten-year_Anomaly Ten-year_Unc. Twenty-year_Anomaly Twenty-year_Unc.
0  1850     1          -0.724        0.370            NaN         NaN               NaN            NaN              NaN           NaN                 NaN              NaN
1  1850     2          -0.221        0.430            NaN         NaN               NaN            NaN              NaN           NaN                 NaN              NaN
2  1850     3          -0.443        0.419            NaN         NaN               NaN            NaN              NaN           NaN                 NaN              NaN
```
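For comparison, here is a minimal sketch of a different approach from the one in this answer: letting `pd.read_csv` drop the comment lines itself through its `comment` parameter, then splitting the combined rows into two tables. The inline `sample` text is a small made-up stand-in for the real file, the four-column `headers` list is shortened for illustration, and detecting the table boundary where the year/month sequence goes backwards is an assumption about the file layout, not something the answer states.

```python
import io

import pandas as pd

# Small stand-in for the real file: '%' comment lines around two tables.
# Illustrative only; the real file has more columns and far more rows.
sample = """\
% Global Average Temperature Anomaly (air)
% Year, Month, Anomaly, Unc.
1850     1    -0.777    0.412
1850     2    -0.239    0.458
%
% Global Average Temperature Anomaly (water)
% Year, Month, Anomaly, Unc.
1850     1    -0.724    0.370
1850     2    -0.221    0.430
"""

# Shortened, hand-written header list (assumption for this sketch).
headers = ['Year', 'Month', 'Monthly_Anomaly', 'Monthly_Unc.']

# comment='%' skips the comment lines; sep=r'\s+' splits on runs of whitespace.
df = pd.read_csv(io.StringIO(sample), comment='%', sep=r'\s+',
                 header=None, names=headers)

# Assumption: a new table begins wherever (Year, Month) moves backwards.
period = df['Year'] * 12 + df['Month']
table_id = (period.diff() < 0).cumsum()
air, h2o = [t.reset_index(drop=True) for _, t in df.groupby(table_id)]
```

Because `read_csv` parses the fields, the columns come back with numeric dtypes rather than as strings, and no hard-coded line offsets are needed.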
