千巷猫影
This script will get all the information from the table, including the links:

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    url = 'http://fcf.cat/calendari/1920/futbol-11/infantil-primera-divisio/grup-1'
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')

    out = []
    for tr in soup.select('.calendaritable tbody tr'):
        # Text of every cell in the row, split on a '|' separator
        t = tr.get_text(strip=True, separator='|').split('|')
        # Link to the match report, if the row has one
        a = tr.select_one('.calendaritablehover a')
        # The nearest preceding header cells hold the matchday name and its date
        jornada = tr.find_previous('th', {'colspan': '4'}).get_text(strip=True)
        jornada_date = tr.find_previous('th', {'colspan': '3'}).get_text(strip=True)
        if len(t) == 4:  # the row has a result: team, goals, goals, team
            out.append({'Team 1': t[0], 'Team 2': t[3], 'Goals 1': t[1], 'Goals 2': t[2],
                        'Jornada': jornada, 'Jornada date': jornada_date,
                        'Link': a['href'] if a else ''})
        else:  # no result yet: leave the goals empty
            out.append({'Team 1': t[0], 'Team 2': t[2], 'Goals 1': '', 'Goals 2': '',
                        'Jornada': jornada, 'Jornada date': jornada_date,
                        'Link': a['href'] if a else ''})

    df = pd.DataFrame(out)
    print(df)
    df.to_csv('data.csv')

Prints:

                                 Team 1                    Team 2  Goals 1  Goals 2     Jornada  Jornada date                                               Link
    0                 CATALONIA, U.B.,A            EUROPA, C.E.,B        0        1   Jornada 1    06-10-2019  http://fcf.cat/acta/1920/futbol-11/infantil-pr...
    1         FUNDACIO P. CE. JUPITER,A  ESCOLA PIA SARRIÀ S.E.,A        1        0   Jornada 1    06-10-2019  http://fcf.cat/acta/1920/futbol-11/infantil-pr...
    2             Pª BARC. CINC COPES,A         MONTAÑESA, C.F.,A        1        0   Jornada 1    06-10-2019  http://fcf.cat/acta/1920/futbol-11/infantil-pr...
    3            Pª BARC. BARCINO, CE,A        Pª BARC. ANGUERA,B        0        3   Jornada 1    06-10-2019  http://fcf.cat/acta/1920/futbol-11/infantil-pr...
    4              DIAGONAL CLUB ESP.,A          SISTRELLS C.F.,A        8        2   Jornada 1    06-10-2019  http://fcf.cat/acta/1920/futbol-11/infantil-pr...
    ..                              ...                       ...      ...      ...         ...           ...                                                ...
    235            DIAGONAL CLUB ESP.,A        Pª BARC. ANGUERA,B                    Jornada 30    31-05-2020
    236               MONTAÑESA, C.F.,A    Pª BARC. BARCINO, CE,A                    Jornada 30    31-05-2020
    237           Pª BARC. CINC COPES,A  ESCOLA PIA SARRIÀ S.E.,A                    Jornada 30    31-05-2020
    238       FUNDACIO P. CE. JUPITER,A            EUROPA, C.E.,B                    Jornada 30    31-05-2020
    239  L'HOSPITALET, CENTRE ESPORTS,B         CATALONIA, U.B.,A                    Jornada 30    31-05-2020

    [240 rows x 7 columns]
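The two branches of the script differ only in which cells hold the team names and whether goals are present, so that logic can be pulled into one small helper and checked without touching the network. This is a minimal sketch, not part of the original answer: `row_to_record` is a hypothetical name, and the `'-'` middle cell in the second call is a made-up placeholder, since the script only tells us that rows without a result have the second team at index 2.

```python
def row_to_record(cells, jornada, jornada_date, link=''):
    """Map the text cells of one table row to a record dict.

    Hypothetical helper: `cells` is the list produced by
    tr.get_text(strip=True, separator='|').split('|') in the script.
    """
    if len(cells) == 4:
        # Played match: team, goals, goals, team
        team1, goals1, goals2, team2 = cells
    else:
        # No result yet: the middle cell is ignored, goals stay empty
        team1, team2 = cells[0], cells[2]
        goals1 = goals2 = ''
    return {
        'Team 1': team1, 'Team 2': team2,
        'Goals 1': goals1, 'Goals 2': goals2,
        'Jornada': jornada, 'Jornada date': jornada_date,
        'Link': link,
    }


# A played match, using values from the first row of the printed table
played = row_to_record(
    ['CATALONIA, U.B.,A', '0', '1', 'EUROPA, C.E.,B'],
    'Jornada 1', '06-10-2019')

# A fixture without a result; '-' is an assumed placeholder cell
pending = row_to_record(
    ['DIAGONAL CLUB ESP.,A', '-', 'Pª BARC. ANGUERA,B'],
    'Jornada 30', '31-05-2020')
```

Inside the loop, the two `out.append(...)` calls could then become a single `out.append(row_to_record(t, jornada, jornada_date, a['href'] if a else ''))`.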