How can I get the same result as a regular expression using BeautifulSoup?

I am trying to extract all values of the data-src-mp3 attribute (they are links) from content1, which is generated from url.


Each link is contained in an element such as <a class="hwd_sound sound audio_play_button icon-volume-up ptr" title="Pronunciation for " data-src-mp3="https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3" data-lang="en_GB"></a>.


One way is to use the regular expression 'data-src-mp3="(.*?)"':


import re

import requests
from bs4 import BeautifulSoup

session = requests.Session()
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
url = 'https://www.collinsdictionary.com/dictionary/english-french/graduate'
r = session.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')

content1 = soup.select_one('.cB.cB-def.dictionary.biling').contents
output = re.findall('data-src-mp3="(.*?)"', str(content1))

print(output)

The result is


['https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0071410.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/fr_bachelier.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/63854.mp3']

I would like to ask how to get the same result with BeautifulSoup, using the structure <a class="hwd_sound sound audio_play_button icon-volume-up ptr" title="Pronunciation for " data-src-mp3="https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3" data-lang="en_GB"></a>, and without a loop.


Thank you so much!


慕尼黑5688855
1 Answer

BIG阳

You can combine selectors when using .select:

mp3s = [tag.attrs['data-src-mp3'] for tag in soup.select('.cB.cB-def.dictionary.biling [data-src-mp3]')]

or

mp3s = list(map(lambda tag: tag.attrs['data-src-mp3'],
                soup.select('.cB.cB-def.dictionary.biling [data-src-mp3]')))

[data-src-mp3] selects only the elements that have a data-src-mp3 attribute (with any value).

A small change keeps 'data-src-mp3' in a single place:

mp3_tag = 'data-src-mp3'
mp3s = list(map(lambda tag: tag.attrs[mp3_tag],
                soup.select('.cB.cB-def.dictionary.biling [{}]'.format(mp3_tag))))

This solution may look scarier at first glance, but it is far better than relying on the wrong tool (for example, regular expressions for parsing HTML).
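To try the combined selector without hitting the network, here is a minimal sketch that runs against an inline HTML string. The markup, the example.com URLs, and the helper name mp3_links are all made up for illustration; only the container classes and the data-src-mp3 attribute come from the question.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the dictionary page (not the real markup).
HTML = """
<div class="cB cB-def dictionary biling">
  <a class="hwd_sound sound" data-src-mp3="https://example.com/sounds/a.mp3" data-lang="en_GB"></a>
  <span>graduate</span>
  <a class="hwd_sound sound" data-src-mp3="https://example.com/sounds/b.mp3" data-lang="fr"></a>
</div>
<a data-src-mp3="https://example.com/sounds/outside.mp3"></a>
"""

soup = BeautifulSoup(HTML, "html.parser")


def mp3_links(soup, container=".cB.cB-def.dictionary.biling", attr="data-src-mp3"):
    """Return the value of `attr` for every descendant of `container` that has it."""
    # "<container> [attr]" is a descendant selector: any element inside the
    # container that carries the attribute, whatever its value.
    selector = f"{container} [{attr}]"
    return [tag[attr] for tag in soup.select(selector)]


mp3s = mp3_links(soup)

# Regex approach from the question, applied to the same container for comparison.
container = soup.select_one(".cB.cB-def.dictionary.biling")
regex_result = re.findall(r'data-src-mp3="(.*?)"', str(container))

print(mp3s)
print(mp3s == regex_result)
```

The link outside the container is skipped because the attribute selector is scoped by the container classes. On this sample both approaches return the same list, but the selector does not depend on how the attribute happens to be quoted in the serialized HTML.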