I am trying to extract all values of the data-src-mp3 attribute (they are URLs) from content1.
Each link is contained in a tag like <a class="hwd_sound sound audio_play_button icon-volume-up ptr" title="Pronunciation for " data-src-mp3="https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3" data-lang="en_GB"></a>.
One way is to use the regular expression 'data-src-mp3="(.*?)"':
import re

import requests
from bs4 import BeautifulSoup

session = requests.Session()
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
url = 'https://www.collinsdictionary.com/dictionary/english-french/graduate'
r = session.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
# Child nodes of the dictionary entry block
content1 = soup.select_one('.cB.cB-def.dictionary.biling').contents
# Search the serialized HTML for every data-src-mp3 value
output = re.findall(r'data-src-mp3="(.*?)"', str(content1))
print(output)
The result is
['https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0071410.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/fr_bachelier.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/63854.mp3']
I would like to ask how to use BeautifulSoup and the structure <a class="hwd_sound sound audio_play_button icon-volume-up ptr" title="Pronunciation for " data-src-mp3="https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3" data-lang="en_GB"></a> to get the same result without a loop.
Thank you so much!
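For reference, here is a minimal sketch of one possible BeautifulSoup-only approach. It uses the CSS attribute selector a[data-src-mp3], so only <a> tags that actually carry the attribute are matched. The sample markup is an assumption made for illustration: the wrapper <div>, the second link's data-lang value, and the plain link without audio are not copied from the real page. A list comprehension is still an iteration, but it avoids an explicit for block and the regular expression.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the <a> tag quoted in the question.
# Assumptions: the wrapper <div>, the "fr_FR" value, and the last link
# are illustrative only and not taken from the real page.
html = '''
<div class="cB cB-def dictionary biling">
  <a class="hwd_sound sound audio_play_button icon-volume-up ptr" title="Pronunciation for " data-src-mp3="https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3" data-lang="en_GB"></a>
  <a class="hwd_sound sound audio_play_button icon-volume-up ptr" title="Pronunciation for " data-src-mp3="https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0037420.mp3" data-lang="fr_FR"></a>
  <a href="/dictionary/english-french/graduate">no audio here</a>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# Keep the Tag itself (not .contents) so that .select() is available.
content1 = soup.select_one('.cB.cB-def.dictionary.biling')

# Match only <a> tags that have a data-src-mp3 attribute, then read it.
links = [a['data-src-mp3'] for a in content1.select('a[data-src-mp3]')]
print(links)
```

With the live page, the same two lines would be applied to the soup built from r.content instead of the sample string; whether the page still uses these class names is not something this sketch can confirm.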
BIG阳