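I am trying to extract job descriptions that match certain keywords from a web page, and that part works. However, I also want to extract the link that corresponds to each description found in the HTML. The problem is that the link appears before the description's keywords in the markup, and the URL itself does not contain any of the keywords being searched for. How can I extract the link that belongs to each job description matched by keyword?
Here is my code: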
import re, requests, time, os, csv, subprocess
from bs4 import BeautifulSoup

def get_jobs(url):
    keywords = ["KI", "AI", "Big Data", "Data", "data", "big data", "Analytics", "analytics", "digitalisierung", "ML",
                "Machine Learning", "Daten", "Datenexperte", "Datensicherheitsexperte"]
    headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'}
    html = requests.get(url, headers=headers, timeout=5)
    time.sleep(2)
    soup = BeautifulSoup(html.text, 'html.parser')
    jobs = soup.find_all('p',text=re.compile(r'\b(?:%s)\b' % '|'.join(keywords)))
    # links = jobs.find_all('a')
    jobs_found = []
    for word in jobs:
        jobs_found.append(word)
    with open("jobs.csv", 'a', encoding='utf-8') as toWrite:
        writer = csv.writer(toWrite)
        writer.writerows(jobs_found)
    # subprocess.call('./Autopilot3.py')
    print("Matched Jobs have been collected.")

get_jobs('https://www.auftrag.at//tenders.aspx')
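
One possible direction, since the link comes before the description in the markup: start from each matched `<p>` and walk backwards to the nearest `<a>` with BeautifulSoup's `find_previous`. The sketch below is only an illustration under assumptions. The `SAMPLE_HTML` markup, the `find_jobs_with_links` helper and the `tender.aspx?id=...` paths are hypothetical, because the real page structure is not shown in the question, so the navigation step may need to be adapted (for example to `find_parent` or `find_previous_sibling`).

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: each link precedes its description, as described above.
SAMPLE_HTML = """
<div class="tender">
  <a href="/tender.aspx?id=1">Details</a>
  <p>Big Data platform for a city administration</p>
</div>
<div class="tender">
  <a href="/tender.aspx?id=2">Details</a>
  <p>Road construction works</p>
</div>
<div class="tender">
  <a href="/tender.aspx?id=3">Details</a>
  <p>Machine Learning consulting services</p>
</div>
"""


def find_jobs_with_links(html, base_url, keywords):
    """Return (description, absolute_url) pairs for keyword-matched <p> tags."""
    soup = BeautifulSoup(html, "html.parser")
    # Escape the keywords so they are treated as literal text in the regex.
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, keywords)))
    results = []
    for p in soup.find_all("p", string=pattern):
        # Nearest <a href=...> that occurs before this <p> in document order.
        link = p.find_previous("a", href=True)
        href = urljoin(base_url, link["href"]) if link else ""
        results.append((p.get_text(strip=True), href))
    return results


rows = find_jobs_with_links(
    SAMPLE_HTML, "https://www.auftrag.at/", ["Big Data", "Machine Learning"]
)
for description, url in rows:
    print(description, url)
```

Each pair can then be written as one CSV row with `writer.writerow([description, url])`. Note that `find_previous` returns the closest preceding link anywhere in the document, so if a description has no link of its own, it would pick up the link of an earlier entry. Restricting the search to the enclosing container may be safer on the real page.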