Improving code - web scraping job offers - job title, employer, salary, and link needed

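I wrote a web-scraping script that scans every page of a job portal and, inside a function, reports the job offers that meet a salary requirement. The fields that matter to me are the job title, employer, salary, and link. I currently use the getText() method, but it picks up all of the elements. The output looks like this: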


Zubný lekár/lekárka DENTAL CARE Dr. Rosa, s. r. o.Námestie sv. Františka, Karlova Ves



        Od 4 500 EUR/mesiac

    


Pridané Pred 4 dňami  Pridať k vybraným   

https://www.profesia.sk/praca/dental-care-dr-rosa/O3863429

https://www.profesia.sk/praca/dental-care-dr-rosa/O3863429



Head of Core Technology DevelopmentESET, spol. s r.o.Bratislava



        4 500 EUR/mesiac

    


Pridané pred 2 týždňami  Pridať k vybraným   

https://www.profesia.sk/praca/eset/C22141

https://www.profesia.sk/praca/eset/O3933805

https://www.profesia.sk/praca/eset/O3933805

It picks up two unnecessary items and duplicates the links (because each offer has 2 to 3 links in its <a href> tags). Is there a better approach?
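One possible approach, sketched under assumptions: instead of calling getText() on the whole row, select each field by its class and collect the href values separately, dropping repeats while keeping their order. Only the class names (list-row, title, employer, label) come from the question. The SAMPLE_HTML markup and the helper names text_of and parse_rows are hypothetical, and the real page structure may differ.

```python
import bs4

# Hypothetical markup for illustration; only the class names are from the question.
SAMPLE_HTML = """
<ul>
  <li class="list-row">
    <a href="/praca/eset/C22141">logo</a>
    <a href="/praca/eset/O3933805"><span class="title">Head of Core Technology Development</span></a>
    <span class="employer">ESET, spol. s r.o.</span>
    <span class="label">4 500 EUR/mesiac</span>
    <a href="/praca/eset/O3933805">Detail</a>
  </li>
</ul>
"""


def text_of(row, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = row.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_rows(html):
    """Return one dict per .list-row with only the fields of interest."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    offers = []
    for row in soup.select(".list-row"):
        hrefs = [a["href"] for a in row.select("a[href]")]
        offers.append({
            "title": text_of(row, ".title"),
            "employer": text_of(row, ".employer"),
            "salary": text_of(row, ".label"),
            # dict.fromkeys drops duplicate links but preserves their order
            "links": list(dict.fromkeys(hrefs)),
        })
    return offers


for offer in parse_rows(SAMPLE_HTML):
    print(offer)
```

Returning None for a missing field keeps one incomplete row from raising an exception in the middle of a page. The "Pridané ..." and "Pridať k vybraným" text is never selected, so it does not appear in the result.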


def search4job(salary):
    import bs4, requests, re

    # Classes -> employer: class='employer'
    #         -> salary ".label"
    #         -> Job Title class='title'
    #         -> TODO: link
    base_url = 'https://www.profesia.sk/praca/bratislava/plny-uvazok/?languages=73&page_num={}'
    page = 1  # to start from page 1
    request = requests.get(base_url.format(page))  # to take complete url
    HTML = bs4.BeautifulSoup(request.text, 'lxml')
    pattern = r'(\d\s\d\d\d)'  # salary pattern

    while len(HTML.select(".list-row")) > 0:
        # in pages without job offers the len of list-row is 0; iterates until there are no job offers
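A note on the salary pattern, with a small sketch: r'(\d\s\d\d\d)' matches exactly one digit before the space, so a value such as "12 000" would be read as "2 000". The version below is one way to turn the salary text into an integer and compare it with the requested minimum. The names SALARY_RE, parse_salary, and meets_salary are hypothetical, and it assumes thousands are separated by a space or a non-breaking space, as in the sample output.

```python
import re

# Digits, optionally followed by groups of three digits that are separated
# by a space or a non-breaking space: "900", "4 500", "12 000", "4500".
SALARY_RE = re.compile(r"\d+(?:[ \u00a0]\d{3})*")


def parse_salary(text):
    """Return the first amount found in text as an int, or None if there is none."""
    match = SALARY_RE.search(text)
    if match is None:
        return None
    # Remove the separators before converting to int.
    return int(re.sub(r"\D", "", match.group()))


def meets_salary(text, minimum):
    """True if text contains an amount that is at least minimum."""
    amount = parse_salary(text)
    return amount is not None and amount >= minimum


print(parse_salary("Od 4 500 EUR/mesiac"))
print(meets_salary("900 EUR/mesiac", 4000))
```

For a range such as "1 200 - 1 500 EUR" this sketch only looks at the first amount, which may or may not be the comparison you want.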

       

繁星淼淼