Scrapy: How do I get an address using CSS and XPath?

How can I get the address below using CSS and XPath? I tried this CSS selector: response.css('.office-address::text').extract()


<span class="office-address" itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">
        <span class="address-line1">5835 Post Rd.</span>
        <span class="address-line2">Suite 217</span>
    </span>
    <span class="city-state-zip">
        <span itemprop="addressLocality">East Greenwich</span>, <span itemprop="addressRegion">RI</span> <span itemprop="postalCode">02818</span>
    </span>
</span>


蛊毒传说
117 views, 4 answers

慕无忌1623718

A CSS selector option with Scrapy:

    address = response.css(
        "span.address-line1::text, span.address-line2::text, "
        "span[itemprop=addressLocality]::text, span[itemprop=addressRegion]::text, "
        "span[itemprop=postalCode]::text"
    ).extract()  # should return a list
    if address:
        address = ", ".
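The snippet above is cut off after `", ".`. Here is a minimal sketch of one plausible completion, assuming the intent was to join the extracted strings into one line. `join_address` is a hypothetical helper name, and the `parts` list is the shape the selector would be expected to return for the HTML in the question, not captured output.

```python
def join_address(parts, sep=", "):
    """Join extracted text fragments, skipping empty or whitespace-only ones."""
    cleaned = [p.strip() for p in parts if p and p.strip()]
    return sep.join(cleaned)


# Assumed result of the grouped CSS selector for the HTML in the question.
parts = ["5835 Post Rd.", "Suite 217", "East Greenwich", "RI", "02818"]

address = join_address(parts)
print(address)
```

Filtering out blank fragments before joining keeps the result free of doubled separators if one of the spans (for example address-line2) is missing on another page.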

jeck猫

A quick-and-dirty solution using a single XPath expression:

    concat(//span[@class='address-line1']/text(),' ',//span[@class='address-line2']/text(),' ',//span[@itemprop='addressLocality']/text(),', ',//span[@itemprop='addressRegion']/text(),//span[@itemprop='postalCode']/text())

Output:

    "5835 Post Rd. Suite 217 East Greenwich, RI02818"
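The concat() expression has no separator between its last two arguments, which is why the region and postal code run together as "RI02818". A sketch of an alternative that looks up each field separately and does the formatting in Python follows. It uses the standard library's xml.etree.ElementTree as a stand-in for a Scrapy response so it can run on its own; ElementTree supports only a subset of XPath, and `field` and `format_address` are hypothetical helper names. In a spider, the same paths could be passed to response.xpath(...).extract_first().

```python
import xml.etree.ElementTree as ET

# The HTML from the question; it happens to be well-formed XML.
HTML = """<span class="office-address" itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">
        <span class="address-line1">5835 Post Rd.</span>
        <span class="address-line2">Suite 217</span>
    </span>
    <span class="city-state-zip">
        <span itemprop="addressLocality">East Greenwich</span>, <span itemprop="addressRegion">RI</span> <span itemprop="postalCode">02818</span>
    </span>
</span>"""


def field(root, attr, value):
    """Return the stripped text of the first descendant <span> with attr == value, or ''."""
    node = root.find(f".//span[@{attr}='{value}']")
    return (node.text or "").strip() if node is not None else ""


def format_address(root):
    """Build 'street, city, region zip', skipping any missing piece."""
    line1 = field(root, "class", "address-line1")
    line2 = field(root, "class", "address-line2")
    city = field(root, "itemprop", "addressLocality")
    region = field(root, "itemprop", "addressRegion")
    zip_code = field(root, "itemprop", "postalCode")

    street = " ".join(p for p in (line1, line2) if p)
    region_zip = " ".join(p for p in (region, zip_code) if p)
    return ", ".join(p for p in (street, city, region_zip) if p)


root = ET.fromstring(HTML)
print(format_address(root))
```

Keeping the separators in Python rather than inside one XPath string makes it easier to cope with pages where a field such as address-line2 is absent.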

PIPIONE

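Try this: response.css('.office-address ::text').extract() (note the space added before ::text). With the space, ::text applies to every descendant of .office-address instead of only the element's own direct text nodes.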
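Because the selector with a space also matches the text nodes of every descendant, the returned list includes whitespace-only strings from the indentation between tags. A small sketch of cleaning that up follows. To stay runnable without Scrapy, it collects the text nodes with the standard library's ElementTree.itertext() as a stand-in for the selector; `collapse_text` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The HTML from the question; it happens to be well-formed XML.
HTML = """<span class="office-address" itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">
        <span class="address-line1">5835 Post Rd.</span>
        <span class="address-line2">Suite 217</span>
    </span>
    <span class="city-state-zip">
        <span itemprop="addressLocality">East Greenwich</span>, <span itemprop="addressRegion">RI</span> <span itemprop="postalCode">02818</span>
    </span>
</span>"""


def collapse_text(fragments):
    """Concatenate text fragments and collapse each whitespace run to one space."""
    return " ".join("".join(fragments).split())


# Stand-in for: response.css('.office-address ::text').extract()
fragments = list(ET.fromstring(HTML).itertext())

print(collapse_text(fragments))
```

Concatenating with an empty string before collapsing preserves the punctuation that sits between the spans (the ", " after the city), which would otherwise pick up a stray space.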

郎朗坤

Here is a more future-proof idea, since ids and classes may change over time:

    from re import sub
    from bs4 import BeautifulSoup as bs

    teststr = """<span class="office-address" itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress">
        <span itemprop="streetAddress">
            <span class="address-line1">5835 Post Rd.</span>
            <span class="address-line2">Suite 217</span>
        </span>
        <span class="city-state-zip">
            <span itemprop="addressLocality">East Greenwich</span>, <span itemprop="addressRegion">RI</span> <span itemprop="postalCode">02818</span>
        </span>
    </span>"""

    # Take all text in the fragment, ignoring tag names, ids, and classes
    r = bs(teststr, "lxml").getText().strip()
    # Turn line breaks into separators, then collapse runs of commas and spaces
    r = sub(r"\n", ", ", r)
    r = sub(r"[, ]{2,}", ", ", r)
    print(r)

Result:

    5835 Post Rd., Suite 217, East Greenwich, RI 02818