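I took the suggestions and got past the initial errors, so thank you all very much :) I'm almost where I want to be. It seems I still have a huge knowledge gap when it comes to indentation. You really are the gems of the coding community.
Here is the current code. It gets past those errors and is now down to a warning, but it is not extracting anything.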
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://dc.urbanturf.com/pipeline'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
pipeline_items = soup.find_all('div', attrs={'class': 'pipeline-item'})
rows = []
columns = ['Listing Title', 'Listing url', 'listing image url', 'location', 'Project type', 'Status', 'Size']
for item in pipeline_items:
    # title, image url, listing url
    listing_title = item.a['title']
    listing_url = item.a['href']
    listing_image_url = item.a.img['src']
    for p_tag in item.find_all('p'):
        if not p_tag.h2:
            if p_tag.text == 'Location:':
                p_tag.span.extract()
                property_location = p_tag.text.strip()
            elif p_tag.span.text == 'Project type:':
                p_tag.span.extract()
                property_type = p_tag.text.strip()
            elif p_tag.span.text == 'Status:':
                p_tag.span.extract()
                property_status = p_tag.text.strip()
            elif p_tag.span.text == 'Size:':
                p_tag.span.extract()
                property_size = p_tag.text.strip()
    row = [listing_title, listing_url, listing_image_url, property_location, property_type, property_status, property_size]
    rows.append(row)
df = pd.Dataframe(rows, columns=columns)
df.to_excel('DC Pipeline Properties.xlsx', index=False)
print('File Saved')
The error I get is below. I am using PyCharm 2020.2; maybe that was a poor choice?
row = [listing_title, listing_url, listing_image_url, property_location, property_type, property_status, property_size]
NameError: name 'property_location' is not defined
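A likely cause, for what it is worth: `NameError` is raised by Python itself, so PyCharm is probably not to blame. In the posted code the first branch tests `p_tag.text == 'Location:'`, which compares the whole paragraph text (label plus value), while the other branches test `p_tag.span.text`. That first test would then never match, so `property_location` is never assigned before `row` is built. Separately, `pd.Dataframe` would need to be spelled `pd.DataFrame`. The sketch below shows one way to avoid the unset name by giving every field a default and matching on the `<span>` label. The sample markup, the `LABELS` mapping, and the `parse_item` helper are all hypothetical; the real page structure is assumed, not verified.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the real page structure is an assumption.
SAMPLE_HTML = """
<div class="pipeline-item">
  <a title="Example Project" href="/pipeline/1"><img src="/img/1.jpg"></a>
  <p><span>Location:</span> 123 Example St NW</p>
  <p><span>Project type:</span> Rental apartments</p>
  <p><span>Status:</span> Under construction</p>
</div>
"""

# Map each <span> label to the field it fills (illustrative names).
LABELS = {
    'Location:': 'location',
    'Project type:': 'type',
    'Status:': 'status',
    'Size:': 'size',
}


def parse_item(item):
    """Return the labelled fields of one listing, defaulting to ''."""
    # Defaults guarantee every field exists even if no <p> matches,
    # and stop values from leaking over from the previous listing.
    fields = {key: '' for key in LABELS.values()}
    for p_tag in item.find_all('p'):
        span = p_tag.span
        if span is None:
            continue  # paragraph without a label, nothing to read
        key = LABELS.get(span.get_text(strip=True))
        if key is None:
            continue  # label we do not care about
        span.extract()  # drop the label so only the value remains
        fields[key] = p_tag.get_text(strip=True)
    return fields


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
rows = [parse_item(item) for item in soup.find_all('div', class_='pipeline-item')]
print(rows)
```

In this sample the listing has no "Size:" paragraph, so that field stays an empty string instead of raising `NameError`. The resulting list of dicts could be passed to `pd.DataFrame(rows)` in place of the hand-built lists.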