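I'm trying to scrape reviews from a website and store them in a CSV file using Python (3.7) and BeautifulSoup. The scraping seems to work, but when I write to the file, only one column contains the full data; the other columns contain only the first character.
Any tips would be greatly appreciated, and sorry if this is obvious. This is a new hobby :)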
```python
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

#URL to scrape
my_url = "https://www.indeed.com/cmp/Capital-One/reviews?fcountry=ALL&lang="

#open connection, grab page
uClient = uReq(my_url)
page_html = uClient.read()

#html parsing
page_soup = soup(page_html, "lxml")

#grab all reviews on page
containers = page_soup.findAll("div", {"class": "cmp-review-container"})
uClient.close()

#write to csv
filename = "indeedreviewtest.csv"
f = open(filename, "w", encoding="utf-8")
headers = "review_id,review_score,role,review_text\n"
f.write(headers)

#loop through each review, collect review ID, rating, role & verbatim
for container in containers:
    reviewid_container = container.div["data-tn-entityid"]
    reviewid = reviewid_container[0]
    score_container = container.div.div.div.meta["content"]
    reviewscore = score_container[0]
    role_container = container.find("span", attrs={"class": "cmp-reviewer-job-title"}).text
    reviewerrole = role_container[0]
    reviewtext_container = container.find("span", attrs={"class": "cmp-review-text"}).text
    reviewtext = reviewtext_container
    f.write(reviewid + "," + reviewscore + "," + reviewerrole.replace(",", "|") + "," + reviewtext.replace(",", "|") + "\n")

f.close()
```
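The symptom matches plain Python string indexing: `tag["data-tn-entityid"]`, `meta["content"]` and `.text` each already return the whole value as a `str`, so `reviewid_container[0]` keeps only its first character (for example `"review-123"[0]` is `"r"`). Only `reviewtext` skips the `[0]`, which is consistent with that being the one complete column. Below is a minimal sketch of the same loop without the indexing. `SAMPLE_HTML` and `parse_reviews` are hypothetical names, and the markup is a simplified assumption that only mirrors the selectors used in the question; the real page may be structured differently. It uses the built-in `"html.parser"` instead of `"lxml"` so it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup mirroring the selectors in the question.
# The real review page structure is an assumption here.
SAMPLE_HTML = """
<div class="cmp-review-container">
  <div data-tn-entityid="review-123">
    <div><div><meta content="4.0"></div></div>
    <span class="cmp-reviewer-job-title">Software Engineer</span>
    <span class="cmp-review-text">Good pay, long hours.</span>
  </div>
</div>
"""


def parse_reviews(html):
    """Return one (review_id, score, role, text) tuple per review container."""
    page = BeautifulSoup(html, "html.parser")
    rows = []
    for container in page.find_all("div", class_="cmp-review-container"):
        # tag["attr"] returns the full attribute value as a str;
        # appending [0] here would keep only its first character.
        review_id = container.div["data-tn-entityid"]
        score = container.div.div.div.meta["content"]
        # get_text(strip=True) returns the span's full text, trimmed.
        role = container.find("span", class_="cmp-reviewer-job-title").get_text(strip=True)
        text = container.find("span", class_="cmp-review-text").get_text(strip=True)
        rows.append((review_id, score, role, text))
    return rows


rows = parse_reviews(SAMPLE_HTML)
print(rows)
```

On the sample markup this should yield a single tuple with every field intact, including the comma inside the review text.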
Thanks!
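Separately, the standard-library `csv` module can take care of the quoting that the manual `replace(",", "|")` works around: `csv.writer` wraps any field containing a comma, quote or newline in quotes. A minimal sketch, assuming each row is a `(review_id, score, role, text)` tuple; `write_reviews_csv` is a hypothetical helper name, and an in-memory `io.StringIO` buffer stands in for the output file.

```python
import csv
import io


def write_reviews_csv(rows, out):
    """Write a header line plus one line per (review_id, score, role, text) tuple."""
    writer = csv.writer(out)
    writer.writerow(["review_id", "review_score", "role", "review_text"])
    # QUOTE_MINIMAL (the default) quotes only fields that need it,
    # so commas inside the review text survive without replacement.
    writer.writerows(rows)


buffer = io.StringIO()
write_reviews_csv([("review-123", "4.0", "Software Engineer", "Good pay, long hours.")], buffer)
print(buffer.getvalue())
```

To write a real file instead, open it with `open("indeedreviewtest.csv", "w", newline="", encoding="utf-8")`; the `csv` documentation recommends `newline=""` so the writer controls line endings itself.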