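I recently tried to scrape the quotes from http://quotes.toscrape.com/ (first page only) and save them to a CSV file. I got a very strange result: only commas are used as separators. See the screenshot and code below: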
from bs4 import BeautifulSoup
from urllib.request import urlopen
import csv

csvfile = open('quotes.csv', 'w')
writer = csv.writer(csvfile)
writer.writerow(('text'))


def parse():
    html = urlopen('http://quotes.toscrape.com/page/1/')
    bs = BeautifulSoup(html, 'lxml')
    quotes = bs.findAll('div', class_='quote')
    for quote in quotes:
        try:
            text = quote.find('span', class_='text').getText(
            ).replace(',', '|').replace('"', '')
            print(text)
            writer.writerow((text))
        except UnicodeEncodeError:
            break


parse()
csvfile.close()
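
A likely cause, assuming the strange output is a comma between every character: `(text)` is only a parenthesized string, not a tuple, and `writer.writerow` iterates over whatever it receives, so a string is written one character per field. The header call `writer.writerow(('text'))` would be affected the same way. Below is a minimal sketch that reproduces this with an in-memory buffer instead of the real file; the `sample` string and the `write_row` helper are made up for illustration and are not part of the original script.

```python
import csv
import io

sample = "abc"  # hypothetical stand-in for one scraped quote


def write_row(row):
    """Write a single row to an in-memory CSV and return the raw text."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer).writerow(row)
    return buffer.getvalue()


# (sample) is still a plain str, so each character becomes its own field.
as_string = write_row((sample))

# A trailing comma makes a real 1-tuple, so the whole string is one field.
as_tuple = write_row((sample,))

# A one-element list behaves the same as the 1-tuple.
as_list = write_row([sample])

print(repr(as_string))  # 'a,b,c\r\n'
print(repr(as_tuple))   # 'abc\r\n'
```

If that is what is happening, changing the calls to `writer.writerow((text,))` or `writer.writerow([text])` should give one quote per row. The `csv` module quotes fields that contain commas or double quotes on its own, so the `.replace(',', '|').replace('"', '')` step is probably unnecessary, and the `csv` documentation recommends opening the file with `newline=''`.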
jeck猫