Scrapy JSON output file has an unnecessary square bracket added


Scrapy is producing a malformed JSON file. When I try to use that file with

import json

I get this error:

json.decoder.JSONDecodeError: Expecting ',' delimiter: line 311 column 94 (char 28466)

This happens because an unnecessary square bracket has been added at the start of the JSON file.

The JSON file looks like this:

[[{"city": "New York", "state": "New York", "rank": "1\n", "population": ["8,622,698\n"]},
{"city": "Los Angeles", "state": "California", "rank": "2\n", "population": ["3,999,759\n"]}]
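To make the failure easy to reproduce without running a crawl, here is a minimal sketch. The `broken` string is a hypothetical two-item sample modelled on the file above (shortened, not the real output), with the same extra `[` at the start. It only shows that the extra bracket by itself is enough to trigger this kind of error, and that removing it yields a valid list.

```python
import json

# Hypothetical two-item sample modelled on the file in the question,
# with the same extra "[" at the start.
broken = (
    '[[{"city": "New York", "rank": "1\\n"},\n'
    '{"city": "Los Angeles", "rank": "2\\n"}]'
)

error_message = None
try:
    json.loads(broken)
except json.JSONDecodeError as exc:
    # The outer array is never closed, so the parser gives up at the end.
    error_message = str(exc)
    print("broken:", error_message)

# Dropping the first character leaves a single, balanced array.
fixed = json.loads(broken[1:])
print(type(fixed), len(fixed))
```

The reported line and column will differ from the ones in the question because the sample is much shorter than the real file.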

I am running the crawl with this command:

scrapy crawl wiki -o items.json

When I remove the square bracket by hand, it works fine. This is the other Python script:

import json

with open("items1.json", "r") as read_file:
    data = json.load(read_file)

print(type(data))
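When a load like this fails, it can help to see exactly which file was opened and what text surrounds the reported position. The helper below is a sketch, not part of the original script: `load_items` and its `context` parameter are hypothetical names, and it relies only on the `msg`, `pos`, `lineno` and `colno` attributes of `json.JSONDecodeError` from the standard library.

```python
import json


def load_items(path, context=40):
    """Load a JSON file; on a decode error, print the text around it."""
    with open(path, "r", encoding="utf-8") as read_file:
        text = read_file.read()
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos is the character offset where parsing stopped.
        start = max(exc.pos - context, 0)
        snippet = text[start:exc.pos + context]
        print(f"{path}: {exc.msg} at line {exc.lineno}, column {exc.colno}")
        print(f"near: {snippet!r}")
        return None
```

Calling `load_items("items1.json")` and `load_items("items.json")` in turn would show which of the two files is actually malformed.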

Edit

The spider in question:

# -*- coding: utf-8 -*-
import scrapy


class WikiSpider(scrapy.Spider):
    name = "wiki"
    allowed_domains = ["en.wikipedia.org"]
    start_urls = ('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population')

    def parse(self, response):
        table = response.xpath('//table')[4]
        trs = table.xpath('.//tr')[1:]
        for tr in trs:
            rank = tr.xpath('.//td[1]/text()').extract_first()
            city = tr.xpath('.//td[2]//text()').extract_first()
            state = tr.xpath('.//td[3]//text()').extract()[1]
            population = tr.xpath('.//td[4]//text()').extract()
            yield {
                'rank': rank,
                'city': city,
                'state': state,
                'population': population
            }

Asked by 一只萌萌小番薯


Answer from 慕标5832272

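There is definitely something unwanted in your JSON, but I ran your code and it worked as expected. Are you sure you didn't mix up items1.json and items.json? Both are mentioned in your question. Apart from that, I noticed that the Wikipedia URL is wrong, but I assume that is just a typo.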
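The answer does not say what exactly is wrong with the URL. One thing that can be checked independently, and that may or may not be what was meant, is how `start_urls` is written in the spider: parentheses without a trailing comma do not create a tuple in Python, so the value there is a plain string rather than a sequence of URLs. The sketch below uses plain Python only (no Scrapy import) and the variable names are illustrative.

```python
# Parentheses alone do not create a tuple; the trailing comma does.
url = "https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population"

as_written = (url)   # still a str, as in the spider above
as_tuple = (url,)    # one-element tuple
as_list = [url]      # one-element list

print(type(as_written).__name__, type(as_tuple).__name__, type(as_list).__name__)
```

Writing `start_urls` as a list avoids the ambiguity. Whether this has any bearing on the extra bracket in the output file is not established by the question or the answer.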
