Python iterparse is skipping values


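I am using iterparse to parse a large XML file (1.8 GB) and write all of its data to a CSV file. The script I wrote runs fine, but for some reason it randomly skips a few rows. Here is my script: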

import xml.etree.cElementTree as ET
import csv

xml_data_to_csv = open('Out2.csv', 'w', newline='', encoding='utf8')
Csv_writer = csv.writer(xml_data_to_csv, delimiter=';')

file_path = "Products_50_producten.xml"
context = ET.iterparse(file_path, events=("start", "end"))

EcommerceProductGuid = ""
ProductNumber = ""
Description = ""
ShopSalesPriceInc = ""
Barcode = ""
AvailabilityStatus = ""
Brand = ""

# turn it into an iterator
#context = iter(context)

product_tag = False
for event, elem in context:
    tag = elem.tag

    if event == 'start':
        if tag == "Product":
            product_tag = True
        elif tag == 'EcommerceProductGuid':
            EcommerceProductGuid = elem.text
        elif tag == 'ProductNumber':
            ProductNumber = elem.text
        elif tag == 'Description':
            Description = elem.text
        elif tag == 'SalesPriceInc':
            ShopSalesPriceInc = elem.text
        elif tag == 'Barcode':
            Barcode = elem.text
        elif tag == 'AvailabilityStatus':
            AvailabilityStatus = elem.text
        elif tag == 'Brand':
            Brand = elem.text

    if event == 'end' and tag == 'Product':
        product_tag = False

        List_nodes = []
        List_nodes.append(EcommerceProductGuid)
        List_nodes.append(ProductNumber)
        List_nodes.append(Description)
        List_nodes.append(ShopSalesPriceInc)
        List_nodes.append(Barcode)
        List_nodes.append(AvailabilityStatus)
        List_nodes.append(Brand)

        Csv_writer.writerow(List_nodes)
        print(EcommerceProductGuid)

        List_nodes.clear()
        EcommerceProductGuid = ""
        ProductNumber = ""
        Description = ""
        ShopSalesPriceInc = ""
        Barcode = ""
        AvailabilityStatus = ""
        Brand = ""

    elem.clear()

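For example, if I copy "Product" 300 times, the 'EcommerceProductGuid' value on line 155 of the CSV file is left empty. If I copy Product 400 times, it leaves an empty value on lines 155, 310 and 368. How is this possible?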

MM们


2 answers

狐的传说

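For what it's worth, and for anyone who may be searching: the answer above also applies to the lxml library's iterparse(). I ran into a similar problem with lxml, decided to try the same approach, and it works almost exactly the same way. When you read XML data on the start event, the element's text will occasionally not have been picked up yet by the time the event fires. Reading the item on the end event instead seems to have solved my problem with large XML files. The check Daniel Haley added for whether the text is present looks like it adds another layer of protection.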
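The behavior described above can be sketched with the standard library alone. The first function feeds one element to `ET.XMLPullParser` in two chunks and records `elem.text` at each event. The text is typically still unset when the `start` event is read and complete only at `end`. The second function reads every field on `end` events and keeps a value only if text is present. The names `FIELDS`, `text_at_start_and_end`, `iter_product_rows` and the sample XML are hypothetical, and the real file's layout is an assumption, so treat this as a minimal sketch rather than a drop-in replacement for the script in the question.

```python
import io
import xml.etree.ElementTree as ET

# Tag names taken from the question's script; the real schema is assumed.
FIELDS = ("EcommerceProductGuid", "ProductNumber", "Description",
          "SalesPriceInc", "Barcode", "AvailabilityStatus", "Brand")


def text_at_start_and_end():
    """Feed one element in two chunks and record elem.text per event."""
    parser = ET.XMLPullParser(events=("start", "end"))
    seen = {}
    # The split mimics a read-buffer boundary falling inside an element.
    for chunk in ("<Product><Barcode>12", "34</Barcode></Product>"):
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if elem.tag == "Barcode":
                seen[event] = elem.text
    return seen


def iter_product_rows(source):
    """Yield one list of field values per <Product>, read on 'end' events."""
    values = dict.fromkeys(FIELDS, "")
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in values:
            # Extra guard: only overwrite the default when text is present.
            if elem.text:
                values[elem.tag] = elem.text
        elif elem.tag == "Product":
            yield [values[name] for name in FIELDS]
            values = dict.fromkeys(FIELDS, "")
            elem.clear()  # release the finished product's children


# Hypothetical sample data standing in for the large products file.
sample = (b"<Products>"
          b"<Product><EcommerceProductGuid>g1</EcommerceProductGuid>"
          b"<ProductNumber>7</ProductNumber><Brand>Acme</Brand></Product>"
          b"<Product><Barcode>99</Barcode></Product>"
          b"</Products>")

seen = text_at_start_and_end()
rows = list(iter_product_rows(io.BytesIO(sample)))
print(seen)
print(rows)
```

To write a CSV, each yielded row can be passed directly to `csv.writer(...).writerow(row)`. Clearing only the finished `Product` element keeps memory use bounded without discarding children that have not been read yet.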
