Problem traversing an XML tree with Python's xml.etree.ElementTree

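I have an XML file structured as shown below (simplified for the purposes of this question). For each record, I want to extract the article title and the DOI, which is held in the "ArticleId" element whose "IdType" attribute is "doi" (sometimes it is missing), and then store the article title in a dictionary with the DOI as the key.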

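<PubmedArticle>
    <MedlineCitation>
        <Article>
            <ArticleTitle>Malathion and dithane induce DNA damage in Vicia faba.</ArticleTitle>
        </Article>
    </MedlineCitation>
    <PubmedData>
        <ArticleIdList>
            <ArticleId IdType="doi">10.1177/0748233717726877</ArticleId>
        </ArticleIdList>
    </PubmedData>
</PubmedArticle>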

To do this, I used xml.etree.ElementTree as follows:

    import xml.etree.ElementTree as ET

    xmldoc = ET.parse('sample.xml')
    root = xmldoc.getroot()

    pubs = {}

    # Outer loop: every ArticleTitle in the whole document
    for elem in xmldoc.iter(tag='ArticleTitle'):
        title = elem.text
        # Inner loop: every ArticleId in the whole document
        for subelem in xmldoc.iter(tag='ArticleId'):
            if subelem.get("IdType") == "doi":
                doi = subelem.text
                pubs[doi] = title

    if len(pubs) == 0:
        print("No articles found")
    else:
        for pub in pubs.keys():
            print(pub + ' ' + pubs[pub])

But something is wrong with the loops that traverse the document tree, because the code above produces:

10.1177/0748233717726877 [Influence of Four Kinds of PPCPs on Micronucleus Rate of the Root-Tip Cells of Vicia-faba and Garlic].

10.1016/j.crvi.2015.02.001 [Influence of Four Kinds of PPCPs on Micronucleus Rate of the Root-Tip Cells of Vicia-faba and Garlic].

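That is, I get the correct DOIs, but each one is paired with a copy of the last article's title, and that article has no DOI!

The correct output should be: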

10.1177/0748233717726877 Malathion and dithane induce DNA damage in Vicia faba.

10.1016/j.crvi.2015.02.001 Impact of dual inoculation with Rhizobium and PGPR on growth and antioxidant status of Vicia faba L. under copper stress.

Can anyone give me some hints for solving this annoying problem?
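One possible approach, offered as a minimal sketch rather than a definitive fix: loop over each record first, then look up the title and the DOI inside that record only, so that a title can never be paired with another article's DOI. The element nesting is inferred from the sample record in the question. The root name PubmedArticleSet, the helper name titles_by_doi, and the placeholder non-DOI ArticleId are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down sample. Titles and DOIs come from the question;
# the root element name and the non-DOI ArticleId are assumptions.
SAMPLE = """\
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation><Article>
      <ArticleTitle>Malathion and dithane induce DNA damage in Vicia faba.</ArticleTitle>
    </Article></MedlineCitation>
    <PubmedData><ArticleIdList>
      <ArticleId IdType="pubmed">placeholder</ArticleId>
      <ArticleId IdType="doi">10.1177/0748233717726877</ArticleId>
    </ArticleIdList></PubmedData>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation><Article>
      <ArticleTitle>Impact of dual inoculation with Rhizobium and PGPR on growth and antioxidant status of Vicia faba L. under copper stress.</ArticleTitle>
    </Article></MedlineCitation>
    <PubmedData><ArticleIdList>
      <ArticleId IdType="doi">10.1016/j.crvi.2015.02.001</ArticleId>
    </ArticleIdList></PubmedData>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation><Article>
      <ArticleTitle>[Influence of Four Kinds of PPCPs on Micronucleus Rate of the Root-Tip Cells of Vicia-faba and Garlic].</ArticleTitle>
    </Article></MedlineCitation>
    <PubmedData><ArticleIdList>
      <ArticleId IdType="pubmed">placeholder</ArticleId>
    </ArticleIdList></PubmedData>
  </PubmedArticle>
</PubmedArticleSet>
"""


def titles_by_doi(root):
    """Map DOI -> article title, skipping records that lack either one."""
    pubs = {}
    for article in root.iter('PubmedArticle'):
        # Both searches are scoped to the current record, not the whole tree.
        title_elem = article.find('.//ArticleTitle')
        doi_elem = article.find(".//ArticleId[@IdType='doi']")
        if title_elem is None or doi_elem is None:
            continue  # no title or no DOI in this record
        # itertext() also collects text inside nested markup in the title.
        pubs[doi_elem.text] = ''.join(title_elem.itertext())
    return pubs


pubs = titles_by_doi(ET.fromstring(SAMPLE))
if not pubs:
    print("No articles found")
else:
    for doi, title in pubs.items():
        print(doi + ' ' + title)
```

With a file on disk, the same function could be called as titles_by_doi(ET.parse('sample.xml').getroot()). In the original code, both loops call iter() on the whole document, so every DOI is reassigned on each pass of the outer loop and ends up with whichever title comes last.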

aluckdog
