Python regex on XML text: finding the section tags


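I'm using Python to search the XML of research papers for occurrences of a specific string. That part works, but for each search hit I also need the nearest section heading above it, i.e. the TITLE and LABEL tags and their contents.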

#<..... some XML .....>

<title><italic>CHANDRA</italic> OBSERVATIONS</title>
<p>The 13 candidates were observed with the Advanced CCD Imaging
Spectrometer (ACIS; Burke et al. <xref ref-type="bibr"
rid="aj387295r8">1997</xref>) on board <italic>Chandra</italic>
(Weisskopf et al. <xref ref-type="bibr"
rid="aj387295r46">1996</xref>). We chose the S3 chip to image the
sources because of its better low-energy sensitivity. The standard
TIMED readout with a frame time of 3.2 s was used, and the data were
collected in VFAINT mode. In 12 cases, our <italic>Chandra</italic>
observations led us to conclude that the RASS detection was not of a
candidate INS (see Table <xref ref-type="table"
rid="aj387295t1">1</xref>; the <xref ref-type="sec"
rid="aj387295app1">Appendix</xref> includes a case-by-case discussion
of these sources).</p>

#<..... more XML ....>

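I have a regex that finds the lines containing "Chandra", but I'm struggling to get "3. CHANDRA OBSERVATIONS". This is probably very obvious, but I haven't had much training in regular expressions. My regex for "Chandra" and the rest of the line is (.*)(c|C)handra\b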
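One way to stay with regular expressions, sketched here under assumptions, is to record where each hit occurs and then pick the last `<label>`/`<title>` pair that ends before that position. The sample text below is hypothetical and assumes the `<label>` sits directly before the `<title>`, as the "3." in the question suggests; the helper name `heading_before` is made up for illustration. The `(.*)` prefix from the question is not needed just to locate the hit.

```python
import re

# Hypothetical sample modelled on the question's XML; the <label> elements are assumed.
TEXT = """<sec><label>2</label><title>SAMPLE SELECTION</title>
<p>Candidates were drawn from the RASS.</p></sec>
<sec><label>3</label><title><italic>CHANDRA</italic> OBSERVATIONS</title>
<p>The 13 candidates were observed on board <italic>Chandra</italic>.</p></sec>"""

# The asker's pattern without the (.*) prefix; matches "Chandra"/"chandra" but not "CHANDRA".
HIT = re.compile(r'(c|C)handra\b')
# Non-greedy [\s\S]*? stops at the first closing tag, even across line breaks.
HEADING = re.compile(r'<label>([\s\S]*?)</label>\s*<title>([\s\S]*?)</title>')
TAG = re.compile(r'<[^>]+>')  # strips inline markup such as <italic>

def heading_before(text, pos):
    """Return 'label title' for the last heading that ends before pos, or None."""
    last = None
    for m in HEADING.finditer(text):
        if m.end() > pos:
            break
        last = m
    if last is None:
        return None
    label, title = (TAG.sub('', g).strip() for g in last.groups())
    return f'{label} {title}'

for hit in HIT.finditer(TEXT):
    print(hit.group(0), '->', heading_before(TEXT, hit.start()))
```

This is fragile (attributes on the tags, or a heading layout other than label-then-title, will defeat it), which is why the answers below lean towards an XML parser.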

30秒到达战场


12345678_0001

Once you have found the right <sec> tag, you only need to get the text inside <label> and <title>:

title = '{} {}'.format(sec.findtext('label'), ''.join(sec.find('title').itertext()))
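A self-contained sketch of that approach with the standard-library `xml.etree.ElementTree`, assuming the paper wraps each section in `<sec>` with `<label>`, `<title>` and `<p>` children. The sample XML and the helper names `section_heading` and `headings_for` are hypothetical. `itertext()` is what pulls "CHANDRA" out of the nested `<italic>` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down body modelled on the question; <label>3</label> is assumed.
XML = """<body>
<sec id="s3">
  <label>3</label>
  <title><italic>CHANDRA</italic> OBSERVATIONS</title>
  <p>The 13 candidates were observed ... on board <italic>Chandra</italic>.</p>
</sec>
</body>"""

def section_heading(sec):
    """Join the <label> text and the full text of <title>, including nested tags."""
    label = sec.findtext('label') or ''
    title_el = sec.find('title')
    title = ''.join(title_el.itertext()) if title_el is not None else ''
    return '{} {}'.format(label, title).strip()

def headings_for(root, needle):
    """Yield the heading of every <sec> whose direct <p> children mention needle."""
    needle = needle.lower()
    for sec in root.iter('sec'):  # also visits nested sub-sections
        for p in sec.findall('p'):  # direct children only
            if needle in ''.join(p.itertext()).lower():
                yield section_heading(sec)
                break  # one heading per section is enough

root = ET.fromstring(XML)
print(list(headings_for(root, 'chandra')))
```

Searching per `<sec>` and reporting that section's own heading avoids having to walk back up the tree, which plain ElementTree elements cannot do since they have no parent pointer.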


墨色风雨

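As mentioned in the comments, using regex to read XML values is not recommended. If you want to use it anyway, the pattern <tag>[\s\S]*?<\/tag> matches one element, and the part between the tags is the value.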
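A small sketch of that pattern with Python's `re` module, using a capture group so `re.findall` returns only the value. The function name `tag_values` is hypothetical. In Python the `/` does not need escaping. It assumes the opening tag has no attributes and that elements of the same name are not nested; note that inline markup such as `<italic>` stays in the result.

```python
import re

def tag_values(xml_text, tag):
    """Return the raw contents of every <tag>...</tag> pair (attribute-free open tag)."""
    # [\s\S]*? is non-greedy and also matches newlines, so each match ends at
    # the first closing tag even when the value spans several lines.
    pattern = r'<{0}>([\s\S]*?)</{0}>'.format(re.escape(tag))
    return re.findall(pattern, xml_text)

sample = '<label>3</label><title><italic>CHANDRA</italic> OBSERVATIONS</title>'
print(tag_values(sample, 'title'))  # ['<italic>CHANDRA</italic> OBSERVATIONS']
print(tag_values(sample, 'label'))  # ['3']
```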
