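I am using Python to search the XML of research papers for items containing a particular string. That part works, but I also need the heading of the section in which each match occurs, i.e. the TITLE and LABEL tags and their contents.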
#<..... some XML .....>
<sec id="aj387295s3">
<label>3.</label>
<title><italic>CHANDRA</italic> OBSERVATIONS</title>
<p>The 13 candidates were observed with the Advanced CCD Imaging
Spectrometer (ACIS; Burke et al. <xref ref-type="bibr"
rid="aj387295r8">1997</xref>) on board <italic>Chandra</italic>
(Weisskopf et al. <xref ref-type="bibr"
rid="aj387295r46">1996</xref>). We chose the S3 chip to image the
sources because of its better low-energy sensitivity. The standard
TIMED readout with a frame time of 3.2 s was used, and the data were
collected in VFAINT mode. In 12 cases, our <italic>Chandra</italic>
observations led us to conclude that the RASS detection was not of a
candidate INS (see Table <xref ref-type="table"
rid="aj387295t1">1</xref>; the <xref ref-type="sec"
rid="aj387295app1">Appendix</xref> includes a case-by-case discussion
of these sources).</p>
#<..... more XML ....>
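I have a regular expression that gets the lines containing "Chandra", but I have been struggling to get "3. CHANDRA OBSERVATIONS". This is probably very obvious, but I haven't had much training in regular expressions. My regex for Chandra and the rest of the line is "(.*)(c|C)handra\b".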
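One possible direction, sketched here under assumptions rather than offered as a definitive solution: parse the document with the standard-library xml.etree.ElementTree, test each section's paragraphs with the regex, and read the label and title from the enclosing sec element. The SAMPLE string, the article wrapper element, and the helper names text_of and headings_for_matches are all hypothetical. The sketch also assumes the paper is well-formed XML with no undefined entities.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the real paper (assumed well-formed).
SAMPLE = """<article>
<sec id="aj387295s3">
<label>3.</label>
<title><italic>CHANDRA</italic> OBSERVATIONS</title>
<p>The 13 candidates were observed on board <italic>Chandra</italic>.</p>
</sec>
<sec id="s4">
<label>4.</label>
<title>DISCUSSION</title>
<p>Nothing relevant here.</p>
</sec>
</article>"""

# Same idea as the regex in the question, without the leading (.*) group.
PATTERN = re.compile(r"[cC]handra\b")


def text_of(elem):
    """Return all text inside elem, including nested tags such as <italic>."""
    return "".join(elem.itertext()).strip() if elem is not None else ""


def headings_for_matches(xml_text, pattern=PATTERN):
    """Return (label, title) for every <sec> whose own <p> children match."""
    root = ET.fromstring(xml_text)
    results = []
    for sec in root.iter("sec"):
        # Only direct <p> children are checked, so a match is attributed
        # to the section that immediately contains the paragraph.
        if any(pattern.search(text_of(p)) for p in sec.findall("p")):
            results.append((text_of(sec.find("label")), text_of(sec.find("title"))))
    return results


print(headings_for_matches(SAMPLE))
```

Reading the title with itertext() is what lets "CHANDRA OBSERVATIONS" come out whole even though part of it sits inside an italic tag, which is hard to do with a line-by-line regex.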