Get the list of entities in an XML file's DOCTYPE

libxml2 has a function, xmlGetDocEntity(doc, name), that returns an object representing the entity. One of its fields, URI, holds the URI of the unparsed entity. That is what I use in a tool that does something similar: https://github.com/kibook/s1kd-tools/tree/master/tools/s1kd-refs

Example usage:

    $ s1kd-refs --icn DMC-[...].XML
    somegraphic1.cgm
    somegraphic2.cgm

I use an XPath expression like "//@infoEntityIdent" to get the list of all graphics used, and then get the entity URI for each one. Note that this does not list every ENTITY declared in the DTD, only the ones actually used as <graphic>s or <symbol>s in the XML.

lxml is built on libxml2, but I'm not familiar enough with it to know whether it has an exact equivalent of xmlGetDocEntity.

Another option is to first use XSLT to create something easier to parse:

    <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match="/">
        <graphics>
          <xsl:apply-templates select="//@infoEntityIdent"/>
        </graphics>
      </xsl:template>
      <xsl:template match="@infoEntityIdent">
        <graphic>
          <xsl:value-of select="unparsed-entity-uri(.)"/>
        </graphic>
      </xsl:template>
    </xsl:transform>

Output:

    <graphics>
      <graphic>somegraphic1.cgm</graphic>
      <graphic>somegraphic2.cgm</graphic>
    </graphics>
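If you would rather stay in Python without going through lxml, the same idea (collect the infoEntityIdent values, then look up each entity's system identifier) can be sketched with the standard library's xml.parsers.expat. This is not the libxml2 API and not how s1kd-refs does it, just a rough equivalent. The sample document, the entity names (ICN-EXAMPLE-1 and so on) and the function name referenced_graphics are made up for illustration. Note that expat reports the system identifier exactly as written in the declaration and does not resolve it against a base URI.

```python
import xml.parsers.expat

# Hypothetical sample document; entity names and file names are illustrative only.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE dmodule [
<!NOTATION cgm SYSTEM "cgm">
<!ENTITY ICN-EXAMPLE-1 SYSTEM "somegraphic1.cgm" NDATA cgm>
<!ENTITY ICN-EXAMPLE-2 SYSTEM "somegraphic2.cgm" NDATA cgm>
<!ENTITY ICN-UNUSED SYSTEM "unused.cgm" NDATA cgm>
]>
<dmodule>
  <graphic infoEntityIdent="ICN-EXAMPLE-1"/>
  <symbol infoEntityIdent="ICN-EXAMPLE-2"/>
</dmodule>
"""


def referenced_graphics(xml_bytes):
    """Return system identifiers of unparsed entities named by infoEntityIdent."""
    declared = {}  # entity name -> system identifier from the DOCTYPE
    used = []      # entity names referenced in the document, in document order

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        # Called once per <!ENTITY ... NDATA ...> declaration.
        declared[name] = system_id

    def on_start_element(tag, attrs):
        # Roughly what the XPath //@infoEntityIdent selects.
        ident = attrs.get("infoEntityIdent")
        if ident is not None:
            used.append(ident)

    parser = xml.parsers.expat.ParserCreate()
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.StartElementHandler = on_start_element
    parser.Parse(xml_bytes, True)

    # Keep only entities that are both declared and actually referenced.
    return [declared[name] for name in used if name in declared]


if __name__ == "__main__":
    for uri in referenced_graphics(SAMPLE):
        print(uri)
```

As with the approaches above, the declared-but-unused entity is skipped. If you want every ENTITY in the DOCTYPE instead, return the values of declared rather than filtering by used.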
