I have an HTML file with some markup, and I need to add an ID to each tag in the format id="rule_1", id="rule_1.1", id="rule_1.2", id="rule_1.2.1", and so on. For example, the current HTML is:
<div style="styles">
<p class="classname">TEXT</p>
<p class="classname">TEXT</p>
<ul style="styles">
<li>
<p class="classname">TEXT</p>
</li>
<li>
<p class="classname">TEXT</p>
</li>
</ul>
</div>
I need the HTML to look like this:
<div style="styles" id="rule_1">
<p class="classname" id="rule_1.1">TEXT</p>
<p class="classname" id="rule_1.2">TEXT</p>
<ul style="styles" id="rule_1.3">
<li id="rule_1.3.1">
<p class="classname" id="rule_1.3.1.1">TEXT</p>
</li>
<li id="rule_1.3.2">
<p class="classname" id="rule_1.3.2.1">TEXT</p>
</li>
</ul>
</div>
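I could write these by hand, but I would prefer to use an existing HTML parser library. Is this possible with BeautifulSoup or another module?
I tried something like this: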
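One standard-library option is `xml.etree.ElementTree` with a recursive walk. It assumes the file is a well-formed fragment like the example, with no `<!DOCTYPE>`, no unclosed `<br>`, and no HTML-only entities such as `&nbsp;`. Each numbered tag gets its parent's id plus its 1-based position among its numbered siblings. The tag set is copied from the `find_all()` call below, and the helper names (`numbered_children`, `add_ids`, `add_rule_ids`) are hypothetical. Tags outside that set are looked through without consuming a number, so a `<span>` inside a `<b>` continues the enclosing sequence instead of restarting it.

```python
import xml.etree.ElementTree as ET

# Tags that receive an id (assumption: same list as the question's find_all()).
NUMBERED_TAGS = {"div", "p", "span", "ul", "li"}


def numbered_children(element):
    """Yield the nearest numbered descendants of `element`, in document order.

    Unnumbered wrappers (e.g. <b>) are looked through, so their numbered
    contents share one sequence with their siblings.
    """
    for child in element:
        if child.tag in NUMBERED_TAGS:
            yield child
        else:
            yield from numbered_children(child)


def add_ids(element, prefix=None):
    """Recursively set id="rule_1", "rule_1.1", ... on the tree below `element`."""
    for position, child in enumerate(numbered_children(element), start=1):
        child_id = f"rule_{position}" if prefix is None else f"{prefix}.{position}"
        child.set("id", child_id)
        add_ids(child, child_id)


def add_rule_ids(source):
    """Return `source` with hierarchical ids added (source must be well-formed XML)."""
    # Wrap in a dummy root so several top-level tags can be numbered 1, 2, ...
    root = ET.fromstring(f"<root>{source}</root>")
    add_ids(root)
    # tostring() includes each element's tail text, so joining keeps the layout.
    return (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )
```

On Python 3.8+ ElementTree keeps attributes in insertion order, so `add_rule_ids()` applied to the example input should reproduce the desired output above, with `id` placed after the existing `style`/`class` attributes.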
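from bs4 import BeautifulSoup as html_parser

with open('outputs/HTML/{}.html'.format(deal), 'r') as read_file:
    html_source = read_file.read()

soup = html_parser(html_source, 'html.parser')
html_tags = soup.find_all(['div', 'p', 'span', 'ul', 'li'])
# find_all() returns a flat list in document order, so each tag only gets
# its position in that list; the nesting is lost.
# enumerate() replaces html_tags.index(), which rescans the list on every call.
for position, each_tag in enumerate(html_tags):
    each_tag.attrs['id'] = str(position)

with open('outputs/HTML/{}-id.html'.format(deal), 'w') as save_file:
    save_file.write(str(soup))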
But this only adds id="1", id="2", and so on. How can I make the ids nested, like 1, 1.1, 1.1.1, etc.?
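A minimal sketch of the nested numbering with BeautifulSoup, assuming `bs4` is installed and the same `html.parser` backend as above. A single `find_all()` flattens the tree, so this version walks one level at a time with `find_all(True, recursive=False)` (direct child tags only) and builds each id from the parent's id. The helper names `numbered_children`, `add_ids` and `add_rule_ids` are hypothetical.

```python
from bs4 import BeautifulSoup

# Tags that receive an id (the same list the question passes to find_all()).
NUMBERED_TAGS = {"div", "p", "span", "ul", "li"}


def numbered_children(tag):
    """Yield the nearest descendants of `tag` that should be numbered."""
    # recursive=False limits find_all() to direct child tags.
    for child in tag.find_all(True, recursive=False):
        if child.name in NUMBERED_TAGS:
            yield child
        else:
            # Look through unnumbered wrappers such as <html>, <body> or <b>.
            yield from numbered_children(child)


def add_ids(tag, prefix=None):
    """Set id="rule_1", "rule_1.1", "rule_1.1.1", ... on the tree below `tag`."""
    for position, child in enumerate(numbered_children(tag), start=1):
        child_id = f"rule_{position}" if prefix is None else f"{prefix}.{position}"
        child["id"] = child_id
        add_ids(child, child_id)


def add_rule_ids(html_source):
    """Parse `html_source`, add hierarchical ids, and return the new HTML."""
    soup = BeautifulSoup(html_source, "html.parser")
    # Start at the document itself so top-level tags get rule_1, rule_2, ...
    add_ids(soup)
    return str(soup)
```

In the script above, calling `add_ids(soup)` in place of the `find_all()` loop should be enough before writing `str(soup)` to the output file. Because `<html>` and `<body>` are looked through rather than numbered, the first `<div>` in a full page still becomes `rule_1`.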
梦里花落0921