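I am using jsoup to parse HTML and want to extract the inner HTML of the body tag.
So far I have tried document.body().children().outerHtml(), but it returns only the HTML elements and skips floating text inside the body (text that is not wrapped in any HTML tag).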
private String getBodyTag(final Document document) {
    return document.body().children().outerHtml();
}
Input:
<!DOCTYPE html><html lang="de"> <head> <META http-equiv="Content-Type" content="text/html; charset=UTF-8"> <link rel="stylesheet" type="text/css" href="assets/style.css"> </head> <body> <div>questions to improve formatting and clarity.</div> <h3>Guided Mode</h3> some sample raw/floating text </body></html>
Expected:
<div>questions to improve formatting and clarity.</div><h3>Guided Mode</h3> some sample raw/floating text
Actual:
<div>questions to improve formatting and clarity.</div><h3>Guided Mode</h3>
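The gap between the expected and actual output fits how children() works: it returns only child elements, so bare text nodes are left out. A minimal sketch of one possible fix, assuming jsoup is on the classpath, is to call html() on the body element, which serializes all child nodes including text. The class name BodyInnerHtml and the method name getBodyInnerHtml are hypothetical, and switching off pretty-printing is an optional choice made here so that jsoup does not re-indent the markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BodyInnerHtml {

    // Returns everything inside <body>, including text nodes that are not
    // wrapped in any element. Element.html() serializes all child nodes,
    // whereas children() only returns child elements.
    static String getBodyInnerHtml(final Document document) {
        // Optional: disable pretty-printing so jsoup does not re-indent the output.
        // Note that this changes the output settings of the passed-in document.
        document.outputSettings().prettyPrint(false);
        return document.body().html();
    }

    public static void main(String[] args) {
        String html = "<!DOCTYPE html><html lang=\"de\"> <head> </head> <body> "
                + "<div>questions to improve formatting and clarity.</div> "
                + "<h3>Guided Mode</h3> some sample raw/floating text </body></html>";
        Document document = Jsoup.parse(html);
        System.out.println(getBodyInnerHtml(document));
    }
}
```

If the document's output settings should stay untouched, iterating over document.body().childNodes() and concatenating each node's outerHtml() is another option worth trying.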