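I'm trying to convert an HTML page to text and store it in a file. The conversion works, but the file contains some stray backslashes and asterisks.
Here is the code I'm using: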
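import html2text
from bs4 import BeautifulSoup
import requests as r

# Download the page (url holds the Response object, not the URL string)
url = r.get("https://dev.bizlem.io:8082/scorpio1/HANDY_AND_MR_FUEL_OIL_POSITIONS_BASIS_MALTA_AS_OF_TUESDAY_23RD_OCTOBER_2018_1.html")
# print(html2text.html2text(url.text))
web_text = url.text
# Convert to text and save it. Note: .replace() runs on the raw HTML, before
# html2text adds its "**" and "\" markup, so it probably matches nothing.
with open('text', 'w', encoding='utf-8') as file:
    file.write(html2text.html2text(web_text.replace(r"** \----", "")))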
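html2text deliberately produces Markdown, so the `**` (bold from `<b>`/`<strong>` tags) and the `\` before the dashes (most likely escapes that stop a line of dashes from being read as a Markdown list or rule) are expected output rather than noise. If plain text is the goal, a minimal alternative is to skip Markdown entirely and keep only the text nodes. The sketch below swaps in the standard library's `html.parser.HTMLParser` for html2text; `TextExtractor` and `html_to_plain_text` are hypothetical names, and it assumes the table on this page is plain preformatted text, as the output below suggests.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document, ignoring all tags."""

    def __init__(self) -> None:
        # convert_charrefs defaults to True, so "&amp;" arrives as "&"
        super().__init__()
        self._parts: list[str] = []
        self._skip = 0  # depth inside <script>/<style>, whose text is not page content

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self._parts.append(data)

    def text(self) -> str:
        return "".join(self._parts)


def html_to_plain_text(html: str) -> str:
    """Return the visible text of `html` with no Markdown markup added."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the input
    return parser.text()
```

With the question's variables, `file.write(html_to_plain_text(web_text))` would take the place of the html2text call. Since `bs4` is already imported, `BeautifulSoup(web_text, "html.parser").get_text()` is a comparable one-liner.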
This is the output I get:
HANDY_AND_MR_FUEL_OIL_POSITIONS_BASIS_MALTA_AS_OF_TUESDAY_23RD_OCTOBER_2018
FROM: JONNY HAMMOND / AFFINITY TANKERS
HANDY & MR FUEL OIL POSITIONS BASIS MALTA, AS OF TUESDAY, 23RD OCTOBER 2018
===========================================================================
DATE VESSEL DWT YR PORT OPEN FLEET COMMENT
\---- \------ \--- -- ---- \---- \----- \-------
23/10 **KRISJANIS VALDEMA 37 07 MALTA 23/10 LATVIAN SUBS**
Expected format:
HANDY_AND_MR_FUEL_OIL_POSITIONS_BASIS_MALTA_AS_OF_TUESDAY_23RD_OCTOBER_2018
FROM: JONNY HAMMOND / AFFINITY TANKERS
HANDY & MR FUEL OIL POSITIONS BASIS MALTA, AS OF TUESDAY, 23RD OCTOBER 2018
===========================================================================
DATE VESSEL DWT YR PORT OPEN FLEET COMMENT
---- ------ --- -- ---- ---- ----- -------
23/10 KRISJANIS VALDEMA 37 07 MALTA 23/10 LATVIAN SUBS
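If you would rather keep html2text, another option is to clean its output after conversion instead of patching the HTML before it. This minimal sketch assumes, based only on the output above, that the markup to remove is `**` bold markers and backslash escapes such as `\-`; `strip_markdown_artifacts` is a hypothetical helper, and it would also strip any `**` that genuinely appears in the page text.

```python
import re

# "**" bold markers that are not themselves escaped (assumed to be the only emphasis)
_BOLD = re.compile(r"(?<!\\)\*\*")
# A backslash followed by a Markdown punctuation character it may have escaped
_ESCAPE = re.compile(r"\\([\\`*_{}\[\]()#+\-.!])")


def strip_markdown_artifacts(text: str) -> str:
    """Remove bold markers, then drop the backslash from escapes like \\-."""
    text = _BOLD.sub("", text)
    return _ESCAPE.sub(r"\1", text)
```

Applied to the converted text, for example `strip_markdown_artifacts(html2text.html2text(web_text))`, this should turn the two problem lines above into the expected ones. html2text's `HTML2Text` class also has an `ignore_emphasis` option that stops it from emitting the bold markers in the first place, though the dash escapes would still need handling.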