Converting HTML to TXT

I am trying to convert an HTML page to text and store it in a file. The conversion works, but the file contains some stray backslashes and asterisks.


Here is the code I am using:


import html2text
from bs4 import BeautifulSoup
import requests as r

url = r.get("https://dev.bizlem.io:8082/scorpio1/HANDY_AND_MR_FUEL_OIL_POSITIONS_BASIS_MALTA_AS_OF_TUESDAY_23RD_OCTOBER_2018_1.html")

# print(html2text.html2text(url.text))
web_text = url.text

file = open('text', 'w+')
file.write(html2text.html2text(web_text.replace("** \----", "")))
file.close()

This is the output I get:


HANDY_AND_MR_FUEL_OIL_POSITIONS_BASIS_MALTA_AS_OF_TUESDAY_23RD_OCTOBER_2018


FROM: JONNY HAMMOND / AFFINITY TANKERS




HANDY & MR FUEL OIL POSITIONS BASIS MALTA, AS OF TUESDAY, 23RD OCTOBER 2018


===========================================================================




DATE  VESSEL           DWT YR PORT           OPEN  FLEET       COMMENT  


\----  \------           \--- -- ----           \----  \-----       \-------  


23/10 **KRISJANIS VALDEMA 37 07 MALTA           23/10 LATVIAN     SUBS**  

Expected format:


HANDY_AND_MR_FUEL_OIL_POSITIONS_BASIS_MALTA_AS_OF_TUESDAY_23RD_OCTOBER_2018


FROM: JONNY HAMMOND / AFFINITY TANKERS




HANDY & MR FUEL OIL POSITIONS BASIS MALTA, AS OF TUESDAY, 23RD OCTOBER 2018


===========================================================================




DATE  VESSEL           DWT YR PORT           OPEN  FLEET       COMMENT       


----  ------           --- -- ----           ----  -----       -------       


23/10 KRISJANIS VALDEMA 37 07 MALTA          23/10 LATVIAN     SUBS  


郎朗坤

尚方宝剑之说

You can remove the unwanted symbols with the replace method, applied to the text that html2text returns:

from html2text import html2text
import requests as r

html = r.get("https://dev.bizlem.io:8082/scorpio1/HANDY_AND_MR_FUEL_OIL_POSITIONS_BASIS_MALTA_AS_OF_TUESDAY_23RD_OCTOBER_2018_1.html").text

# Strip the asterisks and the escaped hyphens from the converted text
text = html2text(html).replace('*', '').replace('\\-', '')

with open('text.txt', 'w') as f:
    f.write(text)

The output will be:

HANDY_AND_MR_FUEL_OIL_POSITIONS_BASIS_MALTA_AS_OF_TUESDAY_23RD_OCTOBER_2018
FROM: JONNY HAMMOND / AFFINITY TANKERS
HANDY & MR FUEL OIL POSITIONS BASIS MALTA, AS OF TUESDAY, 23RD OCTOBER 2018
===========================================================================
DATE  VESSEL           DWT YR PORT           OPEN  FLEET       COMMENT
---  -----           -- -- ----           ---  ----       ------  
23/10 KRISJANIS VALDEMA 37 07 MALTA           23/10 LATVIAN     SUBS  
25/10 SEAVALOUR          47 07 GREECE         23/10 THENAMARIS  SUBS

Note that replacing '\\-' with an empty string removes the hyphen along with the backslash, so each escaped run of dashes comes out one character shorter than in the expected format.
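As a variation on the replace approach above, here is a minimal sketch that keeps the dash rows at their original width by turning each escaped hyphen back into a plain hyphen instead of deleting it. The function name clean_markdown_artifacts and the sample string are illustrative assumptions, not part of html2text; the sample lines are copied from the output shown in the question so the example runs without a network request.

```python
def clean_markdown_artifacts(text: str) -> str:
    """Remove Markdown residue left in text converted from HTML."""
    # Drop the "**" bold markers around emphasised rows.
    text = text.replace("**", "")
    # Turn each escaped hyphen (backslash + "-") back into a plain "-",
    # so a run such as "\----" becomes "----" rather than "---".
    text = text.replace("\\-", "-")
    return text


# Two lines taken from the question's output (hypothetical test input).
sample = (
    "\\----  \\------           \\--- -- ----\n"
    "23/10 **KRISJANIS VALDEMA 37 07 MALTA           23/10 LATVIAN     SUBS**  \n"
)

cleaned = clean_markdown_artifacts(sample)
print(cleaned)
```

In the real script, the same function would presumably be applied to the string returned by html2text(html) before writing it to the file.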