Replace broken URLs in a text file and fix them in Python

The URLs I receive have had their forward slashes removed, and I need to correct the URLs inside a text file.

The URLs in the file look like this:

https:www.ebay.co.ukitmReds-Challenge-184-214-Holo-Shiny-Rare-Pokemon-Card-SM-Unbroken-Bonds-Rare124315281970?hash=item1cf1c4aa32%3Ag%3AXBAAAOSwJGRfSGI1&LH_BIN=1

I need to correct them to:

https://www.ebay.co.uk/itm/Reds-Challenge-184-214-Holo-Shiny-Rare-Pokemon-Card-SM-Unbroken-Bonds-Rare/124315281970?hash=item1cf1c4aa32%3Ag%3AXBAAAOSwJGRfSGI1&LH_BIN=1

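So I need a regular expression, or some other approach, that restores the forward slashes in every URL in the file and replaces the broken URLs in place.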
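A minimal single-regex sketch, assuming every broken URL has exactly the shape of the example above: host `www.ebay.co.uk`, the literal `itm` segment, a title slug, then a 12-digit item ID, with only the slashes missing. The names `BROKEN_EBAY_URL` and `fix_ebay_urls` are illustrative, not from the question. Because the pattern requires `https:` directly followed by `www`, URLs that are already correct do not match and are left alone.

```python
import re

# ASSUMPTION: every broken URL has the shape seen in the question, i.e.
#   "https:" + "www.ebay.co.uk" + "itm" + <title slug> + <12-digit item id> + [?query]
# with only the slashes missing. The item id is taken to be the run of
# 12 digits that ends the slug (right before "?", ",", whitespace, or end).
BROKEN_EBAY_URL = re.compile(
    r"https:(www\.ebay\.co\.uk)"   # scheme missing "//", then the host
    r"itm"                          # path segment missing its slashes
    r"([A-Za-z0-9-]+?)"             # title slug (lazy, so the id is not swallowed)
    r"(\d{12})"                     # 12-digit item id
    r"(?=[^A-Za-z0-9-]|$)"          # id must end the slug: delimiter or end of text
)


def fix_ebay_urls(text: str) -> str:
    """Re-insert the missing slashes in every broken eBay item URL in `text`.

    The query string after the item id is not part of the match, so it is
    copied through unchanged. Already-correct URLs ("https://...") do not match.
    """
    return BROKEN_EBAY_URL.sub(r"https://\1/itm/\2/\3", text)


if __name__ == "__main__":
    broken = ("https:www.ebay.co.ukitmReds-Challenge-184-214-Holo-Shiny-Rare-"
              "Pokemon-Card-SM-Unbroken-Bonds-Rare124315281970"
              "?hash=item1cf1c4aa32%3Ag%3AXBAAAOSwJGRfSGI1&LH_BIN=1")
    print(fix_ebay_urls(broken))
```

To fix a whole file, apply `fix_ebay_urls` to each line (or to the full file contents) and write the result to a new file.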


心有法竹
1 Answer

江户川乱折腾

import re
import time

while True:
    # input file, and output file to write the result to; leaving the
    # with-block closes both, which flushes out.txt before it is read back
    with open("ebay2.csv", "rt") as fin, open("out.txt", "wt") as fout:
        # for each line in the input file
        for line in fin:
            # replace the broken parts of the string and write to the output file
            fout.write(line.replace('https://www.ebay.co.uk/sch/', 'https://')
                           .replace('itm', '/itm/')
                           .replace('https:www.ebay', 'https://www.ebay'))
    # insert a "/" before the 12-digit item number
    with open('out.txt') as f:
        regex = r"\d{12}"
        subst = "/\\g<0>"
        for l in f:
            result = re.sub(regex, subst, l)
            if result:
                print(result, end="")
    # wait a second, then reprocess ebay2.csv
    time.sleep(1)

I eventually came up with this. It's a bit clumsy, but it gets the job done fast enough.
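For comparison, the answer's two passes (string replacements written to `out.txt`, then the `\d{12}` regex over `out.txt`) can be folded into one pass per line, so no intermediate file has to be flushed and re-read. A sketch that keeps the answer's own replacements; `repair_line` and `repair_file` are hypothetical helper names, and the UTF-8 encoding is an assumption.

```python
import re

# The answer's 12-digit item-id pattern, compiled once.
ITEM_ID = re.compile(r"\d{12}")


def repair_line(line: str) -> str:
    """Apply the answer's fixes to one line, in the same order."""
    line = (line.replace("https://www.ebay.co.uk/sch/", "https://")
                .replace("itm", "/itm/")
                .replace("https:www.ebay", "https://www.ebay"))
    # Insert "/" before each 12-digit run (same as subst "/\g<0>" in the answer).
    return ITEM_ID.sub(r"/\g<0>", line)


def repair_file(src: str, dst: str) -> None:
    """Read src line by line and write the repaired lines to dst in one pass."""
    with open(src, "rt", encoding="utf-8") as fin, \
         open(dst, "wt", encoding="utf-8") as fout:
        for line in fin:
            fout.write(repair_line(line))


if __name__ == "__main__":
    print(repair_line("https:www.ebay.co.ukitmSome-Card-Rare124315281970?x=1"))
```

Like the original, these replacements assume each line contains only broken URLs: `.replace("itm", "/itm/")` matches `itm` anywhere in the line, and the regex prefixes any 12-digit run with `/`, so running them over an already-corrected URL doubles its slashes.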