How do I remove broken URLs from a list?

I have a list of more than 1,000 URLs (used to download reports) saved in a .csv file. Some of the URLs now return a 404 error, and I want a way to remove them from the list.


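I managed to write the code below (Python 3), which identifies whether a URL is invalid. However, because there are so many URLs, I don't know how to remove the invalid ones from the list automatically. Thank you!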


from urllib.request import urlopen
from urllib.error import HTTPError

try:
    urlopen("url")
except HTTPError as err:
    if err.code == 404:
        print('invalid')
    else:
        raise


慕姐4208626
141 views · 4 answers
4 answers

犯罪嫌疑人X

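You can collect the 404 URLs in a separate set (worthwhile when there are fewer 404 URLs than valid ones) and then take the set difference:

from urllib.request import urlopen
from urllib.error import HTTPError

exclude_urls = set()
for url in all_urls:  # all_urls: the full list of URLs loaded from the CSV
    try:
        urlopen(url)
    except HTTPError as err:
        if err.code == 404:
            exclude_urls.add(url)

valid_urls = set(all_urls) - exclude_urls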
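
One thing to keep in mind with the set-difference approach is that a set drops duplicates and does not keep the original CSV order. If the order matters, a list comprehension is one alternative. The sketch below is only an illustration: the names `is_404` and `keep_valid` and the 10-second timeout are assumptions, not part of the answer above. Note that, unlike the question's code, it treats non-404 HTTP errors as "keep the URL" rather than re-raising them.

```python
from urllib.request import urlopen
from urllib.error import HTTPError


def is_404(url, timeout=10):
    """Return True only when the server answers this URL with HTTP 404."""
    try:
        with urlopen(url, timeout=timeout):
            return False
    except HTTPError as err:
        # Other HTTP errors (403, 500, ...) are not treated as "missing" here.
        return err.code == 404


def keep_valid(all_urls, is_bad=is_404):
    """Return the URLs for which is_bad(url) is false, in their original order."""
    return [url for url in all_urls if not is_bad(url)]
```

Calling `keep_valid(all_urls)` would then return the filtered list. The `is_bad` parameter exists so the check can be swapped out, for example with a stub when trying the function without network access.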

至尊宝的传说

You could do it like this:

from urllib.request import urlopen
from urllib.error import HTTPError

def load_data(csv_name):
    ...

def save_data(data, csv_name):
    ...

links = load_data(csv_name)
new_links = set()
for i in links:
    try:
        urlopen(i)
    except HTTPError as err:
        if err.code == 404:
            print('invalid')
    else:
        # Reached only when urlopen succeeded
        new_links.add(i)

save_data(list(new_links), csv_name)

沧海一幻觉

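Suppose list A contains all the URLs. Then remove the invalid one in place:

A.remove("invalid_url")

list.remove modifies the list in place and returns None, so do not assign the result back to A.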
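
Since `list.remove` deletes only the first matching entry, and removing items from a list while looping over it tends to skip elements, one way to drop many invalid URLs at once is to rebuild the list. This is a minimal sketch; `remove_invalid` and the example.com URLs are made up for illustration.

```python
def remove_invalid(urls, invalid_urls):
    """Remove every entry of invalid_urls from urls in place, keeping order."""
    invalid = set(invalid_urls)  # set lookup keeps the filter fast for long lists
    # Slice assignment replaces the contents of the same list object.
    urls[:] = [url for url in urls if url not in invalid]


A = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
remove_invalid(A, ["http://example.com/b"])
print(A)  # ['http://example.com/a', 'http://example.com/c']
```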

慕娘9325324

Try something like this:

import csv
from urllib.request import urlopen
from urllib.error import HTTPError

# 1. Load the CSV file into a list
with open('urls.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    urls = [row[0] for row in reader if row]  # Assuming each row has one URL

# 2. Check each URL for validity using your code
valid_urls = []
for url in urls:
    try:
        urlopen(url)
        valid_urls.append(url)
    except HTTPError as err:
        if err.code == 404:
            print(f'Invalid URL: {url}')
        else:
            raise  # If it's another type of error, raise it so you're aware

# 3. Write the cleaned list back to the CSV file
with open('cleaned_urls.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for url in valid_urls:
        writer.writerow([url])
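
With 1,000+ URLs, checking them one after another can be slow, and a single unreachable host would stop the loop with an uncaught error. The sketch below is one possible variation, not a drop-in replacement: it sends HEAD requests (so the report body is not downloaded), applies a timeout, and checks URLs in a thread pool. The names `status_of` and `filter_urls`, the 10-second timeout, and the 16 workers are all assumptions chosen for illustration. Some servers reject HEAD with a status such as 405; such URLs are kept, because only 404 is treated as invalid.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen
from urllib.error import HTTPError


def status_of(url, timeout=10):
    """Return the HTTP status code for url, or None if no response was received."""
    try:
        request = Request(url, method="HEAD")  # HEAD: headers only, no body
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; ValueError covers
        # malformed URLs. No status is available in these cases.
        return None


def filter_urls(urls, check=status_of, workers=16):
    """Return the URLs whose status is not 404, in their original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(check, urls))  # map preserves input order
    return [url for url, status in zip(urls, statuses) if status != 404]
```

In the script above, step 2 could then become `valid_urls = filter_urls(urls)`. URLs that could not be reached at all (status `None`) are kept here, on the assumption that a network hiccup should not delete an entry; adjust the final condition if they should be dropped too.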