Newspaper library

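As an absolute beginner with Python, I am having some trouble using the newspaper library. My goal is to use it on a regular schedule to download all new articles from the German news site "tagesschau" and all articles from CNN, so that I build up a data set I can analyze in a few years. If I understand it correctly, I can use the following commands to download all the articles and scrape them into Python.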

import newspaper
from newspaper import news_pool

tagesschau_paper = newspaper.build('http://tagesschau.de')
cnn_paper = newspaper.build('http://cnn.com')

papers = [tagesschau_paper, cnn_paper]

# 2 sources * 2 threads each = 4 threads total
news_pool.set(papers, threads_per_source=2)
news_pool.join()

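If this is the right way to download all the articles, how can I extract them and save them outside of Python? Or how can I store them within Python so that I can reuse them after restarting Python?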

慕桂英3389331


2 Answers

素胚勾勒不出你

You can use pickle to save objects outside of Python and reopen them later:

import pickle

file_Name = "testfile"
# open the file for writing
fileObject = open(file_Name, 'wb')

# this writes the object news_pool to the
# file named 'testfile'
pickle.dump(news_pool, fileObject)

# here we close the fileObject
fileObject.close()

# we open the file for reading (binary mode, which pickle requires)
fileObject = open(file_Name, 'rb')

# load the object from the file into var news_pool_reopen
news_pool_reopen = pickle.load(fileObject)
fileObject.close()
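Since the goal is an archive that should still be readable in a few years, it may also be worth writing the article contents to a plain-text format instead of (or in addition to) pickling library objects. The following is only a minimal sketch, not the library's own export mechanism: `save_articles` and `load_articles` are hypothetical helper names, and the sketch assumes each article exposes `url`, `title` and `text` attributes, as newspaper's `Article` objects do once they have been parsed. Stand-in objects are used here so the example runs without network access.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_articles(articles, path):
    """Append one JSON object per article to a JSON Lines file.

    Hypothetical helper: works on any object with url, title and text
    attributes (assumed to match a parsed newspaper Article).
    """
    with open(path, "a", encoding="utf-8") as fh:
        for article in articles:
            record = {
                "url": article.url,
                "title": article.title,
                "text": article.text,
            }
            # ensure_ascii=False keeps German umlauts readable in the file
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_articles(path):
    """Read the saved records back as a list of plain dicts."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Stand-in articles so the sketch runs offline (not real scraped data).
demo_articles = [
    SimpleNamespace(url="http://example.com/a", title="Titel A", text="Erster Text"),
    SimpleNamespace(url="http://example.com/b", title="Title B", text="Second text"),
]

# Write to a fresh temporary directory so reruns do not append duplicates.
demo_path = Path(tempfile.mkdtemp()) / "articles.jsonl"
save_articles(demo_articles, demo_path)
restored = load_articles(demo_path)
```

With real sources, one would presumably loop over `tagesschau_paper.articles` and `cnn_paper.articles`, call `parse()` on each downloaded article, and pass the results to `save_articles`. Because the file is opened in append mode, each scheduled run can add its new articles to the same file.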
