I am trying to parse three different RSS feeds. These are the feeds:
https://www.nba.com/bucks/rss.xml
http://www.espn.com/espn/rss/ncb/news
http://rss.nytimes.com/services/xml/rss/nyt/ProBasketball.xml
For the most part, all three feeds share a similar structure, apart from the URL.
I am trying to parse them into the following Feed object:
class Feed(Base):
    title = models.CharField(db_index=True, unique=True, max_length=255)
    link = models.CharField(db_index=True, max_length=255)
    summary = models.TextField(null=True)
    author = models.CharField(null=True, max_length=255)
    url = models.CharField(max_length=512, null=True)
    published = models.DateTimeField()
    source = models.ForeignKey(Source, on_delete=models.CASCADE, null=True)
This is the Source object:
class Source(Base):
    name = models.CharField(db_index=True, max_length=255)
    link = models.CharField(db_index=True, max_length=255, unique=True)
This is the code I use for parsing:
import logging
import xml.etree.ElementTree as ET

import requests
import maya
from django.utils import timezone

from aggregator.models import Feed


class ParseFeeds:
    @staticmethod
    def parse(source):
        logger = logging.getLogger(__name__)
        logger.info("Starting {}".format(source.name))
        root = ET.fromstring(requests.get(source.link).text)
        items = root.findall(".//item")
        for item in items:
            title = ''
            if item.find('title'):
                title = item.find('title').text
            link = ''
            if item.find('link'):
                link = item.find('link').text
            description = ''
            if item.find('description'):
                description = item.find('description').text
            author = ''
            if item.find('author'):
                author = item.find('author').text
            published = timezone.now()
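Although I can parse each of these feeds in the Python console, the Feed objects created here end up with every field set to None or its default value. What am I doing wrong here?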
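One likely cause, offered as a guess rather than a confirmed diagnosis: the truth value of an `xml.etree.ElementTree` element has historically reflected whether it has child elements, not whether it was found. A leaf element such as `<title>` has no children, so `if item.find('title'):` can be false even when the tag exists, and the default value is kept. The sketch below uses `findtext` with a default instead. `SAMPLE_RSS` and `parse_items` are hypothetical names invented for illustration, and the sample XML is made up rather than taken from the real feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for one of the real feeds.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item>
<title>Bucks win</title>
<link>https://example.com/a</link>
<description>Game recap</description>
</item>
</channel></rss>"""


def parse_items(xml_text):
    """Return one dict per <item>, with '' for any missing tag."""
    root = ET.fromstring(xml_text)
    parsed = []
    for item in root.findall(".//item"):
        parsed.append({
            # findtext returns the default when the tag is absent,
            # so no truth-value test on the element is needed.
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
            "author": item.findtext("author", default=""),
        })
    return parsed


items = parse_items(SAMPLE_RSS)

# The <title> element exists but has no children, which is why a
# truth-value test on it is unreliable; compare against None instead.
title_element = ET.fromstring(SAMPLE_RSS).find(".//item/title")
title_found = title_element is not None
title_child_count = len(title_element)
```

If the element itself is needed rather than only its text, an explicit `is not None` comparison, as in `title_found` above, avoids the same pitfall.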