Background
I have the following structure:
trash = [{'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
         {'href': 'https://www.simplyrecipes.com/recipes/cuisine/german/'},
         {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
         {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
         {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'},
         {'href': 'https://www.simplyrecipes.com/',
          'title': 'Simply Recipes Food and Cooking Blog', 'rel': ['home']}]
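As you can see, most keys are 'href' and most values contain 'https://www.simplyrecipes.com/recipes/'. The problem is the keys and values that do not follow this naming convention...
Code:
This code iterates over the structure, uses re.findall to grab the part of each string value starting at 'recipes/', and then takes the path segment that follows 'recipes/' as a new key name for the corresponding value.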
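import re

for x in trash:
    for y in x.values():
        txt = ''
        # re.findall expects a string; a list value such as ['home'] raises TypeError here
        for i in re.findall("recipes/.*", y):
            txt += i
            # e.g. 'recipes/cuisine/german/' -> 'cuisine'
            title = txt.split('/')[1]
            print({title: y})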
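Output:
Assuming I remove the keys and values that do not follow the convention (keys named 'href' whose values contain the string 'https://www.simplyrecipes.com/recipes/'), the code prints the following: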
{'cuisine': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'}
{'cuisine': 'https://www.simplyrecipes.com/recipes/cuisine/german/'}
{'season': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'}
{'type': 'https://www.simplyrecipes.com/recipes/type/condiment/'}
{'ingredient': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}
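Problem:
The problem with the code is that I get TypeError: expected string or bytes-like object if the structure contains keys and values that do not follow the naming convention the code assumes.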
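The error can be reproduced in isolation. The short sketch below assumes the culprit is the 'rel' entry, whose value is the list ['home'] rather than a string; re.findall only accepts str or bytes-like input. The names value and error_message are illustrative only.

```python
import re

# The 'rel' value from the structure above is a list, not a string.
value = ['home']

try:
    re.findall("recipes/.*", value)
    error_message = None
except TypeError as exc:
    # re.findall rejects anything that is not str or bytes-like
    error_message = str(exc)

print(error_message)
```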
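Question:
How would I improve this code so that it skips any key that is not named 'href' and, for keys that are named 'href', skips the entry when its value does not contain 'https://www.simplyrecipes.com/recipes/'?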
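One possible approach, as a minimal sketch rather than a definitive answer: look up only the 'href' key with dict.get, check that the value is a string beginning with the expected prefix, and skip everything else. The function name categorize and the constant PREFIX are hypothetical names introduced for this example, and it assumes that "contains" can be tightened to "starts with".

```python
PREFIX = 'https://www.simplyrecipes.com/recipes/'  # assumed prefix, taken from the question

def categorize(links, prefix=PREFIX):
    """Return one {category: url} dict per entry whose 'href' starts with prefix."""
    results = []
    for entry in links:
        url = entry.get('href')  # any key other than 'href' is ignored
        # Skip a missing 'href', a non-string value, or a URL outside the prefix.
        if not isinstance(url, str) or not url.startswith(prefix):
            continue
        # First path segment after the prefix, e.g. 'cuisine'
        category = url[len(prefix):].split('/')[0]
        if category:
            results.append({category: url})
    return results

trash = [{'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
         {'href': 'https://www.simplyrecipes.com/recipes/cuisine/german/'},
         {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
         {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
         {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'},
         {'href': 'https://www.simplyrecipes.com/',
          'title': 'Simply Recipes Food and Cooking Blog', 'rel': ['home']}]

for item in categorize(trash):
    print(item)
```

Because only the 'href' value is ever inspected, the list stored under 'rel' never reaches any string operation, so the TypeError cannot occur. If the prefix may appear later in the URL, the startswith check could be swapped for a substring test, at the cost of slightly different slicing.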