How to handle errors raised by isbnlib.meta when applying it with pandas

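I am using isbnlib.meta to pull metadata (title, author, publisher, year, etc.) for a given ISBN. I have a dataframe with 482,000 ISBNs (column header: isbn13). When I run the function, I get a NotValidISBNError that stops the code in its tracks. What I want is for the code to simply skip that row and move on to the next one when an error occurs.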


Here is my code right now:


list_df[0]['publisher_isbnlib'] = list_df[0]['isbn13'].apply(lambda x: isbnlib.meta(x).get('Publisher', None))

list_df[0]['yearpublished_isbnlib'] = list_df[0]['isbn13'].apply(lambda x: isbnlib.meta(x).get('Year', None))

#list_df[0]['language_isbnlib'] = list_df[0]['isbn13'].apply(lambda x: isbnlib.meta(x).get('Language', None))

list_df[0]

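list_df[0] is the first 20,000 rows, from when I tried chunking the dataframe. So far I have just been entering this code manually 24 times to process each chunk.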


I tried using try: and except:, but all that ends up happening is that the code stops and I get no metadata back.


Traceback:

---------------------------------------------------------------------------

NotValidISBNError                         Traceback (most recent call last)

<ipython-input-39-a06c45d36355> in <module>

----> 1 df['meta'] = df.isbn.apply(isbnlib.meta)


e:\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)

   4198             else:

   4199                 values = self.astype(object)._values

-> 4200                 mapped = lib.map_infer(values, f, convert=convert_dtype)

   4201 

   4202         if len(mapped) and isinstance(mapped[0], Series):


pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()


e:\Anaconda3\lib\site-packages\isbnlib\_ext.py in meta(isbn, service)

     23 def meta(isbn, service='default'):

     24     """Get metadata from Google Books ('goob'), Open Library ('openl'), ..."""

---> 25     return query(isbn, service) if isbn else {}

     26 

     27 


e:\Anaconda3\lib\site-packages\isbnlib\dev\_decorators.py in memoized_func(*args, **kwargs)

     22             return cch[key]

     23         else:

---> 24             value = func(*args, **kwargs)

     25             if value:

     26                 cch[key] = value




浮云间
2 answers

弑天下

The current implementation for extracting the isbn metadata is extremely slow and inefficient. As noted, there are 482,000 unique isbn values, and their data is being downloaded multiple times (once for each column, as the code is currently written).

It is better to download all the metadata once, and then extract the data from the dict as a separate operation.

A try-except block is used to catch the error raised for invalid isbn values. An empty dict, {}, is returned, because pd.json_normalize does not work with NaN or None.

There is no need to chunk the isbn column.

pd.json_normalize is used to expand the dict returned by .meta. Use pandas.DataFrame.rename to rename columns and pandas.DataFrame.drop to delete columns.

This implementation will be much faster than the current one, and it makes far fewer requests to the API used to get the metadata.

To extract values from lists (for example the 'Authors' column), use df = df.explode('Authors'); if there is more than one author, a new row is created for each additional author in the list.

    import pandas as pd  # version 1.1.3
    import isbnlib  # version 3.10.3

    # sample dataframe
    df = pd.DataFrame({'isbn': ['9780446310789', 'abc', '9781491962299', '9781449355722']})

    # function with try-except, for invalid isbn values
    # (apply passes one value at a time, so the argument is a single isbn string)
    def get_meta(isbn: str) -> dict:
        try:
            return isbnlib.meta(isbn)
        except isbnlib.NotValidISBNError:
            return {}

    # get the meta data for each isbn or an empty dict
    df['meta'] = df.isbn.apply(get_meta)

    # df
       isbn           meta
    0  9780446310789  {'ISBN-13': '9780446310789', 'Title': 'To Kill A Mockingbird', 'Authors': ['Harper Lee'], 'Publisher': 'Grand Central Publishing', 'Year': '1988', 'Language': 'en'}
    1  abc            {}
    2  9781491962299  {'ISBN-13': '9781491962299', 'Title': 'Hands-On Machine Learning With Scikit-Learn And TensorFlow - Techniques And Tools To Build Learning Machines', 'Authors': ['Aurélien Géron'], 'Publisher': "O'Reilly Media", 'Year': '2017', 'Language': 'en'}
    3  9781449355722  {'ISBN-13': '9781449355722', 'Title': 'Learning Python', 'Authors': ['Mark Lutz'], 'Publisher': '', 'Year': '2013', 'Language': 'en'}

    # extract all the dicts in the meta column
    df = df.join(pd.json_normalize(df.meta)).drop(columns=['meta'])

    # extract values from the lists in the Authors column
    df = df.explode('Authors')

    # df
       isbn           ISBN-13        Title                                                                                                         Authors         Publisher                 Year  Language
    0  9780446310789  9780446310789  To Kill A Mockingbird                                                                                         Harper Lee      Grand Central Publishing  1988  en
    1  abc            NaN            NaN                                                                                                           NaN             NaN                       NaN   NaN
    2  9781491962299  9781491962299  Hands-On Machine Learning With Scikit-Learn And TensorFlow - Techniques And Tools To Build Learning Machines  Aurélien Géron  O'Reilly Media            2017  en
    3  9781449355722  9781449355722  Learning Python                                                                                               Mark Lutz                                 2013  en
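
As a complement to the answer above, here is a minimal sketch of the "look up each isbn once, skip the bad ones, and remember which ones were skipped" idea. It uses only the standard library so that it runs offline. The names fetch_all, skip_on, InvalidISBN and fake_meta are hypothetical and are not part of isbnlib or pandas; fake_meta is a stub standing in for isbnlib.meta, and InvalidISBN stands in for isbnlib.NotValidISBNError.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class InvalidISBN(ValueError):
    """Hypothetical stand-in for isbnlib.NotValidISBNError."""


def fetch_all(
    isbns: Iterable[str],
    fetch: Callable[[str], dict],
    skip_on: Tuple[type, ...] = (InvalidISBN,),
) -> Tuple[Dict[str, dict], List[str]]:
    """Call fetch once per distinct isbn; return (results, skipped)."""
    results: Dict[str, dict] = {}
    skipped: List[str] = []
    for isbn in isbns:
        if isbn in results:
            continue  # already looked up, do not call the service again
        try:
            results[isbn] = fetch(isbn) or {}
        except skip_on:
            results[isbn] = {}  # keep the row, but with no metadata
            skipped.append(isbn)
    return results, skipped


# Stub lookup so the sketch runs offline; this is NOT the real service.
calls: List[str] = []


def fake_meta(isbn: str) -> dict:
    calls.append(isbn)
    if not (isbn.isdigit() and len(isbn) == 13):
        raise InvalidISBN(isbn)
    return {"ISBN-13": isbn, "Year": "1988"}


column = ["9780446310789", "abc", "9780446310789", "abc"]
results, skipped = fetch_all(column, fake_meta)
rows = [results[isbn] for isbn in column]  # one dict per original row
```

With the real library, the call would presumably look like fetch_all(df['isbn13'], isbnlib.meta, skip_on=(isbnlib.NotValidISBNError,)), followed by df['isbn13'].map(results) to line the dicts back up with the rows. The skipped list gives you something to inspect afterwards instead of silently losing the invalid values.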

holdtom

It is hard to answer without seeing the code, but try/except should indeed be able to handle this. I am no expert here, but take a look at this code:

    items = [0, 1, "a", 2, 3]
    for item in items:
        try:
            print(item + 1)
        except TypeError:
            print(item, "is not integer")

If you try to do addition with a string, Python hates it and bails out with a TypeError. So you catch the TypeError with except, and possibly report something about it. When I run this code:

    1
    2
    a is not integer  # exception handled!
    3
    4

You should be able to handle the exception with except NotValidISBNError, and then report whatever metadata you like. You can get more sophisticated with exception handling, but this is the basic idea.
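
One possible reason a try/except appears to "just stop" is where it is placed. This is only a guess, since the asker's attempt was not shown. The sketch below contrasts a try around the whole pass with a try around each item. parse_year and safe_parse_year are hypothetical helpers that stand in for a per-row lookup such as isbnlib.meta, and ValueError stands in for NotValidISBNError.

```python
from typing import List, Optional


def parse_year(text: str) -> int:
    """Stand-in for a per-row lookup that can fail on bad input."""
    return int(text)  # raises ValueError for non-numeric text


values = ["1988", "abc", "2017"]

# try around the whole pass: the first failure discards every result
all_or_nothing: Optional[List[int]]
try:
    all_or_nothing = [parse_year(v) for v in values]
except ValueError:
    all_or_nothing = None


# try around each item: bad rows become None, good rows survive
def safe_parse_year(text: str) -> Optional[int]:
    try:
        return parse_year(text)
    except ValueError:
        return None


per_item = [safe_parse_year(v) for v in values]
```

The same reasoning applies to Series.apply: the function passed to apply has to contain the try/except itself, otherwise one invalid value aborts the whole column.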