How to get rid of a TypeError when using the pandas apply function with a lambda expression

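I am writing an intelligent application that uses the Contraceptive Method Choice dataset from the UCI Machine Learning Repository to determine which factors lead to a relationship having 0 children. Citation: Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. I am having trouble writing a lambda expression with the pandas apply function.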

I'm not sure what to try.

Here is part of the sample file:

wife's age, wife's education, husband's education, number of children, wife's religion, wife now working, husband's occupation, standard-of-living index, media exposure, contraceptive method used
24,2,3,3,1,1,2,3,0,1
45,1,3,10,1,1,3,4,0,1
43,2,3,7,1,1,3,4,0,1
42,3,2,9,1,1,3,3,0,1
36,3,3,8,1,1,3,2,0,1
19,4,4,0,1,1,3,3,0,1
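The header line in this sample has a space after every comma, and pandas keeps those spaces in the column labels, so the fourth label comes out as ' number of children' rather than 'number of children'. A minimal sketch, assuming cmc.data.txt really begins with that header line (an inline string stands in for the file here), showing the parsed labels and one way to normalize them by stripping whitespace after loading:

```python
import io

import pandas as pd

# Stand-in for cmc.data.txt (assumed to start with the header shown above).
sample = (
    "wife's age, wife's education, husband's education, number of children, "
    "wife's religion, wife now working, husband's occupation, "
    "standard-of-living index, media exposure, contraceptive method used\n"
    "24,2,3,3,1,1,2,3,0,1\n"
    "19,4,4,0,1,1,3,3,0,1\n"
)

raw = pd.read_csv(io.StringIO(sample), sep=',')
# repr() makes the leading space in each label visible.
print([repr(c) for c in raw.columns[:4]])

# Strip surrounding whitespace from every label so lookups by name match.
clean = raw.copy()
clean.columns = clean.columns.str.strip()
print('number of children' in clean.columns)
```

Printing repr() of the labels is a quick way to spot invisible whitespace that a plain print of the DataFrame hides.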

Here is my code:

#import modules
import pandas as pd

#define functions
def read_datafile():
    d = pd.read_csv('cmc.data.txt', sep=',')
    return d

def create_bin_label(data):
    data['numchildren'] = data.apply(lambda row: 1 if (row['number of children']) <= 0 else 0, axis=1)
    data = data.drop(['number of children'], axis=1)

#read in datafile
data = read_datafile()
print(len(data))

#create a binary label column and delete the old column
bl = create_bin_label(data)
print(data.head())
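Separately from the error, create_bin_label as posted has no return statement, and data.drop(...) returns a new DataFrame rather than changing the caller's one. So even once the column lookup works, the old column stays in data and bl is None. A minimal sketch on a tiny made-up frame (create_bin_label_as_posted, before and after are illustrative names) comparing the posted shape of the function with one that returns its result:

```python
import pandas as pd


def create_bin_label_as_posted(data):
    # The new column is added to the caller's frame in place...
    data['numchildren'] = data.apply(
        lambda row: 1 if row['number of children'] <= 0 else 0, axis=1)
    # ...but drop() returns a new frame, and this only rebinds the local name.
    data = data.drop(['number of children'], axis=1)


def create_bin_label(data):
    # Same logic, but the frame without the old column is returned.
    data['numchildren'] = data.apply(
        lambda row: 1 if row['number of children'] <= 0 else 0, axis=1)
    return data.drop(['number of children'], axis=1)


before = pd.DataFrame({'number of children': [3, 0, 10]})
result = create_bin_label_as_posted(before)
print(result)                # None
print(list(before.columns))  # the old column is still there

after = create_bin_label(pd.DataFrame({'number of children': [3, 0, 10]}))
print(list(after.columns))
print(after['numchildren'].tolist())
```

With the returning version, the call site becomes data = create_bin_label(data).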

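I want create_bin_label(data) to pick out one value of a numeric attribute: the number of children can be any number, but I only care about 0. I also want it to add the column "numchildren" as a binary label and to delete the old column (called "number of children"). Instead, create_bin_label(data) raises an error like the one below. I think the important part is that some str is being treated as an int, but I'm not sure where that happens.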

Traceback (most recent call last):
  File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\indexes\base.py", line 4381, in get_value
    return libindex.get_value_box(s, key)
  File "pandas\_libs\index.pyx", line 52, in pandas._libs.index.get_value_box
  File "pandas\_libs\index.pyx", line 48, in pandas._libs.index.get_value_at
  File "pandas\_libs\util.pxd", line 113, in pandas._libs.util.get_value_at
  File "pandas\_libs\util.pxd", line 98, in pandas._libs.util.validate_indexer
TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:
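The traceback is cut off at the second exception, but the first one is raised inside pandas' get_value lookup, which suggests row['number of children'] is asking for a label the row may not have. Besides the leading-space issue, another possibility (an assumption about the file, not confirmed by the post) is that cmc.data.txt has no header line at all. In that case read_csv turns the first data row into the column labels. A minimal sketch of that case and of supplying the names explicitly with header=None and names=; the COLUMNS list is copied from the header line in the question:

```python
import io

import pandas as pd

# Column names taken from the header line shown in the question.
COLUMNS = [
    "wife's age", "wife's education", "husband's education",
    "number of children", "wife's religion", "wife now working",
    "husband's occupation", "standard-of-living index", "media exposure",
    "contraceptive method used",
]

# Stand-in for a header-less cmc.data.txt (hypothetical file layout).
rows = "24,2,3,3,1,1,2,3,0,1\n45,1,3,10,1,1,3,4,0,1\n19,4,4,0,1,1,3,3,0,1\n"

# Default header=0: the first data row is consumed as column labels.
wrong = pd.read_csv(io.StringIO(rows), sep=',')
print(list(wrong.columns))  # labels built from "24,2,3,..."
print(len(wrong))           # one row fewer than the file has

# header=None keeps every row as data; names= supplies the labels.
right = pd.read_csv(io.StringIO(rows), sep=',', header=None, names=COLUMNS)
print(right['number of children'].tolist())
```

Checking print(data.columns) right after read_datafile() shows which of these situations applies.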

Asked by www


Answer from 慕仙森:

import pandas as pd

#define functions
def read_datafile():
    d = pd.read_csv('cmc.data.txt', sep=',')
    return d

def create_bin_label(data, columns):
    # I added an extra columns argument that holds a list of all column names;
    # the 'number of children' column is at position 3 in the list
    data['numchildren'] = data.apply(lambda row: 1 if (row[columns[3]]) <= 0 else 0,
                                     axis=1)
    data = data.drop([columns[3]], axis=1)
    return data  # drop() returns a new frame, so hand it back to the caller

#read in datafile
data = read_datafile()
print(len(data))
columns = data.columns.values  # this creates the list of the dataframe's column names

#create a binary label column and delete the old column
data = create_bin_label(data, columns)  # remember to pass the variable that holds the columns
print(data)
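For a binary label like this, apply with axis=1 isn't needed: comparing the whole column at once gives the same 0/1 values without calling a Python lambda per row. A minimal sketch, assuming the column labels have already been cleaned so 'number of children' matches; the col parameter and the tiny frame are illustrative additions. Since only 0 is wanted, it tests == 0, which matches the original <= 0 for a count column with no negative values:

```python
import pandas as pd


def create_bin_label(data, col='number of children'):
    """Return a copy of data with a 0/1 'numchildren' label replacing col."""
    out = data.copy()
    # Vectorized comparison: True where there are no children, cast to 1/0.
    out['numchildren'] = (out[col] == 0).astype(int)
    # drop() returns a new frame, so return it to the caller.
    return out.drop(columns=[col])


data = pd.DataFrame({
    "wife's age": [24, 45, 19],
    'number of children': [3, 10, 0],
})
data = create_bin_label(data)
print(data)
```

Working on a copy keeps the caller's original frame unchanged, which avoids the half-modified state the posted version leaves behind.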
