How to get rid of a TypeError when using the pandas apply function with a lambda expression

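I am writing an intelligent application that uses data from the Contraceptive Method Choice dataset in the UCI Machine Learning Repository to determine which factors lead to 0 children in a relationship. Citation: Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. I am having trouble writing a lambda expression with the pandas apply function.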


I'm not sure what to try.


Here is a sample of the data file:


wife's age, wife's education, husband's education, number of children, wife's religion, wife now working, husband's occupation, standard-of-living index, media exposure, contraceptive method used
24,2,3,3,1,1,2,3,0,1
45,1,3,10,1,1,3,4,0,1
43,2,3,7,1,1,3,4,0,1
42,3,2,9,1,1,3,3,0,1
36,3,3,8,1,1,3,2,0,1
19,4,4,0,1,1,3,3,0,1

Here is my code:


#import modules
import pandas as pd

#define functions
def read_datafile():
    d = pd.read_csv('cmc.data.txt', sep=',')
    return d

def create_bin_label(data):
    data['numchildren'] = data.apply(lambda row: 1 if (row['number of children']) <= 0 else 0, axis=1)
    data = data.drop(['number of children'], axis=1)

#read in datafile
data = read_datafile()
print(len(data))

#create a binary label column and delete the old column
bl = create_bin_label(data)
print(data.head())

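I expect create_bin_label(data) to pick out one value from a numeric attribute: the number of children can be any number, but I only want to flag 0. I also want it to add a column 'numchildren' as a binary label and to delete the old column (called 'number of children'). What create_bin_label(data) actually does is raise an error that looks like this (I think the important part is that some str is being treated as an int, but I'm not sure where this happens):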


Traceback (most recent call last):

  File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\indexes\base.py", line 4381, in get_value

    return libindex.get_value_box(s, key)

  File "pandas\_libs\index.pyx", line 52, in pandas._libs.index.get_value_box

  File "pandas\_libs\index.pyx", line 48, in pandas._libs.index.get_value_at

  File "pandas\_libs\util.pxd", line 113, in pandas._libs.util.get_value_at

  File "pandas\_libs\util.pxd", line 98, in pandas._libs.util.validate_indexer

TypeError: 'str' object cannot be interpreted as an integer


During handling of the above exception, another exception occurred:
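One way to see where the string lookup goes wrong is to inspect the column names pandas actually parsed. The sketch below is a minimal check, assuming the header row of cmc.data.txt looks exactly like the sample above, with a space after each comma. The io.StringIO object is only a stand-in for the real file, and the variable names are illustrative.

```python
import io
import pandas as pd

# Stand-in for cmc.data.txt: the header row and two data rows from the sample.
sample = io.StringIO(
    "wife's age, wife's education, husband's education, number of children, "
    "wife's religion, wife now working, husband's occupation, "
    "standard-of-living index, media exposure, contraceptive method used\n"
    "24,2,3,3,1,1,2,3,0,1\n"
    "19,4,4,0,1,1,3,3,0,1\n"
)

data = pd.read_csv(sample, sep=',')

# repr() makes any leading whitespace in the parsed column names visible.
column_names = [repr(name) for name in data.columns]
print(column_names)

# Compare the key used in the lambda with the name pandas parsed.
has_exact_name = 'number of children' in data.columns
has_spaced_name = ' number of children' in data.columns
print(has_exact_name, has_spaced_name)
```

If the real file has the same header, the key 'number of children' does not match any column because the parsed name starts with a space, which would explain why the label lookup inside the lambda fails.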


www说
1 answer

慕仙森

import pandas as pd

#define functions
def read_datafile():
    d = pd.read_csv('cmc.data.txt', sep=',')
    return d

def create_bin_label(data, columns):
    # I added an extra columns argument that holds a list of all column names;
    # the 'number of children' column is at position 3 in that list
    data['numchildren'] = data.apply(lambda row: 1 if (row[columns[3]]) <= 0 else 0,
                                     axis=1)
    data = data.drop([columns[3]], axis=1)
    # return the frame, otherwise the caller never sees the dropped column
    return data

#read in datafile
data = read_datafile()
print(len(data))
columns = data.columns.values  # this creates the list of the dataframe's column names

#create a binary label column and delete the old column
bl = create_bin_label(data, columns)  # remember to pass the variable that holds the columns
print(bl.head())
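Looking the column up by position works, most likely because the header in the sample has a space after each comma, so pandas parses the name as ' number of children' rather than 'number of children'. An alternative is to strip that space when reading and to build the label with a vectorised comparison instead of a row-wise apply. This is a sketch under those assumptions, not a tested fix for the real file: io.StringIO stands in for cmc.data.txt, and the source parameter on read_datafile is added here only for illustration.

```python
import io
import pandas as pd

# Stand-in for cmc.data.txt, copied from the sample in the question.
sample = io.StringIO(
    "wife's age, wife's education, husband's education, number of children, "
    "wife's religion, wife now working, husband's occupation, "
    "standard-of-living index, media exposure, contraceptive method used\n"
    "24,2,3,3,1,1,2,3,0,1\n"
    "45,1,3,10,1,1,3,4,0,1\n"
    "43,2,3,7,1,1,3,4,0,1\n"
    "42,3,2,9,1,1,3,3,0,1\n"
    "36,3,3,8,1,1,3,2,0,1\n"
    "19,4,4,0,1,1,3,3,0,1\n"
)


def read_datafile(source):
    # skipinitialspace=True drops the blank after each comma, so the header
    # parses as 'number of children' instead of ' number of children'.
    return pd.read_csv(source, sep=',', skipinitialspace=True)


def create_bin_label(data):
    # Vectorised comparison: 1 where there are no children, 0 otherwise.
    label = (data['number of children'] <= 0).astype(int)
    labelled = data.assign(numchildren=label)
    # Return a new frame without the old column; the input is left unchanged.
    return labelled.drop(columns=['number of children'])


data = read_datafile(sample)
bl = create_bin_label(data)
print(bl.head())
```

Returning the new frame matters here: the function in the question assigns the result of drop to a local name and returns nothing, so bl ends up as None and the printed frame still contains the old column.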