
pandas apply() raises UnboundLocalError

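I have a DataFrame (df_cluster) with 2 columns [customer ID, cluster]. There are about 13 clusters, and I am trying to assign a name to each cluster using apply() in Python. I have used the same function before and it worked fine, but now I get an "UnboundLocalError".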


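Please let me know if I am doing something wrong. My understanding of apply() is that it applies a function along an axis (in this case, the function cluster_name is called once for each row).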


Here is the code:


def cluster_name(df):
    if df['cluster'] == 1:
        value = 'A'
    elif df['cluster'] == 2:
        value = 'B'
    elif df['cluster'] == 3:
        value = 'C'
    elif df['cluster'] == 4:
        value = 'D'
    elif df['cluster'] == 5:
        value = 'E'
    elif df['cluster'] == 6:
        value = 'F'
    elif df['cluster'] == 7:
        value = 'G'
    return value

df_cluster['cluster_name'] = df_cluster.apply(cluster_name, axis = 1)

The error:


UnboundLocalError                         Traceback (most recent call last)
<ipython-input-16-b64f3fdc1260> in <module>
     16     return value
     17 
---> 18 df_cluster['cluster_name'] = df_cluster.apply(cluster_name, axis = 1)
     19 df_cluster['cluster_name'].value_counts()

/opt/cloudera/parcels/Anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
   6926             kwds=kwds,
   6927         )
-> 6928         return op.get_result()
   6929 
   6930     def applymap(self, func):

/opt/cloudera/parcels/Anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/apply.py in get_result(self)
    184             return self.apply_raw()
    185 
--> 186         return self.apply_standard()
    187 
    188     def apply_empty_result(self):



翻阅古今
3 Answers

MYYA

Your function is missing an else:

def cluster_name(df):
    if df['cluster'] == 1:
        value = 'A'
    elif df['cluster'] == 2:
        value = 'B'
    elif df['cluster'] == 3:
        value = 'C'
    elif df['cluster'] == 4:
        value = 'D'
    elif df['cluster'] == 5:
        value = 'E'
    elif df['cluster'] == 6:
        value = 'F'
    elif df['cluster'] == 7:
        value = 'G'
    else:
        value = ...
    return value

Without it, value is never assigned when df['cluster'] is not one of the values {1, 2, ..., 7}, and the exception is raised at return value.
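To make the failure concrete, here is a minimal sketch that reproduces it on a toy frame. The frame df_demo, its customer_id values, and the two-branch version of cluster_name are made up for illustration; the point is only that a row whose cluster matches no branch reaches return value with value unbound. The question mentions about 13 clusters while the function handles only 1 through 7, so checking which values are unhandled is probably the quickest diagnosis.

```python
import pandas as pd

def cluster_name(row):
    # Same shape as the function in the question, shortened to two branches.
    if row['cluster'] == 1:
        value = 'A'
    elif row['cluster'] == 2:
        value = 'B'
    return value  # 'value' is unbound here when no branch matched

# Hypothetical data: cluster 8 is not covered by any branch.
df_demo = pd.DataFrame({'customer_id': [101, 102, 103], 'cluster': [1, 2, 8]})

try:
    df_demo.apply(cluster_name, axis=1)
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

# List the cluster values that no branch handles.
handled = [1, 2]
unhandled = sorted(df_demo.loc[~df_demo['cluster'].isin(handled), 'cluster'].unique().tolist())

print(error_name)
print(unhandled)
```

Running the same isin check on df_cluster with handled = [1, 2, 3, 4, 5, 6, 7] should show which cluster values need their own branch or a fallback.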

catspeake

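Hand-writing an if-else function is overrated, and it is easy to miss a condition.

Since you are assigning letters as 'cluster_name', use string.ascii_uppercase to get a string of all the uppercase letters, zip them with the unique values in 'cluster', create a dict from the zipped values, and .map it to create the 'cluster_name' column. Because zip stops at the shorter input, this covers at most 26 unique clusters.

This implementation builds the mapping from the unique values in the column, so "local variable 'value' referenced before assignment" cannot occur.

In your case, the error occurs because return value is executed when the column contains a value that matches none of your if-else conditions, which means value was never assigned in the function.

import pandas as pd
import string

# test dataframe
df = pd.DataFrame({'cluster': range(1, 11)})

# unique values from the cluster column
clusters = sorted(df.cluster.unique())

# create a dict to map
cluster_map = dict(zip(clusters, string.ascii_uppercase))

# create the cluster_name column
df['cluster_name'] = df.cluster.map(cluster_map)

# df
   cluster cluster_name
0        1            A
1        2            B
2        3            C
3        4            D
4        5            E
5        6            F
6        7            G
7        8            H
8        9            I
9       10            J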
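One thing to keep in mind with the zip approach: letters are handed out by rank among the values that are actually present, not by cluster number. The sketch below, on a made-up frame df_demo with gaps in the cluster numbers, contrasts that with a mapping tied to the cluster number itself. The name fixed_map and the range 1 to 13 are assumptions based on the "about 13 clusters" in the question.

```python
import string
import pandas as pd

# Hypothetical data with gaps: clusters 2, 4, 5, 6, ... are absent.
df_demo = pd.DataFrame({'cluster': [3, 1, 13, 7, 3, 1]})

# Rank-based mapping, as in the answer above: 1 -> A, 3 -> B, 7 -> C, 13 -> D.
clusters = sorted(df_demo['cluster'].unique())
rank_map = dict(zip(clusters, string.ascii_uppercase))
df_demo['name_by_rank'] = df_demo['cluster'].map(rank_map)

# Number-based mapping (assumed range 1..13): 1 -> A, 2 -> B, ..., 13 -> M.
fixed_map = {n: string.ascii_uppercase[n - 1] for n in range(1, 14)}
df_demo['name_by_number'] = df_demo['cluster'].map(fixed_map)

print(df_demo)
```

If the names must match the question's original scheme (1 is 'A', 2 is 'B', and so on) regardless of which clusters appear in the data, the number-based dictionary is the safer choice.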

白衣染霜花

It looks like your question has already been answered in the comments, so I will suggest a more pandas-oriented way to solve your problem. Using apply(axis=1) on a DataFrame is very slow and almost never necessary (it is the same as iterating over the rows of the DataFrame), so a better approach is to use a vectorized method. The simplest one is to define the cluster -> cluster_name mapping in a dictionary and use map.

Test data:

import pandas as pd

df = pd.DataFrame(
    {"cluster": [1,2,3,4,5,6,7]}
)

# repeat this dataframe 10000 times
df = pd.concat([df] * 10000)

The apply method:

def mapping_func(row):
    if row['cluster'] == 1:
        value = 'A'
    elif row['cluster'] == 2:
        value = 'B'
    elif row['cluster'] == 3:
        value = 'C'
    elif row['cluster'] == 4:
        value = 'D'
    elif row['cluster'] == 5:
        value = 'E'
    elif row['cluster'] == 6:
        value = 'F'
    elif row['cluster'] == 7:
        value = 'G'
    else:
        # This is a "catch-all" in case none of the values in the column are 1-7
        value = "Z"
    return value

%timeit df.apply(mapping_func, axis=1)
# 1.32 s ± 91.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The .map method:

mapping_dict = {
    1: "A",
    2: "B",
    3: "C",
    4: "D",
    5: "E",
    6: "F",
    7: "G"
}

# the `fillna` is our "catch-all" statement.
#  essentially if `map` encounters a value not in the dictionary
#  it will place a NaN there. So I fill those NaNs with "Z" to
#  be consistent with the above example
%timeit df["cluster"].map(mapping_dict).fillna("Z")
# 4.87 ms ± 195 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

We can see that map with a dictionary is much faster than apply, and it also avoids the long chain of if/elif statements.
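As a sanity check that the two approaches agree, here is a small sketch on a made-up frame df_demo that includes one value (9) outside the dictionary. The apply version is shortened to a dict.get lookup with a "Z" default, which is assumed to be equivalent to the if/elif chain above; it is not the timed code itself.

```python
import pandas as pd

mapping_dict = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G"}

def mapping_func(row):
    # Row-wise lookup with "Z" as the catch-all, standing in for the if/elif chain.
    return mapping_dict.get(row["cluster"], "Z")

# Hypothetical data: 9 is not a key in mapping_dict.
df_demo = pd.DataFrame({"cluster": [1, 7, 4, 9, 2]})

via_apply = df_demo.apply(mapping_func, axis=1)
via_map = df_demo["cluster"].map(mapping_dict).fillna("Z")

# Both routes should label the rows identically, including the fallback.
print(via_apply.tolist() == via_map.tolist())
```

Unlike the original function, neither route can leave a name unassigned: an unknown cluster becomes NaN under map and is then replaced by the fallback.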