pandas apply() raises UnboundLocalError

3 Answers

MYYA

You're missing an else in your function:

    def cluster_name(df):
        if df['cluster'] == 1:
            value = 'A'
        elif df['cluster'] == 2:
            value = 'B'
        elif df['cluster'] == 3:
            value = 'C'
        elif df['cluster'] == 4:
            value = 'D'
        elif df['cluster'] == 5:
            value = 'E'
        elif df['cluster'] == 6:
            value = 'F'
        elif df['cluster'] == 7:
            value = 'G'
        else:
            value = ...
        return value

Otherwise, if df['cluster'] is not one of the values {1, 2, ..., 7}, value is never assigned, and return value raises an exception.
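A minimal reproduction of the failure and the fix, trimmed to two clusters for brevity. The function names, the sample frame, and the "Z" fallback are illustrative assumptions, not part of the question:

```python
import pandas as pd

def cluster_name_no_else(row):
    # Hypothetical trimmed version of the question's function: no final else
    if row['cluster'] == 1:
        value = 'A'
    elif row['cluster'] == 2:
        value = 'B'
    return value  # value is unbound when cluster is neither 1 nor 2

def cluster_name_with_else(row):
    if row['cluster'] == 1:
        value = 'A'
    elif row['cluster'] == 2:
        value = 'B'
    else:
        value = 'Z'  # assumed placeholder for "no matching cluster"
    return value

df = pd.DataFrame({'cluster': [1, 2, 3]})

# Cluster 3 matches no branch, so the first function raises
try:
    df.apply(cluster_name_no_else, axis=1)
except UnboundLocalError as exc:
    print(type(exc).__name__, exc)

# With the else branch every row gets a value
df['cluster_name'] = df.apply(cluster_name_with_else, axis=1)
print(df['cluster_name'].tolist())  # ['A', 'B', 'Z']
```

The exact wording of the UnboundLocalError message differs between Python versions, but the exception type is the same.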

catspeake

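Hand-writing an if-else function is overrated, and it's easy to miss a condition. Since you're assigning letters as 'cluster_name', use string.ascii_uppercase to get all of the uppercase letters, zip them with the unique values in 'cluster', build a dict from the zipped pairs, and use .map to create the 'cluster_name' column.

This implementation builds the mapping from the unique values in the column itself, so you won't get "local variable 'value' referenced before assignment". In your case, the error happened because return value was executed for a value in the column that didn't match any of your if-else conditions, which means value was never assigned inside the function.

    import pandas as pd
    import string

    # test dataframe
    df = pd.DataFrame({'cluster': range(1, 11)})

    # unique values from the cluster column
    clusters = sorted(df.cluster.unique())

    # create a dict to map
    cluster_map = dict(zip(clusters, string.ascii_uppercase))

    # create the cluster_name column
    df['cluster_name'] = df.cluster.map(cluster_map)

    # df
    #    cluster cluster_name
    # 0        1            A
    # 1        2            B
    # 2        3            C
    # 3        4            D
    # 4        5            E
    # 5        6            F
    # 6        7            G
    # 7        8            H
    # 8        9            I
    # 9       10            J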
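One caveat worth checking: zip stops at the shorter of its inputs, and string.ascii_uppercase has only 26 letters. With more than 26 distinct clusters, the extra ones get no dict entry and map leaves NaN for them. A quick check, assuming a made-up frame with 30 cluster ids:

```python
import string
import pandas as pd

# 30 distinct cluster ids: more than the 26 letters in string.ascii_uppercase
df = pd.DataFrame({'cluster': range(1, 31)})

clusters = sorted(df['cluster'].unique())
cluster_map = dict(zip(clusters, string.ascii_uppercase))  # zip stops after 26 pairs

df['cluster_name'] = df['cluster'].map(cluster_map)

print(len(cluster_map))                 # 26
print(df['cluster_name'].isna().sum())  # 4 (clusters 27-30 have no letter)
```

If that many clusters are possible, the NaNs need a catch-all (for example fillna) or a naming scheme with more than 26 labels.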

白衣染霜花

It looks like your question has already been answered in the comments, so I'll suggest a more pandas-oriented approach. Using apply(axis=1) on a DataFrame is very slow and almost never necessary (it's equivalent to iterating over the rows of the DataFrame), so a better option is a vectorized method. The simplest is to define the cluster -> cluster_name mapping in a dictionary and use map:

    import pandas as pd

    df = pd.DataFrame(
        {"cluster": [1,2,3,4,5,6,7]}
    )

    # repeat this dataframe 10000 times
    df = pd.concat([df] * 10000)

apply method:

    def mapping_func(row):
        if row['cluster'] == 1:
            value = 'A'
        elif row['cluster'] == 2:
            value = 'B'
        elif row['cluster'] == 3:
            value = 'C'
        elif row['cluster'] == 4:
            value = 'D'
        elif row['cluster'] == 5:
            value = 'E'
        elif row['cluster'] == 6:
            value = 'F'
        elif row['cluster'] == 7:
            value = 'G'
        else:
            # This is a "catch-all" for any value in the column that is not 1-7
            value = "Z"

        return value

    # %timeit is an IPython/Jupyter magic command
    %timeit df.apply(mapping_func, axis=1)
    # 1.32 s ± 91.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

.map method:

    mapping_dict = {
        1: "A",
        2: "B",
        3: "C",
        4: "D",
        5: "E",
        6: "F",
        7: "G"
    }

    # the `fillna` is our "catch-all" statement.
    #  essentially if `map` encounters a value not in the dictionary
    #  it will place a NaN there. So I fill those NaNs with "Z" to
    #  be consistent with the above example
    %timeit df["cluster"].map(mapping_dict).fillna("Z")
    # 4.87 ms ± 195 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

We can see that the map-with-a-dictionary approach is much faster than apply (4.87 ms versus 1.32 s per loop here), and it also avoids the long chain of if/elif statements.
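As a sanity check that the two approaches produce the same labels, including the "Z" catch-all, here is a small comparison. The sample data with out-of-range ids 0 and 8 is an assumption chosen to exercise the catch-all; the function and dict mirror the answer's:

```python
import pandas as pd

def mapping_func(row):
    # Row-wise if/elif chain, as in the apply method; "Z" is the catch-all
    if row['cluster'] == 1:
        value = 'A'
    elif row['cluster'] == 2:
        value = 'B'
    elif row['cluster'] == 3:
        value = 'C'
    elif row['cluster'] == 4:
        value = 'D'
    elif row['cluster'] == 5:
        value = 'E'
    elif row['cluster'] == 6:
        value = 'F'
    elif row['cluster'] == 7:
        value = 'G'
    else:
        value = 'Z'
    return value

mapping_dict = {1: 'A', 2: 'B', 3: 'C', 4: 'D', 5: 'E', 6: 'F', 7: 'G'}

# Sample data including ids outside 1-7 (0 and 8)
df = pd.DataFrame({'cluster': [0, 1, 2, 3, 4, 5, 6, 7, 8]})

via_apply = df.apply(mapping_func, axis=1)
via_map = df['cluster'].map(mapping_dict).fillna('Z')

# Compare as plain lists so differences in dtype or Series name don't matter
print(via_apply.tolist() == via_map.tolist())  # True
print(via_map.tolist())  # ['Z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'Z']
```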