慕码人8056858
You can use np.where for this:

    wa = 0.2*df.A + 0.4*df.B + 0.2*df.C
    df['new_col'] = np.where(df.isna().any(axis=1), df.mean(axis=1), wa)

Example

    df = pd.DataFrame({'A':[1,2,3],'B':[4,5,6], 'C':[7,8,np.nan]})

       A  B    C
    0  1  4  7.0
    1  2  5  8.0
    2  3  6  NaN

    wa = 0.2*df.A + 0.4*df.B + 0.2*df.C
    df['new_col'] = np.where(df.isna().any(axis=1), df.mean(axis=1), wa)

       A  B    C  new_col
    0  1  4  7.0      3.2
    1  2  5  8.0      4.0
    2  3  6  NaN      4.5

Details

np.where picks either the plain mean or the weighted average for each row, depending on the result of the condition has_nans:

    df.assign(has_nans = df.isna().any(axis=1), mean=df.mean(axis=1), weighted_av = wa)

       A  B    C  new_col  has_nans  mean  weighted_av
    0  1  4  7.0      3.2     False  3.80          3.2
    1  2  5  8.0      4.0     False  4.75          4.0
    2  3  6  NaN      4.5      True  4.50          NaN

Note that the mean column in this last table is computed after new_col was added, so new_col is included in it.
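As a self-contained sketch of the same idea, the snippet below rebuilds the example frame and names the condition explicitly. The variable names `has_nans` and `result` are only illustrative. It also makes one point visible: both candidates (the row mean and the weighted sum) are computed for every row, and np.where merely selects between them element-wise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, np.nan]})

# Weighted sum from the answer; it is NaN wherever any input in the row is NaN.
wa = 0.2*df.A + 0.4*df.B + 0.2*df.C

# Row-wise condition: True where the row contains at least one NaN.
has_nans = df.isna().any(axis=1)

# df.mean skips NaN by default, so it is still defined for incomplete rows.
# Both candidates are evaluated in full; np.where only chooses per row.
result = np.where(has_nans, df.mean(axis=1), wa)
print(result)
```

Because df.mean(axis=1) ignores NaN by default, the last row falls back to the mean of its two available values.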
烙印99
I was about to write basically the same answer as yatu, but tried to make it more efficient.

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'A':[1,2,3], 'B':[4,5,6], 'C':[7,8,np.nan], 'D':[1, np.nan, np.nan]})
    weights = np.array([0.2,0.4,0.2,0.2])
    df["w_avg"] = np.where(df.isnull().any(axis=1), df.mean(axis=1), np.dot(df.values, weights))

The idea is not to compute anything you are not going to use.

On a dummy df, using np.dot instead of computing wa by hand is better in terms of both speed and generality:

    n = 5000
    df = pd.DataFrame({"A":np.random.rand(n), "B": np.random.rand(n), "C":np.random.rand(n), "D":np.random.rand(n)})

    %%timeit
    wa = 0.2*df.A + 0.4*df.B + 0.2*df.C + 0.2*df.D

    735 µs ± 19.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    %%timeit
    wa = np.dot(df.values, weights)

    18.9 µs ± 732 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
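One way to make the np.dot variant a little more general is to key the weights by column name, so the result does not depend on column order or on extra columns in the frame. The helper below, `weighted_avg_with_fallback`, is a hypothetical name and only a sketch under that assumption; it is not part of either answer.

```python
import numpy as np
import pandas as pd

def weighted_avg_with_fallback(df, weights):
    """Hypothetical helper: weighted sum per row, plain mean for rows with NaN.

    `weights` maps column name -> weight, so the column order in `df`
    does not matter and unrelated columns are ignored.
    """
    cols = list(weights)
    w = np.array([weights[c] for c in cols], dtype=float)
    values = df[cols].to_numpy(dtype=float)
    has_nans = np.isnan(values).any(axis=1)
    # values @ w is NaN for rows that contain NaN; those rows take the mean.
    return np.where(has_nans, df[cols].mean(axis=1), values @ w)

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6],
                   'C': [7, 8, np.nan], 'D': [1, np.nan, np.nan]})
df['w_avg'] = weighted_avg_with_fallback(
    df, {'A': 0.2, 'B': 0.4, 'C': 0.2, 'D': 0.2})
print(df)
```

Restricting the computation to the weighted columns also means the call can be repeated safely after `w_avg` has been added to the frame.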