Pandas: which threshold applies to each row?

Given a column of scores, e.g.,


scores = pd.DataFrame({"score":np.random.randn(10)})

and thresholds


thresholds = pd.DataFrame({"threshold":[0.2,0.5,0.8]},index=[7,13,33])

I would like to find the applicable threshold for each score, e.g.,


      score  threshold
0 -1.613293        NaN
1 -1.357980        NaN
2  0.325720          7
3  0.116000        NaN
4  1.423171         33
5  0.282557          7
6 -1.195269        NaN
7  0.395739          7
8  1.072041         33
9  0.197853        NaN

IOW, for each score s, I want the threshold t such that


t = max(t: thresholds.threshold[t] < s)

How do I do that?


PS. Based on a deleted answer:


pd.cut(scores.score, bins=[-np.inf]+list(thresholds.threshold)+[np.inf],
       labels=["low"]+list(thresholds.index))
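A possible variant of the pd.cut idea, offered only as a sketch: if the -np.inf bin and the "low" label are dropped, scores below the first threshold fall outside every bin and pd.cut returns NaN for them, which is closer to the table above. The scores here are illustrative (a few from the table plus a made-up 0.6 so that label 13 appears), and the names bins and applicable are my own.

```python
import numpy as np
import pandas as pd

# Illustrative scores: four from the question's table plus a made-up 0.6.
scores = pd.DataFrame({"score": [-1.613293, 0.325720, 0.116000, 1.423171, 0.6]})
thresholds = pd.DataFrame({"threshold": [0.2, 0.5, 0.8]}, index=[7, 13, 33])

# Bins are (0.2, 0.5], (0.5, 0.8], (0.8, inf]; each is labelled with the
# index of its lower threshold. Scores <= 0.2 match no bin and become NaN.
bins = list(thresholds["threshold"]) + [np.inf]
applicable = pd.cut(scores["score"], bins=bins, labels=list(thresholds.index))

print(applicable)
```

The result is a categorical Series, so it may need a conversion if numeric labels are wanted downstream.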


哆啦的时光机
3 answers

梵蒂冈之花

You can do this with np.digitize:

indices = [None] + thresholds.index.tolist()
scores["score"].apply(
    lambda x: indices[np.digitize(x, thresholds["threshold"])])
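For larger frames the per-row apply can probably be avoided, since np.digitize accepts a whole array. A minimal sketch, assuming the scores from the question's example table (the names labels and positions are my own). It passes right=True so that a score exactly equal to a threshold does not count as above it, matching the strict < in the question:

```python
import numpy as np
import pandas as pd

# Scores copied from the example table in the question.
scores = pd.DataFrame({"score": [-1.613293, -1.357980, 0.325720, 0.116000,
                                 1.423171, 0.282557, -1.195269, 0.395739,
                                 1.072041, 0.197853]})
thresholds = pd.DataFrame({"threshold": [0.2, 0.5, 0.8]}, index=[7, 13, 33])

# Slot 0 is NaN for scores below every threshold; slots 1..n hold the labels.
labels = np.array([np.nan] + thresholds.index.tolist())

# With increasing bins and right=True, digitize returns i such that
# bins[i-1] < x <= bins[i], i.e. the number of thresholds strictly below x.
positions = np.digitize(scores["score"].to_numpy(),
                        thresholds["threshold"].to_numpy(), right=True)
scores["threshold"] = labels[positions]

print(scores)
```

The threshold column comes out as floats because NaN forces a float dtype.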

肥皂起泡泡

You can get exactly this result with merge_asof plus a little reshaping:

(pd.merge_asof(scores.reset_index().sort_values('score'),
               thresholds.reset_index(),
               left_on='score', right_on='threshold', suffixes=('', '_'))
   .drop(columns='threshold')
   .rename(columns={'index_': 'threshold'})
   .set_index('index')
   .sort_index())

With your data, this gives:

          score  threshold
index
0     -1.613293        NaN
1     -1.357980        NaN
2      0.325720        7.0
3      0.116000        NaN
4      1.423171       33.0
5      0.282557        7.0
6     -1.195269        NaN
7      0.395739        7.0
8      1.072041       33.0
9      0.197853        NaN
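One detail that may matter: by default merge_asof also matches a score that is exactly equal to a threshold, whereas the question asks for thresholds strictly below the score. A small sketch of the difference using the allow_exact_matches parameter; the scores are hypothetical, picked so that two of them sit exactly on a threshold, and the helper applicable and the column name label are my own.

```python
import pandas as pd

thresholds = pd.DataFrame({"threshold": [0.2, 0.5, 0.8]}, index=[7, 13, 33])
# Hypothetical scores; 0.2 and 0.5 land exactly on a threshold.
scores = pd.DataFrame({"score": [0.1, 0.2, 0.35, 0.5, 0.9]})

# Expose the threshold index as a regular column named "label".
right = thresholds.rename_axis("label").reset_index()

def applicable(allow_exact):
    """Return the threshold label per score, in the original row order."""
    merged = pd.merge_asof(
        scores.reset_index().sort_values("score"),
        right,
        left_on="score", right_on="threshold",
        allow_exact_matches=allow_exact,
    )
    return merged.set_index("index").sort_index()["label"]

inclusive = applicable(True)   # largest threshold <= score (the default)
strict = applicable(False)     # largest threshold < score

print(pd.DataFrame({"score": scores["score"],
                    "inclusive": inclusive, "strict": strict}))
```

With random normal scores an exact tie is very unlikely, so both settings should agree on the question's data; the distinction only shows up with rounded or discrete scores.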