I have a DataFrame:
df =
original_title title
Mexico Oil Gas Summit
Mexico Oil Gas Summit
I need to fuzzy-match the entries of these two columns (original_title and title) and get a score. Here is my code:
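import pandas as pd
from fuzzywuzzy import fuzz  # assumption: `fuzz` comes from fuzzywuzzy (thefuzz exposes the same functions)
# Cartesian product: every original_title paired with every title
compare = pd.MultiIndex.from_product([df['original_title'], df['title']]).to_series()
def metrics(tup):
    return pd.Series([fuzz.partial_ratio(*tup), fuzz.token_sort_ratio(*tup)], ['partial', 'token'])
compare.apply(metrics)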
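The code above compares each original title against the entire title column. Instead, I want it to compare each original title only with the title in the same row. My expected result is: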
df =
original_title              title              partial_ratio
Mexico Oil                  Africa Oil         81
French Property Exhibition  French             100
French Exhibition           French Exhibition  100
Any help is appreciated. Thanks.
芜湖不芜