Compare strings in two pandas columns using SequenceMatcher

SequenceMatcher is not designed to work on a pandas Series, so whole columns cannot be passed to it directly. You can .apply the function row by row instead.

SequenceMatcher examples:

With isjunk=None, nothing is considered junk, not even spaces.
With isjunk=lambda y: y == " ", spaces are treated as junk.

    from difflib import SequenceMatcher
    import pandas as pd

    data = {'Text1': ['Performance results achieved by the approaches submitted to this Challenge.',
                      'Accuracy is one of the basic principles of perfectionist.'],
            'All': ['The six top approaches and three others outperform the strong baseline.',
                    'Where am I?']}
    df = pd.DataFrame(data)

    # isjunk=lambda y: y == " "
    # Rows are indexed by column label; positional x[0] on a label-indexed row is deprecated in recent pandas.
    df['ratio'] = df[['Text1', 'All']].apply(lambda x: SequenceMatcher(lambda y: y == " ", x['Text1'], x['All']).ratio(), axis=1)

    # display(df)
    #    Text1                                                                        All                                                                      ratio
    # 0  Performance results achieved by the approaches submitted to this Challenge.  The six top approaches and three others outperform the strong baseline.  0.356164
    # 1  Accuracy is one of the basic principles of perfectionist.                    Where am I?                                                              0.088235

    # isjunk=None
    df['ratio'] = df[['Text1', 'All']].apply(lambda x: SequenceMatcher(None, x['Text1'], x['All']).ratio(), axis=1)

    # display(df)
    #    Text1                                                                        All                                                                      ratio
    # 0  Performance results achieved by the approaches submitted to this Challenge.  The six top approaches and three others outperform the strong baseline.  0.410959
    # 1  Accuracy is one of the basic principles of perfectionist.                    Where am I?                                                              0.117647
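
A row-wise apply is not the only way to run SequenceMatcher over two columns. As a minimal sketch, assuming both columns hold plain strings with no missing values, the pairs can also be walked in parallel with zip. The helper name `similarity`, its `ignore_spaces` flag, and the column names `left` and `right` are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher

import pandas as pd


def similarity(a: str, b: str, ignore_spaces: bool = False) -> float:
    """Return the SequenceMatcher ratio for two strings (hypothetical helper)."""
    # isjunk=None treats no character as junk; the lambda marks spaces as junk.
    isjunk = (lambda ch: ch == " ") if ignore_spaces else None
    return SequenceMatcher(isjunk, a, b).ratio()


# Small illustrative frame; the column names are made up for this sketch.
pairs = pd.DataFrame({"left": ["abcd", "same text"],
                      "right": ["bcde", "same text"]})

# Walk the two columns in parallel instead of using a row-wise apply.
pairs["ratio"] = [similarity(a, b) for a, b in zip(pairs["left"], pairs["right"])]

print(pairs)  # ratios: 0.75 for the first pair, 1.0 for the identical pair
```

Wrapping the call in a named function keeps the isjunk choice in one place, and passing `ignore_spaces=True` switches to the space-as-junk behaviour described above.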
