How to check whether a similar substring occurs in a string, with a custom tolerance level

How can I check whether a substring occurs in a string within a given edit-distance tolerance? For example:


str = 'Python is a multi-paradigm, dynamically typed, multipurpose programming language, designed to be quick (to learn, to use, and to understand), and to enforce a clean and uniform syntax.'
substr1 = 'ython'
substr2 = 'thon'
substr3 = 'cython'
edit_distance_tolerance = 1

substr_in_str(str, substr1, edit_distance_tolerance)
>> True

substr_in_str(str, substr2, edit_distance_tolerance)
>> False

substr_in_str(str, substr3, edit_distance_tolerance)
>> True

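What I tried: I split the string into words, stripped the special characters, and compared the words against the substring one by one, but the result was not good in either speed or accuracy.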


浮云间
2 answers

阿波罗的战车

Here is a recursive solution I came up with; I hope it is correct:

def substr_in_str_word(string, substr, edit_distance_tolerance):
    # The edit budget has been overspent.
    if edit_distance_tolerance < 0:
        return False
    # All of substr has been consumed (any remaining characters of string are ignored).
    if len(substr) == 0:
        return True
    # string is exhausted but substr is not.
    if len(string) == 0:
        return False
    if string[0] == substr[0]:
        # First characters match: advance both at no cost.
        return substr_in_str_word(string[1:], substr[1:], edit_distance_tolerance)
    # Otherwise spend one edit: substitute a character,
    # skip a character of string, or skip a character of substr.
    return (substr_in_str_word(string[1:], substr[1:], edit_distance_tolerance - 1)
            or substr_in_str_word(string[1:], substr, edit_distance_tolerance - 1)
            or substr_in_str_word(string, substr[1:], edit_distance_tolerance - 1))

def substr_in_str(string, substr, edit_distance_tolerance):
    # Check each space-separated word on its own.
    for word in string.split(' '):
        if substr_in_str_word(word, substr, edit_distance_tolerance):
            return True
    return False

Test:

str = 'Python is a multi-paradigm'
substr1 = 'ython'
substr2 = 'thon'
substr3 = 'cython'
edit_distance_tolerance = 1
print(substr_in_str(str, substr1, edit_distance_tolerance))
print(substr_in_str(str, substr2, edit_distance_tolerance))
print(substr_in_str(str, substr3, edit_distance_tolerance))

Output:

True
False
True
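For comparison, the same word-by-word idea can also be written without recursion, using the standard dynamic-programming Levenshtein distance. This is only a sketch under a few assumptions: the names `levenshtein` and `word_within_tolerance` are made up for illustration, words are extracted with the regular expression `\w+` (so `multi-paradigm` becomes two words), and each word is compared against the substring as a whole, which is stricter than matching a prefix of the word.

```python
import re


def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def word_within_tolerance(text, substr, tolerance):
    """True if some word of text is within `tolerance` edits of substr."""
    # Assumption: a "word" is a maximal run of \w characters.
    return any(levenshtein(word, substr) <= tolerance
               for word in re.findall(r"\w+", text))


text = 'Python is a multi-paradigm'
print(word_within_tolerance(text, 'ython', 1))   # True
print(word_within_tolerance(text, 'thon', 1))    # False
print(word_within_tolerance(text, 'cython', 1))  # True
```

Each word costs O(len(word) * len(substr)) time, so the running time stays predictable, whereas a branching recursion can grow quickly as the tolerance increases.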

阿晨1998

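The answer is not as simple as you might think. Doing this properly takes a fair amount of math, and the standard re (regex) library cannot do approximate matching. I think the TRE library has largely solved this problem; see https://github.com/laurikari/tre/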
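To give a rough idea of the math involved, here is a minimal sketch of a classic dynamic-programming technique for approximate substring matching (often attributed to Sellers). It is the ordinary edit-distance table, except that the first row is all zeros so that a match may start anywhere in the text. This is not a description of how TRE works internally, and the names `min_substring_distance` and `fuzzy_contains` are hypothetical. Note also that it uses true substring semantics: `'thon'` occurs exactly inside `'Python'`, so it matches, unlike the `False` expected in the question.

```python
def min_substring_distance(text, pattern):
    """Smallest edit distance between pattern and any substring of text."""
    # prev[j]: best distance between the processed prefix of pattern and
    # some substring of text ending at position j.
    # Row 0 is all zeros: an empty pattern matches anywhere at no cost.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, 1):
        cur = [i]  # pattern[:i] against empty text costs i edits
        for j, tc in enumerate(text, 1):
            cur.append(min(prev[j] + 1,                # drop pc from the pattern
                           cur[j - 1] + 1,             # absorb an extra text character
                           prev[j - 1] + (pc != tc)))  # substitute or keep
        prev = cur
    # The match may end anywhere in the text.
    return min(prev)


def fuzzy_contains(text, pattern, tolerance):
    """True if some substring of text is within `tolerance` edits of pattern."""
    return min_substring_distance(text, pattern) <= tolerance


text = 'Python is a multi-paradigm'
print(fuzzy_contains(text, 'cython', 1))  # True
print(fuzzy_contains(text, 'thon', 1))    # True (exact substring)
print(fuzzy_contains(text, 'xyz', 1))     # False
```

The whole scan takes O(len(text) * len(pattern)) time and needs no word splitting, so punctuation and word boundaries do not have to be handled separately.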