
Finding the match percentage between 2 statements

I have the following 2 strings, which effectively mean the same thing:


GLOVES: LENGTH: 32 CM MATERIAL: NEOPRENE RUBBER FREE FLOW TEXT: RESISTANT TO WIDE RANGE OF GLOVES, TYPE: CHEMICAL RESISTANT, SIZE: 7, MATERIAL: NEOPRENE RUBBER, STANDARD: BS EN 388/BS EN 374, FFT: RESISTANT TO WIDE RANGE OF CHEMICALS SUCH AS ETHYLENE OXIDE IDEAL FOR LONG TERM HEAVY WORK IN CHEMICAL ENVIRONMENT MANUFACTURER REFERENCES: ORIGINAL_MNFR: POLYCO


Neoprene Rubber Chemical Resistant Gloves, Size: 7; Length: 32 cm; Standard: BS EN 388; Resistant to wide range of Chemicals such as Ethylene Oxide. Make: Polyco, Model: Duraprene III or Equivalent

I have more than 1000 pairs like this, and comparing them by hand would be a nightmare, so I tried the following:


package main

import (
    "fmt"
    "strings"
)

func main() {
    var str1 = "Neoprene Rubber Chemical Resistant Gloves, Size: 7; Length: 32 cm; Standard: BS EN 388; Resistant to wide range of Chemicals such as Ethylene Oxide. Make: Polyco, Model: Duraprene III or Equivalent"
    var str2 = "GLOVES: LENGTH: 32 CM MATERIAL: NEOPRENE RUBBER FREE FLOW TEXT: RESISTANT TO WIDE RANGE OF GLOVES, TYPE: CHEMICAL RESISTANT, SIZE: 7, MATERIAL: NEOPRENE RUBBER, STANDARD: BS EN 388/BS EN 374, FFT: RESISTANT TO WIDE RANGE OF CHEMICALS SUCH AS ETHYLENE OXIDE IDEAL FOR LONG TERM HEAVY WORK IN CHEMICAL ENVIRONMENT MANUFACTURER REFERENCES: ORIGINAL_MNFR: POLYCO"

    // Count every case-insensitive match between a whitespace-separated
    // token of str1 and a token of str2. Tokens keep their punctuation,
    // and a repeated token is counted once per matching pair.
    cnt := 0
    for _, i := range strings.Fields(str1) {
        for _, j := range strings.Fields(str2) {
            if strings.EqualFold(i, j) {
                cnt++
            }
        }
    }
    // len reports the length of each string in bytes, not in words.
    fmt.Printf("str1 is: %d length, and str2 is: %d length, they have; %d common words.", len(str1), len(str2), cnt)
}

But the match is poor; I got:


str1 is: 197 length, and str2 is: 358 length, they have; 29 common words.



I also measured the distance between them, but it looks very large; I got:


Distance between str1 and str2: 304

Any idea how to improve this?


哆啦的时光机
1 answer

一只萌萌小番薯

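The two strings may describe the same thing, but the algorithms you are using to compare them have no way of knowing that. Levenshtein distance, for example, only measures the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. It works just as well on "The quick brown fox jumped over the lazy gray dog" and "Dlkj adlkjll o824hs aldkj ladhfj adlbcvhiuywe". It has no understanding of vocabulary or grammar. By the same token, no amount of string processing will ever recognize that "The bright red house standing before me" describes the same thing as "In front of me was a shiny, rose-colored abode". You need to look into natural language processing (NLP) algorithms. These are not trivial to use and take some skill. I am not an NLP expert; I would suggest starting with a search for golang nlp and going from there.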
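To make the point about Levenshtein distance concrete, here is a minimal sketch of the textbook dynamic-programming version in Go. The function names levenshtein and min3 are my own choices for illustration, not taken from any particular library, and this is not necessarily the implementation that produced the distance quoted in the question.

```go
package main

import "fmt"

// levenshtein returns the minimum number of single-rune insertions,
// deletions, and substitutions needed to turn a into b.
// (Hypothetical helper written for this illustration.)
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	// prev holds the previous row of the edit-distance table,
	// curr the row currently being filled in.
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // turning "" into the first j runes of b takes j insertions
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i // turning the first i runes of a into "" takes i deletions
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0 // matching runes need no substitution
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev // the finished row becomes the previous row
	}
	return prev[len(rb)]
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

func main() {
	// Unrelated meanings, but only one character apart.
	fmt.Println(levenshtein("house", "mouse")) // prints 1
	// Near-synonyms, yet the character-level distance is larger.
	fmt.Println(levenshtein("house", "abode"))
}
```

Nothing in the function looks at words or meaning: it only compares runes position by position. That is why two differently worded descriptions of the same product can end up hundreds of edits apart.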