我有以下 2 个字符串,这实际上意味着相同:
GLOVES: LENGTH: 32 CM MATERIAL: NEOPRENE RUBBER FREE FLOW TEXT: RESISTANT TO WIDE RANGE OF GLOVES, TYPE: CHEMICAL RESISTANT, SIZE: 7, MATERIAL: NEOPRENE RUBBER, STANDARD: BS EN 388/BS EN 374, FFT: RESISTANT TO WIDE RANGE OF CHEMICALS SUCH AS ETHYLENE OXIDE IDEAL FOR LONG TERM HEAVY WORK IN CHEMICAL ENVIRONMENT MANUFACTURER REFERENCES: ORIGINAL_MNFR: POLYCO
和
Neoprene Rubber Chemical Resistant Gloves, Size: 7; Length: 32 cm; Standard: BS EN 388; Resistant to wide range of Chemicals such as Ethylene Oxide. Make: Polyco, Model: Duraprene III or Equivalent
我有 1000 多套这样的套路,手动操作会很恐慌,我尝试按以下方式操作:
package main
import (
"fmt"
"strings"
)
func main() {
var str1 = "Neoprene Rubber Chemical Resistant Gloves, Size: 7; Length: 32 cm; Standard: BS EN 388; Resistant to wide range of Chemicals such as Ethylene Oxide. Make: Polyco, Model: Duraprene III or Equivalent"
var str2 = "GLOVES: LENGTH: 32 CM MATERIAL: NEOPRENE RUBBER FREE FLOW TEXT: RESISTANT TO WIDE RANGE OF GLOVES, TYPE: CHEMICAL RESISTANT, SIZE: 7, MATERIAL: NEOPRENE RUBBER, STANDARD: BS EN 388/BS EN 374, FFT: RESISTANT TO WIDE RANGE OF CHEMICALS SUCH AS ETHYLENE OXIDE IDEAL FOR LONG TERM HEAVY WORK IN CHEMICAL ENVIRONMENT MANUFACTURER REFERENCES: ORIGINAL_MNFR: POLYCO"
cnt := 0
for _, i := range strings.Fields(str1) {
for _, j := range strings.Fields(str2) {
if strings.ToLower(i) == strings.ToLower(j) {
cnt += 1
}
}
}
fmt.Printf("str1 is: %d length, and str2 is: %d length, they have; %d common words.", len(str1), len(str2), cnt)
}
但是匹配度很低,我得到了:
str1 is: 197 length, and str2 is: 358 length, they have; 29 common words.
但是它们之间的距离看起来很长,我得到了:
Distance between str1 and str2: 304
知道如何改进吗?
一只萌萌小番薯
相关分类