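I have a flat file containing terms and sentences. If any term is found in a sentence, I need to append its id to the term in the form term|id. The pattern matching should be case-insensitive, but the matched text must keep the same case it has in the sentence. Is it possible to look up a value in a dictionary, using the matched text as the key, inside the replace call?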
from pandas import DataFrame
import re

data = {
    'id': [11, 12, 13, 14, 15, 16],
    'term': ['Ford', 'EXpensive', 'TOYOTA', 'Mercedes Benz', 'electric', 'cars'],
    'sentence': [
        'F-FORD FORD/FORD is less expensive than Mercedes Benz.',
        'toyota, hyundai mileage is good compared to ford',
        'tesla is an electric-car',
        'toyota too has electric cars',
        'CARS',
        'CArs are expensive.',
    ],
}
# DataFrame creation
df = DataFrame(data, columns=['id', 'term', 'sentence'])
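# Dictionary creation: term -> id (named term_to_id so the built-in dict is not shadowed)
term_to_id = dict(zip(df['term'].astype(str), df['id']))
# Build the pattern: longest terms first, case-insensitive, not preceded by '-' or a word character
pattern = r'(?i)(?<!-)(?<!\w)(?:{})(?!\w)'.format('|'.join(map(re.escape, sorted(df["term"], key=len, reverse=True))))
# Replace
df["sentence"] = df["sentence"].replace(pattern, r"\g<0>|present", regex=True)
Instead of |present, I need to look the match up in the dictionary, something like |term_to_id.get(\g<0>). Or is there another way to achieve this? Also, if a term such as cars is listed twice, with ids 16 and 17, either id may be appended.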
The expected result is:
F-FORD FORD|11/FORD|11 is less expensive|12 than Mercedes Benz|14.
toyota|13, hyundai mileage is good compared to ford|11
tesla is an electric|15-car
toyota|13 too has electric|15 cars|16
CARS|16
CArs|16 are expensive|12.
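One possible way to get this output, offered as a sketch rather than a definitive answer: a replacement string such as \g<0> cannot call a dictionary, but re.sub accepts a function as the replacement and passes it each match object. The example below uses only the standard re module so it runs on its own. The names term_to_id, tag_match and tagged are illustrative choices, and keying the dictionary by the lowercased term is an assumption made to keep the lookup case-insensitive.

```python
import re

# Sample data copied from the question.
ids = [11, 12, 13, 14, 15, 16]
terms = ['Ford', 'EXpensive', 'TOYOTA', 'Mercedes Benz', 'electric', 'cars']
sentences = [
    'F-FORD FORD/FORD is less expensive than Mercedes Benz.',
    'toyota, hyundai mileage is good compared to ford',
    'tesla is an electric-car',
    'toyota too has electric cars',
    'CARS',
    'CArs are expensive.',
]

# Lowercased term -> id. If a term is listed twice, the last id wins.
term_to_id = {term.lower(): term_id for term, term_id in zip(terms, ids)}

# Same lookarounds as the question's pattern; longest terms are tried first.
# Building the alternation from the dict keys collapses duplicate terms.
alternation = '|'.join(map(re.escape, sorted(term_to_id, key=len, reverse=True)))
pattern = re.compile(r'(?<!-)(?<!\w)(?:{})(?!\w)'.format(alternation), re.IGNORECASE)


def tag_match(match):
    """Return the matched text unchanged, followed by '|' and its id."""
    found = match.group(0)
    return '{}|{}'.format(found, term_to_id[found.lower()])


tagged = [pattern.sub(tag_match, sentence) for sentence in sentences]
for line in tagged:
    print(line)
```

On the DataFrame from the question, the same function could be applied with df["sentence"].apply(lambda s: pattern.sub(tag_match, s)). Because the matched text is written back unchanged and only the lookup key is lowercased, the casing in each sentence is preserved.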