
When replacing in Python, can we reference a dictionary to get a value from its key?

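I have a flat file containing terms and sentences. If any term is found in a sentence, I need to append its id to the term (term|id). The pattern matching should be case-insensitive, and the matched text must keep the same casing it has in the sentence. Is it possible to reference a dictionary by key inside the replace call to get the value?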


from pandas import DataFrame
import re

df = {'id': [11, 12, 13, 14, 15, 16],
      'term': ['Ford', 'EXpensive', 'TOYOTA', 'Mercedes Benz', 'electric', 'cars'],
      'sentence': ['F-FORD FORD/FORD is less expensive than Mercedes Benz.', 'toyota, hyundai mileage is good compared to ford', 'tesla is an electric-car', 'toyota too has electric cars', 'CARS', 'CArs are expensive.']
      }

#Dataframe creation
df = DataFrame(df, columns=['id', 'term', 'sentence'])

#Dictionary creation
dict = {}
l_term = list(df['term'])
l_id = list(df['id'])
for i, j in zip(l_term, l_id):
    dict[str(i)] = j

#Building patterns to replace
pattern = r'(?i)(?<!-)(?<!\w)(?:{})(?!\w)'.format('|'.join(map(re.escape, sorted(df["term"], key=len, reverse=True))))

#Replace (assign the result back instead of using inplace=True on a column)
df["sentence"] = df["sentence"].replace(pattern, r"\g<0>|present", regex=True)
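The pattern built above does most of the work. The terms are sorted longest-first so that a longer term (such as the two-word "Mercedes Benz") is tried before any shorter term that could match at the same position, and the lookarounds restrict matches to whole words that are not preceded by a hyphen. A small sketch using only the standard re module and the question's terms shows which substrings the pattern matches; the names terms and alternation are illustrative, not from the question.

```python
import re

# Terms copied from the question's DataFrame
terms = ['Ford', 'EXpensive', 'TOYOTA', 'Mercedes Benz', 'electric', 'cars']

# Same construction as in the question: escape each term and join them
# longest-first into one alternation
alternation = '|'.join(map(re.escape, sorted(terms, key=len, reverse=True)))

# (?i)     case-insensitive matching
# (?<!-)   not preceded by a hyphen (skips the FORD in "F-FORD")
# (?<!\w)  not preceded by a word character (match starts at a word boundary)
# (?!\w)   not followed by a word character (match ends at a word boundary)
pattern = r'(?i)(?<!-)(?<!\w)(?:{})(?!\w)'.format(alternation)

print(re.findall(pattern, 'F-FORD FORD/FORD is less expensive than Mercedes Benz.'))
# ['FORD', 'FORD', 'expensive', 'Mercedes Benz']
print(re.findall(pattern, 'tesla is an electric-car'))
# ['electric']
```

Note that the hyphen after "electric" does not block the match: (?!\w) only rejects a following letter, digit, or underscore.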


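Instead of |present, I need to reference the dictionary, something like |dict.get(\g<0>). Or is there another way to achieve this? Also, if "cars" is found twice, for 16 and 17, appending either id is fine.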


The expected result is:


F-FORD FORD|11/FORD|11 is less expensive|12 than Mercedes Benz|14.

toyota|13, hyundai mileage is good compared to ford|11

tesla is an electric|15-car

toyota|13 too has electric|15 cars|16

CARS|16

CArs|16 are expensive|12.
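The core of the question does not depend on pandas: in Python's re module, sub accepts a function in place of a replacement string. The function receives each Match object and returns the replacement text, so it can look the matched term up in a dictionary. A minimal sketch, assuming the dictionary is keyed by upper-cased terms (term_ids and tag_term are hypothetical names) and using the re.IGNORECASE flag, which is equivalent to the inline (?i):

```python
import re

# Hypothetical lookup table: the question's terms, upper-cased, mapped to ids.
# Upper-casing the keys lets a match in any casing find the same id.
term_ids = {'FORD': 11, 'EXPENSIVE': 12, 'TOYOTA': 13,
            'MERCEDES BENZ': 14, 'ELECTRIC': 15, 'CARS': 16}

# Same pattern as in the question, with re.IGNORECASE instead of inline (?i)
alternation = '|'.join(map(re.escape, sorted(term_ids, key=len, reverse=True)))
pattern = re.compile(r'(?<!-)(?<!\w)(?:{})(?!\w)'.format(alternation),
                     re.IGNORECASE)


def tag_term(match):
    """Replacement callback: the matched text unchanged, plus '|' and its id."""
    text = match.group(0)  # casing exactly as it appears in the sentence
    return '{}|{}'.format(text, term_ids[text.upper()])


sentences = ['F-FORD FORD/FORD is less expensive than Mercedes Benz.',
             'toyota, hyundai mileage is good compared to ford',
             'tesla is an electric-car',
             'toyota too has electric cars',
             'CARS',
             'CArs are expensive.']

tagged = [pattern.sub(tag_term, s) for s in sentences]
for line in tagged:
    print(line)
```

Run on the six sentences, this prints the expected result listed above. A replacement string such as r"\g<0>|..." cannot call a function, which is why the callable form is needed here.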


紫衣仙女

牛魔王的故事

You can modify your current code slightly:

from pandas import DataFrame
import re

df = {'id': [11, 12, 13, 14, 15, 16],
      'term': ['Ford', 'EXpensive', 'TOYOTA', 'Mercedes Benz', 'electric', 'cars'],
      'sentence': ['F-FORD FORD/FORD is less expensive than Mercedes Benz.', 'toyota, hyundai mileage is good compared to ford', 'tesla is an electric-car', 'toyota too has electric cars', 'CARS', 'CArs are expensive.']
      }

#Dataframe creation
df = DataFrame(df, columns=['id', 'term', 'sentence'])

#Dictionary creation
dct = {}
l_term = list(df['term'])
l_id = list(df['id'])
for i, j in zip(l_term, l_id):
    dct[str(i).upper()] = j

#Building patterns to replace
pattern = r'(?i)(?<!-)(?<!\w)(?:{})(?!\w)'.format('|'.join(map(re.escape, sorted(df["term"], key=len, reverse=True))))

#Replace
df["sentence"] = df["sentence"].str.replace(
    pattern,
    lambda x: "{}|{}".format(x.group(), dct[x.group().upper()]),
    regex=True)

Notes:

- dict is the name of a built-in type, so do not use it as a variable name; doing so shadows the built-in. Use dct instead.
- dct[str(i).upper()] = j adds the keys to the dictionary in upper case. Because every match is also upper-cased before the lookup, the dictionary lookup becomes case-insensitive.
- The last line, df["sentence"] = df["sentence"].str.replace(...), is the main one. It uses Series.str.replace, which accepts a callable as the replacement argument. Each time the pattern matches, the match is passed to the lambda as a Match object x; the lambda retrieves the id with dct[x.group().upper()] and the whole matched text with x.group(). Pass regex=True explicitly: since pandas 2.0, str.replace defaults to regex=False, and a callable replacement with regex=False raises a ValueError.
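On the follow-up about "cars" appearing twice (16 and 17): because the dictionary above is built with plain assignment in a loop, a term that occurs twice with different ids (in any casing) keeps whichever id comes last. If the first id should win instead, dict.setdefault stores a key only the first time it is seen. A minimal sketch with a hypothetical duplicate row (id 17 is not in the question's data); duplicate terms in the regex alternation are harmless, since either copy matches the same text.

```python
# Hypothetical duplicate: the same term in two casings with two ids
terms = ['Ford', 'cars', 'CARS']
ids = [11, 16, 17]

# Plain assignment, as in the answer's loop: a later duplicate overwrites
# an earlier one, so the LAST id wins
last_wins = {}
for term, term_id in zip(terms, ids):
    last_wins[term.upper()] = term_id

# dict.setdefault only stores a key the first time it is seen,
# so the FIRST id wins
first_wins = {}
for term, term_id in zip(terms, ids):
    first_wins.setdefault(term.upper(), term_id)

# dict(zip(...)) behaves like the plain-assignment loop (last id wins)
one_liner = dict(zip((t.upper() for t in terms), ids))

print(last_wins)               # {'FORD': 11, 'CARS': 17}
print(first_wins)              # {'FORD': 11, 'CARS': 16}
print(one_liner == last_wins)  # True
```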