Splitting a string with regular expressions in Python

I have several strings, such as:


a = 'avg yearly income 25,07,708.33 '

b = 'current balance 1,25,000.00 in cash\n'

c = 'target savings 50,00,000.00 within next five years 1,000,000.00 '

I am trying to split them into chunks of text strings and number strings. The desired output looks like this:


aa = [('avg yearly income', '25,07,708.33')]

bb = [('current balance', '1,25,000.00', 'in cash')]

cc = [('target savings', '50,00,000.00', 'within next five years', '1,000,000.00')]

I am using the following code:


import re

b = b.replace("\n","")

aa = re.findall(r'(.*)\s+(\d+(?:,\d+)*(?:\.\d){1,2})', a)

bb = re.findall(r'(.*)\s+(\d+(?:,\d+)*(?:\.\d){1,2})(.*)\s+', b)

cc = re.findall(r'(.*)\s+(\d+(?:,\d+)*(?:\.\d){1,2})(.*)\s+(\d+(?:,\d+)*(?:\.\d{1,2})?)', c)

I get the following output:


aa = [('avg yearly income', '25,07,708.3')]

bb = [('current balance', '1,25,000.0', '0 in')]

cc = [('target savings', '50,00,000.0', '0 within next five years', '1,000,000.00')]

What is wrong with the regex patterns?


浮云间
3 answers

萧十郎

Instead of re.findall, you can use re.split to split each string on a whitespace character that sits between a letter and a digit:

import re

d = ['avg yearly income 25,07,708.33 ', 'current balance 1,25,000.00 in cash\n', 'target savings 50,00,000.00 within next five years 1,000,000.00 ']

final_results = [re.split(r'(?<=[a-zA-Z])\s(?=\d)|(?<=\d)\s(?=[a-zA-Z])', i) for i in d]

new_results = [[i.rstrip() for i in b] for b in final_results]

Output:

[['avg yearly income', '25,07,708.33'], ['current balance', '1,25,000.00', 'in cash'], ['target savings', '50,00,000.00', 'within next five years', '1,000,000.00']]
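If you want the list-of-tuples shape shown in the question, the same lookaround split can be wrapped in a small helper. This is only a sketch: the names `BOUNDARY` and `split_text_and_numbers` are made up for illustration, and it assumes numbers and text are always separated by a single whitespace character, as in the sample strings.

```python
import re

# Hypothetical names; the pattern is the lookaround split from the answer above.
BOUNDARY = re.compile(r'(?<=[a-zA-Z])\s(?=\d)|(?<=\d)\s(?=[a-zA-Z])')

def split_text_and_numbers(s):
    """Split on whitespace between a letter and a digit, then trim each piece."""
    return tuple(part.strip() for part in BOUNDARY.split(s))

a = 'avg yearly income 25,07,708.33 '
b = 'current balance 1,25,000.00 in cash\n'
c = 'target savings 50,00,000.00 within next five years 1,000,000.00 '

aa = [split_text_and_numbers(a)]
bb = [split_text_and_numbers(b)]
cc = [split_text_and_numbers(c)]

print(aa)
print(bb)
print(cc)
```

Using strip() on every piece also removes the trailing newline in b, so the separate b.replace("\n", "") step is not needed here.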

繁花不似锦

You can use re.split with the pattern r'(?<=\d)\s+(?=\w)|(?<=\w)\s+(?=\d)':

>>> import re
>>> ptrn = r'(?<=\d)\s+(?=\w)|(?<=\w)\s+(?=\d)'
>>> re.split(ptrn, a)
['avg yearly income', '25,07,708.33 ']
>>> re.split(ptrn, b)
['current balance', '1,25,000.00', 'in cash\n']
>>> re.split(ptrn, c)
['target savings', '50,00,000.00', 'within next five years', '1,000,000.00 ']
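One thing to keep in mind with this pattern: \w also matches digits (and the underscore), so it splits between two numbers separated by whitespace, whereas a letters-only character class does not. The sketch below compares the two on a made-up input; the `sample` string and the variable names are assumptions for illustration, not part of the question.

```python
import re

# Letters-only boundaries versus \w boundaries (both use \s+ here).
letters_only = r'(?<=[a-zA-Z])\s+(?=\d)|(?<=\d)\s+(?=[a-zA-Z])'
word_chars = r'(?<=\d)\s+(?=\w)|(?<=\w)\s+(?=\d)'

sample = 'range 10 20 units'  # hypothetical input with two adjacent numbers

by_letters = re.split(letters_only, sample)
by_word_chars = re.split(word_chars, sample)

print(by_letters)     # adjacent numbers stay in one chunk
print(by_word_chars)  # adjacent numbers become separate chunks

# Trimming each piece removes the trailing space or newline left by the split.
b = 'current balance 1,25,000.00 in cash\n'
cleaned = [part.strip() for part in re.split(word_chars, b)]
print(cleaned)
```

Which behavior is preferable depends on whether two consecutive numbers should count as one chunk or two.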

杨魅力

Use re.split(). This example uses the number pattern from your original regex, with the {1,2} quantifier applied to \d rather than to the whole (?:\.\d) group, and it works fine:

>>> import re
>>> r = re.compile(r'(\d+(?:,\d+)*(?:\.\d{1,2}))')
>>> r.split('avg yearly income 25,07,708.33 ')
['avg yearly income ', '25,07,708.33', ' ']
>>> r.split('current balance 1,25,000.00 in cash\n')
['current balance ', '1,25,000.00', ' in cash\n']
>>> r.split('target savings 50,00,000.00 within next five years 1,000,000.00 ')
['target savings ', '50,00,000.00', ' within next five years ', '1,000,000.00', ' ']
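To see why the patterns in the question drop the last digit, it may help to compare the two quantifier placements directly. In (?:\.\d){1,2} the quantifier repeats the whole "dot plus one digit" unit, so ".33" only matches as ".3"; in (?:\.\d{1,2}) it applies to the digit alone. The sketch below also turns the split result into trimmed tuples; `NUMBER` and `chunks` are hypothetical names, and the pattern assumes every number has a decimal part, as in the samples.

```python
import re

s = 'avg yearly income 25,07,708.33 '

# Quantifier on the group: one or two repetitions of ".<digit>".
outside = re.search(r'\d+(?:,\d+)*(?:\.\d){1,2}', s).group()
# Quantifier on \d: a single dot followed by one or two digits.
inside = re.search(r'\d+(?:,\d+)*(?:\.\d{1,2})', s).group()

print(outside)
print(inside)

# Hypothetical helper: split around numbers, trim, and drop empty pieces.
NUMBER = re.compile(r'(\d+(?:,\d+)*(?:\.\d{1,2}))')

def chunks(text):
    return tuple(p.strip() for p in NUMBER.split(text) if p.strip())

c = 'target savings 50,00,000.00 within next five years 1,000,000.00 '
cc = [chunks(c)]
print(cc)
```

Because the capturing group keeps the numbers in the split result, the only post-processing needed is stripping whitespace and discarding the empty leftovers.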