
Extract information from a string and convert it to a list

I have a string like the following:


[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,


[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue


[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)

I want to extract the value of "X" and the associated text and put them into a list. See the expected output below:


Expected output:


['X=250.44','DECEMBER 31,']

['X=307.5','respectively. The net decrease in the revenue']

['X=49.5','(US$ in millions)']

How can I do this in Python?


My approach:


import re

mylist = []
for line in data.split("\n"):
    if line.strip():
        x_coord = re.findall(r'^(X=.*)\,$', line)
        text = re.findall(r'^(]\w +)', line)
        mylist.append([x_coord, text])

My approach doesn't find any values for x_coord or text: both findall calls return empty lists.
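Both patterns in the attempt are anchored with ^, but every line begins with "[", so neither can ever match. The sketch below keeps the same loop structure but uses unanchored patterns with re.search, which returns a single match instead of a list. It assumes data holds the three sample lines separated by blank lines, as in the question; the sample string is reconstructed here for illustration, and it also assumes the trailing text never contains "]".

```python
import re

# Sample input, assumed to match the question: records separated by blank lines.
data = """[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,

[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue

[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)"""

# The original patterns are anchored with ^, but every line starts with "[",
# so findall returns [] for them.
print(re.findall(r'^(X=.*)\,$', data.split("\n")[0]))  # []

mylist = []
for line in data.split("\n"):
    if line.strip():
        # "X=" followed by everything up to (not including) the next comma.
        x_coord = re.search(r'X=[^,]*', line)
        # Everything after the last "]" on the line (assumes the text has no "]").
        text = re.search(r'\]([^\]]*)$', line)
        if x_coord and text:
            mylist.append([x_coord.group(0), text.group(1)])

for item in mylist:
    print(item)
```

This prints the three lists from the expected output, e.g. ['X=250.44', 'DECEMBER 31,'].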


临摹微笑
3 Answers

郎朗坤

A re solution:

import re

lines = [
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,",
    "[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue",
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)",
]

def extract(s):
    match = re.search(r"(X=\d+(?:\.\d*)?).*?\](.*?)$", s)
    return match.groups()

output = [extract(item) for item in lines]
print(output)

Output:

[
    ('X=250.44', 'DECEMBER 31,'),
    ('X=307.5', 'respectively. The net decrease in the revenue'),
    ('X=49.5', '(US$ in millions)'),
]

Explanation:

\d ... a digit
\d+ ... one or more digits
(?:...) ... non-capturing ("plain") parentheses
\.\d* ... a dot followed by zero or more digits
(?:\.\d*)? ... an optional (zero or one) "fractional part"
(X=\d+(?:\.\d*)?) ... first group: X=number
.*? ... zero or more of any character (non-greedy)
\] ... the ] character
$ ... end of the string
\](.*?)$ ... second group: everything between the ] and the end of the string
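The explanation list above can also live inside the pattern itself. A sketch of the same regex written with re.VERBOSE, where whitespace is ignored and # starts a comment; the names PATTERN and line are only for illustration.

```python
import re

# Same pattern as the answer above, spread out with re.VERBOSE so each
# piece carries its explanation as a comment.
PATTERN = re.compile(r"""
    (X=\d+(?:\.\d*)?)   # group 1: "X=", digits, optional "." plus more digits
    .*?                 # any characters, as few as possible (non-greedy)
    \]                  # ...up to the first "]" after the X value
    (.*?)$              # group 2: everything from there to the end of the string
""", re.VERBOSE)

line = ("[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] "
        "[(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,")
print(PATTERN.search(line).groups())  # ('X=250.44', 'DECEMBER 31,')
```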

斯蒂芬大帝

Try this: (X=[^,]*)(?:.*])(.*)

import re

source = """[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,
[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue
[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)""".split('\n')

pattern = r"(X=[^,]*)(?:.*])(.*)"

for line in source:
    print(re.search(pattern, line).groups())

Output:

('X=250.44', 'DECEMBER 31,')
('X=307.5', 'respectively. The net decrease in the revenue')
('X=49.5', '(US$ in millions)')

Since X= comes before every value you want, I kept it inside a single capture group; if that matters, feel free to move X= into a non-capturing group.
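One thing to watch in this pattern: (?:.*]) is greedy, so it runs to the last "]" on the line. That is fine for the sample data, but if the trailing text itself contained a "]", part of it would be swallowed. A sketch comparing it with a variant that stops at the first "]" after the coordinates; the input line is hypothetical, made up only to show the difference.

```python
import re

greedy = r"(X=[^,]*)(?:.*])(.*)"        # pattern from the answer above
bounded = r"(X=[^,]*)(?:[^\]]*])(.*)"   # stop at the first "]" after X=...

# Hypothetical input (not from the question): the text itself contains "]".
line = "[Font] [(X=10.5,Y=2.0) height=1 width=2]Total [note 1] revenue"

print(re.search(greedy, line).groups())   # ('X=10.5', ' revenue')
print(re.search(bounded, line).groups())  # ('X=10.5', 'Total [note 1] revenue')
```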

MYYA

Use a regular expression with named groups to capture the relevant bits:

>>> import re
>>> line = "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,"
>>> m = re.search(r'(?:\(X=)(?P<x_coord>.*?)(?:,.*])(?P<text>.*)$', line)
>>> m.groups()
('250.44', 'DECEMBER 31,')
>>> m['x_coord']
'250.44'
>>> m['text']
'DECEMBER 31,'
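Applied to all three sample lines, the named groups can be turned into the exact list shape from the question. A sketch assuming the input is already split into a list of lines; PATTERN, lines and result are illustrative names, and the "X=" prefix is added back because this pattern captures only the number.

```python
import re

# Named-group pattern from the answer above.
PATTERN = re.compile(r'(?:\(X=)(?P<x_coord>.*?)(?:,.*])(?P<text>.*)$')

lines = [
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,",
    "[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue",
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)",
]

result = []
for line in lines:
    m = PATTERN.search(line)
    if m is None:  # skip lines that do not have the expected shape
        continue
    # groupdict() maps each group name to its captured string.
    d = m.groupdict()
    # Re-add the "X=" prefix to match the expected output in the question.
    result.append([f"X={d['x_coord']}", d['text']])

print(result)
```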