Extracting information from strings and converting it into a list

3 Answers

郎朗坤

A solution using re:

    import re

    # Named "lines" rather than "input" to avoid shadowing the built-in input().
    lines = [
        "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,",
        "[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue",
        "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)",
    ]

    def extract(s):
        # Raw string, so the backslashes reach the regex engine unchanged.
        match = re.search(r"(X=\d+(?:\.\d*)?).*?\](.*?)$", s)
        return match.groups()

    output = [extract(item) for item in lines]
    print(output)

Output (reformatted for readability):

    [
        ('X=250.44', 'DECEMBER 31,'),
        ('X=307.5', 'respectively. The net decrease in the revenue'),
        ('X=49.5', '(US$ in millions)'),
    ]

Explanation:

    \d ... a digit
    \d+ ... one or more digits
    (?:...) ... non-capturing ("normal") parentheses
    \.\d* ... a dot followed by zero or more digits
    (?:\.\d*)? ... an optional (zero or one) "decimal part"
    (X=\d+(?:\.\d*)?) ... first group, X=number
    .*? ... zero or more of any character except a newline (non-greedy)
    \] ... the ] symbol
    $ ... end of string
    \](.*?)$ ... second group, anything between the ] and the end of the string
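Note that extract raises an AttributeError when a line does not match, because re.search then returns None. The following is a minimal sketch of a more defensive variant, assuming you want the X value as a float rather than as the string 'X=...'. The names PATTERN, extract_safe and the second sample line are illustrative and not part of the original answer.

```python
import re

# Same shape as the pattern above, but the group captures only the number
# and the literal "(X=" stays outside the group.
PATTERN = re.compile(r"\(X=(\d+(?:\.\d*)?).*?\](.*)$")

def extract_safe(line):
    """Return (x, text) for a matching line, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    x, text = match.groups()
    return float(x), text

lines = [
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,",
    "a line without coordinates",  # hypothetical non-matching line
]

# Keep only the lines that matched.
results = [r for r in map(extract_safe, lines) if r is not None]
print(results)
```

Lines that do not match are skipped here; whether to skip them, log them, or raise is a design choice that depends on how clean the input is.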

斯蒂芬大帝

Try this: (X=[^,]*)(?:.*])(.*)

    import re

    source = """[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,
    [Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue
    [Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)""".split('\n')

    pattern = r"(X=[^,]*)(?:.*])(.*)"

    for line in source:
        print(re.search(pattern, line).groups())

Output:

    ('X=250.44', 'DECEMBER 31,')
    ('X=307.5', 'respectively. The net decrease in the revenue')
    ('X=49.5', '(US$ in millions)')

You have X= in front of every captured value, so I simply included it in the capture group. If that matters, feel free to move it into a non-capturing group.
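One thing to be aware of with this pattern: the .* before the ] is greedy, so it runs up to the last ] on the line. That gives the same result on the sample data, but it would cut off the text if the text itself contained a ]. The sketch below compares the greedy form with a non-greedy one on a made-up line (the sample line and the names greedy and lazy are assumptions for illustration only).

```python
import re

# Hypothetical line whose trailing text itself contains square brackets.
line = "[Base Font : A, Font Size : 1.0, Font Weight : 0.0] [(X=10.5,Y=20.0) height=1.0 width=2.0]see note [1] below"

greedy = r"(X=[^,]*)(?:.*])(.*)"   # pattern from the answer above
lazy = r"(X=[^,]*)(?:.*?])(.*)"    # same pattern with a non-greedy .*?

greedy_groups = re.search(greedy, line).groups()
lazy_groups = re.search(lazy, line).groups()

print(greedy_groups)  # text starts after the LAST ] on the line
print(lazy_groups)    # text starts after the FIRST ] following X=...
```

If the text after the coordinates can never contain a ], both forms behave identically and the choice does not matter.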

MYYA

Use a regular expression with named groups to capture the relevant bits:

    >>> import re
    >>> line = "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,"
    >>> m = re.search(r'(?:\(X=)(?P<x_coord>.*?)(?:,.*])(?P<text>.*)$', line)
    >>> m.groups()
    ('250.44', 'DECEMBER 31,')
    >>> m['x_coord']
    '250.44'
    >>> m['text']
    'DECEMBER 31,'
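To get from a single match to a list covering every line, one option is to collect each match's groupdict(). This is a minimal sketch reusing the named-group pattern above on the three sample lines from the question; the names PATTERN, lines and records are illustrative.

```python
import re

# Named-group pattern from the answer above, compiled once for reuse.
PATTERN = re.compile(r'(?:\(X=)(?P<x_coord>.*?)(?:,.*])(?P<text>.*)$')

lines = [
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,",
    "[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue",
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)",
]

records = []
for line in lines:
    m = PATTERN.search(line)
    if m:  # skip lines that do not match
        records.append(m.groupdict())  # {'x_coord': ..., 'text': ...}

for record in records:
    print(record)
```

Each entry is a plain dict keyed by group name, which tends to be easier to read downstream than positional tuples.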