Regex to find text between nested brackets


I have a very long string that contains nested braces. I want to extract a pattern from it.

String_Text:

some random texts......

........................

{{info .................

.....texts..............

...{{ some text }}...... // nested parenthesis 1

........................

...{{ some text }}...... // nested parenthesis 2

........................

}} // End of topmost parenthesis

........................

..again some random text

........................

........................ // can also contain {{ }}

......End of string.

I want to extract all the text between the topmost brackets, i.e.

Extracted_string:

info .................

.....texts..............

...{{ some text }}...... // nested parenthesis 1

........................

...{{ some text }}...... // nested parenthesis 2

........................

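Pattern:

1.) It starts with { followed by any number of {.

2.) After that there can be any amount of whitespace.

3.) The first word after that is always info.

4.) Extract everything until this bracket is closed.

What I have tried so far: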

re.findall(r'\{+[^\S\r\n]*info\s*(.*(?:\r?\n.*)*)\}+', String_Text)

I know this is wrong, because it matches up to the last instance of } in the string. Can someone help me extract the text between these brackets? TIA
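To make the over-matching concrete, here is a minimal sketch that runs the attempted pattern on a shortened, made-up stand-in for String_Text (the sample string is an assumption for illustration only). Because `.*` and the repeated `(?:\r?\n.*)*` are both greedy, the engine first consumes everything to the end of the string and then backtracks only as far as the last `}` it can find.

```python
import re

# Hypothetical, shortened stand-in for String_Text.
s = "x\n{{info a\n{{ b }}\n}}\ny {{ c }} z\n"

# The attempted pattern, with the subject string passed as the second argument.
pattern = r'\{+[^\S\r\n]*info\s*(.*(?:\r?\n.*)*)\}+'

captured = re.findall(pattern, s)

# The greedy group runs past the top-level closing }} and only stops
# just before the last } in the whole string.
print(captured)
```

The captured text includes the unrelated `{{ c }}` fragment from after the block, which is the behavior described above.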

烙印99

3 Answers

慕尼黑的夜晚无繁华

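A workaround pattern is to match a line that starts with {{info, and then match any 0+ characters, as few as possible, up to a line that contains only }}:

re.findall(r'(?sm)^{{[^\S\r\n]*info\s*(.*?)^}}$', s)

Details

(?sm) - the re.DOTALL flag (. now matches line breaks) and the re.MULTILINE flag (^ now matches the start of a line and $ the end of a line)

^ - start of a line

{{ - a {{ substring

[^\S\r\n]* - 0+ horizontal whitespace characters

info - a literal substring

\s* - 0+ whitespace characters

(.*?) - Group 1: any 0+ characters, as few as possible

^}}$ - a line that consists of }} only.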
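As a quick check of the pattern above, here is a minimal sketch run against a shortened, made-up version of the sample text. It assumes the top-level closing }} sits alone on its own line and that {{info starts a line; if either assumption does not hold for the real data, this pattern will not match.

```python
import re

# Shortened stand-in for the sample string (an assumption for illustration).
# The top-level closing }} is alone on its line, which the pattern relies on.
s = (
    "some random text\n"
    "{{info first line\n"
    "...{{ some text }}... nested 1\n"
    "...{{ some text }}... nested 2\n"
    "}}\n"
    "again some random text {{ other }}\n"
)

# (?s) lets . match newlines, (?m) makes ^ and $ match at line boundaries.
pattern = r'(?sm)^{{[^\S\r\n]*info\s*(.*?)^}}$'

matches = re.findall(pattern, s)
print(matches[0])
```

The nested `{{ some text }}` pairs do not end the match early because none of them starts at the beginning of a line.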

守着一只汪

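A linked answer explains how to do this with recursion (it deals with round brackets, but is easy to adapt). Personally, though, I would just write it with a while loop:

b = 1                      # number of braces currently open
i = si = s.index('{')      # si marks the first opening brace
i += 1
while b:
    if s[i] == '{': b += 1
    elif s[i] == '}': b -= 1
    i += 1
ss = s[si:i]               # substring from the first { to its matching }

With your string defined as s, this gives the substring ss:

>>> print(ss)
{{info .................
.....texts..............
...{{ some text }}...... // nested parenthesis 1
........................
...{{ some text }}...... // nested parenthesis 2
........................
}}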
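The same brace-counting idea can be wrapped in a function that starts at the info block rather than at the first { in the string, and that returns None instead of raising when the braces are unbalanced. This is only a sketch: the name extract_info_block is hypothetical, and it assumes the block closes with as many } as it opened with { and that no stray brace characters appear inside the text.

```python
import re

def extract_info_block(text):
    """Return the text inside the first {{ info ... }} block, or None."""
    # Find the opening braces, optional horizontal whitespace, then "info".
    opening = re.search(r'\{+[^\S\r\n]*(info\b)', text)
    if opening is None:
        return None

    n_open = opening.group(0).count('{')  # braces that open the block
    depth = n_open                        # braces currently open
    start = opening.start(1)              # content begins at "info"

    i = opening.end()
    while i < len(text):
        if text[i] == '{':
            depth += 1
        elif text[i] == '}':
            depth -= 1
            if depth == 0:
                # Assumption: the last n_open characters up to i are the
                # closing braces, so they are dropped from the result.
                return text[start:i - n_open + 1]
        i += 1
    return None  # the block was never closed


sample = "x\n{{ info a\n..{{ b }}..\n}}\ny {{ c }} z\n"
print(extract_info_block(sample))
```

Unlike the regex workaround, this does not depend on where the closing braces fall on a line, at the cost of scanning the string character by character.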