Greedy and lazy quantifiers: testing with HTML tags

The input is:


<p>

The very <em>first</em> task is to find the beginning of a paragraph.

</p>

<p>

Then you have to find the end of the paragraph

</p>

The first expected output is (because I am using a greedy quantifier):


<p>

The very <em>first</em> task is to find the beginning of a paragraph.

</p>

<p>

Then you have to find the end of the paragraph

</p>

The code for the greedy case is:


text = '''

<p>

The very <em>first</em> task is to find the beginning of a paragraph.

</p>

<p>

Then you have to find the end of the paragraph

</p>

'''

pattern=re.compile(r'\<p\>.*\<\/p\>')

data1=pattern.match(text,re.MULTILINE)

print('data1:- ',data1,'\n')

The second expected output is (because I am using a lazy quantifier):


<p>

The very <em>first</em> task is to find the beginning of a paragraph.

</p>

The code for the lazy case is:


text = '''

<p>

The very <em>first</em> task is to find the beginning of a paragraph.

</p>

<p>

Then you have to find the end of the paragraph

</p>

'''

#pattern=re.compile(r'\<p\>.*?\<\/p\>')

pattern=re.compile(r'<p>.*?</p>')

data1=pattern.match(text,re.MULTILINE)

print('data1:- ',data1,'\n')

The actual output I get in both cases is None.


达令说
1 answer

蝴蝶刀刀

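You have a few problems. First, when you call Pattern.match, the second and third parameters are positions (pos and endpos), not flags. Flags have to be passed to re.compile. Second, you should use re.DOTALL so that . matches newlines, not re.MULTILINE. Finally, match insists that the match start at the beginning of the string (which in your case is a newline), so it will not match. You probably want Pattern.search instead. This works for your sample input:

pattern=re.compile(r'<p>.*</p>', re.DOTALL)
data1=pattern.search(text)
print('data1:- ',data1.group(0),'\n')

Output:

data1:-  <p>

The very <em>first</em> task is to find the beginning of a paragraph.

</p>

<p>

Then you have to find the end of the paragraph

</p>

For a single match:

pattern=re.compile(r'<p>.*?</p>', re.DOTALL)
data1=pattern.search(text)
print('data1:- ',data1.group(0),'\n')

Output:

data1:-  <p>

The very <em>first</em> task is to find the beginning of a paragraph.

</p>

Also note that /, < and > have no special meaning in a regular expression and do not need to be escaped. I have removed the escapes in the code above.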
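
To make the points above concrete, here is a minimal sketch that runs the failing calls next to the working ones. The variable names (greedy, lazy, mismatch, anchored and so on) are illustrative and not from the original post, and the input is written without the blank lines for brevity. The findall call at the end is an extra, assuming you want every paragraph and not only the first.

```python
import re

# Same input as in the question, written compactly.
text = """
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
"""

# Flags belong in re.compile; DOTALL lets "." match newlines too.
greedy = re.compile(r"<p>.*</p>", re.DOTALL)
lazy = re.compile(r"<p>.*?</p>", re.DOTALL)

# Pitfall 1: the second argument of Pattern.match is `pos`, not flags.
# re.MULTILINE has the integer value 8, so this only tries to match at index 8.
mismatch = greedy.match(text, re.MULTILINE)

# Pitfall 2: match() only matches at the start position, and text[0] is "\n".
anchored = greedy.match(text)

# search() scans forward until it finds a match.
greedy_result = greedy.search(text).group(0)  # first <p> through the last </p>
lazy_result = lazy.search(text).group(0)      # first <p> through the first </p>

# Illustrative extra: findall with the lazy pattern returns each paragraph.
all_paragraphs = lazy.findall(text)

print(mismatch, anchored)           # None None
print(greedy_result.count("</p>"))  # 2
print(lazy_result.count("</p>"))    # 1
print(len(all_paragraphs))          # 2
```

The only difference between the two patterns is the ? after .*, which makes the quantifier stop at the first </p> it can reach and not the last one.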