Python regex: negative lookahead to remove or keep leading numbers

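The goal is to keep cardinal and ordinal numbers at the start of a string, but only when they come immediately before the word PERFORMANCE or SCORE: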


#These numbers are kept:

100 SCORE FOR STUDENT

80 PERFORMANCE FOR TEACHER

However, if the numbers are at the start and the word that follows is anything else, they should be removed:


#These numbers are removed

10095TH 10097TH 179TH SCHOOL ANIVERSARY

11 12 10 SECONDARY LEVELS

100 100 100 100 SCHOOL AGREEMENT

The problem I am running into is the case where several space-separated numbers precede the word PERFORMANCE or SCORE:


#All numbers should be kept

3 10 100 PERFORMANCE

001 10 12345 SCORE

I am applying the following regex, but the last part, (?!\s*\d*\s*\d*\s*(?:PERFORMANCE|SCORE)\b), is messy: at the moment it only accounts for 3 groups of numbers to keep before PERFORMANCE or SCORE:


(?<=[A-Za-z]\b )([ 0-9]*(ST|[RN]D|TH)?\b)|^(([\d ]+(ST|[RN]D|TH)?)*\b)(?!\s*\d*\s*\d*\s*(?:PERFORMANCE|SCORE)\b)

The regex above works for the following cases:


3 10 100 PERFORMANCE

001 10 12345 SCORE

But if I add one more group of numbers, it stops working:


3 10 100 1 PERFORMANCE

001 10 1 12345 SCORE

How can I generalize this rule so that it covers any number of number groups?
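The limit described in the question can be reproduced by isolating the body of the lookahead. The sketch below is only an illustration, not the full pattern from the question: it assumes the check starts right after the first number, and the names FIXED, REPEATED and guarded_by are made up for the demo. FIXED has two optional \d* slots, so it can reach the keyword across at most two further number groups; REPEATED replaces them with a repeated group, which is one possible generalization.

```python
import re

# Body of the question's lookahead: two optional digit slots, then the keyword.
FIXED = re.compile(r"\s*\d*\s*\d*\s*(?:PERFORMANCE|SCORE)\b")

# Hypothetical generalization: any number of whitespace-separated digit groups.
REPEATED = re.compile(r"(?:\s+\d+)*\s*(?:PERFORMANCE|SCORE)\b")


def guarded_by(body, text):
    """Return True if `body` matches right after the first number in `text`."""
    first = re.match(r"\d+", text)
    return first is not None and body.match(text, first.end()) is not None


for text in ("3 10 100 PERFORMANCE", "3 10 100 1 PERFORMANCE"):
    print(text, guarded_by(FIXED, text), guarded_by(REPEATED, text))
```

With three number groups both bodies reach the keyword; with four, only REPEATED does, which matches the behaviour reported in the question.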


慕仙森
1 answer

Cats萌萌

Try the following:

^(?:\d+(?:ST|[RN]D|TH)?\s)+(?=[^\d]+$)(?!PERFORMANCE|SCORE)

^                        anchor to beginning
(?:                      start non-capturing group
    \d+                  match one or more digits
    (?:ST|[RN]D|TH)?     optionally followed by one of your approved suffixes
    \s                   then a whitespace
)+                       one or more times
(?=[^\d]+$)              assert that the rest of the line is number-free (forces the regex to not backtrack to the last number)
(?!PERFORMANCE|SCORE)    assert that the following characters are NOT 'PERFORMANCE' or 'SCORE'
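As a rough check, the pattern from this answer can be run against the sample strings from the question with re.sub. This is a minimal sketch: the names LEADING_NUMBERS and strip_leading_numbers are chosen here for illustration and do not come from the question or the answer.

```python
import re

# Pattern from the answer, compiled once.
LEADING_NUMBERS = re.compile(
    r"^(?:\d+(?:ST|[RN]D|TH)?\s)+(?=[^\d]+$)(?!PERFORMANCE|SCORE)"
)


def strip_leading_numbers(text: str) -> str:
    """Drop leading numbers unless PERFORMANCE or SCORE follows them."""
    return LEADING_NUMBERS.sub("", text)


samples = [
    "100 SCORE FOR STUDENT",
    "10095TH 10097TH 179TH SCHOOL ANIVERSARY",
    "11 12 10 SECONDARY LEVELS",
    "3 10 100 1 PERFORMANCE",
    "001 10 1 12345 SCORE",
]
for sample in samples:
    print(f"{sample!r} -> {strip_leading_numbers(sample)!r}")
```

One side effect of the (?=[^\d]+$) assertion is worth noting: if a digit appears anywhere later in the string, nothing is removed at all, so a string such as "11 LEVELS OF 2020" comes back unchanged.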