Extracting numbers from paragraphs with a multi-condition regular expression in Python

I have this text in a .txt file:


crt - 00:00:00 up 200 days, 23:35, 0 users, load average: 0.04, 0.05, 0.02
Tasks: 300 total, 2 running, 298 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.0%us, 2.5%sy, 0.0%ni, 89.2%id, 0.0%hi, 0.1%si, 0.0%st
Mem: 123456K total, 1234567k used, 989991k free, 11156793k buffers
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached

PID User Pr NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24 455  36  63  700 800 900 456 87 35 46
2 root 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 thread

crt - 00:00:04 up 200 days, 23:39, 0 users, load average: 0.04, 0.05, 0.02
Tasks: 300 total, 2 running, 298 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.0%us, 2.5%sy, 0.0%ni, 89.2%id, 0.0%hi, 0.1%si, 0.0%st
Mem: 123456K total, 1234567k used, 989991k free, 11156793k buffers
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached


I want all the numeric values from every paragraph between crt and cached, excluding the values between PID and thread. So far I have been using this:


regex.findall(r'(?<!\d)(?<=\bcrt\b.*?)(?:\d{2}:\d{2}(?::\d{2})?|\d*\.?\d+)(?!\d)(?=.*\bcached\b)', text, regex.S)

But this returns all the numbers, including those between PID and thread. Any ideas?


慕后森
2 answers

长风秋雁

Since you are already using the regex module (which supports variable-length lookbehind), you could just as easily use \G and \K:

(?:^crt|\G(?!\A))(?:(?!^$)\D)*\K[.:\d]+

See the demo on regex101.com.

眼眸繁星

Broken down, this makes a few assumptions:

(?:
    ^crt        # start a line with crt
    |           # or
    \G(?!\A)    # start after the previous match (unless it is the very start of the string)
)
(?:(?!^$)\D)*\K # match any non-digit character, but stop at empty lines
[.:\d]+         # character class with ., : and digits

In Python code this could be:

# "regex" is the third-party module (not the standard-library re); it supports \G and \K
import regex as re

junk = """crt - 00:00:00 up 200 days, 23:35, 0 users, load average: 0.04, 0.05, 0.02
Tasks: 300 total, 2 running, 298 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.0%us, 2.5%sy, 0.0%ni, 89.2%id, 0.0%hi, 0.1%si, 0.0%st
Mem: 123456K total, 1234567k used, 989991k free, 11156793k buffers
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached

PID User Pr NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24 455  36  63  700 800 900 456 87 35 46
2 root 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 thread

crt - 00:00:04 up 200 days, 23:39, 0 users, load average: 0.04, 0.05, 0.02
Tasks: 300 total, 2 running, 298 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.0%us, 2.5%sy, 0.0%ni, 89.2%id, 0.0%hi, 0.1%si, 0.0%st
Mem: 123456K total, 1234567k used, 989991k free, 11156793k buffers
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached
"""

# re.M lets ^ and $ match at line boundaries, so ^$ detects an empty line
rx = re.compile(r'(?:^crt|\G(?!\A))(?:(?!^$)\D)*\K[.:\d]+', re.M)

for match in rx.finditer(junk):
    print(match.group(0))

Yields (abbreviated):

00:00:00
200
23:35
...
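If installing the third-party regex module is not an option, a similar result can probably be had with the standard-library re module by splitting the text into blocks first. This is a different technique from the \G / \K pattern above, and only a minimal sketch: it assumes that blocks are separated by blank lines and that every wanted block starts with crt. The function name numbers_in_crt_blocks and the shortened SAMPLE are illustrative; the number pattern is the one from the question.

```python
import re

# Number pattern taken from the question: times like 00:00:00 or 23:35,
# otherwise integers and decimals. The lookarounds are fixed-width,
# so the standard-library re module accepts them.
NUMBER = re.compile(r'(?<!\d)(?:\d{2}:\d{2}(?::\d{2})?|\d*\.?\d+)(?!\d)')

# Shortened sample built from lines of the question's text (illustrative only).
SAMPLE = """crt - 00:00:00 up 200 days, 23:35, 0 users, load average: 0.04, 0.05, 0.02
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached

PID User Pr NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2 root 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 thread

crt - 00:00:04 up 200 days, 23:39, 0 users, load average: 0.04, 0.05, 0.02
"""


def numbers_in_crt_blocks(text):
    """Collect number-like tokens from blank-line-separated blocks starting with 'crt'."""
    found = []
    # Assumption: one or more blank lines separate the blocks.
    for block in re.split(r'\n\s*\n', text.strip()):
        # Keep only blocks that begin with "crt"; the PID ... thread block is skipped.
        if block.startswith('crt'):
            found.extend(NUMBER.findall(block))
    return found


print(numbers_in_crt_blocks(SAMPLE))
```

Splitting first keeps the number pattern simple, at the cost of relying on the blank-line layout; if the real file separates blocks differently, the split expression would need adjusting.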