Python regular expression: finding multiple named groups

Input: 146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622


Expected output:


example_dict = {"host": "146.204.224.152", "user_name": "feest6811",
                "time": "21/Jun/2019:15:45:24 -0700", "request": "POST /incentivize HTTP/1.1"}

My code works for a single group on its own, for example:


for item in re.finditer('(?P<host>\d*\.\d*\.\d*.\d*)',logdata):
    print(item.groupdict())


Output: {'host': '146.204.224.152'}


But when I combine the groups into one pattern, I get no output at all. Here is my code:


for item in re.finditer('(?P<host>\d*\.\d*\.\d*.\d*)(?P<user_name>(?<=-\s)[\w]+\d)(?P<time>(?<=\[).+(?=]))(?P<request>(?<=").+(?="))',logdata):
    print(item.groupdict())
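Running the combined pattern on the sample line confirms the symptom. The sketch below assumes `logdata` holds only the one line shown in the question, and `combined` and `host_only` are hypothetical names; the pattern is the question's own, split across adjacent raw-string literals (which leaves the pattern text unchanged). The `user_name` group can only start where the lookbehind `(?<=-\s)` succeeds, i.e. right after `- `, so the `host` group would have to end exactly there, which it never can.

```python
import re

# Assumption: logdata holds only the sample line from the question.
logdata = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'

# The question's combined pattern, split into adjacent raw strings for readability.
combined = (r'(?P<host>\d*\.\d*\.\d*.\d*)'
            r'(?P<user_name>(?<=-\s)[\w]+\d)'
            r'(?P<time>(?<=\[).+(?=]))'
            r'(?P<request>(?<=").+(?="))')

# No position in the line satisfies all four groups back to back.
matches = list(re.finditer(combined, logdata))
print(matches)  # []

# Lookarounds are zero-width: this lookahead checks for " - " after the host
# but does not include it in the match.
host_only = re.search(r'(?P<host>\d+\.\d+\.\d+\.\d+)(?=\s-\s)', logdata).group('host')
print(host_only)  # 146.204.224.152
```

Because the lookahead only checks for ` - ` without consuming it, something else still has to consume those characters before `user_name` can start.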


富国沪深
2 Answers

收到一只叮咚

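If you paste two regular expressions one after the other, the result only matches text in which the two pieces are directly adjacent. For example, if you combine a and b, the regex ab will match the text ab but not acb. Your combined regex runs into exactly this problem: you fused together regexes that evidently work on their own, but the substrings they match are not directly adjacent in the input, so you have to add some padding to consume the substrings in between. The lookbehinds and lookaheads cannot do that for you, because they are zero-width: they check the surrounding text without consuming it.

Here is a slightly refactored version with the padding added, plus a few general fixes to avoid common beginner regex mistakes (a raw string for the pattern, \d+ instead of \d*, \w+ instead of [\w]+):

import re

for item in re.finditer(r'''
        (?P<host>\d+\.\d+\.\d+\.\d+)
        (?:[-\s]+)
        (?P<user_name>\w+\d)
        (?:[^[]+\[)
        (?P<time>[^]]+)
        (?:\][^]"]+")
        (?P<request>[^"]+)''',
        logdata, re.VERBOSE):
    print(item.groupdict())

Demo: https://ideone.com/BsNLG7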
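The "ab matches ab but not acb" point, and the padding fix, can be checked directly. This is a sketch assuming `logdata` is just the sample line; the names `adjacent_only`, `padded`, `pieces` and `results` are hypothetical. It builds the log pattern from a list of named groups interleaved with non-capturing padding groups, and uses `\S+` for the user name, a looser choice than the answer's `\w+\d`.

```python
import re

# Concatenated patterns only match when the pieces are adjacent in the text.
adjacent_only = re.search(r'ab', 'acb')
padded = re.search(r'a[^b]*b', 'acb')   # [^b]* is padding that absorbs the 'c'
print(adjacent_only)   # None
print(padded.group())  # acb

# Assumption: logdata contains only the sample line from the question.
logdata = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'

# Named groups separated by non-capturing padding groups that consume the gaps.
pieces = [
    r'(?P<host>\d+\.\d+\.\d+\.\d+)',
    r'(?:\s+-\s+)',          # padding: " - "
    r'(?P<user_name>\S+)',   # looser than the answer's \w+\d (assumption)
    r'(?:\s+\[)',            # padding: " ["
    r'(?P<time>[^\]]+)',
    r'(?:\]\s+")',           # padding: '] "'
    r'(?P<request>[^"]+)',
]
pattern = ''.join(pieces)

results = [m.groupdict() for m in re.finditer(pattern, logdata)]
print(results)
```

Keeping the pieces in a list makes it easy to test each named group on its own first, then join them once the padding between them is in place.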

温温酱

I would probably simplify your regex pattern and just use re.findall here:

import re

inp = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
matches = re.findall(r'(\d+\.\d+\.\d+\.\d+) - (\S+) \[(.*?)\] "(.*?)"', inp)
print(matches)

This produces a list of tuples containing the four captured terms you want:

[('146.204.224.152', 'feest6811', '21/Jun/2019:15:45:24 -0700', 'POST /incentivize HTTP/1.1')]
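Since the question asks for a dict rather than a tuple, the findall tuples can be zipped with the key names from the expected output. A minimal sketch; `keys`, `records`, `named` and `records_named` are hypothetical names. The second variant swaps in named groups plus `groupdict()`, which avoids depending on the tuple order.

```python
import re

# Sample input from the question.
inp = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'

# Key names from the expected output; findall returns groups positionally,
# so this order must follow the order of the capturing groups.
keys = ('host', 'user_name', 'time', 'request')

matches = re.findall(r'(\d+\.\d+\.\d+\.\d+) - (\S+) \[(.*?)\] "(.*?)"', inp)
records = [dict(zip(keys, groups)) for groups in matches]
print(records)

# Alternative: name the groups so groupdict() builds each dict directly.
named = re.compile(r'(?P<host>\d+\.\d+\.\d+\.\d+) - (?P<user_name>\S+) '
                   r'\[(?P<time>.*?)\] "(?P<request>.*?)"')
records_named = [m.groupdict() for m in named.finditer(inp)]
```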