Building a list of dictionaries from a txt file using re

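Consider the standard web log file in assets/logdata.txt. This file records the accesses users make when visiting a web page (like this one!). Each line of the log has the following items: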

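  • host (e.g., '146.204.224.152')

  • user_name (e.g., 'feest6811'). Note: sometimes the user name is missing! In that case, use "-" as the value for the user name.

  • the time the request was made (e.g., '21/Jun/2019:15:45:24 -0700')

  • the post request type (e.g., 'POST /incentivize HTTP/1.1'). Note: not everything is a POST!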

Your task is to convert this into a list of dictionaries, where each dictionary looks like the following:

example_dict = {"host":"146.204.224.152",
                "user_name":"feest6811",
                "time":"21/Jun/2019:15:45:24 -0700",
                "request":"POST /incentivize HTTP/1.1"}

Here is a sample of the txt data file.

https://img2.mukewang.com/6500235100015a9209780331.jpg

I wrote these lines of code:


import re

def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()
        #print(logdata)
        pattern="""
        (?P<host>.*)
        (-\s)
        (?P<user_name>\w*)
        (\s)
        ([POST]*)
        (?P<time>\w*)
                 """
        for item in re.finditer(pattern,logdata,re.VERBOSE):
            print(item.groupdict())
        return(item)

logs()

This got me the "host" and "user_name" parts of the task, but I can't work out how to handle the remaining requirements. Can anyone help?

https://img1.mukewang.com/650023600001477105620326.jpg

慕哥6287543
4 answers

呼唤远方

Try this, my friend:

import re

def logs():
    logs = []
    # Raw string so the regex backslashes reach re unchanged.
    w = r'(?P<host>(?:\d+\.){3}\d+)\s+(?:\S+)\s+(?P<user_name>\S+)\s+\[(?P<time>[-+\w\s:/]+)\]\s+"(?P<request>.+?.+?)"'
    with open("assets/logdata.txt", "r") as f:
        logdata = f.read()
    # Each match yields one dict keyed by the named groups.
    for m in re.finditer(w, logdata):
        logs.append(m.groupdict())
    return logs

千万里不及你

Please see the code below:

import re

regex = re.compile(
    r'(?P<host>(?:\d+\.){1,3}\d+)\s+-\s+'
    r'(?P<user_name>[\w+\-]+)?\s+'
    r'\[(?P<time>[-\w\s:/]+)\]\s+'
    r'"(?P<request>\w+.+?)"')

def logs():
    data = []
    with open("assets/logdata.txt", "r") as f:
        logdata = f.read()
        for item in regex.finditer(logdata):
            x = item.groupdict()
            # Fall back to "-" if the optional user_name group did not match.
            if x["user_name"] is None:
                x["user_name"] = "-"
            data.append(x)
    return data

logs()

Here is part of the output:

[{'host': '146.204.224.152', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'},
 {'host': '197.109.77.178', 'user_name': 'kertzmann3129', 'time': '21/Jun/2019:15:45:25 -0700', 'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0'},
 {'host': '156.127.178.177', 'user_name': 'okuneva5222', 'time': '21/Jun/2019:15:45:27 -0700', 'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1'},
 {'host': '100.32.205.59', 'user_name': 'ortiz8891', 'time': '21/Jun/2019:15:45:28 -0700', 'request': 'PATCH /architectures HTTP/1.0'},
 {'host': '168.95.156.240', 'user_name': 'stark2413', 'time': '21/Jun/2019:15:45:31 -0700', 'request': 'GET /engage HTTP/2.0'},
 .....]

The result contains 979 dictionaries, one for each line of the text file.
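One way to check that every line of a log really produced a dictionary is to parse line by line and keep the lines the pattern fails to match. The following is only a sketch under stated assumptions: the sample lines (including their trailing status and size fields), `LINE_RE`, and `parse_lines` are illustrative names and data that do not come from the answers, and the second sample line has its user name replaced with "-" by hand.

```python
import re

# Illustrative sample in the format described in the question (not the real file).
# The trailing status/size numbers are placeholders; the second line has its
# user name replaced by "-" on purpose.
SAMPLE = """\
146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
197.109.77.178 - - [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554
"""

# Hypothetical pattern: each field is "anything up to its closing delimiter".
LINE_RE = re.compile(
    r'(?P<host>\S+)\s+'          # host
    r'\S+\s+'                    # identity field, ignored
    r'(?P<user_name>\S+)\s+'     # user name, "-" when missing
    r'\[(?P<time>[^\]]+)\]\s+'   # timestamp between literal brackets
    r'"(?P<request>[^"]+)"'      # request between double quotes
)

def parse_lines(text):
    """Return (parsed dicts, lines that did not match the pattern)."""
    parsed, unmatched = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        m = LINE_RE.match(line)
        if m:
            parsed.append(m.groupdict())
        else:
            unmatched.append(line)
    return parsed, unmatched

if __name__ == "__main__":
    parsed, unmatched = parse_lines(SAMPLE)
    print(len(parsed), "parsed,", len(unmatched), "unmatched")
```

If `unmatched` is non-empty on the real file, those lines show which field the pattern is too strict about.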

阿波罗的战车

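import re

def logs():
    mydata = []
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()
    # Raw string; with re.VERBOSE, whitespace and line breaks in the pattern are ignored.
    pattern = r"""
    (?P<host>.*)
    (\s+)
    (?:\S+)
    (\s+)
    (?P<user_name>\S+)
    (\s+)
    \[(?P<time>.*)\]
    (\s)
    "(?P<request>.*)"
    """
    # The quotes sit outside the request group so the value matches the requested format.
    for item in re.finditer(pattern, logdata, re.VERBOSE):
        new_item = item.groupdict()
        mydata.append(new_item)
    return mydata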
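Because this thread leans on `re.VERBOSE`, here is a small sketch of what that flag changes: unescaped whitespace and `#` comments inside the pattern are ignored, so a literal space must be written as `\s` or escaped. The sample line and the pattern layout are illustrative assumptions, not taken from the log file.

```python
import re

# Illustrative line in the format described in the question.
line = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1"'

# Triple single quotes so the double quotes in the pattern need no escaping.
pattern = re.compile(r'''
    (?P<host>\S+)                  # host address
    \s+ \S+ \s+                    # identity field, ignored
    (?P<user_name>\S+)             # user name
    \s+ \[ (?P<time>[^\]]+) \]     # timestamp between literal brackets
    \s+ " (?P<request>[^"]+) "     # request between double quotes
''', re.VERBOSE)

m = pattern.match(line)
print(m.groupdict())

# Unescaped whitespace in a verbose pattern is dropped:
print(re.fullmatch(r"a b", "ab", re.VERBOSE) is not None)    # pattern is effectively "ab"
print(re.fullmatch(r"a b", "a b", re.VERBOSE) is not None)   # the space is not part of the pattern
print(re.fullmatch(r"a\ b", "a b", re.VERBOSE) is not None)  # an escaped space is kept
```

This is why a verbose pattern can be spread over several lines and still describe a single log line.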

繁星淼淼

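You are using \w to get user_name values, but \w does not include "-", which can appear in the log (Common Log Format, CLF), so you can use \S+ (one or more non-whitespace characters) as an alternative. For time, you can create a capture group that allows only the characters expected in that field (a character class with \w and \s, - and + for the time zone, / for the date, and : for the time), surrounded by literal square brackets. A similar approach works for request, using the double quotes (").

import re

regex = re.compile(
    r'(?P<host>(?:\d+\.){3}\d+)\s+'
    r'(?:\S+)\s+'
    r'(?P<user_name>\S+)\s+'
    r'\[(?P<time>[-+\w\s:/]+)\]\s+'
    # Note: POST.+? only matches POST requests; use .+? to capture every request type.
    r'"(?P<request>POST.+?)"')

def logs():
    data = []
    with open("sample.txt", "r") as f:
        logdata = f.read()
    for m in regex.finditer(logdata):
        data.append(m.groupdict())
    return data

print(logs())

Output (the second entry comes from a copy of the first line with user_name replaced by "-" for testing):

[
   {
      "host":"146.204.224.152",
      "user_name":"feest6811",
      "time":"21/Jun/2019:15:45:24 -0700",
      "request":"POST /incentivize HTTP/1.1"
   },
   {
      "host":"146.204.224.152",
      "user_name":"-",
      "time":"21/Jun/2019:15:45:24 -0700",
      "request":"POST /incentivize HTTP/1.1"
   },
   {
      "host":"144.23.247.108",
      "user_name":"auer7552",
      "time":"21/Jun/2019:15:45:35 -0700",
      "request":"POST /extensible/infrastructures/one-to-one/enterprise HTTP/1.1"
   },
   ...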
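The difference between `\w` and `\S` for user names, and the character class for the time field, can be checked directly on a few strings. This is a minimal sketch: the helper `full`, the name `TIME_CLASS`, and the user name "o-neil42" are made up for illustration.

```python
import re

def full(pattern, text):
    """Return True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# "o-neil42" is a made-up user name containing a hyphen.
for user in ["feest6811", "-", "o-neil42"]:
    print(repr(user), "\\w+:", full(r"\w+", user), "\\S+:", full(r"\S+", user))

# Character class for the time field: word characters, whitespace,
# "-" and "+" for the time zone, "/" for the date, ":" for the time.
TIME_CLASS = r"[-+\w\s:/]+"
print(full(TIME_CLASS, "21/Jun/2019:15:45:24 -0700"))
```

Putting "-" first inside the brackets keeps it a literal hyphen rather than the start of a range.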