Creating multiple dictionaries from 4 lists using regular expressions

I have the following txt file:


197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554

156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701

100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048

168.95.156.240 - stark2413 [21/Jun/2019:15:45:31 -0700] "GET /engage HTTP/2.0" 201 9645

71.172.239.195 - dooley1853 [21/Jun/2019:15:45:32 -0700] "PUT /cutting-edge HTTP/2.0" 406 24498

180.95.121.94 - mohr6893 [21/Jun/2019:15:45:34 -0700] "PATCH /extensible/reinvent HTTP/1.1" 201 27330

I want to write a function that converts these lines into multiple dictionaries, one dictionary per line:


example_dict = {"host":"146.204.224.152", "user_name":"feest6811", "time":"21/Jun/2019:15:45:24 -0700", "request":"POST /incentivize HTTP/1.1"}


So far I have managed to build 4 lists that cover all the items, but I don't know how to create a separate dict for each line:


import re

def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()
        host = (re.findall('(.*?)\-',logdata))
        username = re.findall('\-(.*?)\[',logdata)
        time = re.findall('\[(.*?)\]', logdata)
        request = re.findall('\"(.*?)\"',logdata)
        #for line in range(len(logdata)):
            #dc = {'host':host[line], 'user_name':user_name[line], 'time':time[line], 'request':request[line]}

       


一只斗牛犬
5 Answers

慕斯709654

The following snippet produces a list of dictionaries, one for each line of the log file.

import re

def parse_log(log_file):
    # host - user_name [time] "request"; the quotes sit outside the last group,
    # so the captured request does not include them
    regex = re.compile(r'^([0-9.]+) - (.*) \[(.*)\] "(.*)"')

    def _extract_field(match_object, tag, index, result):
        # Store a captured group only when it is non-empty
        if match_object[index]:
            result[tag] = match_object[index]

    result = []
    with open(log_file) as fh:
        for line in fh:
            match = regex.search(line)
            if match:
                fields = {}
                _extract_field(match, 'host', 1, fields)
                _extract_field(match, 'user_name', 2, fields)
                _extract_field(match, 'time', 3, fields)
                _extract_field(match, 'request', 4, fields)
                # Append inside the if, so lines that do not match are skipped
                result.append(fields)
    return result

def main():
    result = parse_log('log.txt')
    for line in result:
        print(line)

if __name__ == '__main__':
    main()

料青山看我应如是

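I am taking this course right now, and the answer I arrived at is:

import re

def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()

    # One named group per wanted key; with re.VERBOSE, unescaped whitespace
    # in the pattern is ignored, so literal spaces are written as "\ "
    pattern = r'''
        (?P<host>[\w.]*)
        (\ -\ )
        (?P<user_name>([a-z\-]*[\d]*))
        (\ \[)
        (?P<time>\w.*?)
        (\]\ ")
        (?P<request>\w.*)
        (")
    '''
    lst = []
    for item in re.finditer(pattern, logdata, re.VERBOSE):
        # groupdict() returns only the named groups
        lst.append(item.groupdict())
    print(lst)
    return lst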
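To see what `groupdict()` returns for this kind of pattern, here is a minimal sketch run against two lines copied from the question instead of a file. The names `SAMPLE`, `LINE_RE` and `parse` are made up for illustration, and the pattern is a simplified assumption about the log format, not the course's reference solution.

```python
import re

# Two lines copied from the sample log in the question.
SAMPLE = (
    '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] '
    '"DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554\n'
    '100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] '
    '"PATCH /architectures HTTP/1.0" 204 6048\n'
)

# Hypothetical pattern for illustration: one named group per wanted key.
LINE_RE = re.compile(
    r'^(?P<host>\S+) - (?P<user_name>\S+) \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)"',
    re.MULTILINE,
)

def parse(text):
    """Return one dict per log line that matches LINE_RE."""
    return [m.groupdict() for m in LINE_RE.finditer(text)]

records = parse(SAMPLE)
print(records[1])
```

Because every wanted field is a named group, the dictionary keys come straight from the pattern and no index bookkeeping is needed.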

跃然一笑

Using str.split() and str.index() also works and avoids the need for regular expressions. You can also iterate over the file handle directly, which yields one line at a time, so the whole file never has to be loaded into memory:

from pprint import pprint

result = []
with open('logdata.txt') as f:
    for line in f:
        # Skip blank lines, which would break the unpacking below
        if not line.strip():
            continue
        # Isolate host and user_name, discarding the dash in between
        host, _, user_name, remaining = line.split(maxsplit=3)
        # Find the end of the datetime and isolate it
        end_bracket = remaining.index(']')
        time_ = remaining[1:end_bracket]
        # Drop the time from the front and strip the trailing newline;
        # the quotes and the trailing status code and size are kept as-is
        request = remaining[end_bracket + 1:].strip()
        # Create the dictionary
        result.append({
            'host': host,
            'user_name': user_name,
            'time': time_,
            'request': request
        })

pprint(result)
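If only the text between the double quotes is wanted for `request`, as in the example dictionary in the question, the same regex-free idea can be taken one step further. This is a sketch under the assumption that every line has exactly the layout shown in the question; `parse_line` is a hypothetical helper name.

```python
def parse_line(line):
    """Split one log line into the four wanted fields without regex."""
    host, _, user_name, remaining = line.split(maxsplit=3)
    # remaining looks like: [time] "request" status size
    time_, _, after_time = remaining.lstrip('[').partition(']')
    # Take only the text between the first pair of double quotes.
    request = after_time.split('"')[1]
    return {'host': host, 'user_name': user_name, 'time': time_, 'request': request}

line = '168.95.156.240 - stark2413 [21/Jun/2019:15:45:31 -0700] "GET /engage HTTP/2.0" 201 9645\n'
print(parse_line(line))
```

A line that lacks the quotes or has fewer than four whitespace-separated parts would raise an exception here, so real input may need a guard around the call.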

莫回无

Once you have sorted out the regex problems you are running into, the code below should work for you:

import re

result = []
with open('data.txt') as f:
    lines = [l.strip() for l in f.readlines()]
    for logdata in lines:
        # re.findall returns a list, so each value stored below is a list of matches
        host = re.findall(r'(.*?)-', logdata)
        username = re.findall(r'-(.*?)\[', logdata)
        _time = re.findall(r'\[(.*?)\]', logdata)
        request = re.findall(r'"(.*?)"', logdata)
        result.append({'host': host, 'user_name': username, 'time': _time,
                       'request': request})
print(result)
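The regex problems mentioned here are easier to see on a single line. The sketch below, which uses one line copied from the question, contrasts `re.findall` (always a list) with `re.search` (a match object or `None`) and shows that the unanchored host pattern fires at every hyphen. The variable names are illustrative only.

```python
import re

line = '71.172.239.195 - dooley1853 [21/Jun/2019:15:45:32 -0700] "PUT /cutting-edge HTTP/2.0" 406 24498'

# findall always returns a list, even when there is a single match.
as_list = re.findall(r'\[(.*?)\]', line)

# search returns a match object (or None); group(1) is the captured string.
match = re.search(r'\[(.*?)\]', line)
as_str = match.group(1) if match else None

# The unanchored host pattern matches once for every hyphen on the line,
# so it yields more than just the host.
host_hits = re.findall(r'(.*?)-', line)

print(as_list)
print(as_str)
print(host_hits)
```

Switching to `re.search(...).group(1)` per line is one way to store plain strings instead of lists in each dictionary.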

FFIVE

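Given assets/logdata.txt, the function below returns a list of dictionaries containing the required key/value pairs for each line, as described in your original question. Note that proper error handling should be built on top of it, because there are obvious edge cases that could make the code stop unexpectedly.

Note the change to your host pattern, which is important. The original pattern in your example matched more than just the host part of each line. Adding an anchor at the start of the pattern, together with re.MULTILINE, stops the false positives that would otherwise match the rest of each line in the original example.

import re

def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()
    # ^ with re.MULTILINE anchors the host match to the start of each line
    host = re.findall(r'^(.*?)-', logdata, re.MULTILINE)
    username = re.findall(r'-(.*?)\[', logdata)
    time = re.findall(r'\[(.*?)\]', logdata)
    request = re.findall(r'"(.*?)"', logdata)
    # Build one dict per line; host and username are captured with
    # surrounding spaces, so both are stripped
    return [{"host": host[i].strip(), "user_name": username[i].strip(),
             "time": time[i], "request": request[i]}
            for i, _ in enumerate(host)]

The above is a simple, minimal solution based on your original post. There are plenty of cleaner and more efficient ways to solve this, but I thought it was more useful to start from your existing code and show you how to correct it than to hand you a better-optimized solution that would mean relatively little to you.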
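Once the four lists line up row by row, `zip` is another way to combine them into one dictionary per line, which is what the question title asks for. This is a minimal sketch on two lines copied from the question; the stricter patterns for `hosts` and `user_names` and the length check are assumptions added for illustration, not part of the original answer.

```python
import re

logdata = (
    '180.95.121.94 - mohr6893 [21/Jun/2019:15:45:34 -0700] "PATCH /extensible/reinvent HTTP/1.1" 201 27330\n'
    '168.95.156.240 - stark2413 [21/Jun/2019:15:45:31 -0700] "GET /engage HTTP/2.0" 201 9645\n'
)

# Four parallel lists, one entry per log line.
hosts = re.findall(r'^(\S+) - ', logdata, re.MULTILINE)
user_names = re.findall(r'^\S+ - (\S+) \[', logdata, re.MULTILINE)
times = re.findall(r'\[(.*?)\]', logdata)
requests = re.findall(r'"(.*?)"', logdata)

# If any list is shorter, the lists no longer line up row by row.
if not (len(hosts) == len(user_names) == len(times) == len(requests)):
    raise ValueError('field lists have different lengths')

keys = ('host', 'user_name', 'time', 'request')
# zip walks the four lists in step; dict(zip(keys, row)) labels each row.
records = [dict(zip(keys, row)) for row in zip(hosts, user_names, times, requests)]
print(records)
```

The explicit length check matters because `zip` silently stops at the shortest list, which would hide a pattern that missed some lines.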