Converting a .txt dictionary log to a DataFrame while skipping some values


I have a performance log in a .txt file that is (mostly) in dictionary format, like this:

10:07:49.1396 Info {"message":"Killing processes...","level":"Information","logType":"User","timeStamp":"2020-10-19T10:07:49.1386035+02:00"}

10:07:49.4102 Info {"message":"Opening applications...","level":"Information","logType":"User","timeStamp":"2020-10-19T10:07:49.4092373+02:00"}

I would like to get it into a DataFrame like this:

    message                  level        logType  timeStamp
    Killing processes...     Information  User     2020-10-19T10:07:49.1386035+02:00
    Opening applications...  Information  User     2020-10-19T10:07:49.4092373+02:00

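So basically only what is inside the curly braces. I don't need the "10:07:49.1396 Info" at the start of each log entry.

I am currently learning NumPy and Pandas, but as an absolute beginner I am not even sure whether this is feasible with those two libraries alone. Do I need to use anything else?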

桃花长相依


2 Answers

陪伴而非守候

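You have to parse the log manually to collect the relevant data:

    import json
    import re

    import pandas as pd

    # Skip the first two space-separated fields (time and level) and capture the rest.
    pattern = re.compile(r'.+? .+? (.+)')

    logs = []
    with open('data.txt') as fp:
        for line in fp:
            match = pattern.match(line)
            if match:
                try:
                    data = json.loads(match.group(1))
                    logs.append(data)
                except json.JSONDecodeError:
                    # Ignore lines whose payload is not valid JSON.
                    pass

    df = pd.DataFrame(logs)

To do this in real time, you would have to monitor the file for changes.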
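One simple way to monitor the file is to poll it for newly appended lines. What follows is a minimal sketch, assuming the log is only ever appended to and that a short polling delay is acceptable. The names `parse_line`, `follow`, and `poll_seconds` are hypothetical and chosen only for illustration.

```python
import json
import os
import re
import time

# Same idea as the pattern in the answer: skip two fields, capture the rest.
PATTERN = re.compile(r'.+? .+? (.+)')


def parse_line(line):
    """Return the JSON payload of one log line as a dict, or None if it does not parse."""
    match = PATTERN.match(line)
    if not match:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def follow(path, poll_seconds=0.5):
    """Yield parsed entries as new lines are appended to the file (hypothetical helper).

    Limitation: a line that is read while it is still being written will not
    parse and is dropped; this sketch does not buffer partial lines.
    """
    with open(path, encoding='utf-8') as fp:
        fp.seek(0, os.SEEK_END)  # start at the current end of the file
        while True:
            line = fp.readline()
            if not line:
                # Nothing new yet; wait before checking again.
                time.sleep(poll_seconds)
                continue
            entry = parse_line(line)
            if entry is not None:
                yield entry
```

Entries yielded by `follow('data.txt')` could be collected in a list and turned into a DataFrame with `pd.DataFrame(...)` whenever an up-to-date view is needed. The generator runs until it is interrupted.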


三国纷争

Here is another approach, using json_normalize:

    import json
    import re

    import pandas as pd

    # Match everything from the first '{' to the last '}' on the line.
    pattern = re.compile(r'\{.*\}')

    rows = []
    with open('a.txt') as f:  # read-only access is enough here
        for line in f:
            for match in pattern.finditer(line):
                data = json.loads(match.group())
                # Each log entry becomes a one-row DataFrame.
                dfx = pd.json_normalize(data)
                rows.append(dfx)

    df = pd.concat(rows)
    print(df)

Output:

                       message        level logType                          timeStamp
    0     Killing processes...  Information    User  2020-10-19T10:07:49.1386035+02:00
    0  Opening applications...  Information    User  2020-10-19T10:07:49.4092373+02:00
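Every row in that output keeps index 0 because each one-row frame is concatenated unchanged. As a variation, `pd.json_normalize` also accepts a list of dicts, so the records can be collected first and normalized in a single call. This is a minimal sketch, assuming the two sample lines from the question stand in for the file. The names `sample_lines` and `records` are illustrative only.

```python
import json
import re

import pandas as pd

# Sample lines copied from the question; a real run would iterate over the file instead.
sample_lines = [
    '10:07:49.1396 Info {"message":"Killing processes...","level":"Information",'
    '"logType":"User","timeStamp":"2020-10-19T10:07:49.1386035+02:00"}',
    '10:07:49.4102 Info {"message":"Opening applications...","level":"Information",'
    '"logType":"User","timeStamp":"2020-10-19T10:07:49.4092373+02:00"}',
]

# Match everything from the first '{' to the last '}' on the line.
pattern = re.compile(r'\{.*\}')

records = []
for line in sample_lines:
    match = pattern.search(line)
    if match:
        records.append(json.loads(match.group()))

# Normalizing the whole list at once gives a default 0..n-1 index.
df = pd.json_normalize(records)
print(df)
```

Passing `ignore_index=True` to `pd.concat` in the loop-based version should give the same consecutive index.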

