Splitting chat log files at dates (using regular expressions) and counting the messages per month

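I have several chat logs and want to count the number of messages sent and received per month. Some messages correspond to a single line in the text file, but not all of them. So I want to split the messages at the date and time. Then I want to extract the month and year from each date, count the messages, and update that number in a dictionary. Finally, I want to print the month/year and the number of messages.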


This is what the source file looks like (dates are d/m/Y):


09/10/2017, 10:55 - Name omitted: Lorem ipsum dolor sit amet, consectetur adipiscing elit. 

09/10/2017, 11:17 - Name omitted: Pellentesque massa tellus, porttitor et iaculis vitae, sodales ac mauris.

Aliquam ullamcorper dictum laoreet. Proin ornare ultrices eros, ut fermentum ex accumsan at. Curabitur dignissim massa a nisi molestie, id hendrerit elit convallis. 


Etiam tincidunt gravida arcu, vel lacinia tellus dignissim eu. Praesent ullamcorper neque eu tellus interdum, in semper nibh sagittis. Fusce dignissim sollicitudin mauris in tempus. Sed in magna ante.

09/10/2017, 11:29 - Name omitted: Nam eu risus laoreet, commodo neque eget, tincidunt risus. Suspendisse eu ullamcorper metus. 

Here is my code; unfortunately it does not work. As a result I get a long list of 1s:


import os
import re

nummessages = {}

datafiles = ("file1.txt", "file2.txt")

for file in datafiles:
    with open(file, "r", encoding="utf8") as infile:
        for line in infile:
            regexdate = re.compile("([0-9]{2})(\/)([0-9]{2})(\/)([0-9]{4})(,)(\s)([0-9]{2})(:)([0-9]{2})")
            messages = regexdate.split(line)
            for message in messages:
                key = re.search("([0-9]{2})(\/)([0-9]{4})", message)
                value = message.count(message)

                if key in nummessages.keys():
                    nummessages[key].append(value)
                else:
                    nummessages[key] = [value]

for key in sorted(nummessages.items()):
    print(str(key[0]) + "\t"  + str(key[1]))

The output I want looks like this:


09/2017: 45 messages

10/2017: 10 messages

...

What exactly am I doing wrong? (FYI, I am new to Python.)


慕哥9229398
2 answers

繁华开满天机

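Try this. The main idea of this solution is to parse the month and year from each log line and use them as the key in the data dictionary. For every line that starts with a date in the same month and year, the value in the dictionary is incremented by 1. Here l refers to each line of the log file.

import re

data = {}  # defined outside the loops

for file in datafiles:
    with open(file, "r", encoding="utf8") as infile:
        for l in infile:
            # re.match only matches at the start of the line, so continuation
            # lines of a multi-line message are skipped
            m = re.match(r'\d{2}/(\d{2})/(\d{4})', l)
            if m:
                key = '{}/{}'.format(m.group(1), m.group(2))
                if key not in data:
                    data[key] = 0
                data[key] += 1

# printing
for k in data:
    print('{}: {} messages'.format(k, data[k]))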
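To see why the code in the question prints a long list of 1s, it may help to run its two key steps on a single sample line. The sketch below is only an illustration and not part of the original answer; the variable names pieces, keys, and counts are made up for this example.

```python
import re

line = "09/10/2017, 10:55 - Name omitted: Lorem ipsum"

# Splitting on a pattern with capturing groups also returns the groups,
# so the date itself is broken into small fragments.
regexdate = re.compile(
    r"([0-9]{2})(/)([0-9]{2})(/)([0-9]{4})(,)(\s)([0-9]{2})(:)([0-9]{2})"
)
pieces = regexdate.split(line)

# No single fragment still contains "mm/yyyy", so every search returns None.
keys = [re.search(r"([0-9]{2})(/)([0-9]{4})", piece) for piece in pieces]

# A string always contains itself exactly once, so this is always 1.
counts = [piece.count(piece) for piece in pieces]

print(pieces)
print(keys)
print(counts)
```

Because every key is None and every value is 1, the dictionary ends up with a single entry holding a growing list of 1s. Matching the date at the start of each line, as in the answer above, avoids both problems.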

扬帆大鱼

Use collections.defaultdict.

Example:

import re
from collections import defaultdict

result = defaultdict(int)

with open(file, "r", encoding="utf8") as infile:
    for line in infile:                                   # Iterate over each line
        line = line.strip()
        m = re.match(r"(\d{2}/(\d{2})/(\d{4}))", line)    # Check whether the line starts with a date
        if m:
            result["{}/{}".format(m.group(2), m.group(3))] += 1   # Build the month/year key and increment its count

print(result)
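A possible way to get from these counts to the output format requested in the question is sketched below. It is a minimal, self-contained example under a few assumptions: the helper name count_messages_per_month and the MESSAGE_START pattern are hypothetical, the pattern assumes every message starts with the "dd/mm/yyyy, HH:MM - " prefix seen in the sample, and the last sample line (dated 02/11/2017) is invented here only to show a second month.

```python
import re
from collections import defaultdict

# Assumed message prefix: "dd/mm/yyyy, HH:MM - " (taken from the question's sample).
MESSAGE_START = re.compile(r"\d{2}/(\d{2})/(\d{4}), \d{2}:\d{2} - ")


def count_messages_per_month(lines):
    """Count the lines that start a new message, keyed by (year, month)."""
    counts = defaultdict(int)
    for line in lines:
        m = MESSAGE_START.match(line)
        if m:  # continuation lines of a multi-line message do not match
            month, year = m.group(1), m.group(2)
            counts[(year, month)] += 1
    return counts


# In-memory stand-in for a log file; the final line is made up for illustration.
sample = [
    "09/10/2017, 10:55 - Name omitted: Lorem ipsum dolor sit amet.\n",
    "09/10/2017, 11:17 - Name omitted: Pellentesque massa tellus.\n",
    "\n",
    "Aliquam ullamcorper dictum laoreet.\n",
    "09/10/2017, 11:29 - Name omitted: Nam eu risus laoreet.\n",
    "02/11/2017, 08:03 - Name omitted: Etiam tincidunt gravida arcu.\n",
]

counts = count_messages_per_month(sample)

# Sorting (year, month) tuples of zero-padded strings gives chronological order.
report = [
    "{}/{}: {} messages".format(month, year, n)
    for (year, month), n in sorted(counts.items())
]
for row in report:
    print(row)
```

Keying on (year, month) rather than on the "mm/yyyy" string keeps the sort chronological across years. To process real files, the same helper could be called with an open file object instead of the sample list.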