Counting the bytes transferred in an HTTP log

I'm new to Python and need to parse status codes. My task is to parse an HTTP log file:

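  • Group the logged requests by IP address or by HTTP status code (chosen by the user).

  • For each group, compute one of the following (chosen by the user):

    1. The request count

    2. The request count as a percentage of all logged requests

    3. The total number of bytes transferred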
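The grouping step described above can be kept separate from parsing. Below is a minimal sketch, assuming each request has already been parsed into an (ip, status, nbytes) tuple; summarize is a hypothetical helper, and it reports percentages on a 0 to 100 scale.

```python
from collections import defaultdict


def summarize(records, group_by="ip", metric="count"):
    """Group (ip, status, nbytes) records and compute one metric per group.

    group_by: "ip" or "status"; metric: "count", "percent" or "bytes".
    """
    key_index = {"ip": 0, "status": 1}[group_by]  # KeyError for any other choice
    counts = defaultdict(int)       # group key -> number of requests
    byte_totals = defaultdict(int)  # group key -> sum of bytes transferred
    for record in records:
        key = record[key_index]
        counts[key] += 1
        byte_totals[key] += record[2]

    if metric == "count":
        return dict(counts)
    if metric == "percent":
        total = sum(counts.values())
        return {key: 100 * n / total for key, n in counts.items()}
    if metric == "bytes":
        return dict(byte_totals)
    raise ValueError(f"unknown metric: {metric!r}")
```

Counting and summing in the same loop means the third metric costs nothing extra once the byte field is parsed; for example, summarize(records, group_by="status", metric="bytes") gives total bytes per status code.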

I have already computed the request counts and percentages. Now I don't know how to compute the bytes transferred (the third item).

Sample from the log file (the bytes transferred appear right after the status code: 6146, 52315, 12251, 8026, and 54662):

93.114.45.13 - - [17/May/2015:10:05:17 +0000] "GET /images/jordan-80.png HTTP/1.1" 200 6146 "http://www.semicomplete.com/articles/dynamic-dns-with-dhcp/" "Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0"

93.114.45.13 - - [17/May/2015:10:05:21 +0000] "GET /images/web/2009/banner.png HTTP/1.1" 200 52315 "http://www.semicomplete.com/style2.css" "Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0"

66.249.73.135 - - [17/May/2015:10:05:40 +0000] "GET /blog/tags/ipv6 HTTP/1.1" 200 12251 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

83.149.9.216 - - [17/May/2015:10:05:25 +0000] "GET /presentations/logstash-monitorama-2013/images/elasticsearch.png HTTP/1.1" 200 8026 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"

83.149.9.216 - - [17/May/2015:10:05:59 +0000] "GET /presentations/logstash-monitorama-2013/images/logstashbook.png HTTP/1.1" 200 54662 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
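The sample lines appear to follow Apache's "combined" log format: IP, identity, user, [timestamp], "request", status, bytes, "referrer", "user agent". Under that assumption, one line can be split into the three fields the task needs. This is a sketch; parse_line is a hypothetical name, and it treats a "-" in the byte field (written when no response body was sent) as 0.

```python
import re

# Assumes Apache "combined" log lines: IP, identd, user, [time], "request", status, bytes, ...
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Return (ip, status, nbytes) for one log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # The byte field is "-" when no response body was sent; count that as 0.
    nbytes = 0 if m["bytes"] == "-" else int(m["bytes"])
    return m["ip"], m["status"], nbytes
```

Matching the fields by position (after the bracketed timestamp and the quoted request) avoids picking up numbers from the referrer or user agent.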


精慕HU

慕婉清6462132

Rather than making several passes over the log with several regular expressions, this solution uses a single regular expression to pull out all the relevant values in one pass. The function process_log is passed the text of the log as a single string. The demo program below uses a test string; a real implementation would call this function with the contents of the actual log file.

To keep track of IP address/status pairs, ip_counter is a defaultdict that uses list as its default_factory. The number of items in a list is the number of times that IP address/status combination was seen, and each item is the number of bytes transferred for one HTTP request. For example, a key/value pair in ip_counter might be:

key: ('123.12.11.9', '200')  value: [6213, 9876, 376]

This means that status code '200' was found 3 times for IP address '123.12.11.9', and that those 3 requests transferred 6213, 9876, and 376 bytes.

Explanation of the regex:

(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - .*?HTTP/1.1" (\d+) (\d+)

  • (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) is essentially the regex the OP used to recognize an IP address, so it needs little explanation. The IP address is captured in capture group 1.
  • " - - " follows the IP address to provide additional context, in case other similar-looking strings appear in the line.
  • .*? lazily matches zero or more non-newline characters, up to what follows.
  • HTTP/1.1" matches that literal text to provide left-hand context for the fields that follow.
  • (\d+) matches one or more digits in capture group 2 (the status).
  • A single space.
  • (\d+) matches one or more digits in capture group 3 (the bytes transferred).

In other words, the aim is to make sure the right fields are picked from the right place by also matching what is expected to appear next to them.

When a regex returns multiple groups, finditer is often more convenient than findall: finditer returns an iterator that yields a match object on each iteration.

The code below produces statistics both per IP address/status code (count, percentage, and bytes transferred) and per status code alone. You only need one or the other, depending on what the user chooses.

Code:

```python
import re
from collections import defaultdict


def process_log(log):
    ip_counter = defaultdict(list)     # (ip, status) -> list of byte counts, one per request
    status_counter = defaultdict(int)  # status -> number of requests
    total_count = 0
    for m in re.finditer(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - .*?HTTP/1.1" (\d+) (\d+)', log):
        total_count += 1
        ip = m[1]
        status = m[2]
        num_bytes = int(m[3])  # named num_bytes so the built-in bytes is not shadowed
        ip_counter[(ip, status)].append(num_bytes)
        status_counter[status] += 1

    for (ip, status), byte_list in ip_counter.items():
        count = len(byte_list)
        percentage = count / total_count
        total_bytes = sum(byte_list)
        print(f"IP Address => {ip}, status => {status}, Count => {count}, Percentage => {percentage}, Total Bytes Transferred => {total_bytes}")

    for status, count in status_counter.items():
        percentage = count / total_count
        print(f"Status Code => {status}, Percentage => {percentage}")


log = """93.114.45.13 - - [17/May/2015:10:05:17 +0000] "GET /images/jordan-80.png HTTP/1.1" 200 6146 "http://www.semicomplete.com/articles/dynamic-dns-with-dhcp/" "Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0"
93.114.45.13 - - [17/May/2015:10:05:21 +0000] "GET /images/web/2009/banner.png HTTP/1.1" 200 52315 "http://www.semicomplete.com/style2.css" "Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0"
66.249.73.135 - - [17/May/2015:10:05:40 +0000] "GET /blog/tags/ipv6 HTTP/1.1" 200 12251 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
83.149.9.216 - - [17/May/2015:10:05:25 +0000] "GET /presentations/logstash-monitorama-2013/images/elasticsearch.png HTTP/1.1" 200 8026 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:59 +0000] "GET /presentations/logstash-monitorama-2013/images/logstashbook.png HTTP/1.1" 200 54662 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
"""

process_log(log)
```

Prints:

```
IP Address => 93.114.45.13, status => 200, Count => 2, Percentage => 0.4, Total Bytes Transferred => 58461
IP Address => 66.249.73.135, status => 200, Count => 1, Percentage => 0.2, Total Bytes Transferred => 12251
IP Address => 83.149.9.216, status => 200, Count => 2, Percentage => 0.4, Total Bytes Transferred => 62688
Status Code => 200, Percentage => 1.0
```
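To make the findall/finditer point concrete: with the same three groups, findall hands back plain tuples of strings, while finditer yields match objects whose groups can be read and converted one at a time. The two sample lines are shortened versions of the question's log (referrer and user agent dropped), and the dot in "1.1" is escaped here, a small tightening of the answer's pattern.

```python
import re

# The answer's pattern, with the dot in "1.1" escaped so it only matches a literal dot.
PATTERN = re.compile(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - .*?HTTP/1\.1" (\d+) (\d+)')

log = (
    '93.114.45.13 - - [17/May/2015:10:05:17 +0000] "GET /images/jordan-80.png HTTP/1.1" 200 6146\n'
    '66.249.73.135 - - [17/May/2015:10:05:40 +0000] "GET /blog/tags/ipv6 HTTP/1.1" 200 12251\n'
)

# findall: one tuple of group strings per match; every value is still a str.
tuples = PATTERN.findall(log)
# [('93.114.45.13', '200', '6146'), ('66.249.73.135', '200', '12251')]

# finditer: one Match object per match; groups are read by index and converted as needed.
rows = [(m[1], m[2], int(m[3])) for m in PATTERN.finditer(log)]
# [('93.114.45.13', '200', 6146), ('66.249.73.135', '200', 12251)]
```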

拉风的咖菲猫

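To get the number of bytes transferred from the log file, do the following:

```python
import re

# Starting at a '/', lazily skips ahead to ' HTTP/1.x"', then a 3-character status code,
# and captures the following field (the byte count) in group 1
BYTES_RE = re.compile(r'/.+?\sHTTP/1\.."\s.{3}\s(.+?)\s')

def getBytes(filename):
    with open(filename, 'r') as logfile:
        for line in logfile:
            match = BYTES_RE.search(line)
            if match:  # skip lines that don't look like request entries
                print("Bytes transferred: " + match[1])
```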
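If a single total is wanted rather than one printed line per request, the loop from the second answer can accumulate instead of printing. A minimal sketch: total_bytes is a hypothetical name, and it takes any iterable of lines (such as an open file) so it can also be tried on a plain list. It also skips a "-" byte field instead of failing on int().

```python
import re

# The second answer's pattern: skip to ' HTTP/1.x"', a 3-character status, then capture the byte field.
BYTES_RE = re.compile(r'/.+?\sHTTP/1\.."\s.{3}\s(.+?)\s')


def total_bytes(lines):
    """Sum the byte field over an iterable of log lines (e.g. an open file)."""
    total = 0
    for line in lines:
        m = BYTES_RE.search(line)
        # Skip non-matching lines and "-" (no body sent) instead of crashing.
        if m and m[1].isdigit():
            total += int(m[1])
    return total
```

Usage with a file, where "access.log" is a placeholder name: with open("access.log") as f: print(total_bytes(f)).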