Author: SRE运维博客
It has been a long time; this blog is back from the dead. I changed jobs and have been too busy to post (though mostly I was just being lazy 🤔).
Without further ado, straight to the useful part.
Background
While reviewing our bill recently, I noticed that a large share of the cost came from CloudWatch Logs: our EC2 instances ship their logs to CloudWatch, and storing and querying that much data there is expensive. There is also a data-analysis angle: when the big-data team needs the logs, keeping them in S3 is cheaper and more convenient. Pulling data out of S3 is easier, and S3 offers archival storage classes, so older data can be tiered down over time to cut costs further.
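On the tiering point: S3 lifecycle rules can move exported logs to a colder storage class automatically. Below is a minimal sketch, assuming a hypothetical bucket name `my-log-archive` and prefix `exportedlogs/`; the 30-day transition to Glacier and 365-day expiry are illustrative values, not recommendations from this article.

```python
import boto3

s3 = boto3.client('s3')

# Hypothetical bucket/prefix; tune the day thresholds to your retention needs.
s3.put_bucket_lifecycle_configuration(
    Bucket='my-log-archive',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'tier-exported-logs',
            'Filter': {'Prefix': 'exportedlogs/'},
            'Status': 'Enabled',
            'Transitions': [{'Days': 30, 'StorageClass': 'GLACIER'}],  # move to archive tier
            'Expiration': {'Days': 365},  # delete after a year
        }]
    },
)
```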
If you also want to move the logs from your CloudWatch log groups into S3, this article walks through one way to do it.
Prerequisites
- Create an S3 bucket and update its permissions
- Create a Lambda function
- Have a CloudWatch log group with more than one day of logs
- Grant the Lambda the permissions it needs
Create the S3 bucket and update its permissions
{{< tabs "China-region S3 bucket policy" "Global-region S3 bucket policy" >}}
{{< tab >}}
S3 bucket policy for China regions (the example uses cn-north-1):
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "logs.cn-north-1.amazonaws.com.cn"
            },
            "Action": "s3:GetBucketAcl",
            "Resource": "arn:aws-cn:s3:::<bucket name>"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "logs.cn-north-1.amazonaws.com.cn"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws-cn:s3:::<bucket name>/*",
            "Condition": {
                "StringEquals": {
                    "s3:x-amz-acl": "bucket-owner-full-control"
                }
            }
        }
    ]
}
```
{{< /tab >}}
{{< tab >}}
S3 bucket policy for regions outside China (the example uses us-west-2):
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": { "Service": "logs.us-west-2.amazonaws.com" },
            "Action": "s3:GetBucketAcl",
            "Resource": "arn:aws:s3:::<bucket name>"
        },
        {
            "Effect": "Allow",
            "Principal": { "Service": "logs.us-west-2.amazonaws.com" },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::<bucket name>/*",
            "Condition": { "StringEquals": { "s3:x-amz-acl": "bucket-owner-full-control" } }
        }
    ]
}
```
{{< /tab >}}
{{< /tabs >}}
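The policy can be attached in the S3 console, or programmatically. Here is a minimal boto3 sketch using the us-west-2 variant above; `<bucket name>` is the same placeholder as in the policy:

```python
import json
import boto3

# The us-west-2 policy from the tab above; for China regions swap in the
# aws-cn ARNs and the logs.cn-north-1.amazonaws.com.cn principal.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "logs.us-west-2.amazonaws.com"},
            "Action": "s3:GetBucketAcl",
            "Resource": "arn:aws:s3:::<bucket name>",
        },
        {
            "Effect": "Allow",
            "Principal": {"Service": "logs.us-west-2.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::<bucket name>/*",
            "Condition": {"StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}},
        },
    ],
}

s3 = boto3.client('s3')
s3.put_bucket_policy(Bucket='<bucket name>', Policy=json.dumps(policy))
```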
Create the Lambda function
Create a Lambda function (Python runtime) and paste in the code below:
```python
import boto3
import logging
import time
import datetime

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def export_s3_logs(bucket_name, log_group_name, log_stream_name,
                   days_of_logs=1, timeout=1000, interval=1, region=None):
    '''
    # Equivalent window aligned to midnight UTC instead of UTC+8:
    today = datetime.datetime.combine(datetime.datetime.utcnow(), datetime.datetime.min.time())
    day_end = today
    day_start = today - datetime.timedelta(days=days_of_logs)
    '''
    # Align the export window to midnight UTC+8, then express it in UTC.
    today = datetime.datetime.combine(datetime.datetime.utcnow() + datetime.timedelta(hours=8),
                                      datetime.datetime.min.time())  # midnight UTC+8
    day_end = today - datetime.timedelta(hours=8)  # UTC
    day_start = today - datetime.timedelta(days=days_of_logs, hours=8)  # UTC

    # create_export_task expects millisecond epoch timestamps.
    epoch = datetime.datetime(1970, 1, 1)
    ts_start = int((day_start - epoch).total_seconds() * 1000)
    ts_end = int((day_end - epoch).total_seconds() * 1000)

    # Store objects under <log group>/<YYYY>/<MM>/<DD>; strip the leading '/'
    # so the prefix does not start with an empty folder.
    the_date = '/'.join([str(today.year), '{:02d}'.format(today.month), '{:02d}'.format(today.day)])
    folder_name = '/'.join([log_group_name.strip('/'), the_date])

    client = boto3.client('logs', region_name=region)
    task_id = client.create_export_task(
        logGroupName=log_group_name,
        # logStreamNamePrefix=log_stream_name,  # uncomment to export a single stream
        fromTime=ts_start,
        to=ts_end,
        destination=bucket_name,
        destinationPrefix=folder_name
    )['taskId']
    logger.info('Created export task %s for %s', task_id, folder_name)

    # Poll until the task completes, fails, or we run out of attempts.
    result = False
    for _ in range(timeout):
        task = client.describe_export_tasks(taskId=task_id)['exportTasks'][0]
        status = task['status']['code']
        if status == 'COMPLETED':
            result = True
            break
        elif status not in ('PENDING', 'RUNNING'):
            break  # CANCELLED or FAILED
        time.sleep(interval)
    logger.info('Export task %s finished, success=%s', task_id, result)
    return result


def lambda_handler(event, context):
    region = 'cn-northwest-1'            # region of the log group
    bucket_name = '<bucket name>'        # S3 bucket in the same region
    log_group_name = '<log group name>'  # log group to export
    log_stream_name = '1'                # only used if logStreamNamePrefix is enabled
    log_export_days = 1                  # number of days of logs to export
    export_s3_logs(bucket_name, log_group_name, log_stream_name,
                   log_export_days, region=region)
```
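Once deployed, the function can be smoke-tested with a manual invocation. A minimal sketch, assuming the function was deployed under the hypothetical name `cloudwatch-log-export`:

```python
import json
import boto3

# 'cloudwatch-log-export' is a hypothetical function name; use your own.
lambda_client = boto3.client('lambda', region_name='cn-northwest-1')
response = lambda_client.invoke(
    FunctionName='cloudwatch-log-export',
    InvocationType='RequestResponse',  # block until the handler returns
    Payload=json.dumps({}).encode(),
)
print(response['StatusCode'])  # 200 on a successful synchronous invoke
```

Note that CloudWatch Logs allows only one active (PENDING or RUNNING) export task per account per region, so overlapping invocations will fail until the previous task finishes.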
Grant the Lambda permissions
- Read/write permissions on Amazon S3
- CloudWatchLogsFullAccess
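If you prefer to attach the policies from code rather than the console, here is a minimal sketch, assuming a hypothetical execution role named `log-export-lambda-role` (in China regions the managed-policy ARNs use the `aws-cn` partition):

```python
import boto3

iam = boto3.client('iam')

# AmazonS3FullAccess is one way to satisfy the S3 read/write requirement;
# a policy scoped to just the export bucket is tighter in practice.
for policy_arn in (
    'arn:aws:iam::aws:policy/AmazonS3FullAccess',
    'arn:aws:iam::aws:policy/CloudWatchLogsFullAccess',
):
    iam.attach_role_policy(RoleName='log-export-lambda-role', PolicyArn=policy_arn)
```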
Verify the files in the bucket
The logs end up archived under date-based prefixes, which makes it easy to work with the archived files later.
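A quick check with boto3 (the placeholders match the ones in the Lambda; the date prefix below is hypothetical):

```python
import boto3

s3 = boto3.client('s3')
resp = s3.list_objects_v2(
    Bucket='<bucket name>',
    Prefix='<log group name>/2024/01/02',  # hypothetical YYYY/MM/DD prefix
)
for obj in resp.get('Contents', []):
    print(obj['Key'], obj['Size'])
```

The exported objects are gzip-compressed (`.gz`); the export task also writes an `aws-logs-write-test` marker object to verify its bucket permissions.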