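Special ways to view file contents on Linux:
The most basic tools are cat, less, and more. For more specific needs:
1/ To see only the first 5 lines of a file, use head, e.g.: head -5 /etc/passwd (equivalent to head -n 5 /etc/passwd)
2/ To see the last 10 lines of a file, use tail, e.g.: tail -10 /etc/passwd (equivalent to tail -n 10 /etc/passwd)
3/ The -f option makes tail keep reading newly appended content, which gives a real-time monitoring effect: tail -f /var/log/messages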
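The head and tail options above can be tried on a throwaway file instead of a system file. The following is a minimal sketch; the temporary file and the variable names (sample, first5, last10) are made up for illustration.

```shell
#!/bin/sh
# Build a 20-line sample file (one number per line) in a temporary location.
sample=$(mktemp)
i=1
while [ "$i" -le 20 ]; do
  echo "$i"
  i=$((i + 1))
done > "$sample"

# First 5 lines: -n 5 is the POSIX spelling of the older "-5" shorthand.
first5=$(head -n 5 "$sample")

# Last 10 lines of the same file.
last10=$(tail -n 10 "$sample")

printf 'head -n 5:\n%s\n' "$first5"
printf 'tail -n 10:\n%s\n' "$last10"

# Note: "tail -f FILE" would block here and keep printing appended lines,
# so it is left out of this non-interactive demo.
rm -f "$sample"
```

Here head returns lines 1 to 5 and tail returns lines 11 to 20 of the sample file.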
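Using the scheduling tool
1/ A site that gathers various tools, including a crontab helper: https://tool.lu/crontab
2/ Schedule the job with Linux crontab: run crontab -e
and add an entry whose schedule is: */1 * * * * (the */1 in the first field means every 1 minute), followed by the command to run
3/ Create the script: vi log_generator.sh
4/ Put the command that runs the log simulation script generate_log.py into it: python /home/hadoop/data/project/generate_log.py
5/ Give the sh script execute permission: chmod u+x log_generator.sh
6/ Verify that the log is being written. In the directory where the log files are generated, run: tail -200f logs/access.log
and watch new lines arrive on each scheduled run.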
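Steps 2 to 5 can be sketched as one script. This is only an illustration under stated assumptions: the wrapper is written to a scratch directory rather than its real location, the variable names (workdir, script, cron_line) are hypothetical, and the crontab line is printed instead of installed. Pointing the cron entry at the wrapper script is also an assumption, since the original entry does not show its command field.

```shell
#!/bin/sh
# Scratch directory standing in for wherever log_generator.sh really lives.
workdir=$(mktemp -d)
script="$workdir/log_generator.sh"

# Write the wrapper script; its single command is the one given in step 4.
cat > "$script" <<'EOF'
#!/bin/sh
python /home/hadoop/data/project/generate_log.py
EOF

# Step 5: give the owner execute permission so the script can be run directly.
chmod u+x "$script"

# Candidate line for "crontab -e": five schedule fields, then the command.
# "*/1" in the minute field means "every minute". Printed only, not installed.
cron_line="*/1 * * * * $script"
echo "$cron_line"
```

Remove the scratch directory with rm -rf "$workdir" once you have looked at the result.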
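Application server produces access.log ==> console output
1/ Flume setup: exec source + memory channel + logger sink
2/ Configuration file accesslog_to_logger.conf (agent name exec-memory-logger); print to the console first as a test: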
exec-memory-logger.sources = exec-source
exec-memory-logger.sinks = logger-sink
exec-memory-logger.channels = memory-channel

exec-memory-logger.sources.exec-source.type = exec
exec-memory-logger.sources.exec-source.command = tail -F /home/hadoop/data/project/logs/access.log
exec-memory-logger.sources.exec-source.shell = /bin/sh -c

exec-memory-logger.channels.memory-channel.type = memory

exec-memory-logger.sinks.logger-sink.type = logger

exec-memory-logger.sources.exec-source.channels = memory-channel
exec-memory-logger.sinks.logger-sink.channel = memory-channel
3/ Start the flume-ng agent
flume-ng agent \
  --name exec-memory-logger \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/accesslog_to_logger.conf \
  -Dflume.root.logger=INFO,console
4/ New log output then appears in the Flume console every minute
Log file ==> Flume ==> Kafka
1/ Start ZooKeeper: ./zkServer.sh start
2/ Start the Kafka server: kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
3/ Modify the Flume configuration so that the Flume sink writes data to Kafka
Setup: exec-memory-kafka
Sink type: org.apache.flume.sink.kafka.KafkaSink
Sink properties: brokerList, topic, requiredAcks, batchSize
Configuration file accesslog_to_kafka.conf:
exec-memory-kafka.sources = exec-source
exec-memory-kafka.sinks = kafka-sink
exec-memory-kafka.channels = memory-channel

exec-memory-kafka.sources.exec-source.type = exec
exec-memory-kafka.sources.exec-source.command = tail -F /home/hadoop/data/project/logs/access.log
exec-memory-kafka.sources.exec-source.shell = /bin/sh -c

exec-memory-kafka.channels.memory-channel.type = memory

exec-memory-kafka.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
exec-memory-kafka.sinks.kafka-sink.brokerList = hadoop:9092
exec-memory-kafka.sinks.kafka-sink.topic = flume-kafka-streaming-topic
exec-memory-kafka.sinks.kafka-sink.batchSize = 5
exec-memory-kafka.sinks.kafka-sink.requiredAcks = 1

exec-memory-kafka.sources.exec-source.channels = memory-channel
exec-memory-kafka.sinks.kafka-sink.channel = memory-channel
4/ Start the flume-ng agent
flume-ng agent \
  --name exec-memory-kafka \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/accesslog_to_kafka.conf \
  -Dflume.root.logger=INFO,console
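5/ Start a Kafka console consumer to read the topic: kafka-console-consumer.sh --zookeeper hadoop:2181 --topic flume-kafka-streaming-topic
6/ Consume from code, passing these program arguments (zkQuorum, group, topics, numThreads): hadoop:2181 test flume-kafka-streaming-topic 1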
package com.feiyue.bigdata.sparkstreaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FlumeKafkaStreamingTest {

  def main(args: Array[String]): Unit = {
    if (args.length != 4) {
      println("Usage: FlumeKafkaStreamingTest <zkQuorum> <group> <topics> <numThreads>")
      System.exit(1)
    }

    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("FlumeKafkaStreamingTest")
    // One micro-batch every 60 seconds, matching the one-minute cron schedule
    val ssc = new StreamingContext(sparkConf, Seconds(60))

    val Array(zkQuorum, group, topics, numThreads) = args
    // Map each comma-separated topic to the number of consumer threads
    val topicsMap = topics.split(",").map((_, numThreads.toInt)).toMap

    // Receiver-based stream of (key, message) pairs
    val messages = KafkaUtils.createStream(ssc, zkQuorum, group, topicsMap)
    // Keep only the message value and print the record count of each batch
    messages.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
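map(_._2) is equivalent to map(t => t._2) // t is a tuple with at least 2 elements
To return a new tuple whose first element is the second element of t and whose second element is the original tuple, write map(t => (t._2, t)). This cannot be shortened with the placeholder syntax, because each _ introduces a separate parameter rather than referring to the same t.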
Author: sparkle123
Link: https://www.jianshu.com/p/00735e28dcc5