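Spark Streaming Fault-Tolerance Guarantees
Spark Streaming protects data against failure in three places:
Receiver side:
WriteAheadLogBasedBlockHandler: its storeBlock() method saves each block to both the BlockManager and the WAL, and returns a WriteAheadLogBasedStoreResult;
Driver side:
ReceivedBlockTracker: when it handles information from the receivers and from the driver scheduler, it writes the corresponding events to a WAL (similar to MySQL's redo log);
Checkpoint mechanism: after the driver scheduler runs generateJob() for a batch time, it saves the checkpoint for that time, so that the application can recover after a sudden failure;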
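Receiver side
The fault-tolerance logic behind WriteAheadLogBasedStoreResult saves each block to the BlockManager and to the WAL in parallel. The two writes are described in turn.
The receiver saves the block to the BlockManager
If the WAL is not enabled, the block is saved with the storageLevel the user specified; for Kafka the default level is StorageLevel.MEMORY_AND_DISK_SER_2;
If the WAL is enabled, the user's storageLevel is not used as-is. The effective level computed below keeps the user's disk, memory and off-heap settings, but forces serialized storage and a single replica: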
private val blockStoreTimeout = conf.getInt(
  "spark.streaming.receiver.blockStoreTimeout", 30).seconds

private val effectiveStorageLevel = {
  if (storageLevel.deserialized) {
    logWarning(s"Storage level serialization ${storageLevel.deserialized} is not supported when" +
      s" write ahead log is enabled, change to serialization false")
  }
  if (storageLevel.replication > 1) {
    logWarning(s"Storage level replication ${storageLevel.replication} is unnecessary when " +
      s"write ahead log is enabled, change to replication 1")
  }

  StorageLevel(storageLevel.useDisk, storageLevel.useMemory, storageLevel.useOffHeap, false, 1)
}

if (storageLevel != effectiveStorageLevel) {
  logWarning(s"User defined storage level $storageLevel is changed to effective storage level " +
    s"$effectiveStorageLevel when write ahead log is enabled")
}
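To make the override rule easier to see in isolation, here is a minimal sketch that does not depend on Spark. `Level` is a hypothetical stand-in for Spark's `StorageLevel`, and `EffectiveLevel.effective` is an illustrative helper, not a Spark API. It only mirrors the rule in the excerpt above.

```scala
// Hypothetical stand-in for Spark's StorageLevel (same five settings, nothing else).
final case class Level(
    useDisk: Boolean,
    useMemory: Boolean,
    useOffHeap: Boolean,
    deserialized: Boolean,
    replication: Int)

object EffectiveLevel {
  // Analogue of MEMORY_AND_DISK_SER_2: disk + memory, serialized, two replicas.
  val memoryAndDiskSer2: Level =
    Level(useDisk = true, useMemory = true, useOffHeap = false, deserialized = false, replication = 2)

  // With the WAL enabled, keep the disk/memory/off-heap flags but force
  // serialized storage and a single replica; otherwise keep the user's level.
  def effective(user: Level, walEnabled: Boolean): Level =
    if (walEnabled) user.copy(deserialized = false, replication = 1) else user
}
```

For example, `EffectiveLevel.effective(EffectiveLevel.memoryAndDiskSer2, walEnabled = true)` keeps disk and memory storage but drops the replication factor to 1, since the WAL already provides the second copy.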
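Writing to the WAL
This write calls flush() to force the data to disk, so once it returns, the data is guaranteed to have been written and backed up.
Question 1: This WAL is not used for recovery on the receiver side, since no recover interface can be found there. So what is it for?
It guarantees data safety. The driver side saves the blockInfo, and the block that each blockInfo refers to must still exist; the copy in the WAL ensures that it does, even if the copy in the BlockManager is lost;
Question 2: Because this WAL holds the actual data, it takes up a lot of space. How is it cleaned up?
When the batch completes, a ClearMetadata event is posted. The handler checks whether the WAL is enabled and, if so, cleans up the WAL data belonging to that batch;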
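The write-then-flush contract can be sketched with plain JDK file I/O. This is an illustration under stated assumptions, not Spark's implementation: `SimpleWal` and `Segment` are hypothetical names, a local file stands in for the log directory, and `Segment` is modeled on the (path, offset, length) triple that the driver later records for each block.

```scala
import java.io.{File, FileOutputStream, RandomAccessFile}

// Hypothetical handle describing where a record lives inside the log file.
final case class Segment(path: String, offset: Long, length: Int)

// Minimal append-only log: a write returns only after the bytes are forced to disk.
final class SimpleWal(file: File) {
  private val out = new FileOutputStream(file, true) // append mode
  private var nextOffset: Long = file.length()       // tracked by hand, not via the channel

  def write(record: Array[Byte]): Segment = synchronized {
    val segment = Segment(file.getPath, nextOffset, record.length)
    out.write(record)
    out.flush()       // push buffered bytes to the OS
    out.getFD.sync()  // ask the OS to force them to the storage device
    nextOffset += record.length
    segment
  }

  // Read a record back using only its handle.
  def read(segment: Segment): Array[Byte] = {
    val raf = new RandomAccessFile(segment.path, "r")
    try {
      val buf = new Array[Byte](segment.length)
      raf.seek(segment.offset)
      raf.readFully(buf)
      buf
    } finally raf.close()
  }

  def close(): Unit = out.close()
}
```

Because `write` returns the handle only after the sync, whoever holds a `Segment` can rely on the bytes being readable later through `read`, which is the property the block metadata on the driver depends on.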
def onBatchCompletion(time: Time) {
  eventLoop.post(ClearMetadata(time))
}
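Driver side
ReceivedBlockTracker
The WAL on the ReceivedBlockTracker side does not depend on the receiver WAL configuration; see this issue for details: https://issues.apache.org/jira/browse/SPARK-7139. Its job is to write every event it receives to a WAL (each event carries very little information). Although it is also called a WAL, it is a different concept from the receiver-side WAL described above;
To see exactly what is saved, search for the writeToLog method in the ReceivedBlockTracker class. It is called in three places: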
writeToLog(BlockAdditionEvent(receivedBlockInfo))
writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))
writeToLog(BatchCleanupEvent(timesToCleanup))

// The corresponding event classes
private[streaming] case class BlockAdditionEvent(receivedBlockInfo: ReceivedBlockInfo)
  extends ReceivedBlockTrackerLogEvent
private[streaming] case class BatchAllocationEvent(time: Time, allocatedBlocks: AllocatedBlocks)
  extends ReceivedBlockTrackerLogEvent
private[streaming] case class BatchCleanupEvent(times: Seq[Time])
  extends ReceivedBlockTrackerLogEvent

private[streaming] case class ReceivedBlockInfo(
    streamId: Int,
    numRecords: Option[Long],
    metadataOption: Option[Any],
    blockStoreResult: ReceivedBlockStoreResult
  )

private[streaming] trait ReceivedBlockStoreResult {
  // Any implementation of this trait will store a block id
  def blockId: StreamBlockId
  // Any implementation of this trait will have to return the number of records
  def numRecords: Option[Long]
}

private[streaming] case class WriteAheadLogBasedStoreResult(
    blockId: StreamBlockId,
    numRecords: Option[Long],
    walRecordHandle: WriteAheadLogRecordHandle
  ) extends ReceivedBlockStoreResult

private[streaming] case class FileBasedWriteAheadLogSegment(path: String, offset: Long, length: Int)
  extends WriteAheadLogRecordHandle

case class AllocatedBlocks(streamIdToAllocatedBlocks: Map[Int, Seq[ReceivedBlockInfo]]) {
  def getBlocksOfStream(streamId: Int): Seq[ReceivedBlockInfo] = {
    streamIdToAllocatedBlocks.getOrElse(streamId, Seq.empty)
  }
}
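As the code shows, the core information it saves is ReceivedBlockInfo, which contains:
streamId: the unique identifier of each stream;
numRecords: the number of records in the block;
metadataOption: optional metadata;
blockStoreResult: ReceivedBlockStoreResult is a trait. This field tells whether the receiver side used the WAL and, if it did, also holds the blockId -> (path, offset, length) mapping;
Recovery runs at initialization time when recoverFromWriteAheadLog is set. The logic resembles the log replay found in many storage engines. The implementation is: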
// Recover block information from write ahead logs
if (recoverFromWriteAheadLog) {
  recoverPastEvents()
}

/**
 * Recover the tracker state (unallocated and allocated block info) prior to failure
 * by replaying the events in the write ahead log.
 */
private def recoverPastEvents(): Unit = synchronized {
  // Insert the recovered block information
  def insertAddedBlock(receivedBlockInfo: ReceivedBlockInfo) {
    logTrace(s"Recovery: Inserting added block $receivedBlockInfo")
    receivedBlockInfo.setBlockIdInvalid()
    getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo
  }

  // Insert the recovered block-to-batch allocations and clear the queue of received blocks
  // (when the blocks were originally allocated to the batch, the queue must have been cleared).
  def insertAllocatedBatch(batchTime: Time, allocatedBlocks: AllocatedBlocks) {
    logTrace(s"Recovery: Inserting allocated batch for time $batchTime to " +
      s"${allocatedBlocks.streamIdToAllocatedBlocks}")
    streamIdToUnallocatedBlockQueues.values.foreach { _.clear() }
    timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
    lastAllocatedBatchTime = batchTime
  }

  // Cleanup the batch allocations
  def cleanupBatches(batchTimes: Seq[Time]) {
    logTrace(s"Recovery: Cleaning up batches $batchTimes")
    timeToAllocatedBlocks --= batchTimes
  }

  writeAheadLogOption.foreach { writeAheadLog =>
    logInfo(s"Recovering from write ahead logs in ${checkpointDirOption.get}")
    writeAheadLog.readAll().asScala.foreach { byteBuffer =>
      logInfo("Recovering record " + byteBuffer)
      Utils.deserialize[ReceivedBlockTrackerLogEvent](
        JavaUtils.bufferToArray(byteBuffer), Thread.currentThread().getContextClassLoader) match {
        case BlockAdditionEvent(receivedBlockInfo) =>
          insertAddedBlock(receivedBlockInfo)
        case BatchAllocationEvent(time, allocatedBlocks) =>
          insertAllocatedBatch(time, allocatedBlocks)
        case BatchCleanupEvent(batchTimes) =>
          cleanupBatches(batchTimes)
      }
    }
  }
}
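The replay idea can be tried out without Spark. The sketch below is a simplified model under stated assumptions: `TrackerEvent`, `BlockAdded`, `BatchAllocated`, `BatchCleanup` and `TrackerState` are hypothetical names, block ids are plain strings, batch times are plain `Long` values, and the log is an in-memory `Seq` rather than a deserialized WAL.

```scala
import scala.collection.mutable

// Hypothetical, simplified counterparts of the three logged event types.
sealed trait TrackerEvent
final case class BlockAdded(streamId: Int, blockId: String) extends TrackerEvent
final case class BatchAllocated(time: Long, blocks: Map[Int, Seq[String]]) extends TrackerEvent
final case class BatchCleanup(times: Seq[Long]) extends TrackerEvent

// In-memory tracker state that is rebuilt purely by replaying the log in order.
final class TrackerState {
  val unallocated = mutable.Map.empty[Int, mutable.Queue[String]]
  val allocated = mutable.Map.empty[Long, Map[Int, Seq[String]]]
  var lastAllocatedBatchTime: Option[Long] = None

  def replay(log: Seq[TrackerEvent]): Unit = log.foreach {
    case BlockAdded(streamId, blockId) =>
      // A received block waits in its stream's queue until a batch claims it.
      unallocated.getOrElseUpdate(streamId, mutable.Queue.empty[String]) += blockId
    case BatchAllocated(time, blocks) =>
      // Allocation originally emptied the queues, so the replay empties them too.
      unallocated.values.foreach(_.clear())
      allocated.put(time, blocks)
      lastAllocatedBatchTime = Some(time)
    case BatchCleanup(times) =>
      // Completed batches are dropped from the allocation table.
      allocated --= times
  }
}
```

Replaying `BlockAdded(0, "b1")`, `BatchAllocated(1000L, Map(0 -> Seq("b1")))` and then `BlockAdded(0, "b2")` leaves "b2" as the only unallocated block and batch 1000 as the only allocated batch, which is the state the tracker held before the simulated failure.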
Checkpoint
class Checkpoint(ssc: StreamingContext, val checkpointTime: Time)
  extends Logging with Serializable {
  val master = ssc.sc.master
  val framework = ssc.sc.appName
  val jars = ssc.sc.jars
  val graph = ssc.graph
  val checkpointDir = ssc.checkpointDir
  val checkpointDuration = ssc.checkpointDuration
  val pendingTimes = ssc.scheduler.getPendingTimes().toArray
  val sparkConfPairs = ssc.conf.getAll

  def createSparkConf(): SparkConf = { }
}
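The Checkpoint class shows which information is saved to HDFS:
master: the master the Spark application runs against;
framework: the Spark application name;
jars: the jars the Spark application depends on;
graph: the graph the streaming application depends on (as I understand it, the information about the RDDs it depends on);
checkpointDir: the checkpoint path;
checkpointDuration: the checkpoint interval;
pendingTimes: the batch times that are still pending in the scheduler;
sparkConfPairs: the sparkConf settings;
The save and restore logic is fairly simple:
Save: the checkpoint is saved at every batch time (the checkpoint interval can also be configured);
Restore: when the driver starts, it first tries to recover from the checkpoint;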
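The save-then-restore-first pattern can be sketched with a local file. This is an illustration only: `MiniCheckpoint` and `CheckpointStore` are hypothetical names, the payload is cut down to the checkpoint time and the pending batch times, and a hand-written binary format replaces the serialized Checkpoint object that Spark writes.

```scala
import java.io.{DataInputStream, DataOutputStream, File, FileInputStream, FileOutputStream}

// Hypothetical, trimmed-down checkpoint payload.
final case class MiniCheckpoint(checkpointTime: Long, pendingTimes: Seq[Long])

object CheckpointStore {
  // Save: overwrite the checkpoint file with the state for the latest batch time.
  def save(cp: MiniCheckpoint, file: File): Unit = {
    val out = new DataOutputStream(new FileOutputStream(file))
    try {
      out.writeLong(cp.checkpointTime)
      out.writeInt(cp.pendingTimes.size)
      cp.pendingTimes.foreach(t => out.writeLong(t))
    } finally out.close()
  }

  // Read the checkpoint back, or None when nothing has been saved yet.
  def read(file: File): Option[MiniCheckpoint] =
    if (!file.exists() || file.length() == 0L) None
    else {
      val in = new DataInputStream(new FileInputStream(file))
      try {
        val time = in.readLong()
        val count = in.readInt()
        Some(MiniCheckpoint(time, Vector.fill(count)(in.readLong())))
      } finally in.close()
    }

  // Restore: on startup, prefer the saved checkpoint and fall back to a fresh state.
  def getOrCreate(file: File)(create: => MiniCheckpoint): MiniCheckpoint =
    read(file).getOrElse(create)
}
```

The `getOrCreate` shape is borrowed from `StreamingContext.getOrCreate`, which in Spark rebuilds the context from the checkpoint directory when one exists and otherwise calls the supplied creation function.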
Author: 分裂四人组
Link: https://www.jianshu.com/p/f721f6cb681a