Spark Streaming中Driver的容错主要是ReceiverTracker、Dstream.graph、JobGenerator的容错
第一、看ReceiverTracker的容错,主要是ReceiverTracker接收元数据的存入WAL,看ReceiverTracker的addBlock方法,代码如下
def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
try {
val writeResult = writeToLog(BlockAdditionEvent(receivedBlockInfo)) if (writeResult) {
synchronized {
getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo
}
logDebug(s"Stream ${receivedBlockInfo.streamId} received " +
s"block ${receivedBlockInfo.blockStoreResult.blockId}")
} else {
logDebug(s"Failed to acknowledge stream ${receivedBlockInfo.streamId} receiving " +
s"block ${receivedBlockInfo.blockStoreResult.blockId} in the Write Ahead Log.")
}
writeResult
} catch { case NonFatal(e) =>
logError(s"Error adding block $receivedBlockInfo", e) false
}
}writeToLog方法就是进行WAL的操作,看writeToLog的代码
private def writeToLog(record: ReceivedBlockTrackerLogEvent): Boolean = { if (isWriteAheadLogEnabled) {
logTrace(s"Writing record: $record") try {
writeAheadLogOption.get.write(ByteBuffer.wrap(Utils.serialize(record)),
clock.getTimeMillis()) true
} catch { case NonFatal(e) =>
logWarning(s"Exception thrown while writing record: $record to the WriteAheadLog.", e) false
}
} else { true
}
}首先判断是否开启了WAL,根据一下isWriteAheadLogEnabled值
private[streaming] def isWriteAheadLogEnabled: Boolean = writeAheadLogOption.nonEmpty
接着看writeAheadLogOption
private val writeAheadLogOption = createWriteAheadLog()
再看createWriteAheadLog()方法
private def createWriteAheadLog(): Option[WriteAheadLog] = {
checkpointDirOption.map { checkpointDir =>
val logDir = ReceivedBlockTracker.checkpointDirToLogDir(checkpointDirOption.get)
WriteAheadLogUtils.createLogForDriver(conf, logDir, hadoopConf)
}
}根据checkpoint的配置,获取checkpoint的目录,这里可以看出,checkpoint可以有多个目录。
写完WAL才将receivedBlockInfo放到内存队列getReceivedBlockQueue中
第二、看ReceivedBlockTracker的allocateBlocksToBatch方法,代码如下
def allocateBlocksToBatch(batchTime: Time): Unit = synchronized { if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) {
val streamIdToBlocks = streamIds.map { streamId =>
(streamId, getReceivedBlockQueue(streamId).dequeueAll(x => true))
}.toMap
val allocatedBlocks = AllocatedBlocks(streamIdToBlocks) if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
lastAllocatedBatchTime = batchTime
} else {
logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
}
} else { // This situation occurs when:
// 1. WAL is ended with BatchAllocationEvent, but without BatchCleanupEvent,
// possibly processed batch job or half-processed batch job need to be processed again,
// so the batchTime will be equal to lastAllocatedBatchTime.
// 2. Slow checkpointing makes recovered batch time older than WAL recovered
// lastAllocatedBatchTime.
// This situation will only occurs in recovery time.
logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
}
}从getReceivedBlockQueue中获取每一个receiver的ReceivedBlockQueue队列赋值给streamIdToBlocks,然后包装一下
val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
allocatedBlocks就是根据时间获取的一批元数据,交给对应batchDuration的job,job在执行的时候就可以使用,在使用前先进行WAL,如果job出错恢复后,可以知道数据计算到什么位置
val allocatedBlocks = AllocatedBlocks(streamIdToBlocks) if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
lastAllocatedBatchTime = batchTime
} else {
logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
}第三、看cleanupOldBatches方法,cleanupOldBatches的功能是从内存中清楚不用的batches元数据,再删除WAL的数据,再删除之前把要删除的batches信息也进行WAL
def cleanupOldBatches(cleanupThreshTime: Time, waitForCompletion: Boolean): Unit = synchronized { require(cleanupThreshTime.milliseconds < clock.getTimeMillis())
val timesToCleanup = timeToAllocatedBlocks.keys.filter { _ < cleanupThreshTime }.toSeq
logInfo("Deleting batches " + timesToCleanup) if (writeToLog(BatchCleanupEvent(timesToCleanup))) {
timeToAllocatedBlocks --= timesToCleanup
writeAheadLogOption.foreach(_.clean(cleanupThreshTime.milliseconds, waitForCompletion))
} else {
logWarning("Failed to acknowledge batch clean up in the Write Ahead Log.")
}
}总结一下上面的三种WAL,对应下面的三种事件,这就是ReceiverTracker的容错
/** Trait representing any event in the ReceivedBlockTracker that updates its state. */private[streaming] sealed trait ReceivedBlockTrackerLogEventprivate[streaming] case class BlockAdditionEvent(receivedBlockInfo: ReceivedBlockInfo) extends ReceivedBlockTrackerLogEventprivate[streaming] case class BatchAllocationEvent(time: Time, allocatedBlocks: AllocatedBlocks) extends ReceivedBlockTrackerLogEventprivate[streaming] case class BatchCleanupEvent(times: Seq[Time]) extends ReceivedBlockTrackerLogEvent
看一下Dstream.graph和JobGenerator的容错,从开始
private def generateJobs(time: Time) {
SparkEnv has been removed.
SparkEnv.set(ssc.env)
Try { // allocate received blocks to batch
// 分配接收到的数据给batch
jobScheduler.receiverTracker.allocateBlocksToBatch(time) // 使用分配的块生成jobs
graph.generateJobs(time) // generate jobs using allocated block
} match { case Success(jobs) => // 获取元数据信息
val streamIdToInputInfos = jobScheduler.inputInfoTracker.getInfo(time) // 提交jobSet
jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos)) case Failure(e) =>
jobScheduler.reportError("Error generating jobs for time " + time, e)
}
eventLoop.post(DoCheckpoint(time, clearCheckpointDataLater = false))
}jobs生成完成后发送DoCheckpoint消息,最终调用doCheckpoint方法,代码如下
private def doCheckpoint(time: Time, clearCheckpointDataLater: Boolean) { if (shouldCheckpoint && (time - graph.zeroTime).isMultipleOf(ssc.checkpointDuration)) {
logInfo("Checkpointing graph for time " + time)
ssc.graph.updateCheckpointData(time)
checkpointWriter.write(new Checkpoint(ssc, time), clearCheckpointDataLater)
}
}updateCheckpointData和checkpointWriter.write做了什么,后续
作者:海纳百川_spark
链接:https://www.jianshu.com/p/5397bc160c6b
随时随地看视频