DAGScheduler
DAGScheduler的主要任务是基于Stage构建DAG,决定每个任务的最佳位置
记录哪个RDD或者Stage输出被物化
面向stage的调度层,为job生成以stage组成的DAG,提交TaskSet给TaskScheduler执行
重新提交shuffle输出丢失的stage
每一个Stage内,都是独立的tasks,他们共同执行同一个computefunction,享有相同的shuffledependencies。DAG在切分stage的时候是依照出现shuffle为界限的。
DAGScheduler实例化
下面的代码是SparkContext实例化DAGScheduler的过程:
@volatile private[spark] var dagScheduler: DAGScheduler = _ try { dagScheduler = new DAGScheduler(this) } catch { case e: Exception => { try { stop() } finally { throw new SparkException("Error while constructing DAGScheduler", e) } } }
下面代码显示了DAGScheduler的构造函数定义中,通过绑定TaskScheduler的方式创建,其中次构造函数去调用主构造函数来将sc的字段填充入参:
private[spark]class DAGScheduler( private[scheduler] val sc: SparkContext, private[scheduler] val taskScheduler: TaskScheduler, listenerBus: LiveListenerBus, mapOutputTracker: MapOutputTrackerMaster, blockManagerMaster: BlockManagerMaster, env: SparkEnv, clock: Clock = new SystemClock()) extends Logging { def this(sc: SparkContext, taskScheduler: TaskScheduler) = { this( sc, taskScheduler, sc.listenerBus, sc.env.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster], sc.env.blockManager.master, sc.env) } def this(sc: SparkContext) = this(sc, sc.taskScheduler)
作业提交与DAGScheduler操作
Action的大部分操作会进行作业(job)的提交,源码1.0版的job提交过程的大致调用链是:sc.runJob()
-->dagScheduler.runJob
-->dagScheduler.submitJob
--->dagSchedulerEventProcessActor.JobSubmitted
-->dagScheduler.handleJobSubmitted
-->dagScheduler.submitStage
-->dagScheduler.submitMissingTasks
-->taskScheduler.submitTasks
。
具体的作业提交执行期的函数调用为:
sc.runJob->dagScheduler.runJob->submitJob
DAGScheduler::submitJob会创建JobSummitted的event发送给内嵌类eventProcessActor(在源码1.4中,submitJob函数中,使用DAGSchedulerEventProcessLoop类进行事件的处理)
eventProcessActor在接收到JobSubmmitted之后调用processEvent处理函数
job到stage的转换,生成finalStage并提交运行,关键是调用submitStage
在submitStage中会计算stage之间的依赖关系,依赖关系分为宽依赖和窄依赖两种
如果计算中发现当前的stage没有任何依赖或者所有的依赖都已经准备完毕,则提交task
提交task是调用函数submitMissingTasks来完成
task真正运行在哪个worker上面是由TaskScheduler来管理,也就是上面的submitMissingTasks会调用TaskScheduler::submitTasks
TaskSchedulerImpl中会根据Spark的当前运行模式来创建相应的backend,如果是在单机运行则创建LocalBackend
LocalBackend收到TaskSchedulerImpl传递进来的ReceiveOffers事件
receiveOffers->executor.launchTask->TaskRunner.run
DAGScheduler的runJob函数
DAGScheduler.runjob最后把结果通过resultHandler保存返回。
这里DAGScheduler的runJob函数调用DAGScheduler的submitJob函数来提交任务:
def runJob[T, U: ClassTag]( rdd: RDD[T], func: (TaskContext, Iterator[T]) => U, partitions: Seq[Int], callSite: CallSite, allowLocal: Boolean, resultHandler: (Int, U) => Unit, properties: Properties): Unit = { val start = System.nanoTime val waiter = submitJob(rdd, func, partitions, callSite, allowLocal, resultHandler, properties) waiter.awaitResult() match { case JobSucceeded => { logInfo("Job %d finished: %s, took %f s".format (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9)) } case JobFailed(exception: Exception) => logInfo("Job %d failed: %s, took %f s".format (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9)) throw exception } }
作业提交的调度
在Spark源码1.4.0中,DAGScheduler的submitJob函数不再使用DAGEventProcessActor进行事件处理和消息通信,而是使用DAGSchedulerEventProcessLoop类实例eventProcessLoop进行JobSubmitted事件的post动作。
下面是submitJob函数代码:
/** * Submit a job to the job scheduler and get a JobWaiter object back. The JobWaiter object * can be used to block until the the job finishes executing or can be used to cancel the job. */ def submitJob[T, U]( rdd: RDD[T], func: (TaskContext, Iterator[T]) => U, partitions: Seq[Int], callSite: CallSite, allowLocal: Boolean, resultHandler: (Int, U) => Unit, properties: Properties): JobWaiter[U] = { // Check to make sure we are not launching a task on a partition that does not exist. val maxPartitions = rdd.partitions.length partitions.find(p => p >= maxPartitions || p < 0).foreach { p => throw new IllegalArgumentException( "Attempting to access a non-existent partition: " + p + ". " + "Total number of partitions: " + maxPartitions) } val jobId = nextJobId.getAndIncrement() if (partitions.size == 0) { return new JobWaiter[U](this, jobId, 0, resultHandler) } assert(partitions.size > 0) val func2 = func.asInstanceOf[(TaskContext, Iterator[_]) => _] val waiter = new JobWaiter(this, jobId, partitions.size, resultHandler) eventProcessLoop.post(JobSubmitted( jobId, rdd, func2, partitions.toArray, allowLocal, callSite, waiter, properties)) waiter }
当eventProcessLoop对象投递了JobSubmitted事件之后,对象内的eventThread线程实例对事件进行处理,不断从事件队列中取出事件,调用onReceive函数处理事件,当匹配到JobSubmitted事件后,调用DAGScheduler的handleJobSubmitted函数并传入jobid、rdd等参数来处理Job。
handleJobSubmitted函数
Job处理过程中handleJobSubmitted比较关键,该函数主要负责RDD的依赖性分析,生成finalStage,并根据finalStage来产生ActiveJob。
在handleJobSubmitted函数源码中,给出了部分注释:
private[scheduler] def handleJobSubmitted(jobId: Int, finalRDD: RDD[_], func: (TaskContext, Iterator[_]) => _, partitions: Array[Int], allowLocal: Boolean, callSite: CallSite, listener: JobListener, properties: Properties) { var finalStage: Stage = null try { // New stage creation may throw an exception if, for example, jobs are run on a // HadoopRDD whose underlying HDFS files have been deleted. finalStage = newStage(finalRDD, partitions.size, None, jobId, callSite) } catch { //错误处理,告诉监听器作业失败,返回.... case e: Exception => logWarning("Creating new stage failed due to exception - job: " + jobId, e) listener.jobFailed(e) return } if (finalStage != null) { val job = new ActiveJob(jobId, finalStage, func, partitions, callSite, listener, properties) clearCacheLocs() logInfo("Got job %s (%s) with %d output partitions (allowLocal=%s)".format( job.jobId, callSite.shortForm, partitions.length, allowLocal)) logInfo("Final stage: " + finalStage + "(" + finalStage.name + ")") logInfo("Parents of final stage: " + finalStage.parents) logInfo("Missing parents: " + getMissingParentStages(finalStage)) val shouldRunLocally = localExecutionEnabled && allowLocal && finalStage.parents.isEmpty && partitions.length == 1 val jobSubmissionTime = clock.getTimeMillis() if (shouldRunLocally) { // 很短、没有父stage的本地操作,比如 first() or take() 的操作本地执行 // Compute very short actions like first() or take() with no parent stages locally. listenerBus.post( SparkListenerJobStart(job.jobId, jobSubmissionTime, Seq.empty, properties)) runLocally(job) } else { // collect等操作走的是这个过程,更新相关的关系映射,用监听器监听,然后提交作业 jobIdToActiveJob(jobId) = job activeJobs += job finalStage.resultOfJob = Some(job) val stageIds = jobIdToStageIds(jobId).toArray val stageInfos = stageIds.flatMap(id => stageIdToStage.get(id).map(_.latestInfo)) listenerBus.post( SparkListenerJobStart(job.jobId, jobSubmissionTime, stageInfos, properties)) // 提交stage submitStage(finalStage) } } // 提交stage submitWaitingStages() }
小结
该篇文章介绍了DAGScheduler从SparkContext中进行实例化,到执行Action操作时提交任务调用runJob函数,进而介绍了提交任务的消息调度,和处理Job函数handleJobSubmitted函数。
由于在handleJobSubmitted函数中涉及到依赖性分析和stage的源码内容,于是我计划在下一篇文章里进行介绍和源码分析。
转载请注明作者Jason Ding及其出处
GitCafe博客主页(http://jasonding1354.gitcafe.io/)
Github博客主页(http://jasonding1354.github.io/)
CSDN博客(http://blog.csdn.net/jasonding1354)
简书主页(http://www.jianshu.com/users/2bd9b48f6ea8/latest_articles)
作者:JasonDing
链接:https://www.jianshu.com/p/3d2202105315