Action/Transformation
所谓的Action与Transformation的区别: Action就是会触发DAGScheduler的runJob()方法,向DAGScheduler提交任务而已罢了;
在RDD类中,可以显式地搜索runJob
,找到如下所谓的Action方法:
foreach(f: T => Unit): Unit foreachPartition(f: Iterator[T] => Unit): Unit collect(): Array[T] toLocalIterator: Iterator[T] reduce(f: (T, T) => T): T fold(zeroValue: T)(op: (T, T) => T): T aggregate[U: ClassTag](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): U count(): Long take(num: Int): Array[T]
而另一类方法,比如saveAsTextFile()方法则隐式地,在函数内部会调用runJob方法:
saveAsTextFile(path: String): Unit def saveAsNewAPIHadoopDataset(conf: Configuration): Unit = self.withScope { ... jobCommitter.setupJob(jobTaskContext) // 此处触发runJob() self.context.runJob(self, writeShard) jobCommitter.commitJob(jobTaskContext) }
窄依赖/宽依赖
窄依赖:查看其dependency,如果为如下dependency则为窄依赖
OneToOneDependency PruneDependency RangeDependency
通常对应的RDD方法为:
map mapValues flatMap filter mapPartitions mapPartitionsWithIndex
宽依赖:其依赖的dependency为ShuffleDependency,其通常对应的RDD方法为(有些RDD方法支持参数可配置是否进行shuffle的):
cogroup groupWith join leftOuterJoin rightOuterJoin groupByKey reduceByKey combineByKey distinct intersection repartition coalesce
DAGScheduler调度算法
DAGScheduler调度的核心为,按照宽依赖(Shuffle)分成各阶段的;
Job: 也就是上述将的submitJob()级别的任务,比如说count()是一个job, saveAsTextFile()也是一个job, take()也是一个job;
Stage: Job按照下述的算法分割成的一个单元模块,如果该stage下没有了宽依赖的RDD或者一个几个RDD组成的;
Task: Spark执行任务的最小单元;
// 通过递归方法完成stage的调度/** Submits stage, but first recursively submits any missing parents. */ private def submitStage(stage: Stage) { val jobId = activeJobForStage(stage) if (jobId.isDefined) { logDebug("submitStage(" + stage + ")") if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) { // 此处获取ShuffleMapStage val missing = getMissingParentStages(stage).sortBy(_.id) logDebug("missing: " + missing) if (missing.isEmpty) { // 如果该stage下没有了宽依赖的RDD,则执行该RDD logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents") submitMissingTasks(stage, jobId.get) } else { for (parent <- missing) { submitStage(parent) } waitingStages += stage } } } else { abortStage(stage, "No active job for stage " + stage.id, None) } }private def getMissingParentStages(stage: Stage): List[Stage] = { val missing = new HashSet[Stage] val visited = new HashSet[RDD[_]] // We are manually maintaining a stack here to prevent StackOverflowError // caused by recursively visiting val waitingForVisit = new Stack[RDD[_]] def visit(rdd: RDD[_]) { if (!visited(rdd)) { visited += rdd val rddHasUncachedPartitions = getCacheLocs(rdd).contains(Nil) if (rddHasUncachedPartitions) { for (dep <- rdd.dependencies) { dep match { // 如果是宽依赖且mapStage还不可用,则添加该stage至missing stage集合 case shufDep: ShuffleDependency[_, _, _] => // 将stage转换为ShuffleMapStage val mapStage = getShuffleMapStage(shufDep, stage.firstJobId) // 如果该stage的输出==其partition则任务已经完成并可用的,该动作是在task完成后更新的 if (!mapStage.isAvailable) { missing += mapStage } case narrowDep: NarrowDependency[_] => waitingForVisit.push(narrowDep.rdd) } } } } } waitingForVisit.push(stage.rdd) while (waitingForVisit.nonEmpty) { visit(waitingForVisit.pop()) } missing.toList }
ShuffleMapStage/ResultStage和ShuffleMapTask/ResultTask
ShuffleMap和Result为DAGScheduler调度算法(参考上部分)对stage的划分,runJob()提交任务的rdd会被转换为ResultStage,而其他由宽依赖所划分的stage则会被转换为ShuffleMapStage;
在针对ShuffleMapStage/ResultStage这两者stage进行任务分发和任务完成处理时是需要分开处理的,在任务分发阶段其处理如下:
val tasks: Seq[Task[_]] = try { stage match { case stage: ShuffleMapStage => // 对于ShuffleMapStage遍历dependencies构造ShuffleMapTask,其runTask()需要依赖shuflleManager partitionsToCompute.map { id => val locs = taskIdToLocations(id) val part = stage.rdd.partitions(id) new ShuffleMapTask(stage.id, stage.latestInfo.attemptId, taskBinary, part, locs, stage.latestInfo.taskMetrics, properties) } //对于ResultStage遍历dependencies构造ResultTask case stage: ResultStage => val job = stage.activeJob.get partitionsToCompute.map { id => val p: Int = stage.partitions(id) val part = stage.rdd.partitions(p) val locs = taskIdToLocations(id) new ResultTask(stage.id, stage.latestInfo.attemptId, taskBinary, part, locs, id, properties, stage.latestInfo.taskMetrics) } } } catch { case NonFatal(e) => abortStage(stage, s"Task creation failed: $e\n${Utils.exceptionString(e)}", Some(e)) runningStages -= stage return }
在任务完成阶段,针对ResultTask,判定该job是否成功;
针对ShuffleMapTask,则需要注册mapOutputTracker更新shuffle完成信息;
作者:分裂四人组
链接:https://www.jianshu.com/p/5c5ad8f0f34b