猿问

为什么此Spark代码会使NullPointerException?

我在执行Spark应用程序时遇到问题。


源代码:


// Read table From HDFS

val productInformation = spark.table("temp.temp_table1")

val dict = spark.table("temp.temp_table2")


// Custom UDF

val countPositiveSimilarity = udf[Long, Seq[String], Seq[String]]((a, b) => 

    dict.filter(

        (($"first".isin(a: _*) && $"second".isin(b: _*)) || ($"first".isin(b: _*) && $"second".isin(a: _*))) && $"similarity" > 0.7

    ).count

)


val result = productInformation.withColumn("positive_count", countPositiveSimilarity($"title", $"internal_category"))


// Error occurs!

result.show

错误信息:


org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 54.0 failed 4 times, most recent failure: Lost task 0.3 in stage 54.0 (TID 5887, ip-10-211-220-33.ap-northeast-2.compute.internal, executor 150): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$1: (array<string>, array<string>) => bigint)

    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)

    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)

    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)

    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)

    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)

    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)

    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)

    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)

    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)

    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)

这是一个示例代码,与我的原始代码相似。示例代码运行良好。我应在哪一点检入原始代码和数据?



繁星点点滴滴
浏览 974回答 2
2回答
随时随地看视频慕课网APP
我要回答