Spark: ClassNotFoundException when reading/writing CSV

I am trying to write the following DataFrame to a CSV file on HDFS:


df.write()
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("/user/cloudera/csv");

but I get the following error:


Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/csv/CSVFormat
...
Caused by: java.lang.ClassNotFoundException: org.apache.commons.csv.CSVFormat
... 21 more

My pom.xml has the following dependencies:


<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-csv_2.10</artifactId>
  <version>1.5.0</version>
</dependency>

<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-csv</artifactId>
  <version>1.5</version>
</dependency>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.6.0</version>
</dependency>

I am using spark-csv 1.5.0 with Scala 2.10.5, and I submit the job with:


spark-submit --jars /path/spark-csv_2.10-1.5.0.jar --class com.iris.Begin /path/CsvSolver.jar

I also have both commons-csv/1.1 and commons-csv/1.5 in my .m2 repository.
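The error message itself suggests the likely cause: spark-csv 1.5.0 needs commons-csv at runtime, but the `--jars` option above only ships the spark-csv jar to the executors; the commons-csv jar sitting in the local .m2 repository is never put on the classpath. A quick workaround is to pass commons-csv explicitly as well (the commons-csv path below is an assumed example of where Maven caches it; adjust it to your machine):

```shell
# --jars takes a comma-separated list; ship commons-csv alongside spark-csv.
# The commons-csv path is an assumed example from the local Maven cache.
spark-submit \
  --jars /path/spark-csv_2.10-1.5.0.jar,$HOME/.m2/repository/org/apache/commons/commons-csv/1.5/commons-csv-1.5.jar \
  --class com.iris.Begin /path/CsvSolver.jar
```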


Can anyone help me?


尚方宝剑之说

GCT1015

It is better to build a fat jar that contains all dependencies (spark-core should be marked as provided) and submit only that jar, without any extra --jars option. In Maven you can generate a fat jar with the Maven Assembly Plugin, using the predefined descriptor jar-with-dependencies. Something like:

<build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>3.1.0</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
    </plugin>
  </plugins>
</build>
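To go with the fat-jar approach above, a sketch of the matching dependency scope: Spark itself is already present on the cluster, so it should be marked provided to keep it out of the assembled jar, while spark-csv and commons-csv stay at the default compile scope so they are bundled in. Assuming the versions from the question:

```xml
<!-- Spark is supplied by the cluster at runtime; exclude it from the fat jar. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
```

After running `mvn package`, submit the `*-jar-with-dependencies.jar` produced under `target/` and drop the `--jars` option entirely.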
