Add a new column to a DataFrame whose literal value is of type set

I found a solution myself (with some help):

    Map<File, Dataset<Row>> allWords = ...

    StructField[] structFields = new StructField[] {
            new StructField("word", DataTypes.StringType, false, Metadata.empty()),
            new StructField("count", DataTypes.IntegerType, false, Metadata.empty()),
            new StructField("files", DataTypes.createArrayType(DataTypes.IntegerType), true, Metadata.empty())
    };
    StructType structType = new StructType(structFields);

    // Start from an empty DataFrame that has the target schema.
    Dataset<Row> allFilesWords = spark.createDataFrame(new ArrayList<>(), structType);

    for (Map.Entry<File, Dataset<Row>> entry : allWords.entrySet()) {
        Integer fileIndex = files.indexOf(entry.getKey());
        // Datasets are immutable, so the result of unionAll has to be assigned back.
        // `seq` is not defined in this snippet; presumably it is a Scala Seq<Integer> built from fileIndex.
        allFilesWords = allFilesWords.unionAll(
                entry.getValue()
                        .withColumn("files", functions.typedLit(seq, MyTypeTags.SeqInteger()))
        );
    }

The problem is that a TypeTag is a compile-time artifact from Scala. From what I learned in another question, it has to be generated by the Scala compiler, and you cannot produce one in Java. So I had to write the TypeTag for my custom data structure in a Scala file and add that file to my Maven Java project. To do that, I followed this article.

Here is my MyTypeTags.scala file:

    import scala.reflect.runtime.universe._

    object MyTypeTags {
      val SeqInteger = typeTag[Seq[Integer]]
    }
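If the column only needs to hold a one-element array containing the file index, it may be possible to avoid the TypeTag altogether by building the array from `functions.array` and `functions.lit`, both of which can be called from Java. This is a different technique from the `typedLit` approach above, not part of the original solution. It is only a minimal sketch: the class name `ArrayLiteralColumn`, the helper `withFileIndex`, and the sample rows are made up for illustration, and it assumes Spark SQL is on the classpath.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical class name, used only for this sketch.
public class ArrayLiteralColumn {

    // Adds a "files" column whose value is a one-element integer array
    // holding fileIndex. No Scala TypeTag is involved.
    static Dataset<Row> withFileIndex(Dataset<Row> words, int fileIndex) {
        return words.withColumn("files", functions.array(functions.lit(fileIndex)));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("array-literal-sketch")
                .getOrCreate();

        // Sample (word, count) rows; the data is invented for the example.
        StructType schema = new StructType(new StructField[] {
                new StructField("word", DataTypes.StringType, false, Metadata.empty()),
                new StructField("count", DataTypes.IntegerType, false, Metadata.empty())
        });
        List<Row> rows = Arrays.asList(
                RowFactory.create("spark", 3),
                RowFactory.create("java", 1));
        Dataset<Row> words = spark.createDataFrame(rows, schema);

        // Every row gets the same literal array in the new column.
        withFileIndex(words, 0).show();

        spark.stop();
    }
}
```

A column built this way is a Spark array, not a set, so duplicates would still have to be handled separately if the per-file arrays are merged later.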
