Saving a table in Hive from a JSON array using Java Spark SQL

Dataset<Row> ds = spark.read().option("multiLine", true).option("mode", "PERMISSIVE").json("/user/administrador/prueba_diario.txt").toDF();

ds.printSchema();

Dataset<Row> ds2 = ds.select("articles").toDF();

ds2.printSchema();

spark.sql("drop table if exists table1");

ds2.write().saveAsTable("table1");

My JSON has this schema:

root
 |-- articles: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- author: string (nullable = true)
 |    |    |-- content: string (nullable = true)
 |    |    |-- description: string (nullable = true)
 |    |    |-- publishedAt: string (nullable = true)
 |    |    |-- source: struct (nullable = true)
 |    |    |    |-- id: string (nullable = true)
 |    |    |    |-- name: string (nullable = true)
 |    |    |-- title: string (nullable = true)
 |    |    |-- url: string (nullable = true)
 |    |    |-- urlToImage: string (nullable = true)
 |-- status: string (nullable = true)
 |-- totalResults: long (nullable = true)

I want to save the articles array as a Hive table whose columns follow the structure of the array elements.

Example of the Hive table I want:

author (string)

content (string)

description (string)

publishedat (string)

source (struct<id:string,name:string>)

title (string)

url (string)

urltoimage (string)

The problem is that the table is saved with only a single column named articles, and all of the content ends up inside that one column.
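One possible approach, sketched here under assumptions rather than as a confirmed fix, is to explode the articles array into one row per element and then promote the struct fields to top-level columns with "article.*" before writing. The class name ArticlesToHive, the helper flattenArticles, the alias article, and the application name are hypothetical. The sketch also assumes the session is built with Hive support and that the spark-hive module is on the classpath.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;

public class ArticlesToHive {

    // Hypothetical helper: turns the single array column "articles" into
    // one row per array element, then expands the struct's fields into
    // top-level columns (author, content, ..., source, title, ...).
    static Dataset<Row> flattenArticles(Dataset<Row> ds) {
        return ds
                .select(explode(col("articles")).as("article"))
                .select("article.*");
    }

    public static void main(String[] args) {
        // Assumption: Hive support is available in this environment.
        SparkSession spark = SparkSession.builder()
                .appName("ArticlesToHive") // hypothetical application name
                .enableHiveSupport()
                .getOrCreate();

        // Same read options and path as in the question.
        Dataset<Row> ds = spark.read()
                .option("multiLine", true)
                .option("mode", "PERMISSIVE")
                .json("/user/administrador/prueba_diario.txt");

        Dataset<Row> articles = flattenArticles(ds);
        articles.printSchema();

        spark.sql("drop table if exists table1");
        articles.write().saveAsTable("table1");

        spark.stop();
    }
}
```

With this shape, source should remain a struct<id:string,name:string> column because only the outer struct is expanded. Hive typically reports column names in lowercase, which would match the publishedat and urltoimage names in the desired table.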

侃侃无极
